The other day I watched “San Andreas”, a movie with a lot of heroism, a portion of patriotism and plenty of natural disasters. For corporations, disasters, in the different sizes and shapes they come in, are the subject of Continuity Management and disaster recovery. For the prepared and proactive company this is a discipline of its own in the organization, securing continuous business for the whole organization based on risk and a lot more. But what about those really bad incidents that are not disasters but still cause a lot of problems, such as a corporate-wide service stopping or something similar? How do we treat those? There are no disaster recovery plans for these unfortunate events, because there is no disaster, just a really bad incident or, more commonly, a set of bad incidents.
Ladies and gentlemen, I present to you the glorious Major Incident Management (MIM) routine, the magic wand that will save you and be your life vest when you are in the middle of an incident storm.
Let us first be sure about the difference between a major incident and a disaster. The latter normally means some sort of event outside our control, sometimes called “force majeure”, such as a large power failure, an earthquake, a flood or another event that will effectively impact our business in a number of ways. A major incident is not a disaster, but there are several similarities in how we treat these two types of events.
For those of you who are not familiar with IT Service Management, an incident is something that disturbs or disrupts the normal operations of one or more IS/IT services. Informally it is called a support ticket, an issue or, well, there are many names, which is why we in ITSM decided to use the term Incident. Anyway, there is a process for managing incidents, including a set of activities to collect, manage and resolve the incident, which is normally reported by a business user. An incident can be more or less critical, depending on impact (how serious it is) and urgency (how much we are bleeding), and together these set the temperature for this specific incident.
A major incident quickly passes through all defined levels of impact and/or urgency and triggers a specific routine created just to manage major incidents, so that everyone can really focus on what has to be done.
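To make the idea of impact and urgency setting the temperature a bit more concrete, here is a minimal sketch in Python. The three levels, the 1 to 5 priority scale and the rule for when the major incident routine kicks in are my own assumptions for illustration; your organization will have its own matrix and its own trigger criteria.

```python
from enum import IntEnum


class Level(IntEnum):
    """Hypothetical three-step scale used for both impact and urgency."""
    LOW = 1
    MEDIUM = 2
    HIGH = 3


def priority(impact: Level, urgency: Level) -> int:
    """Return a priority from 1 (most critical) to 5 (least critical).

    Assumption: a simple additive matrix. impact + urgency ranges
    from 2 to 6, which maps to priority 5 down to priority 1.
    """
    return 7 - (int(impact) + int(urgency))


def is_major_incident(impact: Level, urgency: Level,
                      business_critical: bool) -> bool:
    """Assumed trigger: top priority on a business-critical service."""
    return priority(impact, urgency) == 1 and business_critical


if __name__ == "__main__":
    # A company-wide outage of a critical service: top of both scales.
    print(priority(Level.HIGH, Level.HIGH))                 # 1
    print(is_major_incident(Level.HIGH, Level.HIGH, True))  # True
    # A single user with an annoying but harmless problem.
    print(priority(Level.LOW, Level.LOW))                   # 5
```

The point of writing the trigger down, in code or on paper, is that nobody has to debate in the heat of the moment whether this is a major incident or not.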
Consider for a moment what happens when there is a big outage with big impact, where perhaps the whole company is affected and/or very important business functions are limping along or even totally stopped. People get mad, there are heated arguments and everyone is involved. In short, it becomes a circus, which is exactly the opposite of what we need in this very moment. Instead, the Major Incident routine is triggered and one of the already defined and trained MIM leads takes charge, immediately pulls in the right resources and forms a task force. It really doesn’t matter what these people were doing. The MIM procedure works like a sledgehammer: what the MIM lead asks for, the MIM lead gets. It’s that simple.
The MIM routine also includes the communication procedures to use: whom to inform, when, and so on, simply to make sure that everyone is kept in the loop. Of course, the Service Desk is kept well informed so that it can manage questions. All in all, it creates the circumstances we need to efficiently resolve the incident and also manage the tense situation in general. Keeping people informed is the key to being left alone.
The MIM procedure should normally not be needed very often. If it is, you need to evaluate the robustness of your IT infrastructure. However, when these incidents happen we will know what to do, the roles and responsibilities are defined, and this (luckily) rare event will be managed as effectively as possible. As the organization matures there can even be specific MIM procedures for specific services, service areas or the like, to resolve the terrible situation even faster. Every MIM routine ends with a lessons-learned review to document the findings, and in most cases a problem record is created to start digging into why this even happened and to try to find and eliminate the root cause.
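For those who like to see a routine as a data structure, the flow described above (a MIM lead, a task force, communication and a mandatory lessons-learned step at the end) could be sketched roughly like this. It is only an illustration: the class, its fields and its method names are hypothetical and do not come from any particular ITSM tool.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class MajorIncident:
    """Hypothetical record of one major incident (illustration only)."""
    summary: str
    mim_lead: str
    task_force: List[str] = field(default_factory=list)
    updates: List[str] = field(default_factory=list)
    lessons_learned: Optional[str] = None
    problem_record: Optional[str] = None
    closed: bool = False

    def pull_in(self, person: str) -> None:
        # What the MIM lead asks for, the MIM lead gets.
        if person not in self.task_force:
            self.task_force.append(person)

    def send_update(self, audience: str, message: str) -> None:
        # Keep stakeholders and the Service Desk in the loop.
        self.updates.append(f"[{audience}] {message}")

    def close(self, lessons_learned: str, open_problem: bool = True) -> None:
        # The routine may only end once lessons learned are documented.
        if not lessons_learned.strip():
            raise ValueError("cannot close without lessons learned")
        self.lessons_learned = lessons_learned
        if open_problem:
            # In most cases a problem record starts the root cause hunt.
            self.problem_record = f"Problem: root cause of '{self.summary}'"
        self.closed = True


if __name__ == "__main__":
    incident = MajorIncident("Email down company-wide", mim_lead="Alice")
    incident.pull_in("Bob")
    incident.send_update("Service Desk", "Task force formed, next update in 30 min")
    incident.close("Failover had never been tested")
    print(incident.problem_record)
```

Note that `close` refuses to finish without a lessons-learned text. That is a design choice on my part, meant to mirror the rule that the routine always ends with a review.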
So, the MIM routine is not frequently used, but it will surely save you a lot of problems when these bad incidents happen. Frankly, I would say that an IT service provider that does not have at least a general MIM procedure needs to get in the game. Stay proactive, people!