Downtime: The Urgency of Response

Every unplanned interruption of service (an incident, in IT service management terms) should be prioritized using two factors: Impact and Urgency. Impact covers considerations such as the importance of the service to the organization and perhaps the number of affected users. Urgency depends on time and deadlines.

The clock is ticking on an incident the second IT becomes aware of it, whether through an alert displayed on a monitoring screen, an automatically logged ticket triggered by an event, or—much worse—a user or customer calling the service desk.

Something stops working. Let’s say it’s your company’s customer relationship management application, or CRM. A salesperson tries to log in, but the login doesn’t work. The login page is unresponsive. After several tries spanning eight minutes, the salesperson gives up and picks up the phone to call for support. The service desk is very good at answering quickly; they meet their service level agreement (SLA) every month by answering 90 percent of calls in 20 seconds or less. A support analyst is on the phone almost immediately. It takes a few minutes for the analyst to gather all the information from the caller, categorize and log the ticket. Fifteen minutes have now elapsed since the discovery of the failure.

Next, the analyst tracks down and checks a matrix to determine who is responsible for the CRM system. The analyst calls the CRM administrator, but gets an “out of office” message on that extension. The analyst starts searching for someone else to contact, finally (18 minutes have passed now) reaching the system administrator covering the application while the CRM admin is out.

“Didn’t you see that the application is down?” asks the analyst.
“Well, I’m just covering. I don’t have that alert on my screen,” the sysadmin replies.

How do you determine impact and urgency?

Every organization should have a short list, usually three items, of conditions that trigger prioritization as High Impact. Your organization’s list may vary, but it usually looks something like this:

  • People are at risk
  • Money is at risk
  • The company’s product or service is at risk

In our example, money is certainly at risk, and the product or service may well be at risk too, because salespeople cannot check the CRM records. It’s also imperative that the service be restored as soon as possible, which means high urgency. High impact plus high urgency makes this a Priority 1 incident.
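The prioritization described here can be sketched as a small lookup. This is a minimal illustration, not a standard. The only mapping the example establishes is that high impact plus high urgency yields Priority 1. The three-level scale, the names `Level`, `impact_level`, and `priority`, the lower priority numbers, and the rule that anything not on the short list counts as low impact are all assumptions made for the sketch.

```python
from enum import IntEnum


class Level(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


# The short list from the article: any one of these makes an incident High Impact.
HIGH_IMPACT_TRIGGERS = frozenset({"people", "money", "product_or_service"})


def impact_level(at_risk):
    """Return HIGH if anything on the short list is at risk.

    Assumption: everything else is treated as LOW. A real scheme would
    likely distinguish MEDIUM as well.
    """
    return Level.HIGH if HIGH_IMPACT_TRIGGERS & set(at_risk) else Level.LOW


def priority(impact, urgency):
    """Map impact and urgency to a priority number (1 is the most urgent).

    Only HIGH/HIGH -> 1 comes from the article. The rest of the scale is a
    hypothetical additive matrix.
    """
    return 7 - (impact + urgency)  # HIGH + HIGH = 6 -> 1, LOW + LOW = 2 -> 5


# The CRM outage: money (and possibly product) at risk, restoration is urgent.
crm_impact = impact_level({"money", "product_or_service"})
crm_priority = priority(crm_impact, Level.HIGH)
print(f"CRM outage: impact={crm_impact.name}, priority=P{crm_priority}")
```

Your organization would replace both the trigger list and the matrix with its own agreed values. The point is only that the decision can be written down and applied the same way every time.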

When you are an end user of either an interrupted service or of a piece of hardware that fails, it feels like it should be Priority 1. Your job likely depends on your ability to communicate—via tools like email and your company’s collaboration platform (Slack, Yammer)—and also to access information in document systems and databases. When you can’t do those things, your ability to do your job comes to a halt, and it feels like the end of the world.

The take-away is that every interruption carries a sense of urgency for the affected user(s).

What criteria should be used in determining urgency? In general, we are talking about deadlines. If a scientific grant application is due tomorrow and the grant-writing application goes offline today, the urgency is high (and the impact may be high as well, if that’s where your organization gets its money). If the CEO’s laptop dies just before they are due to travel to conclude a major business deal, impact and urgency are again both in play, even though this is not a Priority 1 incident affecting the whole business directly and immediately.
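One way to make deadline-driven urgency concrete is to compare the time remaining against agreed thresholds. The sketch below is an illustration under assumed values: the function name `urgency_from_deadline`, the one-day and five-day thresholds, and the sample timestamps are hypothetical, not taken from any standard.

```python
from datetime import datetime, timedelta


def urgency_from_deadline(now, deadline,
                          high_within=timedelta(days=1),
                          medium_within=timedelta(days=5)):
    """Classify urgency by how much time is left before the deadline.

    The thresholds are assumptions for illustration. A deadline that has
    already passed gives a negative remainder and is also rated "high".
    """
    remaining = deadline - now
    if remaining <= high_within:
        return "high"
    if remaining <= medium_within:
        return "medium"
    return "low"


# Grant application due tomorrow morning, tool goes offline today (made-up times).
outage_noticed = datetime(2017, 9, 7, 10, 0)
grant_due = datetime(2017, 9, 8, 9, 0)
print(urgency_from_deadline(outage_noticed, grant_due))
```

With 23 hours left, the grant example falls inside the assumed one-day window and is rated high.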

In a lean, modern business with little duplication of work roles, every interruption is urgent to someone, because nobody else is positioned to pick up the stalled work. That is where the emotional component of technical support, empathy, comes into play.

Of course, the job of incident management is to restore service as rapidly as possible, and in order to correctly prioritize, your organization needs to consider some objective criteria to get the work done in the right order and return your business to optimum function.
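As a rough sketch of what "the right order" might look like once priorities are assigned, the queue can be sorted by priority first and by age second. The `Incident` fields, the oldest-first tie-break, and the sample tickets (including their priority numbers and times) are assumptions for illustration only.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Incident:
    summary: str
    priority: int        # 1 is the most urgent
    opened_at: datetime


def work_order(incidents):
    """Sort by priority, then oldest first within the same priority."""
    return sorted(incidents, key=lambda i: (i.priority, i.opened_at))


# Hypothetical queue; priorities and timestamps are made up.
queue = [
    Incident("Printer jam on floor 2", 4, datetime(2017, 9, 7, 9, 0)),
    Incident("CRM login page unresponsive", 1, datetime(2017, 9, 7, 9, 30)),
    Incident("CEO laptop dead before trip", 2, datetime(2017, 9, 7, 9, 45)),
    Incident("Shared drive slow", 4, datetime(2017, 9, 7, 8, 15)),
]

for incident in work_order(queue):
    print(f"P{incident.priority}  {incident.summary}")
```

Every one of these tickets feels urgent to the person who raised it. The sort key is simply the objective criterion that decides which one gets worked first.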

In my next post, I’ll look at one way to measure the impact of interruptions on end users.

About the Author

Roy Atkinson is one of the top influencers in the service and support industry. His blogs, presentations, research reports, white papers, keynotes, and webinars have gained him an international reputation. In his role as senior writer/analyst, he acts as HDI’s in-house subject matter expert, bringing his years of experience to the community. You can find Roy on Twitter: @HDI_Analyst / @RoyAtkinson

Sep 7th, 2017
