Measuring the Impact of IT Interruptions

When we in IT think about the interruptions caused by incidents—unplanned interruptions of service—we tend to think about the time to resolve or time to restore service. While these metrics are good and useful, they do not tell the business side of the story in many cases.

One way to look at the real impact of an interruption is through Interrupted User Minutes, or IUM. The formula is simple, but the details can get a bit complicated.

IUM = Number of users interrupted × Length of interruption (in minutes)

If you have 5,000 email users and your email system goes down for 10 minutes, that’s 50,000 interrupted user minutes, or over 833 interrupted user hours (50,000 ÷ 60). That sounds like a much heavier impact than “10 minutes,” doesn’t it?

Where things get complicated is when we start to think about the fact that an email outage does not prevent 100 percent of the users’ work—in fact, it probably represents only a portion of that work. According to a Canadian university study, workers spend about 30 percent of their time reading and handling email. A more realistic IUM number, then, would be 30 percent of 833 hours, or 250 hours. That is still a lot of impact, and still gives a much better picture of the interruption than “10 minutes” does.
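To make the arithmetic concrete, here is a minimal sketch of the email example in Python. The function name and the work_fraction parameter are illustrative choices made for this sketch, not part of any standard tool; work_fraction is simply the estimated share of users' work that the outage blocks.

```python
def interrupted_user_minutes(users: int, minutes: float, work_fraction: float = 1.0) -> float:
    """IUM = users x minutes, optionally scaled by the share of work blocked.

    work_fraction is an estimate (an assumption of this sketch), e.g. 0.30
    if email accounts for about 30 percent of a user's working time.
    """
    if not 0.0 <= work_fraction <= 1.0:
        raise ValueError("work_fraction must be between 0 and 1")
    return users * minutes * work_fraction


# The email outage from the text: 5,000 users, 10 minutes.
raw_ium = interrupted_user_minutes(5000, 10)
weighted_ium = interrupted_user_minutes(5000, 10, work_fraction=0.30)

print(f"Raw:      {raw_ium:,.0f} IUM ({raw_ium / 60:,.0f} hours)")
print(f"Weighted: {weighted_ium:,.0f} IUM ({weighted_ium / 60:,.0f} hours)")
```

Keeping the raw and weighted figures side by side lets you report both the worst case and the more realistic estimate.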

Here are two more scenarios to give you an idea how this measure can change your thinking:

  • The 20-person sales department can’t connect to Salesforce for 45 minutes because of a firewall configuration issue. They use Salesforce 70 percent of the time (that is, the time they aren’t spending reading and sending email). 45×20 = 900. 70 percent of 900 is 630 IUM. To reinforce the impact, remember that each of those salespeople is normally producing a certain number of dollars per minute, but not if they can’t get to their customer records.
  • Your company’s Active Directory goes offline for 37 minutes. Your organization has 65,000 employees, 27,000 of whom authenticate against the instance that is down. That’s 27,000×37, or 999,000 IUM, or 16,650 hours of interruption, because people who aren’t already logged in can’t log in, and affected users can’t print to network printers, access network shares, and so on.
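The two scenarios above can be sketched the same way. The Incident class and its field names are hypothetical, chosen only for this illustration; the numbers come straight from the scenarios.

```python
from dataclasses import dataclass


@dataclass
class Incident:
    name: str
    users: int
    minutes: float
    work_fraction: float = 1.0  # estimated share of the affected users' work that was blocked

    @property
    def ium(self) -> float:
        """Interrupted User Minutes for this incident."""
        return self.users * self.minutes * self.work_fraction


incidents = [
    Incident("Salesforce unreachable", users=20, minutes=45, work_fraction=0.70),
    Incident("Active Directory offline", users=27_000, minutes=37),
]

# List the incidents from highest to lowest impact.
for incident in sorted(incidents, key=lambda i: i.ium, reverse=True):
    print(f"{incident.name}: {incident.ium:,.0f} IUM ({incident.ium / 60:,.1f} hours)")
```

Sorting by IUM rather than by duration puts the 37-minute Active Directory outage far ahead of the longer 45-minute Salesforce outage, which is the shift in thinking this measure is meant to encourage.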

 

You can start to see that IUM is closely related to the cost of downtime (CoD). Unlike CoD, however, IUM has no financial component, and financial figures can be extremely difficult to obtain if you are not a C-level executive.

How can you figure those partial interruption percentages? The first way is to make an educated guess. We said that 30 percent of work time is taken up by email, so let’s discount that time unless email is affected. If the data warehouse is unavailable, you can bet that the data analysts are heavily affected; let’s say 40 percent of their work cannot be done. On the other end of the spectrum, those pesky salespeople we talked about earlier might be pulling information from the data warehouse through an application, and are only 5 percent affected.
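One way to keep those educated guesses in one place is a small lookup table keyed by service and user group. The sketch below is only an illustration: the table layout, the group and service names, and the headcount of 12 analysts are all assumptions, while the 40, 5, and 70 percent figures are the guesses used in this post.

```python
# Estimated share of each group's work that depends on each service.
# These are educated guesses, meant to be refined as better data arrives.
IMPACT_ESTIMATES = {
    ("data_warehouse", "data_analysts"): 0.40,
    ("data_warehouse", "sales"): 0.05,
    ("salesforce", "sales"): 0.70,
}


def estimated_ium(service: str, minutes: float, headcounts: dict[str, int]) -> float:
    """Sum weighted IUM across groups for an outage of one service.

    Groups with no estimate for the service are treated as unaffected,
    which is an assumption of this sketch.
    """
    total = 0.0
    for group, users in headcounts.items():
        fraction = IMPACT_ESTIMATES.get((service, group), 0.0)
        total += users * minutes * fraction
    return total


# Hypothetical headcounts: 20 salespeople (from the earlier scenario), 12 analysts (made up).
headcounts = {"data_analysts": 12, "sales": 20}
print(f"{estimated_ium('data_warehouse', 60, headcounts):,.0f} IUM")
```

As you learn how much time each area of the business really spends in each application, you update the table and the estimates improve without changing the calculation.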

“Well that’s not a valid way of measuring anything,” you might say. I’d suggest you take the time to read How to Measure Anything, and see how so-called educated guesses can lead to much more accurate insights. If you have absolutely no information about the actual interruptions caused by incidents, any reasonable information is good.

You can work at making that information better over time by doing things like finding out the main services and applications used by various areas in your business, and how much time users are spending consuming those services and using those applications. It’s great information to have to help you assess and prioritize the impact and urgency of incidents (see the first post in this series).

Next in this series, we’ll talk about the challenge of distributed teams. When there are interruptions, you need to get people involved fast so that those IUMs don’t add up to millions of dollars.


 

About the Author

Roy Atkinson is one of the top influencers in the service and support industry. His blogs, presentations, research reports, white papers, keynotes, and webinars have gained him an international reputation. In his role as senior writer/analyst, he acts as HDI’s in-house subject matter expert, bringing his years of experience to the community. You can find Roy on Twitter: @HDI_Analyst / @RoyAtkinson

 

Sep 21st, 2017
