ITIL and the SMB Part 4 – Problem Management

May 20, 2008

The next piece on ITIL in the small and medium business (SMB) space covers Problem Management. As before, this will be a quick summary of some key points. As I stated in the earlier posts, all of the ITIL processes are tightly linked, since each process generally accepts information from, and provides information to, the others. However, individual pieces can be lifted out and customized for your environment. The goal here is to show how the basics of ITIL can help organizations of all sizes reduce costs.

As mentioned in the previous post on Incident Management, the ITIL Incident Management and Problem Management processes can seem similar at first glance. However, there is a key difference: Incidents are deviations from expected operations, and Problems are the root causes of those deviations.

An incident can be any IT-related issue, such as "Can't log in", "No E-Mail" or "Can't Print". The tendency for internal or external IT staff is simply to fix that one incident ("There, now you can print...") without looking at the underlying root problem. Yet these root problems can be an important source of information.

The Examples

To use a non-technical example: a customer of yours complains that they did not receive the correct order. This is an Incident. You re-send the correct product, and the Incident is now closed: the customer has their order.

I don't know about you, but I see something missing here. Why did the customer not receive their order? This "why" is the Problem, or root cause, of the incident. You investigate and find that the packing staff boxed the wrong order. Now we have documented the root cause, or Problem, that led to the incident. If you receive 4 complaints in a month that customers did not receive the correct order, and all 4 times the packing staff had boxed the wrong order, you now have actionable information to fix the issue.
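If each incident is recorded along with the root cause found for it, spotting a recurring Problem is largely a matter of counting. The following is a minimal Python sketch, not part of any ITIL tooling: the `Incident` record, the `recurring_problems()` helper, its `threshold` parameter and the sample month of complaints are all hypothetical, chosen to mirror the order example above.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Incident:
    """One reported deviation from expected operations (hypothetical record)."""
    summary: str
    root_cause: str  # the Problem identified during investigation


def recurring_problems(incidents, threshold=2):
    """Return (root cause, count) pairs seen at least `threshold` times, most frequent first."""
    counts = Counter(i.root_cause for i in incidents)
    return [(cause, n) for cause, n in counts.most_common() if n >= threshold]


# Hypothetical month of complaints: 4 share one root cause, 1 has a different cause.
month = [
    Incident("Wrong order received", "Packing staff boxed the incorrect order"),
    Incident("Wrong order received", "Packing staff boxed the incorrect order"),
    Incident("Order delivered to wrong site", "Delivered to the wrong logistics center"),
    Incident("Wrong order received", "Packing staff boxed the incorrect order"),
    Incident("Wrong order received", "Packing staff boxed the incorrect order"),
]

print(recurring_problems(month))
# → [('Packing staff boxed the incorrect order', 4)]
```

The one-off logistics incident falls below the threshold, while the packing problem stands out as the one worth acting on. A spreadsheet pivot table would do the same job; the point is that the root cause has to be recorded before it can be counted.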

Documenting these root-cause problems also builds up a knowledge base of Known Issues, or Known Errors, which reduces the time required to resolve these incidents when they occur again.

Again using my non-technical analogy: one incident of a customer not receiving their order is investigated. It turns out that the customer has two logistics centers and will only accept delivery from you at one of them. Unfortunately, you delivered to the other one. You now have "knowledge base" information: on the next order for that customer, you can remind staff that the shipment must be delivered to the correct logistics center.

The same concept holds in your technology infrastructure. Here is a technology example I had to deal with. We were using a hardware- and software-based fax system that allowed our employees to send faxes right from their desktops. These faxes carried confirmations for training; they were time-sensitive, and a failure could cause significant disruption.

Incident 1: An employee couldn't fax. I found that the fax hardware had crashed, but I could not figure out why. (OK, no one is perfect!) So I reset the fax hardware, which resolved the incident, and had to log the Problem as an unknown issue with the hardware fax devices. Maybe it was a one-time issue.

Incident 2: An employee couldn't fax (again). I could simply have reset the fax hardware again. That would have resolved the incident, but it would have done nothing about the root-cause problem, so the failure would keep happening, keep causing incidents and keep causing costly downtime (again, these faxes were time-sensitive). Further investigation and testing finally revealed that the Problem was caused by my region moving from 7-digit to 10-digit local dialing. Dialing just 7 digits not only failed to send that fax, but also brought down the fax hardware. This knowledge base information allowed me to provide more training (use all 10 digits), and if another incident did occur, the resolution was quick and easy, with no further research required.
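The Known Error idea from this fax example can be sketched as a simple lookup: check the symptom against documented Known Errors first, and fall back to a full Problem investigation only when nothing matches. This is a minimal Python sketch under stated assumptions; the `KnownError` record, the `KNOWN_ERRORS` dictionary, its symptom tags such as `"fax-hardware-down"` and the `triage()` function are hypothetical, not features of any particular help desk product.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class KnownError:
    """A documented Problem: its root cause plus the quickest known fix."""
    symptom: str
    root_cause: str
    workaround: str


# Hypothetical knowledge base; keys are short symptom tags chosen by the IT staff.
KNOWN_ERRORS = {
    "fax-hardware-down": KnownError(
        symptom="Employee can't fax; fax hardware has crashed",
        root_cause="Fax dialed with 7 digits after the move to 10-digit local dialing",
        workaround="Reset the fax hardware and remind the user to dial all 10 digits",
    ),
}


def triage(symptom_tag: str) -> str:
    """Return the documented workaround, or flag the incident for Problem investigation."""
    known: Optional[KnownError] = KNOWN_ERRORS.get(symptom_tag)
    if known is None:
        return "Unknown: resolve the incident, then open a Problem record to find the root cause"
    return known.workaround


print(triage("fax-hardware-down"))
print(triage("no-email"))
```

The design choice is deliberately simple: a known symptom gets its documented fix immediately, and an unknown one is a prompt to open a Problem record rather than just resetting the device and moving on.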

The Benefits

Used together with Incident Management, an effective Problem Management process provides the following benefits:

a) It identifies the root cause, so that the cause can be fixed and does not recur, or can be documented as a known issue.
b) Known issues have a documented workaround or solution that can be applied quickly, without extensive research.
c) It supports more proactive prevention of further incidents, possibly identifying process weaknesses or failing devices that should be replaced.

These benefits can reduce costs by cutting the time your staff are unproductive because an IT component has failed, and by reducing the time and cost of resolving incidents when they occur.

UPDATE: Part 5, on Change Management, is now available.
