Elements of testing and reporting of the ITIL/ISO 20000 IT Service Continuity Plan

As in many other situations in life, in IT Service Management (ITSM) no one asks too many “What if?” questions until something actually happens. But once it does, the company needs to be ready. You need an action plan in case of disruptive events.

Both ITIL and ISO 20000 consider IT Service Continuity to be a vital part of a company’s ITSM activities. In my previous article, “IT Service Continuity Plan – Why do you need it?”, I explained the IT Service Continuity Plan as the cornerstone of continuity management inside the organization. But that’s not the end of the story. To be sure that the IT Service Continuity concept works, you need to test it and (particularly according to ISO 20000) create appropriate test reports.

Why do you need it?

First things first: IT Service Continuity supports business continuity (read the article “IT Service Continuity Management – waiting for the big one” to learn more about IT Service Continuity and its relation to business continuity). In order to support business continuity, IT needs to have a (formal) plan for how to react in case IT Service Continuity is affected. Having a plan is all well and good, but if you don’t know whether the plan really works, you cannot rely on it.

To explain this (and the reasoning behind IT Service Continuity tests), let me give you an example that I once witnessed. A large company focused its IT Service Continuity mainly on backup and restore. Although it was not a small company, at that time that was its ITSM continuity concept. On several occasions, backup/restore proved to be OK, so no one raised any concern. Then, when storage was affected (the complete hard disk set had to be replaced), everyone thought: “OK, we have backup tapes, the hard drives are almost installed, and we will be up and running in no time.” In theory that was fine, but in practice it took several days until the last file was restored.

The point is that restore (as an IT Service Continuity concept) had never been fully tested, and the missing test turned out to be a very important element of continuity. So, if you want to know whether your continuity plan works, the plan needs to be tested.
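A rough calculation shows how a restore can stretch into days. The sketch below is only an illustration: the data volume and throughput are hypothetical numbers, not figures from the company in the example, and the function name is my own.

```python
def estimated_restore_hours(data_tb: float, throughput_mb_per_s: float) -> float:
    """Rough lower bound on restore time.

    Ignores tape changes, verification and re-indexing, so a real
    restore will normally take longer than this estimate.
    """
    data_mb = data_tb * 1_000_000  # decimal units: 1 TB = 1,000,000 MB
    seconds = data_mb / throughput_mb_per_s
    return seconds / 3600


# Hypothetical figures: 40 TB of data restored at a sustained 120 MB/s.
hours = estimated_restore_hours(data_tb=40, throughput_mb_per_s=120)
print(f"{hours:.1f} hours, about {hours / 24:.1f} days")
```

Even this optimistic estimate comes to almost four days, and only a real restore test tells you what the actual figure is.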


What are the requirements?

ISO 20000 (in clause 6.3.3) formally requires that continuity requirements be determined and documented, and that IT continuity plans be tested against those requirements. Results should be recorded and analyzed, and any deficiency that is found needs to initiate corrective action. Also, the continuity plan needs to be tested after any major change, which is logical because a major change can (easily) impact the continuity concept.
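Testing against requirements can be pictured as a simple comparison. The following is a minimal sketch, assuming each service has a documented recovery target in hours; the class names, field names and figures are hypothetical and are not taken from ISO 20000.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContinuityRequirement:
    service: str
    max_recovery_hours: float  # hypothetical documented recovery target


@dataclass(frozen=True)
class MeasuredResult:
    service: str
    measured_recovery_hours: float  # what the test actually showed


def find_deficiencies(requirements, results):
    """Return one message per service that missed its target or was not tested."""
    measured = {r.service: r.measured_recovery_hours for r in results}
    deficiencies = []
    for req in requirements:
        actual = measured.get(req.service)
        if actual is None:
            deficiencies.append(f"{req.service}: not tested")
        elif actual > req.max_recovery_hours:
            deficiencies.append(
                f"{req.service}: recovered in {actual} h, target {req.max_recovery_hours} h"
            )
    return deficiencies


# Hypothetical requirements and test results.
requirements = [
    ContinuityRequirement("file storage", 24),
    ContinuityRequirement("email", 8),
]
results = [MeasuredResult("file storage", 72), MeasuredResult("email", 6)]

for line in find_deficiencies(requirements, results):
    print(line)
```

Every line printed here is a deficiency, and each one should initiate a corrective action.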

ITIL is pretty descriptive with regard to continuity planning and testing, and it suggests four types of basic tests:

  • Walk-through test – a simulation of the complete plan
  • Full test – the test that will tell you how good your plan is; it includes a test of the complete recovery scenario
  • Partial test – a test focused only on part of the plan, e.g., one service
  • Scenario test – a test of the continuity plan against one specific scenario

What’s the content?

The content of IT Service Continuity Plan testing (and of the report afterwards) depends on what kinds of tests are performed. Even so, several elements need to be recorded in every test report in order to provide the full picture of the continuity plan’s suitability. The usual content of such a report is:

  • Generalities of the test – for example, which service, what kinds of tests, who is responsible, when and where the tests took place, etc.
  • Test activities, i.e., test scenarios – description of the testing activities, involved parties (including third parties), organizational units affected, etc.
  • Results of particular test activities – this is where you will document issues that are detected during testing activities
  • Corrective actions – list them all, assigning responsibilities and deadlines for implementations; this way, you keep control until issues are resolved
  • Improvements – testing can reveal elements of the plan that could be improved; document them, and follow up on their implementation

The last two items (corrective actions and improvements) are particularly important because ISO 20000 requires that they be acted upon and that reports about implementation exist.

Once you have the report, it’s important that there is a person who is responsible for ensuring subsequent follow-up, particularly on corrective actions and improvement initiatives. Then, you can say that the process is complete.
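If you keep test reports in a tool rather than in a document, the report content listed above maps naturally onto a small data structure. This is a minimal sketch under my own assumptions: the class and field names are hypothetical, and the sample entries are invented for illustration.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class FollowUpItem:
    description: str
    owner: str       # person responsible for implementation
    deadline: date
    done: bool = False


@dataclass
class ContinuityTestReport:
    # Generalities of the test
    service: str
    test_type: str   # e.g. "walk-through", "full", "partial", "scenario"
    responsible: str
    test_date: date
    location: str
    # Test activities and their results
    activities: list = field(default_factory=list)
    results: list = field(default_factory=list)
    # Items that require follow-up after the test
    corrective_actions: list = field(default_factory=list)
    improvements: list = field(default_factory=list)

    def open_items(self):
        """Corrective actions and improvements that are not yet implemented."""
        return [i for i in self.corrective_actions + self.improvements if not i.done]

    def overdue_items(self, today):
        """Open items whose deadline has already passed."""
        return [i for i in self.open_items() if i.deadline < today]


# Hypothetical report for a full test of one service.
report = ContinuityTestReport(
    service="file storage",
    test_type="full",
    responsible="IT continuity manager",
    test_date=date(2024, 3, 1),
    location="secondary site",
    activities=["restore complete storage from backup tapes"],
    results=["restore took longer than the documented target"],
    corrective_actions=[
        FollowUpItem("review restore procedure", "storage team", date(2024, 4, 1)),
    ],
    improvements=[
        FollowUpItem("add restore timing to the plan", "plan owner", date(2024, 6, 1), done=True),
    ],
)

print([item.description for item in report.overdue_items(date(2024, 5, 1))])
```

The person responsible for follow-up can then review the open and overdue items at regular intervals instead of rereading the whole report.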

And, now what?

So, you have a continuity plan, you test it regularly (e.g., once a year), and you create appropriate reports. Excellent. This is your starting point for improving the plan. Analysis of test results should reveal both the strengths and the weaknesses of the plan. Strengths should be kept as your best practice, and weak points should lead to corrective actions.
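The rule for when to test is easy to write down: regularly, and after any major change. Below is a minimal sketch of that rule, assuming a yearly interval; the function name, the parameters and the 365-day default are my own assumptions, so adjust them to your plan.

```python
from datetime import date, timedelta


def plan_test_due(last_test, today, major_change_since_last_test, interval_days=365):
    """Return True when the continuity plan should be tested again."""
    # A major change can invalidate the plan regardless of the calendar.
    if major_change_since_last_test:
        return True
    return today - last_test >= timedelta(days=interval_days)


# Hypothetical dates.
print(plan_test_due(date(2023, 6, 1), date(2024, 3, 1), False))
print(plan_test_due(date(2023, 6, 1), date(2024, 3, 1), True))
print(plan_test_due(date(2023, 1, 15), date(2024, 3, 1), False))
```

Only the first call reports that no test is due yet; the other two are triggered by a major change and by the elapsed interval, respectively.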

Once you have an IT Service Continuity Plan in place, and you do regular testing and improvements of the plan – you can expect fewer surprises once a disruptive event takes place. That doesn’t mean you are “bulletproof,” but rather that you can be sure you did all that was needed to be prepared. And, you know that it works.

Use this free IT Service Continuity Plan Test and Review Report template to see what such a report looks like.

Author
Branimir Valentic
Branimir is an expert in IT service management (consultancy, training and tools), IT governance (training and consulting), project management and consultancy in IT and telecommunication. He holds the following certificates: ITIL Expert, ISO 20000, ISMS Lead Auditor and PRINCE2.