Availability Management: A Complete Guide Tutorial | CHECK-OUT
Availability Management Tutorial

Last updated on 08th Jul 2020, Blog, Tutorials

About author

Brindha (Database Engineer )

Brindha is an industry expert and database specialist who offers the best teaching to pupils. She is a qualified professional with more than seven years of experience. She is familiar with Prometheus, Proxmox, MySQL, RabbitMQ, Oracle to PostgreSQL, AWS, Git, and Docker.

ITIL Availability Management aims to define, analyze, plan, measure and improve all aspects of the availability of IT services. It is responsible for ensuring that all IT infrastructure, processes, tools, roles, etc. are appropriate for the agreed availability targets. Part of: Service Design.

The Objectives of Availability Management process are to:

  • Produce and maintain an appropriate and up-to-date Availability Plan that reflects the current and future needs of the business.
  • Provide advice and guidance to all other areas on availability-related issues.
  • Assist with the diagnosis and resolution of availability-related incidents and problems.
  • Assess the impact of all changes on the Availability Plan and on the performance and capacity of services and resources.
  • Ensure that proactive measures are implemented to improve the availability of services wherever it is cost-justifiable.

In short, Availability Management should always ensure that the agreed level of availability is provided. The measurement and monitoring of IT availability is a key activity to ensure availability levels are being met.
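
As a rough illustration of the measurement mentioned above, availability is commonly expressed as the share of agreed service time during which the service was actually up. The sketch below assumes that definition. The function name, the sample month and the 99.5% target are made up for the example and do not come from any particular SLA.

```python
def availability_percent(agreed_minutes: float, downtime_minutes: float) -> float:
    """Availability as a percentage of the agreed service time."""
    if agreed_minutes <= 0:
        raise ValueError("agreed service time must be positive")
    if not 0 <= downtime_minutes <= agreed_minutes:
        raise ValueError("downtime must be between 0 and the agreed service time")
    return (agreed_minutes - downtime_minutes) / agreed_minutes * 100


# Hypothetical month: 30 days of 24x7 service with 90 minutes of outage.
agreed = 30 * 24 * 60   # 43,200 minutes of agreed service time
achieved = availability_percent(agreed, downtime_minutes=90)
target = 99.5           # assumed SLA target, in percent

print(f"Achieved: {achieved:.3f}%  Target: {target}%  Met: {achieved >= target}")
```

In practice, the agreed service time would exclude planned maintenance windows if the SLA says so. That detail is left out of this sketch.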

Why Should I Implement Availability Management?

There are plenty of good reasons to implement ITIL availability management. For starters, it’s essential for making sure services are available for use during the timeframes specified in SLAs. Availability management is also helpful for making sure services are provisioned on the right infrastructure for their needs. This ensures you avoid unnecessary costs. You wouldn’t want to provision services with longer recovery times on more expensive high availability platforms.

Another great way to use availability management is to identify and correct issues before they impact service. Availability management processes go hand-in-hand with the other four areas of ITIL service delivery. In fact, availability management is often used to support service level management, capacity management, IT service continuity management, and incident management.

How to Do Availability Management?

Performance management software goes a long way to helping you get ITIL availability management implemented at your organization. For instance, you can use capacity prediction software to perform what-if analysis. This type of analysis involves running multiple scenarios to show the impact of certain decisions. What-if analysis is a great way to determine the performance levels needed to meet your business goals. You can even track and report on these metrics on an ongoing basis.
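
One simple form of what-if analysis is to compare the expected availability of a service under different design scenarios. The sketch below assumes components fail independently. Under that assumption, components in series multiply their availabilities, while a redundant pair is down only when both members are down. All component names and figures are hypothetical.

```python
from math import prod


def series(*availabilities: float) -> float:
    """All components must be up: multiply their availabilities."""
    return prod(availabilities)


def parallel(*availabilities: float) -> float:
    """Redundant components: the group is up if at least one member is up."""
    return 1 - prod(1 - a for a in availabilities)


# Hypothetical component availabilities (fractions, not percentages).
WEB, APP, DB = 0.999, 0.998, 0.995

scenarios = {
    "single database": series(WEB, APP, DB),
    "redundant database pair": series(WEB, APP, parallel(DB, DB)),
}

HOURS_PER_YEAR = 24 * 365
for name, a in scenarios.items():
    downtime_hours = (1 - a) * HOURS_PER_YEAR
    print(f"{name}: {a * 100:.3f}% available, ~{downtime_hours:.1f} h downtime per year")
```

Comparing the two scenarios side by side shows how much availability a redundancy decision buys, which can then be weighed against its cost.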

You can also use infrastructure monitoring software to keep an eye on availability across your systems. You’ll get detailed diagnostic capabilities, so you can quickly determine the probable cause of an outage. Plus, you’ll find out how to keep an outage from happening again. You can even capture historical data, so you’ll be able to report on trends over time. This type of monitoring makes it easier to predict potential issues and address them before they become problems.

If you need more reporting, you can enlist performance management software. This takes the effort out of reporting by automatically publishing reports with availability metrics across your systems. You can customize them for different audiences, too.

What is the ITIL Availability Management Process?

Availability Management is one of the well-defined main processes under the Service Design process group of the ITIL best practice framework. According to the definition, ITIL Availability Management is used to ensure the availability of services whenever needed. This usually means making sure every service is up for use under the conditions of service level agreements (SLAs).

To achieve this, the Availability Management team periodically reviews business process availability requirements and then makes sure that the most cost-effective contingency plans are in place. These plans are tested on a regular basis to make sure that they meet the business needs.

Roles And Responsibilities

Availability Manager – Process Owner

  • The Availability Manager is responsible for defining, analyzing, planning, measuring and improving all aspects of the availability of IT services. The role is also responsible for ensuring that all IT infrastructure, processes, tools, roles, etc. are appropriate for the agreed service level targets for availability.

Responsibility Matrix: ITIL Availability Management

ITIL Role / Sub-Process | Availability Manager | Service Owner[3] | Applications Analyst[3] | Technical Analyst[3] | IT Operator[3]
Design Services for Availability | A[1]R[2]RRR
Availability Testing | ARR
Availability Monitoring and Reporting | AR–

ITIL Availability Management Scope: 

Availability Management is concerned with the design, implementation, measurement and management of IT Infrastructure availability in order to ensure that stated business requirements for Availability are consistently met. It should be applied to all new IT Services and to existing services where Service Level Requirements (SLRs) or Service Level Agreements (SLAs) have been established. It should also be applied to those IT Services deemed to be survival or business critical, regardless of whether a formal SLA exists. Suppliers (internal and external) should be subject to the same Availability requirements in providing their services.

As stated in ITIL V3, the Availability Management process plays a leading role in component failure impact analysis (CFIA) and service outage analysis (SOA) initiatives. Typically, the Availability Management team determines the cause of the problem, analyzes any related trends, and then takes steps to ensure service availability according to SLAs.
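
A CFIA is often recorded as a matrix of components against the services that depend on them. The sketch below is a minimal version of that idea, assuming each service simply fails when any component it depends on fails (no redundancy). The service and component names are hypothetical.

```python
# Hypothetical mapping: which components each service depends on.
SERVICE_COMPONENTS = {
    "email": {"mail-server", "core-switch", "san"},
    "payroll": {"app-server", "db-server", "core-switch", "san"},
    "intranet": {"web-server", "core-switch"},
}


def impact_of_failure(component: str) -> list[str]:
    """Services affected if the given component fails (no redundancy assumed)."""
    return sorted(
        service
        for service, components in SERVICE_COMPONENTS.items()
        if component in components
    )


def failure_impact_matrix() -> dict[str, list[str]]:
    """Map each component to the services it would take down, widest impact first."""
    components = set().union(*SERVICE_COMPONENTS.values())
    impact = {c: impact_of_failure(c) for c in components}
    return dict(sorted(impact.items(), key=lambda kv: (-len(kv[1]), kv[0])))


for component, services in failure_impact_matrix().items():
    print(f"{component}: {', '.join(services)}")
```

Components at the top of the output affect the most services and are the first candidates for added resilience.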

Availability Management (ITIL V3) is tightly bound with other ITIL processes.

The ITIL Availability Management process works jointly with Capacity Management, Service Level Management, and IT Service Continuity Management to plan for the infrastructure requirements needed to meet the targeted service level and quality.

It also works closely with Incident Management and Event Management processes to help them meet the operation level service targets and quality standards.

ITIL Availability Management Activities: 

According to ITIL V3, Availability Management process includes two types of activities:

  • Reactive
  • Proactive.

Reactive Activities:  Reactive Availability Management includes activities such as monitoring, measuring, analysis and management of all events, incidents, and problems causing service unavailability. These activities are generally performed by operational roles.

Proactive Activities:  Proactive Availability Management includes proactive planning, design, and monitoring of services to improve the availability. These activities are typically performed by design and planning roles.

Proactive activities can be further divided into two categories: Service Availability & Component Availability.

Activities performed under this proactive category are:

  • Participate in IT infrastructure design.
  • Monitor actual IT availability achieved.
  • Create, maintain & review Availability Plan.
  • Schedule Availability Testing.
  • Attend CAB meetings.
  • Assessment & Testing after a major business change.
  • Assess & Manage Risk in an economically viable way. 

ITIL Availability Management Sub-Process: 

According to ITIL v3, the Availability Management process has three sub-processes operating under it.

The objectives and descriptions of those sub-processes are given below, followed by a diagram illustrating the ITIL Availability Management Process Flow:

  • Design Services for Availability: As the name suggests, this sub-process is responsible for designing the procedures and technical features required to fulfill the agreed availability levels.
  • Availability Testing: This sub-process is responsible for scheduling and arranging for regular testing of all availability, resilience and recovery mechanisms.
  • Availability Monitoring and Reporting: This sub-process monitors the current availability achieved by services and components, compares the results with the agreed availability benchmarks, identifies areas for improvement, and prepares a detailed report. It also circulates the report to other Service Management processes and IT Management for decision-making purposes.
ITIL Availability Management Process Flow

Important Terminologies and Definitions: 

The list below describes the important terms and definitions used in ITIL Availability Management:

  • Availability Design Guidelines: Guidelines that describe, from a technical point of view, how the required availability levels can be achieved, including specific instructions for application development and for externally sourced infrastructure components.
  • Availability Guidelines for the Service Desk: Guidelines for the Service Desk on how to manage Incidents causing unavailability. The goal of these guidelines is to prevent minor Incidents from becoming major Incidents.
  • Availability Management Information System: A virtual repository of all Availability Management data, typically stored in multiple physical locations.
  • Availability Plan: The Availability Plan contains detailed information about initiatives taken for improving the availability of service or component.
  • Availability/ ITSCM/ Security Testing Schedule: A schedule for the periodic testing of all availability, continuity and security mechanisms, jointly regulated by Availability Management, IT Service Continuity Management (ITSCM), and Information Security Management.
  • Availability Report: A report containing information related to service and infrastructure component availability. This report is then circulated to other Service Management processes and IT Management for decision-making purposes.
  • Event Filtering and Correlation Rules: Rules and criteria used to determine whether an Event is significant and to decide upon an appropriate response. Event Filtering and Correlation Rules are typically used by Event Monitoring Systems in the Event Management process. Some of those rules, however, are defined during the Availability Management process of the Service Design stage, to ensure that Events are triggered when the required service availability is endangered.
  • Maintenance Plan/ SOP: Defines the frequency and scope of preventative maintenance.
  • Recovery Plan: It is jointly created by Availability Management and IT Service Continuity Management. This recovery plan contains specific instructions for restoring specific services or components to a working state from a major failure.
  • Technical/ Administration Manual: A document detailing the required procedures to run and maintain application or infrastructure components.
  • Test Report: A Test Report provides a summary of testing and assessment activities performed by any ITSM process.
  • Vital Business Function (VBF): VBF refers to business-critical elements that are supported by an IT service.
  • Service Failure Analysis (SFA): It is a structured approach to identifying causes of service interruption.
  • Availability Management: Considers all aspects of the IT Infrastructure and supporting organization that may impact Availability, including training, skills, policy, process effectiveness, procedures and tools. It anticipates and minimizes the impact of failures through the implementation of predefined, pre-tested, documented recovery plans and procedures.

Availability Management ensures that recovery procedures are in place for:

  • Batch processing: includes recovery procedures for hardware, software and environmental failures affecting batch applications. In addition, Recovery Management will coordinate problems with batch recovery procedures.
  • Network or online outages: Recovery Management will coordinate problems with procedures in the event that network outages occur.
  • Application restart: includes procedures to restart servers, operating systems, database products, middleware and transaction processors in the proper sequence.

Availability Management also ensures that sufficient capacity is available to accommodate the peak loads experienced during recovery procedures and that system availability satisfies client needs. All changes to CIs are reviewed for proper back-out procedures to allow for timely recovery in the event of an unsuccessful change installation. Additionally, all changes are reviewed for their potential impact on existing recovery procedures and to determine whether any new or additional procedures are required as a result of the change. Information pertaining to the back-out of recovery procedures is documented and reviewed in the change record as required. Counter-measures are put in place to reduce or eliminate the threats posed by security risks to the infrastructure.
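
Restarting items "in the proper sequence" can be derived from a dependency map rather than maintained by hand. The sketch below uses `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later). The dependency map itself is a hypothetical example, not a recommended restart order for any real environment.

```python
from graphlib import TopologicalSorter

# Hypothetical dependencies: each key can start only after the items in its set.
DEPENDS_ON = {
    "operating system": {"server hardware"},
    "database": {"operating system"},
    "middleware": {"operating system", "database"},
    "transaction processor": {"middleware"},
}

# static_order() yields every item after all of the items it depends on.
restart_order = list(TopologicalSorter(DEPENDS_ON).static_order())

for step, item in enumerate(restart_order, start=1):
    print(f"{step}. start {item}")
```

If the map contained a circular dependency, `static_order()` would raise `graphlib.CycleError`, which is a useful check on the recovery procedure itself.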

The expanded Incident ‘lifecycle’

A guiding principle of Availability Management is to recognize that it is still possible to gain Customer satisfaction even when things go wrong. One approach to help achieve this requires Availability Management to ensure that the duration of any Incident is minimized to enable normal business operations to resume as quickly as is possible. Availability Management should work closely with Incident Management and Problem Management in the analysis of Unavailability Incidents.

A good technique to help with the technical analysis of Incidents affecting the Availability of components and IT Services is to take an Incident ‘lifecycle’ view. Every Incident passes through several major stages. The time elapsed in these stages may vary considerably. For Availability Management purposes the standard Incident ‘lifecycle’ as described within Incident Management has been expanded to provide additional help and guidance particularly in the area of ‘designing for recovery’. Figure 8.20 illustrates the expanded Incident ‘lifecycle’.

From the above it can be seen that an Incident can be broken down into stages which can be timed and measured. These stages are described as follows:

  • Incident start: the time at which the Customer recognises a loss or deviation of service or the time at which the Incident is first reported, whichever is earlier
  • Incident detection: the time at which the IT organisation is made aware of an Incident
  • Incident diagnosis: the time at which diagnosis to determine the underlying cause has been completed
  • Incident repair: the time at which the failure has been repaired/fixed
  • Incident recovery: the time at which component recovery has been completed
  • Incident restoration: the time at which normal business operations resume.
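
Because each stage is defined by a point in time, the time spent between stages can be calculated from recorded timestamps. The sketch below shows one way to do that. The timestamps are invented for illustration, and the stage names follow the list above.

```python
from datetime import datetime

# Hypothetical timestamps for one Incident, in lifecycle order.
STAGES = [
    ("start", datetime(2020, 7, 8, 9, 0)),
    ("detection", datetime(2020, 7, 8, 9, 10)),
    ("diagnosis", datetime(2020, 7, 8, 9, 40)),
    ("repair", datetime(2020, 7, 8, 10, 25)),
    ("recovery", datetime(2020, 7, 8, 10, 45)),
    ("restoration", datetime(2020, 7, 8, 11, 0)),
]


def stage_durations(stages):
    """Minutes elapsed between each pair of consecutive lifecycle stages."""
    return {
        f"{prev_name} -> {name}": (time - prev_time).total_seconds() / 60
        for (prev_name, prev_time), (name, time) in zip(stages, stages[1:])
    }


durations = stage_durations(STAGES)
total_downtime = (STAGES[-1][1] - STAGES[0][1]).total_seconds() / 60

for label, minutes in durations.items():
    print(f"{label}: {minutes:.0f} min")
print(f"Total downtime: {total_downtime:.0f} min")
```

Breaking the total down this way shows which stage dominates the outage, for example slow detection versus slow repair, and therefore where improvement effort is best spent.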

The Availability Manager role

While the job title Availability Manager isn’t one that stands out in today’s age (though organizations do still recruit for this role), the role of managing availability is part and parcel of ITSM environments, particularly those of an operational nature.

Interestingly, the European e-competence framework does not list ‘Availability’ in any title of its 40 reference dimensions or in the 30 European ICT Professional Role Profiles. A quick search, however, reveals that availability knowledge is required in several roles and activities:

  • Architecture design
  • Problem management
  • Information security strategy development
  • Information security management
  • The data administrator role
  • The DevOps expert role

Whether you’re a solution architect, software developer, systems administrator, or service desk support specialist, availability management will always be critical to your KPIs or OKRs. An excellent example is the site reliability engineer (SRE): availability is among the role’s top elements as it is essential to protecting, providing, and progressing software and systems.

The primary purpose of the Availability Management process is to ensure that the Availability requirements agreed with the business for IT Service(s) are consistently met. It is the responsibility of Availability Management to ensure that corrective actions are being progressed to address any shortfalls in meeting the levels of Availability required and expected by the business.

Availability Management can also play a key role in further optimisation of the existing IT Infrastructure to provide improved levels of Availability at a lower cost when Availability requirements change. The Availability Management process should wherever possible contribute activities to support an overall SIP.

To help achieve these aims Availability Management needs to be recognised as a leading influence over the IT support organisation to ensure continued focus on Availability and stability of the IT Infrastructure. As the ‘champion’ for Availability in the IT organisation the function should embrace and engender the ethos of ‘continuous improvement’ within the IT support organisation.
