These infrastructure interruptions have the potential to wreak as much havoc within a company as the loss of the data itself. However, their effects can generally be minimized through the application of recovery or continuity strategies that are the result of advance planning and preparation.
The above description of disaster may suggest that only a major calamity (a terrorist bombing, an earthquake, or even a war) would qualify as a disaster. The term disaster brings to mind flames engulfing the offices of Fiscal Services Limited, rather than an accidental hard disk erasure at the small business office down the block. In either case, if the result is an unplanned interruption of normal business processes, the event may be classified as a disaster. Disasters are relative and contextual. However, despite contextual diversity, there are some constants about disasters. One is time.
Because businesses depend increasingly on customized information systems and networks, alternatives to system-provided functions and information cannot be implemented readily. Yet, for a business to survive a disaster, the time taken to restore system functions is critical.
In the past, most companies could withstand interruptions in normal processing for a protracted period of time. Given the increased dependency of business today on information technology, it is hard to imagine a company withstanding an outage of more than 48 hours without incurring serious difficulties for its market position. Indeed, for companies ranging from brokerages and banks to e-commerce vendors and just-in-time manufacturers, the costs associated with even minimal system or network interruptions may be extremely high.
Based on available evidence, the time required to recover critical business processes following an interruption is a universal determinant of successful recovery. Unplanned interruption can cost a business dearly in revenues, reputation, customers, and investors. The objective of disaster planning is to recover mission-critical processes as quickly as possible following the interruption event to mitigate its duration and costs.
Because IT resources are so essential to the success of many organizations today, it is critical that the services provided by these systems are able to operate effectively without excessive interruption. Disaster planning supports this requirement by establishing thorough plans, procedures, and technical measures that can enable a system to be recovered quickly and effectively following a service disruption or disaster. The business resumption plan should aim at achieving a systematic and orderly resumption of all the organization's IT services. The plan should provide for restoring service as soon as possible. Those functions that are most critical to achieving the agency mission must remain in operation during the recovery period.
The disaster planning process has nine major phases: Project Planning, Critical Business Requirements, Recovery Strategies, Emergency Response/Problem Escalation, Plan Activation, Recovery Operations, Training, Testing, and Plan Maintenance.
In the Project Planning phase we define the project scope, organize the project, and identify the resources needed. Within this phase a preliminary management commitment is obtained, a disaster recovery/business resumption manager is designated, a disaster recovery/business resumption planning team is organized, current recovery preparedness is audited, the project schedule is developed, documentation procedures are specified, the recovery program overview is defined, and the scope and aim of the disaster recovery/business resumption plan are identified.
Within the Critical Business Requirements phase we identify the business functions most important to protect and the means to protect them, and we analyze risks, threats, and vulnerabilities. An organization may carry out hundreds of operations that management and staff consider important. Key resources may be unavailable during a disaster, so the organization must concentrate its resources on the operations that are most important for public health, safety, and welfare. The aim of a disaster recovery/business resumption plan is to reduce potential losses, not to duplicate a business-as-usual environment. Within this phase the business impact analysis (BIA) is conducted. The BIA is a key step in the disaster planning process. The BIA enables the disaster recovery/business resumption manager to fully characterize the system requirements, processes, and interdependencies and use this information to determine contingency requirements and priorities. The purpose of the BIA is to correlate specific system components with the critical services that they provide and, based on that information, to characterize the consequences of a disruption to the system components. Key steps are listing critical IT resources, identifying disruption impacts and allowable outage times, and developing recovery priorities.
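The three BIA steps named above (listing critical IT resources, recording disruption impacts and allowable outage times, and developing recovery priorities) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the resource names, the hourly impact figures, and the ranking rule (shortest allowable outage first, ties broken by higher hourly impact) are hypothetical and are not prescribed by the planning process described here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CriticalResource:
    name: str                       # IT resource (hypothetical example names below)
    supports: str                   # critical service the resource provides
    allowable_outage_hours: float   # longest outage the business can tolerate
    impact_per_hour: float          # estimated cost per hour of disruption (illustrative)


def recovery_priorities(resources):
    """Rank resources: shortest allowable outage first, then highest hourly impact.

    The ranking rule is an assumption made for this sketch; a real BIA may
    weigh additional factors such as interdependencies between systems.
    """
    return sorted(
        resources,
        key=lambda r: (r.allowable_outage_hours, -r.impact_per_hour),
    )


# Step 1 and 2: list critical resources with their impacts and outage limits.
resources = [
    CriticalResource("email server", "internal communication", 24, 500),
    CriticalResource("order database", "order processing", 4, 10_000),
    CriticalResource("payroll system", "staff payments", 72, 200),
    CriticalResource("payment gateway", "order processing", 4, 25_000),
]

# Step 3: derive recovery priorities from the recorded figures.
priorities = recovery_priorities(resources)
for rank, r in enumerate(priorities, start=1):
    print(f"{rank}. {r.name} (max outage {r.allowable_outage_hours} h) -> {r.supports}")
```

Keeping the allowable outage time and the impact estimate as explicit fields makes the reasoning behind each priority visible, which helps when the figures are revisited later.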
The Recovery Strategies phase arranges for alternate processing facilities to use during a disaster. Recovery strategies provide a means to restore IT operations quickly and effectively following a service disruption. Strategies should address disruption impacts and allowable outage times identified in the BIA. Several factors should be weighed when developing the strategy, including cost, allowable outage time, security, and integration with larger, organization-level contingency plans. The selected recovery strategy should address the potential impacts identified in the BIA and should be integrated into the system architecture during the design and implementation phases of the system life cycle.
The recovery strategy should include a combination of methods that complement one another to provide recovery capability over the full spectrum of incidents. A wide variety of recovery approaches may be considered; the appropriate choice depends on the incident, type of system, and its operational requirements. Specific recovery methods may include commercial contracts with cold, warm, or hot site vendors, mobile sites, mirrored sites, reciprocal agreements with internal or external organizations, and service level agreements (SLAs) with the equipment vendors. In addition, technologies such as Redundant Arrays of Independent Disks (RAID), automatic fail-over, uninterruptible power supply (UPS), and mirrored systems should be considered when developing a system recovery strategy.
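As a rough illustration of how an allowable outage time from the BIA might narrow the choice among the site types listed above, the following Python sketch maps an outage tolerance to a candidate site. The hour thresholds and the `suggest_site` helper are hypothetical values chosen for this example, not figures from any standard. The sketch also considers outage time only, whereas a real decision would weigh cost, security, and integration with organization-level plans as well.

```python
# Hypothetical upper bounds (in hours) on allowable outage for each site type.
# These thresholds are illustrative assumptions only. Entries must stay
# ordered from the fastest recovery option to the slowest.
SITE_OPTIONS = [
    (1, "mirrored site"),   # near-continuous availability
    (12, "hot site"),       # equipped and ready to take over within hours
    (72, "warm site"),      # partially equipped, needs some setup
]


def suggest_site(allowable_outage_hours: float) -> str:
    """Return the first site type whose threshold covers the outage tolerance."""
    for limit, site in SITE_OPTIONS:
        if allowable_outage_hours <= limit:
            return site
    # Anything more tolerant than the last threshold falls back to a cold site.
    return "cold site"


for hours in (0.5, 8, 48, 200):
    print(f"allowable outage {hours} h -> candidate: {suggest_site(hours)}")
```

A lookup like this gives only a starting point for discussion. The complementary measures mentioned above (RAID, automatic fail-over, UPS) would be considered alongside whichever site type is chosen.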
The Emergency Response/Problem Escalation phase specifies exactly how to respond to emergencies and how to tell when a "problem" has become a potential "disaster". In this phase potential threats are identified and emergency procedures are developed. The remaining phases deal with what to do once the disaster recovery plan has been formulated, potential disasters have been identified, and a focal point for coordinating the recovery program has been set up. The Plan Activation phase determines procedures for informing the right people, assessing the impact on operations, and starting the recovery efforts. Within the Recovery Operations phase we develop the specific steps for reducing the risks of a disaster and restoring operations should a disaster occur.
The Training phase makes sure everyone understands the recovery plan and can carry it out efficiently. Training for personnel with contingency plan responsibilities should complement testing. Training should be provided at least annually; new hires with plan responsibilities should receive training shortly after they are hired. Ultimately, contingency plan personnel should be trained to the extent that they are able to execute their respective recovery procedures without the aid of the actual document. This is an important goal in case paper or electronic versions of the plan are unavailable for the first few hours because of the extent of the disaster.
The Testing phase makes sure the plan works effectively. Plan testing is a critical element of a viable contingency capability. Testing enables plan deficiencies to be identified and addressed. Testing also helps evaluate the ability of the recovery staff to implement the plan quickly and effectively. Each IT contingency plan element should be tested to confirm the accuracy of individual recovery procedures and the overall effectiveness of the plan.
The final phase, the Plan Maintenance phase, makes changes and additions to keep the plan current. To be effective, the plan must be maintained in a ready state that accurately reflects system requirements, procedures, organizational structure, and policies. IT systems undergo frequent changes because of shifting business needs, technology upgrades, or new internal or external policies. Therefore, it is essential that the contingency plan be reviewed and updated regularly, as part of the organization’s change management process, to ensure new information is documented and contingency measures are revised if required. As a general rule, the plan should be reviewed for accuracy and completeness at least annually or whenever significant changes occur to any element of the plan. Certain elements, such as contact lists, will require more frequent reviews. Based on the system type and criticality, it may be reasonable to evaluate plan contents and procedures more frequently.
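The review rule described above lends itself to a simple automated reminder. The Python sketch below flags plan elements whose last review is older than their interval. The annual interval for the full plan follows the text; the 90-day interval for contact lists, the element names, and the dates are assumptions made for illustration only.

```python
from datetime import date, timedelta

# Review intervals per plan element. "At least annually" comes from the text;
# the quarterly interval for contact lists is an assumed example value.
REVIEW_INTERVALS = {
    "full plan": timedelta(days=365),
    "contact list": timedelta(days=90),
}


def overdue_elements(last_reviewed, today):
    """Return the names of plan elements whose review interval has elapsed."""
    return sorted(
        name
        for name, reviewed_on in last_reviewed.items()
        if today - reviewed_on > REVIEW_INTERVALS[name]
    )


# Hypothetical review log.
last_reviewed = {
    "full plan": date(2023, 1, 10),
    "contact list": date(2024, 1, 15),
}

overdue = overdue_elements(last_reviewed, today=date(2024, 3, 1))
print("Overdue for review:", overdue)
```

A check of this kind covers only the calendar-based part of the rule. Reviews triggered by significant changes to any plan element would still need to be tied into the change management process.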