Organizations of all sizes are consistently reporting increased numbers of cyber incidents, with data breaches and ransomware infections fast becoming a common occurrence. While solid security procedures and good planning can go a long way towards preventing and containing many incidents, sooner or later things can go wrong – and when they do, you have to be prepared for recovery in order to maintain business continuity, minimize costly downtime, and control fallout. In this article, we will look at 7 key aspects of planning for cyber incident recovery.
1. Your Recovery Goals
While the general goal of recovery efforts is obviously to restore normal operations, it’s a good idea to define specific recovery goals for your systems, processes, and business. Here are a few general goals to keep in mind:
- Minimize disruption to normal operations: Most cyber incidents are limited in scope and you will be resolving them in parallel with regular business operations. Wherever possible, recovery efforts should have minimum impact on regular work elsewhere in your organization.
- Contain and minimize damage: When faced with multiple options for recovery, you need to choose solutions that minimize the overall operational and economic impact for your business.
- Ensure operational continuity: Be prepared for multiple scenarios that may disrupt your communication, workflows, and business and recovery procedures. Make sure that you can maintain and resume operations in all likely crisis situations.
- Quickly and smoothly restore normal services: This is probably the most obvious technical goal of recovery. All your procedures and efforts should be focused on resuming normal operations as soon as possible while also considering your other strategic goals.
- Define recovery priorities: Depending on the type of incident, recovery will often be performed in stages, so you should define the order of recovery for systems and processes. Consider both the technical requirements (what systems are necessary to bring other areas back to operation) and the business aspects (which business processes can wait a little longer for recovery and which must be restored urgently).
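The last goal combines two inputs: technical dependencies and business urgency. As a rough illustration only, the sketch below orders a hypothetical set of systems so that nothing is restored before the systems it depends on. Among the systems that are ready, it picks the most business-critical one first. The system names, priority scores, and the `recovery_order` helper are all assumptions made for this example, and the code relies on `graphlib` from the Python 3.9+ standard library.

```python
from graphlib import TopologicalSorter

# Hypothetical inventory: name -> (business priority, systems it depends on).
# A lower priority number means the business needs it back sooner.
SYSTEMS = {
    "network": (1, []),
    "directory": (1, ["network"]),
    "database": (2, ["network", "directory"]),
    "order_app": (1, ["database"]),
    "intranet_wiki": (3, ["directory"]),
}


def recovery_order(systems):
    """Return system names in an order that respects dependencies,
    choosing the most urgent system among those currently restorable."""
    sorter = TopologicalSorter(
        {name: deps for name, (_, deps) in systems.items()}
    )
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies

    ready = []   # systems whose dependencies are all restored
    order = []
    while sorter.is_active():
        ready.extend(sorter.get_ready())
        # Most urgent first; the name breaks ties so the result is stable.
        ready.sort(key=lambda name: (systems[name][0], name))
        current = ready.pop(0)
        order.append(current)
        sorter.done(current)  # may unlock systems that depend on it
    return order


print(recovery_order(SYSTEMS))
```

This is a greedy ordering: it does not look ahead to pull forward a low-priority system that happens to block an urgent one, so a real plan would likely need a manual review of the resulting list.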
2. Your Vital Assets
For effective recovery, you need to know what you will be recovering, so a detailed inventory of all physical and digital assets relevant to recovery is a crucial aspect of recovery planning. This should cover all items that are necessary for everyday operations, including:
- Hardware: Your physical infrastructure including servers, workstations, mobile devices, network devices, cabling, and power supply equipment.
- Software: The heart of your business infrastructure including operating systems, middleware, applications, hypervisors, systems and network management tools, recovery tools, and cybersecurity products.
- Data: All the information needed to keep your organization and its infrastructure running including user files, company intellectual property, business databases, administrative databases, configuration settings, and access credentials.
- Digital assets: Your vital intangibles, including software licenses and security certificates. These are easily overlooked, but a missing or invalid certificate can paralyze your web applications or even cause a cyber incident in its own right, and software licenses and activation keys can be crucial when you need to restore entire systems.
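One way to keep certificates and licenses from being overlooked is to record their expiry dates in the asset inventory and review them regularly. The following is a minimal sketch of such a check. The `DigitalAsset` record, the `expiring_soon` helper, the 30-day warning window, and the sample entries are all hypothetical and are not taken from any particular inventory tool.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class DigitalAsset:
    name: str
    kind: str      # for example "certificate" or "license"
    expires: date


def expiring_soon(assets, today, warn_days=30):
    """Return assets that have already expired or will expire within
    warn_days of today, soonest first."""
    cutoff = today + timedelta(days=warn_days)
    due = [asset for asset in assets if asset.expires <= cutoff]
    return sorted(due, key=lambda asset: asset.expires)


# Hypothetical inventory entries for illustration.
inventory = [
    DigitalAsset("shop TLS certificate", "certificate", date(2024, 1, 15)),
    DigitalAsset("hypervisor license", "license", date(2025, 1, 1)),
    DigitalAsset("VPN gateway certificate", "certificate", date(2023, 12, 1)),
]

for asset in expiring_soon(inventory, today=date(2024, 1, 1)):
    print(asset.kind, asset.name, asset.expires.isoformat())
```

Passing `today` explicitly rather than reading the system clock inside the function keeps the check easy to test and repeat.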
3. Backup Policy and Testing
Data is the most valuable asset of any business, so having a solid backup strategy is a vital aspect of recovery planning. When things go wrong, you need to have guaranteed access to at least one good backup of your data, and you must be able to restore this copy as quickly as technically possible. If this step fails, all your other recovery efforts will be in vain, so think carefully about the two main aspects of backup management:
- Backup strategy: Define your Recovery Time Objectives (RTO) and Recovery Point Objectives (RPO) for different kinds of data covered by backups. Simply put, RTO states how quickly data should be restored from backups (how long you can afford to wait for recovery) and RPO indicates how recent your backup should be (how much recent data you can afford to lose). Based on these objectives, choose the type of backup media for different data types, balancing recovery performance and data safety with your budgetary and organizational constraints.
While any backup is better than none, it is best practice to follow the 3-2-1 rule: keep 3 copies of any important file (1 original and 2 backups), store your backups on 2 different types of media (for example on disk and tape or on disk and in the cloud), and store 1 backup offsite, at a physically separate location from your original data.
- Restore procedure testing: Even if you make regular backups, they will be useless in an emergency unless they can be restored within your RTO and are recent enough to meet your RPO. Restoration procedures must be tested and updated regularly, and staff must be trained accordingly. Also remember to check periodically that your backups are not corrupted and can always be accessed within the RTO timeframe.
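The 3-2-1 rule and the RTO and RPO targets described above can be expressed as simple automated checks. The sketch below is an illustration under stated assumptions, not a definitive implementation: the `BackupCopy` record, both helper functions, and the sample values are hypothetical, and `restore_time` is assumed to come from your most recent restore test.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class BackupCopy:
    media: str               # for example "disk", "tape", or "cloud"
    offsite: bool            # stored away from the original data?
    taken_at: datetime       # when this backup was made
    restore_time: timedelta  # duration measured in the last restore test


def meets_3_2_1(backups):
    """3 copies (the original plus at least 2 backups),
    2 different media types, and at least 1 backup offsite."""
    return (
        len(backups) >= 2
        and len({b.media for b in backups}) >= 2
        and any(b.offsite for b in backups)
    )


def meets_objectives(backups, now, rpo, rto):
    """True if at least one backup is recent enough for the RPO
    and can also be restored within the RTO."""
    return any(
        now - b.taken_at <= rpo and b.restore_time <= rto
        for b in backups
    )


# Hypothetical example: a nightly local disk backup and a weekly cloud copy.
now = datetime(2024, 3, 1, 9, 0)
backups = [
    BackupCopy("disk", False, datetime(2024, 3, 1, 1, 0), timedelta(hours=2)),
    BackupCopy("cloud", True, datetime(2024, 2, 25, 1, 0), timedelta(hours=10)),
]

print(meets_3_2_1(backups))
print(meets_objectives(backups, now, rpo=timedelta(hours=24), rto=timedelta(hours=4)))
```

Note that `meets_objectives` deliberately requires a single backup to satisfy both targets: a recent backup that restores too slowly and a fast but stale one do not add up to a compliant setup.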
4. Your Recovery Personnel
People are your first line of defense in any incident, so make sure you define and maintain a list of all personnel who might be involved in cyber incident recovery. Specify roles in your recovery procedures and decision processes and ensure that each role is always suitably filled or delegated regardless of your current staffing situation. Considering that most recovery operations require administrative privileges of some sort, include credential management in your procedures. If some areas of your company’s operations are outsourced, make sure your provider agreements include relevant provisions for recovery.
5. Connectivity and Communication
Coordinating the work of multiple people and systems requires effective communication, and in the confusion and uncertainty of a cybersecurity incident, this becomes paramount. Your recovery plan should define connectivity requirements for key systems and communication channels for recovery personnel. Anticipate likely attack and response scenarios and include the risk of degraded communication or connectivity in your plans. If possible, prepare backup communication and data transfer channels, both logical and physical. For example, if retrieving data from one of your offsite backups over the network is not possible or feasible (maybe a network link is down or the transfer would take too long considering your RTO for this data), you may need to transport physical media to your main site. For personnel communication, your plan might include a scenario where no communication over the company network is possible and staff must rely on face-to-face meetings and mobile devices.
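To judge whether a network transfer from an offsite backup would "take too long" for a given RTO, a back-of-the-envelope estimate is often enough. The sketch below shows the arithmetic. The function names, the 80% link efficiency factor, and the example figures are assumptions chosen for illustration, so substitute your own measured values.

```python
def transfer_hours(size_gb, link_mbps, efficiency=0.8):
    """Estimate the hours needed to move size_gb gigabytes over a link
    rated at link_mbps megabits per second.

    efficiency is an assumed fraction of the rated speed that is
    actually achieved (protocol overhead, competing traffic)."""
    size_megabits = size_gb * 8 * 1000  # decimal units: 1 GB = 8000 megabits
    seconds = size_megabits / (link_mbps * efficiency)
    return seconds / 3600


def pick_channel(size_gb, link_mbps, courier_hours, rto_hours):
    """Pick the faster of network transfer and physical transport,
    and report whether it fits within the RTO."""
    network_hours = transfer_hours(size_gb, link_mbps)
    if network_hours <= courier_hours:
        channel, hours = "network", network_hours
    else:
        channel, hours = "courier", courier_hours
    return channel, round(hours, 1), hours <= rto_hours


# Hypothetical case: 5 TB backup, 100 Mbps link, 6-hour courier run, 24-hour RTO.
print(round(transfer_hours(5000, 100), 1))
print(pick_channel(5000, 100, courier_hours=6, rto_hours=24))
```

With these example figures, the network transfer comes out at roughly 139 hours, far beyond a 24-hour RTO, so physically transporting the media is the only option that meets the target.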
6. Recovery Requirements for External Providers
Few modern organizations handle their IT entirely in-house and it’s likely your business has multiple dependencies on external providers. Ensure that all your service level agreements (SLAs) with third parties anticipate outages and recovery situations, both for your own systems and those of the provider. For example, if you use cloud storage for some of your backups, your SLA with the cloud storage or backup provider should define service levels and costs not just for routine backups and emergency data retrieval, but also for situations where the service is unavailable when you need it. Make sure you fully understand your dependencies on external providers and plan for multiple recovery scenarios.
7. Detailed Testing and Regular Updates
Even the best recovery plan won’t be much use in an emergency if it contains outdated information or procedures. Ensure that recovery planning is tied into all relevant change processes in the organization, from HR to software and hardware maintenance. This helps avoid situations where a crucial software feature is missing from an updated version, new backup hardware is incompatible with existing media, or key staff no longer work at the company.
Regular testing is the best way of identifying problem areas and training your personnel in recovery procedures and communication. Remember that just one missing link in the recovery operations chain can be enough for the entire recovery process to fail or require time-consuming and potentially costly manual intervention.
Cyber attacks are currently considered the most likely man-made global threat, and with organizations worldwide heavily reliant on IT infrastructure and cloud platforms, they have the potential to cripple a business, a government institution, or even a whole country. When you add attacks on physical infrastructure, accidental damage to hardware, human error, utility failures, and a host of other unforeseeable events, cyber incidents of all sorts are inevitable and bound to become ever more common. Armed with a carefully prepared recovery plan, you can at least have some peace of mind knowing that when things do go wrong, you are ready to get your business back on its feet as soon as possible.