Unitrends has been focused on backup and cloud disaster recovery for 30 years. In that time, one thing hasn’t changed: new customers still come to us looking for a backup and cloud DR solution that gives them confidence their recoveries will succeed. We thought it was time to draw on this experience and put together a series of blog posts that explore why recoveries fail and discuss ways you can prevent those failures.
Unitrends’ 2019 Survey on the State of Backup and the Cloud found that 42% of enterprises experienced downtime last year, so recovery is not some esoteric topic. If you work in IT, you will eventually have to deal with a recovery and its results, whether positive or negative.
Today, backups are easy. In years past, tape was the dominant backup medium, with all its management, storage, and physical degradation issues. Today, disk is the most common backup technology (both disk-to-disk and disk-to-cloud), with about a 70% market share. Appliances automate backup schedules, so you don’t even have to think much about backups; they just happen.
It is recoveries that are difficult. The infrastructure and technology you are trying to recover today are far more complex than they were even a decade ago. You probably have data and applications running on premises, in the cloud, and as SaaS apps such as O365 or Salesforce. Recovering them means recreating their exact environment and technical settings during downtime, a period of very high stress. If even one element is wrong, the recovery will fail. Then, with the clock ticking, you must do forensic analysis to discover what went wrong.
One common cause of recovery failures is a lack of understanding of software dependencies. For example, many business-critical applications, such as SQL Server, Exchange, Oracle databases, or CRM systems, run on multiple servers (multi-tier or N-tier applications). One server may handle processing, another data management, and possibly a third the user interface and data presentation. Simply restoring each server is not enough: the machines must all communicate with one another correctly. If you backed up each of these servers on a different schedule, restore them in the wrong boot order, or host them on a different virtual network, application functionality will not be restored. The recovery will fail, data may be lost, and you may waste many hours troubleshooting.
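To make the dependency problem concrete, here is a minimal Python sketch (Python 3.9+, for the standard-library `graphlib` module) that derives a boot order from a dependency map and checks whether the tiers’ last backups were taken close enough together to be consistent. The server names, the dependency map, and the 30-minute skew threshold are hypothetical illustrations, not Unitrends settings or features.

```python
from datetime import datetime, timedelta
from graphlib import TopologicalSorter

# Hypothetical three-tier application: each server maps to the servers it depends on.
DEPENDS_ON = {
    "db01": set(),         # data management tier
    "app01": {"db01"},     # processing tier needs the database
    "web01": {"app01"},    # presentation tier needs the application server
}


def boot_order(depends_on):
    """Return a start order in which each server comes after everything it depends on.

    Raises graphlib.CycleError if the dependencies are circular.
    """
    return list(TopologicalSorter(depends_on).static_order())


def backups_consistent(backup_times, max_skew):
    """Return True if every tier's last backup falls within max_skew of the others."""
    times = list(backup_times.values())
    return max(times) - min(times) <= max_skew


if __name__ == "__main__":
    print(boot_order(DEPENDS_ON))  # ['db01', 'app01', 'web01']

    # Hypothetical last-backup times: the web tier was backed up 12 hours later.
    last_backup = {
        "db01": datetime(2019, 6, 1, 1, 0),
        "app01": datetime(2019, 6, 1, 1, 10),
        "web01": datetime(2019, 6, 1, 13, 0),
    }
    print(backups_consistent(last_backup, timedelta(minutes=30)))  # False
```

A topological sort is one simple way to encode rules like “start the database before the application server”; in practice, the dependency map would come from your own documentation of each application.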
Even how you configure your backups can cause recoveries to fail. Certain VM disk configurations can make backups appear successful while they contain little or no data. For example, in a VMware environment, disks configured in independent mode are excluded from snapshots, so if both of a VM’s disks are set to independent mode, snapshot-based backups of that VM will be empty. Some backup technologies use an incremental-forever scheme that records a “journal” tracking which data has changed since the last backup. If you do not allocate enough space for this journal, or your data change rate is much higher than expected, the journal can fill and overflow. The system then loses its ability to track which data is net new, which makes disaster recovery a problem.
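Both pitfalls lend themselves to simple pre-flight checks. The Python sketch below flags disks whose mode keeps them out of snapshots and estimates whether a change journal can absorb a full backup interval at a peak change rate. The disk-mode strings are modeled on the vSphere disk mode names (`independent_persistent`, `independent_nonpersistent`); the inventory, journal capacity, change rates, and the 2x peak factor are hypothetical assumptions for illustration.

```python
# Disk modes that VMware excludes from snapshots (modeled on vSphere disk mode names).
INDEPENDENT_MODES = {"independent_persistent", "independent_nonpersistent"}


def unsnapshottable_disks(disk_modes):
    """Return the labels of disks whose mode keeps them out of snapshots.

    disk_modes is a hypothetical inventory mapping disk label -> disk mode string.
    """
    return sorted(label for label, mode in disk_modes.items() if mode in INDEPENDENT_MODES)


def journal_headroom_hours(capacity_gb, change_rate_gb_per_hour):
    """Hours of changes the journal can record before it fills."""
    if change_rate_gb_per_hour <= 0:
        return float("inf")
    return capacity_gb / change_rate_gb_per_hour


def journal_is_safe(capacity_gb, change_rate_gb_per_hour, backup_interval_hours, peak_factor=2.0):
    """True if the journal lasts a full backup interval at peak_factor times the expected rate."""
    peak_rate = change_rate_gb_per_hour * peak_factor
    return journal_headroom_hours(capacity_gb, peak_rate) >= backup_interval_hours


if __name__ == "__main__":
    # Hypothetical VM with both disks set to independent mode: its backups would be empty.
    vm_disks = {"Hard disk 1": "independent_persistent", "Hard disk 2": "independent_persistent"}
    print(unsnapshottable_disks(vm_disks))  # ['Hard disk 1', 'Hard disk 2']

    # 50 GB journal, 1.5 GB/hour expected change rate, daily backups:
    # at a 2x peak (3 GB/hour) the journal fills in about 16.7 hours, before the next backup.
    print(journal_is_safe(50, 1.5, 24))  # False
```

The peak factor is deliberately pessimistic: the failure described above happens precisely when the change rate is higher than expected.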
These are just two common scenarios that can prevent successful recoveries; there are many more. To make it easier for you to look systematically for potential recovery issues, we have created a Hidden Risks Checklist. This one-page document lists common interdependencies that can cause problems if they are not considered when setting up and managing your backup and recovery program.
Organizations are raising their expectations for fast recoveries. Respondents to our 2019 survey reported that, in just one year, 12% more organizations tightened their recovery time objectives (RTOs) to less than 4 hours. This raises the bar for IT to have the tools, knowledge, and procedures in place to meet these rising recovery expectations.
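One quick way to see what a sub-4-hour RTO demands is to compare it with the time needed just to copy the data back. The sketch below assumes decimal units (1 GB = 1,000 MB), a sustained restore throughput, and an optional fixed overhead for steps such as booting servers in order and validating the application; all figures are hypothetical.

```python
def restore_hours(data_gb, throughput_mb_per_s):
    """Hours needed to copy data_gb back at a sustained throughput (decimal units)."""
    seconds = data_gb * 1000 / throughput_mb_per_s
    return seconds / 3600


def meets_rto(data_gb, throughput_mb_per_s, rto_hours=4.0, overhead_hours=0.0):
    """True if data transfer time plus a fixed overhead fits within the RTO."""
    return restore_hours(data_gb, throughput_mb_per_s) + overhead_hours <= rto_hours


if __name__ == "__main__":
    # Hypothetical: 2 TB at 200 MB/s takes about 2.8 hours to copy back.
    print(round(restore_hours(2000, 200), 2))  # 2.78
    # Add 1.5 hours for boot order, network setup, and validation: the 4-hour RTO is missed.
    print(meets_rto(2000, 200, overhead_hours=1.5))  # False
```

Transfer time is only a lower bound; the dependency and configuration issues described earlier add time that this arithmetic does not capture.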
Please see the checklist, follow this blog series, read our upcoming white paper, and download our recovery tools so you are better prepared for something that, statistics show, will be a critical event in your IT career. Forewarned is forearmed.