Businesses find comfort in backups, knowing data is protected from loss. Unfortunately, the reliability of backups depends on a number of factors, including the backup approach, the state of the environment and the backup schedule. Likewise, recovery depends on the type of backup, the restore method required and the recovery target. It’s no wonder that even with a backup strategy, a staggering 50% of restores fail.

In a nutshell, painful backup failures are all too common. 

What is backup failure? 

A backup failure occurs when a backup job does not complete or produces a copy that cannot be restored. No backup infrastructure can guarantee a failure-free run every time. According to a survey, the failure rate is at an all-time high of 37% and is expected to increase over the coming years.

Backup and recovery are only as good as the systems they run on. In many cases, outdated or overworked infrastructure components are poorly optimized for backups, and even less so for restoration. Backups also live in complex environments, with critical data spread across on-premises, cloud and hybrid platforms. It’s difficult for multifunctional IT professionals (and systems) to keep track of all the data in such an environment without ever missing a beat.

Attackers, meanwhile, benefit from the same complexity. Poor business continuity and disaster recovery (BCDR) planning allows them to target backup files, either deleting them outright or encrypting them and holding the data for ransom.

One of the biggest problems with backup failures is that they may go undiscovered until a restore is requested. Backup failures hide in plain sight since they don’t impact the production environment and may not produce errors indicating an issue. Unfortunately, it’s often only when the backup files are needed for recovery that an operator discovers an earlier backup failure and is unable to perform the necessary restore.
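One way to surface a silent failure before a restore is needed is to check the backup copy against its source. The following is a minimal sketch, not a description of any particular backup product: it assumes a plain file-level backup where the backup directory mirrors the source directory, and the function names `sha256_of` and `verify_backup` are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_backup(source_dir: Path, backup_dir: Path) -> list[str]:
    """List files that are missing from the backup or differ from the source.

    Assumption: backup_dir mirrors the layout of source_dir (hypothetical setup).
    """
    problems = []
    for source_file in sorted(source_dir.rglob("*")):
        if not source_file.is_file():
            continue
        relative = source_file.relative_to(source_dir)
        backup_file = backup_dir / relative
        if not backup_file.is_file():
            problems.append(f"missing: {relative.as_posix()}")
        elif sha256_of(source_file) != sha256_of(backup_file):
            problems.append(f"mismatch: {relative.as_posix()}")
    return problems
```

An empty list means every source file has an identical copy in the backup. This comparison is only meaningful if the source has not changed since the backup ran, and it does not replace a real restore test.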

Even if you’re taking backups on a regular schedule, the moral of the story is: Don’t. Ever. Get. Too. Comfortable. 

What are the risks of backup failure?

Backup failure carries considerable risk because it limits an organization’s ability to recover from a disaster and ensure business continuity. Here are some of the monetary and non-monetary risks that follow a backup failure.

Lost productivity

Productivity is the first casualty of backup failure. When backup files cannot be recovered, IT departments are forced to work overtime to investigate and fix the issue, bringing ongoing work to a standstill. This creates a ripple effect, delaying work across departments.

Reputation cost

Repairing the reputation of your business after a backup failure is challenging, since failed recoveries may lead to prolonged outages or disrupted services. Reputational damage can have a huge impact on your margins. Not only do you lose current customers but, with poor credibility and bad publicity, potential customers may never come knocking on your door.

Penalties and legal fees

Backup failures can put you in breach of compliance regulations, leading to hefty penalties. Fighting penalties means dealing with a pile of legal paperwork and court proceedings. In many instances, legal fees can exceed the penalty itself.

 

Why do backups fail?

Although backup failures come as a surprise, the reasons behind them are not so surprising. Here are a few common reasons why backups fail:

Infrastructure issues

Backups pass through multiple infrastructure components, and a failure in any one of them can affect the backup. These components include tape drives, disk arrays, backup servers, networks and the cloud. Ransomware attackers are notorious for exploiting infrastructure vulnerabilities. Cybercriminals keep malware hidden within the infrastructure for long gestation periods to extract copious amounts of sensitive data, contributing to a surge in advanced persistent threats (APTs).

Here are some key factors that lead to an infrastructure failure:  

  • No direct control over the WAN and the public cloud, which makes service-level agreements (SLAs) harder to meet.

  • Cloud backups with low performance and high latency.

  • Infrastructure that does not support WAN-based replication, meaning any break in connectivity can interrupt backup activities.

Media issues

The choice of media is perhaps the most important factor in backup success (or failure). In this regard, tape is still one of the most popular forms of long-term backup media. Low cost, high portability and excellent long-term data retention make tape a desirable choice for many. However, tapes are prone to shoe-shining (also known as tape back-hitching), where the drive repeatedly stops, rewinds and restarts because the incoming data stream is interrupted or too slow to keep the tape moving. We see this often with today’s “incremental” backup modes. Since incremental backups only track changes, they are often small and cannot keep the drive streaming continuously. The repeated back-and-forth motion creates extreme wear and tear on both the tapes and the tape drive, degrading service and increasing the chances of a backup failure.
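As a rough illustration of the throughput side of shoe-shining, the sketch below compares a job’s average data rate with the slowest rate at which a drive can keep streaming. The function names and all figures are hypothetical. The real minimum streaming rate depends on the drive model and should come from its specification.

```python
def average_throughput_mb_s(bytes_written: int, seconds: float) -> float:
    """Average data rate of a backup job in megabytes per second."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return bytes_written / 1_000_000 / seconds


def risks_shoe_shining(bytes_written: int, seconds: float,
                       min_streaming_mb_s: float) -> bool:
    """True if the job fed the drive more slowly than it can stream.

    min_streaming_mb_s is an assumed, drive-specific value (hypothetical here).
    """
    return average_throughput_mb_s(bytes_written, seconds) < min_streaming_mb_s


# Hypothetical example: a 600 MB incremental written over 60 seconds,
# checked against an assumed minimum streaming rate of 40 MB/s.
if __name__ == "__main__":
    print(risks_shoe_shining(600_000_000, 60, min_streaming_mb_s=40.0))
```

An average rate hides short stalls, so a job can pass this check and still cause back-hitching. Treat the result as a first warning sign rather than a diagnosis.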

Software issues

Often, the backup software itself fails. The cause may be bad input, exceeded resource limits or other application glitches. One of the more common reasons is patching: a large number of changes to software can create incompatibilities with the backup configuration. Microsoft Volume Shadow Copy Service (VSS) is another reason why many backup tasks fail. VSS errors often crop up when multiple backup products are in use, since this creates confusion about which backup solution should protect which data set.

Human error

Human error is still the primary reason behind an overwhelming majority of cybersecurity problems. Backup is no different, since humans are tasked with deploying and operating the backup process. Part of the problem lies in poor security training; the other part is obvious: people make mistakes. Take Veeam, for example, a backup and recovery company that had hundreds of millions of records exposed due to database mismanagement by employees. The exposed database contained 200 gigabytes of customer records, including names, email addresses and some IP addresses.

Prevent backup failure with Unitrends

Unitrends prevents backup failure by testing and remediating issues before running backups, ensuring 100% recovery confidence.  

Unitrends Recovery Assurance

Recovery Assurance orchestrates recovery testing in an isolated sandbox environment. You can test multiple machines and segment them by boot group, test reconfiguration of resources and networking, and execute application-level scripts to ensure apps and services work as expected and as needed for recovery.

Unitrends Helix

Helix takes a proactive approach: it monitors your backup appliance and production environment and autonomously corrects VSS errors and a variety of other software-related issues to ensure clean, consistent backups. Once Unitrends Helix is deployed, the system ensures that the criteria for taking a successful backup are met before the backup runs.
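To make the idea of checking criteria before a backup runs more concrete, here is a generic pre-flight sketch. It is not how Unitrends Helix is implemented. The function name `preflight_problems` and the two checks (destination is writable, enough free space) are assumptions chosen purely for illustration.

```python
import os
import shutil
from pathlib import Path


def preflight_problems(destination: Path, required_bytes: int) -> list[str]:
    """Return reasons a backup to `destination` is likely to fail.

    Hypothetical checks for illustration only: the destination must exist,
    be writable, and have at least `required_bytes` of free space.
    """
    if not destination.is_dir():
        return [f"destination does not exist: {destination}"]

    problems = []
    if not os.access(destination, os.W_OK):
        problems.append(f"destination is not writable: {destination}")

    free_bytes = shutil.disk_usage(destination).free
    if free_bytes < required_bytes:
        problems.append(
            f"insufficient space: need {required_bytes} bytes, "
            f"only {free_bytes} free"
        )
    return problems
```

A scheduler could call this before each job and skip or alert on a non-empty result, so the failure is reported up front rather than discovered at restore time.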

Want to gain peace of mind when it comes to your backups? Try Unitrends.

About Adam Marget

Adam is a Technical Specialist on the Unitrends marketing team supporting digital and in-market events. Over the last 4 years with Unitrends, he has been delighted at the opportunity to work with customers, prospects, and partners alike to help solve challenges around data protection and business continuity. Adam joined Unitrends in 2016, bringing with him experience working with a variety of manufacturers’ technology from edge to core as a coworker at national IT solutions provider CDW.