How to Get Duped by Deduplication: Disregard Recovery
Deduplication is the process of removing duplicate data. To reconstitute deduplicated data so that it can be recovered, the inverse of the deduplication process must occur. This inverse process, commonly termed “hydration,” “re-hydration,” or “re-duplication,” tends to have two negative consequences:
- It takes time to transmogrify the data from the deduplicated state to the original state.
- There is a risk that something could go wrong and data could be lost.
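To make both consequences concrete, here is a minimal sketch of fixed-size-chunk deduplication in Python. The tiny chunk size, the use of SHA-256, and the names `deduplicate` and `hydrate` are illustrative assumptions, not any vendor's implementation. Hydration is one index lookup per chunk (the time cost), and any backup whose recipe references a lost chunk can no longer be rebuilt (the risk).

```python
import hashlib

CHUNK_SIZE = 4  # assumption: tiny fixed chunks so the demo is readable


def deduplicate(data: bytes, store: dict) -> list:
    """Split data into fixed-size chunks, keep one copy of each unique chunk
    in `store` (hash -> chunk), and return the recipe (ordered chunk hashes)."""
    recipe = []
    for i in range(0, len(data), CHUNK_SIZE):
        chunk = data[i:i + CHUNK_SIZE]
        digest = hashlib.sha256(chunk).hexdigest()
        store.setdefault(digest, chunk)  # a duplicate chunk is not stored again
        recipe.append(digest)
    return recipe


def hydrate(recipe: list, store: dict) -> bytes:
    """Inverse of deduplicate: look up every chunk and reassemble the original.
    If any referenced chunk is missing, this raises KeyError."""
    return b"".join(store[digest] for digest in recipe)


store = {}
original = b"ABCDABCDABCDXYZ!"
recipe = deduplicate(original, store)
print(len(recipe), len(store))  # 4 chunks referenced, 2 unique chunks stored
```

Three of the four chunk references point at the single stored copy of `ABCD`, so losing that one copy breaks hydration of every backup that uses it.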
The “transmogrification” of the data (I will admit that I first encountered this word in a Calvin and Hobbes cartoon years and years ago) is a fundamental consequence of the technology. The best technical way to handle it is a technique known as reverse referencing: creating the deduplication index/cache from the last backup. There are various ways to make this work, but from the standpoint of recovery the best way is to reverse reference and not allow deduplication across that last backup set. This is the approach the software developers at Unitrends took, driven by a requirement that deduplication have the least negative impact on recovery time for the most frequent recovery case (recovering the last backup performed). This is one of the reasons (though not the only one) that a hybrid compression/deduplication data reduction technique was invented for Adaptive Deduplication.
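The recovery benefit of keeping the last backup out of deduplication can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not Unitrends' Adaptive Deduplication: the class name `ReverseReferencedStore`, the fixed 4-byte chunks, and the rule that a backup is deduplicated only once a newer one arrives are all hypothetical choices made for the example. The point it shows is that the newest backup is recovered by a direct read, while only older backups pay the hydration cost.

```python
import hashlib
from typing import Optional

CHUNK_SIZE = 4  # assumption: tiny fixed chunks for readability


class ReverseReferencedStore:
    """Hypothetical sketch: the newest backup is kept whole; a backup is
    deduplicated into the shared chunk store only after a newer one arrives."""

    def __init__(self) -> None:
        self.latest_name: Optional[str] = None
        self.latest_data = b""   # newest backup, never deduplicated
        self.chunk_store = {}    # hash -> unique chunk, used by older backups
        self.recipes = {}        # older backup name -> ordered chunk hashes

    def _demote(self, name: str, data: bytes) -> None:
        # Deduplicate a backup that is no longer the newest one.
        recipe = []
        for i in range(0, len(data), CHUNK_SIZE):
            chunk = data[i:i + CHUNK_SIZE]
            digest = hashlib.sha256(chunk).hexdigest()
            self.chunk_store.setdefault(digest, chunk)
            recipe.append(digest)
        self.recipes[name] = recipe

    def add_backup(self, name: str, data: bytes) -> None:
        if self.latest_name is not None:
            self._demote(self.latest_name, self.latest_data)
        self.latest_name, self.latest_data = name, data

    def recover(self, name: str) -> tuple:
        """Return (data, needed_hydration)."""
        if name == self.latest_name:
            return self.latest_data, False  # direct read: no reassembly
        recipe = self.recipes[name]
        return b"".join(self.chunk_store[d] for d in recipe), True


backups = ReverseReferencedStore()
backups.add_backup("mon", b"ABCDABCDXYZ!")
backups.add_backup("tue", b"ABCDABCDXYZ?")
```

Here `backups.recover("tue")` returns the newest data without touching the chunk store, while `backups.recover("mon")` has to reassemble it chunk by chunk. The trade-off in this sketch is the space of one undeduplicated copy in exchange for fast recovery in the most common case.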
To reduce your risk, you have to pay attention to how deduplication affects your recovery time objectives. You also have to focus on the underlying reliability of the physical storage on which the deduplicated data is stored.
The Complete Series: How to Get Duped by Deduplication
- How to Get Duped by Deduplication: Ignore Recovery
- How to Get Duped by Deduplication: Blissfully Accept Published Ingest Rates from Vendors
- How to Get Duped by Deduplication: Focus on Technology Instead of Price Per Effective Terabyte
- How to Get Duped by Deduplication: Be Oblivious to Physical Storage Costs
- How to Get Duped by Deduplication: Blindly Believe Data Reduction Ratios
- How to Get Duped by Deduplication: Ignore Time
- How to Get Duped by Deduplication: Neglect Backup Size, Backup Type, and Retention
- How to Get Duped by Deduplication: Pay No Attention to Server, Workstation, PC, and Notebook Overhead
- How to Get Duped by Deduplication: Ignore the Impact of Encryption