How to Get Duped by Deduplication: Neglect Backup Size, Backup Type, and Retention
Here’s another lesson in getting the wool pulled over your eyes: stick to the credo that “data is data” and pay no attention to the overall size of your backup or to the retention you want from it. (Retention just means the number of backups from which you want to be able to recover.)
In practice, backup size not only dramatically impacts ingest times (particularly on post-processing deduplication devices, which, as noted elsewhere in this blog, tend to start fast but quickly get bogged down if and when you exceed the landing site size) but also radically changes the data reduction ratios you can expect. More retention typically means more redundant data, which means higher data reduction ratios, and the smaller the backup size, the more retention you’re typically going to get out of the same amount of storage.
Of course, the degree to which retention will impact the data reduction ratio depends upon the backup type in use. Doing full masters every day means maximum data reduction, because each full mostly repeats data that has already been stored; doing block-level incrementals every hour means that your data reduction ratio will probably be small, because only the changed blocks are sent in the first place.
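To put rough numbers on the effect of backup type, here is a minimal back-of-the-envelope sketch. It is a toy model, not a description of how any particular device works: it assumes an idealized deduplicator that stores each unique block exactly once, a constant change rate between backups, and changes that never overlap. The function names and the 2% change rate are made up for illustration.

```python
def ratio_daily_fulls(retained, change_rate):
    """Toy data reduction ratio when every retained backup is a full.

    Sizes are expressed as multiples of one full backup.
    Assumes an ideal deduplicator and a constant change rate.
    """
    logical = float(retained)                      # what was sent to the device
    stored = 1.0 + (retained - 1) * change_rate    # first full plus changed data
    return logical / stored


def ratio_incrementals(retained, change_rate):
    """Toy ratio for one full followed by block-level incrementals.

    Only changed blocks are ever sent, so (in this model) there is
    nothing redundant left for the deduplicator to remove.
    """
    logical = 1.0 + (retained - 1) * change_rate
    stored = 1.0 + (retained - 1) * change_rate
    return logical / stored


# Hypothetical example: 30 retained backups, 2% change between backups.
print(f"daily fulls:  {ratio_daily_fulls(30, 0.02):.1f}:1")
print(f"incrementals: {ratio_incrementals(30, 0.02):.1f}:1")
```

Under these assumptions the daily fulls come out at about 19:1 while the incrementals stay at 1:1, even though both schedules protect the same data. Real devices also compress, and real changes overlap, so treat the numbers as a way to see the shape of the effect and nothing more.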
Now, none of this takes into account the number of clients that you’re protecting, or the change rate of the data that you’re protecting. If you have ten clients, all of which hold the same data, you’re going to get roughly a 10:1 deduplication ratio on that data. If you have a low change rate, you’re going to get a higher deduplication ratio.
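The ten-client arithmetic can be sketched the same way. This assumes, hypothetically, that the device deduplicates across clients and that a fixed fraction of each client's data is identical on every client; `shared_fraction` is an invented parameter for illustration.

```python
def ratio_across_clients(clients, shared_fraction):
    """Toy deduplication ratio across clients of equal size.

    shared_fraction is the (assumed) portion of each client's data
    that is identical on every client; the rest is unique per client.
    Sizes are expressed as multiples of one client's data.
    """
    logical = float(clients)
    stored = shared_fraction + clients * (1.0 - shared_fraction)
    return logical / stored


# Ten clients holding exactly the same data.
print(ratio_across_clients(10, 1.0))
# Ten clients with nothing in common.
print(ratio_across_clients(10, 0.0))
```

Identical clients give 10:1 and completely different clients give 1:1; anything in between depends on how much the clients really share, which is worth measuring instead of assuming.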
It’s also important to take a step back and realize that your deduplication ratio changes over time. It should increase as more backups are added, because each new backup mostly repeats data that has already been stored.
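As a rough illustration of the ratio moving over time, the sketch below tracks a toy ratio day by day for daily fulls. It assumes an ideal deduplicator, a constant change rate, non-overlapping changes, and a fixed retention window after which the oldest backup is dropped; all names and numbers are hypothetical.

```python
def ratio_by_day(days, retention, change_rate):
    """Toy deduplication ratio after each daily full backup.

    Backups older than the retention window are assumed to expire,
    so at most `retention` backups are kept at any time.
    """
    ratios = []
    for day in range(1, days + 1):
        kept = min(day, retention)                 # backups currently retained
        logical = float(kept)                      # in units of one full backup
        stored = 1.0 + (kept - 1) * change_rate    # first full plus changed data
        ratios.append(logical / stored)
    return ratios


# Hypothetical example: 10 days of backups, 7 retained, 5% daily change.
for day, ratio in enumerate(ratio_by_day(10, 7, 0.05), start=1):
    print(f"day {day:2d}: {ratio:.2f}:1")
```

In this model the ratio starts at 1:1 on day one, climbs with each backup, and stops climbing once the retention window is full. The practical point is that a ratio quoted without saying how many backups were retained at the time tells you very little.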
The Complete Series: How to Get Duped by Deduplication
- How to Get Duped by Deduplication: Ignore Recovery
- How to Get Duped by Deduplication: Blissfully Accept Published Ingest Rates from Vendors
- How to Get Duped By Deduplication: Focus on Technology Instead of Price Per Effective Terabyte
- How to Get Duped by Deduplication: Be Oblivious to Physical Storage Costs
- How to Get Duped by Deduplication: Blindly Believe Data Reduction Ratios
- How to Get Duped by Deduplication: Ignore Time
- How to Get Duped by Deduplication: Neglect Backup Size, Backup Type, and Retention
- How to Get Duped by Deduplication: Pay No Attention to Server, Workstation, PC, and Notebook Overhead
- How to Get Duped by Deduplication: Ignore the Impact of Encryption