How to Get Duped by Deduplication: Pay No Attention to Server, Workstation, PC, and Notebook Overhead
Data reduction, whether it’s compression or deduplication, simply trades processor, memory, and disk I/O for disk storage space. That processing has to take place somewhere. As we discussed earlier, deduplication can take place at either the source (the element being protected) or the target (the backup server or deduplication device).
If you’re going to use source deduplication, make sure that the computers and other information technology devices you are protecting have low utilization rates, because the deduplication work competes with the production workload for the same processor, memory, and disk I/O. In other words, be careful using source-level deduplication on a busy computer.
Virtualization has tended to increase the utilization rates of host computers. That makes sense unless customers purchase a lot of excess server capacity, which doesn’t tend to make sense because hardware is a commodity that delivers better performance at a lower cost over time. So keep your eye on utilization rates when you’re considering deduplication on the computer that you’re protecting.
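To make the trade concrete, here is a minimal sketch of deduplication, assuming fixed-size 4 KiB chunks and SHA-256 fingerprints. The chunk size, the `deduplicate` function, and the synthetic data are illustrative assumptions, not how any particular product works; many products use variable-size chunking instead. The point is only that every byte has to be read and hashed on whatever machine does the work, and that is the overhead a busy source computer has to absorb.

```python
import hashlib
import time

# Assumed fixed chunk size for illustration; real products often chunk differently.
CHUNK_SIZE = 4096


def deduplicate(data: bytes, store: dict) -> int:
    """Add data to a chunk store keyed by SHA-256 digest.

    Returns the number of new bytes actually stored.
    """
    new_bytes = 0
    for offset in range(0, len(data), CHUNK_SIZE):
        chunk = data[offset:offset + CHUNK_SIZE]
        digest = hashlib.sha256(chunk).digest()  # processor cost is paid here
        if digest not in store:                  # memory cost: the index lookup
            store[digest] = chunk
            new_bytes += len(chunk)
    return new_bytes


# Synthetic data: 256 distinct chunks, repeated 8 times (8 MiB logical).
blocks = [bytes([i]) * CHUNK_SIZE for i in range(256)]
data = b"".join(blocks) * 8

store = {}
start = time.process_time()
stored = deduplicate(data, store)
cpu_seconds = time.process_time() - start

ratio = len(data) / stored
print(f"logical bytes: {len(data)}, stored bytes: {stored}, ratio: {ratio:.1f}:1")
print(f"processor time spent hashing: {cpu_seconds:.3f} s")
```

With source deduplication, that hashing loop runs on the computer being protected; with target deduplication, it runs on the backup server or deduplication device.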
Backup Types and Source-Level Deduplication
Another thing to pay attention to is the interaction between backup types and source-level deduplication. If you’re doing a master every day, then source-level deduplication buys you a lot, because most of each day’s master duplicates the one before it. If you’re doing incremental forever with synthetic masters/fulls, then source-level deduplication isn’t going to buy you much, because the incrementals already contain mostly changed data. Target deduplication in that case is going to get very good deduplication ratios because of the potential for duplicate data on the target when the synthetic master/full is created.
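The difference between the two backup types can be sketched with a toy simulation. Everything here is an assumption made for illustration: a source of 1,000 chunks, 20 chunks changing per day, a 7-day run, and the hypothetical helpers `chunk_id` and `simulate`. It counts how many chunks each strategy selects for backup and how many of those source-level deduplication would actually have to send.

```python
import hashlib


def chunk_id(chunk_no: int, version: int) -> bytes:
    """Fingerprint for one version of one chunk (stand-in for hashing real data)."""
    return hashlib.sha256(f"{chunk_no}:{version}".encode()).digest()


def simulate(strategy: str, days: int = 7, chunks: int = 1000,
             changed_per_day: int = 20) -> tuple:
    """Return (chunks selected for backup, chunks sent after source dedup)."""
    versions = [0] * chunks
    known = set()  # fingerprints the target already holds
    selected_total = 0
    sent_total = 0
    for day in range(days):
        if day == 0:
            changed = range(chunks)  # the first backup is always a full
        else:
            # A different slice of chunks changes each day (deterministic).
            start = (day - 1) * changed_per_day
            changed = range(start, start + changed_per_day)
            for i in changed:
                versions[i] += 1
        selected = range(chunks) if strategy == "full" else changed
        for i in selected:
            selected_total += 1
            digest = chunk_id(i, versions[i])
            if digest not in known:
                known.add(digest)
                sent_total += 1
    return selected_total, sent_total


for strategy in ("full", "incremental"):
    selected, sent = simulate(strategy)
    print(f"{strategy}: selected {selected}, sent {sent}, "
          f"ratio {selected / sent:.2f}:1")
```

Under these assumptions the daily master selects 7,000 chunks and sends 1,120 (6.25:1), while incremental forever selects 1,120 and sends 1,120 (1:1). Both strategies move the same data; the deduplication ratio looks impressive only where the backup type creates duplicates in the first place.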
In addition, the use of inclusion/exclusion policies on the source itself is going to dramatically change the effectiveness of both source and target deduplication. If you exclude non-changing files and instead do a monthly or quarterly image-based backup (bare metal), then your daily backups aren’t going to show as high a deduplication ratio, because the unchanging data that would have deduplicated is no longer in them.
The Complete Series: How to Get Duped by Deduplication
- How to Get Duped by Deduplication: Ignore Recovery
- How to Get Duped by Deduplication: Blissfully Accept Published Ingest Rates from Vendors
- How to Get Duped by Deduplication: Focus on Technology Instead of Price Per Effective Terabyte
- How to Get Duped by Deduplication: Be Oblivious to Physical Storage Costs
- How to Get Duped by Deduplication: Blindly Believe Data Reduction Ratios
- How to Get Duped by Deduplication: Ignore Time
- How to Get Duped by Deduplication: Neglect Backup Size, Backup Type, and Retention
- How to Get Duped by Deduplication: Pay No Attention to Server, Workstation, PC, and Notebook Overhead
- How to Get Duped by Deduplication: Ignore the Impact of Encryption