How to Get Duped by Deduplication: Pay No Attention to Server, Workstation, PC, and Notebook Overhead

Data reduction, whether it’s compression or deduplication, is simply the substitution of processor, memory, and disk I/O for disk storage space. That processing has to take place somewhere. As we discussed earlier, deduplication can take place at either the source (the element being protected) or the target (the backup server or deduplication device).
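To make the trade concrete, here is a minimal sketch of fixed-size chunk deduplication in Python. It is an illustration only, not how any particular product works: the function names (`dedupe`, `restore`, `stored_bytes`), the 4 KB chunk size, and the choice of SHA-256 are all assumptions made for the example. The hashing and the lookups are the processor and memory you spend; the smaller chunk store is the disk space you get back.

```python
import hashlib


def dedupe(data: bytes, chunk_size: int = 4096):
    """Split data into fixed-size chunks and keep each unique chunk once.

    Hypothetical helper for illustration; real products typically use
    variable-size chunking and persistent indexes.
    """
    store = {}    # digest -> chunk bytes (the unique chunk store)
    recipe = []   # ordered digests needed to rebuild the original data
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        digest = hashlib.sha256(chunk).hexdigest()  # CPU is spent here
        store.setdefault(digest, chunk)             # memory is spent here
        recipe.append(digest)
    return store, recipe


def restore(store, recipe) -> bytes:
    """Rebuild the original data from the chunk store and the recipe."""
    return b"".join(store[digest] for digest in recipe)


def stored_bytes(store) -> int:
    """Bytes actually kept on disk after deduplication."""
    return sum(len(chunk) for chunk in store.values())
```

Whichever machine runs `dedupe` pays for the hashing. With source deduplication that machine is the computer being protected; with target deduplication it is the backup server or deduplication device.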

If you’re going to use source deduplication, then you need to make sure that you are protecting computers and other information technology devices that have pretty low utilization rates, because the deduplication work competes with whatever else that computer is doing. In other words, be careful using source-level deduplication on a busy computer.

Virtualization has tended to increase the utilization rates of host computers. That makes sense unless customers purchase a lot of excess server capacity, which rarely makes sense because hardware is a commodity that delivers better performance at a lower cost over time. So keep your eye on utilization rates when you’re considering deduplication on the computer that you’re protecting.

Backup Types and Source-Level Deduplication

Another thing to pay attention to is the interaction between backup types and source-level deduplication. If you’re doing a master (full) backup every day, then source-level deduplication buys you a lot, because most of each day’s data is unchanged from the day before. If you’re doing incremental forever with synthetic masters/fulls, then source-level deduplication isn’t going to buy you much, because the incrementals contain mostly changed data to begin with. Target deduplication in that case is going to get very good deduplication ratios, because of the potential for duplicate data on the target when each synthetic master/full is created.
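The interaction between backup type and deduplication can be sketched with a toy model in Python. Everything here is an assumption made for illustration: the `simulate` function, the chunk counts, and the simplification that every changed chunk is brand new data that never duplicates anything already stored. The ratio reported is logical chunks protected divided by unique chunks stored.

```python
def simulate(days: int, total_chunks: int, changed_per_day: int):
    """Toy model of deduplication ratios for different backup types.

    Assumes changed_per_day <= total_chunks and that every changed
    chunk is new, never-before-seen data (a deliberate simplification).
    """
    versions = [0] * total_chunks   # current version of each chunk
    seen = set()                    # unique (index, version) chunks stored
    full_logical = 0                # chunks presented by fulls
    incr_logical = 0                # chunks presented by incrementals
    for day in range(days):
        if day == 0:
            changed = list(range(total_chunks))   # first backup is a full
        else:
            start = ((day - 1) * changed_per_day) % total_chunks
            changed = [(start + i) % total_chunks
                       for i in range(changed_per_day)]
            for idx in changed:
                versions[idx] = day               # chunk content changes
        full_logical += total_chunks
        incr_logical += len(changed)
        seen.update((idx, versions[idx]) for idx in range(total_chunks))
    unique = len(seen)
    return {
        # Daily masters: the source sees the whole data set every day.
        "daily_full_source_ratio": full_logical / unique,
        # Incremental forever: the source sees only changed chunks.
        "incremental_source_ratio": incr_logical / unique,
        # Synthetic fulls are assembled on the target from stored data.
        "synthetic_full_target_ratio": full_logical / unique,
    }
```

For example, `simulate(10, 1000, 10)` models ten days of backups of a 1,000-chunk data set in which ten chunks change per day. In this model the source-side ratio for incremental forever stays at 1.0, while daily masters at the source and synthetic masters at the target both come out a little above 9. Real ratios depend heavily on the data and the product.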

In addition, the use of inclusion/exclusion policies on the source itself is going to dramatically change the effectiveness of both source and target deduplication. If you exclude non-changing files and instead do a monthly or quarterly image-based backup (bare metal), then your daily backups aren’t going to show as high a deduplication ratio.

The Complete Series: How to Get Duped by Deduplication