How to Get Duped by Deduplication: Neglect Backup Size, Backup Type, and Retention

Another lesson on getting the wool pulled over your eyes – stick to the credo that “data is data” and pay no attention to the overall size of your backup or to the retention you want for your backups (retention simply means the number of backups from which you want to be able to recover).

In practice, backup size not only dramatically impacts ingest times (particularly on post-processing deduplication devices – which, as noted elsewhere in this blog, tend to start fast but quickly get bogged down if and when you exceed the landing site size) but also radically changes your expected data reduction ratios. More retention typically means more redundant data, which means higher data reduction ratios – and the smaller the backup size, the more retention you can typically afford to keep.

Of course, the degree to which retention will impact the data reduction ratio depends upon the backup type in use. Doing full masters every day maximizes redundancy, and therefore data reduction; doing block-level incrementals every hour means each backup carries little redundant data, so your data reduction ratio will probably be small.
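To make the full-versus-incremental point concrete, here is a minimal sketch of the arithmetic. The numbers (a 1% daily change rate, 30 retained backups, perfect deduplication of unchanged blocks) are illustrative assumptions, not figures from any particular product:

```python
# Sketch of how backup type and retention drive the data reduction ratio.
# Assumptions (illustrative): perfect dedup of unchanged blocks, a fixed
# daily change rate, and sizes measured in units of one full backup.

def dedup_ratio_fulls(retained_backups, change_rate):
    """Daily fulls: every backup ingests the whole data set, but only
    the changed blocks are new to the deduplication store."""
    logical = retained_backups                           # data ingested
    physical = 1 + (retained_backups - 1) * change_rate  # unique data stored
    return logical / physical

def dedup_ratio_incrementals(retained_backups, change_rate):
    """Block-level incrementals: each backup already sends only changed
    blocks, so the dedup engine finds little redundancy to remove."""
    logical = 1 + (retained_backups - 1) * change_rate
    physical = 1 + (retained_backups - 1) * change_rate
    return logical / physical

print(round(dedup_ratio_fulls(30, 0.01), 1))         # ~23.3:1 over 30 fulls
print(round(dedup_ratio_incrementals(30, 0.01), 1))  # ~1.0:1
```

Same protected data, same retention window – but the daily fulls hand the deduplication engine enormous amounts of redundant data to eliminate, while the incrementals leave it almost nothing.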

Now – none of this takes into account the number of clients that you’re protecting, or the change rate of the data that you’re protecting. If you have ten clients that all have the same data, you’re going to get a 10:1 deduplication ratio from cross-client deduplication alone. If you have a low change rate, you’re going to get a higher deduplication ratio.
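The cross-client effect is easy to sketch as well. This toy function (my own illustration, not a vendor formula) shows how the ratio scales with how much data the clients share:

```python
# Hypothetical sketch of cross-client deduplication: if a fraction of
# each client's data is identical across all clients, only one copy of
# that shared portion needs to be stored.

def cross_client_ratio(num_clients, shared_fraction):
    logical = num_clients * 1.0  # each client ingests one full data set
    physical = shared_fraction + num_clients * (1 - shared_fraction)
    return logical / physical

print(cross_client_ratio(10, 1.0))  # ten identical clients -> 10.0 (10:1)
print(cross_client_ratio(10, 0.0))  # nothing shared -> 1.0 (no benefit)
```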

It’s also important to take a step back and realize that your deduplication ratio changes over time as well. Deduplication ratios should increase as retention accumulates and more redundant data is added to the store.
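That time dependence falls straight out of the same arithmetic. A rough sketch, again assuming daily fulls, a fixed 1% change rate, and no backups expiring yet:

```python
# Sketch: why the dedup ratio climbs as retention accumulates.
# Assumptions (illustrative): daily fulls, 1% daily change rate,
# no expiration during the window.

def ratio_over_time(days, change_rate=0.01):
    logical = physical = 0.0
    ratios = []
    for day in range(days):
        logical += 1.0                                 # one full ingested
        physical += 1.0 if day == 0 else change_rate   # only new blocks stored
        ratios.append(logical / physical)
    return ratios

for day in (1, 7, 30):
    print(day, round(ratio_over_time(30)[day - 1], 1))
# day 1 -> 1.0:1, day 7 -> 6.6:1, day 30 -> 23.3:1
```

This is why quoting a single deduplication ratio without saying how much retention is behind it is meaningless – the same system reports 1:1 on day one and 20-plus:1 a month later.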

The Complete Series: How to Get Duped by Deduplication