How to Get Duped by Deduplication: Ignore Time

What I mean by this is that a surefire way to get duped by deduplication is to ignore the fact that the data reduction ratio of a backup appliance, solution, or deduplication device will be at its worst on the first day it is put in use and at its best when it reaches its storage capacity. This makes sense, right? After all, the more data that exists on the storage device, the more likely it is that duplicate data exists, and the more duplicate data exists, the higher the deduplication ratio (and the overall data reduction ratio as well).

However, it’s one thing to superficially grasp this rather simple concept and another to actually understand what it means for your backup strategy. Because deduplication vendors relentlessly advertise aggressive goals such as 20:1 deduplication ratios, many people believe they will see 20:1 deduplication on the first day of use. You won’t (or at least you won’t see it very often; it would take a very atypical set of data for this to occur).

In fact, if the deduplication strategy optimizes re-hydration time (the amount of time it takes to recover your data), your first set of backups isn’t going to be deduplicated at all. To do this, a technique known as “reverse referencing” is used, which keeps the “cache” or “index” of data most closely associated with the most recent backup (which can include both a master and an incremental/differential), so the newest backup can be restored without being reassembled from references to older copies.
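To make the idea concrete, here is a toy model that compares where shared chunks physically live. It assumes each backup is a list of unique chunk IDs and that each chunk is stored physically exactly once; the function names are hypothetical and this is not any vendor’s implementation. In the conventional (forward) approach the oldest copy holds the data and newer backups point back to it; with reverse referencing the newest backup holds the data and older backups point forward to it.

```python
# Toy model (hypothetical): each backup is a list of unique chunk IDs.
# A chunk is stored physically once; other backups hold references to it.

def forward_reference_counts(backups):
    """Oldest copy is physical; later backups reference earlier ones.
    Returns, per backup (oldest first), how many chunks are references."""
    seen = set()
    counts = []
    for chunks in backups:
        counts.append(sum(1 for c in chunks if c in seen))
        seen.update(chunks)
    return counts


def reverse_reference_counts(backups):
    """Newest copy is physical; older backups reference newer ones.
    Returns, per backup (oldest first), how many chunks are references."""
    newer = set()
    counts = []
    # Walk from newest to oldest so `newer` holds the chunks of all later backups.
    for chunks in reversed(backups):
        counts.append(sum(1 for c in chunks if c in newer))
        newer.update(chunks)
    counts.reverse()
    return counts


if __name__ == "__main__":
    backups = [
        ["a", "b", "c", "d"],  # week 1 full
        ["a", "b", "c", "e"],  # week 2 full: one chunk changed
        ["a", "b", "f", "e"],  # week 3 full: one more chunk changed
    ]
    print("forward:", forward_reference_counts(backups))
    print("reverse:", reverse_reference_counts(backups))
```

With these three sample backups, forward referencing leaves the newest backup with the most references to follow at restore time, while reverse referencing leaves it with none. The number of physical chunks stored is the same either way; only the location of the “whole” copy changes.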

The best way to account for all of this is to expect a very poor data reduction ratio on the first backup, with each subsequent backup exhibiting a progressively better ratio. The best data reduction ratio will occur just before your backup device fills up with backups. And of course, if your data reduction methodology is working correctly, this is going to take a while, because as your data reduction ratio increases, each new backup consumes less storage, so it takes longer to fill the device. That’s a good thing!
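A back-of-the-envelope model shows why the ratio starts near 1:1 and climbs slowly. It is a sketch under purely hypothetical assumptions: the same 1,000 GB full backup taken repeatedly, 50 GB of genuinely new data per backup, no compression, no expiration of old backups, and a 5,000 GB device. None of these numbers come from a real product, and the function names are illustrative.

```python
def dedup_ratio(num_backups: int, full_gb: int, changed_gb: int) -> float:
    """Cumulative data reduction ratio (logical / physical) after num_backups
    full backups, assuming each backup after the first adds changed_gb of
    new unique data and everything else deduplicates away."""
    logical_gb = num_backups * full_gb
    physical_gb = full_gb + (num_backups - 1) * changed_gb
    return logical_gb / physical_gb


def backups_until_full(full_gb: int, changed_gb: int, capacity_gb: int) -> int:
    """How many backups fit before physical usage would exceed capacity_gb."""
    if changed_gb <= 0:
        raise ValueError("changed_gb must be positive")
    if full_gb > capacity_gb:
        return 0
    # The first backup stores full_gb; each later one stores only changed_gb.
    return 1 + (capacity_gb - full_gb) // changed_gb


if __name__ == "__main__":
    FULL_GB, CHANGED_GB, CAPACITY_GB = 1000, 50, 5000  # hypothetical sizing
    last = backups_until_full(FULL_GB, CHANGED_GB, CAPACITY_GB)
    for n in (1, 2, 10, last):
        print(f"after {n:>2} backups: {dedup_ratio(n, FULL_GB, CHANGED_GB):.1f}:1")
```

Under these assumptions the ratio is 1.0:1 after the first backup, roughly 6.9:1 after ten, and 16.2:1 when the device fills after 81 backups. The 20:1 figure (full size divided by new data per backup) is only a limit the model approaches and never reaches.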

The Complete Series: How to Get Duped by Deduplication