[If you go to the Unitrends web site, you can find a paper that discusses all of these at greater length.]
How to Get Duped by Deduplication: Blindly Believe Data Reduction Ratios
In a white paper and an earlier posting in this series, we assumed a 20:1 data reduction ratio (also called a data deduplication ratio) in order to portray some deduplication solutions in the best possible light. To be duped by deduplication, you should blithely and blindly make that assumption as well.
In truth, data reduction ratios vary widely based on the following primary factors:
- The type of data being deduplicated (unstructured data deduplicates better)
- The type of backup being deduplicated (retaining multiple master, or full, backups produces more duplicate data than differential backups)
- The frequency and degree of data change (the lower, the better)
- The retention of the data (the longer, the better)
- The specifics of the data reduction algorithms being used
Anyone who tells you that they can predict the specific data reduction ratio that you’ll achieve for your data is misleading you. The best that can be done is to ask questions and make assumptions about “normal” data. Thus, you’re always better off assuming a lower data reduction ratio.
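To see why retention and change rate dominate, here is a simplified back-of-envelope model (our own illustrative assumption, not any vendor's formula): keep N full backups of the same system, with some fraction of the data changing between backups. Deduplication effectively stores the first backup plus only the changed blocks of each later one.

```python
def estimated_dedup_ratio(num_backups: int, change_rate: float) -> float:
    """Logical data protected divided by unique data actually stored.

    Simplifying assumptions: every backup is a full backup of the same
    system, and `change_rate` is the fraction of blocks that differ
    between consecutive backups.
    """
    logical = num_backups                          # N full backups' worth
    unique = 1 + (num_backups - 1) * change_rate   # first copy + changes
    return logical / unique

# Longer retention and lower change rates drive the ratio up:
print(round(estimated_dedup_ratio(num_backups=30, change_rate=0.02), 1))  # ~19.0
print(round(estimated_dedup_ratio(num_backups=30, change_rate=0.20), 1))  # ~4.4
```

The same 30-day retention yields roughly a 19:1 ratio at a 2% daily change rate but under 5:1 at 20%, which is why a blanket 20:1 assumption can be so misleading.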
One other important consideration is the effect of time on deduplication: the longer data is retained, the more redundant copies accumulate, so data reduction ratios tend to improve as backups build up over time.
The bottom line here is pretty simple. The more duplicate data you have, the more data reduction techniques such as data deduplication (and compression) can help.
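That bottom line can be demonstrated with a toy fixed-block deduplicator (a minimal sketch, not how any particular product chunks data): hash each fixed-size block and count how many blocks are unique.

```python
import hashlib

def dedup_ratio(data: bytes, block_size: int = 4096) -> float:
    """Logical size divided by the size of the unique fixed-size blocks."""
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    unique = {hashlib.sha256(b).digest() for b in blocks}
    return len(blocks) / len(unique)

# 100 identical 4 KiB blocks: everything after the first is a duplicate.
redundant = b"\xaa" * 4096 * 100
# 100 distinct 4 KiB blocks: nothing deduplicates at all.
distinct = b"".join(i.to_bytes(4, "big") + b"\x00" * 4092 for i in range(100))

print(dedup_ratio(redundant))  # 100.0
print(dedup_ratio(distinct))   # 1.0
```

Identical data in, 100:1 out; unique data in, 1:1 out. Real workloads fall somewhere between those extremes, which is exactly why the factors above matter.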