Deduplication is increasingly a standard option for disk-based (D2D) backup. (Note: I have to say “option” because so many vendors charge extra for deduplication, while other vendors sell dedicated deduplication devices that act as a functionally partitioned “back end” of the actual backup system, which seems to me to be just plain wrong. But that’s a different topic.) Deduplication in primary storage, however, is more of an emerging phenomenon.
Primary storage deduplication has been available for some time, but has seen a relatively low adoption rate. In 2010 the M&A (mergers and acquisitions) market heated up, led by Dell’s acquisition of Ocarina. But companies like NetApp have been offering primary storage deduplication for some time, and companies like EMC’s Data Domain are also pushing into primary storage deduplication.
I believe one of the biggest factors in moving primary storage deduplication forward is Solaris with ZFS. I’m seeing an increasing number of customers who are opting to use this technology in primary storage. And while you’re not going to see the same deduplication ratios for ZFS on standard servers and storage that you see in specialized deduplication devices, the fact is that in terms of “bang for the buck” (terabytes per dollar) it’s hard to beat.
However, there is a problem with ZFS and these other primary storage deduplication devices: the lack of a “deduplication-aware backup API” that allows backup vendors to efficiently back up deduplicated data without having to rehydrate that data.
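To make the rehydration cost concrete, here is a minimal sketch in Python of a toy block-level deduplicating store. Everything in it (the DedupStore class, the 4-byte block size, the file names) is made up for illustration and does not reflect how ZFS or any vendor actually implements deduplication. It simply compares how many bytes a backup moves when it reads files the normal way versus when it could, hypothetically, export each unique block once.

```python
import hashlib

BLOCK_SIZE = 4  # unrealistically small, chosen so the numbers are easy to follow


class DedupStore:
    """Toy deduplicating store (hypothetical, for illustration only)."""

    def __init__(self):
        self.blocks = {}  # digest -> block bytes; each unique block is kept once
        self.files = {}   # file name -> ordered list of block digests

    def write(self, name, data):
        digests = []
        for i in range(0, len(data), BLOCK_SIZE):
            block = data[i:i + BLOCK_SIZE]
            digest = hashlib.sha256(block).hexdigest()
            self.blocks.setdefault(digest, block)  # store only if not seen before
            digests.append(digest)
        self.files[name] = digests

    def read(self, name):
        """Rehydrate: rebuild the full file contents from its blocks."""
        return b"".join(self.blocks[d] for d in self.files[name])


def naive_backup_bytes(store):
    """Bytes moved when the backup reads every file through the file interface."""
    return sum(len(store.read(name)) for name in store.files)


def dedup_aware_backup_bytes(store):
    """Block payload moved if each unique block were exported once.

    The per-file block maps (metadata) are not counted here.
    """
    return sum(len(block) for block in store.blocks.values())


store = DedupStore()
store.write("a.dat", b"AAAABBBBCCCC")
store.write("b.dat", b"AAAABBBBDDDD")
store.write("c.dat", b"AAAAAAAAAAAA")

naive = naive_backup_bytes(store)
aware = dedup_aware_backup_bytes(store)
print(naive, aware)  # prints "36 16"
```

In this toy case the rehydrating backup moves 36 bytes to protect 16 bytes of unique block data. A real “deduplication-aware backup API” would also have to carry the block maps and deal with consistency, which this sketch ignores.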
ZFS has a set of utilities called “zfs send” and “zfs receive” that theoretically could solve this problem for ZFS. However, upon closer inspection what you find is that you’d have to, in essence, create a sparse file container on your backup target and then create a ZFS filesystem within that sparse file container. While I haven’t actually used this yet, the issue is that unless you’re replicating ZFS to ZFS (and to be fair, that appears to be the design intent of “zfs send” and “zfs receive”), there’s going to be a huge amount of work necessary to make “zfs send” act as a “backup API.”
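For what it’s worth, the sparse file container idea might look something like the sequence below. This is an untested sketch under several assumptions: that the backup target is a Solaris host where “mkfile -n” creates a sparse file, that a pool can be built directly on that file with “zpool create”, and that the pool, path, size, and snapshot names (all hypothetical) are replaced with real ones. The Python helper only builds and prints the command lines; it does not execute anything.

```python
import shlex


def sparse_container_plan(image_path, size, backup_pool, source_snapshot, target_fs):
    """Return the shell commands for the sparse-file approach as strings.

    Nothing is run; all arguments are hypothetical placeholders.
    """
    send = ["zfs", "send", source_snapshot]
    receive = ["zfs", "receive", target_fs]
    return [
        # 1. Create a sparse file on the backup target (Solaris mkfile, -n = no preallocation).
        shlex.join(["mkfile", "-n", size, image_path]),
        # 2. Create a ZFS pool that uses the sparse file as its only device.
        shlex.join(["zpool", "create", backup_pool, image_path]),
        # 3. Stream a snapshot from primary storage into the new pool.
        shlex.join(send) + " | " + shlex.join(receive),
    ]


plan = sparse_container_plan(
    image_path="/backup/zfs.img",
    size="100g",
    backup_pool="backuppool",
    source_snapshot="tank/data@nightly",
    target_fs="backuppool/data",
)
for command in plan:
    print(command)
```

Even if this works as written, it is still ZFS-to-ZFS replication into a file, not a general-purpose backup interface: cataloging, single-file restore, and retention would all have to be built around it.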
We’re currently investigating this because, while it may have shortcomings, at least ZFS has offered up a potential interface for a “deduplication-aware backup API.” Most of the other primary storage deduplication companies aren’t even at that point yet.