Backup, Compression, Deduplication, SNIA, and “Playing with Words”
A post I wrote yesterday concerning compression/deduplication and backup definitions received a great comment noting that I was “playing with words” concerning compression and deduplication, and referencing the SNIA definitions. I responded to the comment at length, but upon further thought I think it’s worth dedicating a post to this.
The question at hand is whether I should do the following, as recommended in the comment:
The SNIA dictionary has clear definitions for both compression and dedupe which have been created through a vetted collaboration process among industry vendors… recommend using those definitions.
My take on this, which I gave in my response to the comment and will repeat here, is below.
Customer Obsessed Versus Technology Obsessed Backup
There are many, many blogs out there dedicated to technology. The focus of this blog is on customers and backup technology. There is to some degree a conflict between what a vetted collaboration of industry vendors will agree to and the objectives of this blog. What’s interesting to me is calling this out. Now – with that said – the commenter did a great job of making this clear – so I owe him my thanks.
The most applicable SNIA definitions I found are given below:
- Data Deduplication is the replacement of multiple copies of data—at variable levels of granularity—with references to a shared copy in order to save storage space and/or bandwidth
- Subfile Data Deduplication is a form of data deduplication that operates at a finer granularity than an entire file or data object
- Single Instance Storage is a form of data deduplication that operates at a granularity of an entire file or data object
- Compression is the encoding of data to reduce its storage requirement – deduplicated data can also be compressed
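To make the definitions above a bit more concrete, here is a minimal Python sketch of subfile deduplication followed by compression. It is only an illustration under stated assumptions, not how any particular backup product works: the fixed 4096-byte chunk size, the use of SHA-256 as the chunk fingerprint, and the names `dedupe_chunks`, `restore`, and `stored_bytes` are all hypothetical choices made for this example.

```python
import hashlib
import zlib


def dedupe_chunks(data: bytes, chunk_size: int = 4096):
    """Subfile deduplication sketch: keep one shared copy of each distinct chunk.

    Assumption: fixed-size chunks fingerprinted with SHA-256. Real products
    often use variable-size chunking; this is only an illustration.
    """
    store = {}  # fingerprint -> chunk bytes (the shared copies)
    refs = []   # ordered fingerprints (the references that replace the copies)
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        fingerprint = hashlib.sha256(chunk).hexdigest()
        store.setdefault(fingerprint, chunk)  # store only the first copy seen
        refs.append(fingerprint)
    return store, refs


def restore(store, refs) -> bytes:
    """Rebuild the original data by following the references in order."""
    return b"".join(store[fingerprint] for fingerprint in refs)


def stored_bytes(store) -> int:
    """Total size of the unique chunks that actually have to be kept."""
    return sum(len(chunk) for chunk in store.values())


# Two distinct 4096-byte blocks, repeated: five chunks in, two unique chunks stored.
block_a = bytes(range(256)) * 16
block_b = bytes(reversed(range(256))) * 16
data = block_a * 3 + block_b * 2

store, refs = dedupe_chunks(data)

# "Deduplicated data can also be compressed": compress each unique chunk.
compressed_store = {fp: zlib.compress(chunk) for fp, chunk in store.items()}
```

Setting `chunk_size` to the length of the whole object turns the same sketch into single instance storage, since a reference then stands for an entire file or data object. Either way, `restore(store, refs)` returns the original bytes exactly, so nothing is lost in the process.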
Backup, Compression, Deduplication, SNIA, and Customer-Oriented Definitions
The point that I was making in the compression/deduplication and backup definition post yesterday was the same one that – gasp! – Wikipedia and others make. (Note: I realize that referencing Wikipedia is a serious faux pas in technology-related discussions – but I’m doing so to make a point – and that is the difference between technology obsession and customer obsession.) Deduplication is a form of compression – to my point – a form of lossless compression. I’m not trying to be clever – the point is that both are technologies with the same objective. That objective is to provide more storage for less cost (because it would be stupid to try to provide more storage for more cost) – which of course is the heart of any “capacity optimization” technique. It’s just that even after getting all the advanced degrees I’d still rather express ideas in fewer syllables than more. (Note: I’ve always wondered in our industry if PhD should be pronounced “FUD” :))