What is Enterprise Backup Deduplication (SHA-1 Shattered)?
Google recently announced that a team of researchers has broken SHA-1 in practice. So what does this have to do with enterprise backup – and enterprise backup deduplication? That’s what we’ll explore in this blog post.
So what is SHA-1? SHA-1 is a roughly 20-year-old cryptographic hash function standard used for data integrity verification and for digital signatures. SHA-1 produces a 20-byte (160-bit) digest, often called a hash or fingerprint.
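To make the 20-byte figure concrete, here is a minimal Python sketch using the standard library's hashlib module. The chunk contents are made up for illustration and have nothing to do with any particular backup product.

```python
import hashlib

# Hypothetical chunk contents, used only for illustration.
data = b"example backup chunk"

digest = hashlib.sha1(data).digest()

print(len(digest))      # 20 (bytes)
print(len(digest) * 8)  # 160 (bits)
print(hashlib.sha1(data).hexdigest())  # the same digest as 40 hex characters
```

Whatever the size of the input, the digest is always the same length.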
So why do you care? If you’re using a modern enterprise backup system that includes deduplication – and it’s hard to believe that a modern enterprise backup system wouldn’t use inter-job storage-wide deduplication – then the way that the deduplication works is by using a cryptographic hash function to fingerprint each deduplicated “chunk” of data that you back up. Two chunks with the same hash are treated as the same data, so only one copy is stored. What Google announced is a practical collision: two different files that produce the same SHA-1 hash. If your backup or storage system relies on SHA-1 for deduplication, it can no longer safely assume that matching hashes mean matching data, and an attacker who can craft colliding chunks could cause the wrong data to be stored or restored.
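Here is a toy sketch, in Python, of how hash-based deduplication can work. It is an assumption-laden illustration and not how any specific vendor implements it: the ChunkStore class, the backup and restore helpers, and the tiny fixed chunk size are all hypothetical.

```python
import hashlib

class ChunkStore:
    """Toy content-addressed store: keeps one copy per distinct digest."""

    def __init__(self, algorithm="sha1"):
        self.algorithm = algorithm
        self.chunks = {}  # hex digest -> chunk bytes

    def put(self, chunk: bytes) -> str:
        key = hashlib.new(self.algorithm, chunk).hexdigest()
        # If the digest is already known, the chunk is assumed to be a
        # duplicate and is NOT stored again. This is the step that a
        # hash collision would undermine.
        self.chunks.setdefault(key, chunk)
        return key

    def get(self, key: str) -> bytes:
        return self.chunks[key]

def backup(store: ChunkStore, data: bytes, chunk_size: int = 4) -> list:
    """Split data into fixed-size chunks and return the list of digests."""
    return [store.put(data[i:i + chunk_size])
            for i in range(0, len(data), chunk_size)]

def restore(store: ChunkStore, recipe: list) -> bytes:
    """Rebuild the original data from the list of digests."""
    return b"".join(store.get(key) for key in recipe)

store = ChunkStore()
recipe = backup(store, b"AAAABBBBAAAA")
print(len(recipe), len(store.chunks))  # three chunks referenced, two stored
```

In this sketch, if two different chunks ever produced the same digest, the second one would be silently dropped and a later restore would return the first chunk's contents in its place.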
Are you in immediate danger? No. It took an enormous amount of compute resource to produce the SHA-1 collision: by Google’s account, the equivalent of about 6,500 CPU-years plus 110 GPU-years. At the same time, Google reports that the attack is roughly 100,000 times faster than a brute-force attack. And with hackers collaborating not only on algorithms but also on methods to distribute those algorithms and dramatically accelerate attacks on data, Google’s announcement is a big signal to move off of SHA-1.
What’s the practical implication to you? In the next few years, malware such as ransomware could incorporate this attack to corrupt or destroy your data. And if your current backup vendor is using SHA-1, they will need to move to a stronger cryptographic hash function for data integrity verification. That move will be a big deal for you: all of your current and retained backup, archive, and replicated data will need to be re-hashed with the new function.
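To give a feel for why re-hashing is a big deal, here is a minimal Python sketch. It assumes a hypothetical in-memory index that maps SHA-1 digests to chunk contents. A real system would have to read every stored chunk back from disk, tape, or cloud storage to do the same thing.

```python
import hashlib

def rehash(old_index: dict, new_algorithm: str = "sha512"):
    """Re-key every stored chunk under a new hash function.

    Returns the new index plus a map from old keys to new keys, so that
    existing backup catalogs can be updated to point at the new digests.
    """
    new_index = {}
    key_map = {}
    for old_key, chunk in old_index.items():
        # Every chunk must be read in full: the new digest cannot be
        # derived from the old one.
        new_key = hashlib.new(new_algorithm, chunk).hexdigest()
        new_index[new_key] = chunk
        key_map[old_key] = new_key
    return new_index, key_map

# Hypothetical existing index keyed by SHA-1 digests.
old_index = {hashlib.sha1(c).hexdigest(): c for c in (b"alpha", b"beta")}
new_index, key_map = rehash(old_index)
print(len(new_index))
```

The cost grows with the total amount of retained data, not with the number of backup jobs.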
Why would backup and deduplication target vendors use SHA-1? It really comes down to a combination of a failure of imagination and abdication of responsibility. It’s a lot easier to use a cryptographic hash function with a smaller output: you use less memory and less disk space. But cryptographic hash functions are constantly under attack, and the attack that yesterday took 10 years of compute, and today takes 1 year of compute, can tomorrow take a few seconds or less.
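The memory and disk trade-off is easy to put numbers on. The Python sketch below counts only the raw digest bytes in a deduplication index. The chunk count of one billion is an assumed figure for illustration, and the index_bytes helper is hypothetical.

```python
import hashlib

def index_bytes(algorithm: str, chunk_count: int) -> int:
    """Raw bytes needed to hold one digest per chunk (no other overhead)."""
    return hashlib.new(algorithm).digest_size * chunk_count

CHUNKS = 10**9  # assumed number of unique chunks, for illustration only

for name in ("sha1", "sha256", "sha512"):
    size_gb = index_bytes(name, CHUNKS) / 10**9
    print(f"{name}: {size_gb:.0f} GB of digests")
```

At that assumed scale, moving from SHA-1 to SHA-512 grows the digest storage from 20 GB to 64 GB, which is the cost being weighed against stronger collision resistance.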
So ask your backup vendor what their deduplication cryptographic hash function is – and see what their commitment to true enterprise-level functionality really is.
As always, would love to know your thoughts on this or anything else.
P.S. What is the deduplication cryptographic hash function used by Unitrends in our Unitrends Cloud, our physical appliances, and our virtual appliances? SHA-512 (from the SHA-2 family of cryptographic hash functions).