SAN backup, or the lack of it, is in the news. The failure of the EMC DMX-3 SAN for the Virginia Department of Motor Vehicles has made not only the technology press but also the Washington Post, the Boston Globe, and others. No wonder. The lack of a SAN backup strategy left 26 of Virginia’s 83 state agencies down.
From Sam Nixon, Virginia’s Chief Information Officer
A piece of equipment went down that is meant to never go down, went down. […]
The recovery time has been unacceptable.
From 2001: A Space Odyssey
Interviewer: HAL, you have an enormous responsibility on this mission, in many ways perhaps the greatest responsibility of any single mission element. You’re the brain, and central nervous system of the ship, and your responsibilities include watching over the men in hibernation. Does this ever cause you any lack of confidence?
HAL: Let me put it this way, Mr. Amer. The 9000 series is the most reliable computer ever made. No 9000 computer has ever made a mistake or distorted information. We are all, by any practical definition of the words, foolproof and incapable of error.
There are echoes here of the arrogance and hubris that lay at the heart of Kubrick’s 2001 – the belief that humans can build infallible solutions and systems. Humans can’t do that – and never will be able to. The more important a technology is, and the more reliant we are on the artifacts of that technology, the more important it is that we think through the implications of failure.
The trouble starts when intelligent people begin believing the hype regarding SAN reliability and availability. SANs go down. Period. SAN data becomes corrupted. Period. SAN snapshots, SAN replication, and SAN redundancy are all techniques to manage reliability and availability. They are not techniques to manage true SAN data protection – SAN backup does that.
The initial blame in this case fell on a memory card failure in the caching subsystem of the SAN, coupled with a problem with the fail-over to another SAN. The true trouble is that once the failure occurred, they couldn’t be sure that the replication system they had in place hadn’t corrupted the data. This is the heart of the limits of replication, as opposed to SAN backup, and it plagues the attempts by some vendors to tout their SAN technology as a substitute for SAN backup. It isn’t. And the people of Virginia are now living with the consequences.
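The difference between replication and backup can be made concrete with a toy model. The Python sketch below is purely illustrative and is not how any real SAN, EMC or otherwise, is implemented: the `Volume` and `ReplicatedVolume` classes, the `take_backup` helper, and the record names are all hypothetical. It assumes a simple synchronous mirror and shows that a corrupting write reaches the replica immediately, while an independent point-in-time copy still holds the good data.

```python
import copy


class Volume:
    """Toy stand-in for a SAN volume: block number -> contents (hypothetical)."""

    def __init__(self, blocks):
        self.blocks = dict(blocks)

    def write(self, block, data):
        self.blocks[block] = data


class ReplicatedVolume(Volume):
    """Assumed synchronous replication: every write is mirrored at once."""

    def __init__(self, blocks):
        super().__init__(blocks)
        self.replica = Volume(blocks)

    def write(self, block, data):
        super().write(block, data)
        # The mirror cannot tell good data from bad; it copies both faithfully.
        self.replica.write(block, data)


def take_backup(volume):
    """Point-in-time copy that stays independent of the live volume."""
    return copy.deepcopy(volume.blocks)


primary = ReplicatedVolume({0: "dmv-records-v1", 1: "tax-records-v1"})
backup = take_backup(primary)

# Simulate a corrupting write, such as one caused by a failing cache card.
primary.write(0, "\x00\x00garbage")

replica_is_clean = primary.replica.blocks[0] == "dmv-records-v1"
backup_is_clean = backup[0] == "dmv-records-v1"
print("replica clean:", replica_is_clean)  # False
print("backup clean:", backup_is_clean)    # True
```

In this model the replica protects against losing the primary device, but not against bad data written to it. Only the separately held copy gives a known-good state to restore from.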