Most of Google’s services went down Friday (August 16, 2013) afternoon. According to one real-time analysis company (GoSquared), internet traffic dropped 40% (yes, by almost half) during the time Google was down. Pretty amazing.
Here’s Google’s take on the incident:
To clarify, that 11 minutes came from the posting times of each of the updates on the Dashboard, which is different from the actual incident time. The dashboard clearly states “Between 15:51 and 15:52 PDT, 50% to 70% of requests to Google received errors; service was mostly restored one minute later, and entirely restored after 4 minutes.”
I’m not surprised that Google went down for a few minutes – I am surprised and impressed by how quickly they recovered.
- It is a surprise.
- The event has a major impact.
- Once the event occurs, it is rationalized by hindsight as if it could have been expected.
What this practically means is pretty simple. It’s obvious in hindsight that the major cloud services will go down. Period. But there’s relatively little planning for it – beyond what the cloud services themselves are doing.
Likewise, whether you’re 100% on-premise, 100% in the cloud, or you have a hybrid cloud, the question isn’t whether you’re going to experience a major outage – but when, and how bad it is going to be. Because the people to whom you report are going to – in hindsight after the event occurs – view the outage as something that not only could have been expected but should have been expected.
Or to put it as simply as possible – welcome to the world of backup.