Amazon’s EC2 data center in northern Virginia suffered a sudden outage a week ago yesterday. The outage knocked a long list of clients offline, including the popular sites Quora and Reddit.
I found the initial comments from one of these sites particularly interesting. Quora, a massive question-and-answer platform, posted this on its website: “We’ll be back shortly, we hope. Sorry, it sucks for us too. We’d point fingers, but we wouldn’t be where we are today without EC2.”
At the time, Quora didn’t know that the problems would persist for another six days. Throughout that period, Amazon worked around the clock to restore services and posted regular status updates. Today it released a post-mortem report laying out the details of the outage, including its causes and circumstances.
Amazon traces the outage to complications that followed a “network change” performed on the 21st: “During the change, one of the standard steps is to shift traffic off of one of the redundant routers…[but] the traffic shift was executed incorrectly.”
There’s no real argument that Amazon, with its EC2 service, is a pioneer in cloud computing. Its formula has helped many small, medium, and large websites with burgeoning traffic find flexible computing solutions that scale with their growth. Dropbox, Quora and Reddit are among them. Quora’s comment reflects the general appreciation customers have felt for this much-needed cloud service.
Amazon appears to have handled this setback through a diligent, open-book dialogue with its customers. That is worth remembering before dishing out criticism. The company has also ventured into the rocky waters of cloud-based music storage and video streaming. It has stepped on a number of toes in the process, but it has shown gutsy ingenuity and a commitment to an immersive, convenient entertainment experience for its customers.
Amazon will recover from the outage. The saddest part is that the customers who rely wholly on their websites for business will bear the brunt of those 7 days of downtime. That could easily amount to millions of dollars in unrecoverable losses. A restaurant can’t recoup the losses from a weeklong closure, and in the same way these companies have simply lost 7 days’ worth of potential revenue. Everyone who would have eaten at the restaurant that week has gone elsewhere for their meals. They may be hungry again in the future, but whatever might have been gained that week is gone for good.
According to Amazon’s post-mortem report, the company will issue a service credit to affected customers. It will also improve its communication and service tools for future operational issues. The big question now is: is that enough?