
Heroku, Pinterest Among Sites Knocked Offline in Amazon Data Center Outage


An Amazon data center in Ashburn, Virginia suffered a power outage at 9:45 p.m. PDT on Thursday, causing some websites using AWS cloud technology to go offline.

High-profile websites including Heroku, Pinterest, Quora and HootSuite went down, as did many smaller sites.

Amazon resolved the issue, which affected some customers in the US-East-1 region, within a few hours and regularly updated customers on the situation via the AWS status page.

The problems began at around 9:45 p.m. PDT. Shortly afterward, the following message about Amazon ElastiCache in Northern Virginia appeared on the Amazon status page:

“Some Cache Clusters in a single AZ in the US-EAST-1 region are currently unavailable. We are also experiencing increased error rates and latencies for the ElastiCache APIs in the US-EAST-1 Region. We are investigating the issue.”

While Amazon confirmed the outage on its Service Health Dashboard, it did not specify the cause of the power outage.

The outage was the third significant downtime Amazon has experienced in the US-East-1 region in the past 14 months, following outages in April 2011 and March 2012. US-East-1 is Amazon’s oldest region and is based in Ashburn, Virginia.

Although Amazon has multiple availability zones, approximately 70 percent of AWS customers are concentrated in the US East region.

As always, angry Internet users flocked to social networking sites like Twitter, where they criticized major websites for not distributing their infrastructure across multiple Amazon availability zones.

Earlier this week, Amazon announced it is offering free basic support to all of its AWS customers and will cut the price of its premium support.

Talk back: How does your cloud computing provider deal with outages? Do you use your health dashboard to update customers on incidents like outages as they happen? What do you think of the latest EC2 outage? Let us know in the comment section.


About the Author

Justin Lee has been a staff analyst with theWHIR since 2004. He writes about a range of web hosting and IT-related issues facing the industry on the WHIR website, as well as in the print version of the WHIR magazine. Follow him on Twitter @Justin_theWHIR.


2 Comments

  1. Justin, I've posted my take on this outage in this post: http://natishalom.typepad.com/nati_shaloms_blog/2012/06/lessons-from-herokuamazon-outage-on-your-choice-of-a-paas-platform.html Lesson 1: Choose the Right PaaS for the Job. Lesson 2: Database Availability Must Address Datacenter Failure. Lesson 3: Coping with Failure, Avoiding a Single Point of Failure.

  2. We need to assume that failures like this are inevitable and design our systems to cope with them. To reduce exposure to the failure of a particular cloud provider, I would even go as far as setting up DR sites on two different cloud providers. The following post on highscalability.com shows how you could easily migrate a workload from Amazon to Rackspace in the event of a scaling or failure event. That includes both the data and the application workload. The post includes a working example on GitHub that shows how this setup works. See the full details here: http://highscalability.com/blog/2012/6/15/cloud-bursting-between-aws-and-rackspace.html
