
Admin Error Brings Down Joyent’s Ashburn Data Center

Brought to you by Data Center Knowledge

Joyent, a San Francisco-based provider of high-performance cloud infrastructure services, saw one of its data centers go down Tuesday as a result of an error made by an administrator. The company had to reboot all servers in its US-East-1 data center, located in Ashburn, Virginia.

The provider has not released information on what exactly caused the outage, but is promising a “full postmortem.” In a forum post on Hacker News, Joyent CTO Bryan Cantrill wrote that the company would be providing the information “as soon as we reasonably can.”

Cloud outages sting more than others

Outages at service provider data centers cause far more damage than outages at enterprise data centers because a provider facility hosts infrastructure for many companies instead of one. Cloud data center outages are especially painful because each physical server may host multiple customers’ virtual compute nodes.

Another service provider, Internap, which offers cloud hosting services, experienced three outages at its New York City data centers during the past two weeks. The company did not say how many customers the outages affected overall, but at least 20 companies were affected by one of the incidents.

Internap’s problems were caused by electrical equipment failure, a different kind of outage from Joyent’s. Internap’s outages happened at the facilities layer of the stack, while Joyent’s incident happened at the IT administration level.

‘Fat finger’ shouldn’t hurt so much

While human error was at fault, Joyent’s system ideally would have been built to withstand such errors. “While the immediate cause was operator error, there are broader systemic issues that allowed a fat finger to take down a data center,” Cantrill wrote, adding that the company would be improving software and operational procedures to prevent such incidents from happening in the future.

Joyent does not plan to discipline the administrator who made the error, Cantrill told The Register, explaining that the company was more interested in learning from the incident than punishing people.

Joyent provides public and private cloud infrastructure services for companies that need more computing horsepower than the mainstream Infrastructure-as-a-Service providers, such as Amazon Web Services, can offer.

In addition to the Ashburn data center, brought online in February 2012, its cloud infrastructure lives in data centers in San Francisco, Las Vegas and Amsterdam.

Original post appeared here: Admin Error Brings Down Joyent’s Ashburn Data Center

One Comment

  1. It's refreshing to hear the gaffer publicly state that there will be no sackings and that the company will instead learn from the mistake. Much better than having a blame culture where mistakes are covered up and can recur.
