365 Main Recovers From Downtime

By theWHIR.com, August 15, 2007

By Justin Lee, theWHIR.com

August 15, 2007 -- (WEB HOST INDUSTRY REVIEW) -- A data center's reputation has become synonymous with its reliability, which is built almost entirely on its ability to maintain 100 percent uptime, and thus, assure clients that their service will be available 24 hours a day, 365 days a year.

For this reason, data centers spend millions of dollars a year updating their back-up power generators and related equipment to ensure that their clients are never without power in the event of a power failure or unforeseen disaster.

So when back-up generators at data center operator 365 Main's (365main.com) San Francisco facility failed to start on July 24 during a PG&E power outage, leaving 40 percent of customers in the facility without power to their equipment for up to 45 minutes, no one was more concerned than 365 Main itself.

"There's no question that this was a difficult experience for us and for our customers," says Miles Kelly, vice president of marketing at 365 Main. "We take operations very seriously. We talk about being the world's finest -- we do have a track record of 99.9967 percent across the entire portfolio, so any downtime we take very seriously."

The outage occurred after the transformer breakers at a local PG&E power station inexplicably opened. Power outages normally trigger 365 Main's back-up diesel generators to start up and take over supplying power to customers. However, three of 365 Main's 10 back-up power generators, manufactured by Hitec, failed to complete their start sequence.
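To put the 99.9967 percent figure Kelly cites in perspective, the short sketch below converts an availability percentage into the downtime it allows per year. It assumes the figure is an annual availability measured over a 365-day year; the article does not say how 365 Main calculates it, and the function name is illustrative only.

```python
# Convert an availability percentage into the downtime it allows per year.
# Assumption: the quoted figure is annual availability over a 365-day year.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes


def allowed_downtime_minutes(availability_percent: float) -> float:
    """Return the minutes of downtime per year implied by an availability figure."""
    unavailable_fraction = (100.0 - availability_percent) / 100.0
    return unavailable_fraction * MINUTES_PER_YEAR


if __name__ == "__main__":
    quoted = 99.9967  # track record quoted by Kelly
    print(f"{allowed_downtime_minutes(quoted):.1f} minutes per year")
```

Under that assumption, 99.9967 percent works out to roughly 17 minutes of downtime per year. Because the quoted figure is a portfolio-wide average, it is only a rough yardstick for any single facility.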

The Hitec units are rigorously tested and inspected, Kelly says, on a daily, weekly, monthly, quarterly, semi-annual and annual basis, with the documented results made available to 365 Main customers for review. The affected units had performed perfectly in inspections shortly before the incident.

Within hours of the outage, 365 Main sent a team of Hitec specialists to the San Francisco data center facility to join on-site technicians and begin systematically testing the generators to find the root cause.

Finally, after days of thorough testing, the team found a weakness in an essential component of the back-up generator system known as a Detroit Diesel Electronic Controller (DDEC), which prevented the component from correctly resetting its memory. The invalid data left in the DDEC's memory then caused misfiring or engine start failures when the generators were called on to start during the power outage.

The investigation team fixed the issue by altering the timing of a command to the DDEC component, allowing more time between the engine shut-down command and the DDEC reset command. Once this fix was introduced, the Hitec generators successfully passed more than 50 consecutive start-up sequence tests without incident.

"This particular outage revealed this particular weakness, and that is something that has been addressed," says Kelly. "There is no such thing as 100 percent-uptime data center, but we are doing everything we can to achieve that."

365 Main has performed the DDEC fix in both its San Francisco and El Segundo facilities -- the only two facilities in its portfolio with Hitec generators containing DDECs.

The data center operator is also sharing the discoveries of its investigation with other Hitec customers. Meanwhile, Hitec has expanded its preventative maintenance procedures as a direct result of discoveries made during the investigation. Following the outage, 365 Main published an apology to its customers and provided daily updates taken directly from the investigation team's meeting minutes, enabling customers and the public to track progress.

All of the affected 365 Main customers received refunds under their 365 Main service level agreements for any loss of electrical power to their servers during the outage. The company also launched an extensive customer outreach program in which it met with the CEOs of its customers in order to demonstrate its credibility as a reliable data center operator, says Kelly.

"To best deal with the outage we have been very forthcoming in acknowledging the seriousness of what happened, letting people know all the information we had, day by day," says Kelly. "No question our reputation is at risk and we are doing everything we can to show that this particular problem has been taken care of."
