Exploring Cloud SLAs: Amazon vs Rackspace
The mainstream media’s coverage of and interest in cloud computing have been an exciting boon to our industry, although I’m not sure whether I should laugh or cry every time I see an article mentioning an outage of a notable public cloud. I suppose it’s inevitable that increased attention brings increased scrutiny, and as they say in the newspaper business, “if it bleeds, it leads.” That seems to be the approach many journalists have taken to covering cloud outages.
The reality is that networks crash. Routers and hard drives fail. While all data center operators and hosting/cloud providers take measures to build in various layers of redundancy, outages happen. This has always been and always will be the case for hosted services. In the end, this is why service level agreements (SLAs) exist, and what truly matters is a provider’s transparency, customer service and credits.
To this end, I thought it would be an interesting exercise to compare the SLAs of the two leading infrastructure cloud services, Rackspace and Amazon. Although this meant coping with flashbacks from the 84-page report I wrote for Tier 1 Research back in 2001 on this very topic, comparing the SLAs of the top 20 MSPs of the era, I found it to be rather illuminating. Here’s a quick reference table:
| | Rackspace – Cloud Servers | Amazon – EC2 |
|---|---|---|
| Uptime / Availability Guarantee | 100% | 99.95% |
| Time span | Current period | “service year” or the preceding 365 days |
| Time-to-resolution | 1 hour | Not specified |
| Credits | 5% of the fees for each 30 minutes of network or data center downtime, up to 100% of the fees; 5% of the fees for each additional hour of downtime past time-to-resolve, up to 100% of the fees | 10% of bill per eligible credit period |
| Notification onus | Customer | Customer |
| Window | 30 days after incident | 30 days after incident |

| | Rackspace – Cloud Files | Amazon – S3 |
|---|---|---|
| Availability | 99.9% | 99.9% |
| Definition | (i) The Rackspace Cloud network is down, or (ii) the Cloud Files service returns a server error response to a valid user request during two or more consecutive 90 second intervals, or (iii) the Content Delivery Network fails to deliver an average download time for a 1-byte reference document of 0.3 seconds or less, as measured by The Rackspace Cloud's third party measuring service. | “Error Rate” means: (i) the total number of internal server errors returned by Amazon S3 as error status “InternalError” or “ServiceUnavailable” divided by (ii) the total number of requests during that five minute period. We will calculate the Error Rate for each Amazon S3 account as a percentage for each five minute period in the monthly billing cycle. |
| Credits | | |
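To make the Cloud Servers and EC2 credit rows concrete, here is a minimal Python sketch of the two schedules as summarized in the table. The function names are hypothetical, and two simplifications are assumptions rather than SLA terms: only complete 30-minute blocks of downtime are counted for Rackspace, and the additional time-to-resolve credit is left out. Check the SLA text itself before relying on any of this.

```python
def rackspace_cloud_servers_credit(downtime_minutes: int, monthly_fee: float) -> float:
    """Credit per the table: 5% of fees per 30 minutes of downtime, capped at 100%."""
    # Assumption: only complete 30-minute blocks count toward the credit.
    blocks = downtime_minutes // 30
    credit_pct = min(5 * blocks, 100)
    return monthly_fee * credit_pct / 100


def amazon_ec2_credit(annual_uptime_pct: float, bill: float) -> float:
    """Credit per the table: 10% of the bill when uptime falls below 99.95%."""
    if annual_uptime_pct < 99.95:
        return bill * 10 / 100
    return 0.0


if __name__ == "__main__":
    fee = 200.0  # hypothetical monthly bill
    for minutes in (30, 180, 600, 1440):
        print(minutes, rackspace_cloud_servers_credit(minutes, fee))
    print(amazon_ec2_credit(99.90, fee))
```

Under these assumptions, ten hours of downtime already exhausts the Rackspace schedule (20 blocks at 5% each), while the EC2 credit stays at 10% regardless of how long the outage lasted.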
Key Takeaways:
· It was quite interesting to see the different approaches taken by the two. On the Cloud Servers side, Rackspace’s service guarantee of 100% uptime is a long-standing marketer’s tactic, which simply means they will pay for any downtime that does occur. Amazon, on the other hand, has a more realistic guarantee of 99.95%, which translates into about 4.4 hours of non-scheduled downtime a year (0.05% of 8,760 hours).
· The fact that Rackspace specifies a time-to-resolve guarantee and offers credits if it misses speaks to its heritage as a managed hoster first. This is something Amazon can and should put into place.
· In terms of credits, Amazon caps its credit at 10% per period, whereas Rackspace will ultimately provide a 100% credit if it is warranted. Separately but relatedly, on storage Amazon caps the credit at 25%, whereas Rackspace again offers up to 100% if warranted. Rackspace's approach is much more customer-centric.
· Both Rackspace and Amazon put the burden of SLA violation notification and the credit request on the customer. This would be my biggest critique of both firms’ SLAs. It is something I advocated changing back in 2001, and nothing has changed: customers are already frustrated by an outage, so why make them bear the administrative responsibility of proving the outage existed and chasing you down for the credit owed? Other cloud providers may want to consider offering an automated credit function when outages occur (think Orbitz.com if someone books the same room as you at a lower price) as a way to differentiate themselves from the market leaders.
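The downtime arithmetic behind the first takeaway can be checked with a short Python sketch. It assumes a 365-day year measured in hours; the function name is illustrative and not part of either SLA.

```python
HOURS_PER_YEAR = 365 * 24  # assumption: a non-leap year, 8,760 hours


def allowed_downtime_hours(availability_pct: float,
                           period_hours: float = HOURS_PER_YEAR) -> float:
    """Hours of downtime a provider can incur and still meet the guarantee."""
    return period_hours * (100 - availability_pct) / 100


if __name__ == "__main__":
    for pct in (100.0, 99.95, 99.9):
        print(f"{pct}% -> {allowed_downtime_hours(pct):.2f} hours per year")
```

For 99.95% this gives roughly 4.38 hours a year, and for the 99.9% storage guarantees roughly 8.76 hours.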
In conclusion, while auto-paying for SLA violations would be an improvement for both Amazon and Rackspace, I also recommend all cloud service providers post their uptime front-and-center on their website. At the end of the day, the credit issued for downtime (even if it was automatic) tends to pale in comparison to the acute frustration and anxiety the customer is experiencing. Ultimately, the best SLA would be “our CEO will fly to your office personally to explain what happened” but we all know that just isn’t scalable.
Source Links
http://www.rackspacecloud.com/legal/sla
http://aws.amazon.com/ec2-sla/
Comment by Anonymous on Tuesday, February 02, 2010
Has anybody tried looking at UKFast Cloud? They have won ISP of the year for the past 4 years, offer 100% connectivity, have freephone 24/7/365 support and are completely British based.
And have not had any outages (which Amazon and Rackspace seem to suffer from on a monthly basis!)
Comment by Anonymous on Friday, February 05, 2010
The biggest issue I see is the complexity of monitoring the SLA from a customer vantage point. Amazon's is virtually impossible to monitor, especially if you consider that "you" make S3 requests from many, many servers, and often your customers' browsers point directly at your S3 objects. It's impossible to aggregate all this. Your suggestion to auto-credit is of course the right one, but your write-up doesn't highlight that the current situation is as good as no credit. -Thorsten, CTO RightScale
Comment by Joshua Beil on Sunday, February 07, 2010
Thanks for the great comment, Thorsten. While there are third-party services out there like Nimsoft and Cloudkick, you are absolutely right that if a customer does not have the tools to monitor an outage, they will never be able to ask for their credit, which makes the current credits from any cloud service provider taking this approach almost the equivalent of no credits!
Comment by Anonymous on Tuesday, February 16, 2010
Thanks! Very helpful. Added links to your post in our analysis of Amazon and Rackspace at www.cloudTP.com.
Comment by Anonymous on Tuesday, March 02, 2010
I'd love to see the same comparison with appnexus.com too.