Microsoft said Thursday morning that it has restored Azure services affected by a storage incident that caused disruptions in 26 of the public cloud’s 28 regions, according to its status page. Azure lists two separate incidents on its status history page: a global one that began at 22:42 UTC on Wednesday, and one confined to the US East region that started at 21:50 UTC the same day.
Azure characterizes the global incident as a storage provisioning problem, potentially caused by a software error, which may have prevented customers from provisioning new resources, turning on Azure Monitor diagnostics, and using Visual Studio Team Services Build, among other affected services. The issue was mitigated as of 0:00 UTC on Thursday, and existing Storage resources were not affected, according to Microsoft.
The US East incident lasted more than eight hours before services were restored at 6:00 UTC Thursday. Many Azure services were affected during the outage, including Virtual Machines, Azure Media Services, Application Insights, Logic Apps, Data Factory, Site Recovery, SQL Database, and API Management. The preliminary root cause identified by Azure engineering is the unavailability of one Storage cluster due to power loss.
Microsoft says it plans to publish a detailed analysis of the US East incident’s root cause over the weekend.
Azure suffered an outage on “leap day” in 2012, which lasted for more than 12 hours, prompting Microsoft to refund all Azure customers one-third of their bill for the month. It also suffered a “partial performance degradation” in 2013 and a shorter service disruption in 2014. AWS suffered an S3 storage service outage weeks ago, almost five years to the day after the Azure “leap day” outage. The S3 outage caused downtime and affected page load speeds for more than half of the internet’s top 100 retailers.