Microsoft's Office 365 service experienced a rather lengthy outage earlier this week. On Monday and Tuesday, users in North America faced Lync and Exchange outages that lasted for several hours.
Microsoft said that the Lync and Exchange outages were unrelated, but a separate failure in the Service Health Dashboard meant that affected customers were not notified of the outages. It was a double hit for Microsoft: not only were core features offline, but the mechanism for alerting users to the outage was failing as well.
The Lync Online outage was caused by a brief loss of connectivity. When connectivity was restored, the resulting surge of backlogged traffic overloaded the remaining servers and disrupted the service for some customers.
The Exchange issue stemmed from a directory failure that caused a directory partition to stop responding to authentication requests. Microsoft said that this failure was unique, which is why the Exchange downtime lasted as long as it did.
As you would expect, Microsoft said that the issues have been fixed and that it has learned from the experience how to avoid similar scenarios in the future. Office 365 has been stable for the most part, and the platform has not historically suffered downtime of this length.
Source: Microsoft