When you purchase through links on our site, we may earn an affiliate commission. Here’s how it works.

Google's 'uninterruptible' power supply ironically interrupted Cloud with a six hour outage

The Google Cloud logo on a grey background

Neowin readers are probably quite familiar with all the outages and downtimes that Microsoft 365 and its related services often face. For example, last week, the M365 went down on the 9th due to an Exchange Admin Center (EAC) outage, and only a day later, users found themselves locked out of family subscriptions as a consequence of a bug.

Similar to Microsoft, Google Cloud also faces outage issues like these from time to time and towards the end of last month, that is what had happened, as Google's uninterruptible power supply (UPS) system failed to deliver the uninterrupted power that it was built for, leading to near-six and a half hour outage. The problem occurred in the "us-east5-c" zone, which is in Columbus, Ohio, and the zone comprised systems built on AMD EPYC and Intel Xeon processors.

Google has explained when and why it happened in its support article and has also detailed the scale of the issue:

On Saturday, 29 March 2025, multiple Google Cloud Services in the us-east5-c zone experienced degraded service or unavailability for a duration of 6 hours and 10 minutes.

..

The root cause of the service disruption was a loss of utility power in the affected zone. This power outage triggered a cascading failure within the uninterruptible power supply (UPS) system responsible for maintaining power to the zone during such events. The UPS system, which relies on batteries to bridge the gap between utility power loss and generator power activation, experienced a critical battery failure.

This failure rendered the UPS unable to perform its core function of ensuring continuous power to the system. As a direct consequence of the UPS failure, virtual machine instances within the affected zone lost power and went offline, resulting in service downtime for customers.

The power outage and subsequent UPS failure also triggered a series of secondary issues, including packet loss within the us-east5-c zone, which impacted network communication and performance. Additionally, a limited number of storage disks within the zone became unavailable during the outage.

Google has also explained how it fixed the issue:

Google engineers diverted traffic away from the impacted location to partially mitigate impact for some services that did not have zonal resource dependencies. Engineers bypassed the failed UPS and restored power via generator by 14:49 US/Pacific on Saturday, 29 March.

The majority of Google Cloud services recovered shortly thereafter. A few services experienced longer restoration times as manual actions were required in some cases to complete full recovery.

Credit where credit's due, the tech giant has thoroughly apologised for the incident to its Cloud customers and has also outlined the steps it has taken to prevent such an issue in the future:

To our Google Cloud customers whose services were impacted during this disruption, we sincerely apologize. This is not the level of quality and reliability we strive to offer you, and we are taking immediate steps to improve the platform’s performance and availability.

...

Google is committed to preventing a repeat of this issue in the future and is completing the following actions:

  • Harden cluster power failure and recovery path to achieve a predictable and faster time-to-serving after power is restored.
  • Audit systems that did not automatically failover and close any gaps that prevented this function.
  • Work with our uninterruptible power supply (UPS) vendor to understand and remediate issues in the battery backup system.

Google is committed to quickly and continually improving our technology and operations to prevent service disruptions. We appreciate your patience and apologize again for the impact to your organization. We thank you for your business.

You can find full details about the problem in the support article here on Google's Cloud status website.

Report a problem with article
Modernized Windows Hello
Next Article

Microsoft confirms Windows Hello issues in latest Windows 11 updates

Office Logo
Previous Article

Microsoft makes it harder to enable ActiveX in Office apps to improve security

Join the conversation!

Login or Sign Up to read and post a comment.

0 Comments - Add comment