- Google Cloud's us-east5-c zone (Columbus, Ohio) suffered a six-hour outage
- It was caused by a failure in its uninterruptible power supply
- More than 20 services and some storage disks went down
Unlike the unsinkable Titanic, which met its end when it sank, Google Cloud has recovered from a major outage caused by a failure of its uninterruptible power supply (UPS).
The company confirmed that its us-east5-c zone, located in Columbus, Ohio, experienced "degraded or unavailable" service for six hours and 10 minutes on March 29, 2025, blaming the incident on a "loss of utility power in the affected zone."
More than 20 cloud services suffered degraded performance or downtime as a result of the outage, including BigQuery, Cloud SQL, Cloud VPN, and Virtual Private Cloud.
Google's uninterruptible power supply had just one fairly major failure
In its incident report, the company explained exactly what happened: "This outage triggered a cascading failure within the uninterruptible power supply (UPS) system responsible for maintaining power to the zone during such events."
"The UPS system, which relies on batteries to bridge the gap between the loss of utility power and the activation of generator power, experienced a critical battery failure," the report continues.
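To make that failure mode concrete, here is a minimal, purely illustrative Python sketch of the utility-to-UPS-to-generator chain the report describes; all function and variable names are hypothetical and not drawn from Google's systems.

```python
# Illustrative sketch only: a toy model of the utility -> UPS battery ->
# generator power chain described in Google's incident report.
# Names are hypothetical, not Google's.

def active_power_source(utility_ok: bool,
                        ups_batteries_ok: bool,
                        generator_online: bool) -> str:
    """Return which source carries the load, or 'outage' if none can."""
    if utility_ok:
        return "utility"
    # Utility power lost: the backup generator carries the load once it
    # has spun up; until then, UPS batteries are meant to bridge the gap.
    if generator_online:
        return "generator"
    if ups_batteries_ok:
        return "ups-batteries"
    # The March 29 failure mode: utility lost, batteries failed,
    # generator not yet online -> the zone loses power entirely.
    return "outage"


# Normal failover: batteries carry the load while the generator starts.
print(active_power_source(utility_ok=False, ups_batteries_ok=True,
                          generator_online=False))   # ups-batteries

# The reported incident: a critical battery failure leaves nothing to
# bridge the gap until the generator comes online.
print(active_power_source(utility_ok=False, ups_batteries_ok=False,
                          generator_online=False))   # outage
```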
Google's Columbus zone uses powerful Intel chips such as Broadwell, Haswell, Skylake, Cascade Lake, Sapphire Rapids, and Emerald Rapids, as well as AMD EPYC Rome and Milan processors, to power its cloud computing services. The cloud giant also noted that "a limited number of storage disks within the zone were unavailable during the outage."
Engineers became aware of the outage at 12:54 PT on March 29, bypassing the failed UPS and restoring power via generator by 14:49 PT. Most services came back online from that point, but some required manual intervention for a full recovery, hence the six-hour outage.
Google now promises to learn from the event, hardening cluster power-failure and recovery paths, auditing the systems that failed to fail over automatically, and working with its UPS vendor to mitigate future incidents.