Storing data in the cloud might help protect your information from local power outages and hardware failures, but cloud service data centers aren't immune from similar problems . . . as Google learned last week. A series of four lightning strikes on the local power grid knocked out power to the Google Compute Engine (GCE) data center in St. Ghislain, Belgium, leading to a permanent loss of a small amount of recently stored data.
The lightning strikes, which occurred on August 13, caused a "brief loss of power" to data storage systems serving Google Compute Engine customers in western Europe, according to a status report on the company's Cloud Platform Web site. Although backup power came on quickly, some recently written data held on storage systems more susceptible to power failure was permanently lost, Google said.
Google engineers were able to recover some lost data in the hours and days following the storm. However, information from "a very small number of disks" could not be restored. Google said the lost data amounted to "less than 0.000001 percent" of the space on allocated persistent disks in the region, although it's not clear how much actual data that entailed.
Working To Improve 'Contributory Factors'
"This outage is wholly Google's responsibility," the company said in its latest update on the Google Cloud Status page. "However, we would like to take this opportunity to highlight an important reminder for our customers: GCE instances and persistent disks within a zone exist in a single Google data center and are therefore unavoidably vulnerable to data center-scale disasters."
To reduce the risk of losing data to such local disasters, customers who need maximum availability should be prepared to switch operations to other Google Compute Engine zones and use snapshots and Google Cloud Storage to ensure "resilient, geographically replicated repositories" for their data, according to the company.
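For GCE customers, the snapshot-based protection Google describes is a short command-line operation. As a minimal sketch using the `gcloud` tool (the disk and snapshot names here are hypothetical; the zones shown are the Belgium region's actual zones, but any pair of zones works the same way):

```shell
# Snapshot a persistent disk; snapshots are stored redundantly,
# independent of the zone holding the original disk.
gcloud compute disks snapshot my-disk \
    --zone europe-west1-b \
    --snapshot-names my-disk-backup

# After a zone-scale failure, restore the snapshot as a new disk
# in a different zone and attach it to a replacement instance.
gcloud compute disks create my-disk-restored \
    --source-snapshot my-disk-backup \
    --zone europe-west1-c
```

Because the snapshot lives outside any single zone, this is the mechanism that turns a "data center-scale disaster" into a recoverable event rather than a permanent loss.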
In its apology to customers, Google noted that its analysis of the incident identified "several contributory factors" in the data center's hardware and software, and said it was working to improve them to "maximize the reliability of GCE's whole storage layer."
Weighing Systemic Risks
Experts who work to protect information technology systems from failure -- whether caused by the technology itself, human error or "acts of God" like lightning -- generally recognize that added defenses eventually yield diminishing returns . . . in other words, some safeguards become too expensive or difficult to implement when weighed against potential failures with a minuscule likelihood of occurring.
In today's increasingly interconnected world, however, even highly unlikely failures on a local scale can lead to far more serious and widespread impacts. That concept, known as "systemic risk," is the focus of growing research in IT, finance and other areas.
Systemic risks "are characterized by the possibility that a small internal or external disruption could cause a highly non-linear effect, including a cascading failure that infects the whole system, as in the 2008-2009 financial crisis," noted a report earlier this year from the Global Challenges Foundation.
In a Hacker News discussion today about the Google data center incident, one commenter noted, "once you get into several nines of reliability, really rare events that are impossible to model start to dominate your risk budget."
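To put "several nines" in concrete terms (this is a standard back-of-the-envelope calculation, not a figure from the article), each additional nine of availability cuts the tolerable annual downtime tenfold:

```python
# Expected annual downtime allowed at N "nines" of availability
# (3 nines = 99.9%, 4 nines = 99.99%, and so on).
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600

def downtime_minutes(nines: int) -> float:
    """Minutes of permitted downtime per year at the given number of nines."""
    unavailability = 10 ** (-nines)
    return MINUTES_PER_YEAR * unavailability

for n in range(3, 7):
    # e.g. 3 nines allows roughly 525.6 minutes (about 8.8 hours) per year
    print(f"{n} nines: {downtime_minutes(n):.3f} minutes/year")
```

At five or six nines the downtime budget shrinks to minutes or seconds a year, which is why freak events like a quadruple lightning strike start to dominate the risk calculation.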
Another commenter added, "Assuming 1 petabyte of total storage at the data center, that equates to about 100mb. I wonder how much storage they have there."
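The commenter's arithmetic is easy to check (the 1-petabyte figure is the commenter's assumption, not a number from Google):

```python
# Back-of-the-envelope bound on the data lost, under the commenter's
# assumption of 1 PB of allocated persistent-disk space in the region.
PETABYTE = 10 ** 15           # bytes
fraction = 0.000001 / 100     # "less than 0.000001 percent" as a fraction

lost_bytes = PETABYTE * fraction
print(f"upper bound: roughly {lost_bytes / 10 ** 6:.0f} MB")
```

Under that assumption the bound works out to roughly 10 MB; the quoted 100 MB would correspond to ten times the storage, or to reading the percentage as a plain fraction.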
Opened in September 2010, the GCE data center in Belgium was built for €250 million (around $280 million) and underwent a €300 million ($336 million) expansion between 2013 and early 2015. The energy-efficient facility was Google's first to run entirely without refrigeration, instead employing advanced evaporative cooling using gray water from a nearby industrial canal.
Posted: 2015-08-29 @ 6:57pm PT