Google admits that users of its Persistent Disks storage system have lost data
August 19, 2015
Late yesterday, Google finally admitted that some customers running its Persistent
Disks storage system have lost data, adding that a combination of lightning and old storage disks
was to blame.
For now, Google says users in the Europe-West-1-B zone appear to have
been affected the most.
The service outage hit last Friday and left some users totally unable to connect to Persistent Disks, a storage
system that exists independently of a virtual machine.
The issue lasted for several hours, and problems persisted across the weekend. Google has now published
its analysis of the outage and says that on August 13th, “four successive lightning strikes on the electrical
grid of a European datacenter caused a brief loss of power to storage systems which host disk capacity
for GCE instances in the Europe-West-1-B zone.”
“Although automatic auxiliary backup systems restored power fairly quickly, and the storage systems
are designed with battery backup, some recently written data was located on storage systems which were
more susceptible to power failure from extended or repeated battery drain,” Google admitted.
“In almost all cases, the data was successfully committed to stable storage, although manual
intervention was required in order to restore the systems to their normal serving state. But in
a few cases, recent writes were unrecoverable, leading to permanent data loss on the Persistent Disk.”
About five to six percent of disks in the data centre recorded “at least one I/O read or write failure”
during the incident.
Overall, read failures persisted into Monday for about 0.05 percent of its users, and Google now
says that about 0.000001 percent of disk space has proved impossible to recover.
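To put that last figure in perspective, 0.000001 percent works out to one part in 100 million. The short Python sketch below does the arithmetic; the 1 PB capacity is a purely hypothetical value chosen for illustration, since the article does not state how much disk space was involved.

```python
from fractions import Fraction

# Figure quoted in the article: about 0.000001 percent of disk space was unrecoverable.
LOST_PERCENT = Fraction("0.000001")

# A percentage is a fraction of 100, so divide to get the plain ratio.
lost_fraction = LOST_PERCENT / 100


def bytes_lost(capacity_bytes: int) -> Fraction:
    """Return the bytes lost for a given capacity at the quoted loss rate."""
    return capacity_bytes * lost_fraction


# Hypothetical capacity for illustration only (decimal petabyte); not a Google figure.
ONE_PB = 10**15

print(f"Loss ratio: 1 in {lost_fraction.denominator:,}")
print(f"Bytes lost per hypothetical 1 PB: {int(bytes_lost(ONE_PB)):,}")
```

At that rate, every petabyte of affected capacity would correspond to roughly 10 MB of unrecoverable data.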
Several customers were understandably inconvenienced by this mishap, and a few voiced their
“This outage is wholly Google's responsibility,” the document continues, before going on “to
highlight an important reminder for our customers: GCE instances and Persistent Disks within a zone
exist in a single Google datacenter and are therefore unavoidably vulnerable to datacenter disasters.”
In other words, should lightning strike twice, you should remember that a single datacentre can't beat two.
“Full data protection, integrity and redundancy is critical to most business operations, and a disaster recovery
solution is absolutely a must in these conditions,” says Jonathan Price, vice president of data center
technology at Sun Hosting, a major data center
services provider and disaster recovery specialist located in Montreal, Canada.
But Google's confessional also says the company “has an ongoing program of upgrading to storage
hardware that is less susceptible to the power failure mode that triggered this incident. Most Persistent
Disk storage is already running on this hardware.”
Google adds that it's conducted a review of the incident and that “several opportunities have been identified
to increase physical and procedural resilience.”