Information Technology News.

Human error causes Google cloud service outage for over an hour

November 30, 2015

Human error and a mistaken peering advertisement from a European network took Google Cloud's Europe-west1 region offline last week for about 72 minutes.

The blunder happened when an unnamed network owner connected a new peering link to Google and, in the process, advertised route reachability for far more traffic than the link could possibly handle.

As a result, and as Google later explained to us, most of the lost traffic carried destination addresses in eastern Europe and the Middle East.

“The peer's network signalled that it could route traffic to many more destinations than Google engineers had anticipated, and more than the link had capacity for.

Google's network responded accordingly by routing a large volume of traffic to the link. At 11:55, the link saturated and began dropping the majority of its traffic”, Google said.

That kind of error, Google's report continues, would usually be detected by automated safety checks, but “the automation was not operational due to an unrelated failure, and the link was brought online manually, so the automation's safety checks did not occur as they should have”.
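Google's report does not say how its safety checks work. Purely as an illustration, a pre-activation check of this kind might compare what a peer announces with what engineers expected and with what the link can carry. The sketch below is a minimal Python example under those assumptions; `PeeringLink`, `safe_to_activate` and both thresholds are hypothetical names and values, not anything taken from Google.

```python
from dataclasses import dataclass


@dataclass
class PeeringLink:
    """Hypothetical description of a new peering link."""
    name: str
    capacity_gbps: float      # what the link can physically carry
    expected_prefixes: int    # how many routes engineers expect the peer to announce


def safe_to_activate(link, announced_prefixes, projected_gbps,
                     prefix_slack=1.2, max_utilization=0.8):
    """Return (ok, reasons); ok is False if any check fails.

    prefix_slack and max_utilization are illustrative thresholds only.
    """
    reasons = []
    # Check 1: the peer announces many more routes than anticipated.
    if announced_prefixes > link.expected_prefixes * prefix_slack:
        reasons.append("peer announces more prefixes than expected")
    # Check 2: the traffic those routes would attract exceeds usable capacity.
    if projected_gbps > link.capacity_gbps * max_utilization:
        reasons.append("projected traffic exceeds usable link capacity")
    return (not reasons, reasons)


link = PeeringLink(name="peer-a", capacity_gbps=10.0, expected_prefixes=1000)
ok, reasons = safe_to_activate(link, announced_prefixes=50000, projected_gbps=40.0)
print(ok, reasons)
```

A check like this only helps if it cannot be skipped, which is the point of Google's decision to disallow manual link activation.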

“To prevent a recurrence of this problem, Google network engineers are changing the procedure to disallow manual link activation”, Google asserted.

Route announcement errors are a growing and recurring headache on the Internet, and have been for a long time.

For instance, in June of this year Telekom Malaysia mis-advertised internet routes to Level 3 Communications, causing the U.S. provider to route most of its traffic onto a network that couldn't cope with the huge increase in traffic and dropped over 98 percent of the packets.

But there are times when a mis-routing makes networking engineers suspicious that it was deliberate, such as in 2010 when China Telecom hijacked U.S. military and government traffic via BGP, and there have been several such incidents reported since.

BGP's problem is that the protocol trusts route announcements, which puts a premium on trustworthy network operators and robust processes. But as we all know now, operators are not always trustworthy and processes are not always robust, so network and system admins need to be more vigilant to prevent such blunders from happening.
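One common form of vigilance is to filter what a peer announces instead of trusting it outright. As a rough sketch only, the Python below accepts an announced IPv4 prefix only if it falls inside a list the operator agreed with the peer in advance and is no more specific than a chosen length. The function name `filter_announcements`, the default of /24 and the documentation-range prefixes are assumptions made for illustration; real routers express this in their own configuration languages.

```python
import ipaddress


def filter_announcements(announced, allowed, max_prefixlen=24):
    """Split announced IPv4 prefixes into (accepted, rejected).

    A prefix is accepted only if it sits inside one of the allowed
    prefixes and is not more specific than max_prefixlen.
    """
    allowed_nets = [ipaddress.ip_network(p) for p in allowed]
    accepted, rejected = [], []
    for prefix in announced:
        net = ipaddress.ip_network(prefix)
        inside_allowed = any(
            net.version == a.version and net.subnet_of(a) for a in allowed_nets
        )
        if inside_allowed and net.prefixlen <= max_prefixlen:
            accepted.append(prefix)
        else:
            rejected.append(prefix)
    return accepted, rejected


# Hypothetical example: the peer is only supposed to announce 192.0.2.0/24.
accepted, rejected = filter_announcements(
    announced=["192.0.2.0/24", "198.51.100.0/24", "0.0.0.0/0"],
    allowed=["192.0.2.0/24"],
)
print(accepted)
print(rejected)
```

With a filter like this in place, a peer that suddenly announces routes to many more destinations than agreed has the extra routes rejected, so traffic is not steered onto a link that cannot carry it.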

Source: Google.

       © IT Direction. All rights reserved.