Google's Spanner Database Offers Global Scale - InformationWeek

InformationWeek is part of the Informa Tech Division of Informa PLC

This site is operated by a business or businesses owned by Informa PLC and all copyright resides with them.Informa PLC's registered office is 5 Howick Place, London SW1P 1WG. Registered in England and Wales. Number 8860726.

05:13 PM
Connect Directly

Google's Spanner Database Offers Global Scale

Spanner, Google's latest database technology, relies on atomic clocks to ensure the consistency of distributed data.

Big Data Talent War: 10 Analytics Job Trends
Big Data Talent War: 10 Analytics Job Trends
(click image for larger view and for slideshow)
Google has released details of a global distributed database called Spanner, which the company claims is the first system to distribute data globally while supporting consistent distributed transactions.

Google began deploying Spanner in 2011 as part of an effort to rewrite F1, its internal advertising backend. A Spanner deployment, as befits Google's ambitions, is called a universe. The company has three universes: a test/playground universe, a development/production universe, and a production-only universe.

F1 was originally based on a MySQL database that handled tens of terabytes of data. Its initial structure involved a sharding scheme--a way to divide the database--that assigned customer data to a fixed shard. This allowed the use of indexes and sophisticated processing of queries, but proved difficult and costly when resharding--a resource-intensive manual process--was necessary to accommodate database growth.

Because of the complexity of resharding F1 data, Google's engineers began trying to limit the growth of the F1 MySQL database by storing data using Bigtable, a NoSQL data storage technology that the company developed in 2004 and continues to rely on. But Bigtable limited both the transactional data visibility Google needed and the ability to query across the entire data set.

[ What's new with OpenStack? Read OpenStack Poised For Cloud Leadership. ]

Google has always put a premium on the ability to operate at scale and Spanner represents a continuation of that goal.

"Spanner automatically reshards data across machines as the amount of data or the number of servers changes, and it automatically migrates data across machines (even across datacenters) to balance load and in response to failures," Google's research paper explains. "Spanner is designed to scale up to millions of machines across hundreds of datacenters and trillions of database rows."

Google fellow Jeff Dean, one of more than two dozen Google engineers credited as co-authors of the paper, described early work on Spanner back in 2009 at the third ACM SIGOPS International Workshop on Large Scale Distributed Systems and Middleware (LADIS).

Spanner provides applications with the ability to dynamically control where data resides--data stored close to the user can be delivered faster and data replicas stored near each other can be written to faster--and how many times data is replicated. The system can move data dynamically and transparently between data centers to balance resource usage. What's more, it can read and write data consistently on a global scale--something that is difficult for databases to do.

The key to Spanner is what Google calls the TrueTime API, which allows timestamps to be reliably applied to database operations around the globe. Without sufficiently accurate timestamps, network latency and clock variations can lead to synchronization errors in distributed systems.

To achieve a globally consistent time system, Google has deployed a set of atomic clocks and GPS hardware in every data center. Without redundant time references, Spanner would become unreliable, which isn't what Google wants with its revenue-generating ad system. The paper's authors argue that future distributed systems should no longer accept loosely synchronized clocks.

"This is more or less NoSQL throwing in the towel," said Jim Starkey, CTO of NuoDB, a company that makes a cloud database system with goals similar to Spanner, in a phone interview.

Google pushed MySQL to the breaking point, says Starkey, noting that the company's data was heavily sharded and didn't scale. NoSQL, in the form of Bigtable, wasn't the answer for Google's distributed ad system because it only promises eventual data consistency. When accessing data in real-time, he explained, the information might not be consistent throughout the system. NoSQL trades guarantees of atomicity, consistency, isolation, durability--the ACID properties that ensure data reliability during database operations--for performance in distributed systems at scale.

"Trying to get ACID transactions on Bigtable is a programming nightmare and too difficult for most programmers to do," said Starkey.

This is something Google itself acknowledges in its paper: "[W]e have also consistently received complaints from users that Bigtable can be difficult to use for some kinds of applications: those that have complex, evolving schemas, or those that want strong consistency in the presence of wide-area replication."

Starkey says Spanner is very interesting technology, but he's not sure whether it will be useful for other businesses that don't deal with the same distributed data challenges as Google. Even Google, he suggests, may not end up using it for its external-facing applications like Gmail or Picasa, because those apps don't have the same data access requirements as F1.

"The question is can Google really get the throughput that they need for their applications," said Starkey. "My take on it is that Spanner is quite complicated to configure, install, and manage. But it's really exciting to see another company going out there to deal with databases in a way that validates the assumptions we made."

We welcome your comments on this topic on our social media channels, or [contact us directly] with questions about the site.
Comment  | 
Print  | 
More Insights
2021 Outlook: Tackling Cloud Transformation Choices
Joao-Pierre S. Ruth, Senior Writer,  1/4/2021
Enterprise IT Leaders Face Two Paths to AI
Jessica Davis, Senior Editor, Enterprise Apps,  12/23/2020
10 IT Trends to Watch for in 2021
Cynthia Harvey, Freelance Journalist, InformationWeek,  12/22/2020
White Papers
Register for InformationWeek Newsletters
The State of Cloud Computing - Fall 2020
The State of Cloud Computing - Fall 2020
Download this report to compare how cloud usage and spending patterns have changed in 2020, and how respondents think they'll evolve over the next two years.
Current Issue
2021 Top Enterprise IT Trends
We've identified the key trends that are poised to impact the IT landscape in 2021. Find out why they're important and how they will affect you.
Flash Poll