Information Technology News.

The end of NoSQL databases could be near

March 21, 2017

We keep reading more and more that the end of NoSQL databases could be near, and for good reason. There has been quite a bit of negative news about NoSQL implementations in the last five to six months, and things appear to be getting worse, not better.

To be sure, the commercial release by Google of its Spanner database as a public beta in February came as both a pleasant surprise and a wake-up call.

Overall, some say that Google's Spanner could represent the culmination of the so-called NewSQL movement, which rejects NoSQL's core values and returns to the old attributes of SQL: ACID transactions and good old relational tables.

Let's be frank here. NoSQL has a few crazy interface languages, bizarre data structures and distributed data tables that aren't what experienced DB admins are used to working with in a production database. But that's not all.

For its part, Spanner promises a brave new alternative while still keeping to old standards, allowing a distributed database to seemingly break the CAP theorem proposed by UC Berkeley computer scientist Eric Brewer in the late 1990s.

The theorem itself is simple to state, but often misunderstood. It says that any distributed system, such as a database, can guarantee only two of the following three properties: consistency, availability and partition tolerance.

Basically, if you have two or more computers in a system and communication between them breaks, your system will either be inconsistent (each computer giving different answers) or be unavailable to answer queries.

NoSQL systems generally fall into one of two categories: either they don’t answer if there is a partition break (MongoDB, for instance) or they let nodes on either side of the partition give a different answer (Cassandra, for instance). Ask most experienced DB admins and they will tell you that neither is acceptable.
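The two behaviours can be illustrated with a toy two-replica store. This is only a sketch under simplifying assumptions: the class names, the "CP" and "AP" mode labels and the single-value data model are invented for illustration and do not reflect how MongoDB or Cassandra are actually implemented.

```python
class Replica:
    """One node holding a copy of a single value."""

    def __init__(self, value):
        self.value = value


class TwoNodeStore:
    """Toy two-replica store. mode is 'CP' or 'AP' (illustrative labels)."""

    def __init__(self, mode, initial=0):
        self.mode = mode
        self.nodes = [Replica(initial), Replica(initial)]
        self.partitioned = False

    def write(self, node_id, value):
        if not self.partitioned:
            # Healthy network: every replica receives the write.
            for node in self.nodes:
                node.value = value
            return True
        if self.mode == "CP":
            # Consistency first: refuse the write rather than diverge.
            return False
        # Availability first: accept locally, so the replicas now disagree.
        self.nodes[node_id].value = value
        return True

    def read(self, node_id):
        return self.nodes[node_id].value


def demo(mode):
    store = TwoNodeStore(mode)
    store.write(0, 1)          # replicated to both nodes
    store.partitioned = True   # the link between the nodes breaks
    accepted = store.write(0, 2)
    return accepted, store.read(0), store.read(1)


print(demo("CP"))  # (False, 1, 1): write refused, replicas still agree
print(demo("AP"))  # (True, 2, 1): write accepted, replicas disagree
```

In the first mode the store stays consistent but turns the write away; in the second it stays available but the two nodes give different answers.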

If you need both the same answer across the whole system and constant availability, you can’t allow a partition to happen. That's common sense.

Traditional relational database systems do this by having only one master, the keeper of the truth, and slaves that keep copies.

Spanner is seen as the solution for this problem: a system that is available, consistent and 100 percent reliable. But is it?

Eric Brewer, of CAP theorem fame, is now employed by Google. While he is not directly involved in the Spanner project itself, a whitepaper from Brewer makes it very clear that Spanner does not break the CAP theorem and is not 100 percent reliable either.

So does that pose an issue? Not really, since Spanner is just so available it might as well be 100 percent reliable. But read on.

The reason for this is that Google owns the entire infrastructure running Spanner, and the Spanner network carries no data other than Google’s.

Google cites availability of better than 99.999 percent ("five nines") for Spanner, which means that as a customer you can treat it as a system that will always be consistent and available. You can treat it just like your reliable relational database.
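To put availability figures of this kind in perspective, the short calculation below converts a percentage into expected downtime per year. It is plain arithmetic that assumes a 365-day year; the function name is ours.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # ignores leap years


def downtime_seconds_per_year(availability_percent):
    """Expected unavailable seconds per year at the given availability."""
    return SECONDS_PER_YEAR * (1 - availability_percent / 100)


for pct in (99.9, 99.99, 99.999, 99.9999):
    print(f"{pct}% -> {downtime_seconds_per_year(pct):.1f} seconds per year")
```

Five nines works out to a little over five minutes of downtime a year, and six nines to roughly half a minute.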

However, there will be the occasional partition that leaves Google engineers scratching their heads. In that case, because of the way Spanner works, one side of the partition will be fine and carry on as usual, while the other side will be unavailable. It's the nature of the beast.

But it’s possible that both sides will be able to read data, if you have access to the network of course. So far, so good, but there are some potential problems with this, and that's what you need to be aware of.

One is caused by the way Spanner implements distributed transactions, which relies on a consensus protocol called Paxos.

As used here, Paxos relies on a “group leader” and periodic elections for that leader. This can cause an issue if the leader fails: you might need to wait for a new election before transactions can continue, or wait for the failed leader to be restarted.
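The waiting problem can be sketched with a toy model of a leader that holds a time-limited lease. This is not Paxos: the election is collapsed into picking the first healthy member, time is a plain number passed in by the caller, and every name and the lease length are hypothetical.

```python
class LeaderGroup:
    """Toy replica group with a leased leader. Not real Paxos."""

    def __init__(self, members, lease_length=10):
        self.members = list(members)
        self.lease_length = lease_length   # hypothetical time units
        self.leader = self.members[0]
        self.lease_expires = lease_length
        self.failed = set()

    def fail(self, member):
        self.failed.add(member)

    def try_write(self, now):
        """Return the leader that served the write, or None if it must wait."""
        if self.leader not in self.failed:
            # Healthy leader: serve the write and renew the lease.
            self.lease_expires = now + self.lease_length
            return self.leader
        if now < self.lease_expires:
            # The old lease is still valid, so no new leader may be chosen yet.
            return None
        # Lease expired: pick the first healthy member and grant a new lease.
        healthy = [m for m in self.members if m not in self.failed]
        if not healthy:
            return None
        self.leader = healthy[0]
        self.lease_expires = now + self.lease_length
        return self.leader


group = LeaderGroup(["a", "b", "c"])
print(group.try_write(0))    # served by the initial leader
group.fail("a")
print(group.try_write(5))    # None: the failed leader's lease has not expired
print(group.try_write(10))   # served by the newly chosen leader
```

Between the leader's failure and the expiry of its lease, writes in this model simply have to wait, which is the delay described above.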

Another potential problem is that Spanner isn't a true relational database per se; it’s a key-value store with semi-relational tables. Every table must have an ordered set of one or more primary-key columns, and those primary keys form the name of each row.

This has a direct effect on the SQL-like language used to interact with Spanner: it’s very similar to SQL but different enough to cause problems for experienced SQL users, as we noted earlier.

In particular, when creating tables, the DB admin must define how tables are “interleaved” to describe the locality relationships between them. Get this wrong and there is a price to pay in performance: your system just won’t run as fast as you need it to, especially if it is globally distributed.
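One way to picture interleaving is as a sort order over keys. The sketch below is our own simplification, not Spanner's storage format: it assumes a child row's key begins with its parent's key, so that sorting by key places each child directly after the parent it belongs to. The "users" and "albums" data is hypothetical.

```python
def interleaved_order(parents, children):
    """Return rows in the order an interleaved layout would store them.

    parents:  list of (parent_id, payload)
    children: list of (parent_id, child_id, payload)
    Each child's key starts with its parent's key, so sorting by key
    places every child row directly after its parent row.
    """
    rows = [((pid,), "parent", data) for pid, data in parents]
    rows += [((pid, cid), "child", data) for pid, cid, data in children]
    return sorted(rows, key=lambda row: row[0])


users = [(2, "bob"), (1, "ann")]
albums = [(2, 1, "bob-holiday"), (1, 2, "ann-work"), (1, 1, "ann-family")]

for key, kind, data in interleaved_order(users, albums):
    print(key, kind, data)
```

Rows that are read together end up next to each other. If the interleaving does not match the way the application queries its data, related rows are scattered and each lookup has to reach further, which is where the performance cost comes from.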

Google does admit this in its original whitepaper, saying there is room for improvement in the way Spanner handles complex SQL queries and that the problem lies in the way each node handles data.

But maybe (we are hoping) this has improved since the original paper. Time will tell. But for now, Spanner does have some useful features up its sleeve thanks to the use of Google’s TrueTime.

TrueTime is an implementation of synchronized clocks that uses GPS receivers and atomic clocks in every data center. This can cause issues during a partition: if a node can’t connect to a time master, its clock will drift, causing the election of Paxos leaders to slow down.
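The effect of drift can be sketched with a toy clock that reports an interval rather than a single instant. The class, its method names and the numbers (1 ms of base uncertainty, 200 microseconds of assumed drift per second) are illustrative assumptions, not Google's API.

```python
from dataclasses import dataclass


@dataclass
class Interval:
    earliest: float
    latest: float


class ToyTrueTime:
    """Toy clock whose uncertainty grows while it cannot resynchronize."""

    def __init__(self, base_uncertainty=0.001, drift_rate=0.0002):
        self.base_uncertainty = base_uncertainty  # seconds (assumed)
        self.drift_rate = drift_rate              # drift per second (assumed)
        self.last_sync = 0.0

    def sync(self, local_time):
        """Record a successful synchronization with a time master."""
        self.last_sync = local_time

    def uncertainty(self, local_time):
        """Half-width of the interval: base error plus accumulated drift."""
        return self.base_uncertainty + self.drift_rate * (local_time - self.last_sync)

    def now(self, local_time):
        """Return the interval that is assumed to contain the true time."""
        eps = self.uncertainty(local_time)
        return Interval(local_time - eps, local_time + eps)


clock = ToyTrueTime()
print(clock.uncertainty(30))    # 30 seconds after the last sync
print(clock.uncertainty(300))   # 5 minutes cut off from the time master
```

The longer the node is cut off, the wider the interval becomes, and any step that has to wait out that uncertainty takes correspondingly longer.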

But TrueTime does allow schema changes to be scheduled for a later date, with both schemas running at the same time until the switch to the new schema takes place.
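A rough way to think about a scheduled schema change is as a lookup keyed by timestamp: each operation uses whichever schema is in force at its own timestamp. The sketch below is a hypothetical illustration of that idea only, with invented names, and says nothing about how Spanner implements it.

```python
import bisect


class VersionedSchema:
    """Toy schema registry where each version takes effect at a timestamp."""

    def __init__(self, initial_columns):
        self._times = [0]
        self._schemas = [tuple(initial_columns)]

    def schedule_change(self, effective_at, columns):
        """Register a schema that applies from effective_at onward."""
        index = bisect.bisect_right(self._times, effective_at)
        self._times.insert(index, effective_at)
        self._schemas.insert(index, tuple(columns))

    def schema_at(self, timestamp):
        """Return the schema in force at the given timestamp."""
        if timestamp < 0:
            raise ValueError("timestamp must be non-negative")
        index = bisect.bisect_right(self._times, timestamp) - 1
        return self._schemas[index]


schema = VersionedSchema(["id", "name"])
schema.schedule_change(100, ["id", "name", "email"])
print(schema.schema_at(99))    # operations before the switch see the old schema
print(schema.schema_at(100))   # operations from the switch onward see the new one
```

Because both versions are registered ahead of time, work stamped before the switch and work stamped after it can proceed side by side.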

That could certainly be helpful for organizations heavily invested in DevOps. Database schema changes are always a major issue in such cases, and rolling back a bad schema change in particular. Being able to run both schemas at the same time would be a real plus.

However, there is good news. Google Spanner does represent a real improvement in distributed database systems. It’s not a direct replacement for relational SQL databases, though, as it does not appear that you will be able to simply port a SQL application onto Spanner.

There are still some changes to be made to the way data tables are defined and to the syntax of the SQL used to store and retrieve data, and that's something Google needs to address soon.

But the real question is how many organizations actually need access to a globally scalable relational database? Whatever happens with that, we'll keep you in the loop.

Source: Google.

All logos, trade marks or service marks on this site are the property of their respective owners.

       © IT Direction. All rights reserved.