Large-scale persistent key-value stores are rather fashionable right now.
For anyone who has missed this, there is a survey from Richard Jones at last.fm of some of the main software projects out there, although it misses a few, like Tokyo Cabinet (there is a lot going on in this area in Japan).
Scaling has suddenly become something that a lot of people have to do, and they are working in an open-source world, so many of the things they build get released out into the world. Many of these projects come out of the big web sites, or are products for building scalable services, like Amazon's SimpleDB. Scaling a relational database beyond the point where replication alone works gets difficult, and replication only improves read performance while leaving a write bottleneck. Sharding is the usual approach: partition the data across multiple machines. But it removes the ability to perform joins, as these now cross machines, and once you lose joins you have lost the theoretical basis of the relational model.
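To make the trade-off concrete, here is a minimal sketch of hash-based sharding, assuming in-memory dicts stand in for the per-machine backends; the names `ShardedStore`, `put` and `get` are illustrative, not any particular product's API.

```python
# Hash-based sharding sketch: each key is routed deterministically to
# one shard, so single-key reads and writes scale out, but any query
# that combines keys on different shards must be joined in app code.
import hashlib

class ShardedStore:
    def __init__(self, shards):
        # `shards` is a list of dict-like backends, one per machine
        self.shards = shards

    def _shard_for(self, key):
        # Hash the key to pick a shard deterministically
        h = int(hashlib.md5(key.encode()).hexdigest(), 16)
        return self.shards[h % len(self.shards)]

    def put(self, key, value):
        self._shard_for(key)[key] = value

    def get(self, key):
        return self._shard_for(key).get(key)

store = ShardedStore([{}, {}, {}])
store.put("user:42", {"name": "alice"})
store.put("order:7", {"user": "user:42", "total": 10})
# user:42 and order:7 may live on different machines; there is no
# cross-shard join, so relating them is the application's problem.
```

The write bottleneck goes away because each shard takes only its slice of the traffic; the price is exactly the lost join described above.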
The key-value stores that exist are more or less persistent, distributed, high-performance versions of basic data structures. Two forms dominate. The first is the simple hash table, a pure key-value store, as with MemcacheDB, which is built on Berkeley DB (perhaps the oldest persistent key-value store) but adds a network interface rather than just a language API. These give very fast lookup and storage of values associated with a key. That's it: one single value per key. The other form is B-tree based, which generally adds the ability to do range queries, so with careful selection of keys you can return multiple values from ranges, as in CouchDB. Pretty simple then.
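The difference between the two forms can be sketched in a few lines, assuming a sorted in-memory key list as a stand-in for the B-tree; this is an illustration of the access pattern, not any real store's interface.

```python
# A hash-table store gives only get/put on exact keys. A B-tree style
# store keeps keys ordered, so a range of keys can be scanned; here a
# sorted list maintained with bisect stands in for the B-tree.
import bisect

class RangeStore:
    def __init__(self):
        self._keys = []   # kept sorted, standing in for a B-tree
        self._data = {}

    def put(self, key, value):
        if key not in self._data:
            bisect.insort(self._keys, key)
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)

    def range(self, start, end):
        # All (key, value) pairs with start <= key < end, in key order
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, end)
        return [(k, self._data[k]) for k in self._keys[lo:hi]]

s = RangeStore()
for day in ["2009-02-01", "2009-02-03", "2009-03-01"]:
    s.put("post:" + day, "entry")
# careful key design (the date embedded in the key) is what makes the
# range query meaningful: all of February's posts in one scan
feb = s.range("post:2009-02", "post:2009-03")
```

The "careful selection of keys" point is the whole trick: the store only understands key order, so you encode your query into the key layout up front.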
Although relational databases are basically built on B-trees, for most applications there is a huge gap in working with structure at this level. Normalisation is gone, and with it that part of the storage design process. Some applications, or parts of applications, map easily, and these easy parts are where these technologies are getting most use now. Some people are applying denormalisation with relational database backends too (Flickr stores data twice in some cases). So far there is no data model, so it is expensive custom work in each situation to map onto these scalable storage models.
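A minimal sketch of what denormalisation looks like in a key-value world, assuming a plain dict as the store; the key names and the `add_comment` helper are made up for illustration.

```python
# Without joins, data is written once per access path instead of once
# per fact: the same comment is stored under both the post and the
# user, because nothing will assemble it from a single copy at read time.
store = {}

def add_comment(post_id, user_id, text):
    comment = {"post": post_id, "user": user_id, "text": text}
    # copy 1: comments listed under the post
    store.setdefault("comments_by_post:" + post_id, []).append(comment)
    # copy 2: the same comment listed under the user
    store.setdefault("comments_by_user:" + user_id, []).append(comment)

add_comment("p1", "u9", "nice photo")
# both read paths are now a single key lookup, at the cost of
# duplicated data that the application must keep consistent itself
```

This is the custom work referred to above: every access path becomes a storage design decision, and keeping the copies consistent is on the programmer.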
As a side note, there used to be a lot of discussion about integrating SQL into programming languages as a native data type. But the only language the relational model really works with is Prolog, as it has a matching native structure. Key-value stores match much better to native language structures, but mainly because of their weaker expressive structure.
The relational database did well because the model struck the right balance: expressive enough for people, constrained enough that computers could optimise much of the work. That balance is generally hard to strike. Key-value stores are putting all of that work back on the programmer right now. Some projects are starting on the next stages; CouchDB views, for example, get a few things right that other frameworks do not look at. But there is a lot of work to do here.
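For a rough sense of the idea behind a map-style view, here is a sketch in Python, assuming documents are plain dicts; it illustrates the concept only and is not CouchDB's actual (JavaScript-based) view API.

```python
# A view applies a map function to every document, collects the emitted
# (key, value) pairs, and keeps them sorted by key, so the result can be
# range-queried like the B-tree stores above.
def build_view(docs, map_fn):
    rows = []
    for doc in docs:
        rows.extend(map_fn(doc))
    return sorted(rows)

docs = [
    {"type": "post", "date": "2009-02-01", "title": "scaling"},
    {"type": "comment", "date": "2009-02-02"},
    {"type": "post", "date": "2009-03-01", "title": "stores"},
]

def posts_by_date(doc):
    # emit one row per post, keyed by date; non-posts emit nothing
    if doc["type"] == "post":
        yield (doc["date"], doc["title"])

view = build_view(docs, posts_by_date)
```

What the view gets right is that the index is declared once, as a function over documents, rather than being hand-maintained on every write as in the denormalisation sketch above.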