TCLP 2010-04-28 Rant: NoSQL

This is a feature cast, an episode of The Command Line Podcast.

In the intro, I thank Josh for his donation and ask for ideas for a premium for larger donations. I’m looking for something as unique and distinctive as the merit badges, something appropriate for donations of $50 or more and for monthly donations of $5 or more.

There is no listener feedback this week.

The hacker word of the week this week is fat-finger.

The feature this week is a rant where I try to get to exactly what it is that bugs me about NoSQL. In it, I refer to my hacking 101 piece on databases.


Grab the detailed show notes with time offsets and additional links either as PDF or OPML. You can also grab the FLAC-encoded audio from the Internet Archive.


This work is licensed under a Creative Commons Attribution-Share Alike 3.0 United States License.

4 Replies to “TCLP 2010-04-28 Rant: NoSQL”

  1. Just some feedback on ORM vs NoSQL.

    I don’t disagree with a lot of your argument in favor of RDBMSes, but there are a few points that I have personally encountered that I would like to bring up.

    First, relational and ORM systems certainly provide a flexibility of use for data that NoSQL lacks. I think a lot of your points boil down to up-front design of the data space for an application. If you do this work diligently, a lot of flexibility comes with it.

    However, having dealt with Google App Engine and porting applications, you get an intimate, detailed account of how your ORM interacts with a large NoSQL database, in a way that you rarely see with RDBMSes. The ORM becomes, in the traditional sense, a layer of abstraction that you have to understand in order to develop an app effectively. With less structured or object-oriented databases, you understand it at a more fundamental level, and “queries” reflect their real-world cost more directly than they do in an ORM system.

    Systems like GAE, which attach a real monetary cost to calls, disillusion a lot of programmers not just about the supposedly trivial cost of their queries, but also about the mapping costs in CPU time. Hibernate or TopLink are great for bootstrapping an application. Yes, you can tweak them for performance. But something like App Engine, with its mapping of a traditional ORM onto a NoSQL system, can expose performance problems the traditional developer might not see.

  2. There were a few mistakes in the perception of nosql databases. First of all, the advantage of nosql is not that it does something SQL databases “cannot do”. It does distribution of data out of the box; that is, distribution is so simplified and so ingrained in the product that you don’t even think twice about it. With SQL databases, sharding and distribution are an afterthought. Not that you cannot DO these things with SQL databases; it’s just that with nosql these tasks are SIMPLER, included in the product from day one.

    There are pedagogical issues at play here, which are almost as important as the technological ones.

    The same is true for basic CRUD operations. They are SIMPLER with nosql than they are with SQL dbs. With Google Bigtable, I define Model classes in Python, send them over to the cloud, and I _have_ a database. Following pointers, as in order.owner.address.street, is very simple to do and built in, in contrast to SQL databases, where you have to use something like Hibernate to achieve the same result.

    Plus, nosql makes you conscious of sharding your data from day one; since joins are discouraged, you think distribution, and you have to think big. Sure, for small Web sites with a small number of users you can use one database and keep using joins, but you can also use one nosql shard with LESS complicated queries (meaning no joins) and achieve the same result.

  3. More comments on this issue: Can SQL databases become like nosql databases if they were designed to be that way? Sadly no. There are certain expectations we have when dealing with SQL databases. Logically, a TABLE is something that is “appended to” without delay / conflict, and joins bring back results pretty fast.

    However, when a database grows beyond a certain size, we all know the rules change. If we use some vendor-specific way to distribute a database (which is a problem unto itself), then we kind of know in the back of our mind that joins are not okay anymore. And there is the problem of the distribution itself: how does it take place? Which vendor-specific gizmo shebang takes care of it?

    With nosql databases, the unit of distribution is the object. We apply a mod() function to the key, we end up with a shard, and that is where the data for that key goes. Simple. We _know_ this takes place for that (or any) key. It is out-of-the-box functionality.

    The problem with SQL databases is that we (the developers) are expected to bow before the relational gods for our daily needs, but then, when things get too big, we need to think “different” and pray differently. Well, nosql dbs say: think big from day one, small is taken care of, and distribution is OK.

    It doesn’t matter if I have a small database for my application XYZ; if I am hosting this app on the cloud, then I am _already_ part of a BIG database, making use of facilities that were beyond my reach previously.

    My point on pedagogical issues is related to this small/big divide: it is easier to learn one set of technology that handles big and small the same way than to learn a technology that handles small fine (in a complicated way: SQL, barf) and big in a totally vendor-specific way (double barf). Things like Cassandra and BigTable say we handle big and small the _same_ way, and on the cloud, well, even backups, failover, etc. are taken care of.

    I think any SQL database that tries to copy this will end up being a nosql database itself.
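
The first two comments both turn on what it costs to follow a path like order.owner.address.street. The sketch below uses only Python's standard sqlite3 module and a hypothetical three-table schema to contrast three ways of doing it: navigating one hop at a time, which is roughly what a lazily loading ORM does behind the dot (the helper functions are illustrative and are not Hibernate's, TopLink's, or App Engine's API); expressing the path as one query with two joins; and reading it out of a denormalized, document-style record. Connection.set_trace_callback is used to count the statements SQLite actually executes.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """In-memory, normalized schema: orders -> owner -> address (hypothetical)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE address (id INTEGER PRIMARY KEY, street TEXT);
        CREATE TABLE owner   (id INTEGER PRIMARY KEY, name TEXT,
                              address_id INTEGER REFERENCES address(id));
        CREATE TABLE orders  (id INTEGER PRIMARY KEY,
                              owner_id INTEGER REFERENCES owner(id));
        INSERT INTO address VALUES (1, '12 Elm St'), (2, '9 Oak Ave');
        INSERT INTO owner   VALUES (1, 'alice', 1), (2, 'bob', 2);
        INSERT INTO orders  VALUES (10, 1), (11, 2), (12, 1);
    """)
    return conn


def count_statements(conn, work):
    """Run work(conn); return (result, number of SQL statements SQLite ran)."""
    seen = []
    conn.set_trace_callback(seen.append)
    try:
        result = work(conn)
    finally:
        conn.set_trace_callback(None)
    return result, len(seen)


def one(conn, sql, *params):
    """Fetch a single value; each call is one round trip to the database."""
    return conn.execute(sql, params).fetchone()[0]


def streets_lazy(conn):
    """ORM-style navigation: each hop after loading the orders is its own query."""
    streets = []
    orders = conn.execute("SELECT id, owner_id FROM orders ORDER BY id").fetchall()
    for _order_id, owner_id in orders:
        # order.owner
        address_id = one(conn, "SELECT address_id FROM owner WHERE id = ?", owner_id)
        # owner.address.street
        streets.append(one(conn, "SELECT street FROM address WHERE id = ?", address_id))
    return streets


def streets_joined(conn):
    """The same path expressed as a single query with two joins."""
    rows = conn.execute("""
        SELECT address.street
        FROM orders
        JOIN owner   ON owner.id   = orders.owner_id
        JOIN address ON address.id = owner.address_id
        ORDER BY orders.id
    """)
    return [street for (street,) in rows]


# Document-style alternative: owner and address are embedded in each order,
# so the path is plain key access, at the price of duplicating the address.
ORDER_DOCS = [
    {"id": 10, "owner": {"name": "alice", "address": {"street": "12 Elm St"}}},
    {"id": 11, "owner": {"name": "bob", "address": {"street": "9 Oak Ave"}}},
    {"id": 12, "owner": {"name": "alice", "address": {"street": "12 Elm St"}}},
]


def streets_embedded(docs):
    return [d["owner"]["address"]["street"] for d in docs]


if __name__ == "__main__":
    conn = make_db()
    lazy, lazy_n = count_statements(conn, streets_lazy)
    joined, joined_n = count_statements(conn, streets_joined)
    print("hop by hop:", lazy, lazy_n, "statements")
    print("joined:    ", joined, joined_n, "statements")
    print("embedded:  ", streets_embedded(ORDER_DOCS))
```

With three orders, the hop-by-hop version runs one query for the orders plus two per order, while the join runs one; that gap is the kind of hidden mapping cost the first comment says a metered platform like GAE brings to the surface. The embedded version needs no join at all, but every order now carries its own copy of the address, which has to be kept in sync when it changes.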

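The mod() routing described in the third comment can be sketched in a few lines of Python. This is a minimal illustration under assumptions of its own: keys are strings, and each key is hashed with SHA-256 before taking the modulo so that keys spread evenly and land on the same shard in every process (Python's built-in hash() of a str is randomized per process). The function names are hypothetical and do not come from Cassandra, BigTable, or any other product.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index in [0, num_shards): stable hash, then modulo."""
    # SHA-256 is used only to spread keys evenly and deterministically.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def route(keys, num_shards):
    """Group keys by the shard each one maps to."""
    shards = {i: [] for i in range(num_shards)}
    for key in keys:
        shards[shard_for(key, num_shards)].append(key)
    return shards


def moved_keys(keys, old_shards, new_shards):
    """Keys whose shard changes when the shard count changes."""
    return [k for k in keys if shard_for(k, old_shards) != shard_for(k, new_shards)]


if __name__ == "__main__":
    keys = [f"user:{i}" for i in range(1000)]
    print({shard: len(ks) for shard, ks in route(keys, 4).items()})
    moved = moved_keys(keys, 4, 5)
    print(len(moved), "of", len(keys), "keys move when going from 4 to 5 shards")
```

As long as the shard count stays fixed, a given key always lands on the same shard, which is the "we _know_ this takes place for any key" property the comment relies on. Plain modulo has a known cost, though: changing the shard count remaps most keys, which is why stores such as Cassandra place keys on a consistent-hashing ring instead.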
Mentions

  • Database Joins, Reddit, NoSQL « Bitratchet
