Four types of NoSQL Databases you should learn now
So you’ve heard about MongoDB, CouchDB, Cassandra, Riak, Redis, Neo4j, InfiniteGraph, Voldemort, and IBM Cloudant as new classes of databases, but you’re not sure what the differences between them are, or how they differ from the traditional SQL-type databases you learned about several years ago in school. Well, you’re not alone.
It’s been some time since I worked with databases. Most of my recent work has been theoretical performance analysis of wireless networks and solving mathematical optimization problems, where I end up writing these types of academic technical papers. Lately though, I found my morbidly curious self reading web and mobile application blogs, articles and whitepapers.
I’ve also decided to apply my quantitative analysis and Python programming skills in the exciting area of Big Science, where I now work @BigDataAnalytics Lab, @UNLV. I’ll write about my progress in this new field later. In this short article, however, I’ll give a brief synopsis of the NoSQL and DBaaS course I took this weekend at the IBM-sponsored Big Data University.
I will assume you are familiar with SQL-type RDBMS databases. In case you’re wondering, NoSQL databases are distinctly different from traditional row-column RDBMS (SQL-type) databases.
NoSQL features, use cases, limitations and examples
I’ll discuss the four major types of NoSQL databases, including the primary use cases, major architectural differences, advantages, and limitations of each. There are four varieties of NoSQL databases you should be aware of:
- Key-Value type
- Document type
- BigTable or Column-oriented type
- Graph-based type.
These four types of NoSQL database each have features that make them fit for different types of applications. In fact, you may end up using more than one of these types in a single application. One thing they have in common is that they are mostly developed as open-source technologies and are more developer-friendly than RDBMS databases.
Key-value types are the least complex of the NoSQL databases. If you’re familiar with a Python dictionary, then you know what key-value means. All data is stored with a key and an associated value blob.
Because Key-Value stores are represented as a hashmap, they’re powerful for basic Create-Read-Update-Delete operations, and these databases typically scale quite well and shard easily across any number of nodes.
They’re great when quick performance is required and the data are not connected.
They are not meant for complex queries that connect multiple pieces of data, and are suited to single-key operations only. When there are many-to-many relationships in the data, a Key-Value store is likely to exhibit poor performance.
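To make the key-value model concrete, here is a minimal sketch in Python. It is an illustration only, not how any particular product works: the class name `ShardedKeyValueStore`, the shard count, and the MD5-based placement are all assumptions chosen for the example, and each "node" is simply an in-memory dictionary.

```python
import hashlib


class ShardedKeyValueStore:
    """Toy key-value store that spreads keys across in-memory shards."""

    def __init__(self, num_shards=3):
        # Each shard is a plain dict standing in for one node of a cluster.
        self.shards = [{} for _ in range(num_shards)]

    def _shard_for(self, key):
        # Hash the key so the same key always lands on the same shard.
        digest = hashlib.md5(key.encode("utf-8")).hexdigest()
        return self.shards[int(digest, 16) % len(self.shards)]

    def put(self, key, value):
        # Create or update: the value is an opaque blob to the store.
        self._shard_for(key)[key] = value

    def get(self, key, default=None):
        # Read by key only; there is no way to query inside the value.
        return self._shard_for(key).get(key, default)

    def delete(self, key):
        # Delete; removing a missing key is silently ignored.
        self._shard_for(key).pop(key, None)


store = ShardedKeyValueStore(num_shards=3)
store.put("session:42", {"user": "alice", "cart": ["book"]})
session = store.get("session:42")
store.delete("session:42")
```

Every operation touches exactly one key and therefore one shard, which is why this model scales out easily and also why queries that relate several keys to each other are a poor fit.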
When would be the best time to incorporate a Key-Value store? Well, anytime you need quick performance for basic Create-Read-Update-Delete operations and your data is not connected. For example:
- Storing and retrieving session information for a Web application.
- Storing user profiles and preferences within an application.
- Storing shopping cart data for online stores or marketplaces.
Example Key-Value NoSQL Databases
Document databases store data as documents (e.g. JSON, XML, BSON). They build on the Key-Value model by making the value visible to queries. They have a very flexible schema and, unlike in an RDBMS, documents need not all have the same fields or length.
A first example use case is event logging for an application or process. Each event would constitute a new document or aggregate, containing all the information corresponding to that event.
Another would be online blogging. Each user would be represented as a document; each post a document; and each comment, like, or action would be a document. All documents would contain information about the type of data, such as username, post content, or timestamp when the document was created.
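The blogging example above can be sketched with plain Python dictionaries standing in for JSON documents. This is only a rough illustration: the field names, the sample values, and the `find` helper are made up for the example and are not the API of any particular document database.

```python
import json

# Each document is self-describing and may carry different fields.
documents = [
    {"type": "user", "username": "alice", "created": "2015-03-01T10:00:00"},
    {"type": "post", "username": "alice", "content": "Hello NoSQL",
     "created": "2015-03-01T10:05:00"},
    {"type": "comment", "username": "bob", "content": "Nice post",
     "created": "2015-03-01T10:07:00"},
]


def find(docs, **criteria):
    """Return the documents whose fields match every given key/value pair."""
    return [
        doc for doc in docs
        if all(doc.get(field) == value for field, value in criteria.items())
    ]


# Unlike a pure key-value store, we can query on what is inside the value.
alice_docs = find(documents, username="alice")
posts = find(documents, type="post")

# Documents serialize naturally to JSON for storage or transport.
serialized = json.dumps(posts[0])
```

Because the value is visible to queries, the same collection can be searched by author, by document type, or by any other field without defining a schema first.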
More generally speaking, document stores work well with working datasets for Web and mobile applications. They were designed with the internet in mind: think JSON, RESTful APIs, and unstructured data.
Document stores have traditionally offered limited or no support for transactions that operate over multiple documents, so a relational database may be a better choice when you need them.
Document databases may not be the right choice if you find yourself forcing your data into an aggregate-oriented design. If it naturally falls into a normalized/tabular model, this would be another time to research relational databases instead.
Document databases are the most popular of the NoSQL databases in use today, and below you see some of the more common document databases.
Example Document-type NoSQL Databases
Column-Family databases grew out of an architecture that Google created called BigTable. These databases are also commonly called BigTable clones or Columnar databases. As you can tell from the name, these databases focus on columns and groups of columns when storing and accessing data.
A column family consists of multiple rows, each with a unique key or identifier, and each holding one or more columns. These columns are grouped together in families because they are often accessed together.
It’s also important to point out that rows in a column family are not required to share any of the same columns. They can share all, a subset, or none of the columns, and a column can be added to some rows and not to others.
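As a rough mental model (and only that), a column family can be pictured as a dictionary of row keys, where each row is itself a dictionary of columns. The table contents and the `read_columns` helper below are invented for illustration and do not correspond to any specific database's API.

```python
# Row key -> {column name -> value}. Rows need not share any columns.
users = {
    "user:1": {"name": "Alice", "email": "alice@example.com"},
    "user:2": {"name": "Bob", "phone": "555-0100"},
    "user:3": {"last_login": "2015-03-01"},
}

# A column can be added to one row without touching the others.
users["user:1"]["city"] = "Las Vegas"


def read_columns(table, row_key, columns):
    """Fetch only the requested columns that actually exist in the row."""
    row = table.get(row_key, {})
    return {name: row[name] for name in columns if name in row}


# Only columns that are present are stored, so sparse data costs nothing
# for the columns a row does not have.
all_columns = sorted({name for row in users.values() for name in row})
bob_contact = read_columns(users, "user:2", ["name", "email", "phone"])
```

Here `bob_contact` contains only `name` and `phone`, because the row for `user:2` simply has no `email` column.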
Column-Family databases are great for when you’re dealing with large amounts of sparse data. When compared to row-oriented databases, Column-Family databases can better compress data and save storage space. In addition, these databases continue the trend of horizontal scalability. Like Key-Value and Document stores, these databases can handle being deployed across clusters of nodes.
Some example use cases for a Column-Family database include event logging and blogs, similar to document databases, but the data would be stored in a different fashion.
For enterprise logging, every application can write to its own set of columns, with each row key formatted in a way that makes lookups by application and timestamp easy.
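One way to picture such a row key is sketched below. The `app#timestamp` format, the sample log entries, and the `make_row_key` and `scan` helpers are assumptions made for this example. ISO-8601 timestamps are used because they sort chronologically as plain strings, which lets a sorted key list stand in for the ordered storage a real system would provide.

```python
from bisect import bisect_left, bisect_right


def make_row_key(app, timestamp):
    """Build a row key that sorts first by application, then by time."""
    return f"{app}#{timestamp}"


log_rows = {
    make_row_key("billing", "2015-03-01T10:00:00"): {"level": "INFO", "msg": "invoice sent"},
    make_row_key("billing", "2015-03-01T12:30:00"): {"level": "ERROR", "msg": "card declined"},
    make_row_key("search", "2015-03-01T11:00:00"): {"level": "INFO", "msg": "index rebuilt"},
}


def scan(rows, app, start, end):
    """Return (key, row) pairs for one app between start and end, inclusive."""
    keys = sorted(rows)
    lo = bisect_left(keys, make_row_key(app, start))
    hi = bisect_right(keys, make_row_key(app, end))
    return [(key, rows[key]) for key in keys[lo:hi]]


morning_billing = scan(
    log_rows, "billing", "2015-03-01T00:00:00", "2015-03-01T11:59:59"
)
```

Because the application name is the key prefix, a lookup for one application in one time window becomes a single contiguous range over the sorted keys.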
Counters are a unique use case. You may come across applications that need an easy way to count or increment as events occur. Some Column-Family databases, like Cassandra, have special column types that allow for simple counters. In addition, columns can have a time-to-live parameter, making them useful for data with an end date, like trial periods or ad timing.
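The two ideas in this paragraph, counters and time-to-live, can be imitated in a few lines of Python. This is a sketch of the concepts only and not Cassandra's API: the `ExpiringColumns` class is hypothetical, and the current time is passed in explicitly as `now` (in seconds) so that the behaviour is easy to follow.

```python
from collections import Counter

# A counter column: increment as events occur, no read-modify-write by hand.
page_views = Counter()
for page in ["home", "pricing", "home"]:
    page_views[page] += 1


class ExpiringColumns:
    """Columns that disappear once their time-to-live has elapsed."""

    def __init__(self):
        self._data = {}  # column name -> (value, expiry time in seconds)

    def set(self, name, value, ttl_seconds, now):
        self._data[name] = (value, now + ttl_seconds)

    def get(self, name, now):
        entry = self._data.get(name)
        if entry is None:
            return None
        value, expires_at = entry
        if now >= expires_at:
            # The column has expired, so drop it and report it as absent.
            del self._data[name]
            return None
        return value


trial = ExpiringColumns()
trial.set("trial_active", True, ttl_seconds=30 * 24 * 3600, now=0)
during_trial = trial.get("trial_active", now=10 * 24 * 3600)
after_trial = trial.get("trial_active", now=31 * 24 * 3600)
```

In this sketch the trial flag is readable ten days in and gone after thirty-one days, without any separate clean-up job.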
Examples of Column-family NoSQL Databases
Graph Databases are the last NoSQL variation to review. This database type stands apart from the previous three because it doesn’t share a few of the common traits seen so far.
From a high level, Graph Databases store information in entities, or nodes, and relationships, or edges. Graph databases are phenomenal when your data set resembles a graph-like data structure. Traversing all the relationships is quick and efficient, but these databases tend not to scale as well horizontally. Graph Databases can be very powerful when your data is highly connected and related in some way.
Social networking sites can benefit by quickly locating friends, friends of friends, likes, and so on. Routing, spatial, and map applications may use graphs to easily model their data for finding nearby locations or building shortest routes for directions.
Lastly, recommendation engines can leverage the close relationships and links between products to easily give other options to their customers.
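The friends-of-friends and shortest-route use cases above come down to walking edges between nodes. The sketch below uses a plain adjacency dictionary and a breadth-first search. The names and the tiny social graph are made up, and a real graph database would offer its own query language rather than these hypothetical helper functions.

```python
from collections import deque

# Node -> set of directly connected nodes (an undirected friendship graph).
friends = {
    "alice": {"bob", "carol"},
    "bob": {"alice", "dave"},
    "carol": {"alice", "dave"},
    "dave": {"bob", "carol", "erin"},
    "erin": {"dave"},
}


def friends_of_friends(graph, person):
    """People exactly two hops away: friends of friends, minus direct friends."""
    direct = graph.get(person, set())
    two_hops = set()
    for friend in direct:
        two_hops |= graph.get(friend, set())
    return two_hops - direct - {person}


def shortest_path(graph, start, goal):
    """Breadth-first search; returns a path with the fewest edges, or None."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        # Sorting only makes the visiting order deterministic.
        for neighbor in sorted(graph.get(node, ())):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None


suggestions = friends_of_friends(friends, "alice")
route = shortest_path(friends, "alice", "erin")
```

Each step follows an edge directly from the node it is on, which is why traversals stay fast on highly connected data, and also why splitting such a graph across many machines is harder than sharding independent keys.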
Graph Databases are not a good fit when you’re looking for some of the advantages offered by the other NoSQL variations. When an application needs to scale horizontally, you’re going to quickly reach the limitations associated with these types of data stores.
Another general drawback surfaces when trying to update all nodes, or a subset of them, with a given parameter. These types of operations can prove difficult.
Example of Graph-based NoSQL Databases
So there you go. We’ve gone through the major types of NoSQL databases and described the primary use cases of each type. We discussed the major architectural differences, advantages and disadvantages of each type. Have you worked with, or are you learning, any of these databases? Share your experiences in the comment box below. Which one do you find most convenient, simple or practical in each category?