In the past few months, I have tested many NoSQL solutions. Redis, MongoDB, HBase yet Cassandra is the Column Store DB I picked because of its speed (on writes), reliability, built in feature set that makes it multi-datacenter aware. The one other personal reward for Cassandra is it is written in Java. I like reading and writing in Java more than C++ although it really does not matter for me personally in the end.
Let us talk about the reason why I am introducing Cassandra into my infrastructure and some of its drawbacks I have noticed so far.
Why it is being introduced:
We have a feature where we record every single click for 50 million Monthly Active Users (real-time) and storing this in mySQL is just waste of semi-good hardware for data that is only looked at for the past 24 hours. Over the course of some time (couple of months) more than 3 billion rows accumulated, which translated into a 3.5 TB distributed INNODB datafile. So purging/archiving this data just sucks.
Now introducing user clicks into Cassandra was rather easy. I researched various sites, asked my Digg buddy and then figured out the rest. Within two days I was up and running with Cassandra and had a great understanding about Column Families and SuperColumns.
Developing with a Cassandra Data layer:
Now that the infrastructure is up, I needed to add a data model to /opt/cassandra/conf/storage.conf. The SQL that drove this functionality consisted of two main SQL statements.
Add click
Get a Range of Clicks
For these operations, mySQL rarely takes 0.001 seconds (1 ms).
Cassandra for writes is rather fast, but for reads, Getting Range Clicks
i.e. using
recv_get_slice
it takes .02 seconds (20 ms).
What does this mean? MySQL is a hell of a lot faster! Is it because of my CF design? No.
for instance, take this slug (keys separated by a delimiter to make a distinct key name).
{$clicker}_{$pageowner} => [ $object_id_clicked0 => $value,$ts, ... $object_id_clicked =>N $value,$ts ]
The slug says that clicker A, clicked on page B and what is stored are columns of distinct clicks - remember this is a column store DB.
So
{$clicker}_{$pageowner} == the KEY for the COLUMN $object_id_clicked with the value being a $value and the free extra value the $timestamp.
Here is the php code
$columnPath = $this->getColumnPath($objectid);
$this->getCassandraConnect()->insert(self::KEYSPACE, $this->getKey($clicker, $pageOwner), $columnPath, $this->today_ts, microtime(true), ConsistencyLevel::ZERO);
Now I want a list of items clicked.
$data = $this->getCassandraConnect()->get_slice(self::KEYSPACE, $this->getKey($clicker, $pageOwner), $this->getColumnParent(), $this->getSlicePredicate(), ConsistencyLevel::ONE);
This says give me the last N logged clicks that the clicker A made for page owner B. This is a hash lookup (Big-O(1)) but a sorted list of columns (O(nlogn)) and return the last N elements.
Why is Cassandra sooo slow on reads. Is it because my memory config is not enough?
No. 7 GB of data is allocated for data that fits in memory ( for now ).
Is it because my disk is saturated?
avg-cpu: %user %nice %system %iowait %steal %idle
12.21 0.00 2.85 0.48 0.00 84.47
Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util
sda 0.01 14.01 0.23 6.61 22.25 165.03 27.34 0.33 47.57 7.35 5.04
sda1 0.00 0.00 0.00 0.00 0.00 0.00 20.83 0.00 7.74 6.40 0.00
sda2 0.01 14.01 0.23 6.61 22.25 165.03 27.34 0.33 47.57 7.35 5.04
dm-0 0.00 0.00 0.24 20.63 22.25 165.03 8.97 0.72 34.27 2.41 5.04
dm-1 0.00 0.00 0.00 0.00 0.00 0.00 8.00 0.00 5.28 0.22 0.00
No its not.
My
hunch is the slowdown is a combo of the low thread read pool and in the inherent sorting and there doesn't seem to be a way to turn it off, so without digging in the code I couldn't say (I will be able to once I get an understanding of the code layout).
In conclusion, I like Cassandra, it’s very fast in writes, slow (for my taste but fast enough) in reads and what takes 2 lines of SQL takes 250 lines of PHP code interfacing with the Cassandra.thrift suite. I am going to use it still because it is good enough and I love the built in HA of it. Additionally the performance is improving very quickly.