Monday, November 26, 2018

Cassandra Data Distribution using Partitioners


Before wading into partitioners, let's look at how data is generally distributed in a Cassandra cluster. The nodes are
arranged in a ring-like topology. The partition key of each row is hashed to a token, and every node in the ring owns a
range of tokens. A row is stored on the node whose range contains its token, so a well-behaved hash function spreads the
data evenly across the cluster. The main advantage of this method is that retrieval is quick, because the token of a key
identifies the node that holds it. Now let's see in detail what role a partitioner plays in a cluster.
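
As a rough illustration of the ring described above, here is a minimal Python sketch. The four node names, their tokens,
and the 32-bit ring size are hypothetical values chosen for readability, and MD5 stands in for whatever hash the
configured partitioner uses; this is not Cassandra's actual code.

```python
import bisect
import hashlib

# Hypothetical ring: a small 32-bit token space and four made-up nodes.
# Each node owns the tokens from just after the previous node's token
# up to and including its own token.
RING_SIZE = 2 ** 32
NODE_TOKENS = [
    (0, "node-a"),
    (2 ** 30, "node-b"),
    (2 ** 31, "node-c"),
    (3 * 2 ** 30, "node-d"),
]


def token_for(key: str) -> int:
    """Hash a key to a token (MD5 is only a stand-in hash here)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")  # always below RING_SIZE


def owner_of_token(token: int) -> str:
    """Return the node whose token range contains the given token."""
    tokens = [t for t, _ in NODE_TOKENS]
    index = bisect.bisect_left(tokens, token)
    # Tokens beyond the last node wrap around to the first node.
    return NODE_TOKENS[index % len(NODE_TOKENS)][1]


def owner_of(key: str) -> str:
    return owner_of_token(token_for(key))


for key in ["alice", "bob", "carol"]:
    print(key, token_for(key), owner_of(key))
```

Looking up a key only needs its token and a binary search over the node tokens, which is why retrieval by key is cheap.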

Partitioners
A partitioner determines how row keys are ordered and how data is distributed across the nodes in the cluster. Read and
write requests are spread evenly across the cluster when each part of the hash range receives the same number of rows on
average. Based on the hashing method they use, partitioners are classified into three types:

  • Murmur3Partitioner
  • RandomPartitioner
  • ByteOrderedPartitioner

Murmur3 Partitioner
This is the default partitioner in Cassandra (since version 1.2). It uses MurmurHash, a fast non-cryptographic
hash function, to compute tokens and distribute the data evenly across the cluster.

Random Partitioner
This partitioner applies an MD5 hash to place keys on the node ring. An MD5 hash provides a natural way of
load balancing keys across nodes. Each data item is mapped to a token by calculating the MD5 hash of its key.
The disadvantage of this method is that range queries are inefficient, because keys that are adjacent in the
requested range hash to unrelated tokens and may be stored on different nodes.
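
The effect on range queries can be seen in a small Python sketch. It assumes the token is the MD5 digest read as a
signed 128-bit integer with the sign dropped, which is my understanding of how RandomPartitioner derives tokens; the
function name and the sample keys are made up for illustration.

```python
import hashlib


def random_partitioner_token(key: bytes) -> int:
    """MD5-based token: digest read as a signed 128-bit integer, sign dropped (assumed behavior)."""
    digest = hashlib.md5(key).digest()
    return abs(int.from_bytes(digest, "big", signed=True))


# Hypothetical keys that are neighbors in lexical order.
keys = [f"user{i:03d}".encode() for i in range(5)]

by_key = sorted(keys)
by_token = sorted(keys, key=random_partitioner_token)

print("key order:  ", by_key)
print("token order:", by_token)
for key in by_key:
    print(key, random_partitioner_token(key))
```

Keys that sit next to each other in key order end up with tokens that bear no relation to each other, so a scan over a
key range generally has to touch many nodes.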

Byte Ordered Partitioner
This partitioner distributes data lexically, ordered by the bytes of the key. It treats keys as raw bytes
instead of converting them to strings. It is most likely to be used when you want a partitioner that does
not validate keys as being strings.
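
A minimal Python sketch of byte-ordered placement, using made-up keys: because keys are kept in raw byte order, a range
of keys is one contiguous slice. The range_scan helper is hypothetical and only illustrates the ordering, not
Cassandra's storage engine.

```python
import bisect

# Python compares bytes objects lexicographically, byte by byte,
# which matches the ordering a byte-ordered partitioner relies on.
keys = sorted([b"cherry", b"apple", b"banana", b"apricot", b"blueberry"])


def range_scan(sorted_keys, start, end):
    """Return all keys k with start <= k < end from an ordered key list."""
    lo = bisect.bisect_left(sorted_keys, start)
    hi = bisect.bisect_left(sorted_keys, end)
    return sorted_keys[lo:hi]


print(range_scan(keys, b"a", b"b"))  # [b'apple', b'apricot']
```

The trade-off is that ordered placement does not balance load by itself: if many keys share a prefix, they land in the
same part of the ring.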

How to change a Partitioner?
Based on the needs of your application, you can choose any one of the above partitioners. To apply it,
set the partitioner property in the cassandra.yaml file. This file is generally located in the conf
directory.
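
For reference, the setting typically looks like the fragment below. The value is the fully qualified class name of the
partitioner; the exact file location and the surrounding settings depend on your installation, so treat this as a
sketch rather than a complete file.

```yaml
# conf/cassandra.yaml (fragment)
# Other values: org.apache.cassandra.dht.RandomPartitioner
#               org.apache.cassandra.dht.ByteOrderedPartitioner
partitioner: org.apache.cassandra.dht.Murmur3Partitioner
```

The same value should be set on every node, and it is best chosen before any data is loaded, since a cluster that
already holds data cannot simply be switched to a different partitioner.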

So that is how data is distributed across a Cassandra cluster and how the partitioner determines the way data
is ordered and retrieved. Got any questions about partitioners? Leave a comment.
