Installing and using Apache Cassandra With Java Part 3 (Data model 2)
Last time we talked about the various containers in the Cassandra data model (column, super column, column family, and so on). One part we didn’t cover is the sorting behavior. Unlike a normal relational database, Cassandra has no query language like SQL, so you cannot specify a sort order when you retrieve data.
Instead, Cassandra sorts the data as soon as you store it in the database, and it remains sorted. This gives you an enormous performance boost, but it means you need to think about ordering before you start storing data.
Sorting is specified with the CompareWith attribute of the ColumnFamily. These are the options you can choose from (it is possible to create custom sorting behavior, but we will cover that later):
- BytesType
- UTF8Type
- LexicalUUIDType
- TimeUUIDType
- AsciiType
- LongType
Each of these types treats the name of your Columns as a different data type. For example, LongType treats a Column’s name as a 64-bit long value.
So let’s look at some examples. Suppose we have a ColumnFamily whose CompareWith is set to LongType. Before sorting, the data would look like this:
| Columns | |
| Name | Value |
| “9” | “Ronald” |
| “3” | “John” |
| “15” | “Eric” |
Since we are using the LongType to sort the name attribute of the Columns the data will be stored in the following way:
| Columns | |
| Name | Value |
| “3” | “John” |
| “9” | “Ronald” |
| “15” | “Eric” |
As you can see, the ordering is now the natural ordering of numbers. If we change CompareWith to UTF8Type, the names are compared as UTF-8 strings, which results in the following ordering:
| Columns | |
| Name | Value |
| “15” | “Eric” |
| “3” | “John” |
| “9” | “Ronald” |
As you can see, the result is completely different: every type that can be used in the CompareWith attribute has its own behavior.
The sorting rules apply not only to Columns but also to SuperColumns. For SuperColumns we also need to specify a second sorting rule, using the CompareSubcolumnsWith attribute, which orders the Columns inside each SuperColumn.
Suppose we have the following construction: three SuperColumns, each containing three Columns, unordered:
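The difference between the two orderings can be reproduced in plain Java. This is only a sketch: it does not use Cassandra at all, and the two comparators are stand-ins I am assuming behave like LongType and UTF8Type for these three simple names. The class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class CompareWithDemo {

    // Stand-in for LongType: interpret each column name as a 64-bit number.
    static final Comparator<String> LONG_ORDER =
            Comparator.comparingLong(Long::parseLong);

    // Stand-in for UTF8Type: compare the names as strings, character by character.
    static final Comparator<String> UTF8_ORDER = Comparator.naturalOrder();

    // Returns a sorted copy so the input list is left untouched.
    static List<String> sorted(List<String> names, Comparator<String> order) {
        List<String> copy = new ArrayList<>(names);
        copy.sort(order);
        return copy;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("9", "3", "15");
        System.out.println(sorted(names, LONG_ORDER)); // [3, 9, 15]
        System.out.println(sorted(names, UTF8_ORDER)); // [15, 3, 9]
    }
}
```

The string comparison puts “15” first because it only looks at the leading character, and “1” sorts before “3” and “9”.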
| SuperColumns | |
| Key | Value |
| “Learning Cassandra Part 2” | (three Columns) |
| “Learning Cassandra Part 1” | (three Columns) |
| “Learning Cassandra Part 3” | (three Columns) |
Now when we set both CompareWith and CompareSubcolumnsWith to UTF8Type on the SuperColumnFamily, we get the following result:
CompareSubcolumnsWith="UTF8Type" Name="Posts"/>
| SuperColumns | |
| Key | Value |
| “Learning Cassandra Part 1” | (three Columns) |
| “Learning Cassandra Part 2” | (three Columns) |
| “Learning Cassandra Part 3” | (three Columns) |
In this example I used UTF8Type both for the SuperColumns and for the Columns within them. This doesn’t have to be the case: you can mix the various sorting types freely between the two levels. However, it is not possible to have different sorting types on the same level, so you cannot use UTF8Type and LongType for different SuperColumns in the same SuperColumnFamily. The same rule applies to Columns.
Besides the standard sorting types, it is also possible to add your own custom sorting types. To create one, you write a class that extends the org.apache.cassandra.db.marshal.AbstractType class. To use it in the configuration file, package your class in a Java Archive and add it to the /lib folder of your Cassandra installation. In the database configuration file, specify the fully qualified class name in the CompareSubcolumnsWith or CompareWith attribute. This makes the sorting capabilities even more powerful. In a later post I will show an example of creating a custom sorting type.
I am already working hard on the next blog post, which will cover the basics of using Thrift (Apache Cassandra’s client API).
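Mixing sorting types across the two levels can be pictured as a sorted map of sorted maps. The sketch below is plain Java, not Cassandra code: the outer map plays the role of CompareWith (string ordering), the inner maps play the role of CompareSubcolumnsWith (numeric ordering), and the Column names and values are invented placeholders.

```java
import java.util.TreeMap;

final class SuperColumnSortDemo {

    // Outer TreeMap: SuperColumn names ordered as strings (the CompareWith role).
    // Inner TreeMap: Column names ordered as numbers (the CompareSubcolumnsWith role).
    static void put(TreeMap<String, TreeMap<Long, String>> family,
                    String superColumn, long column, String value) {
        family.computeIfAbsent(superColumn, name -> new TreeMap<>())
              .put(column, value);
    }

    static TreeMap<String, TreeMap<Long, String>> sample() {
        TreeMap<String, TreeMap<Long, String>> posts = new TreeMap<>();
        // Inserted out of order on purpose; all values are made-up placeholders.
        put(posts, "Learning Cassandra Part 2", 9L, "b");
        put(posts, "Learning Cassandra Part 2", 3L, "a");
        put(posts, "Learning Cassandra Part 1", 15L, "c");
        put(posts, "Learning Cassandra Part 3", 3L, "d");
        return posts;
    }

    public static void main(String[] args) {
        // Both levels come back sorted, regardless of insertion order.
        System.out.println(sample());
    }
}
```

Every inner map uses the same key type, which mirrors the rule that one level cannot mix sorting types.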
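To give a feel for what such a class has to do, here is a minimal sketch under stated assumptions. It assumes a custom type ultimately comes down to comparing the raw bytes of two Column names, and it implements a plain java.util.Comparator<byte[]> instead of extending AbstractType so that it compiles without the Cassandra jar. The class name ReverseLongOrder and the encode helper are hypothetical, and a real type would need whatever additional methods AbstractType requires in your Cassandra version.

```java
import java.nio.ByteBuffer;
import java.util.Comparator;
import java.util.TreeMap;

final class ReverseLongOrder implements Comparator<byte[]> {

    // Reads each name as an 8-byte big-endian long and sorts largest first.
    // Throws BufferUnderflowException if a name is shorter than 8 bytes.
    @Override
    public int compare(byte[] a, byte[] b) {
        long left = ByteBuffer.wrap(a).getLong();
        long right = ByteBuffer.wrap(b).getLong();
        return Long.compare(right, left);
    }

    // Hypothetical helper: encode a long as the 8 bytes used as a column name.
    static byte[] encode(long value) {
        return ByteBuffer.allocate(Long.BYTES).putLong(value).array();
    }

    public static void main(String[] args) {
        TreeMap<byte[], String> columns = new TreeMap<>(new ReverseLongOrder());
        columns.put(encode(9), "Ronald");
        columns.put(encode(3), "John");
        columns.put(encode(15), "Eric");
        System.out.println(columns.values()); // [Eric, Ronald, John]
    }
}
```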
It’s great, thanks a lot for sharing
Looking forward to your posts on how to use Thrift.
Hi
I am a newbie to Cassandra. Suppose I want to load all the records from a table. In Cassandra terminology: how do I load all the keys and values from a ColumnFamily if I don’t know the key values?
One more thing: we can’t do any kind of sorting at the database level, right? It all has to be taken care of in the application, which puts more burden on the application. Are there any alternatives for this?
Thanks in Advance….
Retrieving all data from a ColumnFamily can be done by using the KeyRange class. Normally you would specify a starting key and an end key but they are not required. However you do need to specify a count as to how many rows you want to retrieve.
You need to keep one thing in mind: until Cassandra 0.7 (and possibly the first 0.7 betas) you cannot retrieve more data than fits in memory at one time.
About the sorting: Cassandra stores its data in sorted order, and besides the default sorting types you can also create a custom sorting class. More information about this can be found here:
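One way to stay within that memory limit is to fetch the rows in pages, using the last key of one page as the start key of the next. The sketch below only simulates the idea against an in-memory sorted map. The fetchRange method is a hypothetical stand-in for a range query with a start key and a count, not the Thrift API, and in a real cluster the row order depends on the partitioner.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;

final class KeyRangePagingDemo {

    // Hypothetical range query: up to `count` keys, starting at startKey (inclusive).
    // An empty start key means "from the beginning".
    static List<String> fetchRange(NavigableMap<String, String> rows,
                                   String startKey, int count) {
        List<String> page = new ArrayList<>();
        for (String key : rows.tailMap(startKey, true).keySet()) {
            if (page.size() == count) {
                break;
            }
            page.add(key);
        }
        return page;
    }

    // Walks the whole key space one page at a time.
    static List<String> fetchAll(NavigableMap<String, String> rows, int pageSize) {
        if (pageSize < 2) {
            throw new IllegalArgumentException("pageSize must be at least 2");
        }
        List<String> all = new ArrayList<>();
        String start = "";
        boolean firstPage = true;
        while (true) {
            List<String> page = fetchRange(rows, start, pageSize);
            // After the first page, the first key repeats the previous page's last key.
            int skip = firstPage ? 0 : Math.min(1, page.size());
            all.addAll(page.subList(skip, page.size()));
            if (page.size() < pageSize) {
                return all; // a short page means there is nothing left
            }
            start = page.get(page.size() - 1);
            firstPage = false;
        }
    }
}
```

The page size has to be at least 2 because every page after the first starts with a key that was already seen.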
Creating custom sorting types for Apache Cassandra
Hope this answers your questions…