
Creating custom sorting types for Apache Cassandra

In part three of Installing and using Apache Cassandra With Java we talked about sorting and the sorting types that are available by default. This time we are going to look at what is needed to create your very own sorting type. In this example we will create a sorting type that sorts the data as dates. To be clear up front: I use the European notation (dd-MM-yyyy), so don’t get confused by that.
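To illustrate why a dedicated sorting type helps here, the following sketch compares plain text ordering with chronological ordering for dd-MM-yyyy values. It is a standalone example that does not touch Cassandra; the class name DateOrderDemo and its methods are made up for this illustration, and the text sort only approximates what a string-based sorting type would do.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.Date;
import java.util.List;

// Hypothetical demo class; not part of Cassandra or the downloadable project.
class DateOrderDemo {

  private static final String DATE_FORMAT = "dd-MM-yyyy";

  // Sorts the values as plain strings (day digits are compared first).
  static List<String> sortAsText(List<String> values) {
    List<String> copy = new ArrayList<String>(values);
    Collections.sort(copy);
    return copy;
  }

  // Sorts the values chronologically by parsing each one as dd-MM-yyyy.
  static List<String> sortAsDates(List<String> values) {
    List<String> copy = new ArrayList<String>(values);
    Collections.sort(copy, new Comparator<String>() {
      @Override
      public int compare(String a, String b) {
        return parse(a).compareTo(parse(b));
      }
    });
    return copy;
  }

  // Parses a single value, converting the checked exception to an unchecked one.
  private static Date parse(String value) {
    try {
      return new SimpleDateFormat(DATE_FORMAT).parse(value);
    } catch (ParseException e) {
      throw new RuntimeException(e);
    }
  }

  public static void main(String[] args) {
    List<String> dates = Arrays.asList("15-03-2009", "01-12-2010", "02-01-2010");
    System.out.println(sortAsText(dates));  // [01-12-2010, 02-01-2010, 15-03-2009]
    System.out.println(sortAsDates(dates)); // [15-03-2009, 02-01-2010, 01-12-2010]
  }
}
```

Sorted as text, the value from 2009 ends up last because its day number is the highest; sorted as dates it comes first.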

Also important to know: the sorting types are used when you store data (I mentioned this in part three), but they are also used when you call the get_slice method of the Cassandra client.

As I mentioned in part three, Cassandra provides an abstract class, org.apache.cassandra.db.marshal.AbstractType, which we must extend to define our sorting behavior. When we extend this class we need to implement two methods. The first is getString(byte[] bytes); this method is not that important, since Cassandra uses it only to display logging information with the actual data at hand. The second is the compare(byte[] value1, byte[] value2) method of the Comparator interface.

Here is what we have for the getString method:

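import java.io.UnsupportedEncodingException;

private static final String ENCODING = "UTF-8";

...

// Used by Cassandra only to render the value in log output.
@Override
public String getString(byte[] bytes) {
  try {
    return new String(bytes, ENCODING);
  } catch (UnsupportedEncodingException e) {
    // Deliberately ignored: a logging helper should not throw.
  }

  return "";
}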

Since this method is used for logging purposes we will not throw an exception when the decoding fails; that would only disrupt the logging call that invoked this method. So if anything goes wrong we ignore it and return an empty string.

The compare method is a little bit more interesting:

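import java.io.UnsupportedEncodingException;
import java.text.ParseException;
import java.text.SimpleDateFormat;

private static final String ENCODING = "UTF-8";

private static final String DATE_FORMAT = "dd-MM-yyyy";

...

@Override
public int compare(byte[] value1, byte[] value2) {
  try {
    String date1 = new String(value1, ENCODING);
    String date2 = new String(value2, ENCODING);

    // Empty values sort before any date; two empty values are equal.
    if (date1.isEmpty()) {
      return date2.isEmpty() ? 0 : -1;
    }

    if (date2.isEmpty()) {
      return 1;
    }

    // SimpleDateFormat is not thread safe, so create one per call
    // instead of sharing a static instance.
    SimpleDateFormat simpleDateFormat =
      new SimpleDateFormat(DATE_FORMAT);

    return simpleDateFormat.parse(date1)
      .compareTo(simpleDateFormat.parse(date2));

  } catch (UnsupportedEncodingException e) {
    throw new RuntimeException(e);
  } catch (ParseException e) {
    // Not a valid dd-MM-yyyy value: fail so that it is not stored.
    throw new RuntimeException(e);
  }
}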

We first convert the byte arrays into normal String objects. If either of them is empty we return the appropriate value directly; when both are filled we parse them and let the compareTo method of the Date class determine the correct return value. We use SimpleDateFormat to transform a String into a java.util.Date object.
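The compare method creates a new SimpleDateFormat on every call. That is safe, because SimpleDateFormat is not thread safe and must not be shared between threads, but it does allocate a formatter per comparison. One possible alternative, shown here only as a sketch, is to keep one formatter per thread with a ThreadLocal. The DateParsing class and its parse method are hypothetical names and are not part of the downloadable project.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;

// Hypothetical helper; the class and method names are illustrative only.
class DateParsing {

  private static final String DATE_FORMAT = "dd-MM-yyyy";

  // One SimpleDateFormat per thread: the instance is reused between calls
  // on the same thread but never shared across threads.
  private static final ThreadLocal<SimpleDateFormat> FORMAT =
      new ThreadLocal<SimpleDateFormat>() {
        @Override
        protected SimpleDateFormat initialValue() {
          return new SimpleDateFormat(DATE_FORMAT);
        }
      };

  // Parses a dd-MM-yyyy string and rethrows parse failures unchecked,
  // in the same way the compare method does.
  static Date parse(String value) {
    try {
      return FORMAT.get().parse(value);
    } catch (ParseException e) {
      throw new RuntimeException(e);
    }
  }

  public static void main(String[] args) {
    // 01-04-2010 lies before 07-04-2010, so this prints "true".
    System.out.println(parse("01-04-2010").compareTo(parse("07-04-2010")) < 0);
  }
}
```

Inside compare, the two parse calls on simpleDateFormat could then be replaced by calls to such a helper. Whether the saved allocations matter depends on the workload.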

If anything fails, either the conversion to String or the conversion to Date, a RuntimeException is thrown. This is necessary because we also need to be sure that the value is valid, so that it won’t be stored when the client provides invalid data (for example, if I enter “ronald” instead of a date it should fail).
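One detail worth knowing when relying on parse failures for validation: SimpleDateFormat is lenient by default, so a value such as 32-01-2010 is accepted and rolled over into the next month instead of being rejected. The sketch below shows the difference that setLenient(false) makes. StrictDateCheck is a made-up class for this illustration, and whether strict parsing is desirable for your data is a decision the original code does not make.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;

// Hypothetical demo class; not part of the downloadable project.
class StrictDateCheck {

  // Returns true when the value parses as dd-MM-yyyy with the given
  // leniency setting, false when parsing fails.
  static boolean parses(String value, boolean lenient) {
    SimpleDateFormat format = new SimpleDateFormat("dd-MM-yyyy");
    format.setLenient(lenient);
    try {
      format.parse(value);
      return true;
    } catch (ParseException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    System.out.println(parses("ronald", true));      // false: not a date at all
    System.out.println(parses("32-01-2010", true));  // true: lenient parsing rolls over
    System.out.println(parses("32-01-2010", false)); // false: strict parsing rejects day 32
  }
}
```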

So now we just need to use it. To apply the sorting mechanism, compile the code and package it as a jar archive, then add the jar to the classpath of Cassandra; a simple way is to put it in the /lib folder of Cassandra. In your storage-conf.xml file you can then apply the sorting type in the following way:

<ColumnFamily CompareWith="nl.sodeso.cassandra.db.DateType" Name="SortingSample"/>

You can download the SortingType Binary file and use it directly by adding it to the /lib folder.

This post also includes a SortingType Source download with all the source code in an Eclipse project. When you load the project into Eclipse you will probably see some error messages; these are caused by a variable in the project libraries. Change the target folder of the variable in the following way:

Right-click the project and choose Properties. Click Java Build Path on the left and then click the Libraries tab. Select one of the lines that mention CASSANDRA_HOME and click the Edit… button. Now click the Variable… button to create a new variable with the name CASSANDRA_HOME, and make sure it points to your Cassandra home folder (for example C:/dev/apache-cassandra-0.6.0-beta2).

  1. Justin
    April 7th, 2010 at 14:31 | #1

    Thanks for the article! You might want to think twice about making SimpleDateFormat static as it is not thread safe.

  2. April 7th, 2010 at 15:16 | #2

    Ah, good one :) was way too busy with the sorting behavior that i didn’t pay much attention to that. I’ve fixed it.
