Installing and using Apache Cassandra With Java Part 1 (Installation)

General reaction to some of the comments:

I have been getting quite a few reactions about the installation process of Cassandra, especially about the 3rd party libraries that are not included in the distribution download of version 0.6.0. Let me explain why I specifically chose this method of installation. First of all, it gives me a better understanding of what the installation process is all about. If I use a package manager, I cannot see what is being installed, where it is being installed, or what its dependencies are. Furthermore, one big disadvantage is that if I want to have multiple versions of Cassandra on my disk, I cannot use the package manager, since it will update the current installation when a new version is released. Using the extraction method, I can decide for myself where I want to install it and how I want to use it. This way I can have multiple versions installed next to each other (of course running only one at a time), and I can always fall back to a previous version or test a new one.

One interesting note: I also got an e-mail from Jonathan Ellis, who is the project chair of the Cassandra database project and an architect at Rackspace. He stated that the final release could either include all the libraries in the Cassandra distribution, or there will be no distribution at all, in which case users would just use Ant to build the distribution from source. Personally I hope that they will release distributions; my experiences with JBoss and Tomcat (and others) that use this method of distribution are very positive. This is still being debated on the dev mailing list.

UPDATE: Newer versions of Cassandra (version 0.6.1) already include most of the libraries, so before you start downloading the 3rd party libraries separately, check whether they are already present.

I’m going to write a few postings on how to use the Cassandra database with Java. Although I am in no way an expert on Cassandra, I am very intrigued by the database because of its small installation, high performance and scalability. While writing these posts I am also learning the Cassandra database myself, and I’m sharing my experiences with it through my posts on this blog.

Like I said before, Cassandra is a very high-performing and scalable database. It doesn’t follow the normal SQL database principles like schemas, tables / columns, datatypes and a query language like SQL. Instead it’s a non-relational database similar to Google’s BigTable. Cassandra was initially developed by Facebook, which has contributed it to the open source community. Currently it is used by websites like Facebook, Twitter, Digg, Rackspace and many others. So even though it is still only at version 0.6 at the time of writing, it has already proven itself in production environments.

Some of the key features of Cassandra:

  • Fault Tolerant – Data is automatically replicated to multiple nodes for fault-tolerance. Replication across multiple data centers is supported. Failed nodes can be replaced with no downtime.
  • Decentralized – Every node in the cluster is identical. There are no network bottlenecks. There are no single points of failure.
  • Flexible – Read and write throughput both increase linearly as new machines are added, with no downtime or interruption to applications.
  • Highly Available – Writes and reads offer a tunable ConsistencyLevel, all the way from “writes never fail” to “block for all replicas to be readable,” with the quorum level in the middle.
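As a rough illustration of the quorum level mentioned in the last bullet: a quorum is a majority of the replicas, so with a replication factor of N it works out to N / 2 (rounded down) + 1. The sketch below only shows that arithmetic; the function name quorum is made up for this example and is not a Cassandra command.

```shell
#!/bin/sh
# Illustration only: "quorum" is a hypothetical helper, not part of Cassandra.
# A quorum is a majority of the replicas: floor(replication_factor / 2) + 1.
quorum() {
    echo $(( $1 / 2 + 1 ))
}

# How many replicas must respond for a replication factor of 3 and of 5.
quorum 3
quorum 5
```

The practical consequence is that with three replicas a quorum read or write can still succeed while one replica is down.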

Some of the features I find very interesting:

  • Java – Cassandra is completely written in Java, which I find very nice, not only because I am fond of Java but also because it proves that Java can be very scalable. It also has the advantage that you can easily run a cluster of databases across different operating systems.
  • Small – Compared to other databases, Cassandra is very small: the download is only about 15 MB, and after installation it is only about 17 MB.
  • Ease of use – As you will find out during these postings, Cassandra is very easy to install, and to my feeling it is also quite easy to use. The clients do still lack libraries for things like connection pooling (in the case of Java) and automatic fail-over, but there are open source projects working on this.
  • Not really a feature, but from my own perspective it is something different for a change. Up until now I have mainly used relational databases and that does get ‘boring’. I hope that by using Cassandra I will get some new ideas on how things can be done in a different way.

To learn more about Cassandra and its capabilities, I would like to refer you to the following web pages, which contain basically all the information you need:

For now this post will be all about the installation of Cassandra. This could be different for various operating systems, but I have noticed it to be the same on Windows and MacOS X (and therefore I would assume it is also the same for most Linux installations). While under Linux and MacOS X you can use package managers to install Cassandra, I prefer to do it a bit differently so that I get a better understanding of what is needed to get the installation to work.

First of all, download the Cassandra binary from the download website. Currently I am using the 0.6.0 beta2 version (which was released on the 26th of February 2010). As you may already notice, the download is actually quite small for a database (about 15 MB). Once it is downloaded, find a nice spot to extract the archive. There is no actual installation, so pick a suitable place, because the database will run directly from there.

When extracted you will find a few folders, which I will explain here:

  • /bin – contains all the executables for Windows, Linux and MacOS X
  • /conf – contains the logging property files, the password properties file and even more important, the storage configuration file.
  • /interface – contains the Thrift interface file.
  • /javadoc – contains the Java documentation of the Cassandra database source code.
  • /lib – contains the Cassandra library itself and the 3rd party libraries used by Cassandra.

Most of the time you will be using the /bin and /conf folders. However, when you try to start Cassandra you will get a few exceptions because a few 3rd party libraries are missing, so we add them now:

  • Log4J (download), from the archive copy the log4j-x.x.x.jar file into the /lib folder of Cassandra
  • Google Collections (download), from the archive copy the google-collect.x.x.jar into the /lib folder of Cassandra
  • Apache Commons Collections (download), from the archive copy the commons-collections-x.x.x.jar into the /lib folder of Cassandra
  • Apache Commons Lang (download), from the archive copy the commons-lang-x.x.jar into the /lib folder of Cassandra
  • SLF4J (download), from the archive copy the slf4j-api-x.x.x.jar and slf4j-log4jxx-x.x.x.jar into the /lib folder of Cassandra
  • Apache Commons CLI (download), from the archive copy the commons-cli-x.x.jar into the /lib folder of Cassandra
  • jLine (download), from the archive copy the jline-x.x.xx.jar into the /lib folder of Cassandra

At the time of writing I used the following versions of each library:

  • Log4J (version 1.2.15)
  • Google Collections (version 1.1)
  • Apache Commons Collections (version 3.2.1)
  • Apache Commons Lang (version 2.5)
  • SLF4J (version 1.5.11)
  • Apache Commons CLI (version 1.2)
  • jLine (version 0.9.94)
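Before downloading anything, it can help to see which of these jars are actually missing from the /lib folder. The following is only a sketch for Linux / MacOS X: the function name missing_jars is made up for this example, and the file name prefixes are simply the ones from the list above, so adjust them if your jars are named differently.

```shell
#!/bin/sh
# Illustration only: "missing_jars" is a hypothetical helper, and the jar name
# prefixes are assumptions based on the file names listed in this post.
# Usage: missing_jars /path/to/cassandra/lib
missing_jars() {
    lib_dir=$1
    for prefix in log4j google-collect commons-collections commons-lang \
                  slf4j-api slf4j-log4j commons-cli jline; do
        found=no
        # The glob stays unexpanded when nothing matches, hence the -e test.
        for jar in "$lib_dir/$prefix"*.jar; do
            if [ -e "$jar" ]; then
                found=yes
                break
            fi
        done
        # Print the prefix of every library for which no jar was found.
        [ "$found" = yes ] || echo "$prefix"
    done
}

# Falls back to ./lib when CASSANDRA_HOME is not set.
missing_jars "${CASSANDRA_HOME:-.}/lib"
```

An empty output means a jar was found for every prefix; it does not check versions.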

One thing to notice in the /lib folder: the download I have contains an extra apache-cassandra-0.6.0-beta1.jar file beside the apache-cassandra-0.6.0-beta2.jar. I have no clue why both are there, but I removed the beta1 library since I think this was a mistake.

We also need to set up some environment variables (unless you already have them).

On Windows, press the Windows Key + Break (which opens the System Properties window), go to the Advanced tab and click on Environment Variables. In the System Variables part, check that the JAVA_HOME variable is present and points to a Java 1.6 installation, and check that the /bin folder of the same Java installation is configured in your Path variable. Note that JAVA_HOME contains the path to the base folder of the Java installation, while the Path variable refers to the /bin folder of that installation.

Secondly, add the CASSANDRA_HOME variable, which should point to the base folder of your Cassandra installation.

For Linux / MacOS X you will need to perform these actions a bit differently. First open up a terminal window and run the export command, which lists the exported variables. Now check if JAVA_HOME is present; if not, add it using the following command:

export JAVA_HOME="/System/Library/Frameworks/JavaVM.framework/Versions/1.6/Home"

Do mind, however, that the path could be different on your installation, so check to be sure, and also check that it is a Java 1.6 installation. Secondly, check whether the PATH variable contains a reference to the /bin folder of the same Java installation; if not, you can add it in the following manner:

export PATH="/System/Library/Frameworks/JavaVM.framework/Versions/1.6/Home/bin:$PATH"

Finally we need to add the CASSANDRA_HOME:

export CASSANDRA_HOME="/Users/myself/Development/apache-cassandra-0.6.0-beta2/"
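A quick way to catch the most likely mistake here, pointing JAVA_HOME at the /bin folder instead of the base folder, is to check that the variable leads to an executable bin/java. This is only a sketch; the function name check_java_home is made up for this example.

```shell
#!/bin/sh
# Illustration only: "check_java_home" is a hypothetical helper.
# Succeeds when the given directory looks like the base folder of a Java
# installation, i.e. it is non-empty and contains an executable bin/java.
check_java_home() {
    if [ -n "$1" ] && [ -x "$1/bin/java" ]; then
        echo "ok: $1"
    else
        echo "not a Java base folder: $1"
        return 1
    fi
}

check_java_home "${JAVA_HOME:-}" || echo "JAVA_HOME needs fixing"
```

It does not verify the Java version, so running "$JAVA_HOME/bin/java" -version afterwards is still worthwhile.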

Now that you have done everything, you should be able to run your Cassandra installation for the first time. Open a terminal / console window and go to the /bin folder of your installation. On Windows, just execute the cassandra.bat file; on Linux / MacOS X, execute the command in the following way:

sudo ./cassandra

Upon execution you should get some logging information, which should look similar to the following:

INFO 21:21:26,689 Auto DiskAccessMode determined to be mmap
INFO 21:21:26,894 Replaying /var/lib/cassandra/commitlog/CommitLog-1268770559987.log
INFO 21:21:26,910 Log replay complete
INFO 21:21:26,942 Saved Token not found. Using 158140755061238798699946344590455852626
INFO 21:21:26,949 Creating new commitlog segment /var/lib/cassandra/commitlog/CommitLog-1268770886949.log
INFO 21:21:26,958 Starting up server gossip

As you can see, the installation is actually quite simple. However, we still need to configure the database before we can actually use it. That will be part two in the series about Installing and using Apache Cassandra With Java, and you won’t have to wait long for it: I hope to post it either tomorrow (17th of March 2010) or on Thursday (18th of March 2010).

  1. grzegorz
    March 16th, 2010 at 22:25 | #1

    I’m looking forward to the next part, nice job. I did the installation on Windows 7 (64-bit) using Cassandra 0.5.1, and the step of getting the 3rd party libraries (like Log4J, Google Collections, Apache Commons Collections, Apache Commons Lang or SLF4J) wasn’t required, as all of them were already in the /lib directory. The server is up and running perfectly.

  2. March 17th, 2010 at 05:38 | #2

    I have also downloaded the 0.5.1 version of Cassandra and indeed the libraries were already present. However, when you download the 0.6.0 version of Cassandra they seem to be missing again. I have a feeling they have some minor issues with making the distributions. Before writing this post I wasn’t able to download any version from any repository except for the nightly builds, and as I mention in the post, version 0.6.0 even contains two versions of Cassandra itself. But I will update this post with the extra information. Thanks for the heads up.

  3. Thai Dang Vu
    March 17th, 2010 at 14:10 | #3

    Thank you for the article. I’m waiting for part 2 so that I have something interesting to read in meetings.

  4. Bryan
    March 17th, 2010 at 15:20 | #4

    Of course, if you download the source and run ant, it goes off and gets the dependencies for you.

  5. March 17th, 2010 at 15:57 | #5

    That is completely true. I’ve also installed it using a package manager, which works just fine. However, the documentation states that the best way to install it is by just extracting the archive. I personally prefer that way of installing, since I have full control over what is happening and what needs to be done to get it working. I will update the post with all the comments I have received about the libraries. Thanks for the extra information.

  6. skanga
    March 17th, 2010 at 22:45 | #6

    In order to run cassandra-cli you also need Commons CLI jar file from http://commons.apache.org/cli/download_cli.cgi and JLine jar file from
    http://sourceforge.net/projects/jline/files/

  7. skanga
    March 17th, 2010 at 23:04 | #7

    The substitution in the batch file did not work for me as shown below.

    C:\Java\servers\Apache-Cassandra-0.6.0-beta2\bin>cassandra
    Invalid parameter - P:
    Starting Cassandra Server
    Listening for transport dt_socket at address: 8888
    Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/cassandra/thrift/CassandraDaemon
    Caused by: java.lang.ClassNotFoundException: org.apache.cassandra.thrift.CassandraDaemon
    at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
    Could not find the main class: org.apache.cassandra.thrift.CassandraDaemon. Program will exit.

    C:\Java\servers\Apache-Cassandra-0.6.0-beta2\bin>cassandra.bat

    I fixed it in cassandra.bat by replacing

    REM Shorten lib path for old platforms
    subst P: "%CASSANDRA_HOME%\lib"
    P:
    set CLASSPATH=P:\

    for %%i in (*.jar) do call :append %%i
    goto okClasspath

    :append
    set CLASSPATH=%CLASSPATH%;P:\%*
    goto :eof

    WITH

    REM For each jar in the CASSANDRA_HOME lib directory call append to build the CLASSPATH variable.
    for %%i in (%CASSANDRA_HOME%\lib\*.jar) do call :append %%~fi
    goto okClasspath

    :append
    set CLASSPATH=%CLASSPATH%;%1%2
    goto :eof

  8. Marcelo
    March 19th, 2010 at 13:42 | #8

    @Ronald Mathies

    I downloaded version 0.5 and copied the missing libs from there.

    Also, I don’t think you need to run it with sudo in Linux. I also created some symbolic links (/var/lib/cassandra) pointing to the actual place I decompressed the tarball. I don’t know if it was necessary, but it seemed that way from a quick look at the startup script.

  9. March 19th, 2010 at 14:30 | #9

    You don’t have to use sudo. However, I had a lot of problems due to security restrictions on my computer (for example, opening a random port is restricted on my computer).

  10. Karthik
    March 26th, 2010 at 22:07 | #10

    Where is the second part of this? Is it posted somewhere else? Can I get the link?

    Thanks

  11. March 26th, 2010 at 22:16 | #11

    Ah yes, I still need to add the links to all the parts. These are the parts that are available:

    Part 2: http://www.sodeso.nl/?p=108
    Part 3: http://www.sodeso.nl/?p=207

    Currently I am working on part 4.

  12. maxi
    March 30th, 2010 at 11:31 | #12

    Hi,

    I’m trying to install Cassandra on a Windows XP machine.
    I followed all those steps but I can’t get it to work.

    I also modified the storage-conf.xml, changing the paths for CommitLogDirectory and DataFileDirectories.

    But when I run cassandra.bat I get a message:

    C:\Programs\cassandra\apache-cassandra-0.6.0-beta3\bin>cassandra.bat
    Starting Cassandra Server
    El sistema no puede hallar la ruta especificada.

    (Something like “The system can not find the path specified.”)

    What is wrong here?

    Regards.

  13. maxi
    March 30th, 2010 at 11:53 | #13

    @maxi
    Hi again,

    I solved the problem (my mistake). I had the JAVA_HOME env var set to the \bin folder instead of the base folder of the Java installation.

    Great article. Thanks for sharing.

  14. April 26th, 2010 at 11:40 | #14

    Thanks for this great series !
    I downloaded version 0.6.1 to follow this tutorial, and all the libraries are included. Too bad, I realized that after downloading all the needed libraries :-)

  15. April 26th, 2010 at 12:01 | #15

    Ah yes, I did mention in the post that, I think, they still hadn’t decided whether they wanted to bundle everything together or not. Apparently for version 0.6.1 they did. I will adjust the post so that it reflects this better, and so that people first check whether the libraries are present or not.

  16. May 4th, 2010 at 11:10 | #16

    Great article, I tried Cassandra under OpenJDK 1.6.0_18 – it didn’t work. However it worked with Sun JDK.

  17. Palanikumar
    May 28th, 2010 at 06:54 | #17

    Thanks for a beautiful write up on Cassandra.. It was very helpful to get to speed with your articles ..
