Installing and using Apache Cassandra With Java Part 1 (Installation)
Completely updated (August 1, 2010) for Cassandra 0.6.4
In the old days, when we were still using Cassandra 0.5.x, it was a bit more work to install Cassandra than it is nowadays. With the current release, 0.6.4, it is as easy as a breeze. This has been the case for the last few versions and has led me to rewrite this part completely. A lot of the information that was here is no longer relevant. So here is the completely rewritten part about installing Cassandra:
Introduction:
I’m going to write a few posts on how to use the Cassandra database with Java. Although I am in no way an expert on Cassandra, I am very intrigued by the database because of its small installation, high performance and scalability. While writing these posts I am still learning Cassandra myself, and I am sharing my experiences with it on this blog.
As I said, Cassandra is a very high-performing and scalable database. It doesn’t follow the normal SQL database principles such as schemas, tables and columns, data types and a query language like SQL. Instead it is a non-relational database similar to Google’s BigTable. Cassandra was initially developed by Facebook, which contributed it to the open source community. Currently it is used by websites like Facebook, Twitter, Digg, Rackspace and many others. So even though it is still only at version 0.6 at the time of writing, it has already proven itself in production environments.
Some of the key features of Cassandra:
- Fault Tolerant – Data is automatically replicated to multiple nodes for fault-tolerance. Replication across multiple data centers is supported. Failed nodes can be replaced with no downtime.
- Decentralized – Every node in the cluster is identical. There are no network bottlenecks. There are no single points of failure.
- Flexible – Read and write throughput both increase linearly as new machines are added, with no downtime or interruption to applications.
- Highly Available – Writes and reads offer a tunable ConsistencyLevel, all the way from “writes never fail” to “block for all replicas to be readable,” with the quorum level in the middle.
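The quorum level mentioned in the last item is easiest to see with a few numbers. Below is a minimal sketch, assuming the usual majority definition of a quorum (replication factor divided by two, plus one, using integer division). QuorumSketch and its methods are names made up for this illustration; they are not part of the Cassandra API.

```java
// Illustrative only: QuorumSketch is a hypothetical helper, not a Cassandra class.
final class QuorumSketch {

    // Number of replicas that must answer for a quorum read or write:
    // a strict majority of all replicas (integer division).
    static int quorum(int replicationFactor) {
        if (replicationFactor < 1) {
            throw new IllegalArgumentException("replication factor must be >= 1");
        }
        return replicationFactor / 2 + 1;
    }

    // A read is guaranteed to touch at least one replica that acknowledged
    // the latest write when the two replica sets are forced to overlap.
    static boolean overlaps(int writeReplicas, int readReplicas, int replicationFactor) {
        return writeReplicas + readReplicas > replicationFactor;
    }

    public static void main(String[] args) {
        for (int rf = 1; rf <= 5; rf++) {
            int q = quorum(rf);
            System.out.println("replicas=" + rf + " quorum=" + q
                    + " quorum read sees quorum write: " + overlaps(q, q, rf));
        }
    }
}
```

With three replicas a quorum is two nodes, so a quorum write followed by a quorum read always shares at least one node, while reading and writing on a single node each gives no such guarantee.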
Some of the features I find very interesting:
- Java – Cassandra is written completely in Java, which I find very nice, not only because I am fond of Java but also because it shows that Java can be very scalable. It also has the advantage that you can easily run a database cluster across different operating systems.
- Small – Compared to other databases, Cassandra is very small: the download is only about 15 MB and after installation it takes up only about 17 MB.
- Ease of use – As you will find out during these posts, Cassandra is very easy to install, and to my mind it is also quite easy to use. The clients do still suffer from a lack of libraries for things like connection pooling (in the case of Java) and automatic failover, but there are open source projects working on this.
- Not really a feature, but from my own perspective it is something different for a change. Up until now I have mainly used relational databases and that does get ‘boring’. I hope that by using Cassandra I will get some new ideas on how things can be done differently.
To learn more about Cassandra and its capabilities I would like to refer you to the following web pages, which contain basically all the information you need:
For now this post is all about the installation of Cassandra. This could differ between operating systems, but I have found it to be the same on Windows and MacOS X (and therefore I would assume it is also the same for most Linux installations). While under Linux and MacOS X you can use package managers to install Cassandra, I prefer to do it a bit differently so that I get a better understanding of what is needed to get the installation to work.
First of all, download the Cassandra binary from the download website; currently I am using the 0.6.4 version. Once it has downloaded, find a nice spot to extract the archive. There is no actual installation, so pick a suitable place from which the database will run directly.
When extracted you will find a few folders, which I will explain here:
- /bin – contains all the executables for Windows, Linux and MacOS X
- /conf – contains the logging property files, the password properties file and, even more important, the storage configuration file.
- /interface – contains the Thrift interface file.
- /javadoc – contains the Java documentation of the Cassandra database source code.
- /lib – contains the Cassandra library itself and the 3rd party libraries used by Cassandra.
Most of the time you will be using the /bin and /conf folders.
Once you have done all this you should be able to run your Cassandra installation for the first time. Open a terminal / console window and go to the /bin folder of your installation. On Windows just execute the cassandra.bat file; on Linux / MacOS X execute the command in the following way:
sudo ./cassandra
Upon execution you should see some logging information in the terminal.
As you can see, the installation is very simple: just download and extract the archive. At the top of this post you can find more information on how to work with Cassandra in the follow-up parts of the tutorial.
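If you want to check from Java that the server really is up, a simple way is to try to open a socket to it. The sketch below assumes the Thrift port is still at 9160, which as far as I know is the default ThriftPort in storage-conf.xml for 0.6; adjust it if you changed the configuration. PortCheck is just a name for this example and uses nothing from the Cassandra libraries.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Illustrative only: a plain TCP connect test, no Cassandra classes involved.
final class PortCheck {

    // Returns true when something accepts a TCP connection on host:port
    // within the given timeout, false on refusal, timeout or any I/O error.
    static boolean isListening(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String host = args.length > 0 ? args[0] : "localhost";
        // 9160 is an assumption: the default Thrift port of a 0.6 installation.
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 9160;
        boolean up = isListening(host, port, 2000);
        System.out.println(host + ":" + port
                + (up ? " is accepting connections" : " is not reachable"));
    }
}
```

Note that this only tells you that something is listening on the port, not that it is a healthy Cassandra node.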

I’m looking forward to the next part, nice job. I did the installation on Windows 7 (64-bit) using Cassandra 0.5.1, and the step of getting the 3rd party libraries (like Log4J, Google Collections, Apache Commons Collections, Apache Commons Lang or SLF4J) wasn’t required, as all of them were already in the /lib directory. The server is up and running perfectly.
I have also downloaded the 0.5.1 version of Cassandra and indeed the libraries were already present; however, when you download the 0.6.0 version of Cassandra they seem to be missing again. I have a feeling they have some minor issues with building the distributions. Before writing this post I wasn’t able to download any version from any repository except for the nightly builds, and as I mention in the post, version 0.6.0 even contains two versions of Cassandra itself. But I will update this post with the extra information. Thanks for the heads up.
Thank you for the article. I’m waiting for part 2 so that I have something interesting to read in meetings.
Of course, if you download the source and run ant, it goes off and gets the dependencies for you.
That is completely true. I’ve also installed it using a package manager, which works just fine; however, the documentation states that the best way to install it is by just extracting the archive. I personally prefer that way of installing since I have full control over what is happening and what needs to be done to get it working. I will update the post with all the comments I have received about the libraries. Thanks for the extra information.
In order to run cassandra-cli you also need the Commons CLI jar file from http://commons.apache.org/cli/download_cli.cgi and the JLine jar file from
http://sourceforge.net/projects/jline/files/
The substitution in the batch file did not work for me as shown below.
C:\Java\servers\Apache-Cassandra-0.6.0-beta2\bin>cassandra
Invalid parameter - P:
Starting Cassandra Server
Listening for transport dt_socket at address: 8888
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/cassandra/thrift/CassandraDaemon
Caused by: java.lang.ClassNotFoundException: org.apache.cassandra.thrift.CassandraDaemon
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
Could not find the main class: org.apache.cassandra.thrift.CassandraDaemon. Program will exit.
C:\Java\servers\Apache-Cassandra-0.6.0-beta2\bin>cassandra.bat
I fixed it in cassandra.bat by replacing
REM Shorten lib path for old platforms
subst P: "%CASSANDRA_HOME%\lib"
P:
set CLASSPATH=P:\
for %%i in (*.jar) do call :append %%i
goto okClasspath
:append
set CLASSPATH=%CLASSPATH%;P:\%*
goto :eof
WITH
REM For each jar in the CASSANDRA_HOME lib directory call append to build the CLASSPATH variable.
for %%i in (%CASSANDRA_HOME%\lib\*.jar) do call :append %%~fi
goto okClasspath
:append
set CLASSPATH=%CLASSPATH%;%1%2
goto :eof
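For readers who find batch syntax hard to follow: the replacement loop simply walks the lib folder and appends the full path of every jar to CLASSPATH. The same idea can be sketched in Java as below. This only illustrates what the loop does; ClasspathSketch and buildClasspath are names made up here and are not part of Cassandra or of cassandra.bat.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Illustrative only: mirrors the "for each jar in lib, append to CLASSPATH" loop.
final class ClasspathSketch {

    // Joins the absolute paths of all *.jar files directly inside libDir,
    // separated by the platform's path separator (';' on Windows, ':' elsewhere).
    static String buildClasspath(File libDir) {
        File[] jars = libDir.listFiles((dir, name) -> name.endsWith(".jar"));
        if (jars == null) {
            throw new IllegalArgumentException("not a readable directory: " + libDir);
        }
        List<String> entries = new ArrayList<>();
        for (File jar : jars) {
            entries.add(jar.getAbsolutePath());
        }
        // listFiles gives no ordering guarantee, so sort for a stable result.
        Collections.sort(entries);
        return String.join(File.pathSeparator, entries);
    }

    public static void main(String[] args) {
        // Pass the lib folder of the installation as the first argument.
        File libDir = new File(args.length > 0 ? args[0] : "lib");
        System.out.println(buildClasspath(libDir));
    }
}
```

Because the paths are handled as File objects and strings, folder names containing spaces need no special treatment here, which is the part that is fiddly in a batch file.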
@Ronald Mathies
I downloaded version 0.5 and copied the missing libs from there.
Also, I don’t think you need to run it with sudo in Linux. I also created some symbolic links (/var/lib/cassandra) pointing to the actual place I decompressed the tarball. I don’t know if it was necessary, but it seemed that way from a quick look at the startup script.
You don’t have to use sudo; however, I had a lot of problems due to security restrictions on my computer (for example, opening a random port is restricted on my computer).
Where is the second part of this? Is it posted somewhere else? Can I get the link?
Thanks
Ah yes, I still need to add the links to all the parts. These are the parts that are available:
Part 2: http://www.sodeso.nl/?p=108
Part 3: http://www.sodeso.nl/?p=207
Currently I am working on part 4.
Hi,
I’m trying to install Cassandra on a Windows XP machine.
I followed all those steps but I can’t get it to work.
I also modified storage-conf.xml, changing the paths for CommitLogDirectory and DataFileDirectories.
But when I run cassandra.bat I get a message:
C:\Programs\cassandra\apache-cassandra-0.6.0-beta3\bin>cassandra.bat
Starting Cassandra Server
El sistema no puede hallar la ruta especificada.
(Something like “The system can not find the path specified.”)
What is wrong here?
Regards.
@maxi
Hi again,
I solved the problem (my mistake). I had the JAVA_HOME environment variable set to the \bin folder instead of the root folder of the Java installation.
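This mistake is easy to check for. A minimal sketch, assuming a standard JDK or JRE layout in which the java executable lives in the bin folder directly under JAVA_HOME; JavaHomeCheck is a hypothetical helper written for this example and is not something that ships with Cassandra.

```java
import java.io.File;

// Illustrative only: verifies that a folder looks like the root of a Java installation.
final class JavaHomeCheck {

    // True when dir/bin contains a java executable (java or java.exe).
    // Pointing JAVA_HOME at the bin folder itself fails this check,
    // because there is no bin/bin/java.
    static boolean looksLikeJavaHome(File dir) {
        File bin = new File(dir, "bin");
        return new File(bin, "java").isFile() || new File(bin, "java.exe").isFile();
    }

    public static void main(String[] args) {
        String javaHome = System.getenv("JAVA_HOME");
        if (javaHome == null || javaHome.isEmpty()) {
            System.out.println("JAVA_HOME is not set");
            return;
        }
        boolean ok = looksLikeJavaHome(new File(javaHome));
        System.out.println("JAVA_HOME=" + javaHome
                + (ok ? " looks fine" : " does not contain bin/java, check the path"));
    }
}
```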
Great article. Thanks for sharing.
Thanks for this great series!
I downloaded version 0.6.1 to follow this tutorial, and all the libraries are included. Too bad I only realized that after downloading all the needed libraries.
Ah yes, I did mention in the post, I think, that they still haven’t decided whether to bundle everything together or not. Apparently for version 0.6.1 they did. I will adjust the post so that it reflects this better and tells people to first check whether the libraries are present.
Great article, I tried Cassandra under OpenJDK 1.6.0_18 – it didn’t work. However it worked with Sun JDK.
Thanks for a beautiful write-up on Cassandra. Your articles were very helpful for getting up to speed.
Hi,
I have a requirement to use Cassandra in my application. In my application there is one table with a lot of data, and most of my application uses that table. Because of the amount of data, the performance of the application is decreasing when that table is in Oracle.
So I have decided to use the Cassandra database for that one table and keep all other tables in Oracle. A lot of business logic depends on that table.
Now my question is: can I use Cassandra for a table which has a lot of business logic?
I am unable to implement a lot of where clauses for the Cassandra database.
Is there any supporting tool to use Cassandra in an efficient way?
Please let me know…
It is urgent.
Thanks in advance
By Mallik
Where clauses are not possible within the Cassandra database. What you normally do is create multiple inverted-index ColumnFamilies. The advantage of this is that retrieving data based on keys is very fast and efficient. Since Cassandra only allows you to define a single key value, you need to figure out what your search paths are. Suppose I have a table with cars. If I would like to search for all cars of a single brand, I would create a ColumnFamily where the key is the brand name and the columns are the IDs of the cars. But if I want to search cars by engine type, I would need to create another ColumnFamily where the engine type is the key and the columns are the IDs of the cars. These two column families don’t hold any information about the cars themselves except the key information.
A third ColumnFamily would be needed to store the car details, where the key is the ID of the car and the columns are the details.
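To make the idea a bit more concrete, here is a small in-memory sketch of those three column families using plain Java maps. It does not talk to Cassandra at all; the class, the method names and the sample fields are invented for this example, and each map simply stands in for one ColumnFamily (map key = row key, map value = the columns of that row).

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Illustrative only: plain maps standing in for three ColumnFamilies.
final class CarIndexSketch {

    // "Cars": key = car ID, columns = the details of that car.
    private final Map<String, Map<String, String>> cars = new HashMap<>();
    // "CarsByBrand": key = brand name, columns = car IDs.
    private final Map<String, Set<String>> carsByBrand = new HashMap<>();
    // "CarsByEngine": key = engine type, columns = car IDs.
    private final Map<String, Set<String>> carsByEngine = new HashMap<>();

    // Stores the details once and adds the ID to both inverted indexes.
    void addCar(String id, String brand, String engineType, String model) {
        Map<String, String> details = new LinkedHashMap<>();
        details.put("brand", brand);
        details.put("engine", engineType);
        details.put("model", model);
        cars.put(id, details);
        carsByBrand.computeIfAbsent(brand, k -> new TreeSet<>()).add(id);
        carsByEngine.computeIfAbsent(engineType, k -> new TreeSet<>()).add(id);
    }

    // "where brand = ?" becomes a single key lookup in the brand index.
    Set<String> idsByBrand(String brand) {
        return carsByBrand.getOrDefault(brand, Collections.emptySet());
    }

    // "where engine = ?" becomes a single key lookup in the engine index.
    Set<String> idsByEngine(String engineType) {
        return carsByEngine.getOrDefault(engineType, Collections.emptySet());
    }

    // The details are fetched by ID from the third "column family".
    Map<String, String> details(String id) {
        return cars.get(id);
    }

    public static void main(String[] args) {
        CarIndexSketch index = new CarIndexSketch();
        index.addCar("car-1", "Volvo", "diesel", "V70");
        index.addCar("car-2", "Volvo", "petrol", "S40");
        index.addCar("car-3", "Saab", "diesel", "9-3");
        for (String id : index.idsByBrand("Volvo")) {
            System.out.println(id + " -> " + index.details(id));
        }
    }
}
```

The price of this approach is that every write has to update all three structures, and the client is responsible for keeping them in sync, for example when a car is removed or its brand changes.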
About the business logic: Cassandra itself doesn’t provide any means or methods for handling business logic. So you would need to apply business logic within the client before storing data or after retrieving data. Also keep in mind that the consistency between the two databases needs to be handled: what happens if data is removed from one database while data in the other database depends on it?
But some questions: how much data does the table you are referring to contain? What is the structure of the data?