Importing and exporting JSON data from Apache Cassandra

Importing and exporting data from a database is important: it keeps you safe when hardware or software fails, and it makes common database administration tasks possible. If something goes wrong, you can simply restore an earlier backup.

Cassandra supports two different ways of backing up data. The first is nodetool with the snapshot argument, which creates a complete dump of the data. Before you use the snapshot tool to create a backup, read the online documentation, because there are some rules to follow.

However, in this post I will discuss the second method. Cassandra provides two applications for exporting and importing data, sstable2json and json2sstable. As the names suggest, an export is stored in JSON format. If you want to know more about JSON data structures, I suggest you head over to the following website:

http://www.json.org/

These two tools have some really nice advantages. First of all, you don’t have to export all the data: it is possible to export only a number of rows by specifying their keys. The output is human-readable, and since it uses JSON notation you can do other things with the export besides storing it. You could also export data from an existing database into the required format and import it into an sstable.
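To make that last point a bit more concrete, here is a minimal Python sketch that turns rows from some other source into the same layout as the export shown further down in this post. The function names and the sample row are made up for illustration. I assume the values are plain UTF-8 text and simply copy the fourth element of each column as false, as in my own export; I have not checked whether json2sstable expects rows or columns in a particular order.

```python
import json


def to_hex(text):
    """Hex-encode a string the way column names and values appear in an export."""
    return text.encode("utf-8").hex()


def rows_to_export_format(rows, timestamp):
    """Build a dict shaped like the sstable2json output in this post.

    rows maps a row key to a dict of column name -> column value.
    Each column becomes [hex name, hex value, timestamp, False].
    """
    export = {}
    for key, columns in rows.items():
        export[key] = [
            [to_hex(name), to_hex(value), timestamp, False]
            for name, value in sorted(columns.items())
        ]
    return export


if __name__ == "__main__":
    # Hypothetical row, as it might come out of some other database.
    rows = {"Eric Long": {"country": "United Kingdom"}}
    print(json.dumps(rows_to_export_format(rows, 1269842588093), indent=3))
```

Sorting the columns by name is only a guess at a sensible default; it happens to match the order in my export.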

The two applications reside in the /bin folder of your Cassandra installation. Unlike Cassandra itself or the CLI, there is no executable file directly there. I don’t know if this was done on purpose, but it does make it a bit harder to work out how to use them. One reason might be that exporting data from a different node could cause problems, since the tool needs the storage-conf.xml file of the node you want to export data from.

However, since I currently use only a single node, I created two Windows batch files that make the tools a bit easier to use. You can download them below; using these batch files on different nodes should not be a problem. They use the CASSANDRA_HOME environment variable to locate the necessary dependencies and configuration file. You also need a JAVA_HOME environment variable pointing to a Java 1.6+ installation.

Cassandra doesn’t have to be running to export or import data, but you do need to specify the database files containing the data. These files can be in different locations depending on your configuration. To find them, open the storage-conf.xml file and look for the DataFileDirectory setting. If you haven’t changed it, it will be /var/lib/cassandra/data.
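If you would rather look the setting up from a script, something like the following Python sketch should do. It only assumes that storage-conf.xml is well-formed XML containing one or more DataFileDirectory elements; the trimmed-down sample document and the function name are my own, not a copy of a real configuration file.

```python
import xml.etree.ElementTree as ET


def data_file_directories(xml_text):
    """Return the text of every DataFileDirectory element in the config."""
    root = ET.fromstring(xml_text.strip())
    return [el.text.strip() for el in root.iter("DataFileDirectory") if el.text]


if __name__ == "__main__":
    # Assumed, trimmed-down layout; a real storage-conf.xml has many more settings.
    sample = """
    <Storage>
      <DataFileDirectories>
        <DataFileDirectory>/var/lib/cassandra/data</DataFileDirectory>
      </DataFileDirectories>
    </Storage>
    """
    print(data_file_directories(sample))
```

For a real installation you would read the file from the conf folder of your Cassandra installation and pass its contents to the function.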

Now, to start simple, let’s export some data to the console. In my case I will use the database configuration from the posts I made some time ago. I have a number of column families in the Blog keyspace; one of them is Authors. To export the complete Authors ColumnFamily, I would execute the following:

sstable2json C:\var\lib\cassandra\data\Blog\Authors-1-Data.db

Notice that I specify the XXXX-Data.db file. Every ColumnFamily has three files; always specify the XXXX-Data.db file, since it contains the actual data. Here is the result I got on my console:

{
   "Eric Long":[
      [
         "636f756e747279",
         "556e69746564204b696e67646f6d",
         1269842588093,
         false
      ],
      [
         "656d61696c",
         "657269632028617429206c6f6e672e636f6d",
         1269842588093,
         false
      ],
      [
         "7265676973746572656453696e6365",
         "30312f30312f32303032",
         1269842588093,
         false
      ]
   ],
   "Ronald Mathies":[
      [
         "636f756e747279",
         "4e65746865726c616e64732c20546865",
         1269842588125,
         false
      ],
      [
         "656d61696c",
         "726f6e616c64202861742920736f6465736f2e6e6c",
         1269842588125,
         false
      ],
      [
         "7265676973746572656453696e6365",
         "30312f30312f32303130",
         1269842588125,
         false
      ]
   ]
}

As you can see, the data itself is not really readable. This is because the column names and values are written as hexadecimal strings; only the row keys are readable.
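If you want to read the values, you can decode the hexadecimal strings yourself. The Python sketch below does that for an export like the one above. It assumes the names and values are UTF-8 text, which is true for my Authors data but will not hold for binary values; the function names are just for illustration.

```python
import json


def decode_hex(value):
    """Turn a hex string from the export back into text (assumes UTF-8 data)."""
    return bytes.fromhex(value).decode("utf-8")


def decode_export(export):
    """Map each row key to a dict of decoded column name -> decoded value."""
    decoded = {}
    for key, columns in export.items():
        decoded[key] = {
            decode_hex(name): decode_hex(value)
            for name, value, _timestamp, _flag in columns
        }
    return decoded


if __name__ == "__main__":
    # One column copied from the export above.
    sample = json.loads(
        '{"Eric Long": [["636f756e747279", "556e69746564204b696e67646f6d",'
        ' 1269842588093, false]]}'
    )
    print(decode_export(sample))  # {'Eric Long': {'country': 'United Kingdom'}}
```

The timestamp and the fourth element are dropped here because they are already readable in the export.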

It is also possible to specify the keys of the rows you want to export. For example, if we only want the row with the key "Ronald Mathies", we can do the following:

sstable2json C:\var\lib\cassandra\data\Blog\Authors-1-Data.db -k "Ronald Mathies"

You can specify as many keys as you like; just add another -k followed by the key.

sstable2json C:\var\lib\cassandra\data\Blog\Authors-1-Data.db -k "Ronald Mathies" -k "Eric Long"

It should also be possible to write the results directly to a file using -f, but at the time of writing I was unable to get this to work.
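Since the export is printed to the console, one possible workaround is to capture that output yourself. The Python sketch below builds the same command line as above and sends whatever the tool prints to a file. It is untested against a real installation: I assume the tool can be started under the name given in the hypothetical tool parameter (with my batch files you would probably have to pass the full path to the .bat file), and that nothing except the JSON is printed.

```python
import subprocess


def build_export_command(data_file, keys=(), tool="sstable2json"):
    """Build the argument list, adding one -k per row key."""
    command = [tool, data_file]
    for key in keys:
        command += ["-k", key]
    return command


def export_to_file(data_file, output_path, keys=(), tool="sstable2json"):
    """Run the export and write whatever it prints to output_path."""
    with open(output_path, "w", encoding="utf-8") as out:
        subprocess.run(
            build_export_command(data_file, keys, tool),
            stdout=out,   # redirect the console output into the file
            check=True,   # raise an error if the tool exits with a failure code
        )


if __name__ == "__main__":
    print(build_export_command(
        r"C:\var\lib\cassandra\data\Blog\Authors-1-Data.db",
        keys=["Ronald Mathies", "Eric Long"],
    ))
```

Passing the arguments as a list avoids having to quote keys that contain spaces.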

To import data into your Cassandra database, you need json2sstable. To import, you need a file with the contents of an earlier export. I just copy the data and add it to a file; watch out for line breaks, as they can break the JSON structure.
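A quick way to catch a broken file before importing it is to parse it first. The following Python sketch checks that the text is valid JSON and that it has the same shape as the export shown earlier (row keys mapping to lists of four-element columns). The shape check is based only on my own export, so treat it as an assumption rather than a description of what json2sstable itself validates.

```python
import json


def check_export(text):
    """Parse the text as JSON and return a list of problems (empty if none)."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError as err:
        return [f"not valid JSON: {err}"]
    if not isinstance(data, dict):
        return ["top level should be an object of row keys"]
    problems = []
    for key, columns in data.items():
        if not isinstance(columns, list):
            problems.append(f"row {key!r}: columns should be a list")
            continue
        for column in columns:
            if not (isinstance(column, list) and len(column) == 4):
                problems.append(f"row {key!r}: expected 4 elements, got {column!r}")
    return problems


if __name__ == "__main__":
    good = '{"Eric Long": [["636f756e747279", "556e69746564204b696e67646f6d", 1269842588093, false]]}'
    print(check_export(good))
```

A line break in the middle of a quoted value is reported as invalid JSON, while line breaks between values are harmless.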

A word of warning: I’ve sometimes experienced problems when importing data, which resulted in an unreadable database (Cassandra crashed on startup). My advice: if possible, keep a copy of the database files when you import data.
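Keeping such a copy can be scripted. The Python sketch below copies every file whose name starts with the ColumnFamily name followed by a dash (for example Authors-1-Data.db) from the keyspace folder into a backup folder. The function name and the folder layout are assumptions based on my own installation, and I assume nothing is writing to the files while they are copied.

```python
import shutil
from pathlib import Path


def backup_column_family(keyspace_dir, column_family, backup_dir):
    """Copy every file named <column_family>-* from keyspace_dir into backup_dir."""
    source = Path(keyspace_dir)
    target = Path(backup_dir)
    target.mkdir(parents=True, exist_ok=True)
    copied = []
    for path in sorted(source.glob(f"{column_family}-*")):
        if path.is_file():
            shutil.copy2(path, target / path.name)  # copy2 also keeps timestamps
            copied.append(path.name)
    return copied
```

For my setup the call would look like backup_column_family(r"C:\var\lib\cassandra\data\Blog", "Authors", r"C:\backup\Blog"), where the backup path is just an example.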

The command to import data is as follows:

json2sstable -K Blog -c Authors c:\data.json c:\var\lib\cassandra\data\Blog\Authors-1-Data.db

You will not see any output if it succeeds. We specify the keyspace, the column family, and the file containing the data we want to import. The last argument is the full path to the database file that will contain the data.

If you see the following error, you need to add json_simple-x.x.jar to the /lib folder of your Cassandra installation. You can find the library here: http://code.google.com/p/json-simple/

Exception in thread "main" java.lang.NoClassDefFoundError:
  org/json/simple/parser/ParseException

As you can see, it is very easy to create backups this way, or to export just some of the data.

The following two downloads contain the batch files for exporting and importing data with sstable2json and json2sstable. Extract the files into the /bin folder of your Cassandra installation.

sstable2json.zip
json2sstable.zip


If you have any remarks or questions, please let me know by leaving a comment.

  1. May 18th, 2010 at 09:01 | #1

    Could you please explain how to configure the “cassandra.yaml” file for a Linux installation?
    I am having problems defining a “Keyspace” and “CF”.
    Please give an example of defining a Keyspace and Column Family in “cassandra.yaml”.

  2. May 18th, 2010 at 09:24 | #2

    @ Sharana

    I am not really familiar with YAML, but I do have some hints that might help. First of all, there is documentation on the Cassandra wiki. It is not linked by default (it is still hidden), but you can find it here:

    http://wiki.apache.org/cassandra/StorageConfiguration_0.7

    Secondly, looking at the syntax, if I wanted to add a new Keyspace and ColumnFamily, I think the markup would look like the following:

    keyspaces:
        [standard Keyspace1 definition]
        - name: Blog
          replica_placement_strategy: org.apache.cassandra.locator.RackUnawareStrategy
          replication_factor: 1
          column_families:
            - name: Authors
              compare_with: LongType

            - name: Posts
              column_type: Super
              compare_with: UTF8Type
              compare_subcolumns_with: UTF8Type

    Here the keyspaces: and [standard Keyspace1 definition] lines are the ones that already exist in the cassandra.yaml file.

    (Sorry, the indentation of the lines may not survive in the comments; the sample in the cassandra.yaml file and the online documentation show the indentation rules.)
