The command line

GNU/Linux, web development and some other things

Accessing Cassandra From Pharo

NoSQL databases are the topic of the day everywhere on the web, so this is a good time to post a tutorial on accessing a Cassandra database from a Pharo Smalltalk image using the Thrift interface (there isn’t a high-level client for accessing Cassandra from Pharo yet). The following instructions were tested on a Debian GNU/Linux Squeeze (testing) amd64 laptop.

Install the required dependencies

As root:

    aptitude install libboost-dev automake libtool flex bison pkg-config g++ build-essential ruby-dev python-dev

Create a working directory

As a normal user, create a working directory (I use my home directory):

    mkdir /home/miguel/cassandra
    cd /home/miguel/cassandra

Get the Thrift svn trunk source code

The current tar.gz package on the download page of Thrift doesn’t include the necessary fixes.

    svn co http://svn.apache.org/repos/asf/incubator/thrift/trunk thrift

Update: When this post was originally written, the patch I wrote for generating correct Smalltalk code wasn’t part of a released version of Thrift, which is why you had to get it from subversion trunk. It is now integrated and proper releases are out, so there is no need to get Thrift from svn. You can just get the tar.gz package from the Thrift download page (currently version 0.4.0):

    http://incubator.apache.org/thrift/download/

Uncompress the tar.gz and you’ll get a folder named (in my case):

    thrift-0.4.0/

Get the Cassandra code

Go to http://cassandra.apache.org and download version 0.5.1 of Cassandra (here is the mirror I got, yours will likely be different):

    wget http://www.devlib.org/apache/cassandra/0.5.1/apache-cassandra-0.5.1-bin.tar.gz
    tar zxf apache-cassandra-0.5.1-bin.tar.gz

Get a Pharo image

Go to http://www.pharo-project.org/pharo-download/ and download a Pharo dev or a PharoCore image. I use a PharoCore RC3 image:

    wget https://gforge.inria.fr/frs/download.php/26668/PharoCore-1.0-10515rc3.zip
    unzip PharoCore-1.0-10515rc3.zip

You now have Thrift, Cassandra and Pharo ready to use.
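Before compiling, it can help to confirm that everything landed where the later steps expect it. The following is only a minimal sketch and not part of the original setup: check_workdir is a hypothetical helper, the directory names assume the release versions used in this post, and the *.image lookup assumes the Pharo zip unpacks an image file somewhere under the working directory.

```shell
# Hypothetical helper: report which expected pieces are missing from a working directory.
# The directory names are the ones used in this post; adjust them to your versions.
check_workdir() {
    workdir=$1
    missing=0
    for dir in thrift-0.4.0 apache-cassandra-0.5.1; do
        if [ ! -d "$workdir/$dir" ]; then
            echo "missing directory: $workdir/$dir"
            missing=1
        fi
    done
    # Assumption: the Pharo zip unpacks a *.image file somewhere below the working directory.
    if [ -z "$(find "$workdir" -name '*.image' -print 2>/dev/null | head -n 1)" ]; then
        echo "missing Pharo image (*.image) under $workdir"
        missing=1
    fi
    return $missing
}
```

For example, `check_workdir /home/miguel/cassandra && echo ready` prints `ready` only when nothing is missing.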
Compile the Thrift source code

    cd thrift-0.4.0/
    ./bootstrap.sh
    ./configure
    make

(If you checked Thrift out from svn instead, the directory is thrift/.)

Generate the Smalltalk Thrift code for accessing Cassandra

    cd ..
    ./thrift-0.4.0/compiler/cpp/thrift --gen st apache-cassandra-0.5.1/interface/cassandra.thrift

(With the svn checkout, run ./thrift/compiler/cpp/thrift instead.)

This will generate the file gen-st/cassandra.st in the /home/miguel/cassandra directory (your working directory). You now have two Smalltalk files:

    thrift-0.4.0/lib/st/thrift.st    (thrift/lib/st/thrift.st with the svn checkout)
    gen-st/cassandra.st

Load the Smalltalk Thrift code in the Pharo image

Open the Pharo image and file in the two previous files in that order (first thrift.st and then cassandra.st).

Start and test the Cassandra server

If you already have a Cassandra node, skip this step. If you are testing, stay with me.

    cd apache-cassandra-0.5.1/

Edit conf/log4j.properties and change the line:

    log4j.appender.R.File=/var/log/cassandra/system.log

to:

    log4j.appender.R.File=/home/miguel/cassandra/var/log/cassandra/system.log

Edit conf/storage-conf.xml and change the lines:

    <CommitLogDirectory>/var/lib/cassandra/commitlog</CommitLogDirectory>
    <DataFileDirectories>
        <DataFileDirectory>/var/lib/cassandra/data</DataFileDirectory>
    </DataFileDirectories>
    <CalloutLocation>/var/lib/cassandra/callouts</CalloutLocation>
    <StagingFileDirectory>/var/lib/cassandra/staging</StagingFileDirectory>

to:

    <CommitLogDirectory>/home/miguel/cassandra/var/lib/cassandra/commitlog</CommitLogDirectory>
    <DataFileDirectories>
        <DataFileDirectory>/home/miguel/cassandra/var/lib/cassandra/data</DataFileDirectory>
    </DataFileDirectories>
    <CalloutLocation>/home/miguel/cassandra/var/lib/cassandra/callouts</CalloutLocation>
    <StagingFileDirectory>/home/miguel/cassandra/var/lib/cassandra/staging</StagingFileDirectory>

Then start the Cassandra server:

    ./bin/cassandra -f

Connect with the client provided by Cassandra (Cassandra started on port 9160):

    ./bin/cassandra-cli --host localhost --port 9160

Insert a value:

    set Keyspace1.Standard1['jsmith']['first'] = 'John'

Read back the value:

    get Keyspace1.Standard1['jsmith']

Connect from Pharo to the Cassandra server

Open a workspace and try inserting 10000 values in the Cassandra server:

    "Insert 10000 values"
    [| cp result client |
    client := CassandraClient binaryOnHost: 'localhost' port: 9160.
    cp := ColumnPath new
        columnFamily: 'Standard1';
        column: 'col1'.
    1 to: 10000 do: [ :i |
        result := client
            insertKeyspace: 'Keyspace1'
            key: 'row', i asString
            columnPath: cp
            value: 'v', i asString
            timestamp: 1
            consistencyLevel: ((Cassandra enums at: 'ConsistencyLevel') at: 'QUORUM').]] timeToRun

Select the code and “print it”. It took 7326 milliseconds on my laptop.

Now read the values from the Cassandra server:

    "Read 10000 values"
    [| cp result client |
    client := CassandraClient binaryOnHost: 'localhost' port: 9160.
    cp := ColumnPath new
        columnFamily: 'Standard1';
        column: 'col1'.
    1 to: 10000 do: [ :i |
        result := client
            getKeyspace: 'Keyspace1'
            key: 'row', i asString
            columnPath: cp
            consistencyLevel: ((Cassandra enums at: 'ConsistencyLevel') at: 'QUORUM').]] timeToRun

Select it and “print it”. It took 7977 milliseconds to read back the 10000 values.

Read a value from the cassandra-cli interface:

    get Keyspace1.Standard1['row999']

You should get:

    cassandra> get Keyspace1.Standard1['row999']
    => (column=col1, value=v999, timestamp=1)
    Returned 1 results.

That is it. Adapt the code to your needs.

Cheers
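If you would rather not edit the two Cassandra configuration files by hand, the path changes described in the server setup step can be scripted. This is only a sketch under a few assumptions: relocate_cassandra_paths is a hypothetical helper name, it relies on GNU sed's -i option (as shipped with Debian), and the prefix must not contain the characters | or &, which are special in the sed expressions used here.

```shell
# Hypothetical helper: prepend a prefix to the default /var paths in Cassandra's config files.
# Usage: relocate_cassandra_paths <conf directory> <prefix>
relocate_cassandra_paths() {
    conf_dir=$1
    prefix=$2
    # Log file location: only rewrite the line while it still points at /var/log/cassandra.
    sed -i "s|^\(log4j\.appender\.R\.File=\)/var/log/cassandra|\1${prefix}/var/log/cassandra|" \
        "$conf_dir/log4j.properties"
    # Data, commit log, callout and staging directories: match the path right after
    # an opening tag so that running the helper twice does not prefix the path twice.
    sed -i "s|>/var/lib/cassandra|>${prefix}/var/lib/cassandra|g" \
        "$conf_dir/storage-conf.xml"
}
```

From inside apache-cassandra-0.5.1/, calling `relocate_cassandra_paths conf /home/miguel/cassandra` should produce the same edits as the manual steps. Keeping a backup copy of conf/ before running it is a sensible precaution.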