NoSQL databases are the topic of the day everywhere on the web.
So this is a good time to publish a tutorial on accessing a Cassandra database from a Pharo Smalltalk image using the Thrift interface (there isn’t a high-level client for accessing Cassandra from Pharo yet). The following instructions were tested on a Debian GNU/Linux Squeeze (testing) amd64 laptop.
Install the required dependencies
As root:
aptitude install libboost-dev automake libtool flex bison pkg-config g++ build-essential ruby-dev python-dev
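The later steps also call svn, wget, tar, unzip and make, and Cassandra itself needs a Java runtime, none of which are in the package list above. As a minimal sketch (the function name missing_tools is my own, not part of any of these projects), something like this can confirm they are on the PATH before you start:

```shell
#!/bin/sh
# Hypothetical helper: report which of the given commands are not on PATH.
# Returns 0 when every tool is found, 1 otherwise.
missing_tools() {
    missing=""
    for tool in "$@"; do
        # command -v is the POSIX way to test whether a command exists
        command -v "$tool" >/dev/null 2>&1 || missing="$missing $tool"
    done
    if [ -n "$missing" ]; then
        echo "missing:$missing"
        return 1
    fi
    echo "all tools found"
}
```

For this tutorial you could call it as: missing_tools svn wget tar unzip make g++ java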
Create a working directory
As a normal user, create a working directory (I use one under my home directory):
mkdir /home/miguel/cassandra
cd /home/miguel/cassandra
Get the Thrift svn trunk source code
At the time of writing, the tar.gz package on the Thrift download page didn’t include the necessary fixes (but see the update below).
svn co http://svn.apache.org/repos/asf/incubator/thrift/trunk thrift
Update: When this post was originally written, my patch for generating correct Smalltalk code wasn’t part of a released version of Thrift, which is why you had to get it from subversion trunk. It has since been integrated and proper releases are out, so there is no need to get Thrift from svn; you can just get the tar.gz package from the Thrift download page (currently version 0.4.0):
http://incubator.apache.org/thrift/download/
Uncompress the tar.gz and you’ll get a folder named (in my case):
thrift-0.4.0/
Get the cassandra code
Go to http://cassandra.apache.org and download version 0.5.1 of Cassandra (here is the mirror I used; yours will likely be different):
wget http://www.devlib.org/apache/cassandra/0.5.1/apache-cassandra-0.5.1-bin.tar.gz
tar zxf apache-cassandra-0.5.1-bin.tar.gz
Get a Pharo image
Go to http://www.pharo-project.org/pharo-download/ and download a Pharo dev or a PharoCore image. I use a PharoCore RC3 image:
wget https://gforge.inria.fr/frs/download.php/26668/PharoCore-1.0-10515rc3.zip
unzip PharoCore-1.0-10515rc3.zip
You now have Thrift, Cassandra and Pharo ready to use.
Compile the Thrift source code
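Change into the Thrift source directory. If you checked out svn trunk:
cd thrift/
If you unpacked the release tarball instead:
cd thrift-0.4.0/
Then build it: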
./bootstrap.sh
./configure
make
Generate the Smalltalk Thrift code for accessing Cassandra
cd ..
If you built svn trunk:
./thrift/compiler/cpp/thrift --gen st apache-cassandra-0.5.1/interface/cassandra.thrift
If you built the 0.4.0 release:
./thrift-0.4.0/compiler/cpp/thrift --gen st apache-cassandra-0.5.1/interface/cassandra.thrift
This will generate the file:
gen-st/cassandra.st
in the /home/miguel/cassandra directory (your working directory).
You now have two Smalltalk files:
thrift/lib/st/thrift.st (or thrift-0.4.0/lib/st/thrift.st if you used the release tarball)
gen-st/cassandra.st
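Because the Thrift directory name depends on whether you used svn trunk or the release tarball, it is easy to point at the wrong path. A small sketch (st_sources is a hypothetical helper name) that looks for both files under the working directory and prints them in the order they should be loaded:

```shell
#!/bin/sh
# Hypothetical helper: print thrift.st and cassandra.st, one per line, in the
# order they should be loaded, or fail if either is missing.
# $1 = working directory (e.g. /home/miguel/cassandra)
st_sources() {
    workdir="$1"
    runtime=""
    # The Thrift tree is "thrift" (svn checkout) or "thrift-<version>" (tarball).
    for candidate in "$workdir"/thrift/lib/st/thrift.st \
                     "$workdir"/thrift-*/lib/st/thrift.st; do
        if [ -f "$candidate" ]; then
            runtime="$candidate"
            break
        fi
    done
    generated="$workdir/gen-st/cassandra.st"
    if [ -z "$runtime" ]; then
        echo "thrift.st not found under $workdir" >&2
        return 1
    fi
    if [ ! -f "$generated" ]; then
        echo "missing $generated (run the thrift --gen st step first)" >&2
        return 1
    fi
    printf '%s\n%s\n' "$runtime" "$generated"
}
```

Running st_sources /home/miguel/cassandra should print the two paths, runtime library first.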
Load the Smalltalk Thrift code in the Pharo image
Open the Pharo image and file in the two files above, in that order (first thrift.st, then cassandra.st).
Start and test the Cassandra server
If you already have a Cassandra node, skip this step. If you are just testing, stay with me.
cd apache-cassandra-0.5.1/
Edit conf/log4j.properties and change the line:
log4j.appender.R.File=/var/log/cassandra/system.log
to:
log4j.appender.R.File=/home/miguel/cassandra/var/log/cassandra/system.log
Edit conf/storage-conf.xml and change the lines:
<CommitLogDirectory>/var/lib/cassandra/commitlog</CommitLogDirectory>
<DataFileDirectories>
<DataFileDirectory>/var/lib/cassandra/data</DataFileDirectory>
</DataFileDirectories>
<CalloutLocation>/var/lib/cassandra/callouts</CalloutLocation>
<StagingFileDirectory>/var/lib/cassandra/staging</StagingFileDirectory>
to:
<CommitLogDirectory>/home/miguel/cassandra/var/lib/cassandra/commitlog</CommitLogDirectory>
<DataFileDirectories>
<DataFileDirectory>/home/miguel/cassandra/var/lib/cassandra/data</DataFileDirectory>
</DataFileDirectories>
<CalloutLocation>/home/miguel/cassandra/var/lib/cassandra/callouts</CalloutLocation>
<StagingFileDirectory>/home/miguel/cassandra/var/lib/cassandra/staging</StagingFileDirectory>
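If you prefer not to edit the files by hand, the same substitutions can be scripted. This is only a sketch: relocate_cassandra_paths is a name I made up, it assumes the logging file is called log4j.properties, and it assumes the working directory path contains no | or & characters (they would confuse sed).

```shell
#!/bin/sh
# Hypothetical helper, not part of Cassandra: point the default /var/log and
# /var/lib paths in the two config files at a working directory instead.
# $1 = working directory (e.g. /home/miguel/cassandra)
# $2 = Cassandra conf directory (e.g. apache-cassandra-0.5.1/conf)
relocate_cassandra_paths() {
    base="$1"
    conf_dir="$2"
    for name in log4j.properties storage-conf.xml; do
        file="$conf_dir/$name"
        [ -f "$file" ] || { echo "skipping $file (not found)" >&2; continue; }
        # Keep a pristine copy and always rewrite from it, so running the
        # helper twice does not prefix the paths twice.
        [ -f "$file.orig" ] || cp "$file" "$file.orig"
        sed -e "s|/var/log/cassandra|$base/var/log/cassandra|g" \
            -e "s|/var/lib/cassandra|$base/var/lib/cassandra|g" \
            "$file.orig" > "$file"
    done
}
```

From the working directory you would call it as: relocate_cassandra_paths /home/miguel/cassandra apache-cassandra-0.5.1/conf
The untouched originals stay next to the edited files with an .orig suffix.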
Then start the Cassandra server:
./bin/cassandra -f
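The -f flag keeps the server in the foreground, so open a second terminal in the same directory and connect with the client that ships with Cassandra (the server listens on port 9160):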
./bin/cassandra-cli --host localhost --port 9160
Insert a value:
set Keyspace1.Standard1['jsmith']['first'] = 'John'
Read back the value:
get Keyspace1.Standard1['jsmith']
Connect from Pharo to the Cassandra server
Open a workspace and try inserting 10000 values into the Cassandra server:
"Insert 10000 values"
[| cp result client |
    client := CassandraClient binaryOnHost: 'localhost' port: 9160.
    cp := ColumnPath new
        columnFamily: 'Standard1';
        column: 'col1'.
    1 to: 10000 do: [ :i |
        result := client insertKeyspace: 'Keyspace1'
            key: 'row', i asString
            columnPath: cp
            value: 'v', i asString
            timestamp: 1
            consistencyLevel: ((Cassandra enums at: 'ConsistencyLevel') at: 'QUORUM').]] timeToRun
Select the code and “print it”. It took 7326 milliseconds on my laptop.
Now read the values from the Cassandra server:
"Read 10000 values"
[| cp result client |
    client := CassandraClient binaryOnHost: 'localhost' port: 9160.
    cp := ColumnPath new
        columnFamily: 'Standard1';
        column: 'col1'.
    1 to: 10000 do: [ :i |
        result := client getKeyspace: 'Keyspace1'
            key: 'row', i asString
            columnPath: cp
            consistencyLevel: ((Cassandra enums at: 'ConsistencyLevel') at: 'QUORUM').]] timeToRun
Select it and “print it”. It took 7977 milliseconds to read back the 10000 values.
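To put the two timings in perspective, they can be converted into operations per second and average time per operation. The sketch below does the arithmetic with awk; the function names are mine, and the inputs are just the numbers measured above.

```shell
#!/bin/sh
# Convert "N operations in MS milliseconds" into operations per second.
ops_per_second() {
    awk -v n="$1" -v ms="$2" 'BEGIN { printf "%.0f\n", n * 1000 / ms }'
}

# Average time of one operation, in milliseconds.
ms_per_op() {
    awk -v n="$1" -v ms="$2" 'BEGIN { printf "%.2f\n", ms / n }'
}

ops_per_second 10000 7326   # inserts per second
ops_per_second 10000 7977   # reads per second
ms_per_op 10000 7326        # average milliseconds per insert
```

That works out to roughly 1365 inserts and 1254 reads per second, or a bit under a millisecond per call. Keep in mind these are sequential calls from a single client to a single local node, so they say little about a real cluster.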
Read a value from the cassandra-cli interface:
get Keyspace1.Standard1['row999']
You should get:
cassandra> get Keyspace1.Standard1['row999']
=> (column=col1, value=v999, timestamp=1)
Returned 1 results.
That is it. Adapt the code to your needs.
Cheers