07 Jun 2010

How Fluent Cassandra Handles Runtime Types

Today a question came up about some wonky behavior when retrieving non-string types from Cassandra. Here is the issue in a nutshell:

dynamic obj = record.CreateSuperColumn();
obj.Id = 1234;
obj.CreatedOn = DateTime.Now;
obj.IsOnline = true;

// blah blah blah save to database and retrieve

Console.WriteLine(obj.Id);  // (some unprintable characters)
Console.WriteLine(obj.CreatedOn);  // (some unprintable characters)
Console.WriteLine(obj.IsOnline);  // (some unprintable characters)

To understand why this happens, we first need to talk about how Cassandra stores data. Cassandra stores everything in columns, either super columns or regular columns, but for the sake of this post we are only going to talk about regular columns. A regular column is made up of three properties:

Property	Type
Name	CompareWith type
Value	binary
Timestamp	64-bit integer

The Name's CompareWith type is set in the configuration and can be ASCII, UTF8, LexicalUUID, TimeUUID, Long, or Bytes. In .NET terms, that means string, Guid, DateTime, long, or byte[]. The Value can only be Bytes, or byte[] in .NET. The Timestamp is used for synchronization between Cassandra servers and shouldn’t be directly controlled. To relate this back to the type conversion problem above, we need to take a deeper look at what happens to the Value property of the column when it is set and saved.

Between the moment you set a property and the moment it is saved in Cassandra, the value goes through a step that you probably aren’t aware of: it is serialized and stored in Fluent Cassandra’s flexible BytesType, which is intelligent enough to serialize common runtime types into binary, so that you as the developer don’t have to worry about interacting with the Cassandra database at a low level. This type system is also the major driver behind the ASCII, UTF8, LexicalUUID, TimeUUID, Long, and Bytes types, which help serialize the Name property of the column correctly.
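To make the serialization step concrete, here is a minimal sketch that does not use Fluent Cassandra at all. The names ValueBytesDemo and ToBytes are hypothetical, and using BitConverter is only an assumption about one reasonable way to turn runtime types into bytes. It shows why printing the raw bytes of an int as text gives unprintable characters.

```csharp
using System;
using System.Text;

// Hypothetical helper for illustration; not part of Fluent Cassandra.
public static class ValueBytesDemo
{
    // Turns a few common runtime types into raw bytes.
    public static byte[] ToBytes(object value)
    {
        if (value is int) return BitConverter.GetBytes((int)value);
        if (value is bool) return BitConverter.GetBytes((bool)value);
        if (value is DateTime) return BitConverter.GetBytes(((DateTime)value).Ticks);
        if (value is string) return Encoding.UTF8.GetBytes((string)value);
        throw new NotSupportedException(value.GetType().Name + " is not handled in this sketch.");
    }

    public static void Main()
    {
        byte[] id = ToBytes(1234);

        // Four raw bytes; their order depends on the machine's endianness.
        Console.WriteLine(BitConverter.ToString(id));

        // Reading the same bytes back as text gives unprintable characters, not "1234".
        Console.WriteLine(Encoding.UTF8.GetString(id));

        // Reading them back as the type they were written as recovers the value.
        Console.WriteLine(BitConverter.ToInt32(id, 0));
    }
}
```

Once the value is a byte array, nothing in the bytes themselves says whether they were an int, a bool, or a string.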

However, the issue alluded to at the beginning of the article comes up when you retrieve the object from Fluent Cassandra.

When Fluent Cassandra pulls a column out of the database, it only has the binary data to work with, and thus doesn’t know which runtime type to convert it to. That is why we need to explicitly tell Fluent Cassandra what type we want the property deserialized to. We do that by casting the property to the type we want it retrieved as. To build on the example above, we would get the column values in the following way:

Console.WriteLine((int)obj.Id);  // 1234
Console.WriteLine((DateTime)obj.CreatedOn);  // 2010-6-7 12:30:38 PM
Console.WriteLine((bool)obj.IsOnline);  // true

The act of casting is enough to tell the BytesType object how the binary data should be deserialized into a runtime type that .NET understands. This is all done through a lot of operator magic, but the result is the same: you get the type you put into the database back out of the database.
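The "operator magic" can be sketched with C# conversion operators. This is only an illustration under my own assumptions: RawValue and ConversionDemo are hypothetical names, and the real BytesType may be implemented quite differently. The point is that the cast target selects the deserializer.

```csharp
using System;
using System.Text;

// Hypothetical stand-in for a bytes-backed value; not Fluent Cassandra's BytesType.
public sealed class RawValue
{
    private readonly byte[] _bytes;

    private RawValue(byte[] bytes) { _bytes = bytes; }

    // Writing: each supported type converts implicitly into raw bytes.
    public static implicit operator RawValue(int value) { return new RawValue(BitConverter.GetBytes(value)); }
    public static implicit operator RawValue(bool value) { return new RawValue(BitConverter.GetBytes(value)); }
    public static implicit operator RawValue(string value) { return new RawValue(Encoding.UTF8.GetBytes(value)); }

    // Reading: the cast target picks the deserializer, because the bytes carry no type tag.
    public static explicit operator int(RawValue value) { return BitConverter.ToInt32(value._bytes, 0); }
    public static explicit operator bool(RawValue value) { return BitConverter.ToBoolean(value._bytes, 0); }
    public static explicit operator string(RawValue value) { return Encoding.UTF8.GetString(value._bytes); }
}

public static class ConversionDemo
{
    public static void Main()
    {
        RawValue id = 1234;
        RawValue isOnline = true;
        RawValue name = "Nick";

        Console.WriteLine((int)id);        // 1234
        Console.WriteLine((bool)isOnline); // True
        Console.WriteLine((string)name);   // Nick

        // The same cast works through a dynamic reference, as in the example above.
        dynamic obj = id;
        Console.WriteLine((int)obj);       // 1234
    }
}
```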

I think this is pretty straightforward once you understand what is happening in the backend. But I am open to suggestions if you have a better idea of how deserialization can be handled in a more straightforward manner. I am currently working on support for complex types, but right now the following types can be serialized into column values:

byte[], byte, sbyte
short, ushort, int, uint, long, ulong
float, double
decimal
bool
string
char
Guid
DateTime, DateTimeOffset

All other types will throw compiler errors for the time being. I am working on a way to use binary serialization to store other types, but I am not currently happy with the interface, because it is not as straightforward as the above. Again, if you have suggestions, I would love to hear them.

06 Jun 2010

Your First Fluent Cassandra Application (part 2)


Last time I demonstrated how to create your first Fluent Cassandra app. After we finished learning how to create records and save them to the database, I issued a challenge to implement comments for the command line blog app we created. I hinted at how I would do it with this column family configuration:

<ColumnFamily Name="Comments"
    ColumnType="Super"
    CompareWith="TimeUUIDType"
    CompareSubcolumnsWith="UTF8Type" />

And this is what we are going to implement today.

Basic Structure

The information we need to keep for our blog’s comments is the standard information that you would expect from any blog comment:

Name
Email
Website
Comment
Date

However, in Cassandra we aren’t going to use the standard flat table that you might see in an RDBMS, where the comment row contains all the information in the list above, plus a reference to the post identity, all summed up under a comment identity. In a column-based database like Cassandra we would use a structure that looks like this:

key: “first-blog-post”

    super column name: 2010-6-3 12:43:00 AM (in Time UUID)

        name: “Nick Berardi”
        email: “[email protected]”
        website: “www.coderjournal.com”
        comment: “Wow fluent cassandra is really neeto…”

    super column name: 2010-6-3 3:12:33 PM (in Time UUID)

        name: “Joe User”
        email: “[email protected]”
        website: “”
        comment: “I agree with you Nick!”

The first thing you might notice is that the key for our comments family is the same as the key for our posts family. This ties the comments to their post, so the same key looks up both. The next thing you may notice is that the super column name isn’t actually a string; it is a Time UUID, or for you .NET people a System.Guid that stores the date and time. The last thing is the actual property columns for all the metadata we want to store about each comment.
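If you are wondering how a UUID can store a date, here is a rough sketch of the version 1 (time-based) layout from RFC 4122, where the timestamp counts 100-nanosecond intervals since 15 October 1582. TimeUuidSketch and its methods are hypothetical names, the sketch works on a raw 16-byte array rather than System.Guid, and it leaves the clock sequence and node bytes as zero. Fluent Cassandra's GuidGenerator may well do this differently.

```csharp
using System;

// Hypothetical helper for illustration; not Fluent Cassandra's GuidGenerator.
public static class TimeUuidSketch
{
    // Version 1 UUIDs count 100-nanosecond intervals from this date.
    private static readonly DateTime GregorianEpoch =
        new DateTime(1582, 10, 15, 0, 0, 0, DateTimeKind.Utc);

    // Writes the 60-bit timestamp into bytes 0-7 in RFC 4122 (big-endian) field order.
    // Clock sequence and node (bytes 8-15) stay zero apart from the variant bits.
    public static byte[] FromDateTime(DateTime value)
    {
        long timestamp = (value.ToUniversalTime() - GregorianEpoch).Ticks;
        long timeLow = timestamp & 0xFFFFFFFFL;
        long timeMid = (timestamp >> 32) & 0xFFFF;
        long timeHiAndVersion = ((timestamp >> 48) & 0x0FFF) | 0x1000; // version 1

        byte[] uuid = new byte[16];
        uuid[0] = (byte)((timeLow >> 24) & 0xFF);
        uuid[1] = (byte)((timeLow >> 16) & 0xFF);
        uuid[2] = (byte)((timeLow >> 8) & 0xFF);
        uuid[3] = (byte)(timeLow & 0xFF);
        uuid[4] = (byte)((timeMid >> 8) & 0xFF);
        uuid[5] = (byte)(timeMid & 0xFF);
        uuid[6] = (byte)((timeHiAndVersion >> 8) & 0xFF);
        uuid[7] = (byte)(timeHiAndVersion & 0xFF);
        uuid[8] = 0x80; // RFC 4122 variant bits
        return uuid;
    }

    // Reassembles the timestamp fields and converts them back to a UTC DateTime.
    public static DateTime ToDateTime(byte[] uuid)
    {
        long timeLow = ((long)uuid[0] << 24) | ((long)uuid[1] << 16) | ((long)uuid[2] << 8) | uuid[3];
        long timeMid = ((long)uuid[4] << 8) | uuid[5];
        long timeHi = ((long)(uuid[6] & 0x0F) << 8) | uuid[7];
        long timestamp = (timeHi << 48) | (timeMid << 32) | timeLow;
        return GregorianEpoch.AddTicks(timestamp);
    }

    public static void Main()
    {
        DateTime commentedOn = new DateTime(2010, 6, 3, 0, 43, 0, DateTimeKind.Utc);
        byte[] uuid = FromDateTime(commentedOn);

        Console.WriteLine(BitConverter.ToString(uuid));
        Console.WriteLine(ToDateTime(uuid) == commentedOn); // True
    }
}
```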

Coding The Comments

We are going to pick up where we left off in the last post. If you want to follow along, open up your previous project from the last post, or use the file located here.

The first thing we need to do, as we did with the posts, is to get the repository for the comments column family.

// get the comments family
var commentsFamily = db.GetColumnFamily<TimeUUIDType, UTF8Type>("Comments");

Then we need to create the record that the comments will be added to, as we did for the tags and post details in the previous post:

dynamic postComments = commentsFamily.CreateRecord(key: "first-blog-post");

And this time let’s attach the postComments to the database ahead of time, so that it tracks the changes as they are made.

// lets attach it to the database before we add the comments
db.Attach(postComments);

Now let’s create 5 comments that are 5 seconds apart from each other to give us some data to play with, and then save the changes off to the database.

// add 5 comments
for (int i = 0; i < 5; i++)
{
    dynamic comment = postComments.CreateSuperColumn();
    comment.Name = i + " Nick Berardi";
    comment.Email = i + " [email protected]";
    comment.Website = i + " www.coderjournal.com";
    comment.Comment = i + " Wow fluent cassandra is really great and easy to use.";

    postComments[GuidGenerator.GenerateTimeBasedGuid()] = comment;

    Console.WriteLine("Comment " + i + " Done");
    Thread.Sleep(TimeSpan.FromSeconds(5));
}

// save the comments
db.SaveChanges();

Now that we have 5 comments in the database stored for our blog post, we should probably query them out:

DateTime lastDate = DateTime.Now;

for (int page = 0; page < 2; page++)
{

Since comments are sometimes paged, we are going to query two pages of comments separately from the database for our blog post. Our comments are stored by date, so we need to pull them out of the database by date. This is done by starting at the current date and querying backwards.

// lets back the date off by a millisecond so we don't get paging overlaps
lastDate = lastDate.AddMilliseconds(-1D);

Console.WriteLine("Showing page " + page + " starting at " + lastDate.ToLocalTime());

var comments = commentsFamily.Get("first-blog-post")
    .Reverse()
    .Fetch(lastDate)
    .Take(3)
    .FirstOrDefault();

The above is a little more complex than our last query, but the descriptive fluent interface makes the basic premise easy to follow. Since we are querying by date, it is easiest to pull the comments out newest first, in LIFO (last-in-first-out) order. To do this we use a method called Reverse, which does exactly what it sounds like: it reverses the column order. Then we Fetch columns starting at our lastDate and Take 3 columns for our page. To finish it off, since we are only querying one key, we use the LINQ method FirstOrDefault to return the queried record back to us.

If the above query were SQL, it would look something like this:

SELECT TOP(3) *
FROM comments
WHERE commented_on <= getdate()
ORDER BY commented_on DESC

Now that we have our comments, let’s display them as we did for the post in the previous article.

foreach (dynamic comment in comments)
{
    var dateTime = GuidGenerator.GetDateTime((Guid)comment.ColumnName);

    Console.WriteLine(String.Format("{0:T} : {1} ({2} - {3})",
        dateTime.ToLocalTime(),
        comment.Name,
        comment.Email,
        comment.Website
    ));

    lastDate = dateTime;
}

Nothing really mind-blowing is happening here: we use the column name (our Time UUID) to extract the date, and then we display the properties of the comment. There is a subtle part at the bottom of the foreach loop where we assign the comment’s date to lastDate. This keeps track of the last date we pulled out of the database, so we can requery from that date when we fetch the second page. You may or may not have noticed this code in the query above:

// lets back the date off by a millisecond so we don't get paging overlaps
lastDate = lastDate.AddMilliseconds(-1D);

But this is used so we don’t pull back the same comment over again.
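The paging logic is easier to see without a database in the way. Below is a minimal in-memory sketch, assuming plain DateTime values stand in for the Time UUID column names. PagingSketch and GetPage are hypothetical names, and ordinary LINQ replaces the Reverse, Fetch, and Take calls.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory stand-in for the comments query; not Fluent Cassandra code.
public static class PagingSketch
{
    // Returns up to pageSize timestamps at or before 'start', newest first.
    public static List<DateTime> GetPage(IEnumerable<DateTime> all, DateTime start, int pageSize)
    {
        return all.Where(t => t <= start)
                  .OrderByDescending(t => t)
                  .Take(pageSize)
                  .ToList();
    }

    public static void Main()
    {
        // Five comments, five seconds apart.
        DateTime first = new DateTime(2010, 6, 6, 9, 12, 57);
        List<DateTime> comments = Enumerable.Range(0, 5)
            .Select(i => first.AddSeconds(5 * i))
            .ToList();

        DateTime lastDate = first.AddSeconds(25); // "now", just after the last comment

        for (int page = 0; page < 2; page++)
        {
            // Step back one millisecond so the previous page's last item is excluded.
            lastDate = lastDate.AddMilliseconds(-1);

            List<DateTime> items = GetPage(comments, lastDate, 3);
            Console.WriteLine("Page " + page + ": " + items.Count + " comments");

            foreach (DateTime item in items)
                lastDate = item; // ends on the oldest item of this page
        }
        // Prints "Page 0: 3 comments" and then "Page 1: 2 comments".
    }
}
```

Without the one millisecond step back, the second page would start with the comment that ended the first page, because the filter is inclusive.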

Fun Part

The fun part for me is hitting the run button and waiting to see if everything is working as I intended. If everything is working as expected this is what the output will look like for our new comments section.

Comment 0 Done
Comment 1 Done
Comment 2 Done
Comment 3 Done
Comment 4 Done
Showing page 0 starting at 6/6/2010 9:13:22 AM
9:13:17 AM : 4 Nick Berardi (4 [email protected] - 4 www.coderjournal.com)
9:13:12 AM : 3 Nick Berardi (3 [email protected] - 3 www.coderjournal.com)
9:13:07 AM : 2 Nick Berardi (2 [email protected] - 2 www.coderjournal.com)
Showing page 1 starting at 6/6/2010 9:13:07 AM
9:13:02 AM : 1 Nick Berardi (1 [email protected] - 1 www.coderjournal.com)
9:12:57 AM : 0 Nick Berardi (0 [email protected] - 0 www.coderjournal.com)

We added our 5 comments and then we pulled back 2 pages of up to 3 comments each.

Pretty neat huh?

02 Jun 2010

Your First Fluent Cassandra Application


As you are probably aware by now if you follow my Twitter status or have looked into some of my recent posts, I am developing a library called FluentCassandra, which is a .NET library for using the Cassandra database in a .NETty way. The project has progressed quite nicely in the last couple of months and I am finally ready to start talking about it and giving examples of how it can be used in your applications. So let’s get started…

Step 1)

The first thing we need to do is make sure that your machine is properly set up to run Cassandra. Back in March I put together a jump start for Windows developers to do just that. So if you don’t have it running on your machine already, start there.

Step 2)

The next thing we need to do is locate and configure the database’s storage-conf.xml file, which was referenced in the previous step’s instructions.

Open the storage-conf.xml in your favorite text editor.

Add the following to the <Keyspaces /> tag in the file:

<Keyspace Name="Blog">
    <ColumnFamily Name="Posts"
        ColumnType="Super"
        CompareWith="UTF8Type"
        CompareSubcolumnsWith="UTF8Type" />

    <ReplicaPlacementStrategy>org.apache.cassandra.locator.RackUnawareStrategy</ReplicaPlacementStrategy>
    <ReplicationFactor>1</ReplicationFactor>
    <EndPointSnitch>org.apache.cassandra.locator.EndPointSnitch</EndPointSnitch>
</Keyspace>

Save it.

The above configuration creates one Column Family (or table in RDBMS speak) called Posts in a Keyspace (or database in RDBMS speak) called Blog. We are going to use this column family in our code below.

Step 3)

Next grab a copy of FluentCassandra from http://github.com/managedfusion/fluentcassandra

Create your own console app or use the FluentCassandra.Sandbox console app provided in the downloaded source.

Step 4)

Now for the fun part, the coding.

The first thing we need to do is create a context for the database entities that we are going to save. This is done with the CassandraContext.

using (var db = new CassandraContext(keyspace: "Blog", host: "localhost"))
{

The above code creates a Cassandra Context for the Blog Keyspace on our local Cassandra database. After we have done this, we want to get a reference to the family that we are going to execute our saves against. This is done by getting the column family with the CompareWith and CompareSubcolumnsWith types we specified in the above storage-conf.xml.

var family = db.GetColumnFamily<UTF8Type, UTF8Type>("Posts");

In the above code the first generic parameter is the CompareWith parameter and the second generic parameter is the CompareSubcolumnsWith parameter. This creates the family repository that can be used to execute CRUD commands against this column family.

Now that we have all this set up, let’s actually create a post record with a key called “first-blog-post”.

// create post
dynamic post = family.CreateRecord(key: "first-blog-post");

The easiest way to accomplish this is to use the method provided in the family object for creating the properly typed record for use. This object will be used in a little while but first we need to create two super columns with the details of our blog post and the tags associated with the blog post. This is done by using the CreateSuperColumn method on the post object we just created.

// create post details
dynamic postDetails = post.CreateSuperColumn();
postDetails.Title = "My First Cassandra Post";
postDetails.Body = "Blah. Blah. Blah. about my first post on how great Cassandra is to work with.";
postDetails.Author = "Nick Berardi";
postDetails.PostedOn = DateTimeOffset.Now;

// create post tags
dynamic tags = post.CreateSuperColumn();
tags[0] = "Cassandra";
tags[1] = ".NET";
tags[2] = "Database";
tags[3] = "NoSQL";

This creates two super column objects, postDetails and tags, that each contain their own set of columns. The post details contain the post’s title, content body, author, and when it was posted. The tags contain an array where each item in the array is a new column. We will talk about why this works in a future post, but accept for now that it does work, even though one is used as an object with a bunch of properties and the other is used as an array with a bunch of elements.
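For the curious, one way a single dynamic type can behave both like an object with properties and like an array is to derive from System.Dynamic.DynamicObject and map both member access and index access onto the same set of named columns. The sketch below is my own illustration with hypothetical names (ColumnBag, ColumnBagDemo); it is not how Fluent Cassandra is actually implemented.

```csharp
using System;
using System.Collections.Generic;
using System.Dynamic;
using System.Globalization;

// Hypothetical stand-in for a super column; not Fluent Cassandra's implementation.
public class ColumnBag : DynamicObject
{
    // Column name -> value, kept in name order.
    public readonly SortedDictionary<string, object> Columns =
        new SortedDictionary<string, object>();

    // bag.Title = "..." stores a column named "Title".
    public override bool TrySetMember(SetMemberBinder binder, object value)
    {
        Columns[binder.Name] = value;
        return true;
    }

    // bag.Title reads the column named "Title".
    public override bool TryGetMember(GetMemberBinder binder, out object result)
    {
        return Columns.TryGetValue(binder.Name, out result);
    }

    // bag[0] = "..." stores a column whose name is the index.
    public override bool TrySetIndex(SetIndexBinder binder, object[] indexes, object value)
    {
        Columns[Convert.ToString(indexes[0], CultureInfo.InvariantCulture)] = value;
        return true;
    }

    // bag[0] reads the column whose name is the index.
    public override bool TryGetIndex(GetIndexBinder binder, object[] indexes, out object result)
    {
        return Columns.TryGetValue(Convert.ToString(indexes[0], CultureInfo.InvariantCulture), out result);
    }
}

public static class ColumnBagDemo
{
    public static void Main()
    {
        dynamic details = new ColumnBag();
        details.Title = "My First Cassandra Post";

        dynamic tags = new ColumnBag();
        tags[0] = "Cassandra";
        tags[1] = ".NET";

        Console.WriteLine(details.Title); // My First Cassandra Post
        Console.WriteLine(tags[1]);       // .NET
    }
}
```

Either way, each assignment ends up as a name and value pair, which is all a column is.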

Let’s now add the details and tags to the post record that we created above.

// add properties to post
post.Details = postDetails;
post.Tags = tags;

Just like the details above, we are going to treat the post record as an object with properties. This completes the entire record that we want to save to the database. Now let’s attach it and save our record to the database.

// attach the post to the database
Console.WriteLine("attaching record");
db.Attach(post);

// save the changes
Console.WriteLine("saving changes");
db.SaveChanges();

So we have now done our first Cassandra database insert. But that is only half the fun; let’s read it back out of the database. As with the write, we are going to use the same family object to do the read. The first thing we need to do is get the record out of the database using the same key, “first-blog-post”.

// get the post back from the database
Console.WriteLine("getting 'first-blog-post'");
dynamic getPost = family.Get("first-blog-post").FirstOrDefault();

The above code uses a LINQ-like syntax to retrieve the record. This syntax starts with the Get method on the family object and can then be executed with any LINQ operation; in our case above, we are using the FirstOrDefault method. The next thing we want to see is the details of the post, which can be easily retrieved using the same object structure that we used to put them in the database.

// show details
dynamic getPostDetails = getPost.Details;
Console.WriteLine(
    String.Format("=={0} by {1}==\n{2}",
        getPostDetails.Title,
        getPostDetails.Author,
        getPostDetails.Body
    ));

And now for the tags, which we are going to query in a way more suitable for an array.

// show tags
Console.Write("tags:");
foreach (var tag in getPost.Tags)
    Console.Write(String.Format("{0}:{1},", tag.Name, tag.Value));

Finish it off with this code, and we will be ready to run our first Cassandra application.

}

Console.Read();

Step 5)

The first thing we need to do to run our application is make sure the database is running. This may sound like a no-duh moment, but if you are used to SQL Server development, you never really have to make sure the database is running, so I just like to mention it. If you don’t remember how to do this, go back to Step 1 and look at the instructions for starting the database.

Now let’s run the application and see the results. If everything ran correctly, you will see the following output.

attaching record
saving changes
getting 'first-blog-post'
==My First Cassandra Post by Nick Berardi==
Blah. Blah. Blah. about my first post on how great Cassandra is to work with.
tags:0:Cassandra,1:.NET,2:Database,3:NoSQL,

Step 6)

As a follow-up exercise, see if you can add comments. Hint: you will need a new super column family as defined here:

<ColumnFamily Name="Comments"
    ColumnType="Super"
    CompareWith="TimeUUIDType"
    CompareSubcolumnsWith="UTF8Type" />

Hope this was an interesting exercise, and if you see any way to improve the interface or want to help out on the project please start by going to http://github.com/managedfusion/fluentcassandra.

Don’t forget to check out part 2 of this series.
