
I have been in the software business for close to 30 years now, with more than 25 years of SQL-based relational databases. Although I have used Unix even longer than that, I am pretty much operating-system agnostic. Over the years I have worked in many positions, from support engineer to sales engineer and consultant.

Comparing MongoDB with MySQL Cluster as a Key-Value Store

05.24.2012
Whoa, it has been a long time since I posted here! But I have been very busy with non-MySQL-related issues lately, so I guess that is some excuse.

This week I decided to try a few MySQL things anyway; the plan was to compare MongoDB with MySQL Cluster as a key-value store. We have some data here at Recorded Future that currently lives in MongoDB. It will not fit in DynamoDB (it relies on secondary indexes, for example), so I was thinking that MySQL Cluster might be an alternative, and it had been a while since I last tried Cluster anyway.
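
To make the comparison concrete, here is a minimal sketch of the kind of key-value access I have in mind: a primary-key lookup plus a secondary-index lookup in both stores. The database, collection, table, column and user names (kvtest, items, source, payload, bench) are invented for the example and are not the actual Recorded Future schema; MySQL Cluster is reached through the ordinary MySQL protocol, so plain SQL against an NDBCLUSTER table does the job.

```python
# A minimal sketch, not the real schema: the same key-value access pattern
# (primary-key lookup plus a secondary-index lookup) against MongoDB via
# pymongo and against MySQL Cluster via mysql-connector-python.
import pymongo
import mysql.connector

# --- MongoDB: documents keyed by _id, with a secondary index on "source" ---
mongo = pymongo.MongoClient("mongodb://localhost:27017")
coll = mongo["kvtest"]["items"]
coll.create_index("source")                                   # secondary index
coll.replace_one({"_id": "item:1"},                           # upsert one document
                 {"source": "web", "payload": "..."}, upsert=True)
doc = coll.find_one({"_id": "item:1"})                        # primary-key lookup
docs = list(coll.find({"source": "web"}))                     # secondary-index lookup

# --- MySQL Cluster: the same shape as an NDBCLUSTER table with a KEY on source ---
db = mysql.connector.connect(host="127.0.0.1", user="bench",
                             password="bench", database="kvtest")
cur = db.cursor()
cur.execute("""
    CREATE TABLE IF NOT EXISTS items (
        id      VARCHAR(64) NOT NULL PRIMARY KEY,
        source  VARCHAR(32) NOT NULL,
        payload BLOB,
        KEY idx_source (source)
    ) ENGINE=NDBCLUSTER
""")
cur.execute("REPLACE INTO items (id, source, payload) VALUES (%s, %s, %s)",
            ("item:1", "web", b"..."))
db.commit()
cur.execute("SELECT payload FROM items WHERE id = %s", ("item:1",))       # PK lookup
row = cur.fetchone()
cur.execute("SELECT id, payload FROM items WHERE source = %s", ("web",))  # secondary index
rows = cur.fetchall()
```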

At Recorded Future we run everything on Amazon EC2, but I wanted this benchmark to be about more than just comparing MySQL Cluster with MongoDB: I also wanted to see the difference between EC2 and some hard iron.

So I started downloading some data to my recently built Linux server box. This server is a home-brew machine housed in a Chieftech 4U rackmount case. There is an Asus M5A88V EVO mobo in it, and on that an 8-core AMD CPU and 16 GB of RAM, so for a box at home for experimental purposes it's quite reasonable.

What is not reasonable is how Linux folks sometimes treat hardware and user requirements. I understand that the Linux developers do this, to no small extent, in their free time. I also understand that stuff sometimes goes wrong. But hey, Ubuntu 10.10 (which we use, despite it being old) is a pretty common Linux distro. On my mobo there is a Gigabit LAN thingy, as on more or less all mobos these days. One of the most common LAN chipsets is from Realtek, either the 8111 or the 8168. Seems insignificant, right? No big deal? Just install Linux and it works; Linux may have issues with very unusual hardware, but surely not with something as common as the Realtek 8111/8168? Beeep! Wrong answer! Sure, it works, but slowly. If you look carefully, you realize that network performance is dead slow, and further investigation shows that this is due to a lot of dropped packets.

Doing an lsmod, you can see that Linux (this is Ubuntu 10.10, but it is not uncommon on other Linuxes either) has decided to use the driver for the much less common Realtek 8169 Gigabit Ethernet chip. These chips are seemingly compatible, and hey, it works, but it doesn't work well. So, back to the drawing board, which in this case means: download the 8111/8168 module source from Realtek, build a new Linux module, remove and blacklist the r8169 module, and then install the r8168 module instead. Yes, I can live with this. But those who are not developers or administrators and want to use Linux will have issues with this. Look, OSS folks, you are doing a good job, but look at your packaging and how you address users.

That said, it was back to my supposedly 16 GB Linux box, which Linux thought had only 3.2 GB available. Again, this is a Linux kernel issue, and again it's not that uncommon: it affects certain AMD mobos with a certain AMD chipset. Again, the patch from AMD for this is simple, but it does require patching the Linux kernel. I would expect stuff to work better than this and to be better tested, but on the other hand my mobo is pretty new and Ubuntu 10.10 is pretty old, so OK. Still, I have far fewer hardware-related issues with my Windows machines. And before you reply that those guys are paid, I understand that, but I was hoping the power of the Linux community and the better way of developing software that OSS represents would compensate for that. That does not seem to be the case, so I guess Linux stays largely between us geeks for a while, which might be just as well, as that is how I make my money!

Oh no, what happened to the benchmark? Well, instead of benchmarking I have been busy upgrading, downgrading, rebuilding and patching Linux, so that never happened. Now I do have a server where Linux can see all 16 GB of memory and where the network seems to work OK (I have to admit it, Realtek sucks; I have been trying to find an alternative, but most PCI boards also have a Realtek chip on them).

But stay tuned: once my box is again properly VPNed into the Recorded Future network, I'll install MongoDB again, reimport the data into it, then convert and load the data into MySQL Cluster, and then I am ready for some simple testing. A rough sketch of that convert-and-load step is below. But this is taking WAY longer than I expected!
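
For what it's worth, the convert-and-load step I have in mind is roughly this: stream the documents out of MongoDB, flatten them to rows, and batch-insert them into the NDB table from the earlier sketch. Again, the names are invented and the real data has more fields; this is just a sketch of the shape of the job, not the actual script.

```python
# Rough sketch of the planned convert-and-load step: stream documents out of
# MongoDB and batch-insert them into the MySQL Cluster (NDB) "items" table
# from the sketch above. All names are invented for the example.
import pymongo
import mysql.connector

BATCH_SIZE = 1000

mongo = pymongo.MongoClient("mongodb://localhost:27017")
coll = mongo["kvtest"]["items"]

db = mysql.connector.connect(host="127.0.0.1", user="bench",
                             password="bench", database="kvtest")
cur = db.cursor()
sql = "REPLACE INTO items (id, source, payload) VALUES (%s, %s, %s)"

batch = []
for doc in coll.find({}):
    # Flatten each document to the columns of the NDB table.
    batch.append((str(doc["_id"]), doc.get("source", ""), str(doc.get("payload", ""))))
    if len(batch) >= BATCH_SIZE:
        cur.executemany(sql, batch)
        db.commit()          # commit per batch to keep NDB transactions small
        batch = []

if batch:                    # flush the final partial batch
    cur.executemany(sql, batch)
    db.commit()
```
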
Published at DZone with permission of Anders Karlsson, author and DZone MVB. (source)

