
mongo performance, tokumx

Have you guys looked at this?

tokutek.com/products/tokumx-for-mongodb/

blog.scrapinghub.com/2013/05/13/ … aped-data/

I wonder if mongo is simply not a good choice for something like deadline…

This pretty much describes our current situation:

“Locking
We have a large volume of short queries which are mostly writes from web crawls. These rarely cause problems as they are fast to execute and the volumes are quite predictable. However, we have a lower volume of longer running queries (e.g. exporting, filtering, bulk deleting, sorting, etc.) and when a few of these run at the same time we get lock contention.
Each MongoDB database (or the whole server, prior to 2.2) has a Readers-Writer lock. Due to lock contention all the short queries need to wait longer and the longer running queries get much longer! Short queries take so long they time out and are retried. Requests from our website (e.g. users browsing data) take so long that all worker threads in our web server get blocked querying MongoDB. Eventually the website and all web crawls stop working!”
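
For what it's worth, here's a rough sketch (Python with PyMongo, untested) of how one could poll the server and watch for that kind of read/write queueing; the globalLock.currentQueue fields come from the serverStatus output, and the connection string is just a placeholder:

```python
# Sketch: poll serverStatus and print how many operations are queued
# behind the global lock. Connection string is a placeholder.
import time
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")

while True:
    status = client.admin.command("serverStatus")
    queue = status["globalLock"]["currentQueue"]
    print("readers queued: %d, writers queued: %d"
          % (queue["readers"], queue["writers"]))
    time.sleep(5)  # poll every few seconds
```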

This also applies:

"Poor space efficiency

MongoDB does not automatically reclaim disk space used by deleted objects and it is not feasible (due to locking) to manually reclaim space without substantial downtime. It will attempt to reuse space for newly inserted objects, but we often end up with very fragmented data. Due to locking, it’s not possible for us to defragment without downtime."
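
Along the same lines, a quick way to get a feel for how much space is tied up in unreclaimed/fragmented storage might be to compare dataSize against storageSize from dbStats. This is only a sketch; the database name and connection string are placeholders:

```python
# Sketch: compare live data size vs. allocated storage for one database
# to estimate how much space deletes have left unreclaimed.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")  # placeholder
stats = client["deadline"].command("dbStats")      # "deadline" db name is a guess

data_size = stats["dataSize"]        # bytes of live documents
storage_size = stats["storageSize"]  # bytes allocated on disk

print("allocated: %d bytes, live: %d bytes, unreclaimed overhead: %d bytes"
      % (storage_size, data_size, storage_size - data_size))
```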

Here’s my quick take on it:

TokuMX: I am skeptical of their claims, although fractal tree indexes are more powerful than B-trees, so I suppose it’s possible. If it truly is a “drop-in replacement” for standard MongoDB, we might be able to run some tests. There are other considerations like drivers, support, and project longevity (will TokuMX be around two years from now?). And it appears they currently only support Linux.
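
If we do get around to testing it, a very rough A/B harness could look something like the sketch below; the collection name, document shape, counts, and the TokuMX port are all made up for illustration, and it assumes the current PyMongo API:

```python
# Sketch: run the same insert workload against a stock mongod and a TokuMX
# instance (drivers should be unchanged if it's truly drop-in) and compare.
import time
from pymongo import MongoClient

def insert_benchmark(uri, n_docs=100000):
    """Time a run of single-document inserts against one server."""
    client = MongoClient(uri)
    coll = client["benchmark"]["tasks"]   # throwaway database/collection
    coll.drop()
    start = time.time()
    for i in range(n_docs):
        coll.insert_one({"_id": i, "status": "queued", "payload": "x" * 256})
    elapsed = time.time() - start
    print("%s: %d inserts in %.1fs (%.0f docs/s)"
          % (uri, n_docs, elapsed, n_docs / elapsed))

insert_benchmark("mongodb://localhost:27017")  # stock MongoDB
insert_benchmark("mongodb://localhost:27018")  # TokuMX (placeholder port)
```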

The Scrapinghub blog made some valid points, but some of them are somewhat dated, as pointed out in the comments section. They also have extremely large databases, and I don’t think the data set that Deadline generates would ever reach terabytes (but anything’s possible). Their solution was to switch to HBase, which makes sense for massive datasets, but Hadoop technology is also very complex to deploy and maintain properly. That said, we never rule out the possibility of swapping out the backend for Deadline.

In any case, this is good research material and stimulates thought. Keep it coming!

An issue with TokuMX is that it appears to be a fork of Mongo 2.4, so while it might be a drop-in replacement for Mongo 2.4, it won’t be for 2.6 (which Deadline 7 uses). Also, some of their features don’t work when sharding is enabled.

I did some quick reading on what’s coming in Mongo 2.8 as well, and it looks like we’re finally getting document-level locking!!
eliothorowitz.com/blog/2014/ … he-future/

It’s supposed to be available this year, but I’m guessing that’s for beta testing. Regardless, we’ll be watching for this release.

Cheers,
Ryan

Here’s hoping! :slight_smile:
