There have been some concerns over Mongo’s disk space usage, so this post is here to address them. This is meant to be a summary; for those wanting a more in-depth read, see:
mongodb.org/display/DOCS/Exc … Disk+Space
For performance reasons, and to help avoid fragmentation of data, Mongo likes to aggressively pre-allocate storage space on disk. It pre-allocates this space by creating datafiles. It starts by creating a 64 MB datafile, then a 128 MB datafile, then it keeps doubling in size until it creates a 2 GB datafile. After this point, every new datafile will be 2 GB. Mongo will create a new datafile when it adds data to the previously created datafile for the first time.
What this means is that when you start using Deadline, the Mongo database will already be roughly 200 MB in size, because you will have a 64 MB datafile and a 128 MB datafile on disk. At this point, the 128 MB datafile is “empty”, and only exists because data has been added to the 64 MB datafile. After submitting and rendering some jobs, the 128 MB datafile will have data added to it, and at that point, Mongo will create the 256 MB datafile.
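To make the pattern concrete, here is a minimal Python sketch of the pre-allocation behavior described above. Only the 64 MB starting size, the doubling, the 2 GB cap, and the "allocate the next file on first write" rule come from that description; everything else is illustrative.

```python
# A minimal sketch of the datafile pre-allocation pattern described above:
# 64 MB, 128 MB, 256 MB, ... doubling up to a 2 GB cap, with the next file
# pre-allocated as soon as the previous one receives its first data.

def datafile_sizes(count):
    """Return the sizes (in MB) of the first `count` datafiles."""
    sizes = []
    size = 64
    for _ in range(count):
        sizes.append(size)
        size = min(size * 2, 2048)  # doubles until it hits 2 GB, then stays there
    return sizes

# A fresh Deadline repository already has two datafiles on disk (the second
# is pre-allocated but still empty), so it starts at 64 + 128 = 192 MB:
print(datafile_sizes(2), "->", sum(datafile_sizes(2)), "MB on disk")

# As data fills each file, the next one is pre-allocated:
for n in range(2, 9):
    total_gb = sum(datafile_sizes(n)) / 1024.0
    print(f"{n} datafiles -> {total_gb:.2f} GB on disk")
```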
When you delete jobs from Deadline, Mongo will free up that space in the datafiles and reuse it for new data, but it will never delete the datafiles themselves. So as long as you’re cleaning up your jobs at a reasonable rate, your database will probably never grow beyond a certain size. Just note that it will never shrink below its current size.
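If you want to check this on your own database, MongoDB’s dbStats command reports both the logical data size and the space pre-allocated on disk. Here is a small pymongo sketch; the host, port, and database name are placeholders for your own repository database:

```python
from pymongo import MongoClient

# Host/port and database name below are placeholders; substitute your own.
client = MongoClient("mongodb://your-db-server:27017")
stats = client["deadline6db"].command("dbstats")

data_size = stats.get("dataSize", 0)   # bytes of actual data in the datafiles
file_size = stats.get("fileSize", 0)   # bytes pre-allocated on disk by the datafiles

print("Data in datafiles: %.2f GB" % (data_size / 1024.0 ** 3))
print("Datafiles on disk:  %.2f GB" % (file_size / 1024.0 ** 3))
```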
Hopefully this gives you an understanding of how Mongo’s disk usage works. The aggressive pre-allocation can give the impression that Mongo will quickly eat up all of your disk space as your job count gets higher, but this is not the case. It’s one thing for us to just say that, though, so we’ve backed it up with some data.
We used completed jobs for these tests, and each had 100 tasks and 100 log reports (1 log per completed task), so this is a good representation of the average job that has been rendered by Deadline. The green line is the size of the data within the datafiles, and the purple line is the size of the datafiles on disk.
You’ll see that the size on disk was already at 2 GB after only 500 jobs, but it didn’t grow to 4 GB until somewhere between 5,000 and 10,000 jobs. It was still at 4 GB at 20,000 jobs, and didn’t grow to 6 GB until somewhere between 20,000 and 50,000 jobs. So this data shows that Mongo’s disk space growth is not linear, and that it tapers off after the initial ramp-up.
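To put those numbers in perspective, here is a quick back-of-the-envelope calculation using the figures above (the 4 GB and 6 GB points use the upper end of their job-count ranges, so the per-job costs are rough):

```python
# (job count, GB on disk) pairs taken from the test results above.
data_points = [(500, 2), (10000, 4), (50000, 6)]

for jobs, gb_on_disk in data_points:
    mb_per_job = gb_on_disk * 1024.0 / jobs
    print(f"{jobs:>6} jobs: {gb_on_disk} GB on disk (~{mb_per_job:.2f} MB of disk per job)")
```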
Finally, here is some additional info we gathered after hitting the 50,000 job mark:
- Mongo’s memory usage was sitting at about 3 GB. So if 50,000 jobs is on the high end for you, a machine with 8 GB of RAM will probably suffice.
- The Monitor took 60 seconds to load the 50,000 jobs when it was initially launched. However, the Monitor remained interactive and responsive during this loading.
- During the Monitor’s initial load, Mongo’s CPU usage sat at 0% (and this was with 10 slaves constantly querying it for jobs).
- The slaves found active jobs as fast as they did when there were only a few hundred jobs in the queue.
So if you had concerns about Deadline 6’s scalability, this information should address them!