There have been some concerns over Mongo’s disk space usage, so this post is here to address them. This is meant to be a summary; for those wanting a more in-depth read, see:
mongodb.org/display/DOCS/Exc … Disk+Space
For performance reasons, and to help avoid fragmentation of data, Mongo likes to aggressively pre-allocate storage space on disk. It pre-allocates this space by creating datafiles. It starts by creating a 64 MB datafile, then a 128 MB datafile, then it keeps doubling in size until it creates a 2 GB datafile. After this point, every new datafile will be 2 GB. Mongo will create a new datafile when it adds data to the previously created datafile for the first time.
What this means is that when you start using Deadline, the Mongo database will already be roughly 200 MB in size, because you will have a 64 MB datafile and a 128 MB datafile on disk. At this point, the 128 MB datafile is “empty”, and only exists because data has been added to the 64 MB datafile. After submitting and rendering some jobs, the 128 MB datafile will have data added to it, and at that point, Mongo will create the 256 MB datafile.
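To make the pattern concrete, here is a minimal Python sketch of the pre-allocation behavior described above. Only the 64 MB starting size, the doubling, the 2 GB cap, and the "allocate the next file on first write" rule come from that description; everything else is illustrative.

```python
# A minimal sketch of the datafile pre-allocation pattern described above:
# 64 MB, 128 MB, 256 MB, ... doubling up to a 2 GB cap, with the next file
# pre-allocated as soon as the previous one receives its first data.

def datafile_sizes(count):
    """Return the sizes (in MB) of the first `count` datafiles."""
    sizes = []
    size = 64
    for _ in range(count):
        sizes.append(size)
        size = min(size * 2, 2048)  # doubles until it hits 2 GB, then stays there
    return sizes

# A fresh Deadline repository already has two datafiles on disk (the second
# is pre-allocated but still empty), so it starts at 64 + 128 = 192 MB:
print(datafile_sizes(2), "->", sum(datafile_sizes(2)), "MB on disk")

# As data fills each file, the next one is pre-allocated:
for n in range(2, 9):
    total_gb = sum(datafile_sizes(n)) / 1024.0
    print(f"{n} datafiles -> {total_gb:.2f} GB on disk")
```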
When you delete jobs from Deadline, Mongo will free up that space in the datafiles and reuse it for new data, but it will never delete the datafiles themselves. So as long as you’re cleaning up your jobs at a reasonable rate, your database will probably never grow beyond a certain size. Just note that it will never shrink below its current size.
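If you want to check this on your own database, MongoDB’s dbStats command reports both the logical data size and the space pre-allocated on disk. Here is a small pymongo sketch; the host, port, and database name are placeholders for your own repository database:

```python
from pymongo import MongoClient

# Host/port and database name below are placeholders; substitute your own.
client = MongoClient("mongodb://your-db-server:27017")
stats = client["deadline6db"].command("dbstats")

data_size = stats.get("dataSize", 0)   # bytes of actual data in the datafiles
file_size = stats.get("fileSize", 0)   # bytes pre-allocated on disk by the datafiles

print("Data in datafiles: %.2f GB" % (data_size / 1024.0 ** 3))
print("Datafiles on disk:  %.2f GB" % (file_size / 1024.0 ** 3))
```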
Hopefully this gives you an understanding of how Mongo’s disk usage works. The aggressive pre-allocation can give the impression that Mongo will quickly eat up all of your disk space as your job count gets higher, but this is not the case. It’s one thing for us to just say that, though, so we’ve backed it up with some data.
We used completed jobs for these tests, and each had 100 tasks and 100 log reports (1 log per completed task), so this is a good representation of the average job that has been rendered by Deadline. The green line is the size of the data within the datafiles, and the purple line is the size of the datafiles on disk.
You’ll see that the size on disk was already at 2 GB after only 500 jobs, but it didn’t grow to 4 GB until somewhere between 5,000 and 10,000 jobs. It was still at 4 GB at 20,000 jobs, and didn’t grow to 6 GB until somewhere between 20,000 and 50,000 jobs. So this data shows that Mongo’s disk space growth is not linear, and that it tapers off after the initial ramp-up.
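To put those numbers in perspective, here is a quick back-of-the-envelope calculation using the figures above (the 4 GB and 6 GB points use the upper end of their job-count ranges, so the per-job costs are rough):

```python
# (job count, GB on disk) pairs taken from the test results above.
data_points = [(500, 2), (10000, 4), (50000, 6)]

for jobs, gb_on_disk in data_points:
    mb_per_job = gb_on_disk * 1024.0 / jobs
    print(f"{jobs:>6} jobs: {gb_on_disk} GB on disk (~{mb_per_job:.2f} MB of disk per job)")
```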
Finally, here is some additional info we gathered after hitting the 50,000 job mark:
- Mongo’s memory usage was sitting at about 3 GB. So if 50,000 jobs is on the high end for you, a machine with 8 GB of RAM will probably suffice.
- The Monitor took 60 seconds to load the 50,000 jobs when it was initially launched. However, the Monitor remained interactive and responsive during this loading.
- During the Monitor’s initial load, Mongo’s CPU usage sat at 0% (and this was with 10 slaves constantly querying it for jobs).
- The slaves found active jobs as fast as they did when there were only a few hundred jobs in the queue.
So if you had concerns about Deadline 6’s scalability, this information should address them!