
idle farm, higher db locking?

It seems that an idle farm places a much higher load on the database? I think there was a bug that you guys fixed in the 6.0 beta, where just looking at job limits would place a lock on the db. Did that maybe resurface?

You can see how, around 6am when the load started to level off, the lock count went up. Most slaves are just cycling through this:

Scheduler - The 53bc19cfbb649a1bc4d5a287 limit is maxed out.
Scheduler - The 53bc959bcf71592af473b445 limit is maxed out.
Scheduler - The 53bc877fbe332d1ddc718873 limit is maxed out.
Scheduler - Slave has been marked bad for job 53bc9431c5f29c20d4d7bae0, skipping this job.
Scheduler - The 53bc62bac5f29c262433d3b5 limit is maxed out.
Scheduler - Slave has been marked bad for job 53bc9434c5f29c1b64bcf9c6, skipping this job.
Scheduler - The 53bc9a4dbb649a2a045576d8 limit is maxed out.
Scheduler - The 53baf2f9d3e2b4288ccbc707 limit is maxed out.
Scheduler - Slave has been marked bad for job 53bc19cabb649a0e1481f4d5, skipping this job.
Scheduler - The 53bc959ecf715933e4eeb06a limit is maxed out.
Scheduler - The 53bc411bc5f29c1070b5eb3b limit is maxed out.
Scheduler - The 53bc62b7c5f29c2580a62a48 limit is maxed out.
Scheduler - Slave has been marked bad for job 53bcb70b4433b12a94893940, skipping this job.
Scheduler - The 53bc19cebb649a1464097e53 limit is maxed out.
Scheduler - The 53bb1826d3e2b41ed01efeb5 limit is maxed out.
Scheduler - The 53bc19d1bb649a1ba89d9701 limit is maxed out.
Scheduler - The 53bcb69c4433b11c7048dcb7 limit is maxed out.
Scheduler - Slave has been marked bad for job 53bc9452c5f29c068cca3a21, skipping this job.
Scheduler - The 53bb1821d3e2b402bc610e41 limit is maxed out.
Scheduler - The 53bca48a4433b110aca54bf6 limit is maxed out.
Scheduler - The 53bb1824d3e2b410b8a40290 limit is maxed out.
Scheduler - The 53bcb7184433b128c04bed05 limit is maxed out.
Scheduler - The 53bc19d4bb649a131ca64042 limit is maxed out.
Scheduler - The 53bc19d3bb649a068433a1bb limit is maxed out.
Scheduler - The 53bb2570d3e2b422dc364c43 limit is maxed out.
Scheduler - The 53bcab45a0d91e1ad00484b7 limit is maxed out.
Scheduler - Slave has been marked bad for job 53bc973f4433b119dca4dd41, skipping this job.
Scheduler - The 53bd631ea9c5ca353c873561 limit is maxed out.
Scheduler - Performing Job scan on Secondary Pools with scheduling order Pool, Weighted, Balanced
Scheduler - The 53bd65efa58abb1748444470 limit is maxed out.
Scheduler - The 53bd613513ead411ff9b3464 limit is maxed out.
Scheduler - The 53bd621a769981946c2d5766 limit is maxed out.
Scheduler - The 53bd661da58abb1cf8ce413d limit is maxed out.
Scheduler - The 53bce3d12b632f12c00d49dc limit is maxed out.
Scheduler - The 53bd38c25c5c8a1fd8193e9a limit is maxed out.
Scheduler - The 53bd3bea13ead471671b9437 limit is maxed out.
Scheduler - The 53bd41a113ead4767447345c limit is maxed out.
Scheduler - The 53bd483b26b8b50d287efa0b limit is maxed out.
Scheduler - The 53bd543a26b8b50914dd8bef limit is maxed out.
Scheduler - The 53bd544e5d71fe08fccedca3 limit is maxed out.

Which are read-only operations, afaik.

We improved it, but looking at anything in the DB still requires a read lock; that’s just the way Mongo works. We’ve known for a while that an idle farm can be the worst-case scenario in terms of load. It makes sense if you think about it: all the Slaves are busy looking for jobs, instead of being off on their own rendering their bits of data.
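If you want to watch this on your own repository, the lock queue shows up in serverStatus. A minimal pymongo sketch, assuming a made-up host name (point it at your Deadline database server):

```python
from pymongo import MongoClient

# Hypothetical connection string -- point it at the Deadline database server.
client = MongoClient("mongodb://deadline-db:27017")

status = client.admin.command("serverStatus")

# globalLock.currentQueue reports how many operations are currently waiting
# on locks; a growing 'readers' count matches the idle-farm behaviour above.
queue = status["globalLock"]["currentQueue"]
print("waiting readers:", queue["readers"])
print("waiting writers:", queue["writers"])
```

mongostat shows the same numbers in its qr|qw column if you’d rather watch them live.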

In 7, this should be less of a problem, since we’ve split the different data into separate databases entirely (by default), so multiple collections will no longer share the same lock.
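To illustrate what that buys you: Mongo of that era locks per database, so putting each area of data in its own database means those collections stop contending for a single lock. A rough pymongo sketch, with made-up database and collection names (not Deadline’s actual schema):

```python
from pymongo import MongoClient

client = MongoClient("mongodb://deadline-db:27017")  # hypothetical host

# Deadline 6 style: everything in one database, so every collection shares
# that database's single read/write lock.
one_db = client["deadline"]                          # hypothetical name
jobs_old, slaves_old = one_db["Jobs"], one_db["SlaveInfo"]

# Deadline 7 style: each area in its own database, so each gets its own lock
# and a job-queue scan no longer queues up behind slave-status writes.
jobs_new = client["deadline_jobs"]["Jobs"]           # hypothetical names
slaves_new = client["deadline_slaves"]["SlaveInfo"]
```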

In the meantime, if this is becoming a problem, I’d suggest increasing the time between job checks when a Slave is idle.

This makes sense, doesn’t it? An idle slave is pinging the database and requesting the job queue to see if there is a job it should render. That means it has to pull down the entire job queue, which is a big DB GET request. While rendering, it only needs to check that the task hasn’t been requeued and whether there’s a pending remote slave command. That would be my guess. If you’ve got a big 3D render, it could be calculating GI for 10-15 minutes before checking in again.
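Roughly, the pattern would be something like this (just a sketch of the logic described above, with made-up function names and intervals, not Deadline’s actual code):

```python
import time

IDLE_POLL_SECONDS = 30         # hypothetical: how often an idle slave rescans the job queue
RENDERING_CHECK_SECONDS = 600  # hypothetical: how often a busy slave checks back in

def slave_loop(is_rendering, fetch_entire_job_queue, check_requeue_and_remote_commands):
    """Sketch of the two polling modes described above."""
    while True:
        if is_rendering():
            # Cheap check: only confirm the current task wasn't requeued and
            # that no remote slave command is pending.
            check_requeue_and_remote_commands()
            time.sleep(RENDERING_CHECK_SECONDS)
        else:
            # Expensive check: pull down the whole job queue looking for work.
            # With the entire farm idle, this is what piles read locks onto the DB.
            fetch_entire_job_queue()
            time.sleep(IDLE_POLL_SECONDS)
```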

EDIT: Jon beat me to it. :smiley:

It makes sense for sure, but only in the twisted world of Mongo. Reads should not be placing locks on tables that cause a queue to build up. Writes? For sure, that I expect. But there should be no writes happening while the queue is idling. Are the slaves maybe updating their statuses too often when idle?

Locks would be important IMO; otherwise, wouldn’t two slaves possibly think one task is still “queued”? Getting the state of the task queue doesn’t do you any good if someone else is making their dequeue decisions simultaneously on the exact same information. They’re just going to change it before you get a chance to change it yourself.

Off topic, but this is another example of a way in which a central dispatcher would increase efficiency (or reduce inefficiency).

That’s not a problem (just as it isn’t with any SQL database). You get that ‘queued’ state on both machines, then you do the dequeue in an atomic operation, which does apply a write lock. And that will fail on one of the machines.
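In Mongo terms that atomic dequeue is basically a findAndModify. A quick pymongo sketch, with made-up field and collection names (not Deadline’s actual schema):

```python
from pymongo import MongoClient, ReturnDocument

client = MongoClient("mongodb://deadline-db:27017")   # hypothetical host
tasks = client["deadline"]["Tasks"]                   # hypothetical db/collection names

def try_claim_task(task_id, slave_name):
    """Atomically flip a task from Queued to Rendering.

    Both slaves can read the task as 'Queued', but only one of these calls
    will match the filter; the loser gets None back and moves on to another task.
    """
    return tasks.find_one_and_update(
        {"_id": task_id, "Status": "Queued"},          # match only if still queued
        {"$set": {"Status": "Rendering", "Slave": slave_name}},
        return_document=ReturnDocument.AFTER,
    )
```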
