One of the big changes we noticed today going from Assburner to Deadline, which might turn out to be a problem, is that the scheduling algorithm can't be tweaked to spread the wealth properly.
Basically, once the pools, limit groups, priorities and submission dates are sorted out, it's first come, first served.
Assburner tries to give every job at least one machine; then, when several jobs have the same priority, it spreads the slaves equally amongst them (rather than giving them all to the first job, then the next, and so on).
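The spreading behaviour described above can be sketched roughly like this. This is not Assburner's actual code, just an illustration; the job fields ("name", "priority", "tasks") and the slave names are made up for the example:

```python
from collections import defaultdict

def assign_slaves(jobs, slaves):
    """Round-robin sketch: group jobs by priority (highest first), then
    deal idle slaves out one at a time across each priority tier, so
    every job in the tier gets one slave before any job gets a second."""
    tiers = defaultdict(list)
    for job in jobs:
        tiers[job["priority"]].append(job)

    assignments = defaultdict(list)  # job name -> slaves working on it
    idle = list(slaves)
    for priority in sorted(tiers, key=lambda p: -p):  # highest priority first
        tier = tiers[priority]
        i = 0
        while idle and any(len(assignments[j["name"]]) < j["tasks"] for j in tier):
            job = tier[i % len(tier)]  # cycle through the tier's jobs
            if len(assignments[job["name"]]) < job["tasks"]:
                assignments[job["name"]].append(idle.pop(0))
            i += 1
    return dict(assignments)
```

With two priority-50 jobs and three idle slaves, the first job gets two slaves and the second gets one, instead of the first job soaking up all three FIFO-style.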
In the grand scheme of things, you might think this doesn't matter, but what it comes down to is artists waiting for their renders to pick up. Normally they send a job to the farm and, within 5-10 minutes, they have at least 1-2 frames regardless of the queue size, so they can see whether it's going in the right direction or not.
Currently, a couple of bad jobs can hog the farm and queue up all the rest of the jobs.
Hey Laszlo,
Deadline has always been a FIFO queuing system, but we do want to redesign our scheduling logic in the future, and perhaps we could consider this when doing so. One of the main reasons for the redesign is to support a scriptable scheduling plugin system that would allow for customizations to the scheduling algorithm. The round-robin system could just be an example scheduling plugin that we ship with Deadline; I'm sure the current FIFO algorithm could be modified to sample all jobs at the current priority first to make sure they all have at least one task rendering.
However, you might be able to achieve what you are looking for right now with the Secondary Pool feature we added in 6.1. Another company had actually proposed this as a viable solution when they ran into the same problem, since their previous network manager supported this. What you could do is create a pool for each artist and assign each pool as top priority to one or two slaves (I’m sure you could whip up a script that uses the Deadline API to do this and run it through deadlinecommand, rather than do it manually). When a job is submitted to the farm, you would pick the user’s pool as the primary pool, and the normal pool as the secondary pool. This way, each artist is always guaranteed top priority on a couple machines, but then their job can spread out across other slaves based on the secondary pool.
To set this up in the job info file, you just need to do this:
Pool=userpool
SecondaryPool=normalpool
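Setting up one pool per artist by hand gets tedious, so here is a rough sketch of the kind of script mentioned above. The artist-to-slave mapping and the "normalpool" name are placeholders, and the -AddPool / -SetPoolsForSlave flag names should be verified against deadlinecommand's help output for your version before relying on this:

```python
import subprocess

def pool_commands(artist_slaves, shared_pool="normalpool"):
    """Build the deadlinecommand invocations that create one pool per
    artist and put that pool first on the artist's dedicated slaves.
    artist_slaves maps artist name -> list of slave names."""
    cmds = []
    for artist, slaves in sorted(artist_slaves.items()):
        pool = artist + "pool"
        cmds.append(["deadlinecommand", "-AddPool", pool])
        for slave in slaves:
            # The artist's pool is listed first so it outranks the shared pool.
            cmds.append(["deadlinecommand", "-SetPoolsForSlave",
                         slave, pool + "," + shared_pool])
    return cmds

def run_all(cmds):
    """Actually run the commands (assumes deadlinecommand is on PATH)."""
    for cmd in cmds:
        subprocess.check_call(cmd)
```

Keeping the command construction separate from execution makes it easy to print the commands for review (or feed them to a dry run) before touching the farm.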
Cheers,
Thanks, I'll see if we can sort something like this out. I'm not sure how manageable it will be, though, with the number of slaves and artists we have (manageable from a wrangling and IT perspective, that is: machines regularly go down for maintenance, users select different RAM/CPU requirements for their jobs based on the task, etc.).
Do you think that, as a short-term solution (one that would not require redesigning the scheduling mechanism), something along these lines could be added:
- a “minimum machine limit” job property, 0 by default, defining how many slaves HAVE to be rendering the job regardless of other jobs' priorities. A slave would always prefer a job that has not reached its minimum yet over a job that already has.
While this would still not give an even distribution, we would no longer have artists sitting and waiting for other jobs to finish, which is really the biggest problem.