AWS Thinkbox Discussion Forums

multiple jobs on the same machine

Currently we do this by running multiple slaves on the same machine. This presents a pretty large problem: since all the slaves do their own scheduling and there is no central dispatcher, making proper decisions about when to run a job, and what to run in parallel with other jobs to fill in CPU-cycle holes, is practically impossible.

I hope we could turn this into a larger discussion.

Optimally, jobs would be allocated CPU and RAM based on factors we define. We could control that ourselves, without expecting Deadline to magically figure out what jobs actually need. Then the slave application could “fill up” a machine by launching and managing multiple jobs from one slave, based on the metrics it collects about available resources and the jobs.

It could start a Max sim job requiring 2 CPUs and 12 GB of RAM, then look at what’s left: OK, I still have 22 cores and 116 GB of RAM, let’s keep launching jobs! That kind of stuff.
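To make that concrete, here is a tiny sketch of the kind of greedy “fill up the machine” loop I mean (the job names, core counts, and RAM numbers are just made-up examples):

```python
# Hypothetical sketch of a slave filling a machine greedily by CPU/RAM.
# Job requirements and machine capacities are illustrative, not real Deadline data.

def fill_machine(queued_jobs, free_cores, free_ram_gb):
    """Launch jobs until no queued job fits in the remaining resources."""
    running = []
    for job in queued_jobs:
        if job["cores"] <= free_cores and job["ram_gb"] <= free_ram_gb:
            running.append(job["name"])
            free_cores -= job["cores"]
            free_ram_gb -= job["ram_gb"]
    return running, free_cores, free_ram_gb

queue = [
    {"name": "max_sim",   "cores": 2, "ram_gb": 12},  # the Max sim from the example
    {"name": "nuke_comp", "cores": 4, "ram_gb": 16},
    {"name": "render",    "cores": 8, "ram_gb": 32},
]

# On a 24-core, 128 GB machine the Max sim leaves 22 cores / 116 GB,
# so the loop keeps launching the other jobs too.
print(fill_machine(queue, free_cores=24, free_ram_gb=128))
```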

Instead, we currently have multiple slaves with absolutely no knowledge of when and how they should initiate their jobs, and we are thinking of micromanaging all this from ‘overseer’ scripts that randomly wrangle pools and groups… Very messy. I don’t even want to touch that stuff :frowning:

I would love to see this kind of scheduling flexibility added to the slave to remove the need to run multiple processes at once. Tractor allows you to do this kind of thing by basically defining a number of work “slots” of different types on different “classes” of slave. You can have a slave do something like allow one Maya job and two Nuke jobs to run at once, or even have it wait for a Maya job, and then once it has one, make a Nuke slot available.

The classes can be automatically determined at startup based on things like hostname pattern-matching, resource counts, etc. (think a more robust version of Deadline’s Auto-Configuration).
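As a rough illustration of the slot/class idea (the structure and names below are purely my own guess at how it could look, not how Tractor or Deadline actually store it):

```python
import fnmatch
import multiprocessing

# Hypothetical slot definitions per slave "class"; not a real Deadline/Tractor config.
SLAVE_CLASSES = {
    "comp-*":        {"Maya": 1, "Nuke": 2},   # one Maya job plus two Nuke jobs
    "sim-*":         {"MaxSim": 1},            # heavy sim boxes take one sim at a time
    "workstation-*": {"Nuke": 1},
}

def classify(hostname, core_count):
    """Pick a slot layout at startup from hostname patterns / resource counts."""
    for pattern, slots in SLAVE_CLASSES.items():
        if fnmatch.fnmatch(hostname, pattern):
            return slots
    # fallback: one generic render slot per 8 cores
    return {"Render": max(1, core_count // 8)}

print(classify("comp-03", multiprocessing.cpu_count()))
```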

The idea of “slots” (or “blocks”, which is what we’ve been calling them internally) is something that we will be discussing when we start redesigning the scheduling system for Deadline 8 (along with various other scheduling-based requests we’ve received in the past). It’s too early to get into any significant details, but ultimately I’d like to see a single slave that can process multiple jobs, as you guys are requesting.

The idea of predefining the “blocks” for a slave (or a class of slaves) is interesting, and might be easier to understand from a wrangler’s perspective. Initially, we were thinking that the “blocks” would be defined at the job level, and a block count would be set for the slaves, and the slaves would try to dynamically keep their blocks full. However, if the packing algorithm is a greedy one, it could starve jobs with high block counts as the slave tries to keep filling in the holes with smaller blocks.
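To make the starvation concern concrete, here’s a toy example (all block counts are hypothetical): if a slave has 8 blocks and greedily tops up with 1-block jobs, a queued 6-block job never finds room:

```python
# Toy illustration of the starvation problem with a purely greedy block packer.
# The per-job block counts and the slave's block count are made-up numbers.

slave_blocks_free = 8
queue = [6, 1, 1, 1, 1, 1, 1, 1, 1]  # one big 6-block job among small 1-block jobs

started = []
for blocks in sorted(queue):          # greedy: smallest jobs fit first
    if blocks <= slave_blocks_free:
        started.append(blocks)
        slave_blocks_free -= blocks

print(started)            # the 1-block jobs fill the slave...
print(6 in started)       # ...and the 6-block job never gets scheduled -> False
```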

Maybe instead of blocks, we just allow you to define “instances” (I’m sure there is a better name for it) for a single master slave, and each instance can have separate pools/groups to control the types of jobs they can pick up. It’s basically the benefits of running multiple slaves, but you only have to run one slave, and it’s aware of what all of its instances are doing.
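Something like this, per master slave (the field names are invented just to illustrate, not an actual configuration format):

```python
# Hypothetical per-instance settings for a single "master" slave; the field names
# are invented for illustration, not an actual Deadline configuration.
MASTER_SLAVE_INSTANCES = {
    "render-07_a": {"pools": ["film", "commercials"], "groups": ["3dsmax"], "cores": 16},
    "render-07_b": {"pools": ["comp"],                "groups": ["nuke"],   "cores": 4},
    "render-07_c": {"pools": ["sim"],                 "groups": ["tp_sim"], "cores": 4},
}

# Because one process owns all the instances, it can see the whole machine's usage
# and decide which instance should dequeue next, instead of three blind slaves.
total_cores = sum(inst["cores"] for inst in MASTER_SLAVE_INSTANCES.values())
print(total_cores)
```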

We’re VERY early in the design phase right now, so it’s great that we’re having this discussion now. :slight_smile:

Cheers,
Ryan

Now that you mention it, I seem to remember this being touched on in that “central dispatcher” thread…

Yeah, I would agree that a simple block-packing approach could create other issues like the job starvation you mentioned. That said, I do like the idea of being able to arbitrarily assign block weights to jobs, so maybe some kind of a hybrid approach would be the best answer. Whether that means including different block-assignment algorithms the user could choose from for each slave (or even allowing them to write their own), or simply giving each slave a “preferred” job block size to govern its primary choices, I think that would be a great way to improve flexibility while also simplifying slave management.
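One way I could imagine that “preferred” block size working (purely a hypothetical sketch): the slave reserves room for a job of its preferred size first, and only then fills the leftover capacity with smaller blocks:

```python
# Hypothetical hybrid packer: reserve the slave's "preferred" block size before
# greedily filling leftover capacity, so big jobs aren't starved by small ones.

def pick_jobs(queue_blocks, capacity, preferred):
    remaining = list(queue_blocks)
    picked = []
    # first, reserve one job at the preferred size if it's queued and fits
    if preferred in remaining and preferred <= capacity:
        picked.append(preferred)
        capacity -= preferred
        remaining.remove(preferred)
    # then greedily fill what's left with the smallest queued jobs
    for blocks in sorted(remaining):
        if blocks <= capacity:
            picked.append(blocks)
            capacity -= blocks
    return picked

print(pick_jobs([6, 1, 1, 1, 1, 1, 1, 1, 1], capacity=8, preferred=6))  # [6, 1, 1]
```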

I would definitely prefer the block-based approach… :wink:

Also, I know it could be its own topic, but the idea of allowing slave auto-configuration to handle deeper slave settings like block availability and assignment preferences is still really appealing…

I think Deadline’s current scheduling methods result in very low farm utilization in general, so even a greedy algorithm might pack a big punch.
But yeah, I agree that what we would need is the ability to fine-tune the system by adding per-slave properties such as:

  • maximum parallel jobs of type X
  • maximum parallel jobs total
  • compatible plugin combinations (Max TP sim + Nuke = OK, Max Krakatoa sim + Max TP sim = OK, Max render + Nuke = NOT OK, etc.)
  • preferred job type = this could be used to avoid ‘starving’; we could identify groups of machines with a preference for heavy jobs. If those types of jobs are queued and the particular machine isn’t rendering one of that type, it could let its other renders ‘run off’ to create empty slots so it can pick up the ‘preferred’ job types.

The result of this would essentially create a ‘filter’ list, kinda like the group/pool settings of the slave.
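As a concrete sketch of that filter idea (the property names and the compatibility table are all made up, nothing that exists in Deadline today):

```python
# Hypothetical per-slave scheduling properties acting as a filter, in the spirit of
# the list above; none of these property names are real Deadline settings.
SLAVE_PROPS = {
    "max_parallel_total": 4,
    "max_parallel_by_type": {"tp_sim": 1, "nuke": 2, "krakatoa_sim": 1, "render": 1},
    "incompatible_pairs": {("render", "nuke")},   # e.g. Max render + Nuke = NOT OK
    "preferred_type": "tp_sim",
}

def can_start(job_type, running_types, props=SLAVE_PROPS):
    """Return True if the slave's filter allows starting another job of this type."""
    if len(running_types) >= props["max_parallel_total"]:
        return False
    if running_types.count(job_type) >= props["max_parallel_by_type"].get(job_type, 0):
        return False
    for other in running_types:
        if (job_type, other) in props["incompatible_pairs"] or \
           (other, job_type) in props["incompatible_pairs"]:
            return False
    return True

print(can_start("nuke", ["tp_sim"]))   # TP sim + Nuke -> True (OK)
print(can_start("nuke", ["render"]))   # render + Nuke -> False (NOT OK)
```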

cheers
laszlo

Thanks for the additional feedback! Keep it coming if you have more ideas!

You can do this with the new slave event triggers in Deadline 7. We’ve already ensured this works for auto-setting pools and groups, and it can be used to set other slave settings as well. It requires you to write some Python, but it also gives you a lot of control over what you can do.
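For reference, an event plugin along these lines could look roughly like the sketch below. I’m writing it from memory of the Deadline 7 event plugin API, so treat the exact callback and method names as assumptions and double-check them against the scripting docs:

```python
# Rough sketch of a Deadline event plugin that sets slave settings when a slave starts.
# Callback and method names are from memory of the Deadline 7 scripting API and may
# need adjusting against the actual documentation.
from Deadline.Events import DeadlineEventListener
from Deadline.Scripting import RepositoryUtils

def GetDeadlineEventListener():
    return AutoConfigureSlaveListener()

def CleanupDeadlineEventListener(eventListener):
    eventListener.Cleanup()

class AutoConfigureSlaveListener(DeadlineEventListener):
    def __init__(self):
        # hook the "slave started" event
        self.OnSlaveStartedCallback += self.OnSlaveStarted

    def Cleanup(self):
        del self.OnSlaveStartedCallback

    def OnSlaveStarted(self, slaveName):
        settings = RepositoryUtils.GetSlaveSettings(slaveName, True)
        # assign pools/groups based on a hostname convention (our own convention,
        # not anything built into Deadline)
        if slaveName.lower().startswith("sim-"):
            settings.SetSlaveGroups(["tp_sim", "krakatoa_sim"])
            settings.SetSlavePools(["sim"])
        else:
            settings.SetSlaveGroups(["nuke", "3dsmax"])
            settings.SetSlavePools(["general"])
        RepositoryUtils.SaveSlaveSettings(settings)
```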

Oh right, sorry, I forgot you already pointed that out in another thread…
