Resource to learn Balancer scripting?

Panupat · February 7, 2020, 6:02am

Hi. Are there any articles or blogs that teach a bit more about how to write a Balancer script?

Also, I’m curious whether I can use it to make sure faster local machines pick up jobs before the slower ones, given that I mark the faster machines in some way (such as with extra info)?

Justin_B · February 7, 2020, 9:06pm

I think you’ve got the Balancer mixed up: it’s really best for controlling cloud machines. Also, I don’t have a clue how to write a Balancer script; we’re really short on a tutorial there.

If all you want is to have the big machines dequeue work first, you’ll want to check out setting up Pools and Groups; there’s a handy post about that here.

Though I suppose if you’ve got cloud machines created/managed by the Balancer, you’d need to set up some Pools and Groups to make certain the right machines are dequeuing the right jobs.

danielskovli · February 7, 2020, 11:01pm

Based on other answers we’ve gotten to a similar scenario, both here and privately, I don’t believe Deadline has the capacity to schedule jobs this way.

I’d be really interested in hearing what you come up with!

Our current “solution” is a scaling script we run as an external cron job (you could hook this into an Event, but we had too much drama with that). The script looks at queued/active jobs and enables workers as required, in our preferred order. In other words, it scales the available farm workers up and down according to the queue.

Doing it this way you can quite specifically control which workers are of higher value to you, and make sure they are always the first ones to pick up jobs.
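A minimal sketch of the decision step such a cron-driven scaler might make, assuming the queue has already been summarized as a per-group count of queued/active tasks and the workers are listed in preference order (fastest first). `Worker`, `plan_workers`, and the group names are hypothetical; the sketch does not talk to Deadline at all, so reading the queue and applying the plan (via Deadline’s Python API, `deadlinecommand`, or however your farm does it) is left out.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Worker:
    name: str
    capabilities: frozenset[str]  # groups this machine *could* serve (hypothetical)


def plan_workers(demand: dict[str, int], workers: list[Worker]) -> dict[str, set[str]]:
    """Decide which group each worker should serve this cycle.

    `demand` maps group -> number of queued/active tasks needing a worker.
    `workers` must already be sorted in preference order (fastest first).
    A worker mapped to an empty set is one the scaler would switch off.
    """
    remaining = dict(demand)
    plan: dict[str, set[str]] = {}
    for w in workers:  # preferred machines get first pick of the work
        open_groups = [g for g in w.capabilities if remaining.get(g, 0) > 0]
        if not open_groups:
            plan[w.name] = set()  # nothing left that this machine can help with
            continue
        # Serve the group with the most outstanding work (ties broken by name).
        group = max(open_groups, key=lambda g: (remaining[g], g))
        remaining[group] -= 1
        plan[w.name] = {group}
    return plan


if __name__ == "__main__":
    farm = [
        Worker("fast-a", frozenset({"gpu", "cpu"})),
        Worker("fast-b", frozenset({"gpu"})),
        Worker("slow-c", frozenset({"gpu", "cpu"})),
        Worker("slow-d", frozenset({"cpu"})),
    ]
    for name, groups in plan_workers({"gpu": 2, "cpu": 1}, farm).items():
        print(name, sorted(groups))
```

With two GPU tasks and one CPU task queued, the two fast machines take the GPU work, `slow-c` picks up the CPU task, and `slow-d` stays off. The preference order lives entirely in how you sort the `workers` list, which is what lets the faster machines always claim work first.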

We are toggling them on and off by changing the workers’ group membership, e.g. removing the gpu group to disable GPU jobs. This isn’t ideal, since Deadline has many idiosyncrasies regarding when and how a worker reacts to having a group removed. For this reason, we also send a relaunch command whenever a worker is no longer required to be active. The latest of these is a reasonably new “bug” where the worker no longer checks its group memberships between tasks in the same job, but stuff like this happens all the time.
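A small sketch of the bookkeeping this implies, assuming you can read each worker’s current groups and already have a desired membership (for example, from a planning step like the one in the scaler). `membership_changes` is a hypothetical helper; it only computes the diff and does not call any Deadline API. Any worker that loses a group is flagged for relaunch, to cover the behaviour described above where a running worker may not re-check its groups between tasks.

```python
from __future__ import annotations


def membership_changes(
    current: dict[str, set[str]], desired: dict[str, set[str]]
) -> tuple[dict[str, set[str]], dict[str, set[str]], list[str]]:
    """Compare current vs desired group membership per worker.

    Returns (groups_to_add, groups_to_remove, workers_to_relaunch).
    """
    to_add: dict[str, set[str]] = {}
    to_remove: dict[str, set[str]] = {}
    relaunch: list[str] = []
    for name in sorted(current.keys() | desired.keys()):
        have = current.get(name, set())
        want = desired.get(name, set())
        if want - have:
            to_add[name] = want - have
        if have - want:
            to_remove[name] = have - want
            # Removing a group alone may not stop a busy worker, so relaunch it.
            relaunch.append(name)
    return to_add, to_remove, relaunch


if __name__ == "__main__":
    current = {"fast-a": {"gpu"}, "slow-c": {"gpu", "cpu"}}
    desired = {"fast-a": {"gpu"}, "slow-c": {"cpu"}, "slow-d": {"cpu"}}
    print(membership_changes(current, desired))
```

Keeping the diff separate from the code that applies it makes it easy to log what the cron job intends to do before it touches the farm.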

You could toggle workers by simply disabling and re-enabling them, which may work better for you. You’ll still need to kill the process to actually stop the worker from continuing with its current task, though.

TL;DR: we’re doing more or less what you are wanting to do, but in a super hacky way that works “just fine”.

Panupat · February 8, 2020, 4:03am

Thanks. I had a similar system too, only adding/removing pools instead.

Christopher_Kornosky · July 24, 2023, 9:07pm

@danielskovli What drama did you have with events? I was having issues with events before and also stopped trying to use them, mainly because the events would lock and wouldn’t let me make adjustments. I had a couple of instances where a rogue event broke the farm and I was unable to modify the script to fix the issue without support from our CTO. After that, I stopped using them and moved on to cron jobs.