AWS Thinkbox Discussion Forums

Weighted Job Scheduling & Node Distribution

Hello all,

I’m looking for some information on the specifics of how Workers choose jobs when there is a “weighted” scheduling option enabled. The relevant section in the manual is here: Job Scheduling — Deadline 10.1.23.6 documentation

I can see how the weight values are calculated, that all makes sense. What I can’t find information on are these topics:

  • Is the “Weight” value displayed in Monitor updated by my local machine, or by Pulse?

  • In my testing on a small farm (about 20 render nodes), when two jobs have similar calculated weights, e.g. job A at 323290 and job B at 311422, a few nodes will hop onto the lower-weighted job, but most will stay on the higher-weighted job. That’s great, and it’s what I’d expect. But how does a Worker decide to switch to the lower-weighted job? Is there some “secret” value calculated by each Worker based on the difference in weights, i.e. something like (weight_A)/(weight_B)?

  • If I increase the priority value (which of course increases the weight value), all the nodes switch to the higher-weighted job, which tells me that the Workers are indeed evaluating the difference in weights to decide where to go.

Essentially - I’m a little new to Deadline and still trying to understand how resources on the farm are distributed with a weighted system. What I’d love to see is something like 60-80% of render nodes bunched up towards the “top” of the queue, with the remaining 20-40% distributed down the list in a cascade-like system. Is this possible?

Thanks for any help!

Workers are always going to work on tasks from the highest-priority jobs they’re able to pull from. You could set a machine limit on the jobs to force a maximum number of Workers on a given job, but that won’t be dynamic, so it may not be worth the hassle.

You could also set up separate pools for your high- and low-priority jobs and then order the pools per Worker: have about 20% of your fleet list the low-priority pool first, and have the rest list the high-priority pool first.
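To make the split concrete, here is a rough planning sketch in Python. It does not talk to Deadline at all; it only works out which Workers would get which pool order, and you would still apply the result yourself in Monitor. The pool names `high_prio` and `low_prio`, the Worker names, and the helper `plan_pool_orders` are all made up for illustration.

```python
def plan_pool_orders(workers, low_first_share=0.2,
                     high_pool="high_prio", low_pool="low_prio"):
    """Return a {worker: [pool, ...]} plan with pools in the order to try them.

    Roughly `low_first_share` of the Workers try the low-priority pool first;
    everyone else tries the high-priority pool first.
    """
    cutoff = round(len(workers) * low_first_share)
    plan = {}
    for index, name in enumerate(workers):
        if index < cutoff:
            plan[name] = [low_pool, high_pool]
        else:
            plan[name] = [high_pool, low_pool]
    return plan


# Hypothetical 15-Worker farm.
workers = [f"render{n:02d}" for n in range(1, 16)]
plan = plan_pool_orders(workers)
low_first = [w for w, pools in plan.items() if pools[0] == "low_prio"]
print(len(low_first), "of", len(workers), "Workers take the low-priority pool first")
```

Like the machine limit, this split is static: the 20% stay on the low-priority pool as long as it has work, regardless of how busy the high-priority pool is.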

Hey Justin,

Thanks for the input! But I don’t think you answered my question about the weight value calculations - since the weight calculation takes into account the priority, the workers should choose jobs based on weight, correct? Especially when multiple jobs have the same priority and pool, right?

I’ve been tinkering around with the Job Scheduling parameters and can’t seem to get any of the “weighted” job scheduling algorithms to work as expected - so maybe I’m misinterpreting the function, or perhaps there is an issue. Can you see if you can decipher what’s happening here?

The farm has 15 render workers (24 at night). I have the repository set to “Pool, weighted, first-in first-out” job scheduling. The weighted values are all set to defaults (priority weight: 100, submission time: 3, error weight: -10, rendering task weight: -100, rendering task buffer: 1).
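For reference, this is how I understand the weight calculation with those defaults, written out as a small Python sketch. The formula is my reading of the Job Scheduling docs, not something I have confirmed against Deadline itself. In particular, measuring job age in seconds and applying the rendering task buffer to every job are assumptions, and `WeightSettings` and `job_weight` are names I made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WeightSettings:
    # Defaults as listed in the post above.
    priority_weight: float = 100
    submission_time_weight: float = 3
    error_weight: float = -10
    rendering_task_weight: float = -100
    rendering_task_buffer: int = 1


def job_weight(priority, errors, age_seconds, rendering_tasks,
               settings=WeightSettings()):
    """Assumed formula: a weighted sum of priority, errors, age and active tasks."""
    return (priority * settings.priority_weight
            + errors * settings.error_weight
            + age_seconds * settings.submission_time_weight  # assumption: seconds
            + (rendering_tasks - settings.rendering_task_buffer)
            * settings.rendering_task_weight)


# Two otherwise identical idle jobs, submitted 20 seconds apart.
older = job_weight(priority=50, errors=0, age_seconds=20, rendering_tasks=0)
newer = job_weight(priority=50, errors=0, age_seconds=0, rendering_tasks=0)
print(older, newer)  # 5160 5100
```

Under these assumptions, a positive submission time weight should always put the older of two identical jobs ahead, which is the behaviour I was expecting.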

I also have Pulse running on the same machine as the repository service.

When I submit multiple animations one after the other (separated by maybe 20 seconds each), I would expect that the 15 workers each decide which job to pick up based on the weight calculation (all else being equal with Pool & priority). So you’d think that the oldest job would get the highest number of workers, followed by the second oldest job, etc. etc…

But instead, what I’m seeing is this: First animation is submitted, all workers pick that up at the same time. While they are rendering their first frames of the animation, I submit three more jobs, one right after the other. Now there are four jobs in the queue. The first worker to finish a frame on the first animation then chooses the “newest” job instead of the “second oldest” job. The next worker to become available also chooses the newest job, which leaves the middle two jobs without any workers.

And then it seems like - at least according to the monitor window from my own workstation - the “weight” values for the two active jobs keep getting updated as workers finish frames, but the middle two jobs’ weight values don’t update, so the workers never choose them even though they should be second and third in line for workers.

So in the end, the first job gets the most workers, the fourth job gets a few, and the second and third jobs don’t get any.
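To show what I expected instead, here is a toy simulation in Python. It is only a sketch under my own assumptions: it reuses the weight formula as I read it from the docs, freezes the job ages at one moment, leaves out errors, and lets each Worker simply take the highest-weighted job when it finishes a frame. The job names, ages, and priority of 50 are made up, and none of this is Deadline's actual dequeuing code.

```python
from collections import Counter

# Defaults quoted earlier in this post.
PRIORITY_W, SUBMIT_W, RENDER_W, RENDER_BUFFER = 100, 3, -100, 1


def weight(priority, age_seconds, rendering_tasks):
    # Assumed formula; the error term is omitted because no job has errors.
    return (priority * PRIORITY_W
            + age_seconds * SUBMIT_W
            + (rendering_tasks - RENDER_BUFFER) * RENDER_W)


def simulate(num_workers=15, priority=50):
    # Hypothetical job ages in seconds, frozen for the whole simulation.
    ages = {"A": 120, "B": 60, "C": 40, "D": 20}
    assignment = {w: "A" for w in range(num_workers)}  # everyone starts on job A
    picks = []
    for w in range(num_workers):
        del assignment[w]  # this Worker finishes its frame and is free
        counts = Counter(assignment.values())  # rendering tasks per job
        best = max(ages, key=lambda job: weight(priority, ages[job], counts[job]))
        assignment[w] = best
        picks.append(best)
    return picks, Counter(assignment.values())


picks, final_counts = simulate()
print("first pick:", picks[0])
print("workers per job:", dict(final_counts))
```

In this model the first free Worker goes to the second-oldest job (B), and every job ends up with at least one Worker, because each rendering task subtracts 100 from a job's weight. That is the cascade I was hoping for, and it is not what the farm is doing.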

My questions are essentially: Am I missing something about how “weighted” Job Scheduling is supposed to work? How can I figure out why a worker chooses a “newer” job over an older one when the weight value for the older job is clearly larger than the newer job?

Thanks for any help you can provide!

Justin - a little more information for you.

It’s really weird, it seems like the “weight” values are not being updated properly - and I don’t quite understand which machine calculates the weight values in the first place (Pulse? Workers? both?).

If I submit two identical jobs, one right after the other, you’d expect the first job to have the higher weight value. But once the second job is submitted, its weight value is higher than the first job’s, and that is the value a Worker uses when deciding which job to pick up. The result is a backwards queue (jobs submitted later are picked up first), even though I have a positive value set for the Submission Time weight.

I know the timestamps are being evaluated properly because if I switch the job scheduling to “pool, priority, first-in first-out”, the timestamps are respected and the older jobs get workers first.

I would really love to rely on the Weighted algorithms, but it seems like there is some broken logic somewhere, or something else on my farm preventing the workers from evaluating the weight values of the jobs properly.
