AWS Thinkbox Discussion Forums

Workers choosing "wrong" job when using weighted scheduling order

Hello all,

I manage a small render farm of ~16 machines (23 at night with workstations), and I am struggling to understand some of the behavior I’m seeing in our render pools when the scheduling order for the repo is set to “Pool, weighted, first-in-first-out”.

The weight values I’m using are set to:

  • Priority Weight: 30
  • Submission Time Weight: 2
  • Rendering Task Weight: -1
  • Error Weight: -5
  • Rendering Task Buffer: 1
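
For reference, here is a minimal sketch of how I understand a weighted score to be built from these settings. The formula is an assumption based on my reading of the Deadline job scheduling documentation, not Deadline's actual code, and I'm unsure exactly how the rendering task buffer is applied. The `Job` class, the function names, the priority of 50, and the timestamps are all hypothetical.

```python
from dataclasses import dataclass

# Weight settings from the list above.
PRIORITY_WEIGHT = 30
SUBMISSION_TIME_WEIGHT = 2
RENDERING_TASK_WEIGHT = -1
ERROR_WEIGHT = -5
RENDERING_TASK_BUFFER = 1


@dataclass
class Job:
    """Hypothetical stand-in for a job record; not a Deadline type."""
    name: str
    priority: int
    submitted_at: float  # submission time, in seconds
    rendering_tasks: int = 0
    errors: int = 0


def job_weight(job: Job, now: float) -> float:
    """Assumed formula: each job property multiplied by its weight, then summed."""
    age_seconds = now - job.submitted_at
    return (
        job.priority * PRIORITY_WEIGHT
        + job.errors * ERROR_WEIGHT
        + age_seconds * SUBMISSION_TIME_WEIGHT
        + (job.rendering_tasks - RENDERING_TASK_BUFFER) * RENDERING_TASK_WEIGHT
    )


def pick_next_job(jobs: list[Job], now: float) -> Job:
    """Pick the job with the highest weight, all evaluated at the same instant."""
    return max(jobs, key=lambda job: job_weight(job, now))


# Two otherwise identical jobs submitted 10 minutes apart, evaluated at t = 1800 s.
older = Job(name="older", priority=50, submitted_at=0)
newer = Job(name="newer", priority=50, submitted_at=600)
chosen = pick_next_job([older, newer], now=1800)
print(chosen.name, job_weight(older, 1800), job_weight(newer, 1800))
```

Under these assumptions, with a positive Submission Time Weight, the older job always scores higher than an otherwise identical newer job, which is the behavior I expected to see.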

When a number of jobs are submitted one after another (say some illustrators send 6 to 8 jobs over the course of 30 minutes), the very first job to be submitted will get workers immediately. That is expected, and all is well.

The problem happens when that first job is finished. The workers are released from the job and, instead of choosing the very next job in the list (based on submission time), they choose a seemingly arbitrary job, sometimes even the most recent one. This happens whether the job is a 3ds Max DBR job or an animation: in either case, the workers choose newer jobs over older ones. It’s as if the “weight” values the workers use are not all being updated at the same time, so an older job can appear to have a smaller weight than a newer job, even though the older job should have the higher weight and therefore be chosen first.
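
To make that suspicion concrete, here is a small sketch of how a stale weight could flip the order. It is purely illustrative: the simplified weight formula (priority and submission time only), the priority of 50, and the timestamps are my own assumptions, and I don't know whether Deadline actually caches weights this way.

```python
PRIORITY_WEIGHT = 30
SUBMISSION_TIME_WEIGHT = 2


def weight(priority: int, submitted_at: float, evaluated_at: float) -> float:
    """Simplified, assumed weight: priority term plus age term at evaluation time."""
    age_seconds = evaluated_at - submitted_at
    return priority * PRIORITY_WEIGHT + age_seconds * SUBMISSION_TIME_WEIGHT


# Hypothetical jobs: same priority, submitted 10 minutes apart (times in seconds).
OLDER_SUBMITTED_AT = 0
NEWER_SUBMITTED_AT = 600
NOW = 1800

# Case 1: both weights are recomputed at the same instant.
fresh = {
    "older": weight(50, OLDER_SUBMITTED_AT, NOW),
    "newer": weight(50, NEWER_SUBMITTED_AT, NOW),
}

# Case 2: the older job's weight was last computed at t = 700 s and never
# refreshed, while the newer job's weight is current.
stale = {
    "older": weight(50, OLDER_SUBMITTED_AT, 700),
    "newer": weight(50, NEWER_SUBMITTED_AT, NOW),
}

fresh_pick = max(fresh, key=fresh.get)
stale_pick = max(stale, key=stale.get)
print("fresh:", fresh, "->", fresh_pick)
print("stale:", stale, "->", stale_pick)
```

If the weights are all evaluated together, the older job wins. If the older job's weight is stale, the newer job can overtake it, which would match what I'm seeing.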

This doesn’t seem right to me - and I’ve played around with all the weight values to try to understand what’s happening under the hood, but I haven’t had any luck whatsoever. The results are confusing and I haven’t been able to figure out a pattern.

The only clue for me is the fact that when I double-click on a job to open up its properties, then close the properties window, the weight value listed in the Monitor changes immediately to update based on the submission time - but it’s entirely unclear to me whether the workers see the updated value or if they are using a cached value from the repository.

I have Pulse running on the main repository machine, but I don’t think Pulse is updating the weight values very often, if at all.

With a weighted system like this, how does each worker actually evaluate all the jobs and their weight values? And why would a worker choose a newer job over an older job with all else being equal?

Thanks for any help anyone can provide - I’m just tearing my hair out trying to understand this weighted system so that I can more efficiently allocate our small render farm across the jobs! Let me know if there is any more information I can provide to make this easier to diagnose and understand. Thanks!

Did you submit a ticket to support? I had an issue with weighting/balancing and there was mention of a known bug. Which version are you using?

Interesting! What version were you using when you had the issue? I’m using Deadline v.10.3.1.3.

I haven’t submitted a ticket to support yet. Since we are fairly new to Deadline still (switched from Pulze about 6 months ago), I’m kind of assuming that I’m doing something wrong in the configuration, but it would be great if that weren’t the case :smiley:

I checked the release notes page, and I don’t see any mention of a weighted/balancing bug. Maybe it was small enough to be fixed but not documented? Regardless, there have been two releases since we installed Deadline, so maybe one of them includes the fix.

My ticket was to do with balancing rather than weighting, but I got this response:

I’ve seen a couple reports that 10.3.2.1 (at least) isn’t properly respecting some of the job weight options in ordering

It’s always worth opening a ticket, as the more noise there is about an issue, the more likely it is to get fixed.

Right on, thank you - that is good advice. I’ll open a ticket!
