Hi there,
The global repository slave throttling is a very useful feature. It would be great to see this at the per-job level as well. There are two types of server issues one can run into: bandwidth and DoS overload. For bandwidth issues, the global setting currently available is great.
However, it's overkill for DoS overload situations, where the overall bandwidth usage is low but a large number of slaves querying the same file results in a DoS-type overload. When we submit a Nuke job of 500 frames with a 500-slave limit (and happen to have 500 idle slave instances), we often get “file not found” errors for the .nk file being rendered during the initial pickup ‘rush’, a clear sign that the Isilon's Samba service is throttling. It would be great to be able to set a per-job pickup queue throttle number, while still leaving the rest of the farm free to pick up as many tasks as they like.
What about limits with the auto-release feature?
docs.thinkboxsoftware.com/produ … ine-limits
We introduced that handy per-plugin limit setting, so you could apply it to all Nuke jobs. The only downside is that we don’t support applying multiple limits per plugin, so if you’re already relying on limits for your Nuke licenses you’d have to make an OnJobSubmitted event to apply another limit group to those jobs.
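If it helps, the skeleton for that event would look something like this. This is a rough, untested sketch: the limit group name “nuke_pickup” is just a placeholder, and you'd create that limit in the Monitor first:

```python
# Sketch of an OnJobSubmitted event plugin that tags Nuke jobs with an
# extra limit group. "nuke_pickup" is a placeholder limit name.
from Deadline.Events import DeadlineEventListener
from Deadline.Scripting import RepositoryUtils

def GetDeadlineEventListener():
    return NukeLimitListener()

def CleanupDeadlineEventListener(eventListener):
    eventListener.Cleanup()

class NukeLimitListener(DeadlineEventListener):
    def __init__(self):
        self.OnJobSubmittedCallback += self.OnJobSubmitted

    def Cleanup(self):
        del self.OnJobSubmittedCallback

    def OnJobSubmitted(self, job):
        # Only touch Nuke jobs; leave every other plugin alone.
        if job.JobPlugin != "Nuke":
            return
        limits = list(job.JobLimitGroups)
        if "nuke_pickup" not in limits:
            limits.append("nuke_pickup")
            job.SetJobLimitGroups(limits)
            RepositoryUtils.SaveJob(job)
```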
Thinking about it further: because every job secretly has a limit, you can apply those “release at progress” settings to each job. If you have a few dozen Nuke jobs, though, you may still run afoul of the Isilon's SMB grumpiness.
We regularly use multiple limit groups per job for licensing and other control measures, which we would need to keep.
Cool, OnJobSubmitted it is then, I suppose. At least for now.
Do you think we could shift the feature request to “Allow Multiple Limit Groups per Plugin”? I'm just wondering if that would meet your requirements well.
I can’t imagine that would be too hard (he said, not being the guy who has to implement it).
I think the best fit would be a per-plugin throttling option that's applied at the per-job level, so we could just make sure no single Nuke job ever gets picked up by more than X machines at a time (while many Nuke jobs can still be picked up by unlimited machines).
Definitely seems doable. This may be muddying the issue, but the fact that you could have unlimited Nuke jobs each limited to a certain number of machines: is that due to Isilon load characteristics? I'd expect 300 Nuke jobs at 30 machines each to hit it just as hard, but I don't know.
People have actually asked for this kind of thing before (forgot about it yesterday). We wrote a script to limit the machine count on every job and threw it up on GitHub:
github.com/ThinkboxSoftware/Dea … LimitClamp
I’ll discuss with the dev team at my meeting next week.
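For reference, the core of that clamp is roughly the following. A from-memory sketch rather than the actual repo code, and MAX_MACHINES is just an example value:

```python
# From-memory sketch of a machine-limit clamp, not the actual GitHub
# script. In Deadline, a machine limit of 0 means "unlimited".
from Deadline.Events import DeadlineEventListener
from Deadline.Scripting import RepositoryUtils

MAX_MACHINES = 30  # example clamp value

def GetDeadlineEventListener():
    return MachineLimitClampListener()

def CleanupDeadlineEventListener(eventListener):
    eventListener.Cleanup()

class MachineLimitClampListener(DeadlineEventListener):
    def __init__(self):
        self.OnJobSubmittedCallback += self.OnJobSubmitted

    def Cleanup(self):
        del self.OnJobSubmittedCallback

    def OnJobSubmitted(self, job):
        # Clamp both the "unlimited" case (0) and anything above the cap.
        if job.JobMachineLimit == 0 or job.JobMachineLimit > MAX_MACHINES:
            RepositoryUtils.SetMachineLimitMaximum(job.JobId, MAX_MACHINES)
```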
The Isilon nodes seem to have issues when single folders are accessed en masse, but it doesn't affect other folders. It's also hard to repro, hah, a pretty elusive issue… but we tend to get 20-30 errors on Nuke jobs that are sent to 500 slaves that all pick them up at once.
Huh. That’s completely unexpected. Then yeah, per-job throttle is definitely the way to go. Colour me informed!
Have you tried submitting the .nk file with the job to the repo? Then each job would be a different single folder.
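For example, with a bare-bones deadlinecommand submission the .nk rides along as an auxiliary file and gets copied into the job's own folder in the repository (file contents and paths below are illustrative, from memory):

```
# job_info.job
Plugin=Nuke
Name=comp_v001
Frames=1-500

# plugin_info.job (no SceneFile key; the auxiliary copy is used instead)
Version=13.1
WriteNode=Write1

# Submit, passing the .nk as an auxiliary file:
deadlinecommand job_info.job plugin_info.job /projects/show/comp_v001.nk
```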
We already have each .nk file in its own dedicated folder (we happen to store them with the outputs, for easy lookups).
Wanted to follow up: is this still on the radar?