Is there some documentation about setting up pools/groups/limit groups to handle licensing?
We have a farm of 50 computers with mostly Max jobs on it all day. We want the farm to pick up Nuke jobs ASAP over any other jobs but still let current Max tasks complete. We could pull some machines off and let them sit idle, but that is wasted render time. So…
In order to get this to work I have set up a few things.
Job Scheduling Order: Pool_Priority_Date
3D max jobs are all submitted as “job interruptible”
Slaves all have Assigned pools: nuke,max, etc
Nuke jobs are submitted to a Nuke “Limit group” that has a machine limit of 10.
This works pretty well, but it seems that the occurrence of Nuke licensing errors has increased with this setup. Even though we use the Nuke “Limit group”, I think it is due to a machine not quite releasing a license while another machine has started up and is trying to check one out. How long does it take for a machine to release a license? Does a task sitting at 100% only have its status changed to complete once something specific has happened, such as the release of the license?
Is there a smarter configuration? Am I completely overcomplicating this, and is there a magic “look after all licenses” button (so I can go home and have a beer)?
Sorry for all the questions, thanks for your time!
The only thing I would change is to not set the 3dsmax jobs to interruptible, as this will kill a 3dsmax render in the middle of a task when a higher priority job (i.e. a Nuke job) comes around, instead of letting the task finish first. Other than that, everything you’ve described sounds correct, so I’m surprised you’re still running into licensing problems.
Slaves use Limit Groups by acquiring a “stub” when they pick up a job that requires one. A slave holds onto the stub for as long as it has the job, and the stub counts against the limit during that time. The slave releases the stub once it has released the job, which happens after Nuke has shut down.
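To make the stub mechanism concrete, here is a minimal Python sketch. It is a toy model, not Deadline's implementation: the `LimitGroup` class and its `acquire`/`release` methods are hypothetical names, and the limit of 10 and farm size of 50 are taken from the setup described earlier in the thread.

```python
class LimitGroup:
    """Toy model of a limit group: one stub per slave holding a job."""

    def __init__(self, name, limit):
        self.name = name
        self.limit = limit
        self.holders = set()  # slaves currently holding a stub

    def acquire(self, slave):
        """Hand out a stub if one is free; return True on success."""
        if slave in self.holders:
            return True  # slave already holds a stub for its current job
        if len(self.holders) >= self.limit:
            return False  # limit reached, so the slave must skip this job
        self.holders.add(slave)
        return True

    def release(self, slave):
        """Return the stub once the slave has released the job."""
        self.holders.discard(slave)


# Hypothetical farm: 50 slaves all try to pick up a Nuke job at once.
nuke = LimitGroup("nuke", limit=10)
granted = [s for s in range(50) if nuke.acquire(f"slave{s:02d}")]
```

In this model only ten slaves ever hold a stub at once. The count only matches real license usage if the application has actually exited by the time `release` is called.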
Which version of Deadline are you running? In version 4.0, there was a chance that Nuke would be left running after a job completed, which would likely count against your license total. This issue has been fixed in Deadline 4.0 SP1.
We are running Deadline Version: 4.0.0.39717.
I had done some testing and it seems once a machine has picked up a job it will continue to complete any task attached to the job until the whole job has finished, even if a higher priority job is sitting in the queue (a slave will pick up a higher priority job if it is restarted).
Does this sound like normal slave behaviour?
I understood the “job interruptible” option to mean that jobs are interruptible but tasks are not. If it is going to interrupt a task, then I would have thought it would be called “tasks interruptible”. At the moment the slaves working on current tasks seem to complete their tasks before moving on to a higher priority job.
That isn’t normal behavior. When a job with higher priority is submitted, the slave should finish its current task, and then move on to the higher priority job. Can you do the following test:
1. Create a new pool called “testing”, and assign it at top priority to one slave. Then restart the slave application on that slave machine so it recognizes the pool change. This is the slave we will be using for this test.
2. Submit a very simple job (with a very small scene file) to Deadline with priority set at 50, pool set to “testing”, and frame range set at 1-100. Make sure the job is not set to be interruptible.
3. Once the slave has picked up the first task for the job, submit the same job again with priority set at 51, pool set to “testing”, and frame range set to 1-100.
You should see that once the slave has finished its current task for Job 1 (priority 50), it will move on to Job 2 (priority 51).
That is not proper behavior either. If a job is interruptible, then its tasks can be killed in the middle of rendering so that it can move on to the higher priority job right away. I’ve confirmed this behavior here. Note though that the slave only checks for higher priority jobs to interrupt the current job every 5 minutes, so depending on how fast the frames render for a job, it’s definitely possible that a task will finish before it is interrupted.
I agree that the documentation for this feature can be a bit misleading, so we’ll change the description to say that “tasks for this job can be interrupted during rendering by a job with higher priority”.
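The interaction between the 5-minute check and task length can be sketched as follows. This is a simplified model, not Deadline's code: the function name and its parameters are hypothetical, and only the 5-minute interval comes from the description above.

```python
CHECK_INTERVAL = 300  # seconds; the 5-minute interval described above


def task_outcome(task_seconds, seconds_until_next_check):
    """Return how an interruptible task ends in this simplified model.

    task_seconds: time the task still needs when the higher priority
        job arrives.
    seconds_until_next_check: time until the slave next checks for a
        higher priority job (between 0 and CHECK_INTERVAL).
    """
    if not 0 <= seconds_until_next_check <= CHECK_INTERVAL:
        raise ValueError("next check must fall within one interval")
    if task_seconds <= seconds_until_next_check:
        return "finished"  # the task completes before the check happens
    return "interrupted"  # the check finds the higher priority job first


quick_frame = task_outcome(task_seconds=120, seconds_until_next_check=200)
slow_frame = task_outcome(task_seconds=900, seconds_until_next_check=200)
```

Here `quick_frame` finishes normally while `slow_frame` is interrupted, which is why fast-rendering test jobs can make an interruptible job look as if it is never interrupted.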
Ok looks like I have been able to emulate the correct behaviour with that test.
Looks like the 5-minute wait between checks for higher priority jobs (or maybe my not testing properly) is what was throwing me off. Is this configurable?
Could I make all jobs interruptible for a limit group?
This is not configurable. Based on your original post, though, I don’t think you want to use the Interruptible option: you said there that you still want current Max tasks to complete.
I think we may be getting a bit off track from what your original goal was, so I’ll propose a configuration that should help you achieve what you’re looking for.
Correct me if I’m wrong, but it sounds like you want Nuke jobs to always be given priority on the farm, but you also don’t want any machines sitting idle when there are only Max jobs available. You can achieve this using Pools (and it sounds like you already have nuke and max pools). You have a couple of options:
1. Assign the pools ‘nuke’ and ‘max’ to all slaves, in that order. With this configuration, your entire farm will always prefer Nuke jobs, but will still render Max jobs when there are no Nuke jobs available. If you avoid using the Interruptible option, slaves will finish their current task for Max jobs before moving on to a Nuke job.
2. The same as (1), but assign pools ‘nuke’ and ‘max’ to some slaves in reverse order. With this configuration, part of your farm will prefer Nuke jobs, and the other part will prefer Max jobs. This way each job type gets some priority, but your farm will still be saturated if only one job type is in the queue. Again, if you avoid using the Interruptible option, slaves will finish their current task for their jobs before moving on to a higher priority job.
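The effect of pool order on a slave's choice can be sketched in Python. This is an illustration only, assuming that a pool listed earlier on a slave is preferred, that a higher priority number wins within a pool, and that earlier submission breaks ties; the `Job` class, `next_job` function, and job names are hypothetical and not part of Deadline.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    pool: str
    priority: int
    submitted: int  # submission order; smaller means earlier


def next_job(slave_pools, queue):
    """Pick the queued job a slave would prefer, or None if none match."""
    candidates = [j for j in queue if j.pool in slave_pools]
    if not candidates:
        return None
    # Sort key: pool position on the slave, then priority, then date.
    return min(
        candidates,
        key=lambda j: (slave_pools.index(j.pool), -j.priority, j.submitted),
    )


queue = [
    Job("max_shot_a", "max", 90, 1),
    Job("nuke_comp_a", "nuke", 50, 2),
    Job("max_shot_b", "max", 50, 3),
]

nuke_first = next_job(["nuke", "max"], queue)  # slave as in option 1
max_first = next_job(["max", "nuke"], queue)   # reversed slave, option 2
max_only_queue = [j for j in queue if j.pool == "max"]
fallback = next_job(["nuke", "max"], max_only_queue)
```

A slave with pools in the order nuke, max picks `nuke_comp_a` even though a Max job has higher priority, and falls back to `max_shot_a` when no Nuke jobs are queued, so it never sits idle.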
You should continue to use the Nuke limit group the way you are to avoid licensing issues. That being said, you might be running into a bug in Deadline 4.0 that can occasionally cause Nuke to not exit properly after it finishes a job (thus holding on to its license). To work around this problem, I would highly suggest upgrading to Deadline 4.0 SP1, and then applying this patch: viewtopic.php?f=57&t=3856
This will likely fix that issue (assuming it’s the same problem that you’re running into). Deadline 4.0 SP1 does not require a new license if you are already running 4.0. You can download it from here: software.primefocusworld.com/sof … /download/
Looks like that may fix it, but 1 week from deadline I am a bit too nervous to change how everything is currently configured. Hopefully at the end of the project we can do a few upgrades and tests, and I will let you know… Thanks for your help.
As a continuation of this post, I’m looking for some advice on how best to set up Nuke rendering via Deadline when I have fewer Nuke render licenses than available rendernodes.
My office has 40 render machines. These machines are set up for rendering any of the following: 3dsmax, After Effects, and Nuke.
We have 5 Nuke Render Licenses.
What we want:
To be able to send multiple Nuke jobs to all 40 machines, have the first available machines pick up the Nuke jobs, and render them in order based on priority.
What I have done:
The Repository is set to: Priority, First In First Out
I made a group of the 40 Rendernodes: nuke_render
Made a Limit in Monitor: nuke_limit_rendernode_5
limit = 5
whitelisted all 40 rendernodes
did not set “Return at Task Progress”, as I’ve only seen tasks for Nuke jobs at 0% or 100%, so I didn’t think it would do anything.
For Nuke submission:
select the group “nuke_render”
set priority (first is 100, second 99, third 98)
set machine limit to 5
set the limits to “nuke_limit_rendernode_5”
then click ok to send
Even with these settings, we are still having issues with RLM errors for Nuke.
For example, if I send only one job with a priority of 100, everything seems to be fine.
If I send three jobs, one at 100, one at 99, and one at 98, I get the licensing error.
It seems that with multiple jobs, Deadline doesn’t know there is already a job (priority 100) taking up the five licenses?
Any ideas on how to get this to work how we want?
I guess I could set dependencies on the jobs (essentially daisy-chain them), but that seems to defeat the purpose of setting a limit. (If I am required to set dependencies, why not just set the machine limit to 5 and then daisy-chain dependencies based on priorities?)
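The difference between a per-job machine limit and a limit shared by several jobs can be shown with a little arithmetic. This sketch only illustrates the two kinds of cap and is not a diagnosis of the errors above; the function names are hypothetical, and the numbers (40 nodes, 5 licenses, three jobs) are taken from this post.

```python
def max_nodes_with_job_limits(job_machine_limits, farm_size):
    """Upper bound on busy nodes if each job is capped independently."""
    return min(sum(job_machine_limits), farm_size)


def max_nodes_with_shared_limit(shared_limit, job_count, farm_size):
    """Upper bound on busy nodes if all jobs draw from one shared limit."""
    if job_count == 0:
        return 0
    return min(shared_limit, farm_size)


# Three Nuke jobs, each with a machine limit of 5, on a 40-node farm.
per_job = max_nodes_with_job_limits([5, 5, 5], farm_size=40)

# The same three jobs all assigned to a single limit of 5.
shared = max_nodes_with_shared_limit(5, job_count=3, farm_size=40)
```

In this model, independent per-job caps would let up to 15 nodes ask for a license at once, while a single shared limit keeps the total at 5 however many jobs are queued.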
Wow! Really old thread! Could you confirm the exact version/build of Deadline you are using, as it may well be a simple bug that got fixed ages ago. Essentially, “limits” (used to be called “limit groups”) should work exactly as you describe and stop you going over license limit budget!
As you’re already on v6.2.0.32, no new license is required. So, please do reach out to our sales team via email: sales [at] thinkboxsoftware [dot] com, who will be able to check your annual support contract and provide the download link. Incidentally, Deadline is now at v7.0.2.3, so you might also want to consider a full upgrade at some point, although this will require more installation work on your part. However, if you’re on current annual support, then it’s all available to you with lots of new features!