I’ve done a search and couldn’t find a discussion pertaining to this. If it was discussed, please link me to the conversation.
We had been using a Machine Limit of 0 (zero) until recently, when it was suggested that we limit each job to just a few machines (in our case, 3 machines). This does make sense, as it allows more jobs to start rendering, rather than letting just a few jobs take all of the farm machines.
However, it’s been reported that, even when the farm has many idle machines, render jobs still stick to their assigned machine limits. Is this the correct behavior of Deadline’s machine assignment? Or do we need to flip a switch in the Deadline settings somewhere? I would assume that Deadline would see that there are 2 jobs on the farm using 3 machines each, and automatically assign idle/free machines to these jobs.
An alternative solution (which we use) would be to divide the farm into pools that have preference over a certain part of the farm. If the farm is empty, each pool will then utilise the free machines, or give them up if a job starts in a pool that has priority over that part of the farm.
In addition to the default Pool/Priority/Date algorithm that most people use, you should be aware of the many scheduling options available in Deadline: docs.thinkboxsoftware.com/produc … uling.html
For example, the Pool/Balanced mode still lets you use pools to control the scheduling of jobs, but also distributes the Slaves evenly among the Jobs so each job gets processed. Say you have 10 machines in Pool A and 10 machines in Pool B, and two jobs are sent to Pool A and 5 to Pool B. In the default Pool/Priority/Date mode without any limits, the highest-priority job in Pool A that was submitted first will normally take all 10 machines in Pool A, and the highest-priority job in Pool B will take over the 10 machines in Pool B. But with the Pool/Balanced mode, each of the two jobs in Pool A will get 5 machines, and each of the 5 jobs in Pool B will get 2 machines… And this is just one of many options. You can then look at Pool/Priority/Balanced, where both Pool and Priority are respected, but jobs with the same Pool and Priority split the Slaves in a balanced manner. If any Slaves are left in the Pool after that, they will be distributed to the jobs with lower Priority in the Pool. Otherwise, when the Slaves finish with the higher Priority jobs, they will be distributed to the lower Priority jobs…
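To make the 10-machines-per-pool example concrete, here is a rough Python sketch of the two behaviours. It is only a toy model, not Deadline's scheduler: the function names, the job dictionaries, and the way leftover Slaves are handed out in the balanced case are all assumptions made for illustration, and every job is assumed to have plenty of tasks left.

```python
def first_in_first_out(slaves_by_pool, jobs):
    """Toy Pool/Priority/Date: one job per pool takes every Slave in that pool."""
    assignment = {}
    for pool, slave_count in slaves_by_pool.items():
        candidates = [job for job in jobs if job["pool"] == pool]
        if not candidates:
            continue
        # Highest priority first, then earliest submission.
        winner = min(candidates, key=lambda job: (-job["priority"], job["submitted"]))
        assignment[winner["name"]] = slave_count
    return assignment


def balanced(slaves_by_pool, jobs):
    """Toy Pool/Balanced: Slaves in a pool are split evenly among that pool's jobs."""
    assignment = {}
    for pool, slave_count in slaves_by_pool.items():
        candidates = sorted(
            (job for job in jobs if job["pool"] == pool),
            key=lambda job: job["submitted"],
        )
        if not candidates:
            continue
        share, extra = divmod(slave_count, len(candidates))
        for index, job in enumerate(candidates):
            # Assumption: any remainder goes to the earliest submitted jobs.
            assignment[job["name"]] = share + (1 if index < extra else 0)
    return assignment


# Hypothetical farm matching the example: 2 jobs in Pool A, 5 jobs in Pool B.
slaves_by_pool = {"A": 10, "B": 10}
jobs = [
    {"name": f"A{i}", "pool": "A", "priority": 50, "submitted": i} for i in (1, 2)
] + [
    {"name": f"B{i}", "pool": "B", "priority": 50, "submitted": i} for i in range(1, 6)
]

fifo_result = first_in_first_out(slaves_by_pool, jobs)
balanced_result = balanced(slaves_by_pool, jobs)
print(fifo_result)
print(balanced_result)
```

In this model the first-in first-out result gives all 10 Slaves in each pool to A1 and B1, while the balanced result gives 5 Slaves to each Pool A job and 2 to each Pool B job.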
Read the topic I linked to and see if any of the available modes sound like a good idea for your workflow.
The Machine Limit is (usually) a hard limit - “never exceed this number”. It is useful when rendering thousands of tasks on a large farm, preventing a single job from taking over the farm. Keep in mind that in typical production, there are hundreds or even thousands of jobs fighting for resources, so the case where there is only one active job on the farm and the Limit is preventing it from using all resources is a rather rare one for most people…
Of course, on small farms with very few jobs it might happen. So Machine Limits would be the wrong tool in that situation.
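As a small illustration of why a hard Machine Limit leaves machines idle on a quiet farm, here is a toy Python model. It is not Deadline code: the function name and the (name, machine_limit, pending_tasks) tuples are made up for this sketch, and it assumes a limit of 0 means "no limit", as in the original question.

```python
def assign_with_machine_limits(total_machines, jobs):
    """jobs: list of (name, machine_limit, pending_tasks) in queue order.

    A machine_limit of 0 is treated as "no limit". Returns the per-job
    machine counts and the number of machines left idle.
    """
    free = total_machines
    assignment = {}
    for name, machine_limit, pending_tasks in jobs:
        # A job never needs more machines than it has pending tasks.
        wanted = pending_tasks if machine_limit == 0 else min(machine_limit, pending_tasks)
        granted = min(wanted, free)
        assignment[name] = granted
        free -= granted
    return assignment, free


# Two jobs limited to 3 machines each on a hypothetical 20-machine farm.
limited, idle_with_limits = assign_with_machine_limits(
    20, [("job_a", 3, 100), ("job_b", 3, 100)]
)

# The same two jobs with Machine Limit 0: the first one takes everything.
unlimited, idle_without_limits = assign_with_machine_limits(
    20, [("job_a", 0, 100), ("job_b", 0, 100)]
)

print(limited, idle_with_limits)
print(unlimited, idle_without_limits)
```

With the limit of 3, only 6 of the 20 machines are busy and 14 sit idle. With a limit of 0, the first job takes all 20 and the second job waits.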
Do you want to control the ‘limit’ across multiple jobs, i.e. across all your slaves at any one time? If so, note that the “Machine Limit” on a job ONLY affects that single job. If you want a global limit, so that no more than a set number of Slaves render tasks at the same time regardless of how many jobs those tasks come from (say, because you own a limited number of licenses of a particular piece of render software), then you should use “Limits”: docs.thinkboxsoftware.com/produc … tml#limits
Using “Limits” there are also a number of ‘tricks’ you can achieve by careful configuration. This recent developer blog post on how limits work in Deadline may prove useful here: deadline.thinkboxsoftware.com/fe … -resources
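To show the difference between a per-job Machine Limit and a global Limit, here is a minimal Python sketch of a shared counter, such as a pool of render licenses. The SharedLimit class, its methods, and the job and Slave names are hypothetical and are not part of Deadline's API. The only idea it tries to capture is that every job referencing the same Limit draws from one count.

```python
class SharedLimit:
    """A counted resource (for example render licenses) shared by all jobs."""

    def __init__(self, name, maximum):
        self.name = name
        self.maximum = maximum
        self.in_use = 0

    def try_acquire(self):
        """Take one unit if any is left; return whether it succeeded."""
        if self.in_use < self.maximum:
            self.in_use += 1
            return True
        return False

    def release(self):
        """Give one unit back, for example when a task finishes."""
        if self.in_use > 0:
            self.in_use -= 1


# Hypothetical setup: 5 licenses, 8 Slaves asking for work from 3 jobs.
license_limit = SharedLimit("renderer_license", maximum=5)
requests = [
    ("job_a", "slave01"), ("job_a", "slave02"), ("job_a", "slave03"),
    ("job_b", "slave04"), ("job_b", "slave05"), ("job_b", "slave06"),
    ("job_c", "slave07"), ("job_c", "slave08"),
]

started, waiting = [], []
for job_name, slave_name in requests:
    if license_limit.try_acquire():
        started.append((job_name, slave_name))
    else:
        waiting.append((job_name, slave_name))

print(started)
print(waiting)
```

Here 5 Slaves start rendering and 3 wait, no matter which job they picked up, because the count is shared across jobs rather than set per job.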
From the sounds of it, I don’t think you need to mess with the job scheduling settings at all…
We are a smaller studio with a total of 20 blades and about 40 staff PCs. We don’t typically have the entire render farm being utilized 24/7. When we were using the Machine Limit 0 setting for submitted Maya jobs, the first couple of jobs would take almost the entire farm, as the machines were idle. Since our rendering requests go in peaks and valleys, the first jobs kicked off would just take the entire farm.
Now, that’s fine if the first couple of jobs do take the brunt of the machines, but as more jobs are kicked off, I would like Deadline to now spread the machine usage to the newer jobs, as well. However, it seemed as though those first, machine-hoarding jobs would take the machines until their renders were done.
Another interesting topic is whether you want some jobs to be interruptible by other, more important jobs. This can be highly inefficient, but it is possible: docs.thinkboxsoftware.com/produc … erruptible
When a Slave finishes a task of a job, it will look for other higher priority jobs before continuing. So if you have a job with 1000 frames that took all 20 blades, and each frame takes an hour, in about an hour the 20 Slaves will check to see if any higher priority jobs have been submitted. If there are any, they will drop the current job and move over. Otherwise, if the Pool and Priority are the same, the earlier submitted job will keep on using all Slaves until all its tasks are done. This is actually a Good Thing - you don’t want to submit a new job with a higher priority, and have a Slave that has rendered for 45 minutes drop what it is doing and go to work on the new job. This would be a waste of render time.
That being said, Deadline supports this option via the Interruptible property of the job - you can even specify the percentage that is acceptable for a task to have reached before it becomes non-interruptible… But that’s generally a Bad Practice.
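The interruptible decision described above can be sketched as a simple check in Python. This is an illustration only: the function name, its parameters, and the exact comparison at the threshold are assumptions, not Deadline's actual implementation, and the 25% threshold in the example calls is just an illustrative value.

```python
def should_interrupt(current_priority, waiting_priority, interruptible,
                     progress_pct, threshold_pct):
    """Decide whether a Slave should abandon its current task.

    Assumptions of this sketch: only a strictly higher priority job can
    interrupt, and a task becomes non-interruptible once its progress
    reaches threshold_pct.
    """
    if not interruptible:
        return False
    if waiting_priority <= current_priority:
        return False
    return progress_pct < threshold_pct


# A task 10% done on an interruptible job gives way to a more important job.
print(should_interrupt(50, 80, True, progress_pct=10, threshold_pct=25))

# A task 75% done (45 minutes into an hour-long frame) keeps rendering.
print(should_interrupt(50, 80, True, progress_pct=75, threshold_pct=25))
```

The threshold is what limits the wasted render time: the lower it is set, the less finished work a Slave can throw away when it switches jobs.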
In most pipelines, the queue is processed first in - first out. In other words, it is expected that a job submitted earlier will be processed in full before the next jobs, assuming Pools and Priorities are the same. When a job needs to go in front of the Queue, the render wrangler person might bump up its Priority (or the artist might submit with a higher Priority, risking getting flak from the rest of the company). However, as long as the other jobs are not interruptible, the new job will have to wait for some Slaves to finish what they are doing before they discover that a more important job has appeared.
The Balanced option would not help in your example case if a job has already taken over all Slaves for an hour. But once the Slaves start finishing tasks of the first job, they would find other jobs with the same priority and start going to them even if they were submitted later to ensure all jobs get some attention…
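Here is a rough Python sketch of that gradual rebalancing. It is a toy model, not Deadline's logic: it assumes all jobs share the same Pool and Priority, that each Slave re-evaluates only when it finishes a task, and that a free Slave simply goes to whichever job currently has the fewest Slaves. The function and job names are made up.

```python
def rebalance_as_tasks_finish(active_slaves, finishing_order):
    """Simulate Slaves moving between equal-priority jobs as tasks finish.

    active_slaves: dict of job name -> number of Slaves currently on it.
    finishing_order: job names, one entry per Slave that finishes a task.
    """
    counts = dict(active_slaves)
    for job_name in finishing_order:
        counts[job_name] -= 1  # the Slave finishes its task and becomes free
        # Assumption: the free Slave joins the job with the fewest Slaves.
        target = min(counts, key=lambda name: counts[name])
        counts[target] += 1
    return counts


# The first job holds all 20 Slaves when a second job is submitted.
start = {"first_job": 20, "second_job": 0}

after_five = rebalance_as_tasks_finish(start, ["first_job"] * 5)
after_twenty = rebalance_as_tasks_finish(start, ["first_job"] * 20)
print(after_five)
print(after_twenty)
```

Nothing changes until tasks actually finish. After 5 Slaves have finished a task the split is 15 and 5, and once all 20 have finished one task each the two jobs hold 10 Slaves apiece.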
There are so many ways to set up things, you really need to define exactly how you want your queue to run, and then find the right settings to get that behavior.
Thanks for your guys’ advice. I guess the issue I am dealing with is the various leads of small projects wanting a “fair” usage of the farm. We’ve set up each ongoing project in the Pools, and tried to distribute the priority of each project as fairly as possible. We’ve also introduced the Job is Interruptible option (at 25%), in case there are any jobs working on another project’s machines when that project needs the render power. And during today’s advice from you guys, I’ve changed the Job Scheduling settings in the repo to be “Pool, Priority, Balanced” instead of “Pool, Priority, First-In First-Out”.
I will keep watching the farm closely. I actually reckon, as Boris had lightly mentioned, that the first jobs taken by the farm had eaten up a large number of machines, and maybe this created uneasiness for the staff of other projects. But once those machines start to finish their jobs, they should be picking up higher priority ones on other projects, if there are any. Maybe I just need to explain this.
Why can’t we just work with machines? Things can be much simpler. Only joking!