priorities out of whack

a52_admin · May 1, 2013, 3:54am

DL beta 19, win7 x64 maya 2013

Looking at the attached screen shot, you can see the one job has a priority of 100, and is set to pick up the faster machines. The other jobs are at a lower priority, and set to pick up any available farm machine. The lower priority job is taking machines from the high priority job. I requeue the frames rendering on the machines I want to go to the higher job, but they go back to the 50 job.

bonus issue: notice z068 is idle. I’ve restarted the slave service several times. Any thoughts as to why it’s idle?

Cheers,

-ctj

a52_admin · May 1, 2013, 3:54am

crud…hate forgetting the attachment…

Bobo · May 1, 2013, 7:46am

Stupid question (I obviously cannot see all the data you see, and I suspect you know about farm management more than me), but is it possible that the one machine going to the 50 priority job is not part of the z044_up group for some reason? What is the name of the lonely machine working on SUM60_290_c03_a05_I05 and what groups besides “farm” is it part of?

You could try checking the “Job Candidate Filter” icon above the Slave list to see what machines are eligible to process it. Is the machine picking up the 50 priority job on the list of machines able to work on the 100 priority job?

rrussell · May 1, 2013, 1:03pm

From the screen shot, you can see that z068 isn’t in the ‘all’ pool, which the current queued and rendering jobs are assigned to. That’s why it’s sitting idle.

For the priority issue, we’ll have to dig a bit deeper. As Bobo mentioned, using the Job Candidate Filter in the slave list will allow you to see which slaves are eligible for a job by clicking on the job. With this filter enabled, it would be interesting to see which slaves are shown that can render it.

Another thing to check is if the job has a machine limit set. You can see this in the job’s properties.

Finally, does the job use any Limits? If so, maybe check if one or more of those Limits are maxed out.
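The checks above (pool membership, group membership, machine limit, Limits) can be sketched as one eligibility test. This is only an illustrative model, not Deadline's actual code or API: the `Slave` and `Job` classes, their field names, and `why_not_eligible` are all hypothetical, as are the job name and the group assigned to z068.

```python
from dataclasses import dataclass, field


@dataclass
class Slave:
    # Hypothetical model of a render node, not a Deadline type.
    name: str
    pools: set = field(default_factory=set)
    groups: set = field(default_factory=set)


@dataclass
class Job:
    # Hypothetical model of a job, not a Deadline type.
    name: str
    pool: str = "none"
    group: str = "none"
    machine_limit: int = 0          # 0 means "no machine limit" in this sketch
    machines_rendering: int = 0
    maxed_limits: list = field(default_factory=list)


def why_not_eligible(slave, job):
    """Return the reasons this slave cannot pick up this job (empty list = eligible)."""
    reasons = []
    if job.pool != "none" and job.pool not in slave.pools:
        reasons.append(f"not in pool '{job.pool}'")
    if job.group != "none" and job.group not in slave.groups:
        reasons.append(f"not in group '{job.group}'")
    if job.machine_limit and job.machines_rendering >= job.machine_limit:
        reasons.append("machine limit reached")
    for limit in job.maxed_limits:
        reasons.append(f"limit '{limit}' is maxed out")
    return reasons


# z068 as described above: present on the farm, but missing from the 'all' pool.
z068 = Slave("z068", pools=set(), groups={"farm"})
job = Job("example_job", pool="all", group="farm")
print(why_not_eligible(z068, job))
```

In this model a slave with an empty reason list is what the Job Candidate Filter would show as a candidate; any non-empty list explains why a machine sits idle.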

Cheers,

Ryan

a52_admin · May 1, 2013, 5:26pm

DOH! Thx Ryan. Added z068 to the ‘all’ pool. It must have been offline when I was updating the pools.

Thank you for the slave filter tip. That’s a really nice feature. So using this, picture a Venn diagram where the priority 50 job can render on anything, and the priority 100 job can only render on fast machines. I would imagine the 50 priority job would pick up the slow machines and the 100 would pick up all the fast machines; then, when 100 is done, 50 would pick up the remaining available fast machines. Is that correct?

Ryan/Bobo, thanks for the tips and help. Much appreciated. We’re coming from using Rush, and not used to all the UI options, so it’s a learning process.

-ctj

rrussell · May 1, 2013, 6:23pm

You should be able to use pools to achieve this goal. Here’s the documentation in case you want to refer to it later:
thinkboxsoftware.com/deadlin … cheduling/

Note that there is a similar page in the Deadline 6 user manual that you can download from the RC1 downloads thread:
viewtopic.php?f=85&t=9524

In your example, you could create a pool called “fast” and place your fast machines in this pool. You wouldn’t need to create an “all” pool, since all machines are already in the “none” pool, and jobs in the “none” pool always have lower priority than jobs in another pool. So in this case, your fast machines would prefer jobs in the “fast” pool, but then would fall back to jobs in the “none” pool once all the fast jobs are done.
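As a rough illustration of that fallback, here is a minimal sketch that assumes a slave ranks jobs by pool first and numeric priority second. It is not Deadline's implementation; `pick_job`, the dictionary keys, and the job names are hypothetical.

```python
def pick_job(slave_pools, jobs):
    """Pick the next job for a slave.

    slave_pools: ordered list of named pools the slave belongs to
                 (every slave is implicitly in 'none' as well).
    jobs:        list of dicts with 'name', 'pool' and 'priority' keys.
    """
    def rank(job):
        if job["pool"] in slave_pools:
            pool_rank = slave_pools.index(job["pool"])
        else:
            # Only 'none' jobs reach this branch; they sort after every named pool.
            pool_rank = len(slave_pools)
        # Lower tuple wins: best pool first, then highest numeric priority.
        return (pool_rank, -job["priority"])

    eligible = [j for j in jobs if j["pool"] == "none" or j["pool"] in slave_pools]
    return min(eligible, key=rank)["name"] if eligible else None


jobs = [
    {"name": "fast_job", "pool": "fast", "priority": 50},
    {"name": "general_job", "pool": "none", "priority": 100},
]

print(pick_job(["fast"], jobs))      # a machine in the "fast" pool
print(pick_job([], jobs))            # a machine in no named pool
print(pick_job(["fast"], jobs[1:]))  # the fast machine once fast_job is finished
```

Under these assumptions the fast machine takes the "fast" pool job even though the "none" job has the higher number, and falls back to the "none" job only when nothing is left in its pool.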

Hope that helps!

Ryan

Bobo · May 1, 2013, 7:31pm

I would also recommend this article:
thinkboxsoftware.com/deadlin … imits.html

a52_admin · May 1, 2013, 9:26pm

This brings up another question. I thought I removed all the pools and subsequently updated everything into groups, so everyone would be rendering in the same pool, but the groups would break up the farm into fast, medium, slow, etc… After I removed all the pools I added, the pools still show up on the monitor for a given slave. How do I get rid of pools?

Now back to the above topic. If everyone is assigned to the same pool but different groups, and there’s overlap in the groups (all, fast…fast is a subset of all), and job A has priority 50 and is in the ‘all’ group, and job B has priority 100 and is in the ‘faster’ group, shouldn’t job B get all the ‘faster’ procs it wants, with job A starting to pick up ‘faster’ procs as job B finishes?

Thank you for the links to the pools/groups explanation. I think the above scenario should work given those documents.

-scratches head…

Bobo · May 1, 2013, 9:52pm

Yes, that should work, but your language is backwards.

The main thing you should remember is that Deadline does not have a centralized Manager, and does not use a Push mechanism. In other words, it is not the Jobs that are picking the machines, but the machines that are picking the jobs (Pull!). So you are defining rules for the machines to make decisions based on priority criteria (Pool, Priority, Date) and some filters (Pool, Group, Lists, Limits).

In the case you described, the Group “faster” REQUIRES that a machine that is looking for a job in the Repository is part of that group in order to work on Job B with priority 100. If the machine is not in the “faster” group, it will skip the job and go to Job A. Thus every machine that is part of Group “faster” and is looking for a job should prefer Job B because of its numeric priority, unless other rules prevent it from picking that job.

If you are not seeing this, please let us know.
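To make the pull model concrete, here is a small sketch of the decision described above, written from the machine's point of view. It assumes both jobs are in the same pool, so only the group filter and numeric priority matter; `choose_job` and the data layout are hypothetical, not part of Deadline.

```python
def choose_job(machine_groups, jobs):
    """The machine pulls work: keep only jobs whose group it belongs to,
    then prefer the highest numeric priority among those."""
    candidates = [j for j in jobs if j["group"] in machine_groups]
    if not candidates:
        return None
    return max(candidates, key=lambda j: j["priority"])["name"]


jobs = [
    {"name": "Job A", "group": "all", "priority": 50},
    {"name": "Job B", "group": "faster", "priority": 100},
]

print(choose_job({"all", "faster"}, jobs))  # a fast machine
print(choose_job({"all"}, jobs))            # a machine outside 'faster'
```

The group acts only as a filter here. It never raises or lowers a job's rank, which is why a fast machine should land on Job B purely because of the 100 priority.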

a52_admin · May 1, 2013, 10:00pm

…sigh…story of my life, man. ;^)

That makes perfect sense. So a ‘faster’ slave should look at the available jobs and choose job B because job B has a higher priority. Machines NOT in ‘faster’ will choose job A because job B is not in their group.

It would seem what’s happening is job A is taking all the procs, ‘faster’ included, regardless of the priority of job B.

jgaudet · May 1, 2013, 11:49pm

That’s correct. Slaves only use Groups to determine whether they can render a certain Job; they have no preference over which group they render, all else being equal. If you’re not using Pools, the Priority of the Job will be the primary way of choosing Jobs, with Submit Date being the tie-breaker (assuming you’re still using the default Scheduling Order).

If Job A is being picked over Job B in the scenario you described above, and they’re in the same Pool, there’s definitely something wonky going on. I would check your Scheduling Order setting to make sure it didn’t get flipped to Date → Pool → Priority or something. You can check this in the ‘Repository Options’ (under the ‘Tools’ menu as Super User), under the ‘Job Settings’ section. There should be a drop down for the ‘Job Scheduling Order’; the default (and the one we’ve been assuming so far in this thread) is ‘Pool, Priority, Submission Date’.
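To show why a flipped Scheduling Order would produce exactly this symptom, here is a sketch that treats the setting as a sort key. It is a simplified model, not Deadline's code: the `KEYS` table, `first_job`, and the submission dates are made up for illustration, and it assumes the earliest submission wins on date.

```python
from datetime import datetime

# One hypothetical key function per criterion; the smallest value wins.
KEYS = {
    "Pool": lambda job: 0 if job["pool"] != "none" else 1,  # named pool before 'none'
    "Priority": lambda job: -job["priority"],               # higher number first
    "Date": lambda job: job["submitted"],                   # earlier submission first
}


def first_job(jobs, order):
    """Return the job a slave would take under the given scheduling order."""
    return min(jobs, key=lambda j: tuple(KEYS[k](j) for k in order))["name"]


jobs = [
    {"name": "Job A", "pool": "all", "priority": 50,
     "submitted": datetime(2013, 4, 30, 9, 0)},
    {"name": "Job B", "pool": "all", "priority": 100,
     "submitted": datetime(2013, 4, 30, 17, 0)},
]

print(first_job(jobs, ["Pool", "Priority", "Date"]))  # the default order
print(first_job(jobs, ["Date", "Pool", "Priority"]))  # a flipped order
```

With the default order the two jobs tie on pool and Job B wins on priority. With the date first, the older Job A wins before priority is ever consulted.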

Bobo · May 1, 2013, 11:56pm

Ok, that IS unexpected.

Can you please confirm something first?
* Enable SuperUser mode in Monitor > Tools menu.
* Select “Configure Repository Options”.
* In the dialog, go to the “Job Settings” entry on the left list.
* See whether “Job Scheduling Order” is set to “Pool, Priority, Submission Date”.

If it isn’t, then that would explain the strange behavior. Since it defaults to “Pool, Priority, Submission Date”, I don’t see why it would be different, but I wanted to make sure it is set up right before we dig deeper…

EDIT: Looks like Jon beat me to it, and had the same idea

a52_admin · May 2, 2013, 2:01am

Thank you Jon/Bobo.

Setting is Pool, Priority, Submission Date.

There is an issue, though, with the Pools. I originally had all my farm broken up into pools, then I saw the light and changed it to groups and removed all the pools, but I still see pools associated with machines even though there is only an ‘all’ pool. I’m seeing now that I should remove the ‘all’ pool as well, since we currently don’t let anyone else use our farm…stingy, I know, but hey…there’s no use right now for pools. When we decide to let the graphics guys render their After Effects work on here, I’ll create CG and Design pools.

So I’m going to clear out the slave machines tonight/early morning after all renders are off the farm and run a similar test once we expunge all pools from the list.

Thanks, and I’ll keep you posted.

-ctj