Slaves not following the priority correctly

Our repository is set to the Pool, Priority, First-In First-Out scheduling order, but seemingly at random a slave will decide, “Nope, I’m going to render this other job.” If I requeue them they’ll go back to the correct job, but it really puzzles me why some slaves switch jobs mid-stream and start rendering a lower-priority job. I didn’t see this behavior before 7.1.

Found the relevant verbose slave details; sorry, it took a few minutes to sanitize :wink:

2015-08-03 17:31:56: 0: INFO: Lightning: CallCurRendererRenderFrame returned code 1
2015-08-03 17:31:56: 0: INFO: Lightning: Render done
2015-08-03 17:31:56: 0: INFO: Lightning: Saved image to XXXXXXXXXXXXXXXXXXXXXXXXXXX
2015-08-03 17:31:56: 0: INFO: Lightning: Checking render elements
2015-08-03 17:31:56: 0: Render time for frame(s): 2.811 m
2015-08-03 17:31:56: 0: Total time for task: 3.050 m
2015-08-03 17:31:56: 0: Saving task log...
2015-08-03 17:31:57: Scheduler Thread - Render Thread 0 completed its task
2015-08-03 17:31:57: Scheduler Thread - Seconds before next job scan: 2
2015-08-03 17:31:59: Scheduler - Performing Job scan on Primary Pools with scheduling order Pool, Priority, First-in First-out
2015-08-03 17:31:59: Scheduler - Limit Stub for '55c00724b5b7e13f6c369241' (held by 'daedalus') was removed.
2015-08-03 17:31:59: Scheduler - Successfully dequeued 1 task(s). Returning.
2015-08-03 17:31:59: 0: Shutdown
2015-08-03 17:31:59: 0: Shutdown
2015-08-03 17:31:59: 0: Exited ThreadMain(), cleaning up...
2015-08-03 17:31:59: 0: INFO: End Job called - shutting down 3dsmax plugin
2015-08-03 17:32:00: 0: Shutdown
2015-08-03 17:32:00: 0: Shutdown
2015-08-03 17:32:00: 0: INFO: Disconnecting socket connection to 3dsmax
2015-08-03 17:32:00: 0: INFO: Waiting for 3dsmax to shut down
2015-08-03 17:32:00: 0: INFO: 3dsmax has shut down
2015-08-03 17:32:00: 0: Stopped job: FIRSTJOBXXXXXXXXXXXXXXXXXXXXXXXXXXX
2015-08-03 17:32:00: 0: Unloading plugin: 3dsmax
2015-08-03 17:32:01: Scheduler - Returning limit stubs not in use.
2015-08-03 17:32:01: Loading event plugin ConfigSlave (\\XXXXX\deadlinerepository7\custom\events\ConfigSlave)
2015-08-03 17:32:01: Scheduler Thread - Synchronizing job auxiliary files from \\XXXXX\deadlinerepository7\jobs\55bff8643a40ed28709e2c27
2015-08-03 17:32:02: Scheduler Thread - Synchronization time for job files: 847.047 ms
2015-08-03 17:32:02: Scheduler Thread - Synchronizing plugin files from \\XXXXX\deadlinerepository7\plugins\3dsmax
2015-08-03 17:32:03: Scheduler Thread - Synchronization time for plugin files: 659.037 ms
2015-08-03 17:32:03: Loading event plugin ConfigSlave (\\XXXXX\deadlinerepository7\custom\events\ConfigSlave)
2015-08-03 17:32:04: 0: Got task!
2015-08-03 17:32:04: 0: Plugin will be reloaded because a new job has been loaded, or one of the job files or plugin files has been modified
2015-08-03 17:32:04: Constructor: 3dsmax
2015-08-03 17:32:04: 0: Loaded plugin 3dsmax (\\XXXXX\deadlinerepository7\plugins\3dsmax)
2015-08-03 17:32:04: 0: Start Job timeout is disabled.
2015-08-03 17:32:04: 0: Task timeout is disabled.
2015-08-03 17:32:04: 0: Loaded job: SECONDJOBXXXXXXXXXXXXXXXXXXXXXXXXXXX (55bff8643a40ed28709e2c27)
2015-08-03 17:32:04: 0: INFO: Executing plugin script C:\Users\XXXXX\AppData\Local\Thinkbox\Deadline7\slave\Daedalus\plugins\55bff8643a40ed28709e2c27\3dsmax.py

EDIT: Gah… It did it again. It just really, really doesn’t want to render that job. It rendered one frame successfully and then inexplicably quit.

Hello Gavin,

Can you verify whether your jobs are being submitted with the interruptible flag enabled? If so, can you check whether the job that supplanted the one being worked on was higher in the scheduling pecking order?

Neither job is interruptible.
Repository/Job Settings/Job Scheduling Order: Pool, Priority, First-In First-Out

Job #1: Group “Global”, Pool: “Global”, Priority: 60, Submission TC: First
Job #2: Group “i7_v02”, Pool: “Global”, Priority: 70, Submission TC: Second

Slave was started up and picked up Job #2 (Pool ==, Priority >: END)
Slave finished one task and then picked up Job #1.
Slave was requeued and went back to Job #2 for one task, then returned to Job #1.
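
For reference, here’s a minimal sketch in Python of how I understand that ordering (this is just my mental model, not Deadline’s actual scheduler code, and the field names and submission times are made up). By this ordering Job #2 should win every scan:

# Sketch only -- NOT Deadline's scheduler, just the ordering I expect from
# "Pool, Priority, First-In First-Out" for these two jobs.
from datetime import datetime

jobs = [
    {"name": "Job #1", "pool_rank": 0, "priority": 60,
     "submitted": datetime(2015, 8, 3, 16, 0)},   # submitted first (time made up)
    {"name": "Job #2", "pool_rank": 0, "priority": 70,
     "submitted": datetime(2015, 8, 3, 16, 30)},  # submitted second (time made up)
]

# Lower pool rank first, then higher priority, then earlier submission.
jobs.sort(key=lambda j: (j["pool_rank"], -j["priority"], j["submitted"]))
print([j["name"] for j in jobs])  # -> ['Job #2', 'Job #1']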

Silly question, but maybe it’s limit groups?

If the higher-priority job can’t get the limit it needs, then the slave could pick up a lower-priority job instead.
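
Purely to illustrate the idea (a simplified sketch, not our actual dequeue code, and the limit name is made up): jobs are considered in Pool, Priority, First-In First-Out order, and a job is skipped whenever the Slave can’t acquire a stub for every limit group that job requires.

# Simplified sketch of the idea only -- not Deadline's real dequeue logic.
def pick_job(sorted_jobs, free_limit_stubs):
    # sorted_jobs is already ordered by Pool, Priority, First-In First-Out.
    for job in sorted_jobs:
        if all(free_limit_stubs.get(limit, 0) > 0 for limit in job["limits"]):
            return job  # first job whose limit groups can all be satisfied
    return None

sorted_jobs = [
    {"name": "Job #2", "priority": 70, "limits": ["some_limit_group"]},
    {"name": "Job #1", "priority": 60, "limits": []},
]

# Every stub in "some_limit_group" is already held, so the slave falls
# through to the lower-priority Job #1.
print(pick_job(sorted_jobs, {"some_limit_group": 0})["name"])  # -> Job #1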

Pulse used to have a really nice dump of the dequeue process to show you what it was thinking. I’ll talk to the guys and see if we could put that crazy output into the Slave so we can troubleshoot this.

Also found another bug today on a job where I actually did use a limit group. In that instance it renders correctly, but the slave filter (“Only Show Slaves That Can Render Selected Job”) doesn’t show a whole group of slaves that can render it (and are in fact rendering it). Removing the limit group puts them back in the filter list, but I need the limit group; thankfully the issue is purely cosmetic.