We are a Pool, Priority, First-In First-Out repository, but seemingly at random a slave will decide, “Nope, I’m going to render this other job.” If I requeue them they go back to the correct job, but it really puzzles me why some slaves switch jobs mid-stream and start rendering a lower priority job. I didn’t see this behavior before 7.1.
Found the relevant verbose slave log; sorry, it took a few minutes to sanitize.
2015-08-03 17:31:56: 0: INFO: Lightning: CallCurRendererRenderFrame returned code 1
2015-08-03 17:31:56: 0: INFO: Lightning: Render done
2015-08-03 17:31:56: 0: INFO: Lightning: Saved image to XXXXXXXXXXXXXXXXXXXXXXXXXXX
2015-08-03 17:31:56: 0: INFO: Lightning: Checking render elements
2015-08-03 17:31:56: 0: Render time for frame(s): 2.811 m
2015-08-03 17:31:56: 0: Total time for task: 3.050 m
2015-08-03 17:31:56: 0: Saving task log...
2015-08-03 17:31:57: Scheduler Thread - Render Thread 0 completed its task
2015-08-03 17:31:57: Scheduler Thread - Seconds before next job scan: 2
2015-08-03 17:31:59: Scheduler - Performing Job scan on Primary Pools with scheduling order Pool, Priority, First-in First-out
2015-08-03 17:31:59: Scheduler - Limit Stub for '55c00724b5b7e13f6c369241' (held by 'daedalus') was removed.
2015-08-03 17:31:59: Scheduler - Successfully dequeued 1 task(s). Returning.
2015-08-03 17:31:59: 0: Shutdown
2015-08-03 17:31:59: 0: Shutdown
2015-08-03 17:31:59: 0: Exited ThreadMain(), cleaning up...
2015-08-03 17:31:59: 0: INFO: End Job called - shutting down 3dsmax plugin
2015-08-03 17:32:00: 0: Shutdown
2015-08-03 17:32:00: 0: Shutdown
2015-08-03 17:32:00: 0: INFO: Disconnecting socket connection to 3dsmax
2015-08-03 17:32:00: 0: INFO: Waiting for 3dsmax to shut down
2015-08-03 17:32:00: 0: INFO: 3dsmax has shut down
2015-08-03 17:32:00: 0: Stopped job: FIRSTJOBXXXXXXXXXXXXXXXXXXXXXXXXXXX
2015-08-03 17:32:00: 0: Unloading plugin: 3dsmax
2015-08-03 17:32:01: Scheduler - Returning limit stubs not in use.
2015-08-03 17:32:01: Loading event plugin ConfigSlave (\\XXXXX\deadlinerepository7\custom\events\ConfigSlave)
2015-08-03 17:32:01: Scheduler Thread - Synchronizing job auxiliary files from \\XXXXX\deadlinerepository7\jobs\55bff8643a40ed28709e2c27
2015-08-03 17:32:02: Scheduler Thread - Synchronization time for job files: 847.047 ms
2015-08-03 17:32:02: Scheduler Thread - Synchronizing plugin files from \\XXXXX\deadlinerepository7\plugins\3dsmax
2015-08-03 17:32:03: Scheduler Thread - Synchronization time for plugin files: 659.037 ms
2015-08-03 17:32:03: Loading event plugin ConfigSlave (\\XXXXX\deadlinerepository7\custom\events\ConfigSlave)
2015-08-03 17:32:04: 0: Got task!
2015-08-03 17:32:04: 0: Plugin will be reloaded because a new job has been loaded, or one of the job files or plugin files has been modified
2015-08-03 17:32:04: Constructor: 3dsmax
2015-08-03 17:32:04: 0: Loaded plugin 3dsmax (\\XXXXX\deadlinerepository7\plugins\3dsmax)
2015-08-03 17:32:04: 0: Start Job timeout is disabled.
2015-08-03 17:32:04: 0: Task timeout is disabled.
2015-08-03 17:32:04: 0: Loaded job: SECONDJOBXXXXXXXXXXXXXXXXXXXXXXXXXXX (55bff8643a40ed28709e2c27)
2015-08-03 17:32:04: 0: INFO: Executing plugin script C:\Users\XXXXX\AppData\Local\Thinkbox\Deadline7\slave\Daedalus\plugins\55bff8643a40ed28709e2c27\3dsmax.py
EDIT: Gah… it did it again. It just really, really doesn’t want to render that job. It rendered one frame successfully and then inexplicably quit.
Hello Gavin,
Can you verify whether your jobs are being submitted with the interruptible flag enabled? If so, can you also check whether the job that supplanted the original had a higher place in the scheduling pecking order?
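For reference, the behaviour that flag controls boils down to roughly this check. A minimal sketch in Python, purely illustrative (the field names are made up, not Deadline’s actual API):

def should_preempt(current_job, candidate_job):
    # A running task is only dropped for another job when the current job
    # was submitted as interruptible AND the candidate outranks it in the
    # scheduling order (simplified here to a bare priority comparison).
    if not current_job["interruptible"]:
        return False
    return candidate_job["priority"] > current_job["priority"]

current = {"name": "Job #2", "priority": 70, "interruptible": False}
candidate = {"name": "Job #1", "priority": 60, "interruptible": False}
print(should_preempt(current, candidate))  # False on both counts here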
Neither Job is interruptible.
Repository/Job Settings/Job Scheduling Order: Pool, Priority, First-In First-Out
Job #1: Group “Global” Pool: “Global” Priority:60 Submission TC: First
Job #2: Group “i7_v02” Pool: “Global” Priority:70 Submission TC: Second
Slave was started up and picked up Job #2 (Pool equal, Priority higher: end of comparison)
Slave finished one task and then picked up Job #1.
Slave was requeued, went back to Job #2 for one task, and then returned to Job #1.
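Just to spell out what I would expect the slave to be doing with that scheduling order, here is a rough sketch of the comparison as I understand it (illustrative Python only, not Deadline’s code; the job fields and pool list are made up from the settings above):

from datetime import datetime

# Hypothetical job records matching the two jobs above.
jobs = [
    {"name": "Job #1", "pool": "Global", "priority": 60,
     "submitted": datetime(2015, 8, 3, 16, 0)},
    {"name": "Job #2", "pool": "Global", "priority": 70,
     "submitted": datetime(2015, 8, 3, 16, 30)},
]

# Pools the slave is assigned to, in order of preference.
slave_pools = ["Global"]

def scheduling_key(job):
    # Pool: lower index in the slave's pool list wins.
    # Priority: higher number wins (negated so an ascending sort works).
    # First-In First-Out: earlier submission time wins.
    return (slave_pools.index(job["pool"]), -job["priority"], job["submitted"])

candidates = sorted(jobs, key=scheduling_key)
print("Expected pick:", candidates[0]["name"])  # Job #2: same pool, 70 > 60

By that logic Job #2 should win every scan until it runs out of tasks, which is why the switch to Job #1 makes no sense to me.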
Silly question, but maybe it’s limit groups?
If the higher priority job can’t get the limit it needs, then the slave could fall back to a lower priority job.
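Rough picture of what I mean, assuming every limit a job uses has to have a free stub before the job can be dequeued (illustrative Python; the limit name and bookkeeping here are invented):

# Hypothetical limit bookkeeping: limit name -> (stubs in use, maximum).
limit_usage = {"vray_license": (10, 10)}   # fully consumed

def can_acquire_stubs(job):
    # A job is only eligible if every limit it uses still has a free stub.
    for limit in job.get("limits", []):
        in_use, maximum = limit_usage.get(limit, (0, 0))
        if in_use >= maximum:
            return False
    return True

def dequeue(candidates):
    # Walk the jobs in scheduling order; skip any whose limits are exhausted.
    for job in candidates:
        if can_acquire_stubs(job):
            return job
    return None

jobs_in_order = [
    {"name": "Job #2", "limits": ["vray_license"]},  # higher priority, but blocked
    {"name": "Job #1", "limits": []},                # lower priority, no limits
]
print("Dequeued:", dequeue(jobs_in_order)["name"])   # falls through to Job #1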
Pulse used to have a really nice dump of the dequeue process to show you what it was thinking. I’ll talk to the guys and see if we could put that crazy output into the Slave so we can troubleshoot this.
Also found another bug today where I actually did use a limit group. In that instance the job renders correctly, but the slave filter (“Only Show Slaves That Can Render Selected Job”) doesn’t show a whole group of slaves that can render it (and are in fact rendering it). Removing the limit from the job includes them in the filter list, but I need the limit, so for now it’s a purely cosmetic issue.