We have an odd issue that surfaces from time to time here for us… Our sequential, single-machine jobs sometimes get picked up by two machines at the same time. I initially suspected that one machine was stalling, causing another one to become active, but we now have logs that suggest otherwise.
For example:
lapro1813 dequeuing task 0:
2015-09-04 19:32:04: 0: Got task!
2015-09-04 19:32:04: 0: Plugin will be reloaded because a new job has been loaded, or one of the job files or plugin files has been modified
2015-09-04 19:32:04: Constructor: 3dsmax
2015-09-04 19:32:04: 0: Loaded plugin 3dsmax (\\inferno2\deadline\repository7\plugins\3dsmax)
2015-09-04 19:32:04: 0: Start Job timeout is disabled.
2015-09-04 19:32:04: 0: Task timeout is 12000 seconds (Regular Task Timeout)
2015-09-04 19:32:04: 0: Loaded job: [INDY] FIS_000_0130_v0016_str_Rework_cache_flowline_Cloud_0 (55ea53c802949897e0146966)
... preload script execution, max init, file load activity
2015-09-04 19:35:34: 0: Plugin rendering frame(s): 1000-1004
2015-09-04 19:35:36: 0: INFO: Render Tasks called
2015-09-04 19:35:36: 0: INFO: STARTED
lapro0790 dequeuing task 1 right after:
2015-09-04 19:33:13: 0: Got task!
2015-09-04 19:33:13: 0: Plugin will be reloaded because a new job has been loaded, or one of the job files or plugin files has been modified
2015-09-04 19:33:13: Constructor: 3dsmax
2015-09-04 19:33:13: 0: Loaded plugin 3dsmax (\\inferno2\deadline\repository7\plugins\3dsmax)
2015-09-04 19:33:13: 0: Start Job timeout is disabled.
2015-09-04 19:33:13: 0: Task timeout is 12000 seconds (Regular Task Timeout)
2015-09-04 19:33:13: 0: Loaded job: [INDY] FIS_000_0130_v0016_str_Rework_cache_flowline_Cloud_0 (55ea53c802949897e0146966)
... preload script execution, max init, file load activity
2015-09-04 19:36:36: 0: Plugin rendering frame(s): 1005-1009
2015-09-04 19:36:37: 0: INFO: Render Tasks called
2015-09-04 19:36:37: 0: INFO: STARTED
lapro1813 finishing its task 0:
2015-09-04 19:37:27: 0: Render time for frame(s): 5.393 m
2015-09-04 19:37:27: 0: Total time for task: 5.598 m
2015-09-04 19:37:30: 0: Saving task log...
2015-09-04 19:37:31: Scheduler Thread - Render Thread 0 completed its task
2015-09-04 19:37:31: Scheduler Thread - Seconds before next job scan: 2
2015-09-04 19:37:33: Scheduler - Performing Job scan on Primary Pools with scheduling order Pool, Weighted, Balanced
2015-09-04 19:37:33: Scheduler - Using enhanced scheduler balancing
2015-09-04 19:37:33: Scheduler - Limit Stub for '55ea53c802949897e0146966' (held by 'lapro1813') was removed.
2015-09-04 19:37:33: Scheduler - Preliminary check: Slave is not whitelisted for 55d61247bb649a8bbcaa02d1 limit.
2015-09-04 19:37:33: Scheduler - Preliminary check: Slave is not whitelisted for 55de16bdbb649a2750a438dd limit.
2015-09-04 19:37:33: Scheduler - Returning 55e8e454b25c5ae6a8cc84b7
2015-09-04 19:37:33: Scheduler - Returning 55e8f6a3b25c5a2178faf52f
2015-09-04 19:37:33: Scheduler - Preliminary check: The 55ea2a9310f85222e0820cd7 limit is maxed out.
2015-09-04 19:37:33: Scheduler - Preliminary check: The 55ea2a9710f852c50c58cc58 limit is maxed out.
2015-09-04 19:37:33: Scheduler - Preliminary check: The 55ea2c5f10f852305c31e2be limit is maxed out.
... etc...
lapro1813 dequeues a different job after this.
lapro0790 keeps going as expected:
2015-09-04 19:39:50: 0: Saving task log...
2015-09-04 19:39:51: Scheduler Thread - Render Thread 0 completed its task
2015-09-04 19:39:51: Scheduler Thread - Seconds before next job scan: 2
2015-09-04 19:39:53: Scheduler - Performing Job scan on Primary Pools with scheduling order Pool, Weighted, Balanced
2015-09-04 19:39:53: Scheduler - Using enhanced scheduler balancing
2015-09-04 19:39:53: Scheduler - Successfully dequeued 1 task(s). Returning.
2015-09-04 19:39:53: Scheduler - Returning limit stubs not in use.
2015-09-04 19:39:53: ScanlineEventListener _init__ called, wiping PYTHONPATH from deadlineslave's embedded python
2015-09-04 19:39:53: Enabling render stat mail
2015-09-04 19:39:53: Loading event plugin Scanline (\\inferno2\deadline\repository7\events\Scanline)
2015-09-04 19:39:53: Scheduler Thread - Synchronizing job auxiliary files from \\inferno2\deadline\repository7\jobs\55ea53c802949897e0146966
2015-09-04 19:39:53: Scheduler Thread - All job files are already synchronized
2015-09-04 19:39:53: ScanlineEventListener _init__ called, wiping PYTHONPATH from deadlineslave's embedded python
2015-09-04 19:39:53: Enabling render stat mail
2015-09-04 19:39:53: Loading event plugin Scanline (\\inferno2\deadline\repository7\events\Scanline)
2015-09-04 19:39:56: 0: Got task!
2015-09-04 19:39:56: 0: Plugin already loaded: 3dsmax
2015-09-04 19:39:56: 0: Start Job timeout is disabled.
2015-09-04 19:39:56: 0: Task timeout is 12000 seconds (Regular Task Timeout)
2015-09-04 19:39:56: 0: Job already loaded: [INDY] FIS_000_0130_v0016_str_Rework_cache_flowline_Cloud_0
2015-09-04 19:39:56: 0: Plugin rendering frame(s): 1010-1014
2015-09-04 19:39:57: 0: INFO: Render Tasks called
2015-09-04 19:39:57: 0: INFO: STARTED
Is it possible that the dequeue and the actual limit stub acquisition are not happening at the same time? Could an excessively long preload script delay the stub acquisition?
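To make the question concrete, here is a toy Python sketch of the race I'm imagining. This is not Deadline's actual code and all the names are made up; it just assumes that the task dequeue and the limit-stub acquisition are two separate steps, with our slow preload script / scene load sitting between them:

```python
# Toy model of the suspected race: dequeue first, stub acquisition later,
# with a long "preload" delay in between. Hypothetical names only; this is
# not how Deadline is necessarily implemented.
import threading
import time

lock = threading.Lock()
pending_tasks = ["task0", "task1", "task2"]   # tasks of one machine-limit-1 job
stub_holder = None                            # which slave currently holds the stub
violations = []

def slave(name, preload_seconds):
    global stub_holder
    # Step 1: dequeue a task (no stub check/acquire at this point in the toy model)
    with lock:
        if not pending_tasks:
            return
        task = pending_tasks.pop(0)
    # Step 2: preload script, max init, scene load all take a while
    time.sleep(preload_seconds)
    # Step 3: only now does the slave's stub become visible
    with lock:
        if stub_holder is not None and stub_holder != name:
            violations.append((name, task))   # two machines on a single-machine job
        stub_holder = name

t1 = threading.Thread(target=slave, args=("lapro1813", 2.0))
t2 = threading.Thread(target=slave, args=("lapro0790", 2.0))
t1.start(); t2.start(); t1.join(); t2.join()
print("violations:", violations)  # non-empty -> both slaves dequeued before either stub was visible
```

If the stub were checked and taken atomically as part of the dequeue itself, the second slave's job scan would instead see the limit as maxed out, which is the behaviour we'd expect for these jobs.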