Hi there,
Occasionally a slave freezes during an I/O call in the 3dsmax.py pre-render steps.
Only a manual restart of the slave helped. Pulse never even tried to mark the slave as stalled, for some reason.
I would expect either the slave itself to detect a timeout, or Pulse to eventually mark the slave as stalled. Neither happened.
Last lines of the active log (note: this is customized functionality; these lines are followed by some Directory.Exists calls, which is likely where the freeze happened):
2016-11-19 15:28:31: 0: INFO: Uh oh, this is on the network! Lets copy it local to: "C:\Users\scanlinevfx\AppData\Local\Thinkbox\Deadline8\slave\lapro1874\jobsData\5830dc2903d961224c176600"
2016-11-19 15:28:31: 0: INFO: Getting value of SCL_LOCALIZATION_DISABLE_NFS
2016-11-19 15:28:31: 0: INFO: Value of SCL_LOCALIZATION_DISABLE_NFS =
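One workaround we are considering on our side is guarding those directory checks with a timeout, so that a hung network call does not block the pre-render step indefinitely. This is only a rough sketch in plain Python, not Deadline API code: the helper name `dir_exists_with_timeout` and the 30 second default are our own assumptions, and the blocked call is not actually killed, it is just abandoned on a daemon thread so the caller can decide to fail the task.

```python
import os
import threading


def dir_exists_with_timeout(path, timeout_seconds=30.0, check=os.path.isdir):
    """Run check(path) on a helper thread.

    Returns True or False if the check finished in time, or None if it
    is still blocked after timeout_seconds (hypothetical helper, not
    part of Deadline).
    """
    result = []

    def worker():
        try:
            result.append(bool(check(path)))
        except Exception:
            # Treat any error from the check as "does not exist".
            result.append(False)

    thread = threading.Thread(target=worker)
    thread.daemon = True  # do not keep the process alive for a hung call
    thread.start()
    thread.join(timeout_seconds)

    if thread.is_alive():
        # The I/O call is still blocked; report a timeout to the caller.
        return None
    return result[0]
```

A caller would treat `None` as "the share is not responding" and fail the task with a clear message instead of waiting forever.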
Later, when the freeze was noticed and the task was manually requeued, the slave log showed:
2016-11-20 12:02:08: BEGIN - LAPRO1874\scanlinevfx
2016-11-20 12:02:08: Scheduler Thread - Task "44_1019-1019" could not be found because task has been modified:
2016-11-20 12:02:08: current status = Rendering, new status = Rendering
2016-11-20 12:02:08: current slave = LAPRO1874, new slave = LAPRO0709
2016-11-20 12:02:08: current frames = 1019-1019, new frames = 1019-1019
2016-11-20 12:02:08: Scheduler Thread - Cancelling task...
2016-11-20 12:03:40: Scheduler Thread - Task "44_1019-1019" could not be found because task has been modified:
2016-11-20 12:03:40: current status = Rendering, new status = Completed
2016-11-20 12:03:40: current slave = LAPRO1874, new slave = LAPRO0709
2016-11-20 12:03:40: current frames = 1019-1019, new frames = 1019-1019
2016-11-20 12:03:40: Scheduler Thread - Cancelling task...
2016-11-20 12:05:11: Scheduler Thread - Task "44_1019-1019" could not be found because task has been modified:
2016-11-20 12:05:11: current status = Rendering, new status = Completed
2016-11-20 12:05:11: current slave = LAPRO1874, new slave = LAPRO0709
2016-11-20 12:05:11: current frames = 1019-1019, new frames = 1019-1019
2016-11-20 12:05:11: Scheduler Thread - Cancelling task...
2016-11-20 12:06:41: Scheduler Thread - Task "44_1019-1019" could not be found because task has been modified:
2016-11-20 12:06:41: current status = Rendering, new status = Completed
2016-11-20 12:06:41: current slave = LAPRO1874, new slave = LAPRO0709
2016-11-20 12:06:41: current frames = 1019-1019, new frames = 1019-1019
2016-11-20 12:06:41: Scheduler Thread - Cancelling task...
2016-11-20 12:08:13: Scheduler Thread - Task "44_1019-1019" could not be found because task has been modified:
2016-11-20 12:08:13: current status = Rendering, new status = Completed
2016-11-20 12:08:13: current slave = LAPRO1874, new slave = LAPRO0709
2016-11-20 12:08:13: current frames = 1019-1019, new frames = 1019-1019
2016-11-20 12:08:13: Scheduler Thread - Cancelling task...
2016-11-20 12:09:47: Scheduler Thread - Task "44_1019-1019" could not be found because task has been modified:
2016-11-20 12:09:47: current status = Rendering, new status = Completed
2016-11-20 12:09:47: current slave = LAPRO1874, new slave = LAPRO0709
2016-11-20 12:09:47: current frames = 1019-1019, new frames = 1019-1019
2016-11-20 12:09:47: Scheduler Thread - Cancelling task...
2016-11-20 12:11:21: Scheduler Thread - Task "44_1019-1019" could not be found because task has been modified:
2016-11-20 12:11:21: current status = Rendering, new status = Completed
2016-11-20 12:11:21: current slave = LAPRO1874, new slave = LAPRO0709
2016-11-20 12:11:21: current frames = 1019-1019, new frames = 1019-1019
2016-11-20 12:11:21: Scheduler Thread - Cancelling task...
2016-11-20 12:12:54: Scheduler Thread - Task "44_1019-1019" could not be found because task has been modified:
2016-11-20 12:12:54: current status = Rendering, new status = Completed
2016-11-20 12:12:54: current slave = LAPRO1874, new slave = LAPRO0709
2016-11-20 12:12:54: current frames = 1019-1019, new frames = 1019-1019
2016-11-20 12:12:54: Scheduler Thread - Cancelling task...
2016-11-20 12:14:27: Scheduler Thread - Task "44_1019-1019" could not be found because task has been modified:
2016-11-20 12:14:27: current status = Rendering, new status = Completed
2016-11-20 12:14:27: current slave = LAPRO1874, new slave = LAPRO0709
2016-11-20 12:14:27: current frames = 1019-1019, new frames = 1019-1019
2016-11-20 12:14:27: Scheduler Thread - Cancelling task...
2016-11-20 12:16:00: Scheduler Thread - Task "44_1019-1019" could not be found because task has been modified:
2016-11-20 12:16:00: current status = Rendering, new status = Completed
2016-11-20 12:16:00: current slave = LAPRO1874, new slave = LAPRO0709
2016-11-20 12:16:00: current frames = 1019-1019, new frames = 1019-1019
2016-11-20 12:16:00: Scheduler Thread - Cancelling task...
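As a stopgap for noticing this sooner, a small script could scan the slave log for repeated cancel attempts like the ones above. The following is a rough sketch that assumes the line format shown in this log excerpt (timestamp, colon, message); the function names and the threshold of three attempts are our own choices, not anything provided by Deadline.

```python
import re
from datetime import datetime

# Matches lines such as:
# 2016-11-20 12:02:08: Scheduler Thread - Cancelling task...
LINE_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}):\s+(.*)$")
CANCEL_MESSAGE = "Scheduler Thread - Cancelling task..."


def cancel_attempts(log_lines):
    """Return the timestamps of every 'Cancelling task...' line."""
    times = []
    for line in log_lines:
        match = LINE_RE.match(line.strip())
        if match and match.group(2).strip() == CANCEL_MESSAGE:
            times.append(datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S"))
    return times


def looks_stuck(log_lines, min_attempts=3):
    """Heuristic: the slave keeps trying to cancel the same task."""
    return len(cancel_attempts(log_lines)) >= min_attempts
```

Run periodically against the current slave log, `looks_stuck` returning True would be a hint that the slave needs a manual restart.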
Eventually the slave process was manually killed and restarted:
2016-11-20 12:20:32: BEGIN - LAPRO1874\scanlinevfx
2016-11-20 12:20:32: Deadline Slave 8.0 [v8.0.10.4 Release (c19fd2cef)]
2016-11-20 12:20:36: Auto Configuration: A ruleset has been received from Pulse
2016-11-20 12:20:37: Plugin sandboxing will not be used because it is disabled in the Repository Options.
2016-11-20 12:20:37: Info Thread - Created.
2016-11-20 12:20:37: Slave 'LAPRO1874' has stalled because of an unexpected exit. Performing house cleaning...
Any advice on how we could avoid having tasks stuck like that until someone notices? Maybe some timeout settings are not properly configured?
The job's maximum task render time was set to 40 minutes, and the global 3dsmax timeouts are:
Loading 3dsmax: 1000 seconds
Starting job: 3600 seconds
Progress Updates: 8000 seconds
This is a mixed environment: Pulse is on 8.0.5 and the slaves are on 8.0.10.