In particular, Maxwell. I am running Deadline installed as a service under the NT AUTHORITY\NetworkService user.
If I use auto timeout, or if a slave hangs without it and its tasks are requeued, the running instance of Maxwell.exe is not shut down. Please see the attachment showing the process window. There should be only 4 instances of Maxwell per job, but as you can see there are 12 instances open.
Enabling the auto timeout settings in the repository options caused chaos across my farm. In some cases each node seemed to time out after 2 minutes, and in others a node did not start even one of the 40 tasks in a job but failed all of them. After a night of rendering, no node could open a new window: the process window showed over 40 Maxwell.exe processes and over 40 matching conhost.exe processes, and the logs report that a window could not be created:
2013-09-12 21:45:11: 2: INFO: Executing plugin script C:\Windows\ServiceProfiles\NetworkService\AppData\Local\Thinkbox\Deadline6\slave\R08\plugins\Maxwell.py
2013-09-12 21:45:11: 2: INFO: About: Maxwell Plugin for Deadline
2013-09-12 21:45:11: 2: INFO: The current environment will be used for rendering
2013-09-12 21:45:11: 2: Plugin rendering frame(s): 15
2013-09-12 21:45:11: 3: INFO: Executing plugin script C:\Windows\ServiceProfiles\NetworkService\AppData\Local\Thinkbox\Deadline6\slave\R08\plugins\Maxwell.py
2013-09-12 21:45:11: 3: INFO: About: Maxwell Plugin for Deadline
2013-09-12 21:45:11: 3: INFO: The current environment will be used for rendering
2013-09-12 21:45:11: 3: Plugin rendering frame(s): 16
2013-09-12 21:45:11: 1: STDOUT: QEventDispatcher: Failed to create QEventDispatcherWin32 internal window: 1400
2013-09-12 21:45:11: 1: STDOUT: Qt: Could not initialize OLE (error 80070583)
2013-09-12 21:45:11: 1: STDOUT: QWidget::create: Failed to create window (Cannot find window class.)
2013-09-12 21:45:11: 1: STDOUT: QWidget::create: Failed to create window (Cannot find window class.)
2013-09-12 21:45:11: 1: STDOUT: QWidget::create: Failed to create window (Cannot find window class.)
2013-09-12 21:45:11: 1: STDOUT: QWidget::create: Failed to create window (Cannot find window class.)
2013-09-12 21:45:11: 1: INFO: Process exit code: 255
Please ignore the different thread numbers in the log above; each thread reports the same errors.
Additionally, this log shows that the slave is trying to unload the plugin and exit its main thread:
2013-09-16 14:03:34: 1: In the process of canceling current task: ignoring exception thrown by PluginLoader
2013-09-16 14:03:34: 1: Unloading plugin: Maxwell
2013-09-16 14:03:47: 1: Shutdown
2013-09-16 14:03:47: 1: Exited ThreadMain(), cleaning up...
As you can see from the process window, this is not actually happening, even though the log reports that it is.
Even without auto timeout, the processes do not exit in any of the following cases:
I select a node in the Monitor and choose ‘Cancel current task’.
I requeue tasks in the Monitor for a particular job before the slave has started rendering (with Maxwell, this is after the bitmaps have loaded and before voxelization begins).
I fail tasks in the Monitor and blacklist the slave that rendered them.
I then have to shut the machine down manually with the front power button, because these running processes prevent Windows Server 2008 R2 from killing them during an OS shutdown or reboot.
Cheers,
Tim.