AWS Thinkbox Discussion Forums

Network service failing to exit render engine processes

In particular Maxwell. I'm running Deadline installed as a service under the NT AUTHORITY\NetworkService user.

If I use auto timeout or a slave hangs without it and the tasks are requeued, then the running instance of Maxwell.exe is not closed down. Please see attachment for process window. There should only be 4 instances of Maxwell per job and as you can see there are 12 instances of Maxwell open.
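A quick way to confirm the leftover count without the process window is to count image names in the output of `tasklist /FO CSV /NH`. The following is only a rough sketch: the function names are made up for illustration, and the limit of 4 comes from the job settings described above.

```python
import csv
import io
import subprocess


def count_instances(tasklist_csv, image_name="Maxwell.exe"):
    """Count rows of `tasklist /FO CSV /NH` output whose image name matches.

    The first CSV column is the image name; the comparison ignores case.
    """
    rows = csv.reader(io.StringIO(tasklist_csv))
    return sum(1 for row in rows if row and row[0].lower() == image_name.lower())


def query_tasklist():
    """Run tasklist (Windows only) and return its CSV output as text."""
    result = subprocess.run(
        ["tasklist", "/FO", "CSV", "/NH"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

On a render node, `count_instances(query_tasklist())` returning more than 4 would indicate leftover renderers from earlier tasks.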


Using auto timeout settings in the repository options caused chaos across my farm, as each node seemed to time out after 2 minutes in some cases and in certain instances did not even start one task out of 40 specified for a job, but failed all of them. After a night of rendering, each node could not even open a new window, as the process window had over 40 Maxwell.exes and over 40 matching conhost.exes, and the log reports state unable to open window:-

2013-09-12 21:45:11: 2: INFO: Executing plugin script C:\Windows\ServiceProfiles\NetworkService\AppData\Local\Thinkbox\Deadline6\slave\R08\plugins\Maxwell.py
2013-09-12 21:45:11: 2: INFO: About: Maxwell Plugin for Deadline
2013-09-12 21:45:11: 2: INFO: The current environment will be used for rendering
2013-09-12 21:45:11: 2: Plugin rendering frame(s): 15
2013-09-12 21:45:11: 3: INFO: Executing plugin script C:\Windows\ServiceProfiles\NetworkService\AppData\Local\Thinkbox\Deadline6\slave\R08\plugins\Maxwell.py
2013-09-12 21:45:11: 3: INFO: About: Maxwell Plugin for Deadline
2013-09-12 21:45:11: 3: INFO: The current environment will be used for rendering
2013-09-12 21:45:11: 3: Plugin rendering frame(s): 16
2013-09-12 21:45:11: 1: STDOUT: QEventDispatcher: Failed to create QEventDispatcherWin32 internal window: 1400
2013-09-12 21:45:11: 1: STDOUT: Qt: Could not initialize OLE (error 80070583)
2013-09-12 21:45:11: 1: STDOUT: QWidget::create: Failed to create window (Cannot find window class.)
2013-09-12 21:45:11: 1: STDOUT: QWidget::create: Failed to create window (Cannot find window class.)
2013-09-12 21:45:11: 1: STDOUT: QWidget::create: Failed to create window (Cannot find window class.)
2013-09-12 21:45:11: 1: STDOUT: QWidget::create: Failed to create window (Cannot find window class.)
2013-09-12 21:45:11: 1: INFO: Process exit code: 255

Please ignore the different thread numbers in the log above; each thread reports the same errors.
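To see how many renders on a node ended this way, the slave log can be scanned for non-zero exit codes. This is a minimal sketch that assumes the log is read one entry per line and that entries follow the `timestamp: thread: INFO: Process exit code: N` shape seen above; the function name is hypothetical.

```python
import re

# Matches entries such as:
# 2013-09-12 21:45:11:  1: INFO: Process exit code: 255
EXIT_CODE = re.compile(
    r"^(?P<stamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}):\s+"
    r"(?P<thread>\d+): INFO: Process exit code: (?P<code>-?\d+)"
)


def failed_exits(log_lines):
    """Return (timestamp, thread, exit_code) for every non-zero exit code."""
    failures = []
    for line in log_lines:
        match = EXIT_CODE.match(line.strip())
        if match and int(match.group("code")) != 0:
            failures.append(
                (match.group("stamp"),
                 int(match.group("thread")),
                 int(match.group("code")))
            )
    return failures
```

Grouping the result by thread number would show whether every render thread fails the same way, as described above.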

Additional log output showing that the slave is trying to unload the plugin and exit the main thread:-

2013-09-16 14:03:34: 1: In the process of canceling current task: ignoring exception thrown by PluginLoader
2013-09-16 14:03:34: 1: Unloading plugin: Maxwell
2013-09-16 14:03:47: 1: Shutdown
2013-09-16 14:03:47: 1: Exited ThreadMain(), cleaning up...

As you can see from the process window, this is not happening despite it being reported as such in the log.

Even without auto timeout, the processes are not exiting when:-
I select a node in the monitor and choose ‘Cancel current task’.
I requeue tasks in the monitor for a particular job when the slave has not started rendering. (With Maxwell this happens after bitmaps have loaded and before voxelization begins.)
I fail tasks in the monitor and blacklist the slave that rendered them.

I have to manually shut down the machine via the front power button, as Windows Server 2008 R2 is unable to kill these running processes during an OS shutdown or reboot.
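One thing that could be tried before reaching for the power button is a forced tree kill with the standard Windows `taskkill` tool. The sketch below is only an illustration with made-up function names. Nothing in this thread confirms it succeeds where the OS shutdown fails, and it would likely need an elevated prompt because the processes belong to the NetworkService account.

```python
import subprocess


def taskkill_command(image_name="Maxwell.exe"):
    """Build a taskkill call: /F forces termination, /T includes child
    processes, /IM selects processes by image name."""
    return ["taskkill", "/F", "/T", "/IM", image_name]


def force_kill(image_name="Maxwell.exe"):
    """Run taskkill (Windows only) and return its exit code."""
    result = subprocess.run(
        taskkill_command(image_name), capture_output=True, text=True
    )
    return result.returncode
```

Because `/T` also terminates child processes, any console hosts started as children of the renderer should be taken down with it.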

Cheers,

Tim.

Hi Tim,

Have you confirmed whether or not this only happens when Deadline is running as a service? If not, can you run the slave as a normal application on one of your nodes and try to reproduce this behavior?

Thanks!

- Ryan

Hi Ryan, I have installed beta 4 without services and will run an animation job with the same timeout settings tonight to try and reproduce this. So far, the 20 or so jobs I’ve run with the slave as a normal application seem to be OK.

Cheers,

Tim.

So far no issues, Ryan, although I didn’t have to cancel any tasks. I still need to re-enable the auto task timeout options though. I will set this today and run more tests, hopefully with beta 6.

Tim.

So, since posting my reply yesterday, the same has happened. This is without any timeouts, but using ‘Cancel current task’ from right-clicking the slaves in their section of the monitor. I’m still on beta 4, as I haven’t managed to install the latest yet. (Windows Server 2008 R2 SP1 x64)

Do you need any logs from me, although there is nothing to show really?

Thanks,

Tim.

Hi Tim,

I’m not having any luck reproducing this here. I’ve tried a job with 4 concurrent tasks, and I’ve tried canceling it from the Remote Control menu in the Monitor, from the Slave’s Control menu, and by suspending a job.

How reliably can you reproduce this on your end?

Also, do you guys use power management there? I’m wondering if maybe the slave crashes (leaving its maxwell processes running), and then power management starts it back up again. Maybe the problem you’re seeing isn’t related to task canceling…
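The crash theory can be checked by looking for Maxwell processes whose parent process no longer exists. The following is a minimal sketch over a snapshot of `(pid, parent_pid, name)` tuples. How the snapshot is gathered is left open (Win32_Process exposes both `ProcessId` and `ParentProcessId`), and the function name is hypothetical. Windows can reuse process IDs, so a match on the parent ID is a hint rather than proof.

```python
def find_orphans(processes, image_name="Maxwell.exe"):
    """Return PIDs of matching processes whose parent PID is not running.

    `processes` is an iterable of (pid, parent_pid, name) tuples taken
    from a single snapshot of the process table.
    """
    processes = list(processes)
    live_pids = {pid for pid, _, _ in processes}
    return [
        pid
        for pid, parent_pid, name in processes
        if name.lower() == image_name.lower() and parent_pid not in live_pids
    ]
```

If the leftover Maxwell.exe instances show up as orphans, that would point to the slave process having gone away rather than to task canceling.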

Cheers,

- Ryan