AWS Thinkbox Discussion Forums

Plugin CoronaDR is not killing DrServer process

I’ve noticed that the CoronaDR plugin does not close the DrServer.exe process after a job is marked as completed, suspended, failed, or requeued.

First I checked the log file and found that every CoronaDR job finishes with an exception that is explicitly ignored; see the last line of the log below. I’m not sure whether this matters, but I wanted to point it out.

2017-04-04 15:14:38:  0: Plugin will be reloaded because a new job has been loaded.
2017-04-04 15:14:38:  0: Loading Job's Plugin timeout is Disabled
2017-04-04 15:14:40:  0: Loaded plugin CoronaDR
2017-04-04 15:14:40:  0: Executing plugin command of type 'Sync Files for Job'
2017-04-04 15:14:40:  0: All job files are already synchronized
2017-04-04 15:14:40:  0: Plugin CoronaDR was already synchronized.
2017-04-04 15:14:40:  0: Done executing plugin command of type 'Sync Files for Job'
2017-04-04 15:14:40:  0: Executing plugin command of type 'Initialize Plugin'
2017-04-04 15:14:40:  0: INFO: Executing plugin script '[...]\AppData\Local\Thinkbox\Deadline9\slave\[...]\plugins\58e3904d9066041694d1c1c6\CoronaDR.py'
2017-04-04 15:14:41:  0: INFO: Corona DR Plugin Initializing...
2017-04-04 15:14:41:  0: INFO: About: Corona DR Plugin for Deadline
2017-04-04 15:14:41:  0: INFO: Render Job As User disabled, running as current user '[...]'
2017-04-04 15:14:41:  0: INFO: The job's environment will be merged with the current environment before rendering
2017-04-04 15:14:41:  0: Done executing plugin command of type 'Initialize Plugin'
2017-04-04 15:14:41:  0: Start Job timeout is disabled.
2017-04-04 15:14:41:  0: Task timeout is disabled.
2017-04-04 15:14:41:  0: Loaded job: Untitled - Corona DR Job (58e3904d9066041694d1c1c6)
2017-04-04 15:14:41:  0: Executing plugin command of type 'Start Job'
2017-04-04 15:14:41:  0: INFO: Executable: C:\Program Files\Corona\DrServer.exe
2017-04-04 15:14:41:  0: INFO: Existing DR Process: Kill On Existing Process
2017-04-04 15:14:41:  0: INFO: DR Auto Close: True
2017-04-04 15:14:41:  0: INFO: DR Close Timeout: 1800 seconds
2017-04-04 15:14:41:  0: Done executing plugin command of type 'Start Job'
2017-04-04 15:14:41:  0: Plugin rendering frame(s): 1
2017-04-04 15:14:41:  0: Executing plugin command of type 'Render Task'
2017-04-04 15:14:41:  0: INFO: Starting monitored managed process CoronaDr
2017-04-04 15:14:41:  0: INFO: Corona DrServer job starting...
2017-04-04 15:14:41:  0: INFO: Stdout Redirection Enabled: True
2017-04-04 15:14:41:  0: INFO: Stdout Handling Enabled: True
2017-04-04 15:14:41:  0: INFO: Popup Handling Enabled: True
2017-04-04 15:14:41:  0: INFO: QT Popup Handling Enabled: False
2017-04-04 15:14:41:  0: INFO: WindowsForms10.Window.8.app.* Popup Handling Enabled: False
2017-04-04 15:14:41:  0: INFO: Using Process Tree: True
2017-04-04 15:14:41:  0: INFO: Hiding DOS Window: True
2017-04-04 15:14:41:  0: INFO: Creating New Console: False
2017-04-04 15:14:41:  0: INFO: Running as user: [...]
2017-04-04 15:14:41:  0: INFO: Executable: "C:\Program Files\Corona\DrServer.exe"
2017-04-04 15:14:41:  0: INFO: Argument: 
2017-04-04 15:14:41:  0: INFO: Full Command: "C:\Program Files\Corona\DrServer.exe"
2017-04-04 15:14:41:  0: INFO: Startup Directory: "C:\Program Files\Corona"
2017-04-04 15:14:41:  0: INFO: Process Priority: BelowNormal
2017-04-04 15:14:41:  0: INFO: Process Affinity: default
2017-04-04 15:14:41:  0: INFO: Process is now running
2017-04-04 15:14:43:  0: STDOUT: 2017-04-04 15:14:41   DR server started
2017-04-04 15:14:43:  0: STDOUT: 2017-04-04 15:14:41   Starting UDP socket, local hostname: [...]
2017-04-04 15:14:43:  0: STDOUT: 2017-04-04 15:14:41   Running Corona DrServer build Nov 14 2016 on ports UDP19666, TCP19668, loopback TCP19667
2017-04-04 15:14:43:  0: STDOUT: 2017-04-04 15:14:41   Available 3dsmax versions:
2017-04-04 15:14:43:  0: STDOUT: 2017-04-04 15:14:41   2017: C:/Program Files/Autodesk/3ds Max 2017/
2017-04-04 15:14:43:  0: STDOUT: 2017-04-04 15:14:41   Corona Distributed Rendering Server is running in UAC elevated mode ("Run as Administrator"). This can cause problems with missing textures during rendering. New builds of DR server (including this one) no longer require running as admin, so you should launch the application again without elevation. See https://corona-renderer.com/link/2011 for more details.
2017-04-04 15:15:12:  0: Executing plugin command of type 'Cancel Task'
2017-04-04 15:15:12:  0: Done executing plugin command of type 'Cancel Task'
2017-04-04 15:15:13:  0: Done executing plugin command of type 'Render Task'
2017-04-04 15:15:13:  0: In the process of canceling current task: ignoring exception thrown by PluginLoader

Then I checked CoronaDR.py and found an error. The CoronaDRPlugin class defines a CoronaDrProcess attribute, which the Cleanup method checks, but that attribute is never assigned. Instead, the RenderTasks method assigns self.CoronaProcess. Because of this, CoronaDRPlugin.Cleanup never calls CoronaDrProcess.Cleanup.

Original code fragment with comments

class CoronaDRPlugin( DeadlinePlugin ):

    Executable = ""
    CoronaDrProcess = None

** cut **

    def Cleanup( self ):
        del self.InitializeProcessCallback
        del self.StartJobCallback
        del self.RenderTasksCallback
        del self.EndJobCallback

        # this is never executed due to being None
        if self.CoronaDrProcess:
            self.CoronaDrProcess.Cleanup()
            del self.CoronaDrProcess
            
** cut **

    def RenderTasks( self ):
        # the reference to the Corona process is stored under a different attribute name
        self.CoronaProcess = CoronaDrProcess( self, self.Executable )
 
        if self.DRAutoClose:
        
** cut **
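
The mismatch can be reproduced outside Deadline. Below is a minimal sketch, assuming nothing about the Deadline API: FakeDrProcess, BuggyPlugin, FixedPlugin and run are hypothetical stand-ins I made up for illustration, and only the attribute names come from CoronaDR.py.

```python
class FakeDrProcess:
    """Hypothetical stand-in for the managed process object."""

    def __init__(self):
        self.cleaned_up = False

    def Cleanup(self):
        self.cleaned_up = True


class BuggyPlugin:
    # Class-level default, as in the shipped plugin.
    CoronaDrProcess = None

    def RenderTasks(self):
        # Stored under a different name than the one Cleanup checks.
        self.CoronaProcess = FakeDrProcess()

    def Cleanup(self):
        # Still sees the class-level None, so the branch is skipped.
        if self.CoronaDrProcess:
            self.CoronaDrProcess.Cleanup()


class FixedPlugin:
    CoronaProcess = None

    def RenderTasks(self):
        self.CoronaProcess = FakeDrProcess()

    def Cleanup(self):
        if self.CoronaProcess:
            self.CoronaProcess.Cleanup()


def run(plugin):
    """Render, clean up, and report whether the process object was cleaned up."""
    plugin.RenderTasks()
    process = plugin.CoronaProcess
    plugin.Cleanup()
    return process.cleaned_up


buggy_result = run(BuggyPlugin())
fixed_result = run(FixedPlugin())
print("buggy:", buggy_result, "fixed:", fixed_result)
```

With the mismatched name the process object's Cleanup is silently skipped; no exception is raised, which is probably why this went unnoticed.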

So I fixed this by renaming the CoronaDrProcess attribute to CoronaProcess. As a result, CoronaDrProcess.Cleanup was called properly at the end of the job, but that didn’t solve my problem: the DrServer.exe process was still being orphaned. This led me to the EndJob method of the CoronaDRPlugin class, which I noticed is never called. I’m not sure whether that is related to the exception I found in the log. This is something you should check, because I was unable to find out why the method is not being called.

    def EndJob( self ):
        self.LogInfo( "Ending Corona DrServer Job" )
        self.ShutdownMonitoredManagedProcess( self.ProcessName )

Finally, I fixed my problem by extending CoronaDRPlugin.Cleanup to also call the ShutdownMonitoredManagedProcess method.

    def Cleanup( self ):
        del self.InitializeProcessCallback
        del self.StartJobCallback
        del self.RenderTasksCallback
        del self.EndJobCallback

        if self.CoronaProcess:
            self.CoronaProcess.Cleanup()
            del self.CoronaProcess
            self.ShutdownMonitoredManagedProcess( self.ProcessName )

This is pretty complicated, so I’ve attached CoronaDR.py (as a .txt file) with the fix I made, so you can run a proper diff against it.
CoronaDR.txt (10.4 KB)

I hope this helps. If you have any questions, feel free to ask; I’ll help as much as I can.

Further work shows that almost the same problem applies to the VRay Spawner plugin. Stopping the VRay Spawner task does not kill the vrayspawnerXXXX.exe process on the render node.

This time there is no attribute naming mismatch, but the VraySpawnerPlugin.EndJob method, which should kill the process, is not being called.

In the slave log I found the same ignored exception as with CoronaDR, but I don’t know whether it matters.

** cut **

 STDOUT: VRAY: Broadcasting TM_SERVER_STARTED after start up
2017-04-05 08:27:49:  0: Executing plugin command of type 'Cancel Task'
2017-04-05 08:27:49:  0: Done executing plugin command of type 'Cancel Task'
2017-04-05 08:27:50:  0: Done executing plugin command of type 'Render Task'
2017-04-05 08:27:50:  0: In the process of canceling current task: ignoring exception thrown by PluginLoader

To fix this, I’ve added an explicit process kill to the VraySpawnerPlugin.Cleanup method:

    def Cleanup( self ):
        del self.InitializeProcessCallback
        del self.StartJobCallback
        del self.RenderTasksCallback
        del self.EndJobCallback

        if self.VrayProcess:
            self.VrayProcess.Cleanup()
            del self.VrayProcess
            # FIX: kill process 
            self.ShutdownMonitoredManagedProcess( self.ProcessName )
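
One weakness of both patched Cleanup methods is that the shutdown call is skipped if the process object's Cleanup raises. Below is a minimal sketch of a more defensive ordering, assuming stand-in classes only: FakeManagedProcess and SketchPlugin are hypothetical, the ProcessName value is invented for illustration, and the stand-in ShutdownMonitoredManagedProcess only records the request instead of calling Deadline.

```python
class FakeManagedProcess:
    """Hypothetical stand-in whose Cleanup can be made to fail."""

    def __init__(self, fail=False):
        self.fail = fail

    def Cleanup(self):
        if self.fail:
            raise RuntimeError("cleanup failed")


class SketchPlugin:
    """Stand-in for the plugin; not derived from DeadlinePlugin."""

    ProcessName = "VraySpawner"  # hypothetical name, for illustration only

    def __init__(self, process):
        self.VrayProcess = process
        self.shutdown_calls = []

    def ShutdownMonitoredManagedProcess(self, name):
        # Stand-in: records the request instead of killing anything.
        self.shutdown_calls.append(name)

    def Cleanup(self):
        # Drop the reference first so a second Cleanup call is a no-op.
        process = self.VrayProcess
        self.VrayProcess = None
        if process is None:
            return
        try:
            process.Cleanup()
        finally:
            # Runs even if process.Cleanup() raised.
            self.ShutdownMonitoredManagedProcess(self.ProcessName)


ok_plugin = SketchPlugin(FakeManagedProcess())
ok_plugin.Cleanup()

failing_plugin = SketchPlugin(FakeManagedProcess(fail=True))
try:
    failing_plugin.Cleanup()
except RuntimeError:
    pass  # the error still propagates, but the shutdown was requested first

print(ok_plugin.shutdown_calls, failing_plugin.shutdown_calls)
```

I have not tested this ordering against the real plugins; it only shows that try/finally keeps the shutdown request from depending on the process object's Cleanup succeeding.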

This problem was frustrating because we use VRay Spawner and 3dsCmd jobs with DBR offloading daily. The vrayspawnerXXXX.exe process left behind by the VRay Spawner plugin immediately crashed the next 3dsCmd job with DBR, with the following error message:

Error: Error: V-Ray DBR: V-Ray Spawner is already running, please shut it down before rendering with Deadline
   w Deadline.Plugins.PluginWrapper.RenderTasks(String taskId, Int32 startFrame, Int32 endFrame, String& outMessage, AbortLevel& abortLevel)

Maybe in a future version you could add the option “Handle existing DR/DBR process: Kill on existing process” to the 3dsmax and 3dsCmd plugins. That would be great :)

So it looks like I made a wrong assumption about when the EndJob method is called.

I ran some tests with a custom 3dsCmd plugin to check whether EndJob is ever invoked. It is invoked when the slave finishes the job, which matches the documentation.

From the documentation:

Deadline.Plugins.DeadlinePlugin Class Reference

** cut ** 

EndJobCallback
 	This is for Advanced plugins only. If a function is assigned to this callback, it will be called when the slave finishes up the job.

This means that EndJob is not called when the job is manually marked as finished from the Monitor (requeued, failed, completed, etc.). CoronaDR and VRay Spawner jobs are normally stopped manually from the Monitor (the auto-timeout was not tested). Because of this, EndJob is not called and the corresponding process is not killed.
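
To make the two paths explicit, here is a minimal model of the behaviour I observed. It is a sketch under stated assumptions, not Deadline's actual control flow: SketchPlugin and simulate are hypothetical, the rule "EndJob is skipped on a manual cancel" is simply encoded from my tests, and the stand-in ShutdownMonitoredManagedProcess is idempotent (I have not verified how the real method behaves when called twice).

```python
class SketchPlugin:
    """Stand-in plugin; hook names mirror the real ones, the rest is hypothetical."""

    ProcessName = "CoronaDr"

    def __init__(self):
        self.process_running = False
        self.calls = []

    def RenderTasks(self):
        self.calls.append("RenderTasks")
        self.process_running = True

    def ShutdownMonitoredManagedProcess(self, name):
        # Stand-in: idempotent, so a second call is harmless here.
        self.process_running = False

    def EndJob(self):
        self.calls.append("EndJob")
        self.ShutdownMonitoredManagedProcess(self.ProcessName)

    def Cleanup(self, shutdown_in_cleanup):
        self.calls.append("Cleanup")
        if shutdown_in_cleanup:
            self.ShutdownMonitoredManagedProcess(self.ProcessName)


def simulate(cancelled_from_monitor, shutdown_in_cleanup):
    """Return True if the process is left running (orphaned) after the job."""
    plugin = SketchPlugin()
    plugin.RenderTasks()
    if not cancelled_from_monitor:
        # Only reached when the slave finishes the job on its own.
        plugin.EndJob()
    plugin.Cleanup(shutdown_in_cleanup)
    return plugin.process_running


orphan_original = simulate(cancelled_from_monitor=True, shutdown_in_cleanup=False)
orphan_patched = simulate(cancelled_from_monitor=True, shutdown_in_cleanup=True)
orphan_normal = simulate(cancelled_from_monitor=False, shutdown_in_cleanup=False)
print(orphan_original, orphan_patched, orphan_normal)
```

In this model only the cancelled job with the unpatched Cleanup leaves the process behind, which matches what I see on the render nodes.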

Well that’s surprising! No one’s reported the spawner hanging around before.

I have to guess this is a core issue… Which version of Deadline are you running?

All of the above tests were run on DL 9.0.0.18. The problem bothered us in previous versions too, but I didn’t have time to look into it. All of the above should be reproducible in DL 8.0.13.3.

For your information, this problem still persists in DL 9.0.2.0.

I did some testing with the standalone V-Ray plugin and that seems to work fine. I’d missed that the issue was within the Max plugin, so I’m waiting on some testing over here.

We did just recently add support for shutting down the DR servers (V-Ray, Corona) if they are already running, but I still don’t see us forcibly closing the spawner in the code. We forcibly close Max, but not the spawner. I’ll make sure we’ve got a dev issue for that.

Thanks Tomasz for all your testing notes. The typo in the CoronaProcess name has now been fixed up internally and we have identified a possible issue in our EndJob code which we will investigate further.
