Invalid packet size exception?

We had commented out the code because we improved ShutdownMonitoredManagedProcess to clean things up better, but maybe there are still cases where things slip through the cracks. It may be worth uncommenting some of the code to see if that helps. Here is the original code:

			#processIDs = GetMonitoredManagedProcessIDs( self.ProgramName )
			#LogInfo( "3dsmax process has %d objects" % len( processIDs ) )
			
			LogInfo( "Waiting for 3dsmax to shut down" )
			ShutdownMonitoredManagedProcess( self.ProgramName )
			#Sleep( 10000 )
			
			#LogInfo( "Terminating 3dsmax child processes" )
			#for id in processIDs:
			#	KillParentAndChildProcesses( id )
			
			#ShutdownMonitoredManagedProcess( self.ProgramName )
			
			LogInfo( "3dsmax has shut down" )

You can probably uncomment everything except the last ShutdownMonitoredManagedProcess call, since it won’t really do anything at that point. Here’s what the new code should look like:

			processIDs = GetMonitoredManagedProcessIDs( self.ProgramName )
			LogInfo( "3dsmax process has %d objects" % len( processIDs ) )
			
			LogInfo( "Waiting for 3dsmax to shut down" )
			ShutdownMonitoredManagedProcess( self.ProgramName )
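			# Grace period so 3dsmax can exit on its own before the force-kill below (presumably 10000 ms, i.e. 10 seconds)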
			Sleep( 10000 )
			
			LogInfo( "Terminating 3dsmax child processes" )
			for id in processIDs:
				KillParentAndChildProcesses( id )
			
			#ShutdownMonitoredManagedProcess( self.ProgramName )
			
			LogInfo( "3dsmax has shut down" )

If you guys start to notice an improvement after making this change, we will just roll it into the upcoming 5.1 release.

Cheers,

- Ryan

Hi,
OK, I made the changes Ryan highlighted above exactly one week ago, and I would say the results are looking pretty ‘reasonable’.
The number of occurrences of this error:

An error occurred in StartJob(): 3dsmax startup: Did not receive expected token from lightning plugin. (got "") - check the install of lightningMax.dlx

has definitely gone down. Interestingly, I do think this has resulted in more “stalled slaves” in my setup; I believe they were still happening before, but instead of getting an actual report back that they had stalled, the slaves were just hanging.
Overall, I think things are better with the process-kill code enabled.

On the “maxscript auto-load errors” issue, my current thinking centers on the Ribbon and possible corruption; that is still a work in progress. However, I never see this error with the Deadline 3dsMax plugin, only with a custom plugin of mine, so YMMV.

Mike

Hey Mike,

Thanks for the update! We will roll this change out with 5.1.

Cheers,

- Ryan

Sorry, I forgot to add that for each of the stalled slaves, not only does 3dsMax get killed, but the Deadline Slave gets killed as well.
This is now consistent for any slave that stalls as a result of the 3dsmax plugin going bang.
Is it possible that the “KillParentAndChildProcesses” function is killing the Slave app as well, somehow?
Mike

Highly unlikely. Note that “KillParentAndChildProcesses” is called when any 3dsmax job completes, so if the slave process were somehow included, it would always get killed.

It could be that the slave is just a casualty of whatever is causing 3dsmax to explode in the first place.
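
To give a rough idea of what a function like that does: the usual pattern is to walk the process tree downward from the PID you pass in and terminate the descendants before the process itself. Here is a minimal sketch of that pattern in Python using psutil; the function name and the psutil-based approach are just for illustration, not the actual Deadline implementation:

			import psutil
			
			def kill_parent_and_child_processes( pid ):
				# Illustration only: the real KillParentAndChildProcesses is a
				# Deadline built-in and its exact behaviour may differ.
				try:
					parent = psutil.Process( pid )
				except psutil.NoSuchProcess:
					return  # already gone, nothing to do
				
				# Snapshot the descendants first; once the parent dies, the
				# tree can no longer be walked from it.
				children = parent.children( recursive=True )
				
				# Kill the children before the parent so none are orphaned mid-walk.
				for child in children:
					try:
						child.kill()
					except psutil.NoSuchProcess:
						pass
				
				try:
					parent.kill()
				except psutil.NoSuchProcess:
					pass

Assuming the real function behaves like this sketch (killing only the given process and its descendants), the Slave could never end up in the kill set: the Slave is 3dsmax’s parent, not one of its children.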

Ah ha… and that cause, for me, is most likely the Mental Ray memory issues I have been having recently with certain scene files.

Thanks,
Mike

A very interesting discussion, indeed.

However, in my case, I can’t think of any reason why Max would go belly-up. :confused: I don’t use Mental Ray, and my scenes very rarely go above 4GB of RAM (8GB available on my nodes).

FYI: after a reasonable period of time testing this new code (terminate 3dsMax process = enabled), yesterday I reverted back to the original setup where the terminate-3dsMax-process code is commented out. I’m now going to leave it a week or so for more testing. Last week we got so many stalled slaves on a regular basis that it was killing us having to go in and restart the slave, which had been killed, AND the launcher, which had been killed as well. So, I’ve reverted the code and now we wait and see… we are still hitting the farm very heavily at the moment, so it’s a good time to stress test :slight_smile:
Mike
Mike

Best of luck, man. :slight_smile:

Sounds good, Mike! Keep us posted.

Hi All,
Update for everyone.
In the last 7 days we have been rendering quite heavily, and I’ve not seen a single stalled slave beyond a couple of ‘known’ machines with hardware issues, which can be discounted. So, for me, I’m keeping the plugin code the way it was in the beginning. I believe Thinkbox got it right: 3dsMax is closed down properly, and there is NO need to force-kill the 3dsmax process, which for me seemed to also take down the Deadline Slave and Launcher apps about 50% of the time, actually causing me a lot more work to fix. By keeping the code as it was originally and making sure you regularly restart all your render nodes (at least once a week), I think this is the optimum setup :slight_smile:
I’ll continue to keep my eye open for any patterns.
Mike

Hey Mike,

Thanks for the update. We’ll keep the plugin code the way it was.

Cheers,

- Ryan