Our max load timeout is set to 1000 seconds (which equates to roughly 16 mins), but we are noticing that this is not a hard limit. In some cases it takes 30-40 mins for Deadline to realize it has already timed out.
Note that this one took 37 mins:
2015-09-29 02:17:04: 0: INFO: Process Priority: BelowNormal
2015-09-29 02:17:04: 0: INFO: Process Affinity: default
2015-09-29 02:17:04: 0: INFO: Process is now running
2015-09-29 02:17:04: 0: INFO: Waiting for connection from 3dsmax
2015-09-29 02:17:07: 0: STDOUT: - 02:17:07.000 INFO: Begin setting up global configuration
....
2015-09-29 02:17:09: 0: STDOUT: - 02:17:08.748 INFO: End registering loaded plugins
2015-09-29 02:17:11: 0: INFO: Ignoring popup "MAXScript Debugger"
2015-09-29 02:54:31: 0: An exception occurred: Error: 3dsmax startup: Timed out waiting for 3ds max to start - consider increasing the LoadMaxTimeout in the 3dsmax plugin configuration
2015-09-29 02:54:31: at Deadline.Plugins.ScriptPlugin.StartJob(Job job, String& outMessage, AbortLevel& abortLevel) (Deadline.Plugins.RenderPluginException)
2015-09-29 02:54:31: 0: Unloading plugin: 3dsmax
2015-09-29 02:54:32: Scheduler Thread - Render Thread 0 threw a major error:
2015-09-29 02:54:32: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
2015-09-29 02:54:32: Exception Details
2015-09-29 02:54:32: RenderPluginException -- Error: 3dsmax startup: Timed out waiting for 3ds max to start - consider increasing the LoadMaxTimeout in the 3dsmax plugin configuration
2015-09-29 02:54:32: at Deadline.Plugins.ScriptPlugin.StartJob(Job job, String& outMessage, AbortLevel& abortLevel)
2015-09-29 02:54:32: RenderPluginException.Cause: JobError (2)
2015-09-29 02:54:32: RenderPluginException.Level: Major (1)
2015-09-29 02:54:32: RenderPluginException.HasSlaveLog: True
2015-09-29 02:54:32: RenderPluginException.SlaveLogFileName: C:\ProgramData\Thinkbox\Deadline7\logs\deadlineslave_renderthread_0-LAPRO1360-0000.log
2015-09-29 02:54:32: Exception.Data: ( )
2015-09-29 02:54:32: Exception.TargetSite: Void StartJob(Deadline.Jobs.Job)
2015-09-29 02:54:32: Exception.Source: deadline
2015-09-29 02:54:32: Exception.HResult: -2146233088
2015-09-29 02:54:32: Exception.StackTrace:
2015-09-29 02:54:32: at Deadline.Plugins.Plugin.StartJob(Job job)
2015-09-29 02:54:32: at Deadline.Slaves.SlaveRenderThread.a(TaskLogWriter A_0)
2015-09-29 02:54:32: <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
Another that's close to 37 mins:
2015-09-29 03:22:47: 0: INFO: Process is now running
2015-09-29 03:22:47: 0: INFO: Waiting for connection from 3dsmax
2015-09-29 03:22:51: 0: STDOUT: - 03:22:50.000 INFO: Begin setting up global configuration
..
2015-09-29 03:22:52: 0: STDOUT: - 03:22:51.764 INFO: End registering loaded plugins
2015-09-29 03:22:55: 0: INFO: Ignoring popup "MAXScript Debugger"
2015-09-29 04:00:14: 0: An exception occurred: Error: 3dsmax startup: Timed out waiting for 3ds max to start - consider increasing the LoadMaxTimeout in the 3dsmax plugin configuration
2015-09-29 04:00:14: at Deadline.Plugins.ScriptPlugin.StartJob(Job job, String& outMessage, AbortLevel& abortLevel) (Deadline.Plugins.RenderPluginException)
Another 37 min timeout:
2015-09-29 04:09:19: 0: STDOUT: - 04:09:19.608 INFO: Begin registering loaded plugins
2015-09-29 04:09:19: 0: STDOUT: - 04:09:19.764 INFO: End registering loaded plugins
2015-09-29 04:09:22: 0: INFO: Ignoring popup "MAXScript Debugger"
2015-09-29 04:46:41: 0: An exception occurred: Error: 3dsmax startup: Timed out waiting for 3ds max to start - consider increasing the LoadMaxTimeout in the 3dsmax plugin configuration
Any ideas?
Hi Laszlo,
Just to confirm a few things, as the timeout seems to be working OK for me here. Are you using the 3dsmax.py plugin file as it ships in Deadline v7.2.0.18, with support for handling any Qt-based pop-up dialogs? Does this occur on machines running a single Slave instance? Could I see a full job log report so I can get some more context, see if anything else stands out, and also confirm your various setup/3dsmax/BB versions, etc.?
We are on 7.1.0.32 with some customizations. There were no popups in this case as far as I can tell - I can remote into the machine and Max is just hanging there with one CPU at 100%.
This particular example was from a machine that does not have a secondary slave running.
Using Max 2015. Attached a full log, I hope it helps.
lapro1360_hang_log.txt (56.2 KB)
Hi Laszlo,
Thanks for the log/info. It helped me put things more into context.
Locally, I’ve been unable to get 3ds Max 2015 to ‘hang’ on startup beyond the hard timeout limit. So, a few more thoughts/questions:
- Is Windows Firewall enabled on these render nodes? If so, can you make sure these exceptions are applied:
docs.thinkboxsoftware.com/produc … iderations
- Any pattern to this issue? Only certain machines? Only render nodes and not workstations? Does it only ever happen with 3ds Max 2015?
- Do these machines have routable access to the outside world? Can they get to the public internet?
- Has 3ds Max 2015 been started up at least once on these ‘troublesome’ machines as Administrator? We mention this in the docs (same section as the firewall considerations). Now, I’m not entirely sure why it helps, but I have noticed that once 3ds Max has been started up at least once like this, it does seem to behave itself more, and I think it also indirectly fixes the ADSK Comms Center by giving it enough permission to ‘sort itself out’ from there on in.
- If at all possible, can you log in to one of these machines when you notice it taking too long to start up 3ds Max (and it has definitely gone past your 1000-second timeout limit) and check in Task Manager whether any process whose name starts with “WSCommCntr*” is running? If you kill it, does 3ds Max suddenly come back to life? (See the sketch further down this post for a quick way to check and kill it.)
- In the past, when I’ve seen this issue, it’s been the ADSK Comms Center process getting stuck trying to ‘phone home’. Hence the question above about internet access. You could have a look at the log file:
C:\Users\%username%\AppData\Local\Autodesk\3dsMax\2015 - 64bit\ENU\InfoCenter.log
The following is healthy output; anything else is generally bad and points at this ADSK process getting stuck (which in turn prevents 3ds Max from starting up):
29/09/2015 14:03:07: Configuration Read from: C:\Users\owenm\AppData\Local\Autodesk\3dsMax\2016 - 64bit\ENU\infocenter.xml
29/09/2015 14:03:29: Polling:3 threads started
29/09/2015 14:03:29: SubStatus: Polling...
29/09/2015 18:55:13: Configuration Read from: C:\Users\owenm\AppData\Local\Autodesk\3dsMax\2016 - 64bit\ENU\infocenter.xml
29/09/2015 18:55:33: Polling:3 threads started
29/09/2015 18:55:33: SubStatus: Polling...
If by chance this is it, then note that we added a setting at the 3dsmax plugin config level called “Kill ADSK Comms Center Process”. It was greatly improved in v7.1.0.something (so you have it already) based on my feedback from a couple of 3ds Max studios here in London who were encountering the same issue as you in 3ds Max 2015.
docs.thinkboxsoftware.com/produc … figuration
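For the WSCommCntr check above, something along these lines would spot and kill the process. This is just a quick sketch, assuming the psutil Python module is available on the machine; the exact executable name (e.g. WSCommCntr4.exe) is an assumption, so check it in Task Manager first:

import psutil

# Look for any running Autodesk Communication Center processes; their
# executable names start with "WSCommCntr" (e.g. WSCommCntr4.exe).
for proc in psutil.process_iter(['name']):
    name = proc.info['name'] or ''
    if name.lower().startswith('wscommcntr'):
        print('Killing %s (pid %d)' % (name, proc.pid))
        try:
            proc.kill()  # if 3ds Max was blocked on this process, it should resume
        except psutil.NoSuchProcess:
            pass  # process already exited between listing and killing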
Let us know the answers to these questions/your thoughts and we can dig deeper if none of this helps!
Mike
Hey Laszlo,
I took a look at the code this morning, and the problem could be the way we’re determining a timeout. The code is using a method that we abandoned long ago in our core code: it sets a countdown to X milliseconds, then subtracts from that value in a loop until it reaches 0, and when it reaches 0 a timeout has occurred. We discovered a while ago that this isn’t very reliable, especially as the timeout duration gets longer.
The better way to do it is simply to compare the current timestamp to one that was set at the start of the process, and see if the interval between them is longer than the timeout value.
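As a rough illustration of the difference between the two approaches (just a sketch of the general pattern, not the actual 3dsmax.py code; the poll() callback stands in for whatever check the plugin performs while waiting on 3ds Max):

import time

def wait_with_countdown(poll, timeout_ms=1000 * 1000, step_ms=100):
    # Old approach: decrement a counter each pass through the loop. If an
    # iteration takes longer than step_ms (blocking calls, popup handling,
    # slow I/O), the counter under-counts and the real elapsed time can end
    # up far beyond the intended timeout.
    remaining = timeout_ms
    while remaining > 0:
        if poll():
            return True
        time.sleep(step_ms / 1000.0)
        remaining -= step_ms  # assumes the iteration took exactly step_ms
    return False

def wait_with_timestamp(poll, timeout_seconds=1000, step_seconds=0.1):
    # Better approach: compare the current time against a timestamp taken at
    # the start, so the timeout reflects actual wall-clock time no matter how
    # long each iteration really took.
    start = time.time()
    while time.time() - start < timeout_seconds:
        if poll():
            return True
        time.sleep(step_seconds)
    return False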
I’ve updated the five places we handle timeouts in the 3dsmax.py file to use this improved method, and attached the modified file here. Since you’ll probably want to merge the changes into your customized 3dsmax.py file, these are the functions that we updated:
EndMaxJob
Run3dsmaxCommand
LaunchMax
LoadMaxFile
PollUntilComplete
Give it a try and let us know if this makes a difference.
Cheers,
Ryan
3dsmax.zip (31 KB)
Much appreciated Ryan, I’ll roll this out today/tomorrow!
cheers
laszlo
Hi Ryan,
We are still experiencing some odd behavior with the startup timeout, but I believe this is a different issue.
Before we run Max, using jobprestart.py, we do some configuration of the machine. It appears that if this execution hangs, Deadline does not detect the timeout, nor does it mark the machine as stalled.
The task was finally requeued manually after 12+ hours.
The attached log shows the very last lines of this slave’s processing.
When the manual requeue happened, this is what it started logging:
2015-10-25 14:39:11: 0: INFO: FLOWLINE2_BASECFG_INI is set to: \\S2\exchange\software\managed\flowline\FLOWLINE2_BASECFG_INI_16threads.ini
2015-10-25 14:39:11: 0: INFO: Farm mode: using farm config
2015-10-25 14:39:11: 0: PYTHON: Overriding config path to: \\s2.scanlinevfxla.com\exchange\software\
2015-10-25 14:39:11: 0: INFO: Scanline 3ds Max config file: \\s2.scanlinevfxla.com\exchange\software\managed\SoftwareConfiguration\Max2015\ConfigurationSettings\FlowlineDevRelease_LatestVray_Prestart_Max2015_64Bit_Farm.bat
2015-10-26 07:43:47: Scheduler Thread - Task "392_1279-1279" could not be found because task has been modified:
2015-10-26 07:43:47: current status = Rendering, new status = Rendering
2015-10-26 07:43:47: current slave = LAPRO2067, new slave = LAPRO3183
2015-10-26 07:43:47: current frames = 1279-1279, new frames = 1279-1279
2015-10-26 07:43:47: Scheduler Thread - Cancelling task...
2015-10-26 07:46:53: Scheduler Thread - Task "392_1279-1279" could not be found because task has been modified:
2015-10-26 07:46:53: current status = Rendering, new status = Rendering
2015-10-26 07:46:53: current slave = LAPRO2067, new slave = LAPRO3183
2015-10-26 07:46:53: current frames = 1279-1279, new frames = 1279-1279
2015-10-26 07:46:53: Scheduler Thread - Cancelling task...
2015-10-26 07:49:58: Scheduler Thread - Task "392_1279-1279" could not be found because task has been modified:
2015-10-26 07:49:58: current status = Rendering, new status = Rendering
2015-10-26 07:49:58: current slave = LAPRO2067, new slave = LAPRO3183
2015-10-26 07:49:58: current frames = 1279-1279, new frames = 1279-1279
2015-10-26 07:49:58: Scheduler Thread - Cancelling task...
2015-10-26 07:53:07: Scheduler Thread - Task "392_1279-1279" could not be found because task has been modified:
2015-10-26 07:53:07: current status = Rendering, new status = Rendering
2015-10-26 07:53:07: current slave = LAPRO2067, new slave = LAPRO3183
2015-10-26 07:53:07: current frames = 1279-1279, new frames = 1279-1279
2015-10-26 07:53:07: Scheduler Thread - Cancelling task...
2015-10-26 07:56:15: Scheduler Thread - Task "392_1279-1279" could not be found because task has been modified:
2015-10-26 07:56:15: current status = Rendering, new status = Rendering
2015-10-26 07:56:15: current slave = LAPRO2067, new slave = LAPRO3183
2015-10-26 07:56:15: current frames = 1279-1279, new frames = 1279-1279
It's still looping with that.
deadlineslave-LAPRO2067-2015-10-25-0000_partial.log (13.4 KB)
I’m pretty sure this is the case where the Slave gets stuck because a Python script hangs on the main thread (I couldn’t find the related forum thread, but I know there is at least one out there).
This is the type of issue that will be fixed in Deadline 8 because of the way we’re sandboxing the Python process.
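To illustrate the general idea (this is only a sketch of the sandboxing concept, not the Deadline 8 implementation; run_prestart is a hypothetical stand-in for the kind of configuration work a custom jobprestart.py might do), running the script work in a separate process lets the parent enforce a wall-clock timeout and kill the child even if the script itself hangs:

import multiprocessing
import time

def run_prestart(config_path):
    # Hypothetical stand-in for the machine-configuration work; if it hangs,
    # only this child process blocks, not the parent render thread.
    time.sleep(10000)  # simulate a hang

if __name__ == '__main__':
    timeout_seconds = 1000
    proc = multiprocessing.Process(target=run_prestart, args=('dummy.ini',))
    start = time.time()
    proc.start()
    proc.join(timeout_seconds)  # wait up to the timeout, not forever

    if proc.is_alive():
        # The work exceeded the timeout: kill the child and report a failure
        # instead of hanging indefinitely.
        proc.terminate()
        proc.join()
        print('Pre-start step timed out after %.0f seconds' % (time.time() - start))
    else:
        print('Pre-start step finished normally')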
Cheers,
Ryan