Hi,
Some of the machines on our render farm have Deadline slaves that keep stalling or crashing. In Deadline Monitor they show as rendering, but they report a much longer render time than other machines of similar spec. When I remote desktop into one of these machines, Windows says that the slave has crashed. The slave still shows activity, and it is possible to view tabs and interact with the slave window, but the render never finishes.
I copied and saved the logs from when the machines stopped rendering. If any are needed, please specify.
Many Thanks
Gideon
Hi Gideon,
It sounds like the thread that updates the slave’s state is dying. If you could send us a slave log from a session where the slave died, that would be great!
You can find the log folder on the machine from the Slave application by selecting Help -> Explore Log Folder. Hopefully the slave log will contain info explaining the crash.
Thanks!
Hi Ryan,
Thank you for the reply. There are a few logs from near the time of the crash, named “deadlineslave_Rf88(Rf88)-2012-08-10-0000”, “deadlineslave_Rf88(Rf88)-2012-08-10-0001”, “deadlineslave_Rf88(Rf88)-2012-08-10-0003” and “deadlineslave_Rf88(Rf88)-2012-08-10-0004”. This is the last one:
File name: deadlineslave_Rf88(Rf88)-2012-08-10-0004.log
2012-08-10 15:21:06: BEGIN - RF88\renfar
2012-08-10 15:21:06: Start-up
2012-08-10 15:21:06: 2012-08-10 15:21:05
2012-08-10 15:21:06: Deadline Slave 5.2 [v5.2.0.47700 R]
2012-08-10 15:21:09: Auto Configuration: No auto configuration could be detected, using local configuration
2012-08-10 15:21:11: slave initialization beginning.
2012-08-10 15:21:11: Slave ‘Rf88’ has stalled because it has not updated its state in 37.797 s. Performing house cleaning…
2012-08-10 15:21:16: Found task class: :[2677-2677]
2012-08-10 15:21:16: Task is still being rendered!
2012-08-10 15:21:16: It’s time to requeue this baby.
2012-08-10 15:21:41: Info Thread - Created.
2012-08-10 15:22:05: Trying to connect using license server ‘\lon2\Deadline\flexlmTools\Thinkbox_license.lic’
2012-08-10 15:22:05: The license file being used will expire in 202 days.
2012-08-10 15:22:06: Checking repository integrity
2012-08-10 15:22:06: Purging old job auxiliary files
2012-08-10 15:22:16:
2012-08-10 15:22:16: Scheduler Thread - Synchronizing job files
2012-08-10 15:22:16: Scheduler Thread - Synchronization time for job files: 0 s
2012-08-10 15:22:16: Scheduler Thread - Synchronizing plugin files
2012-08-10 15:22:16: Scheduler Thread - Synchronization time for plugin files: 15.635 ms
2012-08-10 15:22:17: Constructor: MayaCmd
2012-08-10 15:22:17: 0: Task timeout is disabled.
2012-08-10 15:22:17: 0: Loaded job: Bravia_HX950_Lit_COLOR_V03_ENV (999_052_003_70561ec2)
2012-08-10 15:22:18: 0: INFO: StartJob: initializing script plugin MayaCmd
2012-08-10 15:22:18: Caught unhandled exception: Unable to cast object of type ‘System.Reflection.Module’ to type ‘System.Reflection.Emit.ModuleBuilder’. (System.InvalidCastException)
at System.Reflection.Emit.AssemblyBuilderData.GetInMemoryAssemblyModule()
at System.Reflection.Emit.AssemblyBuilder.SetCustomAttributeNoLock(CustomAttributeBuilder customBuilder)
at System.Reflection.Emit.AssemblyBuilder.SetCustomAttribute(CustomAttributeBuilder customBuilder)
at System.AppDomain.InternalDefineDynamicAssembly(AssemblyName name, AssemblyBuilderAccess access, String dir, Evidence evidence, PermissionSet requiredPermissions, PermissionSet optionalPermissions, PermissionSet refusedPermissions, StackCrawlMark& stackMark, IEnumerable`1 unsafeAssemblyAttributes)
at System.AppDomain.DefineDynamicAssembly(AssemblyName name, AssemblyBuilderAccess access, IEnumerable`1 assemblyAttributes)
at System.Reflection.Emit.DynamicMethod.GetDynamicMethodsModule()
at System.Reflection.Emit.DynamicMethod.Init(String name, MethodAttributes attributes, CallingConventions callingConvention, Type returnType, Type[] signature, Type owner, Module m, Boolean skipVisibility, Boolean transparentMethod)
at System.Reflection.Emit.DynamicMethod..ctor(String name, Type returnType, Type[] parameterTypes, Boolean restrictedSkipVisibility)
at Microsoft.Scripting.Ast.Compiler.LambdaCompiler..ctor(AnalyzedTree tree, LambdaExpression lambda)
at Microsoft.Scripting.Ast.Compiler.LambdaCompiler.Compile(LambdaExpression lambda, DebugInfoGenerator debugInfoGenerator)
at IronPython.Runtime.FunctionCode.CompileLambda(LambdaExpression code, EventHandler`1 handler)
at IronPython.Runtime.FunctionCode.UpdateDelegate(PythonContext context, Boolean forceCreation)
at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state)
at System.Threading._ThreadPoolWaitCallback.PerformWaitCallbackInternal(_ThreadPoolWaitCallback tpWaitCallBack)
at System.Threading._ThreadPoolWaitCallback.PerformWaitCallback(Object state)
2012-08-10 15:33:40: Enqueing: &Copy
2012-08-10 15:33:40: Dequeued: &Copy
2012-08-10 15:33:48: Enqueing: Select &All
2012-08-10 15:33:48: Dequeued: Select &All
2012-08-10 15:33:49: Enqueing: &Copy
2012-08-10 15:33:49: Dequeued: &Copy
2012-08-10 15:34:01: Enqueing: &Explore Log Folder
2012-08-10 15:34:01: Dequeued: &Explore Log Folder
Hmm, I’ve never seen that error before, and it’s happening in the main render thread (rather than the one that updates the slave state).
Can you zip up and upload all of the slave logs you listed? We’ll search through them to see if anything stands out.
Also, you mentioned this only happens on some machines. Is there anything about these machines that is different from the ones that don’t have the problem? Do these machines have all their Windows updates?
Thanks!
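For anyone who wants to pre-screen slave logs before uploading them, here is a rough Python sketch of the kind of search described above. It is not a Deadline tool: the function names, the marker strings, and the five-minute gap threshold are assumptions, taken only from the log excerpt in this thread ("YYYY-MM-DD HH:MM:SS: message" lines, "Caught unhandled exception", "has stalled"). Other Deadline versions may format or word their logs differently.

```python
import re
from datetime import datetime, timedelta

# Assumed line layout, based on the excerpt in this thread:
# "YYYY-MM-DD HH:MM:SS: message". Stack-trace lines have no timestamp.
LINE_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}): ?(.*)$")

# Message fragments copied from the excerpt (an assumption for other versions).
MARKERS = ("Caught unhandled exception", "has stalled")


def parse_lines(lines):
    """Yield (timestamp, message) for every line that starts with a timestamp."""
    for line in lines:
        match = LINE_RE.match(line.rstrip("\r\n"))
        if match:
            stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
            yield stamp, match.group(2)


def find_markers(lines, markers=MARKERS):
    """Return timestamped entries whose message contains any marker string."""
    return [(ts, msg) for ts, msg in parse_lines(lines)
            if any(marker in msg for marker in markers)]


def find_gaps(lines, threshold=timedelta(minutes=5)):
    """Return (before, after) timestamp pairs separated by at least `threshold`."""
    entries = list(parse_lines(lines))
    return [(prev_ts, ts)
            for (prev_ts, _), (ts, _) in zip(entries, entries[1:])
            if ts - prev_ts >= threshold]


def scan_file(path):
    """Read one log file and return (marker_hits, silent_gaps)."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        lines = handle.readlines()
    return find_markers(lines), find_gaps(lines)


# A few lines from the log posted earlier in the thread, used as sample input.
SAMPLE = [
    "2012-08-10 15:21:11: Slave 'Rf88' has stalled because it has not updated its state in 37.797 s.",
    "2012-08-10 15:22:18: 0: INFO: StartJob: initializing script plugin MayaCmd",
    "2012-08-10 15:22:18: Caught unhandled exception: Unable to cast object of type ...",
    "at System.Reflection.Emit.AssemblyBuilderData.GetInMemoryAssemblyModule()",
    "2012-08-10 15:33:40: Enqueing: &Copy",
]

if __name__ == "__main__":
    for stamp, message in find_markers(SAMPLE):
        print("marker:", stamp, message)
    for before, after in find_gaps(SAMPLE):
        print("no log output between", before, "and", after)
```

A long silent gap right after an unhandled exception, like the one between 15:22:18 and 15:33:40 in the posted log, is a hint that the render thread stopped doing work while the slave window stayed responsive.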
Ahhh, some of the problem machines haven’t had Windows Update set to automatic. I will update them and do a test render to see if it works before I send the error logs. These machines have identical counterparts that do not have the problem at the moment, so I think the Windows updates are the only difference. I’ll let you know if it works!
Thanks,
Gideon
Here are the logs, just in case you wanted to have a look. The problem did seem to be limited to a few machines that had missed a lot of Windows updates; they all seem to be working now.
Thanks again!
slave crash.rar (18.9 KB)