Hi there,
We had the Deadline slave stall (freeze) over the weekend (it had no activity, no jobs). I would like to send you the logs, but I’m not sure where they are. Could you point me to the right place?
cheers,
laszlo
Found one log from last Friday in the Monitor. Maybe this helps? There is not a lot of info. The Deadline slave is frozen on the machine (its window is up, but the screen is all black).
STALLED SLAVE REPORT
Current House Cleaner Information
Machine Performing Cleanup: vcpro1014
Version: v6.0.0.49608 R
Stalled Slave: LAPRO0315
Last Slave Update: 2013-01-25 15:26:12
Current Time: 2013-01-25 15:36:38
Time Difference: 10.441 m
Maximum Time Allowed Between Updates: 10.000 m
Current Job Name: tboa Animation Publish: TST_003_0002_maya_animation_layout.ma version: 64
Current Job ID: 510314328b1b0f26747a763e
Current Job User: ScanlineVfx_user
Current Task Names: 1
Current Task Ids: 0
Searching for job with id “510314328b1b0f26747a763e”
Found possible job: tboa Animation Publish: TST_003_0002_maya_animation_layout.ma version: 64
Searching for task with id “0”
Found possible task: :[1-1]
Task’s current slave:
Slave machine names do not match, continuing search
Associated Job Not Found.
Setting slave’s status to Stalled.
Setting last update time to now.
Slave state updated.
Date: 2013/01/25 15:36:38
Frames:
Elapsed Time: 00:00:00:00
Slave Name: LAPRO0315
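For anyone reading the report above: the stall decision comes down to comparing the time since the slave's last update against the maximum allowed interval. Below is a minimal Python sketch of that comparison, using the two timestamps from the report. The function names (`minutes_between`, `is_stalled`) are hypothetical and this is not Deadline's actual housecleaning code, only an illustration of the arithmetic.

```python
from datetime import datetime

# Timestamp layout used in the stalled slave report above.
FMT = "%Y-%m-%d %H:%M:%S"


def minutes_between(last_update: str, current_time: str) -> float:
    """Return the elapsed time between two report timestamps, in minutes."""
    delta = datetime.strptime(current_time, FMT) - datetime.strptime(last_update, FMT)
    return delta.total_seconds() / 60.0


def is_stalled(last_update: str, current_time: str, max_minutes: float = 10.0) -> bool:
    """Hypothetical check: stalled if the gap exceeds the allowed maximum."""
    return minutes_between(last_update, current_time) > max_minutes


if __name__ == "__main__":
    last = "2013-01-25 15:26:12"
    now = "2013-01-25 15:36:38"
    print(round(minutes_between(last, now), 3), is_stalled(last, now))
```

With whole-second timestamps this gives a gap of about 10.4 minutes, which is over the 10-minute limit shown in the report, so the slave gets marked as stalled.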
What operating system did the crash occur on?
You can find the local application logs by selecting Help -> Explore Log Folder from any of the Deadline applications.
Cheers,
It was Windows XP.
The log doesn’t have much information, just a legitimate error message. These are the last lines from the day it hung (that temp MEL script was indeed missing; I was running some tests):
2013-01-25 15:25:05: 0: STDOUT: mel: mel: READY FOR INPUT
2013-01-25 15:25:05: 0: INFO: Executing script: c:\users\scanli~2\appdata\local\temp\tmp88mvse\farmCommand.mel
2013-01-25 15:25:05: 0: INFO: Waiting for script to finish
2013-01-25 15:25:05: 0: INFO: Deadline is ignoring error: "mel: Error: " because plugin setting Strict Error Checking is enabled and this error is not usually fatal.
2013-01-25 15:25:05: 0: STDOUT: mel: Error:
2013-01-25 15:25:05: 0: STDOUT: Error: Cannot find file “c:/users/scanli~2/appdata/local/temp/tmp88mvse/farmCommand.mel” for source statement.
There were two log lines from Monday:
2013-01-28 13:47:08: BEGIN - LAPRO0315\ScanlineVFX
2013-01-28 13:47:08: Scheduler Thread - Could not update task timeout because: Object reference not set to an instance of an object. (System.NullReferenceException)
But nothing in between.
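In case it helps with spotting silent periods like this one in longer logs, here is a rough Python sketch that scans log lines for large jumps between consecutive timestamps. It assumes every relevant line starts with a `YYYY-MM-DD HH:MM:SS:` prefix, as in the excerpts above; the `find_gaps` name and the one-hour default threshold are my own choices, not anything from Deadline.

```python
import re
from datetime import datetime, timedelta

# Matches the leading "2013-01-25 15:25:05:" style prefix seen in the slave log.
TIMESTAMP = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}):")


def find_gaps(lines, threshold=timedelta(hours=1)):
    """Return (previous, next) timestamp pairs that are further apart than threshold."""
    gaps = []
    previous = None
    for line in lines:
        match = TIMESTAMP.match(line)
        if not match:
            continue  # skip lines without a timestamp prefix
        current = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
        if previous is not None and current - previous > threshold:
            gaps.append((previous, current))
        previous = current
    return gaps


if __name__ == "__main__":
    sample = [
        "2013-01-25 15:25:05: 0: INFO: Waiting for script to finish",
        "2013-01-25 15:25:05: 0: STDOUT: mel: Error:",
        "2013-01-28 13:47:08: BEGIN - LAPRO0315\\ScanlineVFX",
    ]
    for start, end in find_gaps(sample):
        print(f"No log output between {start} and {end}")
```

On the lines quoted above it reports a single gap, from Friday 15:25:05 to Monday 13:47:08.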
Thanks for the logs. We should be able to fix that NullReferenceException for beta 11.
The slave shouldn’t crash when a render error occurs, so something isn’t right here. Would it be possible to try and reproduce this exact problem on your end? Just submit a job that produces the same error and see if it brings the slave down.
Just to confirm, did this happen on a bunch of slaves, or just one?
Thanks,
It happened on one machine only, but that’s the only one that was test-rendering that particular job.
I can’t repro it, sadly. :\ It was happening with beta 8, and it seems that recopying the job folder/file from the beta 8 repo to the beta 10 repo doesn’t allow the job to be requeued. I get this error:
Exception Details
RenderPluginException – Error in StartJob: GetPluginInfoEntry: Script accessed non-existent plugin info key Version (Deadline.Plugins.RenderPluginException)
at Deadline.Plugins.DeadlinePlugin.GetPluginInfoEntry(String key)
at Python.Runtime.Dispatcher.TrueDispatch(ArrayList args)
at Python.Runtime.Dispatcher.Dispatch(ArrayList args)
at Deadline.Plugins.DeadlinePlugin.StartJob()
at Deadline.Plugins.ScriptPlugin.StartJob(Job job, String& outMessage, AbortLevel& abortLevel)
at Deadline.Plugins.ScriptPlugin.StartJob(Job job, String& outMessage, AbortLevel& abortLevel)
RenderPluginException.Cause: JobError (2)
RenderPluginException.Level: Major (1)
RenderPluginException.HasSlaveLog: True
Exception.Data: ( )
Exception.TargetSite: Void StartJob(Deadline.Jobs.Job)
Exception.Source: deadline
Exception.StackTrace:
at Deadline.Plugins.Plugin.StartJob(Job job)
at Deadline.Slaves.SlaveRenderThread.a(TaskLogWriter A_0)
Yeah, that makes sense because jobs from beta 8 won’t render in beta 10. Let’s keep an eye out for it happening again with beta 10 (or later) and go from there.