AWS Thinkbox Discussion Forums

32 stalled slaves over the weekend

About 20% of our deadline farm. Some are reported as stalled, some just say they are rendering (when they are not)

But some new errors.

Machine reporting stalled:

2013-02-22 11:13:10: Scheduler Thread - Synchronization time for plugin files: 62.500 ms
2013-02-22 11:13:10: Constructor: MayaBatch
2013-02-22 11:13:11: 0: Task timeout is being ignored because this is a Post Job Script Task
2013-02-22 11:13:11: 0: Loaded job: [HEAT] Software Render: HA_260_2510_maya_animation_animation.ma version: v0120 (5127b6beba5e7815f052dd81)
2013-02-22 11:13:11: 0: Plugin executing post job script
2013-02-22 11:13:11: 0: Executing Post Job Script: “//inferno2/exchange/software/managed/pythonScripts/site-packages/scl/farm/postScripts/removeSoftwareDepth.py”
2013-02-22 11:13:12: 0: INFO: Executing plugin script C:\Documents and Settings\ScanlineVFX\Local Settings\Application Data\Thinkbox\Deadline6\slave\LAPRO0241\plugins\MayaBatch.py
2013-02-22 11:13:12: 0: INFO: About: Maya Batch Plugin for Deadline
2013-02-22 11:13:12: 0: INFO: The job’s environment will be merged with the current environment before rendering
2013-02-22 11:13:12: 0: INFO: Executing post job script //inferno2/exchange/software/managed/pythonScripts/site-packages/scl/farm/postScripts/removeSoftwareDepth.py
2013-02-22 11:13:18: Events plugin names could not be collected from the repository because: Could not find a part of the path ‘\inferno2\deadline\repository6\custom\events’. (System.IO.DirectoryNotFoundException)
2013-02-22 11:13:30: Exception on Startup: System.Exception: Python Error: Object reference not set to an instance of an object. (System.NullReferenceException)
2013-02-22 11:13:30: Stack Trace:
2013-02-22 11:13:30: at Python.Runtime.ClassBase.tp_dealloc(IntPtr ob)
2013-02-22 11:13:30: at Python.Runtime.Runtime.PyImport_ExecCodeModule(String name, IntPtr code)
2013-02-22 11:13:30: at Python.Runtime.PythonEngine.ModuleFromString(String name, String code)
2013-02-22 11:13:30: at FranticX.Scripting.PythonNetScriptEngine.ExecuteScript(String scriptName, String script)
2013-02-22 11:13:30: —> System.NullReferenceException: Object reference not set to an instance of an object.
2013-02-22 11:13:30: at Python.Runtime.ClassBase.tp_dealloc(IntPtr ob)
2013-02-22 11:13:30: at Python.Runtime.Runtime.PyImport_ExecCodeModule(String name, IntPtr code)
2013-02-22 11:13:30: at Python.Runtime.PythonEngine.ModuleFromString(String name, String code)
2013-02-22 11:13:30: at FranticX.Scripting.PythonNetScriptEngine.ExecuteScript(String scriptName, String script)
2013-02-22 11:13:30: — End of inner exception stack trace —
2013-02-22 11:13:30: at FranticX.Scripting.PythonNetScriptEngine.a(Exception A_0)
2013-02-22 11:13:30: at FranticX.Scripting.PythonNetScriptEngine.ExecuteScript(String scriptName, String script)
2013-02-22 11:13:30: at a.a(String[] A_0)
2013-02-22 11:13:30: Deadline Slave will now exit.
2013-02-22 11:13:30: Traceback (most recent call last):
2013-02-22 11:13:30: File “…\Libraries\ThinkboxUI\UI\Controls\FilterLogBoxView.py”, line 199, in writeLine
2013-02-22 11:13:30: AttributeError: ‘RepositoryStorage’ object has no attribute ‘DataController’

Machine looking ‘active’, but in fact hanging for 3 days:

Last lines of log (these are regular, expected outputs):
2013-02-22 16:28:22: 0: STDOUT: PyQt4.uic.uiparser : DEBUG : pop widget QWidget centralwidget
2013-02-22 16:28:22: 0: STDOUT: PyQt4.uic.uiparser : DEBUG : new topwidget MainWindow
2013-02-22 16:28:22: 0: STDOUT: PyQt4.uic.uiparser : DEBUG : pop widget QMainWindow MainWindow
2013-02-22 16:28:22: 0: STDOUT: PyQt4.uic.uiparser : DEBUG : new topwidget None
2013-02-22 16:28:22: 0: STDOUT: Fri Feb 22 16:28:21 2013 (+889ms) : debug : Registering Node Class : “MayaRigCacheExporter”
2013-02-22 16:28:22: 0: STDOUT: Fri Feb 22 16:28:21 2013 (+0ms) : debug : Registering Node Class : “MayaRigCacheImporter”

Machine stalled:

2013-02-21 20:16:10: 0: STDOUT: Thu Feb 21 20:16:10 2013 (+0ms) : debug : Images.render3d Turntable not selected.
2013-02-21 20:16:10: 0: STDOUT: Thu Feb 21 20:16:10 2013 (+0ms) : debug : Images.render3d Software render selected.
2013-02-21 20:16:10: 0: STDOUT: Thu Feb 21 20:16:10 2013 (+0ms) : debug : Images.render3d Farm playblast begin
2013-02-21 20:16:10: 0: STDOUT: cached
2013-02-21 20:16:10: 0: STDOUT: Thu Feb 21 20:16:10 2013 (+16ms) : critical error : PUBLISH : *** Unhandled publish exception ***
2013-02-21 20:16:10: 0: STDOUT: Thu Feb 21 20:16:10 2013 (+0ms) : debug : New log file created using scLog v0.10
2013-02-21 20:16:12: 0: STDOUT: Error Traceback: Traceback (most recent call last):
2013-02-21 20:16:12: 0: STDOUT: File “\S2\exchange\software\managed\SoftwareConfiguration\mayaShared\env\configs\default\modules\scNodes\publish.py”, line 588, in publish
2013-02-21 20:16:12: 0: STDOUT: if oNode.activate():
2013-02-21 20:16:12: 0: STDOUT: File “\S2\exchange\software\managed\SoftwareConfiguration\mayaShared\env\configs\default\modules\scNodes\images.py”, line 926, in activate
2013-02-21 20:16:12: 0: STDOUT: if(not self.doFarmPlayBlast()):
2013-02-21 20:16:12: 0: STDOUT: File “\S2\exchange\software\managed\SoftwareConfiguration\mayaShared\env\configs\default\modules\scNodes\images.py”, line 808, in doFarmPlayBlast
2013-02-21 20:16:12: 0: STDOUT: sDir = self.getFilePath(sPassName)
2013-02-21 20:16:12: 0: STDOUT: File “\S2\exchange\software\managed\SoftwareConfiguration\mayaShared\env\configs\default\modules\scNodes\images.py”, line 255, in getFilePath
2013-02-21 20:16:12: 0: STDOUT: return (’%s/%s_%s_%s’ %(self.getAttr(‘sExportDir’).replace(’\’,’/’), sPassName, self.getColorTag(), self.getResolutionTag()))
2013-02-21 20:16:12: 0: STDOUT: File “\S2\exchange\software\managed\SoftwareConfiguration\mayaShared\env\configs\default\modules\scNodes\images.py”, line 241, in getResolutionTag
2013-02-21 20:16:12: 0: STDOUT: return renderAPI.getResolutionTagByName(self.getAttr(‘sResolutionTag’)).sName
2013-02-21 20:16:12: 0: STDOUT: File “\S2\exchange\software\managed\SoftwareConfiguration\mayaShared\env\configs\default\modules\renderAPI.py”, line 324, in getResolutionTagByName
2013-02-21 20:16:12: 0: STDOUT: return getResolutionTags()[0]
2013-02-21 20:16:12: 0: INFO: Deadline is ignoring error: “IndexError: list index out of range” because plugin setting Strict Error Checking is enabled and this error is not usually fatal.
2013-02-21 20:16:12: 0: STDOUT: IndexError: list index out of range
2013-02-21 20:16:12: 0: INFO: Deadline is ignoring error: “Warning: PUBLISH : *** Unhandled publish exception *** Error Traceback: Traceback (most recent call last):” because plugin setting Strict Error Checking is enabled and this error is not usually fatal.
2013-02-21 20:16:12: 0: STDOUT: Warning: PUBLISH : *** Unhandled publish exception *** Error Traceback: Traceback (most recent call last):
2013-02-21 20:16:12: 0: STDOUT: File “\S2\exchange\software\managed\SoftwareConfiguration\mayaShared\env\configs\default\modules\scNodes\publish.py”, line 588, in publish
2013-02-21 20:16:12: 0: STDOUT: if oNode.activate():
2013-02-21 20:16:12: 0: STDOUT: File “\S2\exchange\software\managed\SoftwareConfiguration\mayaShared\env\configs\default\modules\scNodes\images.py”, line 926, in activate
2013-02-21 20:16:12: 0: STDOUT: if(not self.doFarmPlayBlast()):
2013-02-21 20:16:12: 0: STDOUT: File “\S2\exchange\software\managed\SoftwareConfiguration\mayaShared\env\configs\default\modules\scNodes\images.py”, line 808, in doFarmPlayBlast
2013-02-21 20:16:12: 0: STDOUT: sDir = self.getFilePath(sPassName)
2013-02-21 20:16:12: 0: STDOUT: File “\S2\exchange\software\managed\SoftwareConfiguration\mayaShared\env\configs\default\modules\scNodes\images.py”, line 255, in getFilePath
2013-02-21 20:16:12: 0: STDOUT: return (’%s/%s_%s_%s’ %(self.getAttr(‘sExportDir’).replace(’\’,’/’), sPassName, self.getColorTag(), self.getResolutionTag()))
2013-02-21 20:16:12: 0: STDOUT: File “\S2\exchange\software\managed\SoftwareConfiguration\mayaShared\env\configs\default\modules\scNodes\images.py”, line 241, in getResolutionTag
2013-02-21 20:16:12: 0: STDOUT: return renderAPI.getResolutionTagByName(self.getAttr(‘sResolutionTag’)).sName
2013-02-21 20:16:12: 0: STDOUT: File “\S2\exchange\software\managed\SoftwareConfiguration\mayaShared\env\configs\default\modules\renderAPI.py”, line 324, in getResolutionTagByName
2013-02-21 20:16:12: 0: STDOUT: return getResolutionTags()[0]
2013-02-21 20:16:12: 0: INFO: Deadline is ignoring error: “IndexError: list index out of range” because plugin setting Strict Error Checking is enabled and this error is not usually fatal.
2013-02-21 20:16:12: 0: STDOUT: IndexError: list index out of range
2013-02-21 20:16:12: 0: STDOUT: Thu Feb 21 20:16:12 2013 (+2250ms) : Enabling viewports
2013-02-21 20:16:14: 0: STDOUT: Error: Render failed
2013-02-21 20:16:17: Scheduler Thread - Render Thread 0 threw a major error:
2013-02-21 20:16:17: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
2013-02-21 20:16:17: Exception Details
2013-02-21 20:16:17: RenderPluginException – Error in RenderTasks: Deadline caught error “Error: Render failed”. If this error message is unavoidable but not fatal to your render, please email support@thinkboxsoftware.com with the error message, and disable the Maya job setting Strict Error Checking.
2013-02-21 20:16:17: at Deadline.Plugins.ScriptPlugin.RenderTasks(String taskId, Int32 startFrame, Int32 endFrame, String& outMessage, AbortLevel& abortLevel)
2013-02-21 20:16:17: RenderPluginException.Cause: JobError (2)
2013-02-21 20:16:17: RenderPluginException.Level: Major (1)
2013-02-21 20:16:17: RenderPluginException.HasSlaveLog: True
2013-02-21 20:16:17: Exception.Data: ( )
2013-02-21 20:16:17: Exception.TargetSite: Void RenderTask(System.String, Int32, Int32)
2013-02-21 20:16:17: Exception.Source: deadline
2013-02-21 20:16:17: Exception.StackTrace:
2013-02-21 20:16:17: at Deadline.Plugins.Plugin.RenderTask(String taskId, Int32 startFrame, Int32 endFrame)
2013-02-21 20:16:17: at Deadline.Slaves.SlaveRenderThread.a(TaskLogWriter A_0)
2013-02-21 20:16:17: <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
2013-02-21 20:24:14: Scheduler Thread - Synchronizing job auxiliary files from \inferno2\deadline\repository6\jobs\5126c529de3c960be87af3f5
2013-02-21 20:24:14: Scheduler Thread - Synchronization time for job files: 46.875 ms
2013-02-21 20:24:14: Scheduler Thread - Synchronizing plugin files from \inferno2\deadline\repository6\plugins\MayaBatch
2013-02-21 20:24:14: Scheduler Thread - Synchronization time for plugin files: 468.750 ms
2013-02-21 20:24:15: Constructor: MayaBatch
2013-02-21 20:24:16: 0: Task timeout is being ignored because this is a Post Job Script Task
2013-02-21 20:24:16: 0: Loaded job: [TBOA] Software Render: FB_083_0220_maya_animation_Layout.ma version: v0039 (5126c529de3c960be87af3f5)
2013-02-21 20:24:16: 0: Plugin executing post job script
2013-02-21 20:24:16: 0: Executing Post Job Script: “//inferno2/exchange/software/managed/pythonScripts/site-packages/scl/farm/postScripts/removeSoftwareDepth.py”
2013-02-21 20:24:24: Scheduler Thread - Cancelling task because task “-1_0” could not be found
2013-02-21 20:24:24: Scheduler Thread - The task has either been changed externally (and requeued), or the Job has been deleted.
2013-02-21 20:24:24: 0: INFO: Executing plugin script C:\Documents and Settings\ScanlineVFX\Local Settings\Application Data\Thinkbox\Deadline6\slave\LAPRO0235\plugins\MayaBatch.py
2013-02-21 20:24:24: 0: INFO: About: Maya Batch Plugin for Deadline
2013-02-21 20:24:24: 0: INFO: The job’s environment will be merged with the current environment before rendering
2013-02-21 20:24:24: 0: INFO: Executing post job script //inferno2/exchange/software/managed/pythonScripts/site-packages/scl/farm/postScripts/removeSoftwareDepth.py

Machine reporting stalled:

2013-02-22 12:29:42: 0: INFO: Process Affinity: default
2013-02-22 12:29:42: 0: INFO: Process is now running
2013-02-22 12:29:42: 0: Plugin rendering frame(s): 1331-1334
2013-02-22 12:29:42: 0: INFO: Waiting until maya is ready to go
2013-02-22 12:29:49: Scheduler Thread - Cancelling task because task “22_1331-1334” could not be found
2013-02-22 12:29:49: Scheduler Thread - The task has either been changed externally (and requeued), or the Job has been deleted.
2013-02-22 12:29:50: 0: In the process of canceling current task: ignoring exception thrown by PluginLoader
2013-02-22 12:29:50: Scheduler Thread - In the process of canceling current tasks: ignoring exception thrown by render thread 0
2013-02-22 12:46:16: Traceback (most recent call last):
2013-02-22 12:46:16: File “DeadlineSlave\UI\Forms\MainWindowSlave.py”, line 291, in update
2013-02-22 12:46:16: File “DeadlineSlave\UI\Forms\MainWindowSlave.py”, line 315, in updateJobInfo
2013-02-22 12:46:16: InvalidOperationException: Collection was modified; enumeration operation may not execute.
2013-02-22 12:46:16: at System.Collections.Generic.Dictionary`2.KeyCollection.Enumerator.MoveNext()
2013-02-22 12:46:16: at Deadline.Slaves.Slave.GetLimitGroupStubs()
2013-02-22 13:46:11: Purging old statistics
2013-02-22 13:46:12: Purging slave statistics that are older than Oct 25/12 13:47:40
2013-02-22 13:46:12: Purging repository statistics that are older than Oct 25/12 13:47:40
2013-02-22 13:46:52: Purging obsolete slaves
2013-02-22 13:49:28: Purging obsolete slaves
2013-02-22 13:49:57: Purging repository temp files
2013-02-22 13:54:32: Slave - An error occurred while updating the slave’s info: An unexpected error occurred while interacting with the database (deadline.scanlinevfxla.com:27017):
2013-02-22 13:54:32: Exception of type ‘System.OutOfMemoryException’ was thrown. (FranticX.Database.DocumentException)
2013-02-22 13:54:52: Slave - An error occurred while updating the slave’s info: An unexpected error occurred while interacting with the database (deadline.scanlinevfxla.com:27017):
2013-02-22 13:54:52: Exception of type ‘System.OutOfMemoryException’ was thrown. (FranticX.Database.DocumentException)
2013-02-22 13:55:01: Slave - An error occurred while updating the slave’s info: An unexpected error occurred while interacting with the database (deadline.scanlinevfxla.com:27017):
2013-02-22 13:55:01: Exception of type ‘System.OutOfMemoryException’ was thrown. (FranticX.Database.DocumentException)
2013-02-22 13:55:07: Slave - An error occurred while updating the slave’s info: An unexpected error occurred while interacting with the database (deadline.scanlinevfxla.com:27017):
2013-02-22 13:55:07: Exception of type ‘System.OutOfMemoryException’ was thrown. (FranticX.Database.DocumentException)
2013-02-22 13:55:14: Slave - An error occurred while updating the slave’s info: An unexpected error occurred while interacting with the database (deadline.scanlinevfxla.com:27017):
2013-02-22 13:55:14: Exception of type ‘System.OutOfMemoryException’ was thrown. (FranticX.Database.DocumentException)
2013-02-22 13:55:23: Slave - An error occurred while updating the slave’s info: An unexpected error occurred while interacting with the database (deadline.scanlinevfxla.com:27017):
2013-02-22 13:55:23: Exception of type ‘System.OutOfMemoryException’ was thrown. (FranticX.Database.DocumentException)

Stalled machine, first log (last lines):

2013-02-21 20:03:54: 0: STDOUT: PyQt4.uic.uiparser : DEBUG : push QStatusBar statusbar
2013-02-21 20:03:54: 0: STDOUT: PyQt4.uic.uiparser : DEBUG : pop widget QStatusBar statusbar
2013-02-21 20:03:54: 0: STDOUT: PyQt4.uic.uiparser : DEBUG : new topwidget MainWindow
2013-02-21 20:03:54: 0: STDOUT: PyQt4.uic.uiparser : DEBUG : pop widget QMainWindow MainWindow
2013-02-21 20:03:54: 0: STDOUT: PyQt4.uic.uiparser : DEBUG : new topwidget None
2013-02-21 20:03:54: 0: STDOUT: Thu Feb 21 20:03:53 2013 (+781ms) : debug : Registering Node Class : “RetimeExporter”
2013-02-21 20:03:54: 0: STDOUT: Thu Feb 21 20:03:53 2013 (+0ms) : debug : Registering Node Class : “RetimeImporter”
2013-02-21 20:03:54: 0: STDOUT: Thu Feb 21 20:03:53 2013 (+15ms) : debug : Evaluating all presets
2013-02-21 20:03:54: 0: STDOUT: Thu Feb 21 20:03:53 2013 (+0ms) : debug : Registering Preset : “Export Master camera”
2013-02-21 20:03:54: 0: STDOUT: Thu Feb 21 20:03:53 2013 (+0ms) : debug : Registering Preset : “Export Track camera”
2013-02-21 20:03:54: 0: STDOUT: Thu Feb 21 20:03:53 2013 (+0ms) : debug : Registering Preset : “Export Projection camera”
2013-02-21 20:03:54: 0: STDOUT: Thu Feb 21 20:03:53 2013 (+0ms) : debug : Registering Preset : “Create Rig exporter based on current scene”
2013-02-21 20:03:54: 0: STDOUT: Thu Feb 21 20:03:53 2013 (+0ms) : debug : Registering Preset : “Create Playblast based on current scene”
2013-02-21 20:03:54: 0: STDOUT: Thu Feb 21 20:03:53 2013 (+0ms) : debug : Registering Checks
2013-02-21 20:03:54: 0: STDOUT: Thu Feb 21 20:03:53 2013 (+0ms) : Importing animation
2013-02-21 20:03:54: 0: STDOUT: Thu Feb 21 20:03:53 2013 (+0ms) : debug : Import Checks folder: sceneJanitor.checks module: animation
2013-02-21 20:03:54: 0: STDOUT: Thu Feb 21 20:03:54 2013 (+141ms) : Importing asset
2013-02-21 20:03:54: 0: STDOUT: Thu Feb 21 20:03:54 2013 (+0ms) : debug : Import Checks folder: sceneJanitor.checks module: asset
2013-02-21 20:03:54: 0: STDOUT: Thu Feb 21 20:03:54 2013 (+0ms) : Importing geometry
2013-02-21 20:03:54: 0: STDOUT: Thu Feb 21 20:03:54 2013 (+0ms) : debug : Import Checks folder: sceneJanitor.checks module: geometry
2013-02-21 20:03:54: 0: STDOUT: Thu Feb 21 20:03:54 2013 (+16ms) : Importing naming
2013-02-21 20:03:54: 0: STDOUT: Thu Feb 21 20:03:54 2013 (+0ms) : debug : Import Checks folder: sceneJanitor.checks module: naming
2013-02-21 20:03:54: 0: STDOUT: Thu Feb 21 20:03:54 2013 (+14ms) : Importing rig
2013-02-21 20:03:54: 0: STDOUT: Thu Feb 21 20:03:54 2013 (+0ms) : debug : Import Checks folder: sceneJanitor.checks module: rig
2013-02-21 20:03:54: 0: STDOUT: Thu Feb 21 20:03:54 2013 (+0ms) : Importing scene
2013-02-21 20:03:54: 0: STDOUT: Thu Feb 21 20:03:54 2013 (+0ms) : debug : Import Checks folder: sceneJanitor.checks module: scene
2013-02-21 21:52:23: Scheduler Thread - Cancelling task because task “13_1196-1210” could not be found

Then it had another log from the next day with only these lines:
2013-02-22 16:03:24: BEGIN - LAPRO0252\ScanlineVFX
2013-02-22 16:03:24: 0: Slave timed out – canceling current task…

It seems that the launcher is sometimes not capable of restarting stalled slaves.

One machine that stalled out had these in its launcher logs, but no actual slave running:

2013-02-22 23:32:03: Launcher Thread - Received command: LaunchSlave LAPRO0258
2013-02-22 23:32:03: Local version file: C:\Program Files\Thinkbox\Deadline6\bin\Version
2013-02-22 23:32:03: Network version file: \inferno2\deadline\repository6\bin\Windows\Version
2013-02-22 23:32:03: Comparing version files…
2013-02-22 23:32:03: Version files match
2013-02-22 23:32:03: Launcher Thread - Responded with: Success|
2013-02-22 23:37:12: 172.18.2.209 has connected
2013-02-22 23:37:12: Launcher Thread - Received command: LaunchSlave LAPRO0258
2013-02-22 23:37:12: Local version file: C:\Program Files\Thinkbox\Deadline6\bin\Version
2013-02-22 23:37:12: Network version file: \inferno2\deadline\repository6\bin\Windows\Version
2013-02-22 23:37:12: Comparing version files…
2013-02-22 23:37:12: Version files match
2013-02-22 23:37:12: Launcher Thread - Responded with: Success|
2013-02-22 23:42:22: 172.18.2.209 has connected
2013-02-22 23:42:22: Launcher Thread - Received command: LaunchSlave LAPRO0258
2013-02-22 23:42:22: Local version file: C:\Program Files\Thinkbox\Deadline6\bin\Version
2013-02-22 23:42:22: Network version file: \inferno2\deadline\repository6\bin\Windows\Version
2013-02-22 23:42:22: Comparing version files…
2013-02-22 23:42:22: Version files match
2013-02-22 23:42:22: Launcher Thread - Responded with: Success|
2013-02-22 23:47:32: 172.18.2.209 has connected
2013-02-22 23:47:32: Launcher Thread - Received command: LaunchSlave LAPRO0258
2013-02-22 23:47:32: Local version file: C:\Program Files\Thinkbox\Deadline6\bin\Version
2013-02-22 23:47:32: Network version file: \inferno2\deadline\repository6\bin\Windows\Version
2013-02-22 23:47:32: Comparing version files…
2013-02-22 23:47:32: Version files match
2013-02-22 23:47:32: Launcher Thread - Responded with: Success|
2013-02-22 23:52:35: 172.18.2.209 has connected
2013-02-22 23:52:35: Launcher Thread - Received command: LaunchSlave LAPRO0258
2013-02-22 23:52:35: Local version file: C:\Program Files\Thinkbox\Deadline6\bin\Version
2013-02-22 23:52:35: Network version file: \inferno2\deadline\repository6\bin\Windows\Version
2013-02-22 23:52:35: Comparing version files…
2013-02-22 23:52:35: Version files match
2013-02-22 23:52:35: Launcher Thread - Responded with: Success|
2013-02-22 23:57:39: 172.18.2.209 has connected
2013-02-22 23:57:39: Launcher Thread - Received command: LaunchSlave LAPRO0258
2013-02-22 23:57:39: Local version file: C:\Program Files\Thinkbox\Deadline6\bin\Version
2013-02-22 23:57:39: Network version file: \inferno2\deadline\repository6\bin\Windows\Version
2013-02-22 23:57:39: Comparing version files…
2013-02-22 23:57:39: Version files match
2013-02-22 23:57:39: Launcher Thread - Responded with: Success|

The last slave log lines:

-02-22 00:45:28: BEGIN - LAPRO0258\ScanlineVFX
2013-02-22 00:45:28: Scheduler Thread - Cancelling task because task “1_1-1” could not be found
2013-02-22 16:51:06: 0: Slave timed out – canceling current task…

One slave had a runtime error


Microsoft Visual C++ Runtime Library

Runtime Error!

Program: C:\Program Files\Thinkbox\Deadline6\bin\deadlineslave.exe

This application has requested the Runtime to terminate it in an unusual way.
Please contact the application’s support team for more information.


OK

Log is endlessly printing this:

2013-02-25 11:23:42: Scheduler Thread - The task has either been changed externally (and requeued), or the Job has been deleted.
2013-02-25 11:23:51: Scheduler Thread - Cancelling task because task “-1_0” could not be found
2013-02-25 11:23:51: Scheduler Thread - The task has either been changed externally (and requeued), or the Job has been deleted.
2013-02-25 11:23:56: Scheduler Thread - Cancelling task because task “-1_0” could not be found
2013-02-25 11:23:56: Scheduler Thread - The task has either been changed externally (and requeued), or the Job has been deleted.
2013-02-25 11:24:02: Scheduler Thread - Cancelling task because task “-1_0” could not be found
2013-02-25 11:24:02: Scheduler Thread - The task has either been changed externally (and requeued), or the Job has been deleted.
2013-02-25 11:24:10: Scheduler Thread - Cancelling task because task “-1_0” could not be found
2013-02-25 11:24:10: Scheduler Thread - The task has either been changed externally (and requeued), or the Job has been deleted.
2013-02-25 11:24:15: Scheduler Thread - Cancelling task because task “-1_0” could not be found
2013-02-25 11:24:15: Scheduler Thread - The task has either been changed externally (and requeued), or the Job has been deleted.

(reports ‘rendering’ in the deadline monitor, in fact its been stalled for 3 days)

First occurance of the cancelling message:

2013-02-22 19:34:52: Scheduler Thread - Synchronizing plugin files from \inferno2\deadline\repository6\plugins\MayaBatch
2013-02-22 19:34:52: Scheduler Thread - Synchronization time for plugin files: 31.250 ms
2013-02-22 19:34:53: Constructor: MayaBatch
2013-02-22 19:34:53: 0: Task timeout is being ignored because this is a Post Job Script Task
2013-02-22 19:34:53: 0: Loaded job: [TBOA] Software Render: CB_036_0486_maya_animation_layout.ma version: v0009 (51283515fc6d5c050ce4e093)
2013-02-22 19:34:53: 0: Plugin executing post job script
2013-02-22 19:34:53: 0: Executing Post Job Script: “//inferno2/exchange/software/managed/pythonScripts/site-packages/scl/farm/postScripts/removeSoftwareDepth.py”
2013-02-22 19:34:55: 0: INFO: Executing plugin script C:\Documents and Settings\ScanlineVFX\Local Settings\Application Data\Thinkbox\Deadline6\slave\LAPRO0261\plugins\MayaBatch.py
2013-02-22 19:34:55: 0: INFO: About: Maya Batch Plugin for Deadline
2013-02-22 19:34:55: 0: INFO: The job’s environment will be merged with the current environment before rendering
2013-02-22 19:34:55: 0: INFO: Executing post job script //inferno2/exchange/software/managed/pythonScripts/site-packages/scl/farm/postScripts/removeSoftwareDepth.py
2013-02-22 19:35:01: Scheduler Thread - Cancelling task because task “-1_0” could not be found
2013-02-22 19:35:01: Scheduler Thread - The task has either been changed externally (and requeued), or the Job has been deleted.
2013-02-22 19:35:10: Scheduler Thread - Cancelling task because task “-1_0” could not be found
2013-02-22 19:35:10: Scheduler Thread - The task has either been changed externally (and requeued), or the Job has been deleted.
2013-02-22 19:35:20: Scheduler Thread - Cancelling task because task “-1_0” could not be found

Slave reporting ‘rendering’ but in fact hanging for the 3rd day. mayabatch was still running on the machine, deadlineslave also running at 100% cpu usage:

13-02-22 13:56:35: 0: STDOUT: Fri Feb 22 13:56:34 2013 (+0ms) : Registering maya specific queueSubmitter functions
2013-02-22 13:56:35: 0: STDOUT: Fri Feb 22 13:56:34 2013 (+0ms) : Setting up callbacks
2013-02-22 13:56:35: 0: STDOUT: Fri Feb 22 13:56:34 2013 (+0ms) : mayaStartup completed
2013-02-22 13:56:35: 0: STDOUT: Fri Feb 22 13:56:34 2013 (+0ms) : debug : Registering Checks for heat
2013-02-22 15:14:49: Scheduler Thread - Cancelling task because task “10_989-1003” could not be found

then some log from the next day:

2013-02-23 09:56:26: BEGIN - LAPRO0273\ScanlineVFX
2013-02-23 09:56:26: 0: Slave timed out – canceling current task…

Then nothing after

i killed the mayabatch process, deadline did not recover. Had to kill that as well

Another slave, reporting stalled, while deadline slave was in fact still running (process was there anyway)

last log:

2013-02-22 00:46:31: BEGIN - LAPRO0271\ScanlineVFX
2013-02-22 00:46:31: Scheduler Thread - Cancelling task because task “3_1046-1060” could not be found
2013-02-22 07:31:10: 0: Slave timed out – canceling current task…

mayabatch was still active. Had to kill both mayabatch and the slave manually

Machine reporting rendering, in fact its hanging. Last log lines:

2013-02-22 16:31:05: 0: INFO: Maya has shut down
2013-02-22 17:21:43: Scheduler Thread - Synchronizing job auxiliary files from \inferno2\deadline\repository6\jobs\512815f5322ecd0a8427d4a5
2013-02-22 17:21:43: Scheduler Thread - Synchronization time for job files: 0.000 s
2013-02-22 17:21:43: Scheduler Thread - Synchronizing plugin files from \inferno2\deadline\repository6\plugins\MayaBatch
2013-02-22 17:21:43: Scheduler Thread - Synchronization time for plugin files: 296.886 ms
2013-02-22 17:21:44: Constructor: MayaBatch
2013-02-22 17:21:44: 0: Task timeout is being ignored because this is a Post Job Script Task
2013-02-22 17:21:44: 0: Loaded job: [HEAT] Software Render: HA_260_2160_maya_animation_animation.ma version: v0043 (512815f5322ecd0a8427d4a5)
2013-02-22 17:21:44: 0: Plugin executing post job script
2013-02-22 17:21:44: 0: Executing Post Job Script: “//inferno2/exchange/software/managed/pythonScripts/site-packages/scl/farm/postScripts/removeSoftwareDepth.py”
2013-02-22 17:21:45: 0: INFO: Executing plugin script C:\Documents and Settings\ScanlineVFX\Local Settings\Application Data\Thinkbox\Deadline6\slave\LAPRO0274\plugins\MayaBatch.py
2013-02-22 17:21:45: 0: INFO: About: Maya Batch Plugin for Deadline
2013-02-22 17:21:45: 0: INFO: The job’s environment will be merged with the current environment before rendering
2013-02-22 17:21:45: 0: INFO: Executing post job script //inferno2/exchange/software/managed/pythonScripts/site-packages/scl/farm/postScripts/removeSoftwareDepth.py
2013-02-22 17:21:49: Scheduler Thread - Render Thread 0’s task “-1_0” could not be found
2013-02-22 17:21:49: Scheduler Thread - The task has either been changed externally, or the Job has been deleted.
2013-02-22 17:21:52: 0: Task timeout is being ignored because this is a Post Job Script Task
2013-02-22 17:21:52: 0: Loaded job: [HEAT] Software Render: HA_260_2160_maya_animation_animation.ma version: v0043 (512815f5322ecd0a8427d4a5)
2013-02-22 17:21:52: 0: Plugin executing post job script
2013-02-22 17:21:52: 0: Executing Post Job Script: “//inferno2/exchange/software/managed/pythonScripts/site-packages/scl/farm/postScripts/removeSoftwareDepth.py”
2013-02-22 17:21:52: 0: INFO: Executing plugin script C:\Documents and Settings\ScanlineVFX\Local Settings\Application Data\Thinkbox\Deadline6\slave\LAPRO0274\plugins\MayaBatch.py
2013-02-22 17:21:52: 0: INFO: About: Maya Batch Plugin for Deadline
2013-02-22 17:21:52: 0: INFO: The job’s environment will be merged with the current environment before rendering
2013-02-22 17:21:52: 0: INFO: Executing post job script //inferno2/exchange/software/managed/pythonScripts/site-packages/scl/farm/postScripts/removeSoftwareDepth.py

Could it be access timeout to inferno? what happens if the machine loses access temporarily? This particular slave i could shut down regularly with the slave GUI (it was interactive), and just restart manually. So it seems like all threads where healthy, but it was timing out on the post job script maybe

Slave reporting render, has an all white screen, hanging:

Last log lines (this is the full log for the 22nd):

2013-02-22 00:48:14: BEGIN - LAPRO0299\ScanlineVFX
2013-02-22 00:48:14: Scheduler Thread - Cancelling task because task “0_1001-1015” could not be found
2013-02-22 14:57:56: 0: Slave timed out – canceling current task…

Some apparently false positive attempts to restart the slave in the launcher lkog:

2013-02-23 06:21:09: Launcher Thread - Received command: LaunchSlave LAPRO0299
2013-02-23 06:21:11: Local version file: C:\Program Files\Thinkbox\Deadline6\bin\Version
2013-02-23 06:21:11: Network version file: \inferno2\deadline\repository6\bin\Windows\Version
2013-02-23 06:21:11: Comparing version files…
2013-02-23 06:21:11: Version files match
2013-02-23 06:21:11: Launcher Thread - Responded with: Success|
2013-02-23 08:14:41: 172.18.2.209 has connected
2013-02-23 08:14:41: Launcher Thread - Received command: LaunchSlave LAPRO0299
2013-02-23 08:14:45: Local version file: C:\Program Files\Thinkbox\Deadline6\bin\Version
2013-02-23 08:14:45: Network version file: \inferno2\deadline\repository6\bin\Windows\Version
2013-02-23 08:14:46: Comparing version files…
2013-02-23 08:14:47: Version files match
2013-02-23 08:14:54: Launcher Thread - Responded with: Success|

mayabatch process still in the task manager (0 cpu usage), deadlineslave is at 100% cpu usage.

required process kill/restart

Slave shows as ‘rendering’, but slave app is at 100% cpu usage, minimized (cant restore/maximize).

The last lines of the deadline slave log:

2013-02-23 14:24:16: 0: STDOUT: PyQt4.uic.uiparser : DEBUG : pop widget QGroupBox groupBox
2013-02-23 14:24:16: 0: STDOUT: PyQt4.uic.uiparser : DEBUG : new topwidget Dialog
2013-02-23 14:24:16: 0: STDOUT: PyQt4.uic.properties : DEBUG : setting property text
2013-02-23 14:24:16: 0: STDOUT: PyQt4.uic.uiparser : DEBUG : push QCheckBox m_uTestImport
2013-02-23 14:24:16: 0: STDOUT: PyQt4.uic.uiparser : DEBUG : pop widget QCheckBox m_uTestImport
2013-02-23 14:24:16: 0: STDOUT: PyQt4.uic.uiparser : DEBUG : new topwidget Dialog
2013-02-23 14:24:16: 0: STDOUT: PyQt4.uic.uiparser : DEBUG : pop layout QVBoxLayout verticalLayout_3
2013-02-23 14:24:16: 0: STDOUT: PyQt4.uic.uiparser : DEBUG : pop widget QDialog Dialog
2013-02-23 14:24:16: 0: STDOUT: PyQt4.uic.uiparser : DEBUG : new topwidget None
2013-02-23 14:24:16: 0: STDOUT: Sat Feb 23 14:24:16 2013 (+188ms) : debug : Registering Node Class : “MocapAnimationExporter”
2013-02-23 14:24:16: 0: STDOUT: Sat Feb 23 14:24:16 2013 (+0ms) : debug : Registering Node Class : “MocapAnimationImporter”
2013-02-23 14:24:16: 0: STDOUT: Sat Feb 23 14:24:16 2013 (+14ms) : debug : Registering Node Class : “RigCacheExporter”
2013-02-23 14:24:16: 0: STDOUT: Sat Feb 23 14:24:16 2013 (+32ms) : debug : Registering Node Class : “ScanlineResourceList”
2013-02-23 14:24:16: 0: STDOUT: Sat Feb 23 14:24:16 2013 (+14ms) : debug : Registering Node Class : “MayaSceneExporter”
2013-02-23 14:24:16: 0: STDOUT: Sat Feb 23 14:24:16 2013 (+0ms) : debug : Registering Node Class : “MayaSceneImporter”
2013-02-23 14:24:16: 0: STDOUT: Sat Feb 23 14:24:16 2013 (+0ms) : debug : Registering Node Class : “NullAnimationExporter”
2013-02-23 14:24:16: 0: STDOUT: Sat Feb 23 14:24:16 2013 (+0ms) : debug : Registering Node Class : “NullAnimationImporter”
2013-02-23 14:24:16: 0: STDOUT: Sat Feb 23 14:24:16 2013 (+78ms) : debug : Registering Node Class : “XMeshExporter”
2013-02-23 14:24:16: 0: STDOUT: Sat Feb 23 14:24:16 2013 (+0ms) : debug : Registering Node Class : “XMeshImporter”

From our internal log, we know thats not the last line of activity:

Sat Feb 23 14:24:16 2013 (+0ms) : debug : Registering Node Class : “NullAnimationExporter”
Sat Feb 23 14:24:16 2013 (+0ms) : debug : Registering Node Class : “NullAnimationImporter”
Sat Feb 23 14:24:16 2013 (+78ms) : debug : Registering Node Class : “XMeshExporter”
Sat Feb 23 14:24:16 2013 (+0ms) : debug : Registering Node Class : “XMeshImporter”
Sat Feb 23 14:24:17 2013 (+485ms) : debug : Registering Node Class : “MayaRigCacheExporter”
Sat Feb 23 14:24:17 2013 (+0ms) : debug : Registering Node Class : “MayaRigCacheImporter”
Sat Feb 23 14:24:17 2013 (+467ms) : debug : Registering Node Class : “RetimeExporter”
Sat Feb 23 14:24:17 2013 (+0ms) : debug : Registering Node Class : “RetimeImporter”
Sat Feb 23 14:24:17 2013 (+0ms) : debug : Evaluating all presets
Sat Feb 23 14:24:17 2013 (+0ms) : debug : Registering Preset : “Export Master camera”
Sat Feb 23 14:24:17 2013 (+0ms) : debug : Registering Preset : “Export Track camera”
Sat Feb 23 14:24:17 2013 (+0ms) : debug : Registering Preset : “Export Projection camera”
Sat Feb 23 14:24:17 2013 (+0ms) : debug : Registering Preset : “Create Rig exporter based on current scene”
Sat Feb 23 14:24:17 2013 (+0ms) : debug : Registering Preset : “Create Playblast based on current scene”
Sat Feb 23 14:24:17 2013 (+0ms) : debug : Registering Checks
Sat Feb 23 14:24:17 2013 (+0ms) : Importing animation
Sat Feb 23 14:24:17 2013 (+0ms) : debug : Import Checks folder: sceneJanitor.checks module: animation
Sat Feb 23 14:24:17 2013 (+62ms) : Importing asset
Sat Feb 23 14:24:17 2013 (+0ms) : debug : Import Checks folder: sceneJanitor.checks module: asset
Sat Feb 23 14:24:17 2013 (+16ms) : Importing geometry
Sat Feb 23 14:24:17 2013 (+0ms) : debug : Import Checks folder: sceneJanitor.checks module: geometry
Sat Feb 23 14:24:17 2013 (+14ms) : Importing naming
Sat Feb 23 14:24:17 2013 (+0ms) : debug : Import Checks folder: sceneJanitor.checks module: naming
Sat Feb 23 14:24:17 2013 (+0ms) : Importing rig
Sat Feb 23 14:24:17 2013 (+0ms) : debug : Import Checks folder: sceneJanitor.checks module: rig
Sat Feb 23 14:24:17 2013 (+16ms) : Importing scene
Sat Feb 23 14:24:17 2013 (+0ms) : debug : Import Checks folder: sceneJanitor.checks module: scene
Sat Feb 23 14:24:17 2013 (+0ms) : debug : Not creating comet menu in batch mode
Sat Feb 23 14:24:17 2013 (+0ms) : Registering maya specific queueSubmitter functions
Sat Feb 23 14:24:17 2013 (+0ms) : Setting up callbacks
Sat Feb 23 14:24:17 2013 (+0ms) : mayaStartup completed
Sat Feb 23 14:24:17 2013 (+0ms) : debug : Registering Checks for tboa
Sat Feb 23 14:24:17 2013 (+0ms) : Importing correctOceanBuffer
Sat Feb 23 14:24:17 2013 (+0ms) : debug : Import Checks folder: sanityChecks module: correctOceanBuffer
Sat Feb 23 14:24:17 2013 (+47ms) : Importing PublishException localOceanCache
Sat Feb 23 14:24:17 2013 (+46ms) : Finding plugis to autoload: //S2/exchange/software/managed/SoftwareConfiguration/mayaShared/env/projects/tboa/plugins;//S2/exchange/software/managed/SoftwareConfiguration/mayaShared/env/configs/default/plugins;C:/Program Files/Autodesk/Maya2012/bin/plug-ins;;C:/Program Files/Autodesk/Maya2012/bin/plug-ins;…/plugins/MsvTools042/plug-ins;C:/Program Files/NVIDIA Corporation/PhysX for Maya/Maya 2012/plug-ins;C:/Documents and Settings/ScanlineVFX/My Documents/maya/2012-x64/plug-ins;C:/Documents and Settings/ScanlineVFX/My Documents/maya/plug-ins;C:/Program Files/Autodesk/Maya2012/bin/plug-ins;…/plugins/MsvTools042/plug-ins;C:/Program Files/NVIDIA Corporation/PhysX for Maya/Maya 2012/plug-ins
Sat Feb 23 14:24:17 2013 (+0ms) : Dir starts with //s2, checking for files: //S2/exchange/software/managed/SoftwareConfiguration/mayaShared/env/projects/tboa/plugins
Sat Feb 23 14:24:17 2013 (+0ms) : Appending to list: DifferenceMesh.mll
Sat Feb 23 14:24:17 2013 (+0ms) : Dir starts with //s2, checking for files: //S2/exchange/software/managed/SoftwareConfiguration/mayaShared/env/configs/default/plugins
Sat Feb 23 14:24:17 2013 (+0ms) : Appending to list: AbcExport.mll
Sat Feb 23 14:24:17 2013 (+0ms) : Appending to list: AbcImport.mll
Sat Feb 23 14:24:17 2013 (+0ms) : Appending to list: IntersectRay.mll
Sat Feb 23 14:24:17 2013 (+0ms) : Appending to list: Maya2012ExocortexAlembic1.1.mll
Sat Feb 23 14:24:17 2013 (+0ms) : Appending to list: SOuP.mll
Sat Feb 23 14:24:17 2013 (+0ms) : Appending to list: XMeshLoader.mll
Sat Feb 23 14:24:17 2013 (+0ms) : Appending to list: closestPointOnCurve.py
Sat Feb 23 14:24:17 2013 (+0ms) : Appending to list: fbxmaya.mll
Sat Feb 23 14:24:17 2013 (+0ms) : Appending to list: pointOnGridMesh.mll
Sat Feb 23 14:24:17 2013 (+0ms) : Appending to list: poseDeformer.mll
Sat Feb 23 14:24:17 2013 (+0ms) : Appending to list: poseReader.mll
Sat Feb 23 14:24:17 2013 (+0ms) : Appending to list: rigBlueprint.py
Sat Feb 23 14:24:17 2013 (+0ms) : Appending to list: rigConfig.py
Sat Feb 23 14:24:17 2013 (+0ms) : Appending to list: rigRoot.py
Sat Feb 23 14:24:17 2013 (+0ms) : Appending to list: savePc2Cmd.mll
Sat Feb 23 14:24:17 2013 (+0ms) : Appending to list: cacheFilePc2.mll
Sat Feb 23 14:24:17 2013 (+0ms) : Appending to list: relaxNode.mll
Sat Feb 23 14:24:17 2013 (+0ms) : Loading plugin: DifferenceMesh.mll
Sat Feb 23 14:24:17 2013 (+78ms) : Loading plugin: AbcExport.mll
Sat Feb 23 14:24:17 2013 (+92ms) : Loading plugin: AbcImport.mll
Sat Feb 23 14:24:18 2013 (+93ms) : Loading plugin: IntersectRay.mll
Sat Feb 23 14:24:18 2013 (+94ms) : Loading plugin: Maya2012ExocortexAlembic1.1.mll
Sat Feb 23 14:24:18 2013 (+108ms) : Loading plugin: SOuP.mll
Sat Feb 23 14:24:19 2013 (+844ms) : Loading plugin: XMeshLoader.mll
Sat Feb 23 14:24:19 2013 (+108ms) : Loading plugin: closestPointOnCurve.py
Sat Feb 23 14:24:19 2013 (+78ms) : Loading plugin: fbxmaya.mll
Sat Feb 23 14:24:19 2013 (+312ms) : Loading plugin: pointOnGridMesh.mll
Sat Feb 23 14:24:19 2013 (+77ms) : Loading plugin: poseDeformer.mll
Sat Feb 23 14:24:19 2013 (+94ms) : Loading plugin: poseReader.mll
Sat Feb 23 14:24:19 2013 (+78ms) : Loading plugin: rigBlueprint.py
Sat Feb 23 14:24:19 2013 (+77ms) : Loading plugin: rigConfig.py
Sat Feb 23 14:24:20 2013 (+78ms) : Loading plugin: rigRoot.py
Sat Feb 23 14:24:20 2013 (+78ms) : Loading plugin: savePc2Cmd.mll
Sat Feb 23 14:24:20 2013 (+93ms) : Loading plugin: cacheFilePc2.mll
Sat Feb 23 14:24:20 2013 (+94ms) : Loading plugin: relaxNode.mll
Sat Feb 23 14:24:20 2013 (+108ms) : Loading plugin: Mayatomr
Sat Feb 23 14:24:21 2013 (+625ms) : Scanline Pipeline Maya Setup Complete!

no more lines in our log

Another one with the same behaviour:

Last lines in the slave log:

2013-02-22 14:52:31: 0: STDOUT: Fri Feb 22 14:52:31 2013 (+0ms) : debug : Registering Node Class : “MocapAnimationImporter”
2013-02-22 14:52:31: 0: STDOUT: Fri Feb 22 14:52:31 2013 (+14ms) : debug : Registering Node Class : “RigCacheExporter”
2013-02-22 14:52:31: 0: STDOUT: Fri Feb 22 14:52:31 2013 (+46ms) : debug : Registering Node Class : “ScanlineResourceList”
2013-02-22 14:52:31: 0: STDOUT: Fri Feb 22 14:52:31 2013 (+16ms) : debug : Registering Node Class : “MayaSceneExporter”
2013-02-22 14:52:31: 0: STDOUT: Fri Feb 22 14:52:31 2013 (+0ms) : debug : Registering Node Class : “MayaSceneImporter”
2013-02-22 14:52:31: 0: STDOUT: Fri Feb 22 14:52:31 2013 (+15ms) : debug : Registering Node Class : “NullAnimationExporter”
2013-02-22 14:52:31: 0: STDOUT: Fri Feb 22 14:52:31 2013 (+0ms) : debug : Registering Node Class : “NullAnimationImporter”
2013-02-22 14:52:31: 0: STDOUT: Fri Feb 22 14:52:31 2013 (+78ms) : debug : Registering Node Class : “XMeshExporter”
2013-02-22 14:52:31: 0: STDOUT: Fri Feb 22 14:52:31 2013 (+0ms) : debug : Registering Node Class : “XMeshImporter”
2013-02-22 15:15:57: Scheduler Thread - Cancelling task because task “4_1061-1075” could not be found

Last lines in our log:

Fri Feb 22 14:52:31 2013 (+78ms) : debug : Registering Node Class : “XMeshExporter”
Fri Feb 22 14:52:31 2013 (+0ms) : debug : Registering Node Class : “XMeshImporter”
Fri Feb 22 14:52:31 2013 (+639ms) : debug : Registering Node Class : “MayaRigCacheExporter”
Fri Feb 22 14:52:31 2013 (+0ms) : debug : Registering Node Class : “MayaRigCacheImporter”
Fri Feb 22 14:52:32 2013 (+625ms) : debug : Registering Node Class : “RetimeExporter”
Fri Feb 22 14:52:32 2013 (+0ms) : debug : Registering Node Class : “RetimeImporter”
Fri Feb 22 14:52:32 2013 (+0ms) : debug : Evaluating all presets
Fri Feb 22 14:52:32 2013 (+0ms) : debug : Registering Preset : “Export Master camera”
Fri Feb 22 14:52:32 2013 (+0ms) : debug : Registering Preset : “Export Track camera”
Fri Feb 22 14:52:32 2013 (+0ms) : debug : Registering Preset : “Export Projection camera”
Fri Feb 22 14:52:32 2013 (+0ms) : debug : Registering Preset : “Create Rig exporter based on current scene”
Fri Feb 22 14:52:32 2013 (+0ms) : debug : Registering Preset : “Create Playblast based on current scene”
Fri Feb 22 14:52:32 2013 (+0ms) : debug : Registering Checks
Fri Feb 22 14:52:32 2013 (+0ms) : Importing animation
Fri Feb 22 14:52:32 2013 (+0ms) : debug : Import Checks folder: sceneJanitor.checks module: animation
Fri Feb 22 14:52:32 2013 (+47ms) : Importing asset
Fri Feb 22 14:52:32 2013 (+0ms) : debug : Import Checks folder: sceneJanitor.checks module: asset
Fri Feb 22 14:52:32 2013 (+16ms) : Importing geometry
Fri Feb 22 14:52:32 2013 (+0ms) : debug : Import Checks folder: sceneJanitor.checks module: geometry
Fri Feb 22 14:52:32 2013 (+14ms) : Importing naming
Fri Feb 22 14:52:32 2013 (+0ms) : debug : Import Checks folder: sceneJanitor.checks module: naming
Fri Feb 22 14:52:32 2013 (+16ms) : Importing rig
Fri Feb 22 14:52:32 2013 (+0ms) : debug : Import Checks folder: sceneJanitor.checks module: rig
Fri Feb 22 14:52:32 2013 (+0ms) : Importing scene
Fri Feb 22 14:52:32 2013 (+0ms) : debug : Import Checks folder: sceneJanitor.checks module: scene
Fri Feb 22 14:52:32 2013 (+16ms) : debug : Not creating comet menu in batch mode
Fri Feb 22 14:52:32 2013 (+0ms) : Registering maya specific queueSubmitter functions
Fri Feb 22 14:52:32 2013 (+0ms) : Setting up callbacks
Fri Feb 22 14:52:32 2013 (+0ms) : mayaStartup completed
Fri Feb 22 14:52:32 2013 (+0ms) : debug : Registering Checks for tboa
Fri Feb 22 14:52:32 2013 (+0ms) : Importing correctOceanBuffer
Fri Feb 22 14:52:32 2013 (+0ms) : debug : Import Checks folder: sanityChecks module: correctOceanBuffer
Fri Feb 22 14:52:32 2013 (+46ms) : Importing PublishException localOceanCache
Fri Feb 22 14:52:32 2013 (+46ms) : Finding plugis to autoload: //S2/exchange/software/managed/SoftwareConfiguration/mayaShared/env/projects/tboa/plugins;//S2/exchange/software/managed/SoftwareConfiguration/mayaShared/env/configs/default/plugins;C:/Program Files/Autodesk/Maya2012/bin/plug-ins;;C:/Program Files/Autodesk/Maya2012/bin/plug-ins;…/plugins/MsvTools042/plug-ins;C:/Program Files/NVIDIA Corporation/PhysX for Maya/Maya 2012/plug-ins;C:/Documents and Settings/ScanlineVFX/My Documents/maya/2012-x64/plug-ins;C:/Documents and Settings/ScanlineVFX/My Documents/maya/plug-ins;C:/Program Files/Autodesk/Maya2012/bin/plug-ins;…/plugins/MsvTools042/plug-ins;C:/Program Files/NVIDIA Corporation/PhysX for Maya/Maya 2012/plug-ins
Fri Feb 22 14:52:32 2013 (+0ms) : Dir starts with //s2, checking for files: //S2/exchange/software/managed/SoftwareConfiguration/mayaShared/env/projects/tboa/plugins
Fri Feb 22 14:52:32 2013 (+0ms) : Appending to list: DifferenceMesh.mll
Fri Feb 22 14:52:32 2013 (+0ms) : Dir starts with //s2, checking for files: //S2/exchange/software/managed/SoftwareConfiguration/mayaShared/env/configs/default/plugins
Fri Feb 22 14:52:32 2013 (+0ms) : Appending to list: AbcExport.mll
Fri Feb 22 14:52:32 2013 (+0ms) : Appending to list: AbcImport.mll
Fri Feb 22 14:52:32 2013 (+0ms) : Appending to list: IntersectRay.mll
Fri Feb 22 14:52:32 2013 (+0ms) : Appending to list: Maya2012ExocortexAlembic1.1.mll
Fri Feb 22 14:52:32 2013 (+0ms) : Appending to list: SOuP.mll
Fri Feb 22 14:52:32 2013 (+0ms) : Appending to list: XMeshLoader.mll
Fri Feb 22 14:52:32 2013 (+0ms) : Appending to list: closestPointOnCurve.py
Fri Feb 22 14:52:32 2013 (+0ms) : Appending to list: fbxmaya.mll
Fri Feb 22 14:52:32 2013 (+0ms) : Appending to list: pointOnGridMesh.mll
Fri Feb 22 14:52:32 2013 (+0ms) : Appending to list: poseDeformer.mll
Fri Feb 22 14:52:32 2013 (+0ms) : Appending to list: poseReader.mll
Fri Feb 22 14:52:32 2013 (+0ms) : Appending to list: rigBlueprint.py
Fri Feb 22 14:52:32 2013 (+0ms) : Appending to list: rigConfig.py
Fri Feb 22 14:52:32 2013 (+0ms) : Appending to list: rigRoot.py
Fri Feb 22 14:52:32 2013 (+0ms) : Appending to list: savePc2Cmd.mll
Fri Feb 22 14:52:32 2013 (+0ms) : Appending to list: cacheFilePc2.mll
Fri Feb 22 14:52:32 2013 (+0ms) : Appending to list: relaxNode.mll
Fri Feb 22 14:52:32 2013 (+0ms) : Loading plugin: DifferenceMesh.mll
Fri Feb 22 14:52:32 2013 (+78ms) : Loading plugin: AbcExport.mll
Fri Feb 22 14:52:33 2013 (+157ms) : Loading plugin: AbcImport.mll
Fri Feb 22 14:52:33 2013 (+108ms) : Loading plugin: IntersectRay.mll
Fri Feb 22 14:52:33 2013 (+93ms) : Loading plugin: Maya2012ExocortexAlembic1.1.mll
Fri Feb 22 14:52:33 2013 (+125ms) : Loading plugin: SOuP.mll
Fri Feb 22 14:52:34 2013 (+858ms) : Loading plugin: XMeshLoader.mll
Fri Feb 22 14:52:34 2013 (+94ms) : Loading plugin: closestPointOnCurve.py
Fri Feb 22 14:52:34 2013 (+94ms) : Loading plugin: fbxmaya.mll
Fri Feb 22 14:52:34 2013 (+311ms) : Loading plugin: pointOnGridMesh.mll
Fri Feb 22 14:52:34 2013 (+94ms) : Loading plugin: poseDeformer.mll
Fri Feb 22 14:52:34 2013 (+108ms) : Loading plugin: poseReader.mll
Fri Feb 22 14:52:35 2013 (+94ms) : Loading plugin: rigBlueprint.py
Fri Feb 22 14:52:35 2013 (+77ms) : Loading plugin: rigConfig.py
Fri Feb 22 14:52:35 2013 (+78ms) : Loading plugin: rigRoot.py
Fri Feb 22 14:52:35 2013 (+78ms) : Loading plugin: savePc2Cmd.mll
Fri Feb 22 14:52:35 2013 (+93ms) : Loading plugin: cacheFilePc2.mll
Fri Feb 22 14:52:35 2013 (+94ms) : Loading plugin: relaxNode.mll
Fri Feb 22 14:52:35 2013 (+94ms) : Loading plugin: Mayatomr
Fri Feb 22 14:52:36 2013 (+639ms) : Scanline Pipeline Maya Setup Complete!

Hey Laszlo,

Going forward, can you perhaps upload the actual log file, instead of pasting the contents into your replies? I’m finding it very difficult to keep up with what you posting, and compare logs, simply because I’m having to do a lot of scrolling. :slight_smile:

It definitely seems like most of these problems begin to occur around the time where a slave loses the ability to connect to the database and/or the repository. I’m assuming these are different machines (inferno and deadline.scanlinevfxla.com)? What operating system are these machines running?

Coincidentally, I have been running tests today to see how the slave behaves when it loses connection to the database and/or repository while in the middle of rendering, and I have already fixed a few bugs. It could be these bugs are the root of at least some of the issues you are seeing. These fixes will be included in beta 13, which we hope to get out next week. You can then upgrade your farm to see if this helps stabilize things.

Cheers,

  • Ryan

Also, any update on the slaves you were running in nogui mode?

I managed to look over many of the logs you posted here, and a common occurrence is that the post job script “removeSoftwareDepth.py” is involved. Is it possible that this script could dead lock at some point? I checked our code, and currently we are just running the main function and waiting for it to return. If it never returns, the slave could get stuck waiting for it to return indefinitely. This is a problem even if the job it’s working on is deleted.

I’m going to do some digging about what we could do to avoid this, but I’m just curious how “advanced” this post job script is, and if there is a chance it could lock up.

Cheers,

  • Ryan

Just an update that we’ve found a solution to pre/post job scripts potentially locking things up, which will also be included in beta 13.

Would it be possible to post your removeSoftwareDepth.py script?

Cheers,

  • Ryan

Ill try to do just attaches, its a bit more cumbersome, as i am remoting into slaves… to attach, ill need to copy the files somewhere instead of just doing a copy/paste :slight_smile:

Inferno is an isilon, running their freebsd based OS

deadline is the mongo machine, using centos 6.3

The contents of said script:

import os
from Deadline.Scripting import *

def removeDepthPasses(sDir):
for sContent in os.listdir(sDir):
if “_depth.jpg” in sContent:
os.remove(os.path.join(sDir, sContent))

def main(*args):
sDirs= SlaveUtils.GetCurrentJobValue(‘OutputDirectories’)
if len(sDirs) > 0:
removeDepthPasses(sDirs[0])

Nope not yet, i have not yet been able to do this.

Privacy | Site terms | Cookie preferences