AWS Thinkbox Discussion Forums

deadline slave crash - orphaned task

It's happening on a bunch of jobs again:

[Screenshot: crashed_slave.PNG]

In this case, the slave crashed during a Nuke render, and then the slave itself disappeared. Oddly, this happened right after it dequeued a job. Last lines of the log:

2014-08-20 17:47:10:  0: STDOUT: Nuke 7.0v8, 64 bit, built Jun  7 2013.
2014-08-20 17:47:10:  0: STDOUT: Copyright (c) 2013 The Foundry Visionmongers Ltd.  All Rights Reserved.
2014-08-20 17:47:21:  0: INFO: Process exit code: 1073807364
2014-08-20 17:47:22:  0: An exception occurred: Error in RenderTasks: Error in CheckExitCode: Renderer returned non-zero error code, 1073807364. Check the log for more information.
2014-08-20 17:47:22:     at Deadline.Plugins.ScriptPlugin.RenderTasks(String taskId, Int32 startFrame, Int32 endFrame, String& outMessage, AbortLevel& abortLevel) (Deadline.Plugins.RenderPluginException)
2014-08-20 17:47:22:  0: Unloading plugin: Nuke
2014-08-20 17:47:22:  Scheduler Thread - Render Thread 0 threw a major error: 
2014-08-20 17:47:22:  >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
2014-08-20 17:47:22:  Exception Details
2014-08-20 17:47:22:  RenderPluginException -- Error in RenderTasks: Error in CheckExitCode: Renderer returned non-zero error code, 1073807364. Check the log for more information.
2014-08-20 17:47:22:     at Deadline.Plugins.ScriptPlugin.RenderTasks(String taskId, Int32 startFrame, Int32 endFrame, String& outMessage, AbortLevel& abortLevel)
2014-08-20 17:47:22:  RenderPluginException.Cause: JobError (2)
2014-08-20 17:47:22:  RenderPluginException.Level: Major (1)
2014-08-20 17:47:22:  RenderPluginException.HasSlaveLog: True
2014-08-20 17:47:22:  Exception.Data: ( )
2014-08-20 17:47:22:  Exception.TargetSite: Void RenderTask(System.String, Int32, Int32)
2014-08-20 17:47:22:  Exception.Source: deadline
2014-08-20 17:47:22:    Exception.StackTrace: 
2014-08-20 17:47:22:     at Deadline.Plugins.Plugin.RenderTask(String taskId, Int32 startFrame, Int32 endFrame)
2014-08-20 17:47:22:     at Deadline.Slaves.SlaveRenderThread.RenderCurrentTask(TaskLogWriter tlw)
2014-08-20 17:47:22:  <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
2014-08-20 17:47:22:  Scheduler Thread - Seconds before next job scan: 1
2014-08-20 17:47:23:  Scheduler - Performing Job scan on Primary Pools with scheduling order Pool, Weighted, Balanced
2014-08-20 17:47:23:  Scheduler - The 53f53dcd13ead4410cb4f05b limit is maxed out.
2014-08-20 17:47:23:  Scheduler - The 53e98f608f4516165ccc0d78 limit is maxed out.
2014-08-20 17:47:23:  Scheduler - The 53ed280a14529e044415f757 limit is maxed out.
2014-08-20 17:47:23:  Scheduler - The 53f53b9113ead43e86a8d606 limit is maxed out.
2014-08-20 17:47:23:  Scheduler - The 53f291b4d893cc249c5bb458 limit is maxed out.
2014-08-20 17:47:23:  Scheduler - The 53f541c3a8fa77183410256e limit is maxed out.
2014-08-20 17:47:23:  Scheduler - The 53f54041f7b14d24bc9596df limit is maxed out.
2014-08-20 17:47:23:  Scheduler - The 53f5417d032f751cac6d4ea1 limit is maxed out.
2014-08-20 17:47:23:  Scheduler - The 53f528f5cfe1f11f5078ed64 limit is maxed out.
2014-08-20 17:47:23:  Scheduler - The 53f52bfede293025403c88f2 limit is maxed out.
2014-08-20 17:47:23:  Scheduler - The 53f52ed1d9c9701a0c48b46d limit is maxed out.
2014-08-20 17:47:23:  Scheduler - The 53f53cd73178bd33446a739e limit is maxed out.
2014-08-20 17:47:23:  Scheduler - The 53f53ce13178bd6fdc47e87d limit is maxed out.
2014-08-20 17:47:23:  Scheduler - The 53f53cec3178bd1cd87f3085 limit is maxed out.
2014-08-20 17:47:23:  Scheduler - The 53f53f5a8657680eb420b476 limit is maxed out.
2014-08-20 17:47:23:  Scheduler - The 53f540d83afddf13dc51000c limit is maxed out.
2014-08-20 17:47:23:  Scheduler - The 53f5418c032f753bf842fd29 limit is maxed out.
2014-08-20 17:47:24:  Scheduler - The ingest2d limit is maxed out.
2014-08-20 17:47:24:  Scheduler - The ingest2d limit is maxed out.
2014-08-20 17:47:24:  Scheduler - Slave has been marked bad for job 53f541a6be270014a4ad7929, skipping this job.
2014-08-20 17:47:24:  Scheduler - Successfully dequeued 1 task(s).  Returning.
2014-08-20 17:47:24:  0: Shutdown
2014-08-20 17:47:24:  0: Shutdown
2014-08-20 17:47:24:  0: Exited ThreadMain(), cleaning up...
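
For anyone looking up the exit code from the log above, it is easier to search for in hexadecimal. A minimal sketch of the conversion follows. The lookup table is an illustration only: it assumes the code is a Windows status value, in which case 0x40010004 matches DBG_TERMINATE_PROCESS. Whether that explains this particular crash is not established here.

```python
# Exit code copied from the "Process exit code" line in the log above.
exit_code = 1073807364

# Windows status values are normally documented in hexadecimal.
hex_code = f"0x{exit_code:08X}"
print(hex_code)

# Assumption: the exit code is a Windows status value. Only the one
# entry relevant to this log is listed.
KNOWN_STATUS = {
    0x40010004: "DBG_TERMINATE_PROCESS",
}
name = KNOWN_STATUS.get(exit_code, "unknown")
print(name)
```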

It had been assigned task 15 of job 53f541b31442944998a5caf6.
In the first lines of the next log, you can see that although the slave returns the limit stub for that job, the task's status is still invalid (it says ‘starting up’):

2014-08-20 17:53:39:  BEGIN - LAPRO0618\scanlinevfx
2014-08-20 17:53:39:  Deadline Slave 6.2 [v6.2.0.32 R  (2563d5bc8)]
2014-08-20 17:56:09:  Auto Configuration: A ruleset has been received
2014-08-20 17:56:09:  Auto Configuration: Setting Launch Slave At Startup value to 'false'
2014-08-20 17:56:09:  Auto Configuration: Setting Restart Stalled Slave value to 'True'
2014-08-20 17:56:13:  Info Thread - Created.
2014-08-20 17:56:13:  Slave 'LAPRO0618-secondary' has stalled because it has not updated its state in 8.342 m. Performing house cleaning...
2014-08-20 17:56:13:  Could not find associated job class though.
2014-08-20 17:56:17:  Trying to connect using license server '27001@lapro0001.scanlinevfxla.com'...
2014-08-20 17:56:18:  License obtained.
2014-08-20 17:56:18:  The license file being used will expire in 100 days.
2014-08-20 17:56:18:  Scheduler Thread - Slave initialization complete.
2014-08-20 17:56:18:  Scheduler - Performing Job scan on Primary Pools with scheduling order Pool, Weighted, Balanced
2014-08-20 17:56:18:  Scheduler - The 53e98f608f4516165ccc0d78 limit is maxed out.
2014-08-20 17:56:18:  Scheduler - The 53ed280a14529e044415f757 limit is maxed out.
2014-08-20 17:56:18:  Scheduler - The 53f291b4d893cc249c5bb458 limit is maxed out.
2014-08-20 17:56:18:  Scheduler - Slave is not whitelisted for quicktime_confirmed_working limit.
2014-08-20 17:56:18:  Scheduler - The 53f52bfede293025403c88f2 limit is maxed out.
2014-08-20 17:56:18:  Scheduler - The 53f52ed1d9c9701a0c48b46d limit is maxed out.
2014-08-20 17:56:18:  Scheduler - The 53f53cd73178bd33446a739e limit is maxed out.
2014-08-20 17:56:18:  Scheduler - The 53f53cec3178bd1cd87f3085 limit is maxed out.
2014-08-20 17:56:18:  Scheduler - The 53f5417d032f751cac6d4ea1 limit is maxed out.
2014-08-20 17:56:18:  Scheduler - The 53f5418c032f753bf842fd29 limit is maxed out.
2014-08-20 17:56:18:  Scheduler - The 53f542052a844a1274a2de5e limit is maxed out.
2014-08-20 17:56:18:  Scheduler - The 53f5433191cd504bf4983f6b limit is maxed out.
2014-08-20 17:56:18:  Scheduler - The 53f5431efa354f1530a30422 limit is maxed out.
2014-08-20 17:56:18:  Scheduler - The ingest2d limit is maxed out.
2014-08-20 17:56:18:  Scheduler - Slave has been marked bad for job 53f541a6be270014a4ad7929, skipping this job.
2014-08-20 17:56:19:  Scheduler - The ingest2d limit is maxed out.
2014-08-20 17:56:19:  Scheduler - Successfully dequeued 1 task(s).  Returning.
2014-08-20 17:56:19:  Scheduler - Returning limit stubs not in use.
2014-08-20 17:56:19:  Scheduler -   returning 53f541e9df1c6d1a902c7908
2014-08-20 17:56:19:  Scheduler -   returning 53f541b31442944998a5caf6
2014-08-20 17:56:19:  Scheduler -   returning 53f541ce542d661b24ebf107
2014-08-20 17:56:19:  Scheduler -   returning 53f54210542d6615406aa334
2014-08-20 17:56:19:  Scheduler -   returning 53f54237e74b5415acdf9b4e
2014-08-20 17:56:19:  Scheduler -   returning 53f542e8162e4f094838cd8e
2014-08-20 17:56:19:  Scheduler -   returning 53f542c1faa7dc3ce0aafba7
2014-08-20 17:56:19:  Scheduler -   returning 53f5405ffeddf81184a1a9d6
2014-08-20 17:56:19:  Scheduler Thread - Synchronizing job auxiliary files from \\inferno2.scanlinevfxla.com\deadline\repository6\jobs\53f543a67165a511f89a322e
2014-08-20 17:56:19:  Scheduler Thread - All job files are already synchronized
2014-08-20 17:56:19:  Scheduler Thread - Synchronizing plugin files from \\inferno2.scanlinevfxla.com\deadline\repository6\plugins\Nuke
2014-08-20 17:56:20:  Scheduler Thread - Synchronization time for plugin files: 356.698 ms
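
To check a log like the one above without reading it by eye, the returned limit stubs can be pulled out with a small script. This is a rough sketch that assumes the "returning <id>" line format shown in this log. The function name and the embedded excerpt are just for illustration and are not part of Deadline.

```python
import re

# Short excerpt of the second slave log, copied from above.
LOG = """\
2014-08-20 17:56:19:  Scheduler - Returning limit stubs not in use.
2014-08-20 17:56:19:  Scheduler -   returning 53f541e9df1c6d1a902c7908
2014-08-20 17:56:19:  Scheduler -   returning 53f541b31442944998a5caf6
2014-08-20 17:56:19:  Scheduler -   returning 53f541ce542d661b24ebf107
2014-08-20 17:56:19:  Scheduler Thread - All job files are already synchronized
"""

# Matches only the indented "returning <24 hex chars>" lines, not the
# "Returning limit stubs not in use." header line.
RETURNING = re.compile(r"Scheduler -\s+returning ([0-9a-f]{24})\s*$")


def returned_stubs(log_text):
    """Return the limit stub IDs listed on 'returning' lines, in order."""
    stubs = []
    for line in log_text.splitlines():
        match = RETURNING.search(line)
        if match:
            stubs.append(match.group(1))
    return stubs


# The job whose task 15 was left orphaned.
orphaned_job = "53f541b31442944998a5caf6"
stubs = returned_stubs(LOG)
print(orphaned_job in stubs)
```

This only confirms that the stub was returned. The task status itself still has to be checked in the Monitor.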

Regular housecleaning on Pulse didn't catch this, but when I triggered one manually, it did…

That’s the randomness problem (the randomness isn’t applied when the Monitor runs housecleaning, only when Pulse or the slaves do). In Deadline 7, housecleaning will be guaranteed to run on a regular interval, regardless of which application is running it.
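
To illustrate the difference described above, here is a toy simulation. It is not Deadline's actual scheduling code, and all names and numbers in it are made up. It assumes the randomized trigger can be modeled as a per-tick chance of running, and compares the longest stretch without a run against a fixed interval.

```python
import random


def randomized_runs(ticks, chance, rng):
    """Ticks at which a check that fires with probability `chance` runs."""
    return [t for t in range(ticks) if rng.random() < chance]


def fixed_interval_runs(ticks, interval):
    """Ticks at which a check that fires every `interval` ticks runs."""
    return [t for t in range(ticks) if t % interval == 0]


def longest_gap(runs, ticks):
    """Longest stretch of consecutive ticks with no run at all."""
    edges = [-1] + runs + [ticks]
    return max(b - a - 1 for a, b in zip(edges, edges[1:]))


ticks = 100
rng = random.Random(0)  # seeded only so repeated runs give the same result

# Hypothetical numbers: a 10% chance per tick versus a run every 10 ticks.
random_gap = longest_gap(randomized_runs(ticks, 0.1, rng), ticks)
fixed_gap = longest_gap(fixed_interval_runs(ticks, 10), ticks)

print("longest gap, randomized:", random_gap)
print("longest gap, fixed interval:", fixed_gap)
```

With a fixed interval, the longest gap can never exceed the interval. With a randomized trigger there is no such bound, which fits a stalled slave going unnoticed for a while.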

Is this fix already in beta 2?

No, but it will be in beta 3. Because we have to rebuild the Qt and Python libraries, we probably won’t be able to get beta 3 out until the first week of September.
