Detecting hung tasks

I’ve been looking for a way to automatically detect hung tasks in Deadline 6. Ideally, I’d like to detect any lack of progress over a certain period of time. If a task is hung, it would be automatically resubmitted and the slave blacklisted. I haven’t seen a feature like this in Deadline, so I’m curious whether it’s something that can be scripted.

Other than using task timeouts, there isn’t a way to automatically determine if a render has hung in Deadline. Standard output isn’t always a reliable indicator (some renderers output little or nothing to stdout), so I’m not sure we would want that to be an “always on” feature, but maybe we could add a job option to enable it.

I think the only way to do this now would be to modify the Deadline plugin(s). You could add a stdout handler that matches every line of stdout (using the regex “.*”), and the function it calls could update an internal timestamp. Then you could have a thread in there that periodically checks the last timestamp and, if there’s been no output for a set period of time, calls FailRender(). This is just theoretical, so I’m not sure how well it would work.
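To make the idea a bit more concrete, here is a rough sketch of the timestamp-plus-thread part in plain Python. It is not Deadline code and I haven’t tried it inside a plugin: StdoutWatchdog, touch and on_timeout are names made up for illustration. The assumption is that the catch-all stdout handler would call touch() for every line, and that on_timeout would be whatever fails the render (FailRender() in the description above).

```python
import threading
import time


class StdoutWatchdog:
    """Hypothetical helper: calls on_timeout once if touch() is not
    called for timeout_seconds. Not part of the Deadline API."""

    def __init__(self, timeout_seconds, on_timeout, poll_interval=1.0):
        self.timeout_seconds = timeout_seconds
        self.on_timeout = on_timeout
        self.poll_interval = poll_interval
        self._last_output = time.monotonic()
        self._lock = threading.Lock()
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        # Reset the timestamp so the countdown begins when the render starts.
        self.touch()
        self._thread.start()

    def stop(self):
        # Ask the checker thread to exit; skip join() if we are that thread
        # or if it was never started.
        self._stop.set()
        if (self._thread.is_alive()
                and threading.current_thread() is not self._thread):
            self._thread.join()

    def touch(self, *args):
        # Call this for every line of stdout (extra handler args are ignored).
        with self._lock:
            self._last_output = time.monotonic()

    def _run(self):
        # Wake up every poll_interval seconds until stop() is called.
        while not self._stop.wait(self.poll_interval):
            with self._lock:
                idle = time.monotonic() - self._last_output
            if idle >= self.timeout_seconds:
                self.on_timeout()
                return
```

In a plugin you would presumably create the watchdog when the render process starts, call touch() from the handler that matches “.*”, and call stop() when the process exits, so a finished render is not reported as hung.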

Hm, I was hoping there was a way to tap into what was feeding the progress bar in the Monitor. It seems like more work for not so much return though. Thanks for the info!

So… it turns out we have a feature to do EXACTLY this already built into our plugin system. I just happened to stumble across this today while investigating a completely unrelated issue. The DeadlinePlugin/ManagedProcess class has a function called SetUpdateTimeout, which takes an integer representing the number of seconds the plugin will wait between lines of stdout before throwing an error. This is the description of the function:

You should be able to set this in the InitializeProcess function for your DeadlinePlugin class (for a simple plugin) or your ManagedProcess class (for an advanced plugin). As a simple example, here’s how it can be done in the Blender.py plugin file in \your\repository\plugins\Blender:

    def InitializeProcess(self):
        self.SingleFramesOnly=False
        self.StdoutHandling=True
        self.SetUpdateTimeout( 300 ) # throw an error if there is no new line of stdout for 300 seconds (5 minutes)
        
        #Std out handlers
        self.AddStdoutHandlerCallback(".*Fra:([0-9]+).*").HandleCallback += self.HandleStdoutRender
        self.AddStdoutHandlerCallback(".*Saved:.*").HandleCallback += self.HandleStdoutSaved
        #self.AddStdoutHandlerCallback(".*Error.*").HandleCallback += self.HandleStdoutError
        self.AddStdoutHandlerCallback("Unable to open.*").HandleCallback += self.HandleStdoutFailed
        self.AddStdoutHandlerCallback("Failed to read blend file.*").HandleCallback += self.HandleStdoutFailed
        self.AddStdoutHandlerCallback(".*Unable to create directory.*").HandleCallback += self.HandleStdoutFailed

Cheers,

Ryan

Oh awesome! I’ll have to dig into this Monday morning. I’m sure I’ll have a few questions though.

Thanks!