AWS Thinkbox Discussion Forums

Deadline not Recognizing Slaves as Crashed

Hi,
We’re using version 6.1.0.54584, and we’re having an issue where slaves are crashing on a 3ds Max job but still show up as rendering. Do you have any idea what could be causing this?
Thanks!
-Shem

Deadline’s housecleaning operation is responsible for detecting slaves that have hung or crashed and marking them as stalled. Are you guys running Pulse? If not, running Pulse will ensure that housecleaning is performed on a regular basis. Otherwise, it is performed randomly by slaves in between tasks.

A slave is determined to be stalled if it hasn’t updated its state in a certain period of time. This time can be configured in the repository options under Slave Settings (the Wait Times tab):
thinkboxsoftware.com/deadlin … Settings_2
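
To illustrate the idea, here is a minimal Python sketch of that kind of timeout check. It is not Deadline's actual code or API: the function name `find_stalled`, the slave names, and the 10-minute default are all made up for the example, and the real wait time is whatever is set in the repository options.

```python
from datetime import datetime, timedelta


def find_stalled(last_updates, now, timeout_minutes=10):
    """Return the names of slaves whose last state update is older than the timeout.

    last_updates maps a slave name to the time it last reported its state.
    The 10-minute default is an assumed value for illustration only.
    """
    timeout = timedelta(minutes=timeout_minutes)
    return sorted(name for name, ts in last_updates.items() if now - ts > timeout)


# Hypothetical example data: one slave reported recently, one has gone quiet.
now = datetime(2014, 1, 1, 12, 0, 0)
last_updates = {
    "render01": now - timedelta(minutes=2),
    "render02": now - timedelta(minutes=45),
}

print(find_stalled(last_updates, now))
```

With a check like this, a crashed slave keeps showing as rendering until its last update falls outside the timeout and a housecleaning pass actually runs.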

Cheers,
Ryan

I see. We have Pulse running, and it’s performing housecleaning constantly. We’ve also noticed some significant slowdown.

By default, housecleaning occurs every 30 seconds, and if your farm is often busy and requires lots of job cleanup, pending jobs being released, etc, it would make sense that you would see lots of housecleaning operations going on.

Can you elaborate a bit on this? Does the Pulse application itself “slow down”, or do the housecleaning operations just take a while? If you enable Pulse verbose logging in the repository options (under Application Logging), Pulse should print out stats on how much is being cleaned up and how long each housecleaning step took. If you could send us this verbose log, we can take a look and see which parts are taking longer than others. We can then ask some follow-up questions and determine whether the time taken to run these operations is expected or not.

Thanks!
Ryan

For some reason, everything within Deadline started slowing down about two weeks ago following some maintenance downtime on the machine running the Deadline Repository. We’ve seen a lot of machines stalling with memory issues (which may or may not be related to Deadline), and a handful of jobs constantly getting stuck on the “Starting Up” phase.

A few details: the downtime lasted for about an hour. Also, I had to downgrade Deadline from 6.2 to 6.1 because we didn’t have the latest license until just recently. We’re still on the latest build of 6.1. Could any of these have affected the farm in this way?

Thanks for the support!

(“Everything within Deadline” includes reorganizing pools, requeueing jobs, suspending jobs, etc.)

Weird. The only thing I can think of is that maybe when taking the machine down, something got messed up in the database. Running a database repair should resolve the issue though if that’s the case:
viewtopic.php?f=86&t=10767&p=46973&hilit=repair#p46973

Give that a try and let us know if it makes a difference or not.

Cheers,
Ryan

Okay, sounds good. Our farm is a bit slammed at the moment, so I’ll give it a shot when the load dies down a little bit.
Thanks!
