AWS Thinkbox Discussion Forums

Deadline Slaves not Corresponding to Jobs/Tasks

Hi,

  1. 6.2.1.40 R (6bc36088d)
  2. Windows 7
  3. 3ds Max
  4. Run a job that hangs

I’ve included a screenshot of what’s happening; it looks like the machines with the 20 hr frames aren’t showing up under the tasks for the associated job. When I remote into the machine, I find slaves either crashed out, or hanging on frames. Shouldn’t the slaves either time out or display themselves as stalled?
Thanks for looking!
-Shem

They could be running event plugins for that job. Events have no rules on how long they can occupy a slave for, and they don’t show up anywhere in the job or task statuses.

When I remote into the machine, I find slaves either crashed out, or hanging on frames. Shouldn't the slaves either time out or display themselves as stalled?

Do you have a timeout set for these jobs? Just wondering if the problem is that the slave isn’t respecting the timeout that is already set for the job.

In the cases where the slave has crashed, is the slave process still running, but a crash window is being displayed? If so, it could be that one of the slave threads has crashed, but the thread that reports the status is still running just fine, so the slave never misses a status update and is therefore never marked as stalled. We think we can improve robustness here in the future by having the status thread simply check the status of all the slave threads and base its state on that.
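
To make that idea concrete, here is a rough Python sketch of a status check that looks at every worker thread before reporting. This is only an illustration, not Deadline's actual code: the function names, thread names, and the "rendering"/"stalled" labels are all hypothetical.

```python
import threading


def derive_status(worker_threads):
    """Base the reported state on the liveness of every worker thread.

    Hypothetical helper: returns ("stalled", [names of dead threads]) if any
    worker has exited, otherwise ("rendering", []).
    """
    dead = [t.name for t in worker_threads if not t.is_alive()]
    if dead:
        return "stalled", dead
    return "rendering", dead


def demo():
    """Simulate one healthy render thread and one that has died."""
    stop = threading.Event()
    # Healthy worker: blocks until told to stop, so it stays alive.
    healthy = threading.Thread(target=stop.wait, name="render-0", daemon=True)
    # "Crashed" worker: exits immediately, standing in for a dead thread.
    crashed = threading.Thread(target=lambda: None, name="render-1", daemon=True)
    healthy.start()
    crashed.start()
    crashed.join()  # make sure the simulated crash has actually happened

    status = derive_status([healthy, crashed])

    stop.set()
    healthy.join()
    return status


if __name__ == "__main__":
    print(demo())
```

In this sketch the reporting side no longer assumes that "I am still able to send updates" means "everything is fine"; a single dead worker is enough to change the reported state.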

If there is a crash window for the slave, can you send a screenshot of it?

Thanks!
Ryan

Unfortunately, we removed the job because it was failing a bunch of machines, so we don’t have a screenshot as an example. Most of the time it was the generic “3ds Max has crashed” standard Windows dialog. I’ll try to post it here if it happens within the next few weeks.
Thanks!
