AWS Thinkbox Discussion Forums

stalled slaves eating my farm

DL v6.0.0.51030 beta 18

I have three machines on my farm that seem to be stalling on all renders. I'm still looking into why, but the issue is that the stall doesn't show up as an error; instead the slave 'completes' the task, and I end up with a job full of completed tasks that each took less than 3 seconds, with no frames to show for it.

I'm sure you'd need some error reports to help debug this; would these be in the slave reports? The last slave report for one of the offending nodes is as follows:

STALLED SLAVE REPORT

Current House Cleaner Information
Machine Performing Cleanup: Z064
Version: v6.0.0.51030 R

Stalled Slave: Z031
Slave Version: v6.0.0.51030 R
Last Slave Update: 0001-01-01 00:00:01
Current Time: 2013-04-18 03:46:40
Time Difference: 20.136 c
Maximum Time Allowed Between Updates: 10.000 m

Current Job Name:
Current Job ID:
Current Job User:
Current Task Names:
Current Task Ids:

Do not have enough information to identify stalled job/task.
Need at least a job name or a task name.
Attempting to check all active jobs to see if they have this slave rendering a task.
Setting slave’s status to Stalled.
Setting last update time to now.

Slave state updated.

A stalled slave should never result in a completed task. Can you check the reports for the job to see if there are Log reports that correspond to the completed tasks? If there are, can you post one? If there are no Log reports for the completed tasks, we’ll have to do some digging…

Thanks!

  • Ryan

I just noticed that in the stalled slave report, the time difference was over 20 centuries! I checked our stalled slave detection logic and found the bug that caused this. It will be fixed in the next beta.
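
To illustrate the failure mode (this is just a rough Python sketch, not the actual Deadline code; the names and the guard shown are illustrative):

    from datetime import datetime, timedelta

    NEVER_UPDATED = datetime.min              # 0001-01-01 00:00:00, the "no update recorded" default
    MAX_GAP = timedelta(minutes=10)           # maximum time allowed between updates

    def is_stalled(last_update, now):
        # If the slave has never checked in, the stored timestamp is still near
        # datetime.min, so now - last_update comes out to roughly 20 centuries.
        # One possible guard: treat that case separately instead of comparing
        # it against MAX_GAP like an ordinary gap.
        if last_update <= NEVER_UPDATED + timedelta(seconds=1):
            return False                      # no real update on record; handle elsewhere
        return (now - last_update) > MAX_GAP

    now = datetime(2013, 4, 18, 3, 46, 40)
    print(is_stalled(datetime(1, 1, 1, 0, 0, 1), now))   # False with the guard in place
    print(is_stalled(now - timedelta(minutes=25), now))  # True: genuinely stalled

That is why the report above shows a time difference of 20.136 c (centuries) against a maximum of 10.000 m (minutes).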

Cheers,

  • Ryan

ah, cool. thank you.
