We've started seeing a lot of errors like this since we rolled out beta13. The slave picks up a task, starts rendering, then pops this report right away. Note that the slave's system time seems to be off by an hour:
STALLED SLAVE REPORT
Current House Cleaner Information
Machine Performing Cleanup: lapro0216
Version: v6.0.0.50509 R
Stalled Slave: LAPRO0233
Slave Version: v6.0.0.50509 R
Last Slave Update: 2013-03-01 12:04:56
Current Time: 2013-03-01 13:04:58
Time Difference: 1.001 hrs
Maximum Time Allowed Between Updates: 10.000 m
Current Job Name: [TBOA] Software Render: FB_090_1770_maya_animation_layout.ma version: v0013
Current Job ID: 51310898f5ec9b07b48a2868
Current Job User: Winn.OBrien
Current Task Names: 1151-1165
Current Task Ids: 10
Searching for job with id “51310898f5ec9b07b48a2868”
Found possible job: [TBOA] Software Render: FB_090_1770_maya_animation_layout.ma version: v0013
Searching for task with id “10”
Found possible task: 10:[1151-1165]
Task’s current slave: LAPRO0233
Slave machine names match, stopping search
Associated Job Found: [TBOA] Software Render: FB_090_1770_maya_animation_layout.ma version: v0013
Job User: Winn.OBrien
Submission Machine: LAPRO3060
Submit Time: 03/01/2013 12:02:34
Associated Task Found: 10:[1151-1165]
Task’s current slave: LAPRO0233
Task is still rendering, attempting to fix situation.
Requeuing task
Setting slave’s status to Stalled.
Setting last update time to now.
I noticed that the slave was reporting proper times before beta13, then it started popping values an hour later.
The particular slave had error reports logged at (these are the times listed in the slave ‘right click / slave reports’ popup dialog):
13.10 <-- now using v6.0.0.50509 R
13.07 <-- now using v6.0.0.50509 R
13.05 <-- now using v6.0.0.50509 R
11.58 <-- still using v6.0.0.50272 R
11.56 <-- still using v6.0.0.50272 R
10.57 <-- still using v6.0.0.50272 R
We did make a breaking change between beta 12 and 13 with respect to reports, so it’s possible those times are misleading.
Jon and I walked through the code on Friday, and we couldn’t find any reason why this would be happening. When checking for stalled slaves, Deadline always uses the database time and adjusts it to local time, so even if the database is in a different time zone, it shouldn’t matter.
Just in case, can you check the time on ‘lapro0216’ if you haven’t done so already? This is the machine that reported that the slave on ‘LAPRO0233’ was stalled.
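For anyone following along, here is a rough sketch of the kind of check described above. It is not Deadline's actual code. The function name `is_stalled`, the fixed UTC-8 offset, and the "healthy" timestamp are assumptions made for illustration; only the 10 minute limit and the two timestamps from the report come from this thread. The point is that when both timestamps are timezone-aware, the comparison does not depend on which zone either one is expressed in, so a one-hour gap has to come from a timestamp that was written or converted incorrectly.

```python
from datetime import datetime, timedelta, timezone

# "Maximum Time Allowed Between Updates" from the report above.
MAX_UPDATE_GAP = timedelta(minutes=10)

# Assumed offset for illustration only; the thread does not state the zone.
LOCAL = timezone(timedelta(hours=-8))


def is_stalled(last_update, db_now, max_gap=MAX_UPDATE_GAP):
    """Return True when the gap between two aware timestamps exceeds max_gap."""
    if last_update.tzinfo is None or db_now.tzinfo is None:
        raise ValueError("both timestamps must be timezone-aware")
    return db_now - last_update > max_gap


# "Current Time" from the report.
db_now = datetime(2013, 3, 1, 13, 4, 58, tzinfo=LOCAL)

# Hypothetical healthy update, two seconds before the check.
healthy_update = datetime(2013, 3, 1, 13, 4, 56, tzinfo=LOCAL)

# "Last Slave Update" from the report, one hour earlier than expected.
shifted_update = datetime(2013, 3, 1, 12, 4, 56, tzinfo=LOCAL)

# Gap in hours, rounded the way the report prints it.
gap_hours = round((db_now - shifted_update).total_seconds() / 3600, 3)

print(is_stalled(healthy_update, db_now))                           # within the limit
print(is_stalled(healthy_update, db_now.astimezone(timezone.utc)))  # same answer in UTC
print(is_stalled(shifted_update, db_now), gap_hours)                # flagged as stalled
```

With the shifted timestamp the gap works out to the same 1.001 hrs shown in the report, which is well past the 10 minute limit.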
Thanks for checking that. It would appear that the slave machine’s local time is still playing a role, which is something we were trying to avoid in Deadline 6. We’ll look into it.
Given that date (and the current date), I’m willing to bet it was a Daylight Saving Time thing (April 2008 would’ve been in DST, while the current date is not). We’ll make sure that it won’t stay a problem.
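As a hedged illustration of how a DST mismatch can produce exactly one hour of skew: if a UTC timestamp is converted to local time using the offset that was in effect on some other date, the result is off by the DST delta. This is only a guess at the mechanism, not a description of Deadline's code. The two fixed offsets below are assumptions (US Pacific standard and daylight time), and `to_local` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone

# Assumed offsets for illustration; the thread does not state the zone.
STANDARD = timezone(timedelta(hours=-8))  # offset assumed for 2013-03-01
DAYLIGHT = timezone(timedelta(hours=-7))  # offset assumed for a date in April 2008


def to_local(utc_time, offset):
    """Convert an aware UTC time to a naive local time using the given offset."""
    return utc_time.astimezone(offset).replace(tzinfo=None)


# Hypothetical UTC timestamp for a slave update.
utc_update = datetime(2013, 3, 1, 21, 4, 56, tzinfo=timezone.utc)

# Conversion with the offset that matches the timestamp's own date.
correct = to_local(utc_update, STANDARD)

# Conversion with an offset borrowed from a date that was in DST.
wrong = to_local(utc_update, DAYLIGHT)

skew = wrong - correct
print(correct, wrong, skew)
```

Picking the offset from the timestamp being converted, rather than from a reference date, avoids the mismatch.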