Hi,
I have noticed over the last couple of weeks that some of my slower machines stall a lot on heavy 3ds Max render jobs. The error report is below. I am guessing there is a timeout setting somewhere that I could increase? It's frustrating, as the same jobs do render through Backburner, just very slowly.
Error Message
STALLED SLAVE REPORT
Current House Cleaner Information
Machine Performing Cleanup: ********
Version: v4.1.0.42706 R
Stalled Slave: *******
Last Slave Update: 2011-05-25 09:57:38
Current Time: 2011-05-25 10:08:19
Time Difference: 10.698 m
Maximum Time Allowed Between Updates: 10 m
Current Job Name: 2001_8003
Current Task Names: 9
Current Task Ids: 9
Searching for job with id “999_050_999_3e35dbf7”
Found possible job: 2001_8003
Searching for task with id “9”
Found possible task: :[9-9]
Task’s current slave: ********
Slave machine names match, stopping search
Associated Job Found: 2001_8003
Associated Task Found: :[9-9]
Task’s current slave: ********
Task is still rendering, attempting to fix situation.
Requeuing task
Setting slave’s status to Stalled.
Setting last update time to now.
Slave state updated.
Any help would be great, or tell me to upgrade to 5.0!!
Mark
Hi,
I believe this is due to the system time on your slave being more than 10 minutes out of sync with your Deadline file repository. Deadline then interprets the slave as having failed and re-queues the frame.
Check out your repository options, under “SNTP Date/Time Synchronization Settings”. Ideally, you would have an SNTP server set up on your domain. Your local sysadmin will know where this might be hosted (on your domain controller, for example). Alternatively, set the system time manually on all your slaves and the Deadline file server so that they match.
Hope this helps,
Mike
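A rough sketch of why clock skew could trigger a false stall. It assumes the cleanup machine compares a timestamp stamped by the slave's own clock against its own current time; that is an assumption about how Deadline 4.x behaves, and the function and variable names here are made up for illustration:

```python
from datetime import datetime, timedelta

def apparent_gap(true_update_time, checker_now, slave_clock_offset):
    """Gap the checking machine would see if the slave stamps its
    updates with a clock that is off by slave_clock_offset (hypothetical model)."""
    stamped_time = true_update_time + slave_clock_offset
    return checker_now - stamped_time

# The slave really reported in one minute ago...
checker_now = datetime(2011, 5, 25, 10, 8, 19)
true_update_time = checker_now - timedelta(minutes=1)

# ...but its clock runs 10 minutes behind the checking machine.
gap = apparent_gap(true_update_time, checker_now, timedelta(minutes=-10))

max_allowed = timedelta(minutes=10)
print(gap > max_allowed)  # the healthy slave would look stalled
```

Under this model, a slave that is updating normally still exceeds the 10 minute limit purely because of the clock difference.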
You could try bumping up the stalled slave detection delay in the Repository Options. In the Monitor, while in Super User Mode, select Tools -> Configure Repository Options. Then bump up the “Stalled Slave Delay” setting in the Slave Settings. Maybe try something like 30 minutes to see if that helps.
As Mike stated, you should also make sure that your date/time is synced up across all machines. Note though that this is no longer a requirement in Deadline 5.0 and later.
Cheers,
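To see what a longer delay would change, here is a small sketch that redoes the comparison from the stalled slave report using the two timestamps it lists. The helper names are made up for illustration, and the real check inside Deadline may differ in detail:

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"

def minutes_since_update(last_update, current_time):
    """Minutes between the slave's last update and the cleanup time."""
    delta = datetime.strptime(current_time, FMT) - datetime.strptime(last_update, FMT)
    return delta.total_seconds() / 60.0

def is_stalled(last_update, current_time, delay_minutes):
    """True if the gap exceeds the allowed delay (assumed comparison)."""
    return minutes_since_update(last_update, current_time) > delay_minutes

# Values copied from the report in the first post.
last_update = "2011-05-25 09:57:38"
current_time = "2011-05-25 10:08:19"

print(round(minutes_since_update(last_update, current_time), 1))
for delay in (10, 30):
    print(delay, is_stalled(last_update, current_time, delay))
```

With the whole-second timestamps from the report the gap comes out at roughly 10.7 minutes, which is over the 10 minute limit but well under a 30 minute one.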
Yes, that's what I thought, but I have checked and all slaves have the same time as the repository; they sync when they log in to the domain controller.
I have found where I can extend the time allowed before a slave is flagged as stalled, thanks Ryan, and I’ve doubled it to 20 minutes. I'll have to see if that helps. When a slave starts rendering a heavy scene it must stop communicating its update time back or something, so it gets picked up as stalled after 10 minutes.
I'll soon be free to upgrade to 5.0, so if it's no longer an issue there, I'll shut up!!
Thanks Mike and Ryan!