Slaves Hanging

Discussion and Support of Deadline, the Render Management System
david3d
Posts: 9
Joined: Mon Nov 21, 2005 11:53 am

Slaves Hanging

Postby david3d » Wed Nov 23, 2005 4:04 am

I occasionally run into a situation where Fusion (4) tasks seem to hang, requiring me to requeue them to resolve it. They hang for hours and, once requeued, render out in minutes. I was curious whether anyone else is running into this problem and whether the new version addresses it.



David Miller

LaszloSebo
Posts: 2581
Joined: Mon Nov 21, 2005 7:30 am

Re: Slaves Hanging

Postby LaszloSebo » Wed Nov 23, 2005 4:25 am

> I occasionally run into a situation where Fusion (4) tasks seem to hang.
> Requiring me to requeue the tasks to resolve it. They hang for hours and
> once requeued render out in minutes. I was curious if anyone else was
> running into this problem and if the new version addresses this.


Is the hanging due to a crash? Can you VNC into the machine while it's
"taking a rest"?

We had occasional crashes with DF, and setting a default slave timeout
took care of that. DF script is really touchy, and tends to flip out
every now and then. Deadline can't catch all the crashes, so a master
timeout is usually a good solution, especially with DF renders, where
you can't be surprised by a multi-hour-long frame on occasion (like
with mental ray & motion blur) ;)


cheers,
laszlo

david3d
Posts: 9
Joined: Mon Nov 21, 2005 11:53 am

Re: Slaves Hanging

Postby david3d » Wed Nov 23, 2005 5:27 am

The machine itself is responding fine. I have thought about using the task timeout, but it seems a bit messy, especially when the same flow can contain tasks that take minutes and tasks that take hours. Is there anything I can do to help the development team catch the Fusion errors that cause this problem?



David Miller

null
Posts: 1
Joined: Thu Nov 24, 2005 1:31 pm

Re: Slaves Hanging

Postby null » Thu Nov 24, 2005 1:53 pm

A task timeout is messy because it is an arbitrary number, but it may be possible to change the semantics of the slave so that if Fusion hasn't consumed any CPU time (or its memory usage hasn't changed) for some inordinate amount of time, the task is killed and the frame requeued. This of course will not work if the hang is an infinite loop (constantly chewing CPU time while doing nothing productive) rather than a deadlock (i.e. doing nothing until an event happens, which never comes).
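
To make the idea concrete, here is a minimal sketch of the "no CPU progress" check in Python. It is not how Deadline's slave works; the class name StallDetector, its parameters, and the thresholds are all made up for illustration. The caller is assumed to feed in the render process's accumulated CPU seconds at regular intervals.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StallDetector:
    """Hypothetical helper: flags a process whose CPU time stops advancing."""
    stall_seconds: float          # time without progress that counts as a hang
    min_cpu_delta: float = 0.01   # CPU seconds that count as real progress
    _last_cpu: Optional[float] = None
    _last_progress_at: Optional[float] = None

    def update(self, now: float, cpu_seconds: float) -> bool:
        """Record one sample; return True if the process looks stalled."""
        made_progress = (
            self._last_cpu is None
            or cpu_seconds - self._last_cpu >= self.min_cpu_delta
        )
        if made_progress:
            # Remember when CPU time last moved forward.
            self._last_cpu = cpu_seconds
            self._last_progress_at = now
            return False
        # No progress since the last recorded advance: check how long ago.
        return now - self._last_progress_at >= self.stall_seconds
```

The slave would call update() every so often and requeue the task when it returns True. The CPU figure could come from the OS, for example via the third-party psutil package's Process.cpu_times(). As noted above, this only catches lock-style hangs; a process spinning in a loop keeps accumulating CPU time and never trips it.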



You can tell what kind of 'hang' it is by opening the Task Manager on the particular slave. Go to the Processes tab and add a few columns (under the View menu); I would suggest 'CPU Time' and 'Memory Usage Delta'. Watch those numbers for a few minutes. If CPU Time doesn't increase, you probably have a 'lock'-based bug; if 'Memory Usage Delta' is not zero, something is happening (i.e. memory is being allocated and/or freed) and it could be an infinite loop.
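
The same reading of the two columns can be written down as a small Python sketch, in case someone wants to log the numbers instead of watching them. The function name classify_hang and its labels are made up for illustration, and the samples (CPU seconds, memory bytes) are assumed to have been collected by hand or by a script over a few minutes.

```python
from typing import List, Tuple


def classify_hang(samples: List[Tuple[float, int]]) -> str:
    """Guess the kind of hang from (cpu_seconds, memory_bytes) samples.

    Hypothetical helper that mirrors the manual Task Manager check:
    no CPU time and no memory change suggests a lock, anything else
    suggests the process is still doing something (possibly looping).
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples to compare")
    cpu_delta = samples[-1][0] - samples[0][0]
    # True if memory usage differs between any two consecutive samples.
    mem_changed = any(b[1] != a[1] for a, b in zip(samples, samples[1:]))
    if cpu_delta <= 0 and not mem_changed:
        return "lock"   # nothing consumed, nothing allocated
    return "busy"       # CPU or memory still moving: maybe an infinite loop
```

For example, classify_hang([(10.0, 500), (10.0, 500), (10.0, 500)]) gives "lock", while samples whose CPU time keeps climbing give "busy". It is only a hint; a "busy" result cannot tell a runaway loop from a legitimately slow frame.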

david3d
Posts: 9
Joined: Mon Nov 21, 2005 11:53 am

Re: Slaves Hanging

Postby david3d » Tue Nov 29, 2005 9:02 am

Sorry, I just got back to the board today. Next time I see the problem I'll take a look at those things and let you know.



--

David

