Serious bug

Hi guys, we are experiencing a serious bug with Deadline 2.0.

When rendering an animation, the LAST tasks of the animation are picked up by one or more slaves. The slave(s) render the frame(s) completely, BUT the slave doesn’t stop: it stays active on the task and does nothing.

The slave(s) stay active until I manually relaunch them.

I have experienced this many times while rendering Maya|mentalRay jobs and 3dsmax|vray jobs.

This might be a bitch to track down, but it really is a bug. The symptoms are the same every time: the very last task(s) of a multi-task job hang.

Hope you can track this one down. If you do, it would be important to release a patch for it. We are losing a lot of rendering time, especially over the weekend. I had 10 machines hung for 86 hours this morning!

Thanks

Sylvain Berger | Technical Director | Alpha Vision


There is this line in the log that looks suspicious:

---- April 24 2006 – 06:20 AM ----

Slave - failed to update slaveInfo – will try again soon

Slave - reason: The process cannot access the file “\Deadlinesrv\DeadlineRepository\slaves\R93.slaveInfo.R93_682” because it is being used by another process.


And this one too:

InfoThread - SLAVE INFO THREAD:requesting slave info thread quit.
Sending cancel task command to Plugin
Could not query for available free disk space


Sylvain Berger | Technical Director | Alpha Vision


Hi Sylvain,

There are a couple of things you can try that might help resolve the issue.

  1. On your Deadline Repository machine, go to Control Panel and open Administrative Tools. Then open Computer Management, expand Shared Folders, and click on Sessions. Close any sessions that have been open for more than 24 hours. This will force any stalled slaves that hold locks on files in the Deadline Repository to release those locks. These stale locks are likely the cause of the slave error message.

  2. Open up the repository options dialog in the Monitor and click on the Error Reporting Setup option on the left. Set the reporting policy to Disabled and click OK to close the dialog. We’ve found that the automatic error reporting occasionally causes a slave to lock up. It is likely that automated error reporting is kicking in when the slave reports that error message.

So (1) should hopefully prevent that error from happening again, and (2) should hopefully prevent the slave from locking up when it reports that error. I hope this helps with your problem.

Cheers,

Ryan Russell
Frantic Films Software
http://software.franticfilms.com/
(204)949-0070

Thanks for the tips.

I already had error reporting disabled in the repository settings. As for the shared folder sessions, I have looked at them and everything looks fine.

I will monitor it more closely and try to get more information when it happens again.

Thanks

Sylvain Berger | Technical Director | Alpha Vision