AWS Thinkbox Discussion Forums

slaves not releasing finished tasks

We have installed 2.6 and are running 53 servers. With this version I am seeing servers not release their final tasks. The frame renders, but the slave doesn't reset itself to look for the next task. It seems to be random: so far, anywhere from 1 to 11 servers hang at a time. The only way I've found to get them back is to close the slave and launch it again. Canceling the current task or trying to force the slave to look for a new task doesn't seem to work. I've seen it on both Max and Maya files. Any ideas?

Hi, I had this problem recently. The problem seems to appear when you set the value “Clear Slave Logs Between Jobs” to True in the repository options.

Since I set this option to False, the problem disappeared.





Sylvain Berger | Technical Director | Alpha Vision


Unfortunately that’s not it. It was already set to false. Any other ideas? This seems to be a 2.6 issue as we haven’t seen the problem in 2.5 or 2.0. Thanks.

I just checked many jobs, and I think I see the same problem. The last tasks take 3 or 5 times longer than the other frames…



It’s a shame this problem came back in 2.6.



I’ll dig in more and try to find something.



Sylvain Berger | Technical Director | Alpha Vision


Our tasks aren’t releasing at all. We sent a file last night on the way out and the frames were taking 5 minutes to render. The hung slaves were still saying they were rendering this morning 12 hours later. Thanks for the help.

Yes, this problem is a big one. It’s a shame that you experience this, but at least I am no longer the only one experiencing it. So maybe the debugging will be easier now that we know the problem is not related to my renderfarm setup.



Are you experiencing this in Maya only or any other renderer?





Sylvain Berger | Technical Director | Alpha Vision


Can you enable slave verbose logging in the Repository Options (Tools -> Configure Repository Options while in the monitor in super user mode), and then restart your slave machines so they recognize the change. Then the next time the problem happens, can you post the slave log?



Thanks!

Just to confirm, does this problem only happen on the last unfinished task of a job? If so, the problem might be related to how Deadline sends an email to the job’s user when it completes. We modified this code because people were often running into problems with emails not being sent with the old code.



Can you guys try setting your notification settings to None in your user settings (Select Tools -> Options in the monitor, then select the User Settings option from the list on the left). After making the change, submit a new job and see if you still have the problem. If the problem goes away, then for the next version we will add the option to select which version of emailing you want to use.



Let me know how the tests go!



Thanks!

Not always the last frame but it seems to always be happening close to the end. Frame 179 or 180 on a 180 frame render and frame 85 on a 90 frame render. Now I'm only seeing one machine hanging each time we render. So possibly my multiple slaves were from users submitting and re-submitting the same job last night. I'll try and get a log to you soon. Thanks.

....Email notification change didn't seem to change anything

It can be one or many frames, but it is always the last tasks.

Let’s say that my animation has 20 tasks. As long as there are queued tasks, each task renders in a normal time, but when there are no more queued tasks, the last tasks take longer to render. At worst, they will get stuck there rendering for hours or even days.





Sylvain Berger | Technical Director | Alpha Vision


Email notification was the problem on our end. The only weird thing was that we had to change the setting in the "manage users" box; otherwise Tools -> Options wasn't always actually changing the preference. That could be due to some weird way we have our permissions set up. The actual problem seems to be that 48 of our 53 servers do not have internet access. Those servers would hang during the last frame rendered while trying to send the email notification. The 5 servers that have internet access would still fail to deliver an email (there's no mail program set up on them) but wouldn't hang if they happened to render the last frame. So my suggestion would be to set notification to "none" at install, unless most people have their servers connected to the internet... I play in the defense industry, so outside access is by default difficult to get most of the time around here. Of course, now that we've figured it out, it takes all of 20 seconds to make the change, regardless of which direction you end up going.

Thanks for your help. :)

Thanks for the information Kurt. Going forward we will default the notification method to None, and we will try to make the notification code more robust in situations where there is no internet connection, or the internet connection goes down. Having the slaves hang indefinitely like that is not a good thing.
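
As a rough illustration of the kind of robustness described here, the following is a minimal Python sketch (not Deadline's actual notification code) that gives the SMTP connection a timeout and treats any failure as non-fatal, so a machine without network access can move on to its next task. The function name, its parameters and the message wording are hypothetical; only the standard-library smtplib and email modules are used.

```python
import smtplib
from email.message import EmailMessage


def send_job_notification(smtp_host, sender, recipient, job_name,
                          port=25, timeout_s=10.0):
    """Try to email a job-complete notice; return True on success, False otherwise.

    Hypothetical helper for illustration only, not part of Deadline.
    """
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = f"Job complete: {job_name}"
    msg.set_content(f"The job '{job_name}' has finished rendering.")

    try:
        # The timeout applies to the connection attempt and to later socket
        # reads and writes, so an unreachable server fails after timeout_s
        # per operation rather than blocking with no limit.
        with smtplib.SMTP(smtp_host, port, timeout=timeout_s) as smtp:
            smtp.send_message(msg)
        return True
    except OSError as exc:
        # smtplib.SMTPException derives from OSError, so this covers both
        # protocol-level failures and refused or timed-out connections.
        print(f"Notification not sent ({exc!r}); continuing.")
        return False
```

The caller can ignore a False result and carry on, which is the behaviour that matters for a render node: a failed email should cost at most a few seconds, never the task.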



Anyways, I’m glad you were able to find a workaround, and we appreciate your help (and patience) getting this problem sorted out.



Cheers,

Hi there. When I was using Deadline 2.5 I was receiving email notifications, but since upgrading to 2.6 I no longer get them.

Is there any clue about this? What can I try?

People here seem to have had to disable the notification service because it was causing problems on the last part of the job, whereas I do not have any problem finishing jobs. For me 2.6 works fine, email notification aside.

thanks

matteo

Hi Matteo,



Check the repository configuration settings to make sure your SMTP settings are configured properly. While in super user mode in the Monitor, select Tools -> Configure Repository Options. Then select Repository Options from the list on the left and scroll down to the Notification Settings section. The key things to set are:



Email Notification Sender Account

Email Notification SMTP Server

Email Notification Domain Name



If these settings aren’t set properly, and users choose to be notified via email when their jobs complete, it can cause the slaves to delay finishing the last task because they’re trying to email the user.



Cheers,

The SMTP server is set, and all the mail settings point to me, so I should get something…

My user also has mail notification set as active. What else could I look at?

Can you confirm whether or not your slave machines can send emails? Perhaps the slaves can’t access your SMTP server, or don’t have the necessary permissions to send the emails. Maybe try submitting a single task job to Deadline, then watch the slave machine as it finishes the task to see if it prints out any error messages when attempting to send the notification. Feel free to post the slave log here as well.
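
One quick way to check the first possibility from a slave machine is a plain TCP connection test against the mail server. The sketch below is only an illustration: the helper name is hypothetical, and the default port 25 is an assumption (some servers listen on 587 or 465 instead). A successful connection only shows that the port is reachable; it does not prove the server will accept or relay a message from that machine.

```python
import socket


def can_reach_smtp(host, port=25, timeout_s=5.0):
    """Return True if a TCP connection to host:port opens within timeout_s.

    Hypothetical helper for illustration only, not part of Deadline.
    """
    try:
        # create_connection resolves the host name and attempts the TCP
        # handshake; any failure (DNS, refusal, timeout) raises OSError.
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError as exc:
        print(f"Cannot reach {host}:{port}: {exc}")
        return False
```

Calling can_reach_smtp with the host from the "Email Notification SMTP Server" setting, on one of the slaves, separates a network or firewall problem from a permissions or configuration problem on the mail server itself.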



Cheers,

I do not have any mail software on the box, but all the render boxes can reach the internet, and I didn't get any net send message at all…

Weird… on the box on my side I at least saw a black command prompt open and close, but I didn't get any net send on my box.

Is there a way to send an email, just to check the emailing system, without having to render frames?

Last time it was such a simple thing to set up.




It may be possible that net sends are disabled on your slaves. Check out this link:

http://support.microsoft.com/kb/839018



You can try sending an email via Deadline’s command line tool:

DeadlineCommand.exe -SendEmail -to "myself@myemail.com" -subject "test" -message "This is a test"



You can run the following for complete usage help:

DeadlineCommand.exe -help -SendEmail



Nothing has changed for the notification feature between Deadline 2.5 and 2.6, so it’s strange that you would be having difficulties.



Cheers,
