
Slave auto-cancelling tasks

Yep!

After a weekend spent trying to render some basic jobs, I'm still running into trouble.

Some tasks worked pretty well, but after a while all the slaves go into "auto-cancelling mode" and no images come out.
I checked some log files and the basic pattern looks like this:

Corresponding Pulse log:

Tasks get requeued forever, so nothing moves forward.
Maybe this comes down to a bad option or timer setting for Pulse, but I haven't changed many options yet.
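
My rough guess at what is happening, sketched below with made-up numbers (this is just how I picture an auto task timeout misfiring, not actual Deadline code): if the allowed time per task is derived from a few fast frames that finished first, every heavier frame keeps hitting the limit, gets cancelled and requeued, and the loop never ends.

    # Illustrative sketch only, not Deadline code: how a timeout derived from
    # a few fast frames can cancel and requeue every heavier frame forever.
    def auto_timeout(completed_minutes, multiplier=3):
        """Hypothetical rule: allow N times the average completed-task time."""
        return multiplier * sum(completed_minutes) / len(completed_minutes)

    completed = [4, 5, 6]            # a few light frames finished first (minutes)
    limit = auto_timeout(completed)  # 15 minutes allowed per task
    heavy_frame = 40                 # the remaining frames are much heavier

    if heavy_frame > limit:
        print("task cancelled at", limit, "minutes and requeued - over and over")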


Windows environment.
Repository & Pulse upgraded to 0.50 manually, slaves upgraded automatically.
3ds Max, V-Ray and some additional plugins (Ornatrix, Forest…)

I think for RC5 we're going to try reverting to the way the slaves previously reported their status to see if that helps. The new way we introduced can result in a 30% reduction in the amount of data the slave sends to the database, but that's only in the best case scenario.

We've been running some scale testing in the cloud, and we seem to be hitting a state-related bug as well, so we think this change could be the fix.
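
To give a rough idea of what "reduced status reporting" means in principle, here is a simplified Python sketch (not our actual implementation, and all field names below are made up): instead of writing the slave's full status document every interval, only the fields that changed since the last report are sent, which shrinks the payload when little changes but saves nothing when most fields change.

    # Simplified illustration of delta-style status reporting (made-up fields).
    def build_status_update(previous, current):
        """Return only the key/value pairs that changed since the last report."""
        return {key: value for key, value in current.items()
                if previous.get(key) != value}

    last_report = {"name": "NODE113", "state": "Rendering", "task": 42,
                   "cpu": 97, "ram": 61}
    new_report  = {"name": "NODE113", "state": "Rendering", "task": 42,
                   "cpu": 93, "ram": 62}

    print(build_status_update(last_report, new_report))
    # {'cpu': 93, 'ram': 62} -- a much smaller write in the best case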

We hope to have RC5 out tomorrow.

Cheers,
Ryan

Ryan, does this only affect farms with throttling on, or would the problem affect all scenarios?

It would apply to all scenarios.

Hello !

Update 0.51 is much better!

But I still have some random slaves that get stuck sending information to Pulse.
Slave log:

In the Monitor they are easy to spot because their status is still "Starting Up" even after 10 or 15 minutes (my timeout for the 3ds Max plugin is 1000 seconds), so I can requeue them manually.
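
For what it's worth, the manual check I do boils down to something like this (a hand-rolled sketch with made-up slave records, not a Deadline API call):

    # Rough watchdog sketch: flag slaves that have sat in "Starting Up" longer
    # than a threshold so I can requeue their tasks by hand. The slave records
    # here are placeholders, not real Deadline objects.
    from datetime import datetime, timedelta

    STUCK_AFTER = timedelta(minutes=15)

    def find_stuck_slaves(slaves, now=None):
        """Return slave names whose status has been 'Starting Up' for too long."""
        now = now or datetime.now()
        return [s["name"] for s in slaves
                if s["status"] == "Starting Up"
                and now - s["status_since"] > STUCK_AFTER]

    slaves = [
        {"name": "NODE113", "status": "Starting Up",
         "status_since": datetime.now() - timedelta(minutes=22)},
        {"name": "NODE214", "status": "Rendering",
         "status_since": datetime.now() - timedelta(minutes=3)},
    ]
    print(find_stuck_slaves(slaves))  # ['NODE113'] -> candidate for manual requeue
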
Have you tracked this down?

Thanks !

Hi,
It looks like NODE113 is having difficulty starting 3ds Max. Can you test manually starting 3ds Max on NODE113 under the same user account that Deadline runs as? Do you see any issues that need resolving? Is NODE113 running the latest SP for 3ds Max? If there is still an issue, could you please provide the full 3ds Max job log report for this node? It should give us more information about why 3ds Max is failing to start on NODE113.
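
If it helps, one way to kick off that test from a console on NODE113 is something along these lines (a sketch only; the account name and install path are assumptions, so substitute whatever Deadline actually runs as on your farm):

    # Sketch: launch 3ds Max under the same account Deadline runs as, using the
    # standard Windows 'runas' tool (it prompts for that account's password).
    # Both values below are assumptions -- adjust them for your farm.
    import subprocess

    DEADLINE_ACCOUNT = r"RENDERFARM\deadline"                        # assumed account
    MAX_EXE = r"C:\Program Files\Autodesk\3ds Max 2015\3dsmax.exe"   # assumed path

    subprocess.run(["runas", "/user:%s" % DEADLINE_ACCOUNT, MAX_EXE])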

NODE113 is purely a render node, so I can't launch Max on it interactively.
In any case, I know the machine works fine. It has rendered other frames, and if I requeue the stuck frame it can complete fine on the same render node. It's a bit random at the moment…
And yes, all my nodes are up to date on 3ds Max / V-Ray / Deadline and the other plugins…

If it's an option for you, you could use the 3ds Max 30-day eval license to start up 3ds Max on this troublesome machine, if that would help?
Alternatively, our 3ds Max job log reports are reasonably comprehensive, so if you prefer, feel free to post one of these logs from a machine showing this issue and we can take a look to see if anything stands out.

As I said, this looks pretty random at the moment: different frames, different nodes, different jobs…

Here's a recent log from a "waiting" slave.
You can see the job was queued at 17:15, and the slave was still "waiting to start" at 17:30 when I connected to it.
deadlineslave-NODE214-2014-12-12-0000.log (3.67 MB)

In your 3ds Max plugin configuration settings (Tools -> Configure Plugins in the Monitor), can you turn on the Kill ADSK Comms Center Process option to see if it helps? We've seen cases where Max can lock up randomly like this, and enabling this option can sometimes help.

Cheers,
Ryan

I turned on this option, but I still have random waiting slaves.
