Hello
My bug wasn’t fixed during the beta, so I’m re-opening a thread here.
My pipeline is 3dsmax (2015, SP3, ext1), Vray (3.10.01), and some extra plugins (Ornatrix, Forest, Phoenix…).
When I send 3dsmax shot jobs, random slaves get stuck in the “Starting Up” status.
A big obstacle to finding the origin of the bug is that it hits random machines and random jobs.
My job 1 could be rendering nicely, then some slave goes into ghost mode, blocked on starting up the job.
For the moment, I requeue the bugged tasks and, after a while, the job finally finishes. But during the night the job doesn’t progress, and that requires a lot of human attention…
From what I’ve seen in the logs, it looks like a Pulse communication problem.
The last line in the log is mostly:
I’m attaching a log of a ghost slave from this morning.
Thanks!
deadlineslave-NODE101-2014-12-19-0001.log (19.3 KB)
Hello,
Can you try turning off throttling if it’s possible to see if that impacts these issues? That would be helpful to see if that is the problem.
Hello.
I’ve removed throttling; now I don’t get the “Notifying Pulse” message, but slaves still sometimes block on starting up.
Can I have you try upgrading to the latest Deadline version, 7.0.1.3, and see how that works? If you don’t have the link to the release versions, email sales or support for the link. Thanks.
I did, and it looks like the problem is still there.
Hi,
I have 3 steps for you to try to resolve this issue.
Looking at the log report you supplied previously…
- Please could you update your workstations and slaves to Max2015 SP3 (recently released by ADSK).
- Please could you disable your custom eventPlugin “DLVideoGen”, which is erroring out in your log and will eventually lead to a memory leak, which might be de-stabilising your machines. We can discuss this event plugin separately if you wish. (Feel free to create a separate forum post or support ticket for privacy.) You can temporarily disable the event plugin by going to Monitor -> Configure Event Plugins -> Change the “Enabled” drop-down list item to “False”, assuming you have this UI element exposed in your custom plugin. Alternatively, moving the “DLVideoGen” directory out of the “events” directory will suffice.
- Please see my private message to you via this forum on some experimental code to potentially help resolve this issue.
EDIT: I need to run this experimental code via some colleagues, so I will send it to you later today.
Regards,
Mike
Hi Mike!
- I’ve updated to extension pack 2 + SP3 recently. I’m up to date on all software now.
- DLVideoGen is not supposed to be active. I cleaned it out a while ago. Did you see it active on the last jobs? BTW, this is an “on job finish” script that shouldn’t impact job startup, but you never know…
- I’ve added your code to my repository.
Unfortunately, it hasn’t fixed my problem so far. I’ve tried with and without throttling.
By the way, I’ve noticed something that could be part of the problem.
My render nodes have been updated with Automatic Upgrade from the repository for several releases now.
The launcher correctly detects version 7.0.1.3, but Windows doesn’t; it is still stuck on version 7.0.0.47. I’ve updated some nodes manually, and Windows then detected the version correctly. Should I upgrade all my render nodes manually?
Thanks for the support!
D-7 before renders begin.
I’ve isolated some slaves for testing and updated them manually, with your scripts still in the repository, and they are still blocked on “Starting Up”…
Hi,
Thanks very much for the updated information. You don’t need to worry about the mis-match of Deadline version between what Windows Control Panel says and what Deadline applications are stating the version is. I don’t believe it’s possible to update the control panel installed version when automatic updates run (escalated permissions required). I will ask though. If your Deadline launcher, slave, pulse, balancer or monitor application are reporting the correct version, then that is all that matters. (You can also use the Deadline Monitor --> “Version” column in the Slave Panel to verify all the slaves are running the latest version)
Quite a bit has changed recently. Please could you provide a few full example 3dsmax log reports from some of your slaves which are still displaying this issue, so we can see if anything still stands out and take it from there.
Thanks,
Mike
Yep, I used to check the version with About in the launcher; that’s why I missed this versioning ‘bug’ with automatic upgrade.
Here’s a log of a recent job that blocked on “Starting Up”.
deadlinelauncher-NODE101-2015-01-09-0002.zip (5.95 KB)
Hi,
Yep, as I thought, we are unable to do anything about the version mismatch due to the elevated user permissions issue.
So, your log report looks good! Unfortunately my new code doesn’t even have the chance to do anything here, as 3dsMax on this machine is just failing to start-up correctly.
- Could you login as the user “TAT” and see if anything stands out here? To me, it looks like 3dsMax is just plain broken on this machine outside of Deadline.
- Do you have a simple test Max 2015 scene file with something really simple in it, like a single “teapot”, which refuses to work on your farm and which you could send me?
- Are you running your rendernodes as a service?
- I think we need to start looking at stripping out all your plugins, editing the plugin.ini file, etc. to narrow this issue down.
- It might be worth disabling the “Kill ADSK Comm Center” switch to see if it makes any difference here.
Mike
Hey.
I tried a simple teapot, and it blocked some render nodes.
So I tried something: I deleted the AppData folder on those nodes.
It fixed them for the moment.
Maybe all of this was only corrupted data in the Autodesk folder… I’m saying “for the moment” as the bug is random.
Stay tuned!
-thanks-
Interesting.
On another couple of nodes which are displaying the same issue, can you zip up their appData folders before you purge them and, if they are small enough, upload them here? Otherwise I can set up a fast upload BOX folder for you.
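If the purge is scripted, the "zip before you purge" step could be folded into the same script. Below is a minimal Python sketch of that idea, not something tested on a farm: the function name `backup_and_purge`, its parameters, and the example paths in the comment are all hypothetical, and it assumes the backup directory is located outside the folder being purged.

```python
import shutil
from pathlib import Path


def backup_and_purge(folder, backup_dir, archive_name=None):
    """Zip `folder` into `backup_dir`, then delete `folder`.

    Returns the path of the created .zip archive.
    Assumption: `backup_dir` is NOT inside `folder`.
    """
    folder = Path(folder)
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)

    # make_archive appends ".zip" to the base name itself.
    base = backup_dir / (archive_name or folder.name)
    archive = shutil.make_archive(str(base), "zip", root_dir=str(folder))

    # Only purge once the archive has been written successfully.
    shutil.rmtree(folder)
    return Path(archive)


# Hypothetical usage (paths are placeholders, adjust to your nodes):
# backup_and_purge(r"C:\Users\<user>\AppData\Local\Autodesk\3dsMax",
#                  r"D:\appdata_backups", archive_name="NODE101_3dsMax")
```

Passing a per-node `archive_name` keeps archives from different render nodes from overwriting each other when they share a backup location.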
As I scripted the cleaning of my folders, I had difficulty finding one.
But here it is.
2015 - 64bit - BUG.7z (15.7 MB)
Thanks for the upload.
Unfortunately, nothing really stands out as the clear ‘winner’ for what might be causing the issue. I have a few thoughts: it might well be something studio-specific, such as a file path that no longer exists, or the max_start/maxstart.max file containing a reference to an old version of VRay or something else, which is causing the slave to go bang when it loads.
Areas that you might want to consider looking further into, if this issue pops up again:
- “3dsmax.ini” file - check various references to paths located on your “s:” drive and also located at: “MaxStart=\NAS_LAJR2\LAJR2\XX_LIB\Maxstart” and things like:
ColorFileName=S:\XX_LIB\UI\pingus\Pingus_UI_2.clrx
KeyboardCurDir=S:\XX_LIB\UI\pingus\
- InfoCenter.log sometimes reports difficulty ‘talking’ to Autodesk web servers. Looks like a firewall block. We have more information here if you wish to resolve this issue:
docs.thinkboxsoftware.com/produc … iderations
docs.thinkboxsoftware.com/produc … -ref-label
- “Usermacros” directory - I would double-check that one or more of your in-house mcr scripts or those pesky “__tempXXXX.mcr” files are not causing an issue here.
Sorry, I can’t be of much more help here. However, if you slowly eliminate these issues, then hopefully something in the future should stand out for you as being the root cause of the issue.
Sometimes, just deleting the “3dsmax.ini” file can do wonders to fix 3dsmax on a particular machine.
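Before deleting the file outright, the path check suggested above for “3dsmax.ini” could be automated. The following is a rough Python sketch under stated assumptions: `find_missing_paths` is a hypothetical helper, it treats any value beginning with a drive letter or a UNC prefix as a path, and it takes the file contents as text because the encoding of “3dsmax.ini” may vary between Max versions.

```python
import os
import re

# A value "looks like a path" if it starts with a drive letter (S:\...)
# or a UNC prefix (\\server\...). This heuristic is an assumption.
PATH_LIKE = re.compile(r"^(?:[A-Za-z]:[\\/]|\\\\)")


def find_missing_paths(ini_text, exists=os.path.exists):
    """Return (key, path) pairs from INI-style text whose path does not exist."""
    missing = []
    for line in ini_text.splitlines():
        line = line.strip()
        # Skip blanks, comments, section headers and lines without key=value.
        if not line or line.startswith((";", "[")) or "=" not in line:
            continue
        key, value = line.split("=", 1)
        value = value.strip().strip('"')
        if PATH_LIKE.match(value) and not exists(value):
            missing.append((key.strip(), value))
    return missing


# Hypothetical usage on a render node (encoding is an assumption):
# text = open(ini_path, encoding="utf-16").read()
# for key, path in find_missing_paths(text):
#     print(key, "->", path)
```

Run on a node under the same user account the slave uses, since mapped drives such as “S:” may only exist for that account.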