Idle Shutdown not working

Hi folks,

I have had a weird problem with my farm since I updated to 7.1.2.1. I have tried a few things and can’t really figure out what’s happening.
Maybe it is not a problem with 7.1.2.1 at all, but before I did the update everything worked fine.

We did a complete re-installation of our server, where Deadline is running too. After we set up the complete OS incl. drivers etc., we installed Deadline (Repository, Database). We didn’t touch the nodes and kept the “old” installation of the clients with 7.0. After we booted up all nodes, they updated to 7.1.2.1 automatically. I also configured the Power Management and told the slaves to shut down after 15 minutes of being idle.

So far so good.

When the whole farm boots up and the Power Management tells the machines to shut down after 15 minutes, a few of them don’t shut down. Also, when I try to shut them down via Remote Control -> Machine Commands, they refuse with the error message “Failure: cannot accept a connection because Remote Administration has been disabled under the Client Settings in the Repository Options”. The funny thing is that Remote Administration is active and works for other Slaves.

I tried a complete reinstallation of the clients where the problem occurred. After the installation is complete and the slave appears in the Monitor, I can shut down the machine once via Remote Control -> Machine Commands -> Shutdown Machine.
After it boots up again, the problem is back. By now I have 4 Nodes out of 20 where the command is working and 16 which seem to ignore the command completely.

At this point I’m running out of ideas, and I’m close to doing a completely new installation of the Nodes (OS, drivers, client software etc.) to see if this solves my problem. I know there is a lot of information I have probably forgotten to list, but maybe someone has an idea where this could come from or where I forgot to tick a box. I think I’m at a point where I can’t see the wood for the trees, and I hope someone can point me in the right direction.

Please let me know if you need any more information, e.g. error reports, and I’m happy to provide it…

Thanks in advance.

Best,
Christian

So, the first thing I would advise is to check the launcher log to make sure it can connect to the DB, as that seems like it could be the issue on this. It would be odd that the slave can connect, but the Launcher can’t, but I’d like to rule it out.

Thanks for your reply and sorry for the late response…

Actually, you are completely right with your guess. Some of the blades show in the launcher that they are “connected” and some of them are “disconnected”.

I tried running them as a service and starting the launcher via a normal autostart, but this doesn’t make a difference.
I also checked the permissions, which seem to be fine. On the repository side “everyone” had access. I was curious and added the “render” user (a local user on the slave side), but that made no difference either.
I restarted the launcher manually and, surprisingly, they are “connected” and I can shut them down as usual.

If the permissions are not the problem and they can connect when I restart the launcher app, what is happening here? :open_mouth:

Thanks for your time.

Best Regards,
Christian

Hello Christian,

I want to clarify that nothing in the slave app is changing between failure and success and that only the launcher is being restarted. Is that correct?

Hello Dwight,

Basically, there was no change between failure and success.

The slave app, no matter whether it was started as a service (automatic or automatic (delayed)) or via autostart, did not seem to be connected to the database. After a manual restart of the launcher (which started the slave as well), it could connect to the database.

On my way to this post, I stumbled over “Strob’s” post…
forums.thinkboxsoftware.com/vie … 11&t=13441

You helped him via TeamViewer by correcting one line in his dbConnect.xml. I just did the same and this solved my problem as well.

I changed this line from
Nyx;10.40.30.4
into
10.40.30.4
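
For anyone who has to make the same change on many nodes, the edit amounts to dropping the host name entries from the `;`-separated host list and keeping only the literal IP addresses. Here is a minimal Python sketch of that step, assuming the value is a plain `;`-separated string like the line above. The function name `keep_ip_hosts` is made up for illustration, and the sketch does not read or write dbConnect.xml itself.

```python
import ipaddress


def keep_ip_hosts(host_list):
    """Return only the literal IP entries of a ';'-separated host list.

    Hypothetical helper: entries that are not valid IPv4/IPv6 literals
    (for example a host name such as "Nyx") are dropped.
    """
    kept = []
    for entry in host_list.split(";"):
        entry = entry.strip()
        if not entry:
            continue  # skip empty pieces such as a trailing ';'
        try:
            ipaddress.ip_address(entry)  # raises ValueError for host names
        except ValueError:
            continue
        kept.append(entry)
    return ";".join(kept)


print(keep_ip_hosts("Nyx;10.40.30.4"))  # prints 10.40.30.4
```

If the list contains only host names, the result is an empty string, so check the output before writing anything back.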

The Slaves start normally and shut down automatically again after 15 minutes of being idle. The machine commands are working again as well.

I’m still wondering why this happened after updating to 7.1.2.1, when Strob was running 7.1.0.35 and had the same problem. I was running this version before without any issues.

Thanks for your time and for helping Strob. :slight_smile:

Best Regards,
Christian

Hello Christian,

I wanted to reassure you I have asked a dev to take a look at this because I am not sure why it’s not working.

Just some info here for everyone. The problem is likely related to Windows’ DNS resolution. If the host name resolves to an ‘fe80’ (link-local) IPv6 address, things break.

Technically this is a problem with Windows, as it’s clearly not able to use fe80 addresses (I’ve never had luck with them). I’m not sure why the MongoDB middleware (https://github.com/mongodb/mongo-csharp-driver) doesn’t fail over correctly like it should.
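
To check whether a node is affected, it may help to look at what the database host name resolves to on that machine. Below is a rough Python sketch using only the standard library. The helper names are hypothetical, and the script falls back to `localhost` if no host name is passed on the command line. It lists the resolved addresses and flags any link-local IPv6 (fe80::/10) ones.

```python
import ipaddress
import socket
import sys


def is_link_local_ipv6(address):
    """True if the address string is a link-local (fe80::/10) IPv6 address."""
    bare = address.split("%", 1)[0]  # drop a scope id such as "%12" or "%eth0"
    try:
        ip = ipaddress.ip_address(bare)
    except ValueError:
        return False  # not an IP literal at all
    return ip.version == 6 and ip.is_link_local


def resolve(host):
    """Return the sorted, de-duplicated addresses the OS resolver gives for host."""
    try:
        infos = socket.getaddrinfo(host, None)
    except socket.gaierror:
        return []  # name could not be resolved
    return sorted({info[4][0] for info in infos})


if __name__ == "__main__":
    # Pass the database host name (e.g. the one in dbConnect.xml) as argument.
    host = sys.argv[1] if len(sys.argv) > 1 else "localhost"
    for addr in resolve(host):
        flag = "link-local IPv6" if is_link_local_ipv6(addr) else "ok"
        print(host, addr, flag)
```

If the host name shows up with a link-local IPv6 address, using the plain IPv4 address in dbConnect.xml, as Christian did, sidesteps the lookup.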