AWS Thinkbox Discussion Forums

Getting rid of AWOL slaves

The whole “multiple slaves with the same name” issue has taken a new turn. I now have a machine with two slave processes running, neither of which responds to deadlineslave -shutdown. Normally, to clean up the duplicate slaves, I run deadlineslave -shutdown (which stops one of them), kill the other, and then restart the launcher so only one slave comes up.

On this machine, I run deadlineslave -shutdown and get this:

Slave startup file is unlocked
Checking port 41152 for slave
After this, I still have two slave processes running. If I kill them and restart the launcher service… they both come back. And neither will respond to the normal shutdown command.

Any ideas on how to fix this?

Upgrade to 6.2.1? 🙂

For now, kill the slaves, then use the Monitor to mark them as Offline before restarting the launcher service. That should hopefully do the trick.

Cheers,
Ryan

OK, I tracked this down to an improperly-configured deadline.ini. Populating it to match our other slaves seems to work.

Just to clarify the situation, this wasn’t exactly the same as the case where an extra slave process is started by the launcher. There were two slave processes running, but neither of them would respond to the normal deadlineslave -s command (normally one of them does and the other doesn’t). In other words, when the shutdown command was issued, it didn’t think any slaves were running.

Before correcting things, I checked the slaves with lsof, and they didn’t have any port bound to listen for incoming connections. From looking at the logs, it became clear that the deadline.ini file didn’t have a network root set.
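A similar check can be scripted without lsof by trying a TCP connection to the port the shutdown command reported (41152 in the output above). This is a generic probe, not Deadline's own logic: it only tells you whether *something* accepts connections on that port, while lsof tells you which ports a given process has bound. The function name `is_port_listening` is hypothetical.

```python
import socket


def is_port_listening(port: int, host: str = "127.0.0.1", timeout: float = 1.0) -> bool:
    """Return True if some process accepts TCP connections on host:port."""
    try:
        # create_connection raises an OSError subclass (e.g. ConnectionRefusedError
        # or a timeout) when nothing is listening on the port.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

On the affected machine, `is_port_listening(41152)` would be expected to return False for as long as both slaves are stuck in their startup loop, which matches what lsof showed.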

Could the behavior of the slaves be changed so that they exit when something as critical as the network root is missing from their configuration? Trying to reconnect every 10 seconds isn’t going to solve the problem…
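A fail-fast check along the lines requested could validate the config before entering any retry loop. This is a minimal sketch, assuming deadline.ini keeps the repository path in a `NetworkRoot` key under a `[Deadline]` section; check those names against your own file. `missing_settings` and `check_or_exit` are hypothetical helpers, not part of Deadline.

```python
import configparser
import sys

# Assumption: section and key names as they appear in deadline.ini; adjust if yours differ.
SECTION = "Deadline"
REQUIRED_KEYS = ("NetworkRoot",)


def missing_settings(ini_text: str) -> list[str]:
    """Return the required keys that are absent or empty in the given INI text."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep key case; ConfigParser lowercases keys by default
    parser.read_string(ini_text)
    if not parser.has_section(SECTION):
        return list(REQUIRED_KEYS)
    return [key for key in REQUIRED_KEYS
            if not parser.get(SECTION, key, fallback="").strip()]


def check_or_exit(path: str) -> None:
    """Exit with a clear error instead of retrying forever on a broken config."""
    with open(path, encoding="utf-8") as f:
        missing = missing_settings(f.read())
    if missing:
        # sys.exit with a string prints it to stderr and exits with status 1
        sys.exit(f"{path}: missing required setting(s): {', '.join(missing)}")
```

Calling `check_or_exit` with the path to the machine's deadline.ini before the connect loop would have turned the silent 10-second retry into an immediate, readable error.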

We’ve already addressed this issue in 7.0. The problem was that the slave wouldn’t start listening until after it had connected to a repository. Now, it will be listening before it tries to connect, so even when it’s in this startup loop, it will be able to respond to the shutdown command.
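The ordering described here (bind the control port first, then start the repository connect/retry loop) can be sketched generically as follows. Deadline's actual control protocol is not described in this thread, so the plain-text `shutdown` message and the names `start_control_listener`, `slave_main` and `try_connect` are all hypothetical.

```python
import socket
import threading


def start_control_listener(stop: threading.Event, host: str = "127.0.0.1", port: int = 0) -> int:
    """Bind a control socket and serve shutdown requests on a daemon thread.

    Returns the bound port (port=0 lets the OS pick a free one).
    """
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind((host, port))
    srv.listen()
    srv.settimeout(0.2)  # wake up periodically to notice that stop was set

    def serve() -> None:
        with srv:
            while not stop.is_set():
                try:
                    conn, _ = srv.accept()
                except socket.timeout:
                    continue
                with conn:
                    conn.settimeout(1.0)  # don't let an idle client block the listener
                    try:
                        msg = conn.recv(64)  # a real protocol would frame its messages
                    except OSError:
                        continue
                    if msg.strip() == b"shutdown":
                        stop.set()

    threading.Thread(target=serve, daemon=True).start()
    return srv.getsockname()[1]


def slave_main(try_connect, retry_delay: float = 10.0, port: int = 0, on_listening=None) -> str:
    stop = threading.Event()
    # 1. Listen first, so a shutdown request is answered even if the repository is unreachable.
    bound = start_control_listener(stop, port=port)
    if on_listening is not None:
        on_listening(bound)
    # 2. Only then enter the connect/retry loop. Event.wait acts as a sleep
    #    that returns early as soon as a shutdown request sets the event.
    while not stop.is_set():
        if try_connect():
            return "connected"
        stop.wait(retry_delay)
    return "shut down"
```

With a missing network root, `try_connect` would keep failing, but because the listener is already bound, sending `shutdown` to the control port ends the loop instead of leaving an unreachable process behind.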

Cheers,
Ryan

Wunderbar!
