Difference between Slave name and Machine Name

Dave · April 16, 2014, 2:56pm

Hi All

Can anyone tell me the difference between the Slave Name and the Machine Name?

As I understand it, the machine name originates from the hostname of a machine, but where does the slave name originate from?

I’m having some issues with multiple Linux client machines appearing as a single slave in the slave list, and understanding where the names come from is the first piece of the puzzle in finding out why…

thanks

eamsler · April 17, 2014, 4:17pm

Hey Dave! Most of the time, they’re the same thing. When the Slave starts up, it will use the host name of the machine as its own name.

If you pass the Slave “-name cookies” as an argument in Deadline 6, it will append “-cookies” to its name. So, on my machine the resulting slave name that comes up would be “Mobile-010-cookies”.
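
The naming rule described above can be sketched in a few lines of Python. This only illustrates the behaviour as described in this post; it is not Deadline's actual code, and `slave_name` and its parameters are hypothetical names.

```python
import socket
from typing import Optional


def slave_name(hostname: str, suffix: Optional[str] = None) -> str:
    """Illustrative only: mimic the Deadline 6 naming rule described above."""
    # With no "-name" argument, the Slave takes the machine's host name.
    if not suffix:
        return hostname
    # With "-name <suffix>", Deadline 6 appends "-<suffix>" to the host name.
    return f"{hostname}-{suffix}"


if __name__ == "__main__":
    host = socket.gethostname()
    print(slave_name(host))             # default Slave on this machine
    print(slave_name(host, "cookies"))  # an additional, named Slave
```

Under this rule, two machines that report the same host name would also end up with the same default Slave name.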

What you’re likely seeing is a bug we had with the multi-Slave feature back in ye olde 5.2. The issue is two-fold.

The first half of the problem was that we didn’t append the name to the end of the machine name, and instead allowed you to name the Slave whatever you wanted. That would have been fine if not for the other side.

That other problem was how we stored names for the multi-Slave feature. Each name is written into a folder named “slaves” on the local machine, in the form of “slavename.ini”. The file with no prefix (“.ini”), which we introduced in 6.0, holds the default Slave name; in 5.2 it was called “NAME_OF_HOST.ini” (e.g. “Mobile-010.ini”). The reason it’s not a problem in 6.0 is that we’re using the machine name as a prefix, as mentioned before.

The likely problem here is that you have Deadline 5.2 and that the boxes were set up with a system image. The way to fix this (as well as fixing accidental multiple Slaves) is to clear out the contents of that local Slave directory. In 6.0 and up, that’s at “/var/lib/Thinkbox/Deadline6/slaves” and in 5.2, it’s at “/usr/local/Thinkbox/Deadline/slaves”.
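
A minimal sketch of that clean-up step in Python, using only the standard library. The two paths are the ones quoted above; `clear_slave_dir` is a hypothetical helper, and running it with the Slave stopped and with enough permissions to delete the files is an assumption, not something confirmed in this thread. It defaults to a dry run so nothing is deleted until you ask for it.

```python
from pathlib import Path
from typing import List

# Paths quoted in the post above; adjust them for your own install.
SLAVES_DIR_6 = Path("/var/lib/Thinkbox/Deadline6/slaves")
SLAVES_DIR_52 = Path("/usr/local/Thinkbox/Deadline/slaves")


def clear_slave_dir(slaves_dir: Path, dry_run: bool = True) -> List[Path]:
    """Return the files in slaves_dir, deleting them unless dry_run is set."""
    if not slaves_dir.is_dir():
        return []
    # Only plain files (the "*.ini" entries) are touched; the folder stays.
    files = sorted(p for p in slaves_dir.iterdir() if p.is_file())
    if not dry_run:
        for p in files:
            p.unlink()
    return files


if __name__ == "__main__":
    for path in clear_slave_dir(SLAVES_DIR_6, dry_run=True):
        print("would remove", path)
```

Calling `clear_slave_dir(SLAVES_DIR_6, dry_run=False)` performs the actual deletion.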

Dave · April 23, 2014, 12:39pm

Hi Edwin

Thanks for your response. I did try clearing out the local slave folders and creating some new machines from the updated image, but unfortunately it didn’t fix things. From what you said, I’m guessing it didn’t help because I’m running Deadline 6.22 rather than 5 (please let me know if I should move this over to the beta forum).

The setup I’m working on is AWS cloud-based and slightly complex. Unfortunately I haven’t had a moment recently to really look into this and post a comprehensive question with a suitable amount of info. For now, though, I wondered if I could get your opinion on my hunch that it’s a DNS and hostname resolution problem. Basically, with my current setup I can ping between instances using IP addresses, but there’s no DNS server to resolve local addresses, so pinging a hostname doesn’t work. Is there any chance this could cause problems/confusion for Deadline?
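
A quick way to test that hunch is to check whether an instance can resolve its own host name. A minimal sketch using only Python's standard library; `resolve` is a hypothetical helper name, and whether Deadline actually needs this lookup to succeed is not established here.

```python
import socket
from typing import Optional


def resolve(hostname: str) -> Optional[str]:
    """Return the IPv4 address hostname resolves to, or None on failure."""
    try:
        return socket.gethostbyname(hostname)
    except OSError:
        # socket.gaierror (a subclass of OSError) is raised when lookup fails.
        return None


if __name__ == "__main__":
    name = socket.gethostname()
    print(name, "->", resolve(name) or "does not resolve")
```

Running this on each instance shows both the host name the machine currently reports and whether that name resolves locally.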

The other thing I’m doing that may confuse Deadline is running a shell script at startup that changes the hostname of the machine, so that each instance ends up with a unique name. Is there any chance that the Deadline service starts before the hostname changes? I doubt this is what’s happening, but it’s still worth checking.

In a week I’ll be free to dedicate a larger chunk of time to solving this and providing more info. For now I’m trying to get my head round it so I can hit the ground running, so any suggestions are really appreciated.

eamsler · April 23, 2014, 8:43pm

Ah! AWS changes things a bit.

If your script is not run, does the machine start up with the same host name? And, is that script run before Deadline starts up or after? If it’s run as an init script, you may want to prefix it with S0 so it starts up before anything else.

I don’t remember offhand what Amazon uses to set the hostname upon boot. We let it use its internal IP address because we decided DNS lookup failure was reasonable in our circumstances.

Maybe a reboot as a test is in order?
docs.aws.amazon.com/AWSEC2/lates … tname.html

Dave · May 14, 2014, 11:23am

In the end it turned out the slave naming problem was caused by setting up our slave instances in a public subnet and then moving them over to a private subnet. Rebuilding the slaves in the private subnet meant the hostname was once again set to the private IP at boot time, allowing each machine to have a unique name.
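
One way to spot this kind of collision before the machines show up in the slave list is to collect each instance's host name and look for repeats. A minimal sketch, assuming the names have already been gathered by some other means; `duplicate_hostnames` and the sample names below are made up for illustration.

```python
from collections import Counter
from typing import Dict, List


def duplicate_hostnames(hostnames: List[str]) -> Dict[str, int]:
    """Map each host name that appears more than once to its count."""
    counts = Counter(hostnames)
    return {name: n for name, n in counts.items() if n > 1}


if __name__ == "__main__":
    # Hypothetical names: two instances kept the name baked into the image.
    reported = ["ip-10-0-0-11", "ip-10-0-0-12", "render-image", "render-image"]
    for name, n in duplicate_hostnames(reported).items():
        print(f"{n} instances report the host name {name!r}")
```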

dwallbridge · May 14, 2014, 9:10pm

Thanks for the update on that issue.