Linux slaves slow to take new task


#1

I find that our Linux machines are quite a bit slower than Windows machines to dequeue a new task. For longer tasks it’s not a big deal, but for short tasks it adds up.

It takes about a minute or two for a Linux machine to get a new task once it’s finished the last one. Our Windows systems get new tasks almost instantly. We have Pulse running on the network.

I’m wondering if it’s related to a licensing error I see in the logs between tasks. My Windows machines use the same license server, and I couldn’t find any information on Google about that FlexNet error, so I figured I’d try here. It always seems to get the license eventually, but I’m not sure why it has trouble the first time.

(I replaced some internal network information with xxxx, but the real names match up to what they should be)


#2

Hmm. “Operation now in progress” is an OS-level error (EINPROGRESS) that comes back when connect() is called on a non-blocking socket and the connection can’t be completed immediately (Google fun). I’ve never managed to find what causes it in the Flex code we license. I did some research, and no one seems to give a real answer either (a lot of “it’s the firewall” or “it’s DNS” and such).
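For anyone curious what that error looks like at the socket level, here is a minimal Python sketch, assuming a POSIX system such as Linux. It is not the Flex client code, and the function name `nonblocking_connect` is made up for illustration. It starts a non-blocking connect, which typically reports EINPROGRESS, then waits for the socket to become writable and reads SO_ERROR to see how the attempt really ended.

```python
import errno
import select
import socket


def nonblocking_connect(host, port, timeout=5.0):
    """Start a non-blocking TCP connect; return (initial_errno, final_errno)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setblocking(False)
    try:
        # connect_ex returns an errno value instead of raising an exception.
        initial = sock.connect_ex((host, port))
        if initial not in (0, errno.EINPROGRESS):
            # Immediate failure: nothing further to wait for.
            return initial, initial
        # The socket becomes writable once the connect attempt has finished.
        _, writable, _ = select.select([], [sock], [], timeout)
        if not writable:
            return initial, errno.ETIMEDOUT
        # SO_ERROR holds the actual result of the attempt (0 means connected).
        final = sock.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR)
        return initial, final
    finally:
        sock.close()
```

A final value of 0 means the connection succeeded even though the first call reported “Operation now in progress”, so the message on its own does not say whether the attempt ultimately worked.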

One question: are you using lmadmin to serve anything, or are you using the installers available here?

thinkboxsoftware.com/license … nstallers/


#3

I use the download from Thinkbox, it runs on a Windows 2012 server. I also have a Foundry / Nuke license being served from that same server, maybe that’s causing a conflict?

I just installed the latest from your link and restarted and it doesn’t seem to help. I’m still getting the error between tasks.


#4

I don’t think it’s a port conflict since some machines work alright. Conflicts tend to be an all-or-nothing affair. One workaround that’s helped in the past is moving it to another server… Do you have a second license server we could test with? I can have Sales send you out a relocation request, but we’ll need to move this discussion into the support system.

If you’re able, can you send an e-mail to support@thinkboxsoftware.com and we’ll try that relocation approach?


#5

Sounds good, I’ll get in touch with sales and see if switching the server helps. Thanks!


#6

I was sucked into a big project for a couple of weeks and couldn’t do more testing until today. But I have victory to report!

I’m not 100% sure what solved it, but the changes I made today were:

  • Specified the port for the license server. Before, I just had @licenseserver; now I have 27008@licenseserver.
  • Fixed some permission problems with the slave. When I started it manually, I found it was having trouble accessing various Python files in the user directory (on Linux, that’s Thinkbox/Deadline8 under the home directory of the user the slave runs as). Something must have gone wrong in the installation, because the user wasn’t the owner of all the files. I changed permissions on the directories and the errors went away.

I’m not sure which of those did the trick, or whether it was both. If anyone is having similar troubles, give those items a look.
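If you want to check for the same ownership problem, a small Python sketch along these lines may help. The function name `files_not_owned_by` is hypothetical, and the script only reports mismatches; it doesn’t change anything.

```python
import os


def files_not_owned_by(root, uid):
    """Return paths at or under root whose owner is not uid."""
    # Collect root itself plus every directory and file below it.
    candidates = [root]
    for dirpath, dirnames, filenames in os.walk(root):
        candidates.extend(os.path.join(dirpath, name)
                          for name in dirnames + filenames)

    mismatched = []
    for path in candidates:
        try:
            # lstat inspects the entry itself and does not follow symlinks.
            if os.lstat(path).st_uid != uid:
                mismatched.append(path)
        except OSError:
            # An entry we cannot even stat is worth reporting too.
            mismatched.append(path)
    return mismatched
```

Run as the user the slave runs as, something like `files_not_owned_by(os.path.expanduser("~/Thinkbox/Deadline8"), os.getuid())` should come back empty if ownership is consistent. Adjust the path if your install lives elsewhere.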

Thanks for the help!


#7

Welcome sir!

I’m leaning towards the permissions being the culprit. The way the license server works, the clients are supposed to do a scan across the possible ports from 27000 to 27009 if there is no port defined. Not saying it’s impossible, but my bet is on the permissions.
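To see which ports in that range are actually reachable from a given slave, a quick probe like the following can help. This is only a sketch: `open_ports` is a made-up helper, it just tests plain TCP connects, and it doesn’t speak the license protocol.

```python
import socket


def open_ports(host, ports, timeout=1.0):
    """Return the ports from `ports` that accept a TCP connection on host."""
    reachable = []
    for port in ports:
        try:
            # create_connection raises OSError on refusal, timeout, or DNS failure.
            with socket.create_connection((host, port), timeout=timeout):
                reachable.append(port)
        except OSError:
            pass
    return reachable
```

For the range mentioned above, the call would be `open_ports("licenseserver", range(27000, 27010))`, with `licenseserver` replaced by your own server name. Ports dropped silently by a firewall each cost the full timeout, which is one way a scan across the range could turn slow.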


#8

Just replying back to this thread since it was very useful in diagnosing a similar issue we were having. We had a lot of error messages similar to the above, and intermittently the slave would pick up a license and job just fine.

We only have one port open in the firewall, not a range of ports, so I assume Deadline would be randomly trying a bunch of blocked ports, and every now and then would find the open port, and work.

I tried forcing it to a single port by changing the setting to read port@licenseserver rather than just licenseserver, and that seems to have fixed the issue.
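As a rough illustration of why the explicit port matters, here is a Python sketch of the behaviour described in this thread: one endpoint when a port is given, and the 27000 to 27009 range otherwise. It is not taken from the FlexNet client, and `candidate_endpoints` and `DEFAULT_PORTS` are hypothetical names.

```python
# Port range clients are said to scan when no port is given (27000-27009).
DEFAULT_PORTS = range(27000, 27010)


def candidate_endpoints(spec):
    """Expand 'port@host', '@host', or 'host' into (host, port) pairs to try."""
    port_text, sep, host = spec.partition("@")
    if not sep:
        # No "@" at all: the whole string is the host name.
        host, port_text = spec, ""
    if port_text:
        # Explicit port: exactly one endpoint to try.
        return [(host, int(port_text))]
    # No port: every port in the default range is a candidate.
    return [(host, port) for port in DEFAULT_PORTS]
```

With `27008@licenseserver` there is a single candidate, while `@licenseserver` or a bare `licenseserver` expands to ten. If only one of those ten is open in the firewall, the other nine attempts are wasted.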