We are using an RLM server with a Redshift license and AWS worker nodes. When submitting jobs to Deadline, we intermittently get connection issues reaching the RLM server:
License error: Error communicating with license server (-17)
License error: (RLM) Communications error with license server (-17)
The RLM server is running on 5053, with the ISV running on 50053 (this was defined in the floating Redshift license we received from Maxon). However, it’s not clear which port relates to which in the AWS Portal setup: there is the Server Port and the Vendor Daemon port.
It seems redshift_LICENSE is set to {server port}@{gateway IP} on the worker node, but it’s hard to tell what the correct values should be, and how to debug where this connection issue is happening.
Should both ports be open on the machine hosting the RLM server, and should these ports be forwarded on the network router?
Thanks for reaching out. For the question about which port is which, you can look at your license file; it should look like the example below:
SERVER [hostname] [MAC Address] 27008
VENDOR [daemon] PORT=2708
^ in this example 27008 is the Server port and 2708 is the Vendor Daemon port.
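If you’d rather confirm those values programmatically, here is a small sketch that pulls both ports out of an RLM license file. The file contents below are an illustrative sample, not a real license; point the commands at your actual `.lic` file.

```shell
# Sample RLM license file (illustrative values only -- substitute your real .lic).
cat > /tmp/redshift-sample.lic <<'EOF'
SERVER myhost 0025b3aabbcc 5053
ISV redshift port=50053
EOF

# The SERVER line ends with the server (rlm) port.
server_port=$(awk '/^SERVER/ {print $NF}' /tmp/redshift-sample.lic)

# The ISV (or VENDOR) line may carry an explicit port= option; case varies.
isv_port=$(awk 'toupper($1) ~ /^(ISV|VENDOR)$/ {
    for (i = 2; i <= NF; i++)
        if (toupper($i) ~ /^PORT=/) { split($i, a, "="); print a[2] }
}' /tmp/redshift-sample.lic)

echo "server port:        $server_port"
echo "vendor daemon port: $isv_port"
```

With the sample above this prints 5053 as the server port and 50053 as the vendor daemon port, matching the two ports you listed.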
Should both ports be open on the machine hosting the RLM server, and should these ports be forwarded on the network router?
Portal Link opens tunnels that forward the license server and vendor daemon ports (via two separate SSH remote port forwards) through the Gateway instance, which allows the AWS Portal Workers to check out licenses through them. So yes, you need to open both ports through the firewall.
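To verify from a worker (or the gateway) that both ports are actually reachable, you can attempt a raw TCP connect to each one. The host and port values below are assumptions; substitute your RLM host and the ports from your license file. This uses bash’s `/dev/tcp` pseudo-device, so no extra tools are needed:

```shell
# Hypothetical values -- replace with your RLM host and your two ports.
RLM_HOST="127.0.0.1"
SERVER_PORT=5053
ISV_PORT=50053

check_port() {
    # Bash's /dev/tcp attempts a TCP connect; the timeout guards against a
    # silently-dropping firewall, which would otherwise hang the attempt.
    if timeout 3 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null; then
        echo "$1:$2 reachable"
    else
        echo "$1:$2 NOT reachable (blocked, or nothing listening)"
    fi
}

check_port "$RLM_HOST" "$SERVER_PORT"
check_port "$RLM_HOST" "$ISV_PORT"
```

If the server port connects but the ISV port does not, that points at the vendor daemon port being blocked or not forwarded, which would produce exactly this kind of intermittent -17.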
We have set this up and are still getting the following:
License error: Error communicating with license server (-17)
License error: (RLM) Communications error with license server (-17)
However, this is intermittent, and sometimes it succeeds in getting the license. Is there a way to debug what might be causing the communication issue? I’m assuming the IP address is OK even if Deadline and the license server are on the same machine?
One thing I have noticed is that in the RLM logs we frequently get the following error:
08/02 13:09 (redshift) DENIED: (1) redshift-core-cpu v2023.05 to ec2-user@ip-10-128-96-100.ec2.internal
08/02 13:09 (redshift) License server does not support this product
08/02 13:10 (redshift) OUT: redshift-core v2023.05 by ec2-user@ip-10-128-96-100.ec2.internal
08/02 13:10 (redshift) IN: redshift-core v2023.05 by ec2-user@ip-10-128-96-100.ec2.internal
This corresponds to the same times that we’re seeing errors in the Deadline logs. However, in the lic file provided by Maxon, the license is for LICENSE redshift redshift-core 2024.06 — could this be causing an issue?
Which 3D app are you using for redshift? Are you calling RS via the CLI or via the 3D app?
redshift-core-cpu and redshift-core are two different products.
RLM is reporting that the redshift-core-cpu product is not supported (i.e. no license for it), so it is denying that and then moving on to the next, which is redshift-core (for which you are licensed).
Besides looking at the RLM debug log, you can look at the “License Forwarder” log and the RS log. You can also run an RLM diag on the RLM server.
I think the product DENY would not cause the “-17” issue; that “DENY” may just be a red herring.
Have you tried running the ISV / Vendor daemon without specifying its port?
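One way to mine the RLM debug log mentioned above is to filter for DENIED lines and the products involved. Here is a sketch against a sample log seeded with the exact lines quoted earlier in this thread; the real log lives wherever your rlm server writes it, so adjust the path:

```shell
# Sample standing in for the real RLM debug log (lines taken from this thread).
cat > /tmp/rlm_debug.log <<'EOF'
08/02 13:09 (redshift) DENIED: (1) redshift-core-cpu v2023.05 to ec2-user@ip-10-128-96-100.ec2.internal
08/02 13:09 (redshift) License server does not support this product
08/02 13:10 (redshift) OUT: redshift-core v2023.05 by ec2-user@ip-10-128-96-100.ec2.internal
08/02 13:10 (redshift) IN: redshift-core v2023.05 by ec2-user@ip-10-128-96-100.ec2.internal
EOF

# Pull out each denial and the product/version it was for.
grep 'DENIED' /tmp/rlm_debug.log | awk '{print $6, $7}'
```

Against this sample it reports a single denial, for `redshift-core-cpu v2023.05` — which is consistent with the DENY being a separate (likely harmless) product lookup rather than the cause of the -17.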
We’re using the archive option in the Redshift ROP in Houdini. I’m not sure if this is related to the fact that running /redshiftCmdLine -listgpus on the worker node returns
List of available GPUs:
0 : NVIDIA A10G
1 : CPU 0 AMD EPYC 7R32
which might (or might not) explain the request for redshift-core-cpu, but either way, as you say, this may be a red herring.
Where would the License Forwarder log be located? Is this on the worker, or is it a separate machine on AWS?
Have you tried running the ISV / Vendor daemon without specifying its port?
Not sure what you mean here. Do you mean remove this in the RS license file?
License forwarder log: Deadline Monitor → View → New Panel → License Forwarders and then R-click select: “Connect to License Forwarder Log…”
In the Redshift license file, just comment out or remove the port number on the ISV line. The example below comments out the original ISV line and adds another ISV line without a port number.
First, make a backup of the license file.
Original: ISV redshift port=50053
Modified:
# ISV redshift port=50053
ISV redshift
You will have to restart the RLM server process/service/daemon.
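The edit above can be scripted if you want to apply it on the server without hand-editing. The license file below is an illustrative sample (with the ports discussed in this thread), the `sed` form assumes GNU sed, and the restart command at the end is an assumption — use whatever init system actually runs rlm on your host:

```shell
LIC=/tmp/redshift-edit.lic

# Sample standing in for your real license file.
cat > "$LIC" <<'EOF'
SERVER myhost 0025b3aabbcc 5053
ISV redshift port=50053
EOF

cp "$LIC" "$LIC.bak"   # always keep a backup first

# Comment out the ISV line that pins the port and add a portless ISV line,
# letting the ISV daemon pick its own port at startup. (GNU sed: \n in the
# replacement inserts a literal newline.)
sed -i 's/^ISV redshift port=50053$/# ISV redshift port=50053\nISV redshift/' "$LIC"

cat "$LIC"

# Then restart rlm so it rereads the file, e.g. (service name is an assumption):
# sudo systemctl restart rlm
```

Note that with no pinned ISV port you must not block ephemeral ports between the workers and the server, since the daemon will bind a different port each start.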
Yes, it opens another window and displays the log contents.
You can also ssh into the AWS gateway (if that is what you are using for the forwarder) and go into /var/log/Thinkbox/Deadline10/ and the deadlinelicenseforwarder log should be in there.
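Once you have that log on the gateway, a quick way to surface only the interesting lines is to grep for failure keywords. The sketch below runs against a sample file seeded with lines quoted in this thread; on the gateway, point the grep at the real file under /var/log/Thinkbox/Deadline10/ instead:

```shell
# Sample standing in for the deadlinelicenseforwarder log on the gateway.
LOG=/tmp/deadlinelicenseforwarder-sample.log
cat > "$LOG" <<'EOF'
2023-08-02 10:21:19: License Forwarder Thread - License Forwarder thread listening on port 17004
2023-08-02 10:21:24: License Forwarder - Web Forwarder not started, certificate not found at '/opt/Thinkbox/Deadline10/certs/10.128.2.4.pfx'.
EOF

# Surface anything that looks like a failure or a disabled component.
grep -Ei 'error|not started|not found|refused' "$LOG"
```

Against the sample this flags the “Web Forwarder not started, certificate not found” line — the same symptom reported below.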
Hmm, it seems “Connect to License Forwarder Log…” doesn’t open any window. Should this be a Deadline UI window or something external to the program?
When checking the logs on the Gateway, however, I get this in today’s log:
2023-08-02 10:21:19: Deadline License Forwarder 10.2 [v10.2.1.1 Release (094cbe890)]
2023-08-02 10:21:19: License Forwarder Thread - License Forwarder thread initializing...
2023-08-02 10:21:19: License Forwarder Thread - License Forwarder thread listening on port 17004
2023-08-02 10:21:24: License Forwarder - Web Forwarder not started, certificate not found at '/opt/Thinkbox/Deadline10/certs/10.128.2.4.pfx'.
I’ve checked for the /opt/Thinkbox/Deadline10/certs/10.128.2.4.pfx file, but it doesn’t exist, making me think something is wrong with the License Forwarder.
Thanks @jarak. Yep, that seems to do something, but it doesn’t provide any info.
PS C:\Program Files\Thinkbox\Deadline10\bin> .\deadlinecommand.exe -ConnectToLicenseForwarderLog ip-10-128-2-4 False
Making a connection to '192.168.1.82' port 18000
Success: Connected to Pulse, Pulse will attempt to redirect command to target 'ip-10-128-2-4 39927'...
Success
If I just look at the logs directly on the Gateway, I can see the following NGINX issues:
2023-08-07 08:46:23: BEGIN - ip-10-128-2-4\ec2-user
2023-08-07 08:46:23: Operating System: Linux
2023-08-07 08:46:23: CPU Architecture: x86_64
2023-08-07 08:46:23: CPUs: 16
2023-08-07 08:46:23: Video Card: Amazon.com, Inc. Device 1111
2023-08-07 08:46:23: Deadline License Forwarder 10.2 [v10.2.1.1 Release (094cbe890)]
2023-08-07 08:46:23: License Forwarder Thread - License Forwarder thread initializing...
2023-08-07 08:46:23: License Forwarder Thread - License Forwarder thread listening on port 17004
2023-08-07 08:46:27: License Forwarder - Web Forwarder not started, certificate not found at '/opt/Thinkbox/Deadline10/certs/10.128.2.4.pfx'.
2023-08-07 12:15:43: ERROR: UpdateClient.MaybeSendRequestNow caught an exception: POST https://10.128.2.4:4433/rcs/v1/update returned BadGateway "<html>
2023-08-07 12:15:43: <head><title>502 Bad Gateway</title></head>
2023-08-07 12:15:43: <body bgcolor="white">
2023-08-07 12:15:43: <center><h1>502 Bad Gateway</h1></center>
2023-08-07 12:15:43: <hr><center>nginx/1.12.2</center>
2023-08-07 12:15:43: </body>
2023-08-07 12:15:43: </html>
2023-08-07 12:15:43: " (Deadline.Net.Clients.Http.DeadlineHttpRequestException)
2023-08-07 12:15:48: License Forwarder - An error occurred while updating license forwarder's info: Connection Server error: 10.128.2.4 on port 4433 using POST https://10.128.2.4:4433/db/licenseforwarders/info/save. Please ensure you are connecting to the proper server and the Connection Server is up to date and running. (System.Net.WebException)
2023-08-07 12:15:57: License Forwarder - An error occurred while updating license forwarder's info: Connection Server error: 10.128.2.4 on port 4433 using POST https://10.128.2.4:4433/db/licenseforwarders/info/save. Please ensure you are connecting to the proper server and the Connection Server is up to date and running. (System.Net.WebException)
2023-08-07 12:16:06: License Forwarder - An error occurred while updating license forwarder's info: Connection Server error: 10.128.2.4 on port 4433 using POST https://10.128.2.4:4433/db/licenseforwarders/info/save. Please ensure you are connecting to the proper server and the Connection Server is up to date and running. (System.Net.WebException)
2023-08-07 12:16:16: License Forwarder - An error occurred while updating license forwarder's info: Connection Server error: 10.128.2.4 on port 4433 using POST https://10.128.2.4:4433/db/licenseforwarders/info/save. Please ensure you are connecting to the proper server and the Connection Server is up to date and running. (System.Net.WebException)
2023-08-07 12:16:25: License Forwarder - An error occurred while updating license forwarder's info: Connection Server error: 10.128.2.4 on port 4433 using POST https://10.128.2.4:4433/db/licenseforwarders/info/save. Please ensure you are connecting to the proper server and the Connection Server is up to date and running. (System.Net.WebException)
2023-08-07 12:16:31: ERROR: Error occurred while updating network settings: Connection Server error: 10.128.2.4 on port 4433 using GET https://10.128.2.4:4433/db/settings/network?invalidateCache=true. Please ensure you are connecting to the proper server and the Connection Server is up to date and running. (System.Net.WebException)
2023-08-07 12:16:31: at Deadline.StorageDB.Proxy.Utils.ProxyUtils.HandleException(Exception e, NetworkManager manager, String server, Int32 port, String certificatePath)
2023-08-07 12:16:31: at Deadline.StorageDB.Proxy.ProxySettingsStorage.GetNetworkSettings(Boolean invalidateCache)
2023-08-07 12:16:31: at Deadline.StorageDB.SettingsStorage.GetRecentNetworkSettings()
2023-08-07 12:16:34: License Forwarder - An error occurred while updating license forwarder's info: Connection Server error: 10.128.2.4 on port 4433 using POST https://10.128.2.4:4433/db/licenseforwarders/info/save. Please ensure you are connecting to the proper server and the Connection Server is up to date and running. (System.Net.WebException)
There seems to be a pattern whereby a frame will fail its first attempt with License error: Error communicating with license server (-17), then retry the frame and succeed. The next frame will similarly fail, then succeed.
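To confirm that fail-then-retry pattern quantitatively, you can count first-attempt -17 failures in the worker logs. The log lines below are purely illustrative (not the actual Deadline worker log format); adapt the grep pattern to whatever your real task reports look like:

```shell
# Illustrative sample -- not the real Deadline worker log format.
cat > /tmp/deadline-sample.log <<'EOF'
frame 101 attempt 1: License error: Error communicating with license server (-17)
frame 101 attempt 2: rendered OK
frame 102 attempt 1: License error: Error communicating with license server (-17)
frame 102 attempt 2: rendered OK
EOF

# Count frames that hit -17 on their first attempt.
fails=$(grep -c 'attempt 1: License error' /tmp/deadline-sample.log)
echo "first-attempt -17 failures: $fails"
```

If every first attempt fails and every retry succeeds, that rhythm suggests something stateful (a tunnel or forwarder re-establishing per checkout) rather than random packet loss.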
Were you ever able to resolve this issue? We spent literal months with Maxon and Thinkbox support at the beginning of the year on a very, very similar issue. Unfortunately, we were never able to resolve it. Curious to know what version of Redshift you’re using and whether you’re rendering proxies.