Cannot initialize usage-based licensing for custom Arnold AMI

osiriswrecks · May 4, 2021, 2:19pm

I spun up a custom AMI with Gaffer and Arnold to use with Deadline, but because I installed Arnold manually, it’s not seeing my UBL certificates. I know you have to configure licensing for Arnold as a separate step, but I can’t find anything online about setting this up for Deadline yourself. I think AWS assumes you’ll just start with one of the preconfigured Arnold/Maya, Arnold/3ds Max, or similar instances instead.

At the moment I can render, but everything is watermarked, of course. Adding the Arnold limit to the job just causes the task to be queued and then dequeued over and over again without ever starting. Any help is appreciated!

Bobo · May 4, 2021, 4:41pm

There are several ways to run Deadline with cloud instances, so it depends a lot on the actual architecture, but here are the main points to pay attention to:

For UBL to work, the Deadline License Forwarder must be running.
- If you are running Deadline on premises, you must run it on the local network.
- If you are running Deadline all-in-AWS, you must run the Deadline License Forwarder yourself on an instance, likely on the one hosting the Repository and Database.
- If you are running Deadline in a hybrid cloud setup with AWS Portal, then the License Forwarder is already deployed automatically as part of the Gateway instance, and there is no need to run it yourself.
When the Deadline License Forwarder is first launched in the first two cases above, it will ask for the location of the UBL certificates.
- If you need to modify the License Forwarder’s configuration to change the path to where the certificates are stored, you can edit the file C:\ProgramData\Thinkbox\Deadline10\deadline.ini (on Windows) and modify the line
```
LicenseForwarderSSLPath=C:\DeadlineCerts\UBL
```
- In the AWS Portal case, you configure the path to the UBL certificates in the Advanced tab of the AWS Portal Setup dialog, and Deadline takes care of uploading them to S3 and making them accessible to the License Forwarder on the Gateway instance.
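To confirm which certificate folder the License Forwarder on a given machine is configured to use, a small script can read `LicenseForwarderSSLPath` out of `deadline.ini`. This is a minimal Python sketch, assuming the file consists of plain `key=value` lines (section headers and comments are skipped); the helper names are hypothetical and not part of Deadline.

```python
from pathlib import Path
from typing import Optional


def read_deadline_ini_value(ini_text: str, key: str) -> Optional[str]:
    """Return the value for `key` from deadline.ini-style text, or None if absent.

    Only plain key=value lines are considered; section headers, blank lines
    and comment lines are skipped, so the result does not depend on sections.
    """
    for raw_line in ini_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith(("#", ";", "[")):
            continue
        name, sep, value = line.partition("=")
        if sep and name.strip() == key:
            return value.strip()
    return None


def check_ubl_cert_path(ini_path: Path) -> str:
    """Report whether LicenseForwarderSSLPath points at an existing directory."""
    value = read_deadline_ini_value(ini_path.read_text(), "LicenseForwarderSSLPath")
    if value is None:
        return "LicenseForwarderSSLPath is not set"
    if not Path(value).is_dir():
        return f"LicenseForwarderSSLPath={value} does not exist on this machine"
    return f"LicenseForwarderSSLPath={value} looks OK"
```

On Windows this would be run as `print(check_ubl_cert_path(Path(r"C:\ProgramData\Thinkbox\Deadline10\deadline.ini")))` on the machine hosting the License Forwarder.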
When the Worker picks up a job that is tagged with a License Limit that mentions Arnold, the Worker connects to the License Forwarder and makes sure it can acquire a stub for the given Limit. It then sets the environment variable that points Arnold at the License Forwarder, which in turn connects to the cloud license server hosting the actual licenses, and Arnold gets its license.
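The limit-then-license order can be illustrated with a toy model. This is not Deadline's implementation; `LicenseLimit` and `start_task` are hypothetical names, and the model only shows why a task whose licensing step fails goes back to the queue instead of rendering, which is the queue/requeue loop described in the first post.

```python
from dataclasses import dataclass


@dataclass
class LicenseLimit:
    """Toy stand-in for a Deadline Limit that hands out a fixed number of stubs."""
    name: str
    max_stubs: int
    in_use: int = 0

    def try_acquire(self) -> bool:
        if self.in_use >= self.max_stubs:
            return False
        self.in_use += 1
        return True

    def release(self) -> None:
        self.in_use = max(0, self.in_use - 1)


def start_task(limit: LicenseLimit, forwarder_reachable: bool) -> str:
    """Model the order of checks a Worker goes through before rendering."""
    if not limit.try_acquire():
        return "requeued: no stub available"
    if not forwarder_reachable:
        # Hand the stub back so another attempt can use it later.
        limit.release()
        return "requeued: License Forwarder not reachable"
    return "rendering with a usage-based license"
```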

osiriswrecks · May 4, 2021, 5:48pm

Thanks for the info, @Bobo. I’m running a hybrid cloud setup - Deadline repo/portal/etc are running on a local machine, but all rendering happens on AWS instances.

If I understand you correctly, this means that the licensing should just be handled automatically, but my jobs just keep getting queued/requeued without starting a render. Removing the limit allows the render to continue with a watermark. Is it possible that whatever command is trying to configure Arnold to connect to the License Forwarder isn’t finding my Arnold installation? Are there environment variables or something I need to set on the AMI side to make sure they make the connection?

osiriswrecks · May 4, 2021, 6:14pm

Aha, I have some more info now. I connected to the worker log, and it’s throwing this error:

```
Error: No longer serving products.
Could not initialize usage-based licensing for arnold. Either the product name is not valid or the product is only available on AWS Portal instances.
```

Bobo · May 4, 2021, 7:35pm

You MUST build your custom Arnold instance from an AWS Portal instance that came with Deadline.

osiriswrecks · May 4, 2021, 8:01pm

@Bobo I did, it just wasn’t an instance that also came with Arnold, because there didn’t seem to be a standalone Arnold AMI from what I could tell. They were all bundled with other DCCs, like Maya or 3ds Max.

EDIT: For what it’s worth, I used the Deadline Worker Base Image Linux 10.1.5.0 2020-03-13T034909Z AMI to spin up my own.

Bobo · May 4, 2021, 9:05pm

For V-Ray and Arnold, I believe we are re-using the Maya Linux AMIs to do stand-alone rendering, since all of them come with a stand-alone copy. You could start with a Maya+Arnold AMI, and update Arnold to your version. It will likely work.

osiriswrecks · May 4, 2021, 9:13pm

Ok fantastic. I’ll give it a try tonight.

osiriswrecks · May 5, 2021, 3:52am

@Bobo I tried a couple of different Maya/Arnold AMI instances tonight, and they all came back with the same error message as my original one did.

I tried both running Arnold from its own directory in /opt and replacing the mtoa plugin files in /opt/solidangle/mtoa/2018/, just in case that might do the trick.

Bobo · May 5, 2021, 5:29am

Let’s take a step back here - what happens when you launch an AWS Portal Spot Fleet with the Arnold Standalone AMI listed in the Spot Fleet Configuration dialog? We want to make sure your Arnold UBL configuration works with the standard Deadline AMIs before we go a step further and try custom AMIs.

Can you confirm that the Deadline AMIs can use your current Arnold UBL configuration?

osiriswrecks · May 5, 2021, 1:16pm

Oddly enough, I get a different error when I spin up an Arnold Standalone instance.

```
Scheduler Thread - Failed to retrieve the secret (/admin/ublsettings/UsageBasedURL), this operation was forbidden. Please ensure you have been granted access to this resource, or contact your Administrator to ensure Secrets Management was correctly configured. Please see Server's application log for further information. (System.InvalidOperationException)
```

Bobo · May 5, 2021, 3:33pm

It is not strange - it means the Secrets Manager was deployed when the Repository was installed (it defaults to on, unless you opt out, or you have database encryption disabled), and you have not granted a role under the Manage Identities dialog.

https://docs.thinkboxsoftware.com/products/deadline/10.1/1_User%20Manual/manual/secrets-management/deadline-secrets-management.html#assigning-identity-status-and-roles

Note that you can either grant a Role explicitly (I believe that adding it for one instance that uses a particular AMI will apply to all instances running that AMI) or use a procedural approach, where you define an IP pattern and the Role is assigned to all Workers that match that pattern.
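The IP-pattern approach amounts to matching each Worker's address against a list of rules. A minimal sketch, assuming shell-style wildcards such as `10.128.*` and first-match-wins ordering; Deadline's actual pattern syntax may differ, and `worker-role` is a placeholder, not a real role name.

```python
from fnmatch import fnmatchcase
from typing import List, Optional, Tuple


def role_for_worker(worker_ip: str, rules: List[Tuple[str, str]]) -> Optional[str]:
    """Return the role of the first (pattern, role) rule whose pattern matches worker_ip.

    Returns None when no rule matches, i.e. the Worker would still need a Role
    granted by hand before it can read secrets.
    """
    for pattern, role in rules:
        if fnmatchcase(worker_ip, pattern):
            return role
    return None
```

With `rules = [("10.128.*", "worker-role")]`, a Worker at `10.128.2.17` gets the role, while one at `192.168.1.5` does not.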

When the Secrets Manager is installed and configured, all secrets (like the account name and password used to map a Windows drive under Tools > Configure Repository Options > Mapped Drives, or the UBL Activate Code) are stored in it, and the client applications must be granted access to pull these secrets out of it before they can use them.

osiriswrecks · May 5, 2021, 6:03pm

The good news is that assigning the proper role to the instance let it begin rendering. Of course, it errored out because it’s missing some software that’s on the custom AMI, but according to the License Server logs, it was able to find the license OK.

EDIT: Ok, here’s what I know. With Arnold Standalone AMIs, they will try to kick off a render as long as they are whitelisted for Secrets. I have a couple of custom AMIs based on one of the Maya2018/Arnold base AMIs - one is untouched and the other has my software loaded and configured on it. But those instances will turn on and go idle indefinitely. They never try to pick up a job, don’t reach out to the license server, and their names never show up in the Manage Identities box to be assigned a Role.

Is it possible the AMIs I’m using are too old? Is the Arnold Standalone AMI newer than 2018, the newest Maya/Arnold AMI I can see in the AWS Console? I’m currently using 357466774442/Deadline Slave Base Image 10.0.10.3 with Maya 2018 and Arnold for my custom AMIs.

EDIT #2: After all of this testing, I think the issue comes down to Secrets. My AMIs are based on a version that’s older than when Secrets was introduced (I thought I grabbed the earliest one originally but I didn’t). I’ll do another test with a Deadline version greater than 10.1.10 and let you know how it goes.

Justin_B · May 5, 2021, 8:32pm

Make sure the version of Deadline you’re running on the AMI matches the one you’re running locally. You’ve figured this out in EDIT #2, but I want to let you know that you’re right.

osiriswrecks · May 5, 2021, 9:03pm

Sometimes the biggest issues lie between the chair and the computer.

Bobo · May 5, 2021, 10:35pm

Yes, the AMIs must be the EXACT VERSION of your Repository. If you have Secrets Manager enabled, you are likely running 10.1.10 or later. You MUST base your AMI on a Deadline AWS Portal AMI that was built with the 10.1.10 Worker. There are underlying services related to asset synchronization and possibly licensing that would make an ancient 10.0.x AMI not work with your Repository.
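Because the AMI must match the Repository exactly, it can help to check the version embedded in an AMI name before building on it. A minimal Python sketch, assuming the AMI name contains a four-part version, as in the names quoted in this thread (for example `Deadline Slave Base Image 10.0.10.3 with Maya 2018`); the 10.1.10 threshold for Secrets Management comes from this thread, and the helper names are hypothetical.

```python
import re
from typing import Optional, Tuple

# Matches a four-part Deadline version such as 10.1.5.0 or 10.0.10.3.
_VERSION_RE = re.compile(r"\b(\d+)\.(\d+)\.(\d+)\.(\d+)\b")
# First release line with Secrets Management, per this thread.
_SECRETS_MIN = (10, 1, 10)


def parse_version(text: str) -> Optional[Tuple[int, int, int, int]]:
    """Extract the first four-part version number found in text, or None."""
    match = _VERSION_RE.search(text)
    if match is None:
        return None
    major, minor, patch, build = (int(part) for part in match.groups())
    return (major, minor, patch, build)


def check_ami(ami_name: str, repository_version: str) -> str:
    """Compare the Deadline version embedded in an AMI name with the Repository's."""
    ami = parse_version(ami_name)
    repo = parse_version(repository_version)
    if ami is None or repo is None:
        return "could not find a four-part Deadline version"
    if ami == repo:
        return "OK: versions match"
    problem = f"mismatch: AMI {'.'.join(map(str, ami))} vs Repository {repository_version}"
    # Flag the case from this thread: AMI older than Secrets, Repository newer.
    if ami[:3] < _SECRETS_MIN <= repo[:3]:
        problem += " (AMI predates Secrets Management)"
    return problem
```

For example, `check_ami("Deadline Slave Base Image 10.0.10.3 with Maya 2018", "10.1.15.2")` returns `"mismatch: AMI 10.0.10.3 vs Repository 10.1.15.2 (AMI predates Secrets Management)"`, where `10.1.15.2` is just an illustrative Repository version.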

You can filter by the Deadline version; be sure to look under Public Images. I see 31 AMIs for the latest 10.1.15, and there will be similar numbers for older 10.1.x builds.
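The same search can be scripted with boto3's EC2 `describe_images` call. A sketch, assuming `357466774442` (the owner prefix on the AMI name quoted earlier in this thread) is the publisher account for the Deadline AMIs and that you supply your own region; the name filter is a plain substring wildcard, so `10.1.15` would also match any longer version string that starts with it.

```python
from typing import Dict, List


def deadline_ami_filters(deadline_version: str) -> List[Dict[str, object]]:
    """Build EC2 DescribeImages filters for available public AMIs whose name mentions a version."""
    return [
        {"Name": "name", "Values": [f"*{deadline_version}*"]},
        {"Name": "is-public", "Values": ["true"]},
        {"Name": "state", "Values": ["available"]},
    ]


def list_deadline_amis(deadline_version: str, owner_id: str, region: str) -> List[str]:
    """Return the sorted names of matching AMIs (needs boto3 and AWS credentials)."""
    import boto3  # imported here so the filter helper works without boto3 installed

    ec2 = boto3.client("ec2", region_name=region)
    response = ec2.describe_images(
        Owners=[owner_id], Filters=deadline_ami_filters(deadline_version)
    )
    return sorted(image.get("Name", image["ImageId"]) for image in response["Images"])
```

A call such as `list_deadline_amis("10.1.15", "357466774442", "us-west-2")` would list the candidates; the region here is only an example.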

osiriswrecks · May 6, 2021, 12:25am

We’re soooo close now. My custom AMI can see my render credits, but it throws a connection error. Do I have to open port 5056 manually on the AMI? Or is this something else?

```
2021-05-06 00:21:08:  Port Forwarder (arnold:5056): Client connected to port forwarder.
2021-05-06 00:21:08:  Worker - Confirmed Credit Usage for "arnold".
2021-05-06 00:21:08:  >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
2021-05-06 00:21:08:  Exception Details
2021-05-06 00:21:08:  ExtendedSocketException -- Connection refused 10.128.2.4:5056
2021-05-06 00:21:08:  SocketException.SocketErrorCode: ConnectionRefused (10061)
2021-05-06 00:21:08:  SocketException.ErrorCode: 111 (Connection refused)
2021-05-06 00:21:08:  Win32Exception.NativeErrorCode: 111
2021-05-06 00:21:08:  Exception.TargetSite: Void DoConnect(System.Net.EndPoint, System.Net.Internals.SocketAddress)
2021-05-06 00:21:08:  Exception.Data: ( )
2021-05-06 00:21:08:  Exception.Source: System.Net.Sockets
2021-05-06 00:21:08:  Exception.HResult: -2147467259
2021-05-06 00:21:08:    Exception.StackTrace: 
2021-05-06 00:21:08:     at System.Net.Sockets.Socket.DoConnect(EndPoint endPointSnapshot, SocketAddress socketAddress)
2021-05-06 00:21:08:     at System.Net.Sockets.Socket.Connect(EndPoint remoteEP)
2021-05-06 00:21:08:     at System.Net.Sockets.Socket.Connect(IPAddress address, Int32 port)
2021-05-06 00:21:08:     at System.Net.Sockets.TcpClient.Connect(String hostname, Int32 port)
2021-05-06 00:21:08:  --- End of stack trace from previous location where exception was thrown ---
2021-05-06 00:21:08:     at System.Net.Sockets.TcpClient.Connect(String hostname, Int32 port)
2021-05-06 00:21:08:     at System.Net.Sockets.TcpClient..ctor(String hostname, Int32 port)
2021-05-06 00:21:08:     at Deadline.Net.PortForwarder.b(Socket bej)
2021-05-06 00:21:08:  <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
2021-05-06 00:21:08:  Port Forwarder (arnold:5056): Client connected to port forwarder.
2021-05-06 00:21:08:  >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
2021-05-06 00:21:08:  Exception Details
2021-05-06 00:21:08:  ExtendedSocketException -- Connection refused 10.128.2.4:5056
2021-05-06 00:21:08:  SocketException.SocketErrorCode: ConnectionRefused (10061)
2021-05-06 00:21:08:  SocketException.ErrorCode: 111 (Connection refused)
2021-05-06 00:21:08:  Win32Exception.NativeErrorCode: 111
2021-05-06 00:21:08:  Exception.TargetSite: Void DoConnect(System.Net.EndPoint, System.Net.Internals.SocketAddress)
2021-05-06 00:21:08:  Exception.Data: ( )
2021-05-06 00:21:08:  Exception.Source: System.Net.Sockets
2021-05-06 00:21:08:  Exception.HResult: -2147467259
2021-05-06 00:21:08:    Exception.StackTrace: 
2021-05-06 00:21:08:     at System.Net.Sockets.Socket.DoConnect(EndPoint endPointSnapshot, SocketAddress socketAddress)
2021-05-06 00:21:08:     at System.Net.Sockets.Socket.Connect(EndPoint remoteEP)
2021-05-06 00:21:08:     at System.Net.Sockets.Socket.Connect(IPAddress address, Int32 port)
2021-05-06 00:21:08:     at System.Net.Sockets.TcpClient.Connect(String hostname, Int32 port)
2021-05-06 00:21:08:  --- End of stack trace from previous location where exception was thrown ---
2021-05-06 00:21:08:     at System.Net.Sockets.TcpClient.Connect(String hostname, Int32 port)
2021-05-06 00:21:08:     at System.Net.Sockets.TcpClient..ctor(String hostname, Int32 port)
2021-05-06 00:21:08:     at Deadline.Net.PortForwarder.b(Socket bej)
```
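Before opening ports by hand, it is worth checking from the Worker whether anything is listening at the address in the log. A minimal Python sketch using only the standard library; `port_is_open` is a hypothetical helper, and a `False` result for `10.128.2.4:5056` only tells you the TCP connection itself is failing (refused, timed out, or unreachable), not why.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # ConnectionRefusedError, socket.timeout and unreachable-host errors
        # are all subclasses of OSError.
        return False
```

Run on the Worker instance as `port_is_open("10.128.2.4", 5056)`; if it returns `False` while other Deadline traffic works, the service behind that port is the next thing to check.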

osiriswrecks · May 6, 2021, 2:22am

I restarted the Infrastructure and the error went away! I think we can count this one as solved.