AWS Portal UBLs no longer work after migrating to secrets management

Hi all,

I recently upgraded my install from a previous 10.0 release to Deadline 10.1.17.4, and enabling secrets management may have been a mistake, at least until I figure out what’s going wrong here.

What I’m trying to do:

  • Render on a Maya instance using UBL for Maya and Arnold, using the latest 2022 Linux AMIs built by Thinkbox (AMI ID ami-0bc705167c064477e)
  • Infrastructure gateway host seems to be using a recent AMI (AMI ID ami-0babaa21197fc1820). As far as I can tell these are all running 10.1.17.4 clients, so they should be compatible with secrets management.
  • When the instances first boot up, I go to Manage Identities to register them and assign the Client role. I did this for both the gateway IP and the single worker I booted to test.
  • When I first installed 10.1, I allowed the installer to migrate the secrets and remove the legacy secrets.

What seems to happen:
On the worker node, I get an error pulling the secret for UBL:

2021-07-17 07:10:56:  ERROR: Scheduler Thread - Unexpected Error Occurred
2021-07-17 07:10:56:  Scheduler Thread - Failed to retrieve the secret (/admin/ublsettings/UsageBasedURL), this operation was forbidden. Please ensure you have been granted access to this resource, or contact your Administrator to ensure Secrets Management was correctly configured. Please see Server's application log for further information. (System.InvalidOperationException)
2021-07-17 07:10:58:  POST https://10.128.2.4:4433/rcs/v1/getSecret returned Forbidden "" (Deadline.Net.Clients.Http.DeadlineHttpRequestException)
2021-07-17 07:10:58:     at Deadline.Net.Clients.Http.HttpClient.b(HttpRequestMessage bka)
2021-07-17 07:10:58:     at Deadline.Net.Clients.Http.HttpClient.SendRequestForStream(String method, String uri, String contentType, Dictionary`2 headers, HttpContent httpContent)
2021-07-17 07:10:58:     at Deadline.Net.Clients.Http.HttpClient.SendRequest(String method, String uri, String contentType, Dictionary`2 headers, HttpContent httpContent)
2021-07-17 07:10:58:     at Deadline.Net.Clients.Http.HttpClient.Post(String uri, Object body, String contentType, Dictionary`2 headers)
2021-07-17 07:10:58:     at Deadline.Net.Clients.Http.HttpClient.Post[TRequest,TResponse](String uri, TRequest body, String contentType, Dictionary`2 headers)
2021-07-17 07:10:58:     at Deadline.Controllers.RemoteSecretsManagementController.GetSecret(String secretId)
2021-07-17 07:10:58:  ERROR: Scheduler Thread - Unexpected Error Occurred
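For anyone checking their own worker logs for the same failure, here is a minimal Python sketch that pulls out which secret paths were refused. The regular expression is based only on the message wording in the log above, and the function name is my own; none of this is part of Deadline.

```python
import re

# Assumption: the message wording matches the worker log quoted above.
FORBIDDEN = re.compile(
    r"Failed to retrieve the secret \((?P<path>[^)]+)\), this operation was forbidden"
)

def forbidden_secrets(log_text):
    """Return the distinct secret paths the worker was refused, in order seen."""
    seen = []
    for match in FORBIDDEN.finditer(log_text):
        path = match.group("path")
        if path not in seen:
            seen.append(path)
    return seen

# Sample line copied (and shortened) from the log above.
sample = (
    "2021-07-17 07:10:56:  Scheduler Thread - Failed to retrieve the secret "
    "(/admin/ublsettings/UsageBasedURL), this operation was forbidden."
)
print(forbidden_secrets(sample))
```

If more than one path shows up, that would suggest the worker is being refused across the board rather than for the UBL settings alone.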

I see a matching “Authentication failed” error on the RCS:

Authentication failed!
Failed authentication=
POST /rcs/v1/getSecret
  Content-Type=application/json; charset=utf-8
  Accept=application/json
  Accept-Encoding=br
  Authorization=DEADLINE-RSASSAPSS Credential=[REDACTED]/20210717/thinkboxrcs, SignedHeaders=accept-encoding;content-type;host;user-agent;x-amz-date;x-amz-deadline-rcs-api, Signature=[REDACTED]
  Host=10.128.2.4
  User-Agent=DeadlineWorker10.1/10.1.17.4
  Content-Length=47
  x-amz-deadline-rcs-api=4
  X-Amz-Date=20210717T071058Z
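One sanity check that can be run on a dump like this is whether every header named in SignedHeaders actually appears in the request. The sketch below assumes the RCS prints one Name=value pair per line, as it does above; the helper names are hypothetical, and it does not verify the signature itself, only that the dump is internally consistent.

```python
def parse_dump(dump):
    """Split 'Name=value' lines from the dump into a dict keyed by lowercase name."""
    headers = {}
    for line in dump.splitlines():
        line = line.strip()
        # Skip the request line and the "Failed authentication=" banner.
        if "=" not in line or line.startswith("Failed authentication"):
            continue
        name, value = line.split("=", 1)
        headers[name.lower()] = value
    return headers

def signed_header_names(authorization):
    """Pull the SignedHeaders list out of the Authorization value."""
    for part in authorization.split(", "):
        if part.startswith("SignedHeaders="):
            return part[len("SignedHeaders="):].split(";")
    return []

def missing_signed_headers(dump):
    """Return signed header names that do not appear in the dumped request."""
    headers = parse_dump(dump)
    signed = signed_header_names(headers.get("authorization", ""))
    return [name for name in signed if name not in headers]

dump = """POST /rcs/v1/getSecret
  Content-Type=application/json; charset=utf-8
  Accept-Encoding=br
  Authorization=DEADLINE-RSASSAPSS Credential=[REDACTED]/20210717/thinkboxrcs, SignedHeaders=accept-encoding;content-type;host;user-agent;x-amz-date;x-amz-deadline-rcs-api, Signature=[REDACTED]
  Host=10.128.2.4
  User-Agent=DeadlineWorker10.1/10.1.17.4
  x-amz-deadline-rcs-api=4
  X-Amz-Date=20210717T071058Z"""
print(missing_signed_headers(dump))
```

For the dump above the list comes back empty, so the request at least carries every header it claims to have signed.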

Thanks for any advice folks can share on how to further troubleshoot this. Additionally, it seems the docs (particularly the AWS Portal Quickstart) could benefit from an update to reference the secrets management stuff.


FYI I troubleshot this with Thinkbox Support. Ultimately we didn’t quite get to the bottom of the issue, but I’ll add a few notes for others’ benefit:

  • It’s fairly easy to remove secrets management just by re-running the repository installer. Any UBL settings have to be re-configured afterwards, but it was pretty straightforward to pull these from the flexnet portal (ultimately we just removed secrets management for the time being).
  • Both the node running the Deadline RCS on your local network and the AWS gateway instance need to be assigned the “Server” role in Manage Identities. Any workers need to be assigned the “Client” role.
  • There’s no automation in role assignment today, but you can make an IP-based rule to auto-assign the client role to new AWS workers. There’s no way to make that rule differentiate the gateway instance and assign it the “Server” role though.
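To make the role split concrete, here is a rough Python sketch of the decision I end up making by hand in Manage Identities. It is not Deadline's API, the function name is made up, and all addresses are placeholders. It also shows why a subnet-only rule can't separate the gateway from the workers: the gateway has to be listed explicitly.

```python
import ipaddress

def role_for(ip, worker_cidr, server_ips):
    """Pick the role to assign by hand: explicit server IPs win over the subnet rule."""
    addr = ipaddress.ip_address(ip)
    if addr in {ipaddress.ip_address(s) for s in server_ips}:
        return "Server"   # RCS host or AWS gateway instance
    if addr in ipaddress.ip_network(worker_cidr):
        return "Client"   # any other machine in the worker subnet
    return None           # unknown machine, leave unassigned

# Placeholder values: one gateway address inside a /16 worker subnet.
servers = {"10.128.2.4"}
subnet = "10.128.0.0/16"

print(role_for("10.128.2.4", subnet, servers))
print(role_for("10.128.2.57", subnet, servers))
print(role_for("192.168.1.10", subnet, servers))
```

With an empty server list the gateway address falls through to the subnet rule and comes back as "Client", which is the limitation described above.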

I have run into the same problem. For me, the problems started right away: the initial admin user did not even seem to have rights to access the various repository options that required secrets, and it was a real pain to work out how to fix that via all the command line nodes. Eventually it worked, though, and I could save settings.

However, once the AWS infrastructure and fleet have booted and I have assigned the Server role to the gateway and the Client role to the workers, I still get that secrets error.

It is pretty painful to deal with, and I will likely just have to reinstall with it off.

We are running 10.1.16.8. The latest version would not even install the AWS Portal stuff.

Richard

I’ve also just been through the pain of ‘secrets’. Trying to remove it seems to fail, but the install goes through and gets running again. I do still see the errors popping up in the console, though.

Guess I’ll uninstall secrets as well. I tried assigning all ec2-user OSUsers to Server and it still errors.

Just another bloated Deadline feature that is next to impossible to configure and manage. Thinkbox desperately needs to refactor Deadline to be manageable and deployable by normal human beings. This feels like a giant Python hacked prototype not a product.

EDIT: Even re-installing the repository and disabling secrets now fails. Unable to Remove Secrets.