AWS Thinkbox Discussion Forums

AWS Asset Server / Portal Link issue

Hello,
I’m setting up AWS Portal Link along with the AWS Asset Server. I’ve managed to install both successfully and launch an infrastructure and a fleet. Workers appear and pick up jobs, but on the machines (Linux or Windows) there’s no way to access assets.
No errors are shown in the log files in C:\ProgramData\Thinkbox.

The paths seem to be mapped correctly, but the job (Draft) fails quickly:
Initialize: Error: Failed to create output directory "/mnt/Data/DProjects72bd5ce404ac6e221a73c20e63d3efec/test/awsportal". The path may be invalid or permissions may not be sufficient.
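
For reference, the failing step looks like nothing more than a directory creation on the mapped path. This is just my rough guess at the equivalent call (path copied from the error above), which I can run by hand on a worker to poke at the mount:

import os  # minimal sketch; assumes the plugin is essentially doing a makedirs on the mapped output path

output_dir = "/mnt/Data/DProjects72bd5ce404ac6e221a73c20e63d3efec/test/awsportal"
try:
    os.makedirs(output_dir, exist_ok=True)
    print("created:", output_dir)
except OSError as e:
    print("failed:", e)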

The only errors I can see are in CloudWatch Logs:

In /thinkbox/S3BackedCache/worker and in /thinkbox/S3BackedCache/central

1715787576.437389 2024-05-15 15:39:36,437 [/opt/Thinkbox/S3BackedCache/Client/lib/python3.10/site-packages/slavelib/utilities.py:wrapper:79] [root] [3409] [Dummy-4] [ERROR] CacheManagerException: 'getattr'
Traceback (most recent call last):
  File "/opt/Thinkbox/S3BackedCache/Client/lib/python3.10/site-packages/slavelib/cache_mgmt.py", line 420, in _get_file_attributes
    response = self.central.GetFileAttributes(request)
  File "/opt/Thinkbox/S3BackedCache/Client/lib/python3.10/site-packages/grpc/_channel.py", line 492, in __call__
    return _end_unary_response_blocking(state, call, False, deadline)
  File "/opt/Thinkbox/S3BackedCache/Client/lib/python3.10/site-packages/grpc/_channel.py", line 440, in _end_unary_response_blocking
    raise _Rendezvous(state, None, None, deadline)
grpc._channel._Rendezvous: <_Rendezvous of RPC that terminated with (StatusCode.UNKNOWN, Exception in central controller: <_Rendezvous of RPC that terminated with (StatusCode.UNAVAILABLE, Connect Failed)>)>
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/opt/Thinkbox/S3BackedCache/Client/lib/python3.10/site-packages/slavelib/utilities.py", line 71, in wrapper
    ret = func(*args, **kwargs)
  File "/opt/Thinkbox/S3BackedCache/Client/lib/python3.10/site-packages/slavelib/utilities.py", line 18, in wrapper
    ret = func(*args, **kwargs)
  File "/opt/Thinkbox/S3BackedCache/Client/lib/python3.10/site-packages/slavelib/utilities.py", line 54, in wrapper
    ret = func(*args, **kwargs)
  File "/opt/Thinkbox/S3BackedCache/Client/lib/python3.10/site-packages/slavelib/fuse_operations.py", line 212, in getattr
    ret = self.cache_manager.lstat(path_rel)
  File "/opt/Thinkbox/S3BackedCache/Client/lib/python3.10/site-packages/slavelib/cache_mgmt.py", line 30, in wrapper
    ret = func(*args, **kwargs)
  File "/opt/Thinkbox/S3BackedCache/Client/lib/python3.10/site-packages/slavelib/cache_mgmt.py", line 1638, in lstat
    response = self._get_file_attributes(file_entry)
  File "/opt/Thinkbox/S3BackedCache/Client/lib/python3.10/site-packages/slavelib/cache_mgmt.py", line 451, in _get_file_attributes
    raise CacheManagerException(str(e.code()))
slavelib.cache_mgmt.CacheManagerException: StatusCode.UNKNOWN
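
If I’m reading it right, the StatusCode.UNAVAILABLE, Connect Failed part means the worker-side cache client can’t open its gRPC channel to the central controller at all. A bare TCP reachability probe like this, run from a worker, would at least separate a networking problem from a service problem (the host and port below are placeholders, not values confirmed from the S3BackedCache config):

import socket  # minimal sketch; CENTRAL_HOST / CENTRAL_PORT are hypothetical, substitute whatever the cache client config points at

CENTRAL_HOST = "10.0.0.1"   # placeholder central controller address
CENTRAL_PORT = 50051        # placeholder gRPC port

try:
    with socket.create_connection((CENTRAL_HOST, CENTRAL_PORT), timeout=5):
        print("TCP connection OK")
except OSError as e:
    print("connect failed:", e)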

In /thinkbox/S3BackedCache/worker (repeated)

1715788052.807198 2024-05-15 15:47:32,807 [/opt/Thinkbox/S3BackedCache/Client/lib/python3.10/site-packages/slavelib/utilities.py:wrapper:79] [root] [2809] [Dummy-7] [ERROR] CacheManagerException: 'getattr'
Traceback (most recent call last):
  File "/opt/Thinkbox/S3BackedCache/Client/lib/python3.10/site-packages/slavelib/utilities.py", line 71, in wrapper
    ret = func(*args, **kwargs)
  File "/opt/Thinkbox/S3BackedCache/Client/lib/python3.10/site-packages/slavelib/utilities.py", line 18, in wrapper
    ret = func(*args, **kwargs)
  File "/opt/Thinkbox/S3BackedCache/Client/lib/python3.10/site-packages/slavelib/utilities.py", line 54, in wrapper
    ret = func(*args, **kwargs)
  File "/opt/Thinkbox/S3BackedCache/Client/lib/python3.10/site-packages/slavelib/fuse_operations.py", line 212, in getattr
    ret = self.cache_manager.lstat(path_rel)
  File "/opt/Thinkbox/S3BackedCache/Client/lib/python3.10/site-packages/slavelib/cache_mgmt.py", line 30, in wrapper
    ret = func(*args, **kwargs)
  File "/opt/Thinkbox/S3BackedCache/Client/lib/python3.10/site-packages/slavelib/cache_mgmt.py", line 1593, in lstat
    raise CacheManagerException('SequenceManager is not ready yet')
slavelib.cache_mgmt.CacheManagerException: SequenceManager is not ready yet

I’ve tried reinstalling multiple times, with different users and on different servers. The service account is the same user we use on all our render nodes, and it has access to the relevant assets.

Any help appreciated,
Thank you

Is this the service user for the AWSPortal services on your on-premise server or on the EC2 instances? Just making sure you haven’t changed the user running on the EC2 Workers, that’s got to stay as it is.

Which version of Deadline are you running on premise, and which version of Deadline is on the AMI? We’ve seen this issue when everything isn’t running the same version of Deadline across the board.


The service account is only used on the on-prem servers; I didn’t touch the configuration of the EC2 Workers.
EC2 Worker version: Command Stdout: v10.3.0.15 Release (76d003b0a)
Here are the installers used across our on-prem servers / workers:

DeadlineClient-10.3.0.13-windows-installer.exe
DeadlineRepository-10.3.0.13-windows-installer.exe
AWSPortalLink-1.3.0.3-windows-installer.exe

Could this minor version mismatch be causing this error?

Possibly! 10.3.0.13 was pulled because of an issue in the AWS Portal event code where Workers wouldn’t automatically shut themselves down and would sit running until the Resource Tracker stepped in.

So please upgrade to 10.3.0.15 and let’s see how it behaves.

Thank you, I’ll try it out. Can I update only the Portal / Asset Server, or the Repository as well? Or do I need to update everything, including all my Workers?

The Repo is a must-update, and I’d update the Portal/Asset Server if the version number on the installer has been bumped. I can’t recall offhand whether it has, and I’d rather reply quickly while you’re working than make you wait. :sweat_smile: Your local Workers should be fine, and auto-upgrade can take care of them for you.

As an aside, auto-upgrade can technically upgrade your EC2 instances as well, but those are charged by the minute, so running an upgrade every time they connect isn’t cost-efficient.

Alright, that makes sense, thank you for your input. I’ll go ahead and update the Repo and Portal to the latest version and we’ll see how it goes!
I’ll leave auto-upgrade on for a couple of days so our Workers pick it up, then disable it so as not to waste time on the EC2 Workers.


Well, I updated everything; the EC2 Workers are now on the same version (10.3.2.1) as the RCS and Repository. But I still get the exact same errors in CloudWatch Logs, and the EC2 Workers still can’t access on-prem files.

Worker error:

2024-05-17 15:03:36:  <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
2024-05-17 15:03:37:  Scheduler Thread - Job's Limit Groups: 
2024-05-17 15:03:38:  0: Loading Job's Plugin timeout is Disabled
2024-05-17 15:03:38:  0: SandboxedPlugin: Render Job As User disabled, running as current user 'ec2-user'
2024-05-17 15:03:39:  All job files are already synchronized
2024-05-17 15:03:40:  Plugin DraftPlugin was already synchronized.
2024-05-17 15:03:40:  0: Executing plugin command of type 'Initialize Plugin'
2024-05-17 15:03:40:  0: INFO: Executing plugin script '/var/lib/Thinkbox/Deadline10/workers/ip-10-128-24-251/plugins/6644dc7f641f9be783d465be/DraftPlugin.py'
2024-05-17 15:03:40:  0: INFO: Plugin execution sandbox using Python version 3
2024-05-17 15:03:40:  0: INFO: Found Draft python module at: '/var/lib/Thinkbox/Deadline10/workers/ip-10-128-24-251/Draft/Draft.so'
2024-05-17 15:03:40:  0: INFO: Setting Process Environment Variable PYTHONPATH to /var/lib/Thinkbox/Deadline10/workers/ip-10-128-24-251/Draft:/home/ec2-user/Thinkbox/Deadline10/pythonAPIs/vXiJchfTd6HrfrxRHxsOCw==:/opt/Thinkbox/Deadline10/bin/python3:/opt/Thinkbox/Deadline10/bin/python3/lib:/opt/Thinkbox/Deadline10/bin/python3/lib/site-packages:/opt/Thinkbox/Deadline10/lib/python3/lib/python310.zip:/opt/Thinkbox/Deadline10/lib/python3/lib/python3.10:/opt/Thinkbox/Deadline10/lib/python3/lib/python3.10/lib-dynload:/opt/Thinkbox/Deadline10/lib/python3/lib/python3.10/site-packages:/opt/Thinkbox/Deadline10/bin/
2024-05-17 15:03:40:  0: INFO: Setting Process Environment Variable MAGICK_CONFIGURE_PATH to /var/lib/Thinkbox/Deadline10/workers/ip-10-128-24-251/Draft
2024-05-17 15:03:40:  0: INFO: Setting Process Environment Variable LD_LIBRARY_PATH to /opt/Thinkbox/Deadline10/bin/python/lib:/var/lib/Thinkbox/Deadline10/workers/ip-10-128-24-251/Draft
2024-05-17 15:03:40:  0: CheckPathMapping: Swapped "P:\test\awsportal" with "/mnt/Data/elysiumprojectseffd94c1274df72bf35b367f8ddd5957/test\awsportal"
2024-05-17 15:03:40:  0: INFO: Creating the output directory "/mnt/Data/elysiumprojectseffd94c1274df72bf35b367f8ddd5957/test/awsportal"
2024-05-17 15:03:40:  0: Encountered an error while executing plugin command of type 'Initialize Plugin'
2024-05-17 15:03:42:  Sending kill command to process tree with root process 'deadlinesandbox.exe' with process id 4682
2024-05-17 15:03:44:  Scheduler Thread - Render Thread 0 threw a major error: 
2024-05-17 15:03:44:  >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
2024-05-17 15:03:44:  Exception Details
2024-05-17 15:03:44:  RenderPluginException -- Initialize: Error: Failed to create output directory "/mnt/Data/elysiumprojectseffd94c1274df72bf35b367f8ddd5957/test/awsportal". The path may be invalid or permissions may not be sufficient.
2024-05-17 15:03:44:  RenderPluginException.Cause: JobError (2)
2024-05-17 15:03:44:  RenderPluginException.Level: Major (1)
2024-05-17 15:03:44:  RenderPluginException.HasSlaveLog: True
2024-05-17 15:03:44:  RenderPluginException.SlaveLogFileName: /var/log/Thinkbox/Deadline10/deadlineslave_renderthread_0-ip-10-128-24-251-0000.log
2024-05-17 15:03:44:  Exception.TargetSite: Deadline.Slaves.Messaging.PluginResponseMemento d(Deadline.Net.DeadlineMessage, System.Threading.CancellationToken)
2024-05-17 15:03:44:  Exception.Data: ( )
2024-05-17 15:03:44:  Exception.Source: deadline
2024-05-17 15:03:44:  Exception.HResult: -2146233088
2024-05-17 15:03:44:    Exception.StackTrace: 
2024-05-17 15:03:44:     at Deadline.Plugins.SandboxedPlugin.d(DeadlineMessage bgx, CancellationToken bgy
2024-05-17 15:03:44:     at Deadline.Plugins.SandboxedPlugin.Initialize(Job job, CancellationToken cancellationToken
2024-05-17 15:03:44:     at Deadline.Slaves.SlaveRenderThread.e(String ake, Job akf, CancellationToken akg
2024-05-17 15:03:44:     at Deadline.Slaves.SlaveRenderThread.b(TaskLogWriter aka, CancellationToken akb)
2024-05-17 15:03:44:  <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<

Running `ls -ls /mnt` on the EC2 Workers outputs this:

Connection Accepted.Command exited with code: 0
Command Stdout: total 4
0 d--------- 1 ec2-user ec2-user 81920 May 17 14:49 Data
4 drwxrwxrwx 6 ec2-user ec2-user  4096 May 17 14:59 dtu

And running `ls /mnt/Data` fails:

Failure: Command exited with code: 2
Command Stdout: total 0

Command Stderr: ls: reading directory /mnt/Data: Bad address
 (System.Exception)
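
So the FUSE mount is up, but the zeroed permission bits on Data together with the “Bad address” (EFAULT) from ls suggest the cache client behind it is erroring out, which matches the CacheManagerException traces above. A small probe along these lines (run as ec2-user on a worker) is an easy way to reproduce it outside of ls; based on the output above I’d expect the stat to come back with zeroed mode bits and the listing to fail with the same “Bad address”:

import os  # quick probe of the FUSE-backed mount; errors here come from the S3BackedCache client, not ordinary filesystem permissions

mount = "/mnt/Data"
st = os.stat(mount)
print("mode bits:", oct(st.st_mode & 0o777))   # expected 0o000, per the d--------- entry above
try:
    print("entries:", os.listdir(mount)[:5])
except OSError as e:
    print("listing failed:", e)                # expected 'Bad address', same as ls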

Asset Server seems quite happy in the logs:

1715958548.715086 2024-05-17 17:09:08,715 [C:\Program Files (x86)\Thinkbox\AWSPortalAssetServer\awsportalassetserverlib\share_util.py:refresh_shares:104] [root] [137156] [Dummy-1] [INFO] Refreshing shares list.
1715958548.720086 2024-05-17 17:09:08,720 [C:\Program Files (x86)\Thinkbox\AWSPortalAssetServer\awsportalassetserverlib\share_util.py:refresh_shares:111] [root] [137156] [Dummy-1] [INFO] Share: Path: \\elysium\projects\ Id: elysiumprojectseffd94c1274df72bf35b367f8ddd5957
1715958548.721086 2024-05-17 17:09:08,721 [C:\Program Files (x86)\Thinkbox\AWSPortalAssetServer\awsportalassetserverlib\share_util.py:refresh_shares:111] [root] [137156] [Dummy-1] [INFO] Share: Path: \\elysium\Assets\ Id: elysiumAssets0365ed9f9512882829c36fc20ae3dcc1
1715958550.141956 2024-05-17 17:09:10,141 [C:\Program Files (x86)\Thinkbox\AWSPortalAssetServer\awsportalassetserver.py:get_and_set_ip_address:89] [root] [137156] [Dummy-1] [INFO] IPAddress set to 10.88.49.55

As does the AWS Portal Link log, which reports the tunnel as connected. The share Id elysiumprojectseffd94c1274df72bf35b367f8ddd5957 in the Asset Server log even matches the mapped path in the worker log, so the path mapping itself looks right.

No other logs I’ve found show any errors that could help me.
