Hi,
We have been running some Cloud nodes for a few months without issue. Early this morning our FSx storage suddenly failed, and since I built a new FSx a few hours ago, none of my renders work and the Spot workers show this error:
Event Error (OnSlaveInfoUpdated): ConnectTimeoutError : Connect timeout on endpoint URL: "https://ec2.eu-west-1.amazonaws.com/"
It's worth noting that the jobs just hang in a 'rendering' state, effectively costing money while doing nothing, and they stayed that way until I went investigating. Seems pretty bad!
Anyone got any ideas? The FSx storage is visible to our on-site connection nodes, all the spawned Spot Fleet nodes are connected to it, and they attach to Deadline without issue.
Thanks,
Which event is dying in that place? I’m assuming it’s in the Spot Event Plugin? More of the task error would be good to see where the failure is happening.
If the files are inaccessible I'd expect the renderer to fail, so a hang is really odd.
We could add something in the OnSlaveInfoUpdated handler that uses the SlaveInfo object we're given to find the task and fail it when a ConnectTimeoutError occurs. Not a fix by any means, but it should at least mitigate the spend. I want more info before starting on that.
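To make that mitigation idea concrete, here is a rough sketch of the bookkeeping only. It is not Deadline code: `TimeoutWatch`, `guarded_update`, and the `fail_task` callback are hypothetical names, and the actual lookup and failing of the task through the SlaveInfo object is left to whatever the real handler would do. The built-in `TimeoutError` stands in for the `ConnectTimeoutError` seen in the log, and the thresholds are arbitrary.

```python
import time


class TimeoutWatch:
    """Count recent connect timeouts per worker (hypothetical helper)."""

    def __init__(self, max_timeouts=3, window_seconds=300.0):
        self.max_timeouts = max_timeouts
        self.window_seconds = window_seconds
        self._seen = {}  # worker name -> timestamps of recent timeouts

    def record_timeout(self, worker, now=None):
        """Record one timeout; return True once the threshold is reached."""
        now = time.monotonic() if now is None else now
        # Keep only timeouts that are still inside the sliding window.
        recent = [t for t in self._seen.get(worker, [])
                  if now - t < self.window_seconds]
        recent.append(now)
        self._seen[worker] = recent
        return len(recent) >= self.max_timeouts

    def clear(self, worker):
        """Forget a worker's history, e.g. after a successful update."""
        self._seen.pop(worker, None)


def guarded_update(worker, update, watch, fail_task,
                   timeout_errors=(TimeoutError,)):
    """Run update(); after repeated timeouts, call fail_task(worker).

    update and fail_task are placeholders for the real handler body and
    for whatever call actually fails the worker's current task.
    """
    try:
        update()
    except timeout_errors:
        if watch.record_timeout(worker):
            fail_task(worker)
            watch.clear(worker)
        return False
    watch.clear(worker)
    return True
```

Failing only after several timeouts inside a window, rather than on the first one, is a design choice meant to avoid killing tasks over a single transient network blip. In a real plugin the `timeout_errors` tuple would presumably name the exception type the AWS SDK raises.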
Thanks Justin, here is more of the log, mostly the full error:
2023-10-16 16:17:28: 0: STDOUT: Setting SOHO disk file to /mnt/PROJECT/CHAR.$F5.ass
2023-10-16 16:17:28: 0: STDOUT: Enabled log to console
2023-10-16 16:17:28: 0: STDOUT: Set verbosity to detailed
2023-10-16 16:17:28: 0: STDOUT: ** Did NOT detect NO_EXPAND_PROCEDURAL environ var. **
2023-10-16 16:17:28: 0: STDOUT: ** Using scene's EXPAND_PROCEDURAL settings. **
2023-10-16 16:17:28: 0: STDOUT: ** Did NOT detect AUTOGENERATE_TX_TEXTURES environ var. **
2023-10-16 16:17:28: 0: STDOUT: ** EXPAND PROCEDURALS is set to : 0 within this scene. **
2023-10-16 16:17:28: 0: STDOUT: ** Forcing OFF the generate tx file at time of render flag. **
2023-10-16 16:17:28: 0: STDOUT: ** Forcing on Arnold Proxy generation. **
2023-10-16 16:17:28: 0: STDOUT: ** Did not detect ARNOLD_GPU env. Forcing off GPU use on the farm. **
2023-10-16 16:17:28: 0: STDOUT: ** Forcing on Arnold Fail on missing license **
2023-10-16 16:17:28: 0: STDOUT: ** NOT forcing arnold job not to abort on error **
2023-10-16 16:17:28: 0: STDOUT: ** Forcing on Arnold checkpointing/append rendering **
2023-10-16 16:17:28: 0: STDOUT: Rendering frame 9456
2023-10-16 16:17:28: 0: STDOUT: Rendering frame 9456
2023-10-16 16:18:34: Unable to stream EC2 Deadline Worker status due to the following exception:
2023-10-16 16:18:34: A task was canceled. (System.Threading.Tasks.TaskCanceledException)
2023-10-16 16:18:34: at Deadline.AWS.AWSUtils.d(AggregateException ckp, String ckq, String ckr, String cks, String[] ckt)
2023-10-16 16:18:34: at Deadline.AWS.AWSUtils.b[g](Task`1 ckf, String ckg, String ckh, String cki, String[] ckj)
2023-10-16 16:18:34: at Deadline.AWS.Wrappers.SQSWrapper.a[g](Task`1 coz, String cpa, String cpb, String cpc)
2023-10-16 16:18:34: at Deadline.AWS.Wrappers.SQSWrapper.GetQueueUrl(String queueName)
2023-10-16 16:18:34: at Deadline.Slaves.EC2ComputeNodeStatusUpdateStream.get_l()
2023-10-16 16:18:34: at Deadline.Slaves.EC2ComputeNodeStatusUpdateStream.i(String adh, String adi, Boolean adj)
2023-10-16 16:18:34: Exception Details
2023-10-16 16:18:34: TaskCanceledException -- A task was canceled.
2023-10-16 16:18:34: TaskCanceledException.Task: System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1+AsyncStateMachineBox`1[Amazon.SQS.Model.GetQueueUrlResponse,Amazon.Runtime.Internal.MetricsHandler+<InvokeAsync>d__1`1[Amazon.SQS.Model.GetQueueUrlResponse]]
2023-10-16 16:18:34: OperationCanceledException.CancellationToken: System.Threading.CancellationToken
2023-10-16 16:18:34: Exception.Data: ( )
2023-10-16 16:18:34: Exception.TargetSite: Void Throw()
2023-10-16 16:18:34: Exception.Source: System.Private.CoreLib
2023-10-16 16:18:34: Exception.HResult: -2146233029
2023-10-16 16:18:34: Exception.StackTrace:
2023-10-16 16:18:34: at Deadline.AWS.AWSUtils.d(AggregateException ckp, String ckq, String ckr, String cks, String[] ckt
2023-10-16 16:18:34: at Deadline.AWS.AWSUtils.b[g](Task`1 ckf, String ckg, String ckh, String cki, String[] ckj
2023-10-16 16:18:34: at Deadline.AWS.Wrappers.SQSWrapper.a[g](Task`1 coz, String cpa, String cpb, String cpc
2023-10-16 16:18:34: at Deadline.AWS.Wrappers.SQSWrapper.GetQueueUrl(String queueName
2023-10-16 16:18:34: at Deadline.Slaves.EC2ComputeNodeStatusUpdateStream.get_l(
2023-10-16 16:18:34: at Deadline.Slaves.EC2ComputeNodeStatusUpdateStream.i(String adh, String adi, Boolean adj)
2023-10-16 16:18:38: Spot: Spot Plugin - On Worker Info Updated
2023-10-16 16:18:38: Spot: i-01bbc78bba4f9dc75 is rendering.
2023-10-16 16:18:41: 0: STDOUT: Finished Rendering
2023-10-16 16:18:42: 0: INFO: Process exit code: 0
2023-10-16 16:18:42: 0: INFO: Finished Houdini Job
2023-10-16 16:18:42: 0: Done executing plugin command of type 'Render Task'
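Since both the EC2 call in the event error and the SQS `GetQueueUrl` call in this log fail on a timeout, one quick check is whether a Spot worker can open a TCP connection to those endpoints at all. Below is a minimal sketch using only the Python standard library. `can_connect` and `check_endpoints` are made-up helper names, and the SQS hostname in the usage note is an assumption based on the region in the EC2 URL.

```python
import socket


def can_connect(host, port=443, timeout=5.0):
    """Return True if a TCP connection to host:port opens within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections, and DNS lookup failures.
        return False


def check_endpoints(hosts, port=443, timeout=5.0):
    """Probe each host and return a {host: reachable} mapping."""
    return {host: can_connect(host, port, timeout) for host in hosts}
```

Run on a worker, `check_endpoints(["ec2.eu-west-1.amazonaws.com", "sqs.eu-west-1.amazonaws.com"])` would show whether either endpoint is unreachable from that instance. This only tests the TCP connection, not credentials or permissions.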