AWS Thinkbox Discussion Forums

Deadline Cloud task failures

Hello there,

We have been using Deadline Cloud for some time now and have noticed an issue where a single task gets stuck or fails, causing the entire job to fail.

For example, all the tasks in this job were successful except this one, which ended with the log output below.

2024/09/23 21:21:54+05:30 Fra:221 Mem:25135.91M (Peak 28218.27M) | Time:00:21.56 | Remaining:04:10.63 | Mem:21487.00M, Peak:21664.18M | Scene, ViewLayer | Sample 1/150
2024/09/23 21:22:20+05:30 Fra:221 Mem:25135.91M (Peak 28218.27M) | Time:00:47.85 | Remaining:03:38.89 | Mem:21487.00M, Peak:21664.18M | Scene, ViewLayer | Sample 17/150
2024/09/23 21:22:38+05:30 Caught SIGTERM
2024/09/23 21:22:38+05:30 Sending SIGTERM to 8114

The maximum memory on this machine was 64 GB, and it is part of a customer-managed fleet.

I can share more details as required.

Was this a Spot instance, perhaps? If all the instances were the same size and only one machine failed, then I’d guess that the machine was taken away (a Spot interruption). Did this task succeed on a later attempt, or does it just never work?
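One way to confirm a Spot interruption from the worker itself is to poll the EC2 instance metadata service: the `spot/instance-action` path returns 404 until an interruption is scheduled, and a small JSON document (with `action` and `time` fields) once a notice has been issued, roughly two minutes before the instance is reclaimed. The sketch below is only an illustration, assuming IMDSv2 is reachable from the worker; the function names `parse_instance_action` and `check_spot_interruption` are made up for this example and are not part of Deadline Cloud.

```python
import json
import urllib.error
import urllib.request

# Link-local address of the EC2 instance metadata service (IMDS).
IMDS = "http://169.254.169.254/latest"


def parse_instance_action(status, body):
    """Return the interruption notice as a dict, or None if none is pending.

    Hypothetical helper: IMDS answers 404 while no interruption is scheduled
    and 200 with a JSON body once a notice exists.
    """
    if status == 404:
        return None
    if status == 200:
        return json.loads(body)
    raise RuntimeError(f"unexpected IMDS status {status}")


def check_spot_interruption(timeout=2.0):
    """Ask IMDSv2 whether this instance has a pending Spot interruption."""
    # IMDSv2 requires a short-lived session token obtained with a PUT.
    token_req = urllib.request.Request(
        f"{IMDS}/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": "60"},
    )
    with urllib.request.urlopen(token_req, timeout=timeout) as resp:
        token = resp.read().decode()

    action_req = urllib.request.Request(
        f"{IMDS}/meta-data/spot/instance-action",
        headers={"X-aws-ec2-metadata-token": token},
    )
    try:
        with urllib.request.urlopen(action_req, timeout=timeout) as resp:
            return parse_instance_action(resp.status, resp.read().decode())
    except urllib.error.HTTPError as err:
        # urllib raises on 404, which here simply means "no notice yet".
        return parse_instance_action(err.code, "")
```

Calling `check_spot_interruption()` every few seconds on the worker and logging any non-`None` result next to the render log would show whether a `Caught SIGTERM` line lines up with a reclaim notice.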

Hello Justin, sorry for the late response.

Yes, this is happening on Spot instances. All the instances are the same size. Some tasks succeed, but some get stuck for 8 to 10 hours; ideally a single task takes about 3 minutes to execute on an m5.8xlarge.

If there are 240 tasks, we see this behaviour starting from around task number 200 or 210.
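To make "stuck" concrete when looking through a job, a task can be flagged once its runtime exceeds some multiple of the typical runtime. The snippet below is a minimal sketch under that assumption; the function name `find_stuck_tasks`, the factor of 10, and the task IDs and durations in the usage note are all hypothetical, and how the per-task durations are collected from Deadline Cloud is not shown.

```python
from statistics import median


def find_stuck_tasks(durations_s, factor=10.0):
    """Return the IDs of tasks whose runtime exceeds factor x the median.

    durations_s maps a task ID to its runtime in seconds (hypothetical input).
    """
    typical = median(durations_s.values())
    return sorted(
        task_id
        for task_id, seconds in durations_s.items()
        if seconds > factor * typical
    )
```

For example, with three tasks at roughly 3 minutes and one at 8 hours, `find_stuck_tasks({1: 180, 2: 175, 3: 190, 205: 8 * 3600})` flags only task 205.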
