AWS instances not starting job... can't find cause in logs

JeffBrown · January 5, 2020, 12:06am

My local Deadline Repository and Workers are working well. Now that I'm getting back into AWS rendering, I've managed to set up a working infrastructure and launch a test Spot Fleet using a custom image. The machines show up as Render Candidates, but all I see when watching the job progress is a brief (~1 second) "Starting up" message on frame zero. This repeats every 10 to 20 seconds, and I haven't found any useful information in the logs I've looked at so far. Any suggestions are welcome. I'm also wondering whether the RCS errors I'm seeing have anything to do with it.

JeffBrown · January 5, 2020, 12:07am

I should mention that the IP address in the RCS errors is the local machine's address; the static public IP address I use for the infrastructure seems to work fine.

Justin_B · January 6, 2020, 1:53pm

Hello!

So in version 10.1.0.10 we’d see this error when the AWS Portal Link could connect to the RCS but fail to authenticate to it. It’s not broken, but the chatter makes it seem like it is.

A guess off the top of my head, given the behaviour you're describing, is that the job you submitted doesn't have the correct limit set, so the usage-based license isn't getting checked out. But that's just a guess.

I'd check the Worker log from one of your AWS Worker instances; that should shed more light.
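If the Worker log is long, a quick filter can help surface the relevant lines. The sketch below assumes you've copied the log down from the instance as a plain-text file; the keyword list (`error`, `exception`, `license`, `limit`) and the script name are hypothetical choices for illustration, not anything Deadline defines, so adjust them to whatever your log actually prints.

```python
import sys
from typing import Iterable, List, Tuple

# Hypothetical keywords: tune these to the messages your Worker log actually contains.
DEFAULT_KEYWORDS = ("error", "exception", "license", "limit")


def find_suspect_lines(lines: Iterable[str], keywords=DEFAULT_KEYWORDS) -> List[Tuple[int, str]]:
    """Return (line_number, text) for every line containing any keyword, case-insensitively."""
    lowered = [k.lower() for k in keywords]
    hits = []
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\n")
        if any(k in text.lower() for k in lowered):
            hits.append((number, text))
    return hits


if __name__ == "__main__":
    # Usage (hypothetical script name): python scan_worker_log.py /path/to/worker.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        for number, text in find_suspect_lines(log):
            print(f"{number}: {text}")
```

This is only a text filter; it won't diagnose anything by itself, but it narrows down where to look when a job keeps cycling through "Starting up".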

JeffBrown · January 7, 2020, 4:46pm

Thanks for the input, Justin. After reading a couple of other threads on this forum, I decided to reinstall the client and AWS Portal components on my asset server, and things are working now. Maybe it was a certificate mismatch? I'll happily ignore the RCS errors, which are still appearing, for now; as you mentioned, they don't seem to matter.

Justin_B · January 8, 2020, 4:24pm

It’s possible that your usage-based licensing cert wasn’t where it should be, or didn’t make it up into AWS. I suppose we’ll have to settle for it being a mystery till next time. Not that I want it to break again!