AWS Thinkbox Discussion Forums

[Bug] [Deadline10] Hanging sandboxes

Hi there,

We are noticing stale Deadline 10 sandbox processes on the slaves. I recall having a similar issue on Deadline 8, but it was eventually resolved in one of the builds:

I haven’t seen this out in the wild since we fixed it in 8.0…

The patch note says “Fixed a bug that could potentially cause the creation of extra sandbox processes if a render plugin fails to initialize properly.”

I’ll go dig into what code fixed that in 8.0.16 and see if it was carried forward or not.

Thanks Edwin!

Well, some of the fixes for the sandbox being artificially left open have definitely come over to 10. In fact, they were merged into 9 and carried forward. Can you get some memory dumps of those sandboxes and send them into the support system?

Also, I’m assuming this problem is happening on an idle render node. During normal operation, each render thread gets its own sandbox:

The node is not idle, but it is rendering in only one slave. In the past, once a render was done, its sandbox was cleaned up. It seems that now they are left behind?

The screenshot you sent shows this behavior on your side as well. It doesn't seem right to see 6 sandboxes? (One for the launcher, one for the slave’s event handler, one for the job, maybe… but 6?)

You see 3 slaves here:
Proc IDs 4360 and 5180 are idle. 5180 is a Deadline 8 instance, and you can see it only has 1 sandbox, as expected. The idle Deadline 10 slave (4360) has 3.
Proc ID 4860 is rendering and has 8 sandboxes, only one of which belongs to the actual render (3dsmax):

Here you can see the process start times listed as well:
Capture2.PNG
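For anyone who wants to do the same count from a process listing rather than by eye, here is a minimal sketch in Python. It assumes you already have (pid, parent pid, name) records exported from something like Process Explorer. The process names and the sandbox PIDs below are made up for illustration; only the slave PIDs and the sandbox counts come from the posts above.

```python
from collections import defaultdict

def count_sandboxes_by_parent(processes, sandbox_name="deadlinesandbox.exe"):
    """Return {parent_pid: number of sandbox children}.

    `processes` is an iterable of (pid, parent_pid, name) tuples.
    `sandbox_name` is an assumed executable name; adjust it to whatever
    the process list shows on your machines.
    """
    counts = defaultdict(int)
    for _pid, parent_pid, name in processes:
        if name.lower() == sandbox_name:
            counts[parent_pid] += 1
    return dict(counts)

def make_children(parent_pid, first_child_pid, how_many):
    """Build illustrative sandbox records (the child PIDs are invented)."""
    return [
        (first_child_pid + i, parent_pid, "deadlinesandbox.exe")
        for i in range(how_many)
    ]

# Sample data mirroring the counts described in this thread.
processes = (
    make_children(4360, 9000, 3)    # idle Deadline 10 slave
    + make_children(5180, 9100, 1)  # idle Deadline 8 slave
    + make_children(4860, 9200, 8)  # rendering Deadline 10 slave
    + [(9300, 4860, "3dsmax.exe")]  # a non-sandbox child, ignored by the count
)

sandbox_counts = count_sandboxes_by_parent(processes)
```

Comparing the result per slave before and after a task finishes would show whether sandboxes are being reaped or are accumulating.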

I'll make a few memory dumps.

On my side, the screenshot was from a job where concurrent tasks had been set to 16. I didn’t check if they were reaped after the render completed…

The memory dumps will be great here. They should show what (if anything) is waiting inside the sandbox.

If I recall correctly, Jon implemented a kill switch a while ago inside the sandbox: if the inter-process TCP connection went down, the sandbox would exit. I don’t want you to spend much more time on this after the memory dumps, but I do wonder what those sandboxes’ TCP sessions look like (how many listening connections are open). Process Explorer should show them in the process properties.
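To illustrate the general idea of that kind of kill switch (this is not Deadline's actual code, just a rough sketch of the pattern in Python), a child process can block on its connection to the parent and treat an empty read or a socket error as the signal to shut down. The names watch_connection and demo are hypothetical, and the demo uses a local socket pair in place of a real parent and sandbox.

```python
import socket
import threading

def watch_connection(sock, on_lost):
    """Block until the peer closes the connection, then call on_lost().

    In a real child process, on_lost would typically terminate the
    process; here it is a callback so the behaviour can be observed.
    """
    try:
        while True:
            data = sock.recv(4096)
            if not data:  # empty read: the peer closed its end
                break
    except OSError:
        # A reset or otherwise broken connection also counts as lost.
        pass
    on_lost()

def demo():
    """Simulate the parent going away and report whether the switch fired."""
    parent_end, sandbox_end = socket.socketpair()
    lost = threading.Event()
    watcher = threading.Thread(
        target=watch_connection, args=(sandbox_end, lost.set), daemon=True
    )
    watcher.start()
    parent_end.close()            # the "parent" drops the connection
    fired = lost.wait(timeout=5)  # the watcher should notice promptly
    sandbox_end.close()
    return fired
```

If a sandbox is left behind while its connection still shows as established, a watcher like this would never fire, which is why the TCP state of the stale processes is worth a look.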

OK, these jobs are all 1 slave = 1 job, no concurrency. The RAM dumps are fairly large, and with no direct internet connection, I'll need assistance from our IT department to upload them to the ticket I opened (if I can upload them there).

Here are a few screenshots of the TCP connections:

Capture2.PNG

Capture.PNG

This is the actively rendering sandbox:
Capture3.PNG

Memory dumps uploaded!

Hello,

Any news regarding this bug?

Thank you,
Liviu
