AWS Thinkbox Discussion Forums

Control memory usage and avoid empty completed jobs

Hi,

Currently, when a job asks for too much memory, the render server (Windows Server 2016) tends to hang and become unusable. The problem is that the Deadline Slave does not close, so some jobs still run on the hung machine and complete without producing any output.

My plan is to customize the Houdini/Mantra plugin to force a failure once memory usage passes 95%, before the machine runs completely out of memory, and then create an event that checks whether the output frames exist and, if not, fails the task instead of leaving it marked as completed.
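The output check described above could be sketched roughly as follows. This is only a minimal, standard-library sketch: `missing_frames`, the printf-style `pattern` argument, and the `min_bytes` cutoff are hypothetical names chosen for illustration, and the code that asks Deadline for a task's output path and frame list, or that fails the task, is deliberately left out because it depends on the Deadline scripting API.

```python
import os


def missing_frames(pattern, frames, min_bytes=1):
    """Return the frame numbers whose output file is absent or too small.

    pattern   -- printf-style path such as "/renders/shot.%04d.exr"
                 (hypothetical; the directory part must not contain '%')
    frames    -- iterable of frame numbers the task was supposed to render
    min_bytes -- files smaller than this are treated as missing, which
                 catches zero-byte files left behind by a crashed render
    """
    missing = []
    for frame in frames:
        path = pattern % frame
        if not os.path.isfile(path) or os.path.getsize(path) < min_bytes:
            missing.append(frame)
    return missing
```

An event script could call this once a task reports completion and fail or requeue the task when the returned list is not empty.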

But that feels like patching an issue that may also happen somewhere else. Is there a way to do this out of the box? Or to improve the Slave timeout or swapping detection?

We are using Deadline 9.0.6.1 and Houdini 16.5.405 currently.

Thanks!
Chris

That might work… The big problem is that we’re not sure how quickly the machine would run out of memory (say, if the program asked to allocate a single 8 GB block). That’s probably unlikely though.

The other piece is that .NET (the framework we build Deadline on top of) is a managed memory framework, so it’s difficult for us to gracefully catch and recover from an out-of-memory situation… That means that parts of Deadline can crash when memory pressure is high. I think the hanging you’re seeing is the scheduler thread failing, which puts the Slave into a zombie state where the Info thread keeps reporting that the Slave is alive but it won’t pick up more work.

I know we added some safety checks in a somewhat recent build of Deadline 10 that will detect internal thread crashes and shut down the Slave when this happens. I’ll go ask if we restart it in that situation. It’s not an out-of-memory checker, but it may work to keep the Slave happy. Others have written OnSlaveInfoUpdated() scripts that do a memory check every 7 seconds (when the Slave updates its info in the DB)…
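For illustration, the memory check that such a script would perform might look something like the sketch below. It assumes a Windows render node and uses only `ctypes` with the Win32 `GlobalMemoryStatusEx` call. The names `physical_memory_load_percent` and `over_limit` are hypothetical, and how the check is wired into an OnSlaveInfoUpdated() handler (and what the handler does when the limit is hit) is not shown, since that part should be taken from the Deadline scripting documentation.

```python
import ctypes
import sys


class MEMORYSTATUSEX(ctypes.Structure):
    """Mirror of the Win32 MEMORYSTATUSEX structure."""
    _fields_ = [
        ("dwLength", ctypes.c_uint32),
        ("dwMemoryLoad", ctypes.c_uint32),  # percent of physical memory in use
        ("ullTotalPhys", ctypes.c_uint64),
        ("ullAvailPhys", ctypes.c_uint64),
        ("ullTotalPageFile", ctypes.c_uint64),
        ("ullAvailPageFile", ctypes.c_uint64),
        ("ullTotalVirtual", ctypes.c_uint64),
        ("ullAvailVirtual", ctypes.c_uint64),
        ("ullAvailExtendedVirtual", ctypes.c_uint64),
    ]


def physical_memory_load_percent():
    """Return the percentage of physical memory in use (Windows only)."""
    if not sys.platform.startswith("win"):
        raise OSError("GlobalMemoryStatusEx is only available on Windows")
    status = MEMORYSTATUSEX()
    status.dwLength = ctypes.sizeof(MEMORYSTATUSEX)
    if not ctypes.windll.kernel32.GlobalMemoryStatusEx(ctypes.byref(status)):
        raise ctypes.WinError()
    return status.dwMemoryLoad


def over_limit(load_percent, limit_percent=95):
    """True once memory load reaches the limit (95% is the figure from this thread)."""
    return load_percent >= limit_percent
```

Because the check only runs when the Slave updates its info, a single very large allocation between two updates can still exhaust memory before the script reacts, so a lower limit than 95% may be the safer choice.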

Does going the scripting route for now make sense? It should be equivalent to the check living in the core code, with the added benefit of being extensible.

Hi Edwin,

Thanks for your answer.

Switching to Deadline 10 is not going to be that simple for us and will need to be done later on. For now, the scripting route makes sense for us; I just wanted to check that we weren’t missing a feature that already exists in Deadline.

Thanks!
Chris
