AWS Thinkbox Discussion Forums

Deadline to monitor RAM usage / cancel task if it's too high

It would be great if the Deadline slave could cancel a task when its RAM usage approaches the physical amount of RAM in the machine. We currently have a secondary process doing this, but that leaves ‘max disappeared’ messages in the log, instead of a nice ‘uses too much RAM, bad luck’ message that could then be handled appropriately.
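To make the request concrete, here is a minimal sketch of the kind of check being asked for. It is not Deadline code: `MemoryVerdict`, `check_task_memory`, and the 0.95 default are hypothetical names and values chosen for illustration, and the memory readings are assumed to be supplied by whatever is monitoring the machine.

```python
from dataclasses import dataclass


@dataclass
class MemoryVerdict:
    cancel: bool   # True if the task should be cancelled
    reason: str    # human-readable message for the log, empty if not cancelled


def check_task_memory(used_bytes: int, physical_bytes: int,
                      limit_fraction: float = 0.95) -> MemoryVerdict:
    """Decide whether a task should be cancelled for using too much RAM.

    Hypothetical helper, not part of Deadline. limit_fraction is the share
    of physical RAM treated as "too high"; 0.95 is an arbitrary example.
    """
    if physical_bytes <= 0:
        raise ValueError("physical_bytes must be positive")
    fraction = used_bytes / physical_bytes
    if fraction >= limit_fraction:
        # Produce an explicit reason so the failure can be handled as a
        # memory problem rather than as a vanished process.
        return MemoryVerdict(
            True,
            f"Task cancelled: RAM usage at {fraction:.0%} of physical memory "
            f"(limit {limit_fraction:.0%})",
        )
    return MemoryVerdict(False, "")
```

The point of returning a reason string is the one made above: the log would carry an explicit "too much RAM" message instead of a process that simply disappeared.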

cheers,
l

We can put this on the wishlist. It would probably be good to make it a per-job setting? Or maybe it should be a global setting? Or both?

What do you think?

I would prefer a global value, as we would most likely not want any job to go over the physical RAM (that makes everything grind to a halt, and in some cases crashes the Deadline slave application itself). That said, having both options would give the best flexibility.

The complexity comes when you have multiple slaves running in parallel on a machine… Our current ‘slave kill’ logic kills secondary slaves first, then primaries after a timeout (if RAM usage is still high).
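One way to read the ordering described above, as a rough sketch only. The poster's actual script is not shown in the thread, so `SlaveInfo`, `slaves_to_kill`, and the timeout parameter are hypothetical names, and the caller is assumed to track how long RAM usage has been high.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SlaveInfo:
    name: str
    is_primary: bool


def slaves_to_kill(slaves: List[SlaveInfo], ram_is_high: bool,
                   seconds_high: float, primary_timeout: float) -> List[str]:
    """Return the names of the slaves to kill on this check.

    Hypothetical sketch: secondaries go first; primaries are only added
    once RAM has stayed high for at least primary_timeout seconds.
    """
    if not ram_is_high:
        return []
    # Secondary slaves are always the first candidates.
    victims = [s.name for s in slaves if not s.is_primary]
    if seconds_high >= primary_timeout:
        # RAM is still high after the grace period: include primaries too.
        victims += [s.name for s in slaves if s.is_primary]
    return victims
```

Passing only the slaves that are still running on each check means the secondaries drop out of the list once they have been killed, leaving just the primaries after the timeout.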

+1 for both global and per-job options. Perhaps the global option could control whether usage is calculated per slave or across all slaves on the machine? There is also the added complexity of concurrent tasks.

We have been talking about implementing a “slots” system in Deadline at some point, and it sounds like it would be the proper solution here. The idea is that a slot is an arbitrary unit that can represent available RAM, cores, disk space, etc. For example, a slot could be 1 core and 2 GB of RAM. Each slave would have a slot count, and each job would indicate how many slots it requires. For example, a Nuke job might only need 1 slot, and a 3D render might require 8 slots. The slaves would then try to fill up their slots as best as possible. If a job goes over its slot requirement, it could be failed, or its slot requirement could be increased as appropriate.
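The slot idea above can be sketched as a simple first-fit allocation. This is only an illustration of the concept as described in this post, not an existing Deadline feature or API; `Slave`, `try_start`, and `fill_slots` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Slave:
    name: str
    slot_count: int                                        # total slots on this slave
    running: Dict[str, int] = field(default_factory=dict)  # job name -> slots held

    def free_slots(self) -> int:
        return self.slot_count - sum(self.running.values())

    def try_start(self, job: str, slots_required: int) -> bool:
        """Start the job only if enough slots are free."""
        if slots_required <= self.free_slots():
            self.running[job] = slots_required
            return True
        return False


def fill_slots(slave: Slave, queue: List[Tuple[str, int]]) -> List[str]:
    """Greedy first-fit: walk the queue and start every job that still fits."""
    started = []
    for job, slots_required in queue:
        if slave.try_start(job, slots_required):
            started.append(job)
    return started
```

With an 8-slot slave and a queue of a 1-slot Nuke job, an 8-slot 3D render, and another 1-slot Nuke job, this sketch starts both Nuke jobs and leaves the render queued until all 8 slots are free. A real scheduler would also need the over-requirement handling mentioned above (fail the job or raise its slot requirement), which is not modelled here.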

We don’t have an ETA for this feature at this time, but we feel that this system should handle the concurrency issue, and it would also allow for proper action other than simply canceling the task if it uses too much memory.

Cheers,
Ryan

I was looking for this exact functionality but can’t find anything in the docs. It’s 5 years later, so did this feature fall by the wayside?
