Advice on Dynamic CPU/GPU Affinity Allocation

Heya, I'm very new to Deadline and am helping build a system to dynamically change CPU and GPU affinities (which are currently used to split each computer into two separate split Workers). The idea is that because RAM is shared across the whole machine (unlike the CPUs and GPUs), when a job needs more RAM it would wait for the jobs on the two split Workers to complete, recombine the two Workers back into one, run the job, and then split them apart again afterwards. I have no idea if this is possible, but I'm wondering if anyone could offer pointers on ways it might be done, or obvious pitfalls I'll run into in my ignorance. Apologies for the vague, open-ended question, but any assistance would be greatly appreciated.
Cheers,
Stephen

That’s totally possible with manual intervention, but Deadline doesn’t have built-in tools to do it automatically.

If you’re only ever looking to have one or two Workers on a given machine, I’d instead set up three Workers: two that are each half of the machine and one that is the full machine. Then disable/enable the Workers instead of splitting and re-combining them.
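Roughly this shape, if you wanted to script the switch (untested sketch only: I'm assuming the Scripting API's RepositoryUtils.GetSlaveSettings/SaveSlaveSettings and the SlaveEnabled property, so check the scripting docs for your Deadline version, and the Worker names are made up):

```python
# Sketch only -- assumes Deadline's Python Scripting API exposes
# RepositoryUtils.GetSlaveSettings / SaveSlaveSettings and a SlaveEnabled
# property; verify against your Deadline version's scripting reference.
from Deadline.Scripting import RepositoryUtils

def set_worker_enabled(worker_name, enabled):
    # Fetch the Worker's settings from the repository, flip the flag, save it back
    settings = RepositoryUtils.GetSlaveSettings(worker_name, True)
    settings.SlaveEnabled = enabled
    RepositoryUtils.SaveSlaveSettings(settings)

# Hypothetical naming scheme: host-full, host-half-a, host-half-b
def use_full_machine(host):
    set_worker_enabled(host + "-half-a", False)
    set_worker_enabled(host + "-half-b", False)
    set_worker_enabled(host + "-full", True)

def use_split_machine(host):
    set_worker_enabled(host + "-full", False)
    set_worker_enabled(host + "-half-a", True)
    set_worker_enabled(host + "-half-b", True)
```

You'd still need your own logic to wait until both half Workers are idle before enabling the full one, which is the part Deadline won't do for you automatically.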

I would also make sure that concurrent tasks aren’t a better fit than multiple Workers on a single machine, as they allow multiple single-CPU tasks to run in parallel. And if the DCC you’re running supports GPU affinity, you may be able to associate each task with its own GPU. Redshift standalone can do this, where each task gets its own GPU device.
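For illustration only (this isn't the Redshift plugin's actual code), one way to pin each concurrent task to a single device is to set CUDA_VISIBLE_DEVICES before launching the standalone renderer; the renderer path, scene file, and task-index mapping below are just placeholders:

```python
# Illustrative sketch: give each concurrent task its own GPU by index.
# CUDA_VISIBLE_DEVICES is a standard NVIDIA environment variable; the
# renderer executable and scene path here are placeholders.
import os
import subprocess

def render_task_on_gpu(task_index, scene_path, gpu_count=2):
    gpu_id = task_index % gpu_count  # task 0 -> GPU 0, task 1 -> GPU 1, ...
    env = os.environ.copy()
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_id)  # renderer only sees this one device
    subprocess.check_call(["redshiftCmdLine", scene_path], env=env)

# e.g. two concurrent tasks on a two-GPU box, each rendering on its own device:
# render_task_on_gpu(0, "/path/to/shot.rs")
# render_task_on_gpu(1, "/path/to/shot.rs")
```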

Hi Justin,

Thanks very much for getting back to me and for the advice. The multiple-Workers approach is pretty much what we’ve decided to go with, exactly for the reasons you stated above, so it’s great to hear we’re headed in a good direction. Our nodes tend to have both CPU and GPU affinity set, but I’ll ask about Redshift standalone as that may open up some other doors.

Anyhow, thanks so much,
Stephen
