Optimizing for multiple GPU generations

antoinedurr · April 26, 2021, 10:07pm

Here’s a question to the group: say we have 3 or 4 generations of GPUs, e.g. 3080s, 2080s, 1080s, and one previous generation, dealer’s choice. How could one set up pools/groups/limits/etc. to cascade from fastest to slowest GPU? In other words, I first want to use up the fastest cards, and then the next fastest, and so on, and only get to the slowest ones if we’re really desperate!

Bobo · April 27, 2021, 1:02am

Due to its decentralized nature, Deadline does not allow for easy control over which Workers should pick something up earlier than others. If a Worker is free and allowed to consider a Job, it will do so.

There are two possible approaches I can think of, but there might be more.

One would be to write an Event script that automates the decision of which Jobs get which GPUs, based on whatever logic you want to implement in Python code, and adjusts group assignments or allow/deny lists to match. Obviously this is a very generic description, and it would be rather complex to develop.
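To make the "whatever logic you want" part a bit more concrete, here is a minimal sketch of one possible cascade rule in plain Python. It deliberately uses no Deadline API: the function name, the tier names, and the idle-Worker counts are all hypothetical, and a real Event script would still have to gather those counts from the farm and apply the result to the Job's groups or lists.

```python
from typing import Dict, List


def pick_gpu_tiers(
    tiers_fastest_first: List[str],
    idle_workers: Dict[str, int],
    workers_wanted: int,
) -> List[str]:
    """Return the GPU tiers a job may use, opening slower tiers only when needed.

    Hypothetical helper: tiers are walked from fastest to slowest, and the
    next slower tier is opened only if the idle Workers in the tiers opened
    so far cannot cover the number of Workers the job wants.
    """
    allowed: List[str] = []
    remaining = workers_wanted
    for tier in tiers_fastest_first:
        if remaining <= 0:
            break  # faster tiers already cover the demand
        allowed.append(tier)
        remaining -= idle_workers.get(tier, 0)
    return allowed


# Made-up tier names and idle counts, purely for illustration.
TIERS = ["rtx3080", "rtx2080", "gtx1080"]
idle = {"rtx3080": 4, "rtx2080": 6, "gtx1080": 10}

print(pick_gpu_tiers(TIERS, idle, 3))   # small job: fastest tier only
print(pick_gpu_tiers(TIERS, idle, 8))   # spills over into the second tier
print(pick_gpu_tiers(TIERS, idle, 25))  # "really desperate": every tier
```

Note that the fastest tier is always included for any job that wants at least one Worker, even if none of its machines are idle right now, so such a job can wait for a fast card instead of dropping straight to a slow one. Whether that is the right trade-off depends on your farm.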

The other would be to create Resource Limits for every type of GPU. You would name each Limit based on the GPU model or performance, then switch the middle list mode to Allow List, so the Master List becomes a Deny List. Then you move all machines that match the GPU model for that Limit to the middle Allow List. Repeat the same for every GPU type until you have 3 or 4 Resource Limits. At this point, you can ask your artists to submit their Jobs with the fast GPUs Limit checked, which will allow those Jobs to use the fast GPUs and will disallow the use of the slow ones. If you want a Job to use the slower GPUs, either submit the Job with these Limits already tagged, or modify the Job Properties to add multiple Limits so it can pick up several generations of GPUs.
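If you have many machines, it may help to work out the Allow List for each Limit from an inventory before clicking through the dialog. The following is only a sketch in plain Python, not Deadline API code: the Worker names, the GPU labels, and the "gpu_" naming scheme for the Limits are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, List


def build_allow_lists(worker_gpus: Dict[str, str]) -> Dict[str, List[str]]:
    """Group Workers by GPU model into one Allow List per (hypothetical) Limit name."""
    allow_lists: Dict[str, List[str]] = defaultdict(list)
    for worker, gpu in worker_gpus.items():
        # Derive a Limit name from the GPU model, e.g. "RTX 3080" -> "gpu_rtx_3080".
        limit_name = "gpu_" + gpu.lower().replace(" ", "_")
        allow_lists[limit_name].append(worker)
    # Sort names and members so the result is stable and easy to compare.
    return {name: sorted(workers) for name, workers in sorted(allow_lists.items())}


# Made-up inventory: Worker name -> GPU model.
inventory = {
    "render01": "RTX 3080",
    "render02": "RTX 3080",
    "render03": "RTX 2080",
    "render04": "GTX 1080",
}

for limit_name, workers in build_allow_lists(inventory).items():
    print(limit_name, workers)
```

Each printed line is one Limit to create, followed by the machines that would go into its Allow List.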

You could also combine the two approaches by writing an Event script that adds and removes those same Limits according to the Jobs’ needs.