
Render machines don't use all GPUs

I recently discovered that our render machines need almost 1.5 times as long to finish a render as our workstations.

Farm machines are equipped with two RTX 4090s, workstations with a single one.

Jobs are submitted as they should be:
2 concurrent tasks per machine with 1 GPU per task
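For reference, those settings roughly correspond to a scripted submission like the one below. This is a minimal sketch, not our actual setup: ConcurrentTasks is a standard Deadline job info key, but the plugin info keys (Version, SceneFile, GPUsPerTask) are assumptions based on the Cinema 4D plugin, so verify them against your Deadline version.

```python
# Minimal sketch of a scripted Deadline submission with 2 concurrent
# tasks and 1 GPU per task. ConcurrentTasks is a standard job info
# key; the plugin info keys (Version, SceneFile, GPUsPerTask) are
# assumptions based on the Cinema 4D plugin -- verify them against
# your Deadline version.
import os
import subprocess
import tempfile

job_info = """Plugin=Cinema4D
Name=GPU affinity test
Frames=1-100
ConcurrentTasks=2
"""

plugin_info = r"""Version=2023
SceneFile=\\server\projects\test\scene.c4d
GPUsPerTask=1
"""

def write_tmp(text, suffix):
    fd, path = tempfile.mkstemp(suffix=suffix)
    with os.fdopen(fd, "w") as f:
        f.write(text)
    return path

job_path = write_tmp(job_info, "_job.txt")
plugin_path = write_tmp(plugin_info, "_plugin.txt")

# deadlinecommand submits a job when given a job info file followed
# by a plugin info file.
subprocess.run(["deadlinecommand", job_path, plugin_path], check=True)
```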

After some research here and there, and some tips to look at the log files of the render jobs and tasks, I could see that when two tasks run simultaneously on one machine, both tasks use the same GPU. How can I solve this, or is this only an issue with the log? I need to identify the problem soon, as a lot of render jobs need to be done in the upcoming weeks.

Many thanks in advance!
Rob

Run two Workers on the node and assign them 1 GPU each under the GPU affinity options (there's a quick verification sketch after these steps).

The option to launch another instance is disabled by default and buried awkwardly in the user group settings:
Tools > Super/Power User > Manage User Groups
[screenshot: Manage User Groups dialog]
Then enable 'Launch new Named Worker' under Menu Items


You can then right-click a Worker and launch another one.

Then go into each Worker's properties
[screenshot: Worker properties dialog]
and select a different GPU for each Worker.
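To verify the affinity is actually taking effect, watch per-GPU load while two tasks render. A minimal Python sketch, assuming the NVIDIA driver's nvidia-smi is on PATH and its GPU indices match the ones in the affinity dialog:

```python
# Poll nvidia-smi while two tasks render: if the affinity works,
# both GPUs show load; if not, one of them sits idle.
import subprocess
import time

QUERY = [
    "nvidia-smi",
    "--query-gpu=index,name,utilization.gpu,memory.used",
    "--format=csv,noheader,nounits",
]

for _ in range(12):  # ~1 minute at 5-second intervals
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    for line in result.stdout.strip().splitlines():
        idx, name, util, mem = [field.strip() for field in line.split(",")]
        print(f"GPU {idx} ({name}): {util}% load, {mem} MiB used")
    print("---")
    time.sleep(5)
```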

Which render engine are you using?


We use Cinema 4D as the host application and Redshift as the render engine.

It should work this way. Remember that it needs more RAM than VRAM: a 4090 has 24 GB of VRAM, and with 2 of these you need more than 48 GB of free RAM, since Windows (and other apps) will also use some RAM.
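To sanity-check that headroom on a node, compare available RAM against total VRAM. A small sketch, assuming the third-party psutil package is installed and nvidia-smi is available:

```python
# Compare available system RAM against the total VRAM across all
# GPUs on the box. The rule of thumb above: you want more free RAM
# than the sum of VRAM, since the OS and other apps need their share.
import subprocess
import psutil  # third-party: pip install psutil

result = subprocess.run(
    ["nvidia-smi", "--query-gpu=memory.total", "--format=csv,noheader,nounits"],
    capture_output=True, text=True, check=True,
)
vram_total_mib = sum(int(line) for line in result.stdout.split())
avail_ram_mib = psutil.virtual_memory().available // (1024 * 1024)

print(f"Total VRAM:    {vram_total_mib} MiB")
print(f"Available RAM: {avail_ram_mib} MiB")
if avail_ram_mib <= vram_total_mib:
    print("Warning: less free RAM than total VRAM -- expect paging or instability.")
```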

I'd benchmark running 1 task across 2 cards versus 2 concurrent tasks on 1 card each; depending on the job being processed, it may be easier/quicker to run single tasks across both cards.
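To make the comparison concrete once you have timings, a throughput sketch (all numbers below are hypothetical placeholders; substitute your measured per-frame times):

```python
# Throughput comparison of the two configurations. All timings are
# hypothetical placeholders -- substitute per-frame times measured
# on your own test renders.
SECONDS_PER_FRAME_TWO_GPUS = 70   # 1 task rendering across both cards
SECONDS_PER_FRAME_ONE_GPU = 120   # 1 task pinned to a single card

# Config A: one task at a time, using both GPUs.
frames_per_hour_a = 3600 / SECONDS_PER_FRAME_TWO_GPUS

# Config B: two concurrent tasks, one GPU each.
frames_per_hour_b = 2 * (3600 / SECONDS_PER_FRAME_ONE_GPU)

print(f"1 task  x 2 GPUs: {frames_per_hour_a:.1f} frames/hour")
print(f"2 tasks x 1 GPU : {frames_per_hour_b:.1f} frames/hour")
```

Whichever configuration prints the higher number wins for that job type; multi-GPU scaling is rarely perfectly linear, so the result can go either way.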

Does this setup still work with 1 C4D/Redshift license per machine?

As you can see, the log files still say that the same GPU is used for different tasks.

How do you have the affinity set up on each?
Like this:
RENMASTER01 - GPU0
RENMASTER01-Second - GPU1

Licensing should be per host, so you can run as many instances as you like on the same box and still pull 1 license of C4D CLR & RS.

Setup is exactly as you say.

After running an overnight test, I figured out that "GPU 0" in the log doesn't mean anything.
In the first task per render job and Worker, the log file lists which GPU is used for the whole job, including the PCI bus ID, and it even says device 1/2 or 2/2.
So I can clearly say that all GPUs are being used; only the master machine (where the Deadline Repository runs) is around 12.5% slower.
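For anyone reading along: the PCI bus ID in those log lines can be matched to a physical card, since nvidia-smi reports bus IDs alongside GPU indices. A small sketch:

```python
# Map nvidia-smi GPU indices to PCI bus IDs, so the bus ID printed
# in the render log can be matched to a physical card.
import subprocess

result = subprocess.run(
    ["nvidia-smi", "--query-gpu=index,pci.bus_id,name", "--format=csv,noheader"],
    capture_output=True, text=True, check=True,
)
for line in result.stdout.strip().splitlines():
    idx, bus_id, name = [field.strip() for field in line.split(",")]
    print(f"GPU {idx}: {bus_id}  {name}")
```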
