I recently discovered that our render farm machines take almost 1.5 times as long to finish a render as our workstations.
The farm machines are equipped with two RTX 4090s, the workstations with a single one.
Jobs are submitted the way they should be:
2 concurrent tasks per machine with 1 GPU per task
After some research and a few tips pointing me to the log files of the render jobs and tasks, I could see that when two tasks run simultaneously on one machine, both tasks use the same GPU. How can I solve this, or is this only a reporting issue in the log? I need to identify the problem soon, as there are a lot of render jobs to be done in the upcoming weeks.
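One way to double-check this outside the Deadline logs is to watch per-GPU utilization while two tasks are rendering on a node. A minimal sketch, assuming `nvidia-smi` is on the PATH (it ships with the NVIDIA driver):

```python
import subprocess
import time

# Poll per-GPU utilization and memory use every few seconds while two
# tasks are rendering. If one GPU sits near 0% the whole time, both
# tasks really are sharing a single card; if both are busy, the log
# entry is just misleading.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,utilization.gpu,memory.used",
    "--format=csv,noheader,nounits",
]

for _ in range(10):  # ten samples, roughly 30 seconds total
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    for line in out.stdout.strip().splitlines():
        idx, util, mem = [field.strip() for field in line.split(",")]
        print(f"GPU {idx}: {util}% busy, {mem} MiB used")
    print("-" * 32)
    time.sleep(3)
```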
Run two ‘Workers’ on the node and assign them 1 GPU each under the GPU affinity options.
The option to launch another instance is disabled by default and buried awkwardly in the user group settings:
Tools > Super/Power User > Manage User Groups
Then enable ‘Launch new Named Worker’ under Menu Items.
It should work this way. Remember that you need more RAM than total VRAM: if each 4090 has 24 GB of VRAM and you have two of them, you need more than 48 GB of free RAM, as Windows (and other apps) will also use some RAM.
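As a rough sanity check of that headroom rule, here is a sketch; it assumes the third-party `psutil` package is installed and uses an assumed figure for OS/app overhead:

```python
import psutil

GPU_COUNT = 2
VRAM_PER_GPU_GB = 24      # RTX 4090
OS_OVERHEAD_GB = 8        # assumption: headroom for Windows and other apps

needed_gb = GPU_COUNT * VRAM_PER_GPU_GB + OS_OVERHEAD_GB
total_gb = psutil.virtual_memory().total / 1024**3

print(f"Rule of thumb: want > {needed_gb} GB installed, machine has {total_gb:.0f} GB")
if total_gb <= needed_gb:
    print("Warning: two concurrent GPU tasks may start swapping on this node.")
```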
I’d benchmark running 1 job across 2 cards versus 2 jobs on 1 card each; depending on the job being processed, it may be easier/quicker to run single tasks across both cards.
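A rough way to time the two setups side by side; this is only a sketch, with a hypothetical `render_frame.py` standing in for the real render command, and it assumes the renderer honours `CUDA_VISIBLE_DEVICES` (most CUDA-based engines do):

```python
import os
import subprocess
import time

def run(visible_gpus: str, frames: str) -> subprocess.Popen:
    """Start one render restricted to the given GPU indices (e.g. "0" or "0,1").

    render_frame.py is a hypothetical stand-in for the real render command.
    """
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=visible_gpus)
    return subprocess.Popen(["python", "render_frame.py", "--frames", frames], env=env)

def wall_time(procs):
    """Wait for all processes and return the elapsed wall-clock time."""
    start = time.perf_counter()
    for p in procs:
        p.wait()
    return time.perf_counter() - start

# Scenario A: one task renders all 20 frames using both cards.
t_both = wall_time([run("0,1", "1-20")])

# Scenario B: two concurrent tasks, one card each, 10 frames apiece.
t_split = wall_time([run("0", "1-10"), run("1", "11-20")])

print(f"1 task on 2 GPUs    : {t_both:.1f} s")
print(f"2 tasks, 1 GPU each : {t_split:.1f} s")
```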
After running an overnight test, I figured out that the ‘GPU 0’ entry in the log doesn’t mean anything.
For the first task of each render job per Worker, the log file lists which GPU is used for the whole job, including the PCI bus, and it even reports device 1/2 or 2/2.
So I can clearly say that all GPUs are being used; only the master machine (where the Deadline Repository is running) is around 12.5% slower.
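For anyone wanting to spot-check the same thing across many logs, here is a minimal sketch for pulling those GPU/PCI/device lines out of the task logs in bulk; the log directory and the exact wording of the matched lines are assumptions and depend on your Deadline setup and render plugin:

```python
import re
from pathlib import Path

# Assumed default Deadline 10 log location on Windows; adjust to your setup.
LOG_DIR = Path(r"C:\ProgramData\Thinkbox\Deadline10\logs")

# Loose match for lines reporting the GPU, PCI bus or "device x/2" info;
# the exact wording depends on the renderer and Deadline plugin.
PATTERN = re.compile(r"(gpu|pci|device\s+\d/\d)", re.IGNORECASE)

for log_file in sorted(LOG_DIR.glob("*.log")):
    hits = [line.strip()
            for line in log_file.read_text(errors="ignore").splitlines()
            if PATTERN.search(line)]
    if hits:
        print(f"== {log_file.name} ==")
        for line in hits:
            print("  ", line)
```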