Maya Vray GPU Affinity

Hello,

I downloaded Deadline a few days ago. My goal was to build a small farm with two nodes and use it for GPU rendering with Maya and V-Ray. I found that if you switch on GPU Affinity, it does not work as expected: the amount of VRAM the scene uses is multiplied by the number of graphics cards. So if the scene needs 2.5 GB of VRAM, it needs 10 GB with affinity enabled. Can you reproduce this on your side, or is it a bug?

I use:

  • Deadline Client Version: 10.0.28.2 Release (31a4a2e50)
  • V-Ray Next for Maya, Update 1
  • Maya 2018 SP6

Cheers
Tobias

Hey @Tobias_Rosli

I think this is because Deadline’s Maya plugin doesn’t yet support GPU affinity for V-Ray.

Hi Kavi,

Thanks for the info. I spoke with someone on the support team and he told me the same, and also that he opened an internal request to implement this feature. I hope it comes soon - it would increase my render throughput. That was already a month ago…

Hello, I thought I might ask after a year: how far are you with implementing GPU affinity for V-Ray and Maya?

Cheers!

Hi, thanks for sharing the current situation. Hope that you’ll find some good solutions for the problems you’re facing right now.

Cheers!

Ok, here is my WIP implementation of V-Ray GPU Affinity.
I have done only rudimentary testing, so please give it a try and let me know if anything does not behave as expected.

It uses the same basic code and UI as the Redshift implementation, with just a bit of V-Ray specific code to set the VRAY_GPU_PLATFORMS environment variable to the correct GPU indices.
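For reference, the V-Ray-specific part boils down to something like this (a rough sketch - the function name is mine, and the exact value format is assumed from the task logs shown later in this thread, not taken from the actual MayaBatch.py code):

```python
# Rough sketch of the V-Ray-specific step (hypothetical helper, not the
# actual plugin code): turn the GPU indices Deadline assigned to this
# task into the VRAY_GPU_PLATFORMS value.
def build_vray_gpu_platforms(gpu_indices):
    """E.g. [0] -> "0", [2, 0] -> "0,2" (comma-separated indices assumed)."""
    return ",".join(str(i) for i in sorted(gpu_indices))

print(build_vray_gpu_platforms([0]))     # "0"
print(build_vray_gpu_platforms([2, 0]))  # "0,2"
```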

VRayGPUAffinity_WIP_20201022.zip (117.1 KB)

The ZIP file contains 3 updated script files you need to deploy as follows:

  • MayaBatch.py goes into the Repository/Plugins/MayaBatch/ folder - this is the integration plugin.
  • MayaSubmission.py goes into the Repository/Scripts/Submission/ folder - this is the Monitor Submitter.
  • SubmitMayaToDeadline.mel goes into the Repository/Submission/Maya/Main/ folder - this is the integrated Maya submitter.

Please BACK UP the original versions of the files before replacing them!

To test, I submitted a V-Ray GPU scene from Maya using the integrated submitter, set to frames 1 to 8, 4 Concurrent Tasks, 1 GPU Per Task. I ran this on AWS on a g4dn.12xlarge instance, which has 4 x T4 GPUs. The result was 4 Tasks rendered in parallel, each using only 1 GPU.

I repeated the test with different combinations of Concurrent Tasks and GPUs per Task using both the Monitor Submitter and the Integrated Submitter.

I also submitted a job with Selected GPUs, entering the indices by hand, e.g. “0,2” to render on the first and third GPUs with 1 Concurrent Task. As expected, the Task rendered on only the two specified GPUs.

Unfortunately, V-Ray numbers the devices in its log consecutively, so selecting 0,2 is reported as Device 0 and Device 1. But the correct physical GPUs end up rendering, so it seems to be working as expected.

Note that if you try to render more concurrent tasks than there are GPUs, some tasks will use the GPUs Per Task value, and the excess ones will render on all GPUs. For example, 6 Concurrent Tasks, 1 GPU Per Task will render Tasks 0,1,2,3 on GPUs 0,1,2,3, while Tasks 4 and 5 will both render on all 4 GPUs. This is As Designed.
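The assignment rule described above can be sketched roughly like this (hypothetical helper, not the plugin’s actual code):

```python
# Hypothetical sketch of the documented behavior: each concurrent-task
# thread gets its own slice of GPUs; threads whose slice would run past
# the last GPU fall back to rendering on all GPUs ("As Designed").
def gpus_for_thread(thread_index, gpus_per_task, total_gpus):
    start = thread_index * gpus_per_task
    if start + gpus_per_task <= total_gpus:
        return list(range(start, start + gpus_per_task))
    return list(range(total_gpus))  # excess task: all GPUs

# 6 Concurrent Tasks, 1 GPU Per Task, 4 GPUs:
print([gpus_for_thread(t, 1, 4) for t in range(6)])
# [[0], [1], [2], [3], [0, 1, 2, 3], [0, 1, 2, 3]]
```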

Wow, cool - I’ll test it in the next 2-3 days and give you feedback.

Cheers!

Hi, I tried to use it but ran into some problems/questions.

That’s the value in the VRAY_GPU_PLATFORMS env variable:
nvidia cuda geforce rtx 2080 ti gpu index0;nvidia cuda geforce rtx 2080 ti gpu index1;

These are the submitter settings:

The log from Deadline is in the deadline_log.zip (6.3 KB) attachment.

Somehow, both GPUs are still initialized for one task.
Am I missing something? Or am I supposed to create one Worker per GPU and use the GPU Affinity override?

I replaced the files like you said. I also restarted the computer.

These are some very strange lines in the log:
2020-10-26 16:20:57: 0: STDOUT: [2020/Oct/26|16:20:57] V-Ray: Device[0]: GeForce RTX 2080 Ti (WDDM mode) has compute capability 7.5. PCI Bus ID: 0000:0A:00.0
2020-10-26 16:20:57: 1: STDOUT: [2020/Oct/26|16:20:57] V-Ray: Device[0]: GeForce RTX 2080 Ti (WDDM mode) has compute capability 7.5. PCI Bus ID: 0000:42:00.0
2020-10-26 16:20:57: 0: STDOUT: [2020/Oct/26|16:20:57] V-Ray: Device[1]: GeForce RTX 2080 Ti (WDDM mode) has compute capability 7.5. PCI Bus ID: 0000:42:00.0

It detected GPU 0 twice and GPU 1 once.

Cheers

This appears to be a Worker log. I need to see the TASK log.
In this log, the 0: and 1: prefixes show that two threads were running:

This is thread 0 (first task):

2020-10-26 16:20:57:  0: STDOUT: [2020/Oct/26|16:20:57] V-Ray: Device[0]: GeForce RTX 2080 Ti (WDDM mode) has compute capability 7.5. PCI Bus ID: 0000:0A:00.0
2020-10-26 16:20:57:  0: STDOUT: [2020/Oct/26|16:20:57] V-Ray: Device[1]: GeForce RTX 2080 Ti (WDDM mode) has compute capability 7.5. PCI Bus ID: 0000:42:00.0

This is thread 1 (second task):

2020-10-26 16:20:57:  1: STDOUT: [2020/Oct/26|16:20:57] V-Ray: Device[0]: GeForce RTX 2080 Ti (WDDM mode) has compute capability 7.5. PCI Bus ID: 0000:42:00.0

So for some reason one task is rendering on one GPU, the other on both.

The most important thing is the missing log line where the environment variables are set - there is no sign of them anywhere in this log.

Please post the individual logs of the two tasks processed at the same time by the same Worker. The environment variables are set before the MayaBatch process is even started, and your log does not show that part - it happened before the place you copied from.

Here is an example of what my task log looks like:

2020-10-23 02:49:20: 0: INFO: Rendering with Maya Version 2018.0
2020-10-23 02:49:20: 0: INFO: Setting VRAY_GPU_PLATFORMS environment variable to 0 for this session
2020-10-23 02:49:20: 0: INFO: Setting Process Environment Variable VRAY_GPU_PLATFORMS to 0

Note that this overrides the actual env variable you can see at the OS level - even if all those GPU IDs are listed in your system environment, we set a temporary value for the process being launched (MayaBatch), which shadows the system-wide setting only while Deadline is rendering.
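As a minimal illustration of that per-process override (not the actual plugin code - the child process here just stands in for MayaBatch):

```python
import os
import subprocess
import sys

# Give only the launched process its own VRAY_GPU_PLATFORMS value;
# the parent's (system-wide) environment is left untouched.
child_env = os.environ.copy()
child_env["VRAY_GPU_PLATFORMS"] = "0"  # per-task value, e.g. GPU index 0

# Stand-in for launching MayaBatch: print what the child actually sees.
seen = subprocess.check_output(
    [sys.executable, "-c",
     "import os; print(os.environ['VRAY_GPU_PLATFORMS'])"],
    env=child_env,
).decode().strip()
print(seen)  # "0"
```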

So all you would have to do is set the GPU per Task to 1, set the Concurrent Tasks to 2, and render. It sounds like you did that, but your results were surprising. So let’s look at the two individual Task logs to see what is being reported there…

(Select a Task, right-click, View Task Reports…, click a render log to open, then save to disk using the first icon in the toolbar).

Repeat for the next task.

Hi, in the attachment you’ll find the task logs for 3 frames.

I rendered 3 frames and after that let the others fail. The strange thing is that when I did that, another log entry appeared in the task for frame 2, even though that frame had already finished rendering.
You’ll find it in the zip.

I am running Deadline 10.1.9.2 - is that a problem?

Cheers!

tasklog.zip (27.9 KB)

Thanks for the logs. Something is indeed not working in your environment.

  • I can see the env. variable being set to 0 in the log of Frame 1 rendered on thread 0. It still renders on both devices, which is wrong.
  • The log of Frame 2 being rendered on thread 1 shows a CUDA Error and a crash. The env. variable is being properly set to 1. We don’t know if it would have rendered on one or two GPUs.
  • The log of Frame 3 rendered on thread 1 shows the env. variable set to 1, and it renders on one GPU as it should. Since the scene did not previously load and render Frame 2 properly due to the CUDA crash, MayaBatch was reloaded and this time it worked right.
  • The log of Frame 2 being re-rendered at the same time on thread 0 shows no env. variables being set, and it uses 2 GPUs. This is likely because MayaBatch was already launched on thread 0 to render Frame 1, and it rendered incorrectly on 2 GPUs there, so the incorrect behavior from the first log persisted.

Questions:

  • Does the crash always occur?
  • If not, can you send me logs from 4 frames rendered with 1 GPU per Task, 2 Concurrent Tasks that do not contain a crash? I want to know what the behavior would be if frames 1,2 and 3,4 were rendered together.

I retested on my machine with 1 GPU Per Task, 4 Concurrent Tasks, and all frames rendered on 1 GPU each as expected. Frames 1,2,3,4 were rendered together on the same Worker and had the env variable set to 0, 1, 2, and 3 respectively. Frames 5,6,7,8 were rendered in another pass, and their logs don’t show the env variable being set, because Maya stays loaded in memory and just moves on to a different frame.

In the Job Properties, there is an option to reload the plugin between tasks. This reloads Maya and starts everything from scratch, including loading the scene and setting the env variables of the MayaBatch process. This of course makes rendering a bit slower.

I decided to test this option too, and the env variable was printed in the log of every frame, including the later ones. I would love to know what happens on your system if you check “Reload Plugin Between Tasks” in the Job Properties > General section.
