
GPU per task (Redshift)

I know it’s been brought up before, and I saw in the 10.0.9.2 release notes that GPU settings are now applied through environment variables. I was hoping that would fix the GPU per task issue.

I still see that if one job is submitted with GPUs per task set to 1, it becomes a sticky setting:
the next job submitted, if left at the default of 0 GPUs per task, will only use one GPU and leave the rest idle.
This is for Houdini and Redshift 3.0 in our case.

Is this a Thinkbox or a Redshift issue? It would be nice to have it fixed so we don’t waste render hours. The intention is that any job submitted with the default GPU affinity of 0 utilizes all available GPUs, right? Earlier, the workaround was to set the Redshift preferences file to read-only, but is it now an environment variable that’s hanging around after the job completes?

It’s just that for very simple frames it can be much faster to use GPU affinity 1 with one task per GPU, but the gain is all lost when the next heavy job leaves multiple GPUs idle if you don’t babysit the servers :smile:
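
For reference, this is a minimal sketch of the read-only workaround mentioned above as a script. The preferences.xml locations are assumptions that vary per install, so adjust the paths to your environment:

```python
import os
import stat
import platform

def lock_redshift_prefs():
    # Assumed locations for the Redshift preferences file; adjust as needed.
    if platform.system() == "Windows":
        prefs = r"C:\ProgramData\Redshift\preferences.xml"
    else:
        prefs = os.path.expanduser("~/redshift/preferences.xml")

    if os.path.isfile(prefs):
        # Strip write permission so a job can't persist its GPU selection
        # into the preferences file.
        mode = os.stat(prefs).st_mode
        os.chmod(prefs, mode & ~(stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH))
        print("Locked %s" % prefs)
    else:
        print("No preferences file found at %s" % prefs)

if __name__ == "__main__":
    lock_redshift_prefs()
```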

Before we get into troubleshooting and duplicating the issue, did you grab the new submitters, or are you using the older version?

They don’t get auto-upgraded with the client unfortunately.

Oh, thanks! I’ll update to make sure. I probably did not do that.

So I updated the submitter, and it didn’t fix anything. I submitted one job with 1 GPU per task, then went back to 1 concurrent task and 0 GPUs per task, and it still only utilized one GPU. On both Linux and Windows.

(screenshot attachment: linux_gpu_load)

So much for the easy solution!

Right then, could you either share the logs (here or in a support ticket) or look for lines like:

2020-04-27 22:33:10:  0: STDOUT: No GPUs were selected in the command line, using all devices
...
2020-04-27 22:33:10:  0: STDOUT: Overriding GPU devices due to REDSHIFT_GPUDEVICES (0,2)

to see if the environment variables are being picked up properly?

Incidentally, the reason we switched from a command line flag to environment variables is that the command line flag would change a setting in the preferences file, which could affect other jobs.
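
Conceptually, the env-var approach looks something like the sketch below (this is not the actual plugin code, just an illustration): the GPUs assigned to the task are passed to Redshift through REDSHIFT_GPUDEVICES in the render process’s environment, so nothing gets written into the preferences file.

```python
import os
import subprocess

def launch_render(render_cmd, assigned_gpus):
    env = os.environ.copy()
    if assigned_gpus:
        # e.g. [0, 2] -> "0,2"; leaving the variable unset means "use all devices".
        env["REDSHIFT_GPUDEVICES"] = ",".join(str(g) for g in assigned_gpus)
    return subprocess.call(render_cmd, env=env)

# Example: render on GPUs 0 and 2 only; "render_cmd" is a placeholder for
# whatever command line your pipeline launches.
# launch_render(render_cmd, [0, 2])
```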

Not seeing any of that, but here’s one of the logs.
Submitted with GPUs per task 0, after a previous job ran with 1 GPU per task.
Job_2020-09-02_23-25-19_5f50702f1caec909c6f3e906.zip (3.4 KB)

There is a line in there that reads:

Unable to locate 'CallDeadlineCommand.py', please ensure your Deadline ROP installation is correct.

Did you update the ROP as well? That might explain why changes aren’t going through.

Thanks Justin,

That should be because the Houdini Submitter itself isn’t installed on the render slave, just the Deadline Client.
The workstation, which does have the submitter installed, is having the same issue though, so I don’t think that’s the cause.

To double-check, I just reinstalled the submitter. I recreated the Deadline ROP and resubmitted one job with GPUs per task 1 and concurrent tasks 2. That runs as expected.
Afterwards, I submitted a new job with the default settings (GPUs per task 0, concurrent tasks 1).
Here is an updated log from the workstation rendering the frame, still only utilizing one GPU on the second job.

Are you not able to replicate?

Job_2020-09-03_09-00-26_5f50f6fecc269c8ab04deb1e.zip (3.4 KB)

I haven’t duplicated the issue on my end.

But I have found the merge request associated with that release note, and the only change was to the Redshift standalone plugin. It sets REDSHIFT_GPUDEVICES, and a search in our codebase shows that’s also how 3ds Max does it. I’ve heard that Maya is happy too, but I haven’t seen it with my own eyes.

The Houdini plugin wasn’t changed, so there shouldn’t be an environment variable hanging around after the job, and your read-only preferences workaround should still work.
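
For anyone curious, the GPU selection the standalone Redshift plugin applies is roughly the logic below, rewritten here as a plain function so it runs outside of Deadline. In the real plugin the inputs would come from the Deadline plugin API (GPU affinity, the task thread number, and the job’s GPUs per task setting) and the result would be applied to the render process environment; exact method names aren’t shown because this is a sketch from memory, not a verified patch.

```python
def redshift_gpu_env(gpus_per_task, thread_number, worker_gpu_affinity=None):
    """Return the REDSHIFT_GPUDEVICES override for a task, or {} to use all GPUs."""
    if worker_gpu_affinity:
        # A Worker-level GPU affinity override wins if it is set.
        selected = list(worker_gpu_affinity)
    elif gpus_per_task > 0:
        # Give each concurrent task its own slice of devices, e.g. task
        # thread 1 with 2 GPUs per task gets devices 2 and 3.
        offset = thread_number * gpus_per_task
        selected = list(range(offset, offset + gpus_per_task))
    else:
        # GPUs per task 0 and no affinity: leave the variable unset so
        # Redshift falls back to using every device it can see.
        return {}
    return {"REDSHIFT_GPUDEVICES": ",".join(str(g) for g in selected)}

# Example: second concurrent task of a job submitted with 1 GPU per task.
print(redshift_gpu_env(gpus_per_task=1, thread_number=1))  # {'REDSHIFT_GPUDEVICES': '1'}
```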

So let’s call this a Thinkbox issue, and I’ll create the developer issue to save you some time spent babysitting servers. Sorry about the run-around; I should have checked the code first!

Thanks, I appreciate it! :slight_smile:
I’ll be patient and hope it gets resolved soon!


Was this ever fixed? We think we’re running into it…
