AWS Thinkbox Discussion Forums

Problem with GPU assignments for Redshift?

Hello,
I’ve noticed some odd behavior while rendering and wanted to check here whether there is an issue with GPU assignments or whether I’m missing something.
My setup runs 2 slaves per render node, and each slave has 2 GPUs assigned through the Override GPU Affinity option in the slave properties. Each render node has 4 GPUs.
In short:
Render node 1 (4 GPUs):
- slave_A: GPU0 and GPU2
- slave_B: GPU1 and GPU3
The same goes for all the other slaves.
Now, if I start rendering with 2 concurrent tasks and 1 GPU per task set in the submitter, it looks like each slave is indeed rendering 2 tasks, but both tasks use the same GPU and the second GPU assigned to the slave sits idle.
Here is an example of how that looks in Deadline, which is how I noticed it in the first place; see the attached images renderweird and renderweird2.
mirkoj-VIII has a single 1080 Ti, while the other slaves have 2 GPUs each: some 1080 Ti, some Titan X (Maxwell) and some 1070.
Even so, the single 1080 Ti renders at the same speed as a 2-GPU slave in one case, and twice as fast in the other.

Also, checking the logs of two concurrent tasks running on the same slave, both say:
INFO: This Slave is overriding its GPU affinity, so the following GPUs will be used by RedShift: 0

So with 2 concurrent tasks running on a slave that has 2 GPUs assigned, I would expect one task to use GPU0 and the other GPU2, but it seems both concurrent tasks are using GPU0 while GPU2 sits idle.
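To compare concurrent tasks on one slave without reading each log by hand, a small script can pull the GPU list out of the affinity line quoted above and report any GPU that two tasks share. This is a minimal sketch: the regular expression assumes the log wording shown above (it may differ between Deadline versions), and `gpus_from_log` and `find_overlaps` are hypothetical helper names, not part of Deadline.

```python
import re
from itertools import combinations

# Assumed wording, copied from the task log line quoted above.
# Only digits, commas and spaces are captured, so the match stops at the end of the line.
AFFINITY_RE = re.compile(r"following GPUs will be used by RedShift:[ \t]*([\d, ]+)")


def gpus_from_log(log_text):
    """Return the set of GPU ids from the last affinity line in a task log."""
    matches = AFFINITY_RE.findall(log_text)
    if not matches:
        return set()
    return {int(tok) for tok in matches[-1].replace(",", " ").split()}


def find_overlaps(task_logs):
    """Given {task_id: log_text} for concurrent tasks on one slave, return
    (task_a, task_b, shared_gpus) for every pair of tasks that share a GPU."""
    gpus = {task: gpus_from_log(text) for task, text in task_logs.items()}
    overlaps = []
    for a, b in combinations(sorted(gpus), 2):
        shared = gpus[a] & gpus[b]
        if shared:
            overlaps.append((a, b, sorted(shared)))
    return overlaps
```

With the two logs described in this post, both reporting `0`, `find_overlaps` would return one pair sharing GPU 0; with one task on `0` and the other on `2`, it would return an empty list.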

It’s possible that I didn’t submit properly or didn’t set something else up properly, so I’d like some help chasing down the problem.
Is one override taking priority over the other? Are the submission settings not set up properly for using 2 slaves with GPU affinity? Or did I misread the log after all, and is it maybe simply a network bottleneck on all the other machines? All render nodes are in one room on the same network, reading from a NAS with 4 Gbit/s link aggregation. There are 6 render nodes, and the home computer with the single GPU reads alone from another NAS.

It’s a bit messy, but let me know if more details or tests are needed. Thank you!
renderweird2.JPG
renderweird.JPG

I wonder if there is an issue in the code that combines per-Slave and per-task GPU assignment.

Each Deadline plugin has to implement this itself, so the code in MayaBatch that controls GPU assignment is going to be different from the code in Redshift standalone.
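As a sketch of how per-Slave affinity and per-task assignment could be combined, assuming each concurrent task should take its GPUs by position from the slave's affinity list: with slave_A's affinity `[0, 2]`, 2 concurrent tasks and 1 GPU per task, thread 0 would get GPU 0 and thread 1 GPU 2. Both functions are hypothetical illustrations, not the actual MayaBatch or Redshift plugin code. The second one shows one way a mismatch could arise if a per-task range were treated as physical GPU ids and then filtered by the affinity list.

```python
def gpus_for_task(override_gpus, thread_number, gpus_per_task):
    """Hypothetical: pick this task's GPUs by position in the slave's affinity list."""
    if gpus_per_task <= 0:
        # No per-task split requested: the task uses every GPU in the affinity list.
        return list(override_gpus)
    start = thread_number * gpus_per_task
    end = start + gpus_per_task
    if end > len(override_gpus):
        raise ValueError(
            f"task thread {thread_number} needs affinity slots {start}..{end - 1}, "
            f"but the slave only has {len(override_gpus)} GPUs"
        )
    return list(override_gpus[start:end])


def gpus_for_task_by_id(override_gpus, thread_number, gpus_per_task):
    """Hypothetical pitfall: treat the per-task range as physical GPU ids, then
    filter by affinity. GPUs outside that id range are silently dropped."""
    start = thread_number * gpus_per_task
    return [g for g in range(start, start + gpus_per_task) if g in override_gpus]
```

For slave_A, `gpus_for_task([0, 2], 1, 1)` gives `[2]`, while `gpus_for_task_by_id([0, 2], 1, 1)` gives `[]` because physical GPU 1 belongs to slave_B. Indexing by position keeps the two slaves on a node from ever touching each other's GPUs.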

Which plugin did you see this with? Also, can you archive the job and send it along? That will let me see your submission options. Note that any submitted scene files will be included in the same archive, so it may make sense to re-submit and re-fail the job so you don’t send along unnecessary files (or you can remove them from the zip archive if you’re comfortable with that).
