
Multiple GPUs & Multiple Slaves

Hi All

I have an 8x GPU box with 8x Slave instances running.

When I submit a job with 1 GPU per task selected across the 8 slave instances, all 8 slaves end up using the same single GPU.

If I set it to 0 GPUs per task, the job renders across all the cards.

I’m trying to get each task to utilise only 1 GPU, to test the most efficient rendering method.

Using Maya 2015 ext1 with Redshift v1.0.58 on Windows 8.1.

In Maya you can select which GPU to use from the rendering menu, but that setting applies to the whole submission, so only one GPU would be used.

I want ‘Slave-Instance1’ to use ‘GPU1’ and so on.

Is there a way to do this with Deadline 7? (I’m using beta3)

You could modify the Maya plugin and add some code that checks the slave name and selects a specific GPU accordingly. For example, this is the current code in MayaBatch.py:

            gpusPerTask = self.GetIntegerPluginInfoEntryWithDefault( "RedshiftGPUsPerTask", 0 )
            if gpusPerTask > 0:
                gpus = []
                for i in range((self.GetThreadNumber() * gpusPerTask), (self.GetThreadNumber() * gpusPerTask) + gpusPerTask):
                    gpus.append(str(i))
                
                # GPU array is in melscript format. For example: {0,1}
                return 'redshiftSelectCudaDevices({' + ",".join(gpus) + '});'
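
Note that GetThreadNumber() is (as far as I know) the task's thread index within a single slave, so when you run 8 separate slave instances, each task is thread 0 on its own slave and every slave computes the same GPU range starting at 0. That would explain all 8 slaves using the same card. A quick standalone sketch of the mapping (illustrative only, run outside Deadline):

    # Illustrative sketch of the GPU-range calculation above, runnable outside
    # Deadline. Concurrent tasks on one slave get distinct ranges, but separate
    # slave instances each start from thread 0 and so all pick the same GPUs.
    def gpus_for_thread(thread_number, gpus_per_task):
        start = thread_number * gpus_per_task
        return [str(i) for i in range(start, start + gpus_per_task)]

    for thread_number in range(4):
        print("thread %d -> {%s}" % (thread_number, ",".join(gpus_for_thread(thread_number, 2))))
    # thread 0 -> {0,1}
    # thread 1 -> {2,3}
    # thread 2 -> {4,5}
    # thread 3 -> {6,7}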

You could do something like this instead:

            gpus = ""
            thisSlave = self.deadlinePlugin.GetSlaveName().lower()
            if thisSlave = "slave-01":
                gpus = "0"
            elif thisSlave = "slave-02":
                gpus = "1"
            elif thisSlave = "slave-03":
                gpus = "2"
            elif thisSlave = "slave-04":
                gpus = "3"
            elif:
                # etc...
                
            if gpus != "":
                return 'redshiftSelectCudaDevices({' + gpus + '});'
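
If the list of slaves grows, a lookup table is a tidier version of the same thing; a sketch, with placeholder slave names:

            # Hypothetical mapping of slave names (lowercase) to GPU ids;
            # replace the keys with your actual slave instance names.
            slaveGpuMap = {
                "slave-01": "0",
                "slave-02": "1",
                "slave-03": "2",
                "slave-04": "3",
            }
            gpus = slaveGpuMap.get( self.deadlinePlugin.GetSlaveName().lower(), "" )
            if gpus != "":
                return 'redshiftSelectCudaDevices({' + gpus + '});'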

Cheers,
Ryan

I’m not sure how this code works.

I’m guessing GetSlaveName needs to match the name of the slave instance, so if my slaves were named “GPUslave” and “GPUslave-02” I would have to specify those names in the file; this therefore wouldn’t work if I had two separate GPU machines.

Also, doesn’t the gpus value allocate a number of GPUs? So in this example would slave-01 get 0 GPUs but slave-04 get 3 GPUs?

I don’t seem to get a decent division across the GPUs. They don’t max out; they look like they are splitting the job between them rather than rendering independently.

Any help with this is greatly appreciated

thanks

Ant

The example code is assigning one GPU to each slave. The redshiftSelectCudaDevices() command takes a list, and we’re just passing a single item in that list. The previous code ('redshiftSelectCudaDevices({' + ",".join(gpus) + '});') was taking the gpus array and converting it to a comma-separated list to pass to the same command.

If you had two separate GPU machines, perhaps you could name them in a way that indicates which GPU that slave should use. For example, let’s say you have these 2 machines:

  • machine-00 (4 gpus)
  • machine-01 (2 gpus)

You then run 4 slaves on the first and 2 on the second, named like this:

  • machine-00-gpu-0
  • machine-00-gpu-1
  • machine-00-gpu-2
  • machine-00-gpu-3
  • machine-01-gpu-0
  • machine-01-gpu-1

Then your code could just look for the postfix and assign the GPU based on the number on the end:

            gpus = ""
            thisSlave = self.deadlinePlugin.GetSlaveName().lower()
            if thisSlave.find( "-gpu-" ) != -1:
                gpus = thisSlave[-1:]
                
            if gpus != "":
                return 'redshiftSelectCudaDevices({' + gpus + '});'
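
One caveat: taking just the last character only handles single-digit GPU indices. A slightly more robust variant (untested) splits on the "-gpu-" marker instead:

            gpus = ""
            thisSlave = self.deadlinePlugin.GetSlaveName().lower()
            if "-gpu-" in thisSlave:
                # Take everything after the last "-gpu-" marker, so a name like
                # "machine-00-gpu-12" yields "12" rather than just "2".
                gpus = thisSlave.rsplit( "-gpu-", 1 )[1]

            if gpus != "":
                return 'redshiftSelectCudaDevices({' + gpus + '});'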

We would like to better support GPUs in the future (i.e. it would be nice for the Deadline slave to know how many GPUs it has), but hopefully this helps you out for now.

Cheers,
Ryan

I don’t think this is working as I’m running into VRAM issues.

I should be able to launch 8 slaves, each processing a job on one of the 8 GPUs.

If I enable 2 slaves and set 2 GPUs to be used, the 20-minute job goes through in seconds with:

2014-10-07 13:41:45: 0: STDOUT: Error:  [Redshift] Redshift cannot operate with less than 256MB of free VRAM. Frame rendering aborted. If you're using multiple GPUs, please ensure that SLI is disabled in the NVidia control panel (use the option 'Disable multi-GPU mode')

(This mode is disabled)

I set MayaBatch.py to look like this…

If I set it to 1 slave with 0 GPUs (aka ALL), then the scene renders out OK in a good time; if I then change to any other setting, it continues to render across all 8 GPUs.

If I look in redshiftRenderer.xml, or run
render.exe -r redshift
it shows there is a switch:
-gpu int array    CUDA devices to use

Is this what you are using to specify the GPU being used?

Not sure whether this issue comes from Redshift or Deadline; maybe I need to test with another GPU renderer.

Hmm… maybe we should just test with the MayaCmd plugin first, since that just performs a command line render, and you’ll be able to see the command line that Deadline is passing to Maya. You can use the MayaCmd plugin by disabling the Use MayaBatch option when submitting the job.

In MayaCmd.py, look for this section of code:

                # If the number of gpus per task is set, then need to calculate the gpus to use.
                gpusPerTask = self.GetIntegerPluginInfoEntryWithDefault( "RedshiftGPUsPerTask", 0 )
                if gpusPerTask > 0:
                    gpus = []
                    for i in range((self.GetThreadNumber() * gpusPerTask), (self.GetThreadNumber() * gpusPerTask) + gpusPerTask):
                        gpus.append(str(i))
                    
                    # GPU array is in melscript format. For example: {0,1}
                    rendererArguments += " -gpu {" + ",".join(gpus) + "}"

Replace it with this:

                gpus = ""
                if thisSlave = "gpubox":
                    gpus = "0"
                elif thisSlave = "gpubox-instance-02":
                    gpus = "1"
                elif thisSlave = "gpubox-instance-03":
                    gpus = "2"
                elif thisSlave = "gpubox-instance-04":
                    gpus = "3"

                if gpus != "":
                    rendererArguments += " -gpu {" + gpus + "}"
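
To verify which GPU each slave actually picks up, it may also help to log the assignment before appending the argument (LogInfo() is the standard Deadline plugin logging call, so the line should show up in the render log); something like this (untested):

                if gpus != "":
                    # Log the assignment so it is visible in the task/slave logs.
                    self.LogInfo( "Slave " + thisSlave + " assigned GPU " + gpus )
                    rendererArguments += " -gpu {" + gpus + "}"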

The code we’re setting in the MayaBatch plugin should essentially be doing the same thing, but it will be easier to debug what’s going on if we start with the MayaCmd plugin. After rendering, if you still have the same problem, post the render logs and we can check the command line arguments to see if anything is being set incorrectly.

Cheers,
Ryan

This command line renders on the 1st GPU:

render -r redshift -gpu {0} filename.ma

This renders on the 1st and last GPUs:

render -r redshift -gpu {0,7} filename.ma

I need a way to send -gpu {0} to slave instance 0 when I want one GPU per slave,

or

-gpu {0,1,2,3} to slave 1
and
-gpu {4,5,6,7} to slave 2

This should split the render across two slaves, and so on.
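
For what it’s worth, that per-slave grouping could also be derived from a slave index rather than hard-coded; a quick standalone sketch (the helper name is mine, not Deadline API):

    # Hypothetical helper: derive the -gpu argument from a slave index and a
    # per-slave GPU count, so slave 0 gets {0,1,2,3} and slave 1 gets {4,5,6,7}.
    def gpu_argument(slave_index, gpus_per_slave):
        start = slave_index * gpus_per_slave
        ids = [str(i) for i in range(start, start + gpus_per_slave)]
        return " -gpu {" + ",".join(ids) + "}"

    print(gpu_argument(0, 4))   # -gpu {0,1,2,3}
    print(gpu_argument(1, 4))   # -gpu {4,5,6,7}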

Is this possible for Deadline to do? I don’t see the command-line output in the task logs or slave logs, so I’m not sure what Deadline is doing. I know that if I bring up several command-line windows it does render out.

I’ll try out the other code and see if that works

Thanks Ryan

ant

Yup, same thing.

The code reads as:

            # ANT ADDED THE BELOW SECTION
            gpus = ""
            thisSlave = self.deadlinePlugin.GetSlaveName().lower()
            # thisSlave is lowercased above, so the names must be compared in lowercase
            if thisSlave == "box":
                gpus = "0"
            elif thisSlave == "box-gpu_2":
                gpus = "1"
            elif thisSlave == "box-gpu_3":
                gpus = "2"
            elif thisSlave == "box-gpu_4":
                gpus = "3"
            elif thisSlave == "box-gpu_5":
                gpus = "4"
            elif thisSlave == "box-gpu_6":
                gpus = "5"
            elif thisSlave == "box-gpu_7":
                gpus = "6"
            elif thisSlave == "box-gpu_8":
                gpus = "7"
            if gpus != "":
                rendererArguments += " -gpu {" + gpus + "}"

            ##### ANT HASHED THIS OUT FOR TEST
            #return 'redshiftSelectCudaDevices({' + gpus + '});'
            # If the number of gpus per task is set, then need to calculate the gpus to use.
            #gpusPerTask = self.GetIntegerPluginInfoEntryWithDefault( "RedshiftGPUsPerTask", 0 )
            #if gpusPerTask > 0:
            #    gpus = []
            #    for i in range((self.GetThreadNumber() * gpusPerTask), (self.GetThreadNumber() * gpusPerTask) + gpusPerTask):
            #        gpus.append(str(i))
            #
            #    # GPU array is in melscript format. For example: {0,1}
            #    rendererArguments += " -gpu {" + ",".join(gpus) + "}"
            ################END OF HASH OUT

It still renders across all GPUs if I have 1 or 2 GPUs selected in Deadline.

For the moment the best way seems to be submitting to 1 slave using 0 GPUs.

redshift3d.com/forums/viewthread/1713/

There is a way to find the GPUs from the Redshift preferences file on a machine; maybe this could be used to determine the GPUs on a given machine:
C:\ProgramData\Redshift\preferences.xml
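
If preferences.xml does list the detected devices, a short script could count them. A rough sketch using Python’s xml.etree; the tag and attribute names to look for are guesses, since I don’t have the file’s schema to hand:

    # Rough sketch: scan Redshift's preferences.xml for anything that looks like
    # a CUDA device entry. The "cuda"/"device" matching is a guess; inspect the
    # actual file and adjust to its real element names.
    import xml.etree.ElementTree as ET

    tree = ET.parse(r"C:\ProgramData\Redshift\preferences.xml")
    candidates = []
    for elem in tree.iter():
        text = (elem.tag + " " + " ".join(elem.attrib.values())).lower()
        if "cuda" in text or "device" in text:
            candidates.append(elem)

    print("Possible GPU entries found: %d" % len(candidates))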

Can you post the Deadline logs that show it passing the different gpu numbers to the -gpu option, preferably a log for each slave? It could very well be a bug in the code I posted (since I haven’t tested this myself), and the logs should help confirm whether this is the case.

It doesn’t give me any task report, which is a bit odd.

I can’t see the slave log because I’ve shut the machine off and sent it somewhere else, so no more testing till tomorrow. :(
