AWS Thinkbox Discussion Forums

GPU usage column

We are doing a lot of Redshift renders. It would be great to have a column in Deadline Monitor where you can see the GPU usage of a slave like the CPU usage column.

I’d love to see this on Deadline

You can use something like MSI Afterburner to remotely monitor the cards on Windows

I quite like Netdata for a visual view on Linux nodes

It should be straightforward to poll nvidia-smi on both systems, though. It would be handy to see GPU usage, VRAM usage, temperatures, etc.
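As a rough idea of what polling nvidia-smi could look like, here is a minimal sketch. It assumes NVIDIA cards and that nvidia-smi is on the PATH. The names `FIELDS`, `parse_gpu_csv` and `query_gpus` are made up for illustration; only the `--query-gpu` and `--format` options belong to nvidia-smi itself. Values are kept as strings because some cards report `[N/A]` for certain fields.

```python
import subprocess

# Fields passed to nvidia-smi's --query-gpu option.
FIELDS = ["index", "utilization.gpu", "memory.used", "memory.total", "temperature.gpu"]

def parse_gpu_csv(text):
    """Turn nvidia-smi CSV output (noheader, nounits) into one dict per GPU."""
    gpus = []
    for line in text.strip().splitlines():
        values = [v.strip() for v in line.split(",")]
        if len(values) != len(FIELDS):
            continue  # skip anything that is not a full data row
        gpus.append(dict(zip(FIELDS, values)))
    return gpus

def query_gpus():
    """Run nvidia-smi once and return the parsed rows (assumes it is on PATH)."""
    cmd = ["nvidia-smi", "--query-gpu=" + ",".join(FIELDS), "--format=csv,noheader,nounits"]
    out = subprocess.check_output(cmd, universal_newlines=True)
    return parse_gpu_csv(out)
```

Calling `query_gpus()` on a render node should give one dict per card, with utilization in percent, memory in MiB and temperature in degrees Celsius. The same command works on Windows and Linux, which is the appeal over per-OS tools.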

You should also be able to use nvidia-smi to figure out whether cards are NVLinked, and maybe shift jobs that run out of VRAM onto Workers with bigger or NVLinked cards.
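For the NVLink check, one option (an assumption on my part, not something tested on a farm) is to read the topology matrix from `nvidia-smi topo -m`, where a cell such as `NV1` or `NV2` marks a GPU pair connected over NVLink. The helper names `has_nvlink` and `worker_has_nvlink` are hypothetical.

```python
import re
import subprocess

# In `nvidia-smi topo -m` output, a cell like NV1 or NV4 marks an NVLink connection.
# The legend line uses "NV#", which this pattern deliberately does not match.
NVLINK_CELL = re.compile(r"\bNV\d+\b")

def has_nvlink(topo_text):
    """Return True if any row of the topology matrix contains an NVLink cell."""
    return any(NVLINK_CELL.search(line) for line in topo_text.splitlines())

def worker_has_nvlink():
    """Query the local machine (assumes nvidia-smi is on PATH)."""
    out = subprocess.check_output(["nvidia-smi", "topo", "-m"], universal_newlines=True)
    return has_nvlink(out)
```

The result could be written to another Extra Info field or used to sort Workers into a pool or group, so that VRAM-hungry jobs can be pointed at those machines.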

I know in the past the issue was that GPUs aren’t consistent in reporting their usage, so that column would be a lot of work to add.

I have seen folks make use of OnSlaveInfoUpdatedCallback to read GPU usage and add that data to the Extra Info fields. I don’t know what the delay would look like but it should be alright.

Is it possible to have even a rough implementation of it?

Do you have the callback or event script for this?

Just a guess as I’ve never seen anyone’s implementation, but it probably looks like:

from Deadline.Events import *
from Deadline.Scripting import *  # provides RepositoryUtils

def GetDeadlineEventListener():
    return GPUUsage()

def CleanupDeadlineEventListener( deadlinePlugin ):
    deadlinePlugin.Cleanup()

class GPUUsage (DeadlineEventListener):

    def __init__( self ):
        # Set up the event callbacks here
        self.OnSlaveInfoUpdatedCallback += self.OnSlaveInfoUpdated

    def Cleanup( self ):
        del self.OnSlaveInfoUpdatedCallback

    # The handler name has to match the one registered in __init__
    def OnSlaveInfoUpdated( self, slaveName, slaveInfo ):
        slaveSettings = RepositoryUtils.GetSlaveSettings(slaveName, True)

        # Placeholder: replace with an actual call to the GPU's API.
        # Maybe do some detection + switching based on the GPU type?
        gpuUsage = gpu.getusage()

        # Store the value as text in the first Extra Info field
        slaveSettings.SlaveExtraInfo0 = str(gpuUsage)

        RepositoryUtils.SaveSlaveSettings(slaveSettings)

Obviously missing the most important bit where the GPU usage gets fetched from the GPU but that’s hopefully not painful? Hopefully!

You’d also want to rename the Worker’s Extra Info 0 under Configure Repository->Worker Settings->Extra Properties to GPU Usage so it’s clear what’s being presented.
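For the missing fetch step, here is one possible sketch for NVIDIA cards. It assumes Python 3 (for `shutil.which`), that the event runs on the Worker itself, and that nvidia-smi is installed there. `format_usage` and `get_gpu_usage` are hypothetical helper names, not part of the Deadline API. It falls back to "n/a" rather than raising, so a machine without a supported GPU does not break the event.

```python
import shutil
import subprocess

def format_usage(percentages):
    """Join per-GPU utilization values into one short string for a Monitor column."""
    if not percentages:
        return "n/a"
    return " | ".join("GPU{}: {}%".format(i, p) for i, p in enumerate(percentages))

def get_gpu_usage():
    """Return a display string, or "n/a" when nvidia-smi is missing or fails."""
    if shutil.which("nvidia-smi") is None:
        return "n/a"
    cmd = ["nvidia-smi", "--query-gpu=utilization.gpu", "--format=csv,noheader,nounits"]
    try:
        out = subprocess.check_output(cmd, universal_newlines=True, timeout=5)
    except (subprocess.SubprocessError, OSError):
        return "n/a"
    return format_usage([v.strip() for v in out.splitlines() if v.strip()])
```

In the listener, the placeholder line would then become `gpuUsage = get_gpu_usage()`. The timeout is there because the callback should not hang the Worker if the driver stops responding.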

Let
