AWS Thinkbox Discussion Forums

Possible to change environment variable depending on GPU used?

Hey,

I have three slaves running on one machine. The trouble is that each slave renders using the same user preferences folder and crashes jobs that require different user preferences (or a different workgroup).

If Softimage encounters one of these crashes it will then try to delete the user preferences or workgroup and then start up again - this would fix the problem if I only had one slave per machine. But because I have three slaves running on the machine, these particular files are “in use”, and the software isn’t able to recreate them, so I just get constant crashes or jobs running with the incorrect workgroup and rendering with different Redshift versions.

My solution is for each slave to set its own environment variable specifying a different location for the preferences folder, so none of the slaves clobber each other.

Eg
Slave running gpu 0 tasks use %USERDIR% = c:/users/folderA
Slave running gpu 1 tasks use %USERDIR% = c:/users/folderB
Slave running gpu 2 tasks use %USERDIR% = c:/users/folderC

I tried running each slave under a different user (which would also solve my problem) but that wasn’t possible…

Any ideas?

Thanks

We hit this already.

There are 2 ways to accomplish this. As a job preload script:
docs.thinkboxsoftware.com/produc … preload-py

or as a job start event:
docs.thinkboxsoftware.com/produc … nt-plug-in

the JobPreLoad.py is what we use.
that way you can query information about the current job and the slave the job is running on and configure the env before the plugin loads.

Can provide some examples if needed.

Cheers
kym

Hi Kwatts,

Thank you for the reply. The JobPreLoad one seems quite straightforward. In the example that you said you were willing to share, does it query which GPU or which Slave is running the particular task? I’d love to see that part in action.

I feel like you could do this:

job = deadlinePlugin.GetJob()
SlaveName = job.SlaveName

Cheers

This definitely sounds like we should roll it into the XSI plugin so everyone’d benefit.

What are the variables we should be adding?

Here’s an example from Maya that makes use of the GPU affinity for the Slaves:

        if self.OverrideGpuAffinity():
            overrideGPUs = self.GpuAffinity()
            if gpusPerTask == 0 and gpusSelectDevices != "":
                gpus = gpusSelectDevices.split( "," )
                notFoundGPUs = []
                for gpu in gpus:
                    if int( gpu ) in overrideGPUs:
                        resultGPUs.append( gpu )
                    else:
                        notFoundGPUs.append( gpu )

So, you can set your variables based on the DeadlinePlugin.GpuAffinity() values which I think is just a list of ints starting from 0 up to 16.

What are these magic variables so I can pass it over to the dev team as a feature request? Using a temporary directory for the prefs folder sounds like a good idea all around, GPU affinity or otherwise. Is it actually “USERDIR”?

FYI. The SoftimageBatch plugin already supports GPU affinity, so if you want each Deadline Slave to use a specific GPU slot(s) only, then configure each Slave: docs.thinkboxsoftware.com/produc … u-affinity

BTW - We already switch the workgroup dynamically for each Slave when it picks up an XSI job, of course, per Slave.

Hmm, ok, yeah I could see it trying to change workgroup, but then it would still use an incorrect one - probably a Softimage issue nothing to do with Deadline.

Anyway, I would still like to change the %USERPROFILE% upon launch of a SoftimageBatch job, as I also have slaves running on people’s workstations and when they come in the next day all their workgroups are messed up. I don’t want to run the slave on another account, as I still want the user to retain control over it. So if I could set the %USERPROFILE%, that would make me happy :slight_smile:

To capture an env var at job submission time and then inject it into the XSI job, so that it is automatically applied to the render process, could be achieved with this event plugin: github.com/ThinkboxSoftware/Dea … onmentCopy

Would that work for you?

Hey,

Yeah, that should be fine.

I’ve been testing it this morning, using this:

[code]class CustomEnvironmentCopyListener (DeadlineEventListener):

    def __init__(self):
        self.OnJobSubmittedCallback += self.OnJobSubmitted

    def Cleanup(self):
        del self.OnJobSubmittedCallback

    def OnJobSubmitted(self, job):
        if job.JobPlugin == "SoftimageBatch" or job.JobPlugin == "Softimage":
            # Set USERPROFILE to C:\Users\render - regardless of which user the slave has been started under
            key = "USERPROFILE"
            variable = "C:\Users\\render"

            # Set chosen variable to job
            self.LogInfo("Setting %s to %s" % (key, variable))
            job.SetJobEnvironmentKeyValue(key, variable)

            RepositoryUtils.SaveJob(job)

            self.LogInfo("On Job Submitted Event Plugin: Custom Environment Copy Finished")[/code]

And it correctly sets the variable to: “C:\Users\render”, but it also still sets the job to have the entire environment of my computer, which I didn’t expect or want…

Your event plugin looks good (assuming you removed the rest of the code in the previously provided GitHub example).

I would update this line to handle backslashes like this:

variable = "C:\Users\\render"

to:

variable = "C:/Users/render"
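
A quick illustration of why the forward-slash form is the safer one. This is plain Python, nothing Deadline-specific. In Python 3 an ordinary string containing \U is rejected outright as a malformed Unicode escape, while Python 2 byte strings let it through, so whether the original line even parses depends on the interpreter. The variable names below are only for illustration.

```python
# Three equivalent ways to write the same Windows path safely.
escaped = "C:\\Users\\render"      # every backslash doubled
raw = r"C:\Users\render"           # raw string: backslashes taken literally
forward = "C:/Users/render"        # forward slashes need no escaping

assert escaped == raw
assert forward.replace("/", "\\") == raw

# A single backslash is only safe by luck: "\r" is a carriage return,
# so an unescaped "\render" silently loses its backslash and its "r".
broken = "C:\\Users" + "\render"
assert "\r" in broken and "\\render" not in broken
```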

Do you have any other event plugins firing which are causing an issue here? You can control the “order” of event plugin execution via the “up”/“down” arrows in “Configure Events…” dialog under “Tools” in Monitor.

Do you know what other env vars are required to get the desired setup in your studio pipeline?

Ok, I’ll try that. Cheers.

Essentially, there is a file that Soft uses called: setenv.bat

Which does this (in one part of it):

set XSI_USERROOT=%USERPROFILE%
goto DoneSetUserRoot
...
:DoneSetUserRoot

Now when users open Softimage on their comp I want it to use their user dir, but when the Deadline Slave runs, I want it to use the render one (by modifying %USERPROFILE%)

…this is my second attempt at trying to get this to work, sorry haha

Could you show us a log of what is being set to be the wrong env var here? How are you confirming this is currently incorrect? We could put some print statements in the XSI plugin so we can see what env vars are actually present at certain points of the plugin process.

I am assuming you are not using the “Render As User” feature in Deadline to render the jobs as the user that submitted them? (As that would explain this situation). I also assume all your rendernodes are running as the “render” account with the exception of the artist workstations which are logged in as individual users. I have also assumed all machines are running in GUI and not running Deadline in service mode (no gui).

I’m also assuming you are NOT using this XSI feature either: “%XSI_BINDIR%\SiteDeploy.bat” to override anything else here.

I highly doubt this is the case, but a quick review of “setenv.bat” also shows that “%USERNAME%” needs to be set to your user account as well if your OS is pre-Windows Vista (Win XP x64?).

A review of my local Win10 machine shows the following with the “set” command (I swapped my username for ‘render’):

APPDATA=C:/Users/render/AppData/Roaming
HOMEPATH=/Users/render
LOCALAPPDATA=C:/Users/render/AppData/Local
TEMP=C:/Users/render/AppData/Local/Temp
TMP=C:/Users/render/AppData/Local/Temp
USERNAME=render
USERPROFILE=C:/Users/render

We may need to set more env vars here.

I guess first thing, is to ascertain what is going wrong here. See attached for a temp, updated SoftimageBatch py plugin which will print out all env vars at different stages of our plugin executing.

SoftimageBatch.py.zip (6.34 KB)

Hey,

My bad! I just realised what was going on - the job I was using to test had these variables preset, so when I resubmitted the job it copied them over - when relaunching a fresh job, the %USERPROFILE% variable is indeed the only one being set. Sorry!

All your assumptions are correct btw and our machines are all windows 7 or 10.

It’s not really wrong behavior, I would expect a slave running as a particular user to use that users directory - I just wanted to override that and tell it to use another one.

I’m doing some more tests now with your modified plugin.

Damn. Ok, with that variable set I get this error and the tasks fail:

Could not find a part of the path 'C:\Users\render\AppData\Local\Thinkbox\Deadline8\slave\<*computername*>\plugins\59300c5bf2ba963d4c4fcb29\SoftimageBatch.param'.

So it’s partly working (as I can see it tried to use render) but like you said I suspect more variables will need to be changed.

Maybe setting all of these ones to render will do it:

USERPROFILE=C:/Users/render
HOMEPATH=\Users\render
USERNAME=render
LOCALAPPDATA=C:\Users\render\AppData\Local
APPDATA=C:\Users\render\AppData\Roaming

Unfortunately, I get the same error even after setting all of these:

APPDATA=C:/Users/render/AppData/Roaming
HOMEPATH=/Users/render
LOCALAPPDATA=C:/Users/render/AppData/Local
TEMP=C:/Users/render/AppData/Local/Temp
TMP=C:/Users/render/AppData/Local/Temp
USERNAME=render
USERPROFILE=C:/Users/render

Could not find a part of the path 'C:\Users\render\AppData\Local\Thinkbox\Deadline8\slave\<computername>\plugins\59301b9df2ba963d4c4fcb31\SoftimageBatch.param'.

Also, sorry I know I’ve gone off topic from my initial request

I think you might be causing Deadline itself some trouble here.

The Slave is running as the user account here, right? I think the changes you’re making are causing the sandbox or something else to look for the render plugin data in the “render” user’s app data now. The main thing is that SoftimageBatch.param is a Deadline file, so I think there’s some trouble here.

Another thought: Why not create a batch file that sets those things and directly calls XSIBatch.bat, then you can configure Deadline to run that batch file instead of XSIBatch.bat directly?
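
To show the idea behind the wrapper in Python rather than batch: the overrides go only into the child process's environment, so the Slave itself keeps its real USERPROFILE and can still find its own files. This is a sketch under assumptions. build_child_env and run_with_overrides are made-up names, and how you point Deadline at a wrapper instead of XSIBatch.bat is not shown here.

```python
import os
import subprocess
import sys


def build_child_env(overrides, base_env=None):
    """Return a copy of the environment with the overrides applied.

    The base environment itself is never modified.
    """
    env = dict(os.environ if base_env is None else base_env)
    env.update(overrides)
    return env


def run_with_overrides(command, overrides):
    """Run command so that only the child process sees the overrides."""
    return subprocess.call(command, env=build_child_env(overrides))


# Hypothetical usage (the XSIBatch.bat location is a placeholder):
#   overrides = {"USERPROFILE": "C:/Users/render"}
#   sys.exit(run_with_overrides([path_to_xsibatch_bat] + sys.argv[1:], overrides))
```

Because setenv.bat derives XSI_USERROOT from %USERPROFILE%, changing USERPROFILE for the child alone should be enough for Softimage to pick the render folder, without Deadline ever seeing the change.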

[Written before Edwin’s reply above got posted]
[See my EDIT post after reading this one]

OK, so I guess a choice needs to be made here now. Deadline copies various files locally including the current Deadline job’s “jobsData” & “plugin” directories, which of course includes the current job’s plugin files such as the *.param file. These directories and files are all copied down to local user space via %LOCALAPPDATA%:

%LOCALAPPDATA%\Thinkbox\Deadline[VERSION]\slave\[SLAVENAME]

This explains why you’re getting error messages for the missing “SoftimageBatch.param” file, etc.

Now you could configure the local client “deadline.ini” with the KEY:

SlaveDataRoot=C:\LocalSlaveData

as explained in the docs:
docs.thinkboxsoftware.com/produc … vedataroot
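
If you want to script that change across machines, a rough Python sketch follows. Assumptions to verify first: that your deadline.ini keeps its keys under a [Deadline] section, and that the file has no comments you care about, since configparser drops them when rewriting. set_slave_data_root is a made-up helper name. Back up the file before trying this.

```python
import configparser


def set_slave_data_root(ini_path, data_root, section="Deadline"):
    """Write SlaveDataRoot into an ini file, keeping the other keys."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str          # keep key case, e.g. SlaveDataRoot
    parser.read(ini_path)
    if not parser.has_section(section):
        parser.add_section(section)
    parser.set(section, "SlaveDataRoot", data_root)
    with open(ini_path, "w") as handle:
        # Write KEY=VALUE without spaces around the equals sign.
        parser.write(handle, space_around_delimiters=False)
```

For example, set_slave_data_root(path_to_ini, "C:\\LocalSlaveData") would add or update the key shown above.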

you could also automate this via Auto-Configure as this same setting is exposed here:
docs.thinkboxsoftware.com/produc … -ref-label
docs.thinkboxsoftware.com/produc … -ref-label

However, this is all a bit hacky for your purposes, because really, you should be using the “Render As User” system in Deadline:
docs.thinkboxsoftware.com/produc … r-security

So you would go into Repo Options and under Security, enable the setting: “Render Jobs As User” and then under your local settings via Monitor:
docs.thinkboxsoftware.com/produc … tings.html

you would need to enter the user account as: “render” and then the domain and password for this account:
docs.thinkboxsoftware.com/produc … r-settings

then ALL jobs submitted by your user account will be rendered as the user account: “render”. Obviously, this will fix your edge-case issue with XSI but will also cause any other Deadline jobs (ie: not just XSI) that your user account submits to also process as the user account: “render”, with the obvious consequence that this user account will be the ‘creator’ of any output files back onto your network filer.

Unfortunately, it will be a choice of continuing to see if we can hack and make your XSI edge-case rig work by side-stepping what “user” account is really running the process in the eyes of Windows OS or going down the official route of doing the Windows equivalent of “su/sudo” via the Deadline “Render As User” feature.

This is one of those times where Linux/Mac OSX work so much better in this area.

EDIT: Edwin’s latest reply is actually easier than my long-winded explanation, although both would work. In Edwin’s reply, he is suggesting you replace the existing XSIBatch script file and get Deadline to call that, and inside that bat file, you hard-code the user account you wish to use, then you won’t break Deadline.

Alternatively, you will have to go down the “render as user” route.

Thank you everyone for helping me out and it was cool to see what DL can do Mike :slight_smile:

I ended up going with Edwin’s idea (more or less).

Now that’s solved. I’m going to go back to the original reason I started this topic.

Three slaves on one machine clobbering over each other - at least I thought that’s what it was…

So after learning a bit more about it I can see that upon launching SoftimageBatch each Slave does indeed try to launch a workgroup; it just slightly messes it up for some reason. After doing some testing, it appears to mess it up regardless of whether I have one slave running per machine or three. Note: This doesn’t happen all the time, but when it does, the only way to fix it is to delete the C:\Users\render\Autodesk\Softimage_2015_R2-SP2 folder.

This is what we see happen: (I’ve marked the important lines with **–>)

2017-05-27 08:25:15: 0: INFO: Start Job
**--> 2017-05-27 08:25:15: 0: INFO: Switching workgroup before rendering to \\<SERVER>\jobs\<JOB>\<WORKGROUP>
2017-05-27 08:25:15: 0: INFO: Rendering with Softimage version 13
2017-05-27 08:25:15: 0: INFO: Enforcing 64 bit build of Softimage
2017-05-27 08:25:17: 0: STDOUT: =======================================================
2017-05-27 08:25:17: 0: STDOUT: Autodesk Softimage 13.2.162.0
2017-05-27 08:25:17: 0: STDOUT: =======================================================
2017-05-27 08:25:19: 0: STDOUT: License information: using [Processing]
2017-05-27 08:25:20: 0: STDOUT: ' INFO : tb_PolyExtract has been loaded.
2017-05-27 08:25:20: 0: STDOUT: ' INFO : tb_CopyMaterial_v1 has been loaded.
2017-05-27 08:25:20: 0: STDOUT: ' INFO : tb_PasteMaterial_v1 has been loaded.
2017-05-27 08:25:20: 0: STDOUT: ' INFO : tb_SymmetrizeLeftSide has been loaded.
2017-05-27 08:25:20: 0: STDOUT: ' INFO : CopyToClipboard has been loaded.
2017-05-27 08:25:20: 0: STDOUT: ' INFO : PasteFromClipboard has been loaded.
2017-05-27 08:25:20: 0: STDOUT: ' INFO : mch_SelectionTools has been loaded.
2017-05-27 08:25:21: 0: STDOUT: ' INFO : ExocortexAlembicSoftimage1.1 - Exocortex Core Services for ExocortexAlembicSoftimage1.1 (v1.1) Initialized.
2017-05-27 08:25:21: 0: STDOUT: ' INFO : ExocortexAlembicSoftimage1.1 - Alembic: ------------------------------------------------------------------------------------------
2017-05-27 08:25:21: 0: STDOUT: ' INFO : ExocortexAlembicSoftimage1.1 - Alembic: Build date: Dec 11 2014 11:38:24
2017-05-27 08:25:21: 0: STDOUT: ' INFO : ExocortexAlembicSoftimage1.1 - Alembic: OS version: 6.1.7601 (Service Pack 1) 0x100-0x1
2017-05-27 08:25:21: 0: STDOUT: ' INFO : ExocortexAlembicSoftimage1.1 - Alembic: Executable path: C:\Program Files\Autodesk\Softimage 2015 R2-SP2\Application\bin\XSIBATCH.exe
**--> 2017-05-27 08:25:21: 0: STDOUT: ' INFO : ExocortexAlembicSDIFFERENT_1 - AlDIFFERENT_ path: \\<DIFFERENT_SERVER>\jobs\<DIFFERENT_JOB>\<DIFFERENT_WORKGROUP>\Application\Plugins\bin\nt-x86-64\Softimage2015ExocortexAlembic1.1.dll
2017-05-27 08:25:21: 0: STDOUT: ' INFO : ExocortexAlembicSoftimage1.1 - Alembic: ------------------------------------------------------------------------------------------
2017-05-27 08:25:21: 0: STDOUT: ' INFO : ExocortexAlembicSoftimage1.1 - Alembic: PLUGIN loaded
2017-05-27 08:25:23: 0: STDOUT: ' INFO : [Redshift] Redshift for Softimage 2014 SP2
2017-05-27 08:25:23: 0: STDOUT: ' INFO : [Redshift] Version 2.0.86, Feb 24 2017
**--> 2017-05-27 08:25:24: 0: STDOUT: ' INFO : \\<DIFFERENT_SERVER>\jobs\<DIFFERENT_JOB>\<DIFFERENT_WORKGROUP>\Addons\emTopolizer2\Application\Plugins\

As you can see it should be using: \\<SERVER>\jobs\<JOB>\<WORKGROUP>

But instead, out of nowhere, it starts referencing: \\<DIFFERENT_SERVER>\jobs\<DIFFERENT_JOB>\<DIFFERENT_WORKGROUP>

(I’ve obviously hidden the names of our servers!)

As a result it loads a different version of Redshift and we are left with render images which look quite different. In this case it should actually be using Redshift 2.0.93 not 2.0.86.

Is this a Softimage problem?

Hmmm…I’m not entirely sure what might be causing this. The “workgroup” does indeed look to be getting set correctly via the “Switching workgroup before rendering to” line in your log.

I’d be interested to see the uncensored full XSI render log and also the zipped contents of the “C:\Users\render\Autodesk\Softimage_2015_R2-SP2” folder, sent via our private support ticket system, to see if anything stands out here: support.thinkboxsoftware.com/

It would be good to see a selection of logs which are both good and bad, so we can try and narrow down when it goes wrong or for what reason.
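
To help sort logs into good and bad, something along these lines could flag the lines worth looking at. It is only a sketch: it assumes the log contains the “Switching workgroup before rendering to” line seen above, it treats anything that looks like a UNC path as a candidate, and paths containing spaces are only matched up to the first space. find_foreign_paths is a made-up name.

```python
import re

# The workgroup Deadline asked for, taken from the plugin's own log line.
SWITCH_RE = re.compile(r"Switching workgroup before rendering to (\S+)")
# Anything that looks like \\server\share\... (stops at whitespace).
UNC_RE = re.compile(r"\\\\[^\s\\]+\\[^\s]*")


def find_foreign_paths(log_lines):
    """Return (expected_workgroup, lines that reference other UNC paths)."""
    expected = None
    foreign = []
    for line in log_lines:
        match = SWITCH_RE.search(line)
        if match:
            expected = match.group(1)
            continue
        if expected is None:
            continue
        for path in UNC_RE.findall(line):
            # Case-insensitive prefix check, since Windows paths are.
            if not path.lower().startswith(expected.lower()):
                foreign.append(line.rstrip())
                break
    return expected, foreign
```

A log with an empty second result would count as “good”; any flagged lines show where a different workgroup crept in.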

We can also look at extracting the exact render command that Deadline executes and try running that in a shell. If it still goes wrong, then we know the issue is outside of Deadline. First step would be to disable “Use Batch Plugin” in XSI, so we can capture the simple CLI string that gets executed.

Hey,

Ok, cool. I’ve made this ticket: #616957

If you need any further info, let me know.

Cheers
