Maintenance Job with saved output frame from each slave?

Hi all,

I’ve got a farm with 150 nodes, and the IT team rolled out some render plugins which seem not to have deployed correctly to all slaves. I need a simple way to find out which machines are working properly and which aren’t, so I can set up my groups accordingly.

I was thinking of using a maintenance job to run a test frame through each machine, with a scene that uses a certain plugin. The problem is that maintenance jobs render the same frame once on every machine and then overwrite the output frame (if it was set to be saved). The plugin doesn’t generate an error in the Monitor when it is missing; it just omits some elements of the scene during the render, so I can’t simply watch for errors either.

The only other way I can think of is to render an animation which uses the plugin and then look through the saved frames to see which slaves have not rendered correctly and need attention.

Is there a way to configure this like a maintenance job so that each slave in the farm only renders one frame of this animation and then saves the output as a sequence? Or is there a way to send it as a maintenance job but have each slave save its output frame with a unique filename, i.e. with the machine name appended?

Thanks for any help, or pointers to alternative ways this could be solved.

What if you set the output path of the Job to a local folder on C:\ ? Each Slave would then save out the same frame on its local disk. You would then need to collect and rename this frame from each Slave; I’d use a one-off Python script for that.
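
Off the top of my head, the collection step could be a rough sketch like this, assuming each Slave exposes the standard C$ admin share and the Job writes to a known local path (all machine names and paths below are placeholders):

[code]import os
import shutil

# Placeholder values: substitute your real machine names and paths
slaves = [ "RENDER01", "RENDER02", "RENDER03" ]
localFrame = r"c$\DeadlineTest\test_frame.png"      # local path the Job renders to
collectDir = r"\\fileserver\renders\plugin_test"    # central folder to gather into

for slave in slaves:
    src = os.path.join( "\\\\" + slave, localFrame )
    dst = os.path.join( collectDir, "%s_test_frame.png" % slave )
    try:
        # Rename on copy, so no two Slaves overwrite each other's frame
        shutil.copyfile( src, dst )
        print( "Collected frame from %s" % slave )
    except ( IOError, OSError ) as e:
        print( "Could not reach %s: %s" % ( slave, e ) )[/code]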

That sounds quite complicated, and I would probably hit all sorts of permissions problems while trying to collect the files. I’d also still have the problem that half the machines would be switched off by the time I got round to collecting the files, because the users are out of the office or on holiday or whatnot.

Surely there must be some other way. It would be great if there were some sort of wildcard that could be added to the output file name and then be replaced with the current machine name, or some other unique identifier…

Hi,

If we rewind for a second, can I get some more specifics from you here? Which plugins? Which application(s)? Which OS platform? I don’t think you actually need to do any “rendering” here…

Let’s say it’s 3dsMax or Maya. You could submit a MAXScript or MEL job with the respective Deadline plugin and, in the script, carry out checks for whatever plugin(s) need checking, then report back to a system of your choosing. Or, if it’s simpler than that, a “Python” job could be sent to Deadline which just executes a Python script to check for the existence of, say, certain plugin files?

Any of the above jobs could easily be re-submitted as a Maintenance Job: simply right-click the job in Monitor, choose “re-submit job” and tick “Maintenance” in the dialog.

We would need more specific info on what you are trying to achieve here to advise any further. However, there are quite a few different ways to skin that cat. :wink:

Hi Mike,

we’re using Windows 7 and 3ds Max 2014 with V-Ray, and the plugins (for 3ds Max) in question are Forest Pack and RailClone by iToo Software.

If the Slave picking up a job hasn’t got the plugin the job requires, no errors are generated in the Monitor. It renders everything in the scene except the objects which were created using the 3ds Max plugin.

Hence my logic: “if I can’t see the errors in Monitor, I could at least render the same frame on each machine like in a maintenance job and manually look through the 150 images to see which have a problem”. If only I could save them out easily.

I appreciate this might not be the best or most straightforward solution, but that’s why I’m asking here. If it can be done with Python, I would need a few pointers as to how that needs to be set up.

Thanks!

Right, I would submit a MAXScript job to Deadline, running with the 3dsMax plugin in Deadline, and try to write a MAXScript to test for the existence of either of these plugins. Do we just want to test for their existence or for a specific version, or do we need to do more here to verify they are OK?

Do you know MAXScript? Do you know what level of MAXScript access these 2 plugins have? I’d have to do a bit of research here, as I’m not too hot on either of these plugins.

We don’t currently have a “Submit as Maintenance Job” checkbox in the SMTD -> MaxScripts section, so I might do something about that in a bit. For the time being, you can submit the MAXScript job once (as suspended) and then simply right-click and re-submit it as a Maintenance Job.

Hi Mike, thanks for your time looking into this for me!

Just testing for their existence would probably be good enough in this instance; I can see version checks becoming useful if we ever upgrade the plugins, though… However, I personally don’t know MAXScript well enough to know how this would be set up, or what level of MAXScript access the plugins have. If you could give me some fairly easy-to-follow instructions on how this would work and how it’s done, I would be really grateful! :slight_smile:

OK, if you just want to test for the existence of a file, we can submit a simple “Python” Maintenance Job to Deadline using the code below:

[code]import os.path
import platform

# Raw string, so the backslashes in the Windows path are not treated as escape sequences
pluginFile = r"C:\Program Files\Autodesk\3ds Max 2016\plugins\Ephere.Gui.dll"
localMachine = platform.node()   # this Slave's machine name

if os.path.exists( pluginFile ):
    print( "pluginFile: %s OK on machine: %s" % ( pluginFile, localMachine ) )
else:
    # Raising an exception fails the task, so missing machines show up as Error Reports
    raise Exception( "pluginFile: %s missing on machine: %s" % ( pluginFile, localMachine ) )[/code]

Save the above code to a file called “CheckPluginExists.py” and, in Deadline Monitor, submit a “Python” job to Deadline. Ensure you use the drop-down list to select the job type as “Maintenance”. Submit the job, and each task of the job will only ever execute on one specific machine in your farm. Each task will show up in your “Job Log Reports” panel as either a “Log Report” or an “Error Report”. For the “Error Reports”, sort by the column called “Slave Name” to get a list of your machines missing the above “Ephere.Gui.dll” plugin in 3dsMax 2016. You can extend the example to check for other files; see the sketch below.
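
If you want to check several plugin files in one pass, a minimal extension of the above could look like this (the file list is just an example, substitute the plugin paths you actually care about):

[code]import os.path
import platform

# Example list only: put the real plugin files for your farm here
pluginFiles = [
    r"C:\Program Files\Autodesk\3ds Max Design 2014\plugins\ForestPackPro.dlo",
    r"C:\Program Files\Autodesk\3ds Max Design 2014\plugins\RailClonePro.dlo",
]
localMachine = platform.node()

missing = [ f for f in pluginFiles if not os.path.exists( f ) ]
for f in pluginFiles:
    state = "missing" if f in missing else "OK"
    print( "pluginFile: %s %s on machine: %s" % ( f, state, localMachine ) )

if missing:
    # Raise once at the end, so every file is still reported above first
    raise Exception( "%d plugin file(s) missing on machine: %s" % ( len( missing ), localMachine ) )[/code]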

Hey, thanks for this. I’ve had to dig into the settings a bit and configure the Python plugin, as we had never used it before.

Am I right in doing this via Monitor > Plugin Settings and pointing Python (which version should I use??) to “C:\Program Files\Thinkbox\Deadline7\bin\dpython.exe”?

So I have modified the script to:

[code]import os.path
import platform

# Raw string again, so "\3ds Max..." in the path is not mangled by escape sequences
pluginFile = r"C:\Program Files\Autodesk\3ds Max Design 2014\plugins\ForestPackPro.dlo"
localMachine = platform.node()

if os.path.exists( pluginFile ):
    print( "pluginFile: %s OK on machine: %s" % ( pluginFile, localMachine ) )
else:
    raise Exception( "pluginFile: %s missing on machine: %s" % ( pluginFile, localMachine ) )[/code]

It seems to run on some slaves (the task completes), but the overall job quickly fails before it has had a chance to run on every slave, due to the number of errors being generated. How do I overcome this? Can I mark a slave as bad after one failure, and how do I stop the job from re-attempting bad slaves for this job only?

Typically, studios already have one or more versions of Python installed somewhere in their pipeline, so we leave it to the discretion of the studio to configure the Python exe paths as they see fit. You can also use our shipping dpython.exe, which technically speaking is v2.7, so what you have done is valid.
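
If you ever want to double-check which interpreter version a configured exe actually is, you can just ask it from a command prompt (using the path from your post):

[code]"C:\Program Files\Thinkbox\Deadline7\bin\dpython.exe" -c "import sys; print( sys.version )"[/code]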

What’s the error message? The script is designed to FAIL by “raising an exception”, so you should get the error reports. This is all a bit of a hack really, but depending on the “Failure Detection” job settings in your repository options, you could limit any slave to generating, say, a maximum of “3” errors per task. The script would then execute super fast: either complete and move on, or fail 3 times and then move on. However, as this is a Maintenance Job, the task won’t move on to another machine; it will just fail 3 times on that specific slave and then that’s it for this particular machine, as no other task is allowed to execute there.

docs.thinkboxsoftware.com/produc … -detection

Check out the settings called “Mark a Slave as bad after it has generated this many errors for a job in a row” and “Mark a task as failed after it has generated this many errors”, and maybe set them both to “3”.

Truth be told, a true IT-based “Software Configuration Management” solution, such as SaltStack, Puppet or SCCM, is best placed for this kind of system checking / pipeline configuration. We do provide event API hooks to play nicely with these systems, as explained here:
docs.thinkboxsoftware.com/produc … ntegration

as well as software deployment here:
docs.thinkboxsoftware.com/produc … yment.html

but in both cases, it’s best to use the tool that’s fit for purpose here.

Just to update you, it seems to be doing the job now. Many thanks.
Slaves that complete the task have the plugins, and the ones that don’t get put on the job’s blacklist, so I can use that to adjust my groups accordingly.

Yes, I agree that something like SCCM would be the better solution, but I don’t have sufficient knowledge of it, or even access to it, to do this myself that way. That’s one for the IT folks to sort out :slight_smile: