AWS Thinkbox Discussion Forums

JobPreload.py not cleaned up after job?

Version 6.1.0.52433
Windows 7

            The new beta is working much better, and it does seem to be refreshing the JobPreLoad files more reliably. However, I did run into the error again where a Draft job picked up a JobPreLoad file from a MayaBatch job that had previously run on that machine (see attached picture). I confirmed with the DataOps team that the slave was running the latest beta version. Is there a reason why these files are not named uniquely per plugin, or why the temp files are not copied to a uniquely named folder within the slave plugin path, e.g. C:\Users\temrender\AppData\Local\Thinkbox\Deadline6\slave\RENDER227\plugins\<jobID>? It seems like that would make cleanup easier, and it would also prevent collisions when more than one process is running on a slave at any given time.
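
Something along these lines is what I have in mind (a rough sketch only, with made-up helper names rather than actual Deadline code):

import os
import shutil

def sync_plugin_files(slave_plugin_root, repo_plugin_dir, job_id):
    # Copy the plugin files (including JobPreLoad.py) into a folder named after
    # the job ID, e.g. ...\slave\RENDER227\plugins\<jobID>, instead of a shared folder.
    job_plugin_dir = os.path.join(slave_plugin_root, job_id)
    if os.path.isdir(job_plugin_dir):
        # A stale copy left over from a requeue of the same job; remove it first.
        shutil.rmtree(job_plugin_dir, ignore_errors=True)
    shutil.copytree(repo_plugin_dir, job_plugin_dir)
    return job_plugin_dir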

When I manually deleted the JobPreLoad file from the slave and requeued, the Draft job completed. The Draft plugin does not have a JobPreLoad file – is it maybe putting them both in the same folder on the slave because MayaBatch launched Draft? Even so, Draft was complaining about a syntax error in a file (JobPreLoad) that ran just fine under the MayaBatch plugin…

=======================================================
Error

Error in StartJob: Job preload script "C:\Users\temrender\AppData\Local\Thinkbox\Deadline6\slave\RENDER227\plugins\JobPreLoad.py": Python Error: IOError : (Python.Runtime.PythonException)
Stack Trace:
[' File "none", line 60, in main\n', ' File "Y:/pipeline/repo/fac/python/common\rgh\environ\configRghEnviron.py", line 156, in init\n self.createMayaEnvDict()\n', ' File "Y:/pipeline/repo/fac/python/common\rgh\environ\configRghEnviron.py", line 183, in createMayaEnvDict\n f = open(r\'%s\maya%s\modules\mtoa.mod\' % (self.extensionsPath,self.mayaVersion), "r")\n']
(System.Exception)
at FranticX.Scripting.PythonNetScriptEngine.a(Exception A_0)
at FranticX.Scripting.PythonNetScriptEngine.CallFunction(String functionName, PyObject[] args)
at Deadline.Scripting.DeadlineScriptManager.CallFunction(String scopeName, String functionName, PyObject[] args)
at Deadline.Plugins.ScriptPlugin.d(String A_0)
at Deadline.Plugins.ScriptPlugin.d(String A_0)
at Deadline.Plugins.ScriptPlugin.StartJob(Job job, String& outMessage, AbortLevel& abortLevel)
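
For what it's worth, the line it fails on is a plain open() of our mtoa.mod file in our site pipeline script. A guarded version (illustrative only, not the actual configRghEnviron.py code) would at least give a clearer message when the preload runs under a plugin that has no Maya environment set up:

import os

def read_mtoa_mod(extensions_path, maya_version):
    # Build the expected path to the MtoA module file, e.g. <extensions>\maya2014\modules\mtoa.mod
    mod_path = os.path.join(extensions_path, "maya%s" % maya_version, "modules", "mtoa.mod")
    if not os.path.isfile(mod_path):
        # Fail with an explicit message instead of a bare IOError from open().
        raise IOError("mtoa.mod not found at %s; is this JobPreLoad running under the right plugin?" % mod_path)
    with open(mod_path, "r") as f:
        return f.read()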

Thanks for any help!

Hey Seth,

Thanks for reporting this. After seeing this behavior, we agree it would definitely be safer to put them in a unique folder as you’ve suggested. We do have cleanup code in place already, but it seems to be failing here for some reason. We may want to consider this for the local job folder as well. We’ll put it on the todo list and look into it.

Thanks!

  • Ryan

Is there anything we can do to help determine why the existing cleanup code isn’t doing what it’s supposed to do?

Edit: Does the cleanup happen at the end of a job or at the start of a job? If it runs at the end of a job, a slave crash or some other abnormal termination might interfere with it; if it runs at the start (before loading plugins etc. into that temp directory), it might be more effective.

It is done at the start of a new job. My guess is that a rogue process (like a crashed slave) is holding a lock on one or more files, so the cleanup fails.

In our internal working version, we’ve already changed the local job and plugin file syncing to use a subfolder with the job’s ID as the name, which should prevent this from happening in the future. In addition, we still clean up the root jobsData and plugin folders before starting a new job, so even if some files are left behind for a job for some reason, they will still eventually get cleaned up.
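
Conceptually the change works something like this; this is a simplified sketch of the approach rather than the actual slave code, and the folder names are only illustrative:

import os
import shutil

def prepare_job_folders(slave_root, job_id):
    # Best-effort cleanup of the root jobsData/plugins folders before a new job.
    # If a rogue process still holds a lock on a file, skip it rather than fail.
    for name in ("jobsData", "plugins"):
        root = os.path.join(slave_root, name)
        if not os.path.isdir(root):
            continue
        for entry in os.listdir(root):
            path = os.path.join(root, entry)
            try:
                if os.path.isdir(path):
                    shutil.rmtree(path)
                else:
                    os.remove(path)
            except OSError:
                pass  # locked file; it gets another chance before the next job

    # Sync the new job's files into subfolders named after the job ID, so a
    # leftover file from a previous job can never be picked up by mistake.
    job_data_dir = os.path.join(slave_root, "jobsData", job_id)
    job_plugin_dir = os.path.join(slave_root, "plugins", job_id)
    for d in (job_data_dir, job_plugin_dir):
        if not os.path.isdir(d):
            os.makedirs(d)
    return job_data_dir, job_plugin_dir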

We’ll be including this change in beta 4.

Cheers,

  • Ryan

Hey Russell,

Just to add a new wrinkle, I’m wondering if it has less to do with cleanup and more to do with the initial copy. We had some machines that were throwing errors. After looking at one of the slaves, I realized that JobPreLoad.py was missing on that machine. I copied the JobPreLoad from one of the working slaves to the problematic slaves and the errors stopped.

So, going back to Seth’s post, I wonder if the reason we’re getting old JobPreLoads is that the new one isn’t being copied over (as opposed to the job files not being cleaned up). And of course, all of this is intermittent, which makes it harder to debug!
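
One thing we may try while chasing this down is comparing the local copy against the repository copy whenever a slave errors, something along these lines (the paths in the comment are just placeholders):

import hashlib

def same_file(path_a, path_b):
    # Compare two JobPreLoad.py files by MD5 so a stale local copy is easy to spot.
    def md5(path):
        h = hashlib.md5()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(65536), b""):
                h.update(chunk)
        return h.hexdigest()
    return md5(path_a) == md5(path_b)

# Example (placeholder paths):
# same_file(r"\\repo\plugins\MayaBatch\JobPreLoad.py",
#           r"C:\Users\temrender\AppData\Local\Thinkbox\Deadline6\slave\RENDER227\plugins\JobPreLoad.py")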

Jeremy

Hey Jeremy,

I guess the best thing to do at this point is for you guys to upgrade to beta 4 when it is released (should be later this week). In theory, the changes we’ve made should fix any potential deletion issues, so if it turns out that there are still copying issues after you’ve upgraded, we can tackle them then.

Cheers,

  • Ryan

Hey Jeremy,

Just a heads up that beta 4 was uploaded today.

Cheers,

  • Ryan

We just grabbed Beta 4. We’ll be testing it out tomorrow. We’ll let you know how it goes.

Thanks Russell!

@Ryan - totally random thought whilst reading this thread… have the API commands been updated to reflect this change in path to the job & plugin directories on the local slave?

RepositoryUtils.GetPluginsDirectory()
RepositoryUtils.GetPluginNames()
RepositoryUtils.GetSlaveDirectory()

Those commands actually get the repository paths, so they remain unchanged. :)

You’re probably thinking of these DeadlinePlugin functions:

GetPluginDirectory()
GetJobsDataDirectory()

And yes, they have been updated.
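
In other words, from a JobPreLoad or plugin script the calls stay the same, and the local paths they return now point at the per-job subfolders. A quick sketch, assuming the usual __main__(deadlinePlugin) entry point (double-check the docs for your version):

from Deadline.Scripting import RepositoryUtils

def __main__(deadlinePlugin):
    # Local, per-slave paths; after beta 4 these live under the per-job subfolders.
    local_plugin_dir = deadlinePlugin.GetPluginDirectory()
    local_jobs_data_dir = deadlinePlugin.GetJobsDataDirectory()

    # Repository-side path; unaffected by the local folder change.
    repo_plugins_dir = RepositoryUtils.GetPluginsDirectory()

    deadlinePlugin.LogInfo("Local plugin dir: %s" % local_plugin_dir)
    deadlinePlugin.LogInfo("Local jobsData dir: %s" % local_jobs_data_dir)
    deadlinePlugin.LogInfo("Repository plugins dir: %s" % repo_plugins_dir)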

Cheers,

  • Ryan