AWS Thinkbox Discussion Forums

Event Plugin for checking Auxiliary Files?

I am wondering if it would be possible to have an Event Plugin look for a job's auxiliary files and, if it finds them, resume the job?

We use the RCS and it can take several minutes (sometimes more) for our auxiliary files to sync to the on-prem environment before rendering can successfully start. This currently requires manual intervention in order to resume jobs.

I would be surprised if we are the first outfit to come up against this so I am wondering if such a script exists already? If not, any pointers would be super helpful.

Thank you!

I’m surprised the job gets released before the aux file transfer is done. But I think the assumption in that feature is that aux files are just little config files. Either way, the move would be to add your list of auxiliary files as asset dependencies in an OnJobSubmitted event so the job doesn’t start until the transfer is done.

You’ll have to pull the jobid to figure out the path to where the aux files are, but that’s on the job object.
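A minimal sketch of what that listener could look like. This is only a sketch: the exact property and method names flagged as assumptions below (`GetJobAuxiliaryPath`, `JobAuxiliarySubmissionFileNames`, `SetJobAssetDependencies`, `PendJob`) should be checked against the Scripting API reference for your Deadline version before deploying anything.

```python
import os

try:
    from Deadline.Events import DeadlineEventListener
    from Deadline.Scripting import RepositoryUtils
except ImportError:
    # Fallback so the pure helper below can be exercised outside Deadline.
    DeadlineEventListener = object
    RepositoryUtils = None


def aux_asset_paths(aux_dir, aux_file_names):
    """Pure helper: full paths of the job's auxiliary files."""
    return [os.path.join(aux_dir, name) for name in aux_file_names]


def GetDeadlineEventListener():
    return AuxAssetDependencyListener()


def CleanupDeadlineEventListener(listener):
    listener.Cleanup()


class AuxAssetDependencyListener(DeadlineEventListener):
    def __init__(self):
        self.OnJobSubmittedCallback += self.OnJobSubmitted

    def Cleanup(self):
        del self.OnJobSubmittedCallback

    def OnJobSubmitted(self, job):
        # Assumption: helper that returns the job's aux directory in the repo.
        aux_dir = RepositoryUtils.GetJobAuxiliaryPath(job)
        paths = aux_asset_paths(aux_dir, job.JobAuxiliarySubmissionFileNames)
        if not paths:
            return
        # Assumption: exact setter name for asset dependencies on the Job.
        job.SetJobAssetDependencies(paths)
        RepositoryUtils.SaveJob(job)
        # Pending jobs are the ones the dependency scan evaluates.
        RepositoryUtils.PendJob(job)
```

The `aux_asset_paths` helper is split out so the path logic can be tested without a running Deadline install.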

If you’ve never done any Deadline scripting check out this page to get oriented.

Hi Justin, thanks for the reply.

I think one of the reasons the job gets released is that the Repository is looking for aux files on a network location that our remote guys have a mirrored setup for, i.e. R:\deadline\jobs

Files saved here are synced by a cloud sync app up from the remote artist's local environment, and then down onto the network drive (with the same mapping) that the Repository is directly connected to and reading the aux files from.

Question: how would Deadline know when the aux files are synced and ready if we use asset dependencies? Technically the job has finished submitting on the remote artist's side once the temp aux file has been saved to R:\deadline\jobs\64f759d5d110cf28de34670b\file.max

Thank you

Oh! That makes much more sense. I thought it was just the RCS that was making files available for the render nodes.

Question: how would Deadline know when the aux files are synced and ready if we use asset dependencies? Technically the job has finished submitting on the remote artist's side once the temp aux file has been saved to R:\deadline\jobs\64f759d5d110cf28de34670b\file.max

If there’s an asset dependency for R:\deadline\jobs\64f759d5d110cf28de34670b\file.max on a job, whenever a Worker does a pending job scan it’ll check whether R:\deadline\jobs\64f759d5d110cf28de34670b\file.max is accessible for that Worker. If it is, the Worker will be able to dequeue tasks from that job. If not, it’ll skip it and move along.

That’s a check done per-Worker, so if you’ve got some machines that are slower to download they won’t start the job till that check passes.
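The check itself boils down to "are all listed assets reachable from this machine yet?" A rough model of it, as a sketch (the real scan lives inside Deadline, not in user code):

```python
import os
import tempfile


def assets_ready(asset_paths):
    """Rough model of the asset-dependency check: the job is released
    for a given machine only when every listed asset path is reachable."""
    return all(os.path.exists(p) for p in asset_paths)


# Tiny demonstration: the "asset" is missing at first, then appears
# (i.e. the cloud sync finishes) and the check starts passing.
demo_dir = tempfile.mkdtemp()
asset = os.path.join(demo_dir, "file.max")
before = assets_ready([asset])   # still syncing: check fails
open(asset, "w").close()         # sync completes
after = assets_ready([asset])    # check passes, tasks can dequeue
```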

I am trying this manually to begin with and not getting very far and I have a hunch.

“This job will resume when its dependencies complete.” is what the docs say this is looking for, but the dependent file is not a Job that completes or fails, etc.; it’s a file that either exists or doesn’t.

In my tests, I have a Nuke job with an Asset Dependency on the Nuke aux file (which is sitting there waiting), and the job is stuck on Pending no matter what.

We were looking at adding the following to our Nuke job submission JobInfo file:

ResumeOnCompleteDependencies=true
RequiredAssets=nukepath.nk
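Since the jobinfo file is just key=value lines, appending the dependency keys at submission time is straightforward; something like this sketch (the key names are the ones discussed in this thread; whether your submitter should use `RequiredAssets` or the numbered `AssetDependency<N>` form is worth double-checking against the job submission docs for your version):

```python
def jobinfo_with_assets(base_lines, asset_paths):
    """Append asset-dependency keys to a list of jobinfo key=value lines.

    AssetDependency<N> is the numbered form seen in the jobinfo after
    setting the dependency in the Monitor; one key per asset, indexed
    from zero.
    """
    lines = list(base_lines)
    for i, path in enumerate(asset_paths):
        lines.append("AssetDependency%d=%s" % (i, path))
    return lines


# Example: a minimal Nuke jobinfo with one asset dependency appended.
info = jobinfo_with_assets(
    ["Plugin=Nuke", "Name=test"],
    [r"R:\deadline\nukejobs\1L73iIgc\_VCR_v24.nk"],
)
```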

But until I can get it working as expected on the manual job, I don’t think it will do much for us.

Any advice would be appreciated.

Thanks

Instead of setting it in the jobinfo file by hand, how does the job behave if you add the asset dependency manually in the Monitor? On my end that adds the AssetDependency0 key to my jobinfo, and the job gets picked up right away.

“This job will resume when its dependencies complete.” is what the docs say this is looking for, but the dependent file is not a Job that completes or fails, etc.; it’s a file that either exists or doesn’t.

That’s from the script dependencies section; the job will only kick off once that script has completed. So if you were doing your asset transfer with a script instead of whatever you’ve got at the moment, that’s where you’d put them. The catch is that script dependencies get evaluated per job, not per Worker like the assets are.

I’d fiddle around with what the submitters create before trying to write your own jobinfo, just because it’s simpler than working backwards from the docs in most cases. And it’s usually the thing I do first. :slight_smile:

Ah ok, this seems to be working now. What was catching me out is the pop-up asking if I wanted to mark the Job as Pending after setting the Asset Dependency. Clicking ‘No’ seems to be the one we want here? The Job doesn’t technically want to be marked as Pending in the queue; it wants to be Queued, and will pick up when the assets in the dependency list can be found by the Worker?

Oh you want yes on that popup. “Pending” jobs have dependencies that need to be checked, “Queued” jobs are ready for a Worker to start dequeuing tasks.

Hitting ‘No’ is going to skip that asset dependency check. Try right-clicking the job that’s sitting in the Pending state and choose ‘Find Render Candidates’ to see if there’s a callout there, in case there’s nothing in the Worker logs.

The logs will be on the machine in one of these locations:
Windows: C:\ProgramData\Thinkbox\Deadline10\logs
Linux: /var/log/Thinkbox/Deadline10
Mac OS X: /Library/Logs/Thinkbox/Deadline10

Ok I seem to have moved backward a step or two.

When the Job is set to Pending, and there is an Asset Dependency and there are available appropriate Workers, nothing happens.

The logs for the Worker also don’t seem to show much movement; however, the message printed in the ‘Job Render Candidates’ pop-up states ‘The limit was not found.’ for the idle Worker in question?

Just an update on this. This is definitely not behaving as you outline for me.

We have our submissions add the correct info to the JobInfo file:

AssetDependency0=R:\deadline\nukejobs\1L73iIgc\_VCR_v24.nk

The asset dependency is set, the Job is Pending, the Workers have access to the file and the Job, and it’s just stuck…

Could it be something to do with the fact that the file path in question:

R:\deadline\nukejobs\1L73iIgc\_VCR_v24.nk

Is a Windows path but our Repository and primary Pulse are running on Linux? My thinking here is that the primary Pulse does not know what R:\deadline\nukejobs\1L73iIgc\_VCR_v24.nk is.

All the Workers in the farm are Windows based and have access to R:\deadline\nukejobs\1L73iIgc\_VCR_v24.nk

Just to add: if I run Tools → Perform Pending Job Scan manually in the Monitor, the jobs are released without issue.

Thanks

An update.

I finally have it working, I think! As I suspected, the issue was caused by the Repository (Linux) not knowing what the R: drive was. The fix was to add a Global Mapping rule…

I can only assume that it is the primary Pulse (Linux) that is running the Pending Job Scan, and not the Windows-based Workers?
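For reference, the mapping rule effectively rewrites the Windows prefix into a path the Linux machine doing the scan can resolve before the existence check runs. A minimal model of that rewrite, assuming a made-up mount point (our real rule differs):

```python
def map_path(path, rules):
    """Minimal model of a Deadline path-mapping rule: rewrite a known
    prefix so the machine performing the check can resolve the path.
    The rule used below is an example, not a real mount point."""
    for src, dst in rules:
        if path.lower().startswith(src.lower()):
            # Swap the prefix and convert Windows separators.
            return dst + path[len(src):].replace("\\", "/")
    return path


RULES = [(r"R:\deadline", "/mnt/deadline")]
mapped = map_path(r"R:\deadline\nukejobs\1L73iIgc\_VCR_v24.nk", RULES)
```

With a rule like this in place, the Linux-side Pending Job Scan checks the mapped path instead of the unresolvable R: drive.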

Appreciate all the help getting this one sorted.

Cheers

My mistake, I’ve had it in my head that asset dependencies were checked by the Worker. That’s because the farm I interact with the most is the one that runs on my laptop without an RCS, so I’ve been seeing the ‘Pending Job Scan’ in the application logs for ages. Sorry for the confusion, I didn’t think hard enough on that one.

As long as Pulse is running, yes. You can change who and what runs House Cleaning (which contains the Pending Job Scan) in the Monitor under Tools → Configure Repository Options → House Cleaning. There you could have Workers run the scan if Pulse goes down.
