Connect Facilities / Send job to another Deadline repo

Dear forum,

We need some advice and direction on how to approach connecting repositories across different locations/facilities.

We want to connect our facilities and render jobs in a remote facility. In other words, we want to share our render capabilities with another facility.

These are our thoughts so far:
Our wish is to right-click a submitted job in the Deadline Monitor (on our local repo) and click “render job @ facility A/B/C”. The script then packages the scene file with the corresponding assets (textures, geometry, etc.) as a zip file, connects to the remote facility, sends the package, and automatically executes our job on the ‘remote’ farm. The (remote) rendered output is written to a synchronized folder, from where it gets automatically sent back to the first facility.
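
For illustration, the entry point we imagine is a Monitor job right-click script, roughly like this sketch (only MonitorUtils.GetSelectedJobs() is a real Deadline call; the packaging helper is purely hypothetical):

# Rough sketch of a Monitor job right-click script (placed under
# custom/scripts/Jobs/ in the repository). Everything beyond
# MonitorUtils.GetSelectedJobs() is left as a hypothetical stub, since the
# packaging/sending part is exactly what this thread is about.
from Deadline.Scripting import MonitorUtils

def package_and_send_to_facility(job, facility):
    # Hypothetical helper: zip scene + assets and hand them to the remote facility.
    raise NotImplementedError

def __main__(*args):
    for job in MonitorUtils.GetSelectedJobs():
        package_and_send_to_facility(job, facility="B")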

Is this the right approach to connect our repositories in different facilities?
Is there a better way?

Please share any knowledge or experience you have gathered along the way.
Thanks in advance for helping out.
Marc

Are these separate companies or multiple locations of the same?
Because in the latter case I would look into creating one repo, maybe hosting it in the cloud, so that both facilities can access the same render nodes.

If not, it is definitely achievable with some custom work, as deadlinecommand has functionality to connect to another repo.
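
A minimal, untested sketch of what that custom work could look like: temporarily point the local client at the remote repository, submit a prepared job there, then switch back. The repository paths are placeholders, and the exact ChangeRepository/SubmitJob argument order should be checked against your own deadlinecommand help output.

# Minimal, untested sketch: switch the client to the remote repository, submit
# pre-built submission files, then switch back. All paths are placeholders.
import subprocess

DEADLINE_CMD = "/opt/Thinkbox/Deadline10/bin/deadlinecommand"   # adjust per OS/install
LOCAL_REPO = r"\\local-server\DeadlineRepository10"             # placeholder
REMOTE_REPO = r"\\facility-b\DeadlineRepository10"              # placeholder

def run(*args):
    # Run deadlinecommand with the given arguments and raise on failure.
    subprocess.run([DEADLINE_CMD] + list(args), check=True)

def submit_to_remote(job_info, plugin_info):
    # Submit a prepared job_info/plugin_info file pair to the remote repo.
    run("ChangeRepository", "Direct", REMOTE_REPO)
    try:
        run("SubmitJob", job_info, plugin_info)
    finally:
        run("ChangeRepository", "Direct", LOCAL_REPO)   # always switch back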

Hi Ricardo,
Thanks for your reply.

It’s the same company, with four different facilities.

Regarding your idea with the cloud repo:
Would that mean that licensing, environment variables and render plugins are handled by the cloud repo, and that for every submitted job the required assets are transferred to the cloud and then downloaded again to a (remotely connected) facility Slave in order to render?
(Surely not; what am I missing?)


I have another idea, which could work as a workaround.


I found this ‘Transfer Job’ (Deadline) help article. It becomes unclear to me at the point where I have to select the New Repository.
Here it states:

This is the path to the remote repository that the original Job will be transferred to. Note that the Slaves that the transfer Job will be running on must be able to see this path in order to transfer the original Job to the new repository.

Do I understand that correctly: could I create a (custom) TransferJob, e.g. with a pre-job that connects the machine to the remote repository for the transfer process, and then trigger another pre-job that collects the textures/assets, remaps the paths and zips everything into one file, returning this file to the TransferJob?
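
For the asset-collection part, such a pre-job could be a plain Python script along these lines; the mount point, the remapping rule and the file names are purely hypothetical placeholders:

# Hypothetical sketch of an asset-collection pre-job: gather a job's
# dependencies into a single zip, rewriting the local mount prefix to a
# facility-neutral one. All names below are made-up examples.
import os
import zipfile

LOCAL_PREFIX = "P:/projects"   # hypothetical local mount
NEUTRAL_PREFIX = "projects"    # prefix used inside the archive

def package_assets(asset_paths, archive_path):
    # Zip all existing assets, storing them under the neutral prefix.
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in asset_paths:
            if not os.path.isfile(path):
                print("missing asset, skipping: {}".format(path))
                continue
            # e.g. "P:/projects/show/tex.png" -> "projects/show/tex.png"
            arcname = path.replace(LOCAL_PREFIX, NEUTRAL_PREFIX, 1)
            archive.write(path, arcname)
    return archive_path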

Does the connection to the remote repository need to be a mapped network path?

regards,
Marc

I don’t know if it fits your case, but connecting all facilities to a single repo will be much simpler. E.g. the repo lives in one location (I think it can be replicated?) and all machines connect to it via VPN or whatever.
With shared file systems it will be easy. This way synchronization, licenses, etc. are all solved.
Might not be what you want, though.
Then you can have groups for each facility, allowing you to target one of them.


I have customized the “Transfer Job” plugin in the past to move jobs from an inhouse repository to a cloud repository (custom built in pre-AWS times).

Basically, the “Transfer Job” is a plugin in Deadline (like Nuke, Houdini, …), so it submits a job to your local farm with some parameters and a single task. A worker will then grab that task and run the “Transfer Job” plugin script (which you can view/edit to see what it does). Essentially, it creates copies of the job submission files for the job you want to transfer. These contain the target repository as a parameter, so when deadlinecommand is subsequently called to submit this job, it submits the job as a new job to the other farm.
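
To illustrate what such a pair of submission files could contain (not the actual files JobTransfer.py writes, just a rough hand-written example; the plugin info keys depend on the render plugin and are assumptions here):

# Rough illustration of a job_info / plugin_info pair as deadlinecommand
# expects them. The Frames= line is the value the frame-range edit described
# later rewrites. SceneFile/Version are example keys for a Maya-style plugin.
import os
import subprocess
import tempfile

def write_submission_files(directory):
    job_info = os.path.join(directory, "job_info.job")
    plugin_info = os.path.join(directory, "plugin_info.job")
    with open(job_info, "w") as f:
        f.write("Plugin=MayaBatch\n")
        f.write("Name=Transferred job\n")
        f.write("Frames=1-100\n")
        f.write("Pool=facility_b\n")        # hypothetical pool on the remote farm
    with open(plugin_info, "w") as f:
        f.write("SceneFile=//share/projects/shot010/scene.ma\n")   # hypothetical path
        f.write("Version=2023\n")
    return job_info, plugin_info

job_info, plugin_info = write_submission_files(tempfile.mkdtemp())
# While the client is connected to the target repository:
subprocess.run(["deadlinecommand", "SubmitJob", job_info, plugin_info], check=True)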

The way I edited the Transfer Job Python script was to check which frames had already been rendered locally and adjust the frame range accordingly, so frames wouldn’t get rendered twice. But in its original form, the process is as follows:

“Transfer Job Submission” → select job and remote repository → transfer job sent to local farm → local worker runs task → submits to remote repository

If you edit that plugin, you can make it submit any other jobs (to your own farm or to the remote farm) to collect assets etc…


To make a cloud repo you would install the Repository server on a cloud host like AWS, make sure it’s accessible for your machines in each location and they can then connect to that via the RCS client (or via VPN, site-to-site vpn/AWS VPC etc.)

The render machines can be on the cloud or local, it’s up to you, but that would at least give you one farm management console and a whole lot of flexibility in the future to burst into the cloud or burst into your various locations and use the render power in those for jobs from other locations.

Thanks for your inspiration!

@RicardoMusch: regarding the cloud repo construct: when all facilities sync their project content and I send a job to the farm, can I pre-check for every job that every asset needed for the rendering is available at that location, to make sure a rendering will not crash on the other end? What’s the best approach for that?

@Stefan this is exactly what I was thinking about doing.
Can you share some code snippets on how you edited your TransferJob with the frames and the transfer over to another repo? (How do you connect to the other farm? Is it a connection with network path mapping to the other repository? How do you achieve that?)

@mois.moshev
Is it possible, when I synchronize the folder of the Deadline repo to all facilities while managing/assigning all workers in ‘facility groups’, that this works automatically within Deadline? What about the MongoDB server? Or did you mean that each facility has its own repo, connected to a central one? Or did you mean the same approach as Ricardo with the cloud repo?


To add to my interest: we want to connect the facilities, but we also want to render in the (AWS) cloud when it is necessary to scale.
So maybe going for the cloud repo and connecting each facility / each worker would be the best option, wouldn’t it?
The custom Transfer Job, to send a job to the AWS cloud clients, is necessary either way.

Hi umbe,
here’s a snippet to modify the frame range before the job gets transferred. I’ve tried to reduce the snippet to what is necessary, so it’s up to you to implement this properly in JobTransfer.py’s RenderTasks function. I also need to add that we did not use AWS, so maybe there are totally different ways to transfer jobs to AWS repositories. I don’t have any experience with that.

The way we used our cloud repository was to set up a remote file server with exactly the same path names, so no mapping was necessary. We created a VPN between both locations and used rsync to upload all required files before a job was able to render on the remote workers. The remote repository (the directory) wasn’t actually remote. Only its workers were. The repository was on our in-house file server, and we used a Remote Connection Server to connect both our Deadline Monitors as well as the remote workers to it.
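
The upload step was essentially a wrapper around rsync; a rough sketch under the assumption of identical absolute paths on both sides (the hostname is a placeholder):

# Rough sketch of the "rsync the required files before rendering" step.
# Because both sites use identical path names, --relative reproduces the
# full directory layout on the remote file server. The host is a placeholder.
import subprocess

REMOTE_HOST = "remote-fileserver"   # hypothetical VPN hostname

def sync_assets(asset_paths):
    # Push the job's dependencies to the remote file server, preserving paths.
    subprocess.run(
        ["rsync", "-az", "--relative"] + list(asset_paths) + [REMOTE_HOST + ":/"],
        check=True,
    )

# Example: sync_assets(["/projects/show/shot010/scene.ma",
#                       "/projects/show/textures/wall.exr"])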

This was a solution to use a remote render farm from a single office. Your requirements (and thus the best solution to sync files or transfer jobs) might differ.

Our workflow was to submit the first, last and a frame in the middle to our local repository. Once the result was to our satisfaction, we sent the job to the remote workers.

JobTransfer.py will create so-called submission files (job_info ini and plugin_info ini) for the job that will be transferred. You need to read the “Frames=…” value and then write back the modified value before these files get submitted.

# this snippet works as part of the JobTransfer plugin's RenderTasks method
# after the CreateSubmissionInfoFiles call.
# transferJob is the job object of the job that will be transferred.
# framelist is the frame range parameter from the job info ini file.
# (for example "1,5,20").
# This snippet will do the following things:
# 1) fill up that range (in the 2nd example above that would be 1-20)
# 2) remove completed tasks
# 3) create a new string for the transferred job (for example "2-3,6-19" if
#    frames 1,5 and 20 have already been rendered in my example)

original_framelist = framelist                                  # keep the unparsed string for logging
framelist = list(FrameUtils.Parse(framelist, False))            # False = don't re-order those frames
self.LogInfo("current range: {} ({} frames)".format(original_framelist, len(framelist)))
try:
    _first = min(*framelist)
    _last = max(*framelist)
except TypeError:
    # just one frame instead of frame list
    _first = _last = framelist[0]
self.LogInfo("extending range to cover {} to {}".format(_first, _last))
for i in range(_first, _last + 1):
    if i not in framelist:
        framelist.append(i)

self.LogInfo("removing completed frames")
jobtasks = RepositoryUtils.GetJobTasks(transferJob, False)  # False = don't invalidate cache
for task in [t for t in jobtasks.TaskCollectionTasks if t.TaskStatus == "Completed"]:
    for framenumber in task.TaskFrameList:
        framelist.remove(framenumber)
if len(framelist) == 0:
    self.LogInfo("all frames are completed, submitting full range instead")
    framelist = list(range(_first, _last + 1))
newlist = FrameUtils.ToFrameString(framelist)
self.LogInfo("new render frame range: {}".format(newlist))
# new frame list must now replace the "Frames=...." line in the created job info ini
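
And to complete that last comment: a rough sketch of writing the adjusted range back into the job info ini. jobInfoFilename is assumed to be the path of the job info file produced by the CreateSubmissionInfoFiles step; newlist comes from the snippet above.

# Rough sketch: replace the "Frames=" line in the created job info ini with
# the recomputed range before the files get submitted to the other repo.
with open(jobInfoFilename, "r") as f:
    lines = f.readlines()
with open(jobInfoFilename, "w") as f:
    for line in lines:
        if line.startswith("Frames="):
            f.write("Frames={}\n".format(newlist))
        else:
            f.write(line)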