AWS Thinkbox Discussion Forums

Frame dependent tasks to render frames on same machine

Hey guys,
Say we have two jobs A and B which are frame dependent on each other. Job A’s first frame is rendered on machine “XYZ”; is there any way to tell Deadline to render the dependent frame of Job B on machine “XYZ” as well? We are evaluating Houdini integration in our pipeline, and when we split the IFD and Mantra jobs, the network I/O involved in accessing the IFDs is a bit on the high side just due to the amount of data being accessed. Any help on this would be really appreciated.

~Abhijeet

Hey Abhijeet,

So far there’s nothing in Deadline that would take action on that from a dependency angle.

I’m assuming the IFD export is just due to expensive Houdini licensing, as you could otherwise avoid using IFDs altogether. The only thing I can think of is to create a custom plugin that handles both creating the IFDs and then starting Mantra. That’s not going to be very easy.

I’ll start an e-mail internally to see what the other guys think. The problem may be too Houdini specific to build something into the core.

Hi Abhijeet,

This is an interesting use case. One way to handle this would be to whitelist both Job A and Job B to the same machine. This would likely require an OnJobSubmitted event script to round-robin the assigned whitelisted machine. Clearly this has the downside of not distributing these Jobs across the farm. However, if there are many such Jobs, the farm’s overall utilization would likely remain high. So the decision whether or not to use this approach is a question of net efficiency: is the diminished farm utilization more than made up for by the increased efficiency of not having to transfer files around?
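The round-robin assignment that such an OnJobSubmitted event script would perform can be sketched in plain Python. This is a minimal sketch only: the machine names and the idea of keying assignments by a shared batch name are assumptions for illustration, and the actual Deadline event-plugin wiring (the listener class and callback registration) is omitted since its exact API should be taken from the Deadline scripting documentation.

```python
from itertools import cycle

# Hypothetical pool of render nodes; in a real OnJobSubmitted event
# script these would come from the Deadline repository configuration.
MACHINES = ["Machine01", "Machine02", "Machine03", "Machine04"]
_machine_cycle = cycle(MACHINES)

# Remember which machine each dependency chain was pinned to, keyed by
# a shared identifier (e.g. the jobs' batch name), so the IFD job and
# its dependent Mantra job are whitelisted to the same node.
_assignments = {}

def machine_for_batch(batch_name):
    """Return the machine assigned to this batch, picking the next one
    in round-robin order the first time the batch is seen."""
    if batch_name not in _assignments:
        _assignments[batch_name] = next(_machine_cycle)
    return _assignments[batch_name]
```

The event script would then set both jobs’ whitelist to `machine_for_batch(...)` for their shared batch, guaranteeing the dependent job lands on the same node as its parent.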

An alternative approach would be to write a specialized Job Plugin that treats both the parent and dependent Jobs as a single Job with two file inputs. A “Task” of this new Job type would then process a frame from the first file followed by the corresponding frame from the second file, and would also do any intermediate file cleanup that is needed. The question is whether the cost or effort to write this custom plugin is worth it in the long run.

Since we are looking at other enhancements to Job Scheduling, I have logged an enhancement request to our wishlist for this use case.

Hi there,

This might also be possible with dynamic limit handling.

Basically, the IFD generation is a quick process (~2 mins maybe), but it is bound by expensive ‘engine’ licenses. Rendering the generated IFDs takes much longer (upwards of 2-3 hours), but it can use the cheaper Mantra licenses.

Ideally, we could run this as a single job, but in a way that:

  1. First stage of job (IFD) generation: uses ‘houdini_engine’ limit stub
  2. Second stage of job (Mantra render): uses ‘houdini_mantra’ limit stub

That way, the job would pick up on N machines (where N is your engine license count). As they enter the rendering phase, they give up their engine stub, so new machines can join the pool, ultimately spinning up M machines processing the job (where M is your mantra/render license count).

For example:

Job starts:
Machine01: engine
Machine02: engine
Machine03: engine
Machine04: engine

Job midway through:

Machine01: render
Machine02: render
Machine03: render
Machine04: render
Machine05: render
Machine06: render
Machine07: render
Machine08: render
Machine09: render
Machine10: render
Machine11: render
Machine12: render
Machine13: engine
Machine14: engine
Machine15: engine
Machine16: engine

Job at the very end:

Machine23: render
Machine24: render
Machine25: render
Machine26: render
Machine27: render
Machine28: render
Machine29: render
Machine30: render

Is there any way to manually control limit usage from a job plugin? That way we could implement the acquire/release process for the mantra/engine licenses ourselves.

I’m fairly certain there are currently no API calls for consuming and releasing limit stubs, but I like the idea.

That said, what you are describing seems to be a slightly different problem than the original post. If the set of machines rendering is greater than the set of machines generating, then file transfer is implied. At that point, is it not just a matter of parent and dependent jobs each with separate license limits for the Job type?

In any case, notions such as Task affinity, data locality, and dynamic limits are included in our considerations for future Job Scheduling features.

In the ideal scenario:
The set of machines that can generate is smaller than the set of machines rendering, yes. A single task would do both steps like so:

  1. Machine A picks up task from deadline
  2. Machine A acquires engine license
  3. Machine A starts generating the IFD files (~2-5 mins)
  4. Machine A is finished, releases the engine license (thus making it available for Machine B)
  5. Machine A now starts rendering, for which it acquires a mantra license (~60-120 mins)
  6. Machine A finishes rendering and returns the mantra license

There is no file transfer required, since Machine A generates and then consumes the IFD files (engine -> mantra). Because the engine portion takes much less time but carries a much higher licensing fee, it’s preferable NOT to throttle the renders by running this process as a single task under a single limit group.
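The six steps above amount to holding each license stub only for the phase that needs it. A minimal sketch, using in-process semaphores as stand-ins for the engine/mantra limit stubs (the counts and the `export_ifd`/`render_ifd` callables are hypothetical; a real implementation would talk to Deadline’s limit system, for which no API calls currently exist per the reply above):

```python
from contextlib import contextmanager
from threading import Semaphore

# Hypothetical stand-ins for the two limit stub pools; the counts are
# assumptions for illustration (N engine licenses, M mantra licenses).
ENGINE_STUBS = Semaphore(4)
MANTRA_STUBS = Semaphore(12)

@contextmanager
def limit_stub(pool):
    """Hold one stub from the pool for the duration of the block."""
    pool.acquire()
    try:
        yield
    finally:
        pool.release()

def process_frame(frame, export_ifd, render_ifd):
    # Steps 2-4: hold an engine stub only while exporting the IFD.
    with limit_stub(ENGINE_STUBS):
        ifd = export_ifd(frame)
    # Steps 5-6: the engine stub is already released here, so another
    # machine can start its export while this one renders.
    with limit_stub(MANTRA_STUBS):
        return render_ifd(ifd)
```

Because the engine stub is released before the render begins, only ~N machines export at any moment while up to M machines render, matching the Machine01-Machine16 progression illustrated earlier in the thread.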

We have looked into doing what you suggest (transferring files to a central network location after the initial data is done, then having another Mantra job pick that up and render), but it adds 30-120 mins per frame. The amount of data generated by the process is immense, and there is no need for it to go out to the network and back, especially if the workflow described above worked. In fact, the data could stay in RAM if the same process did both steps, given the ability to release limit stubs. This is basically how it has worked in Qube for the TD we are working with to set this up.

The original suggestion of having 2 jobs with file dependencies, and forcing the same machine that ran the engine job to pick up the Mantra job, comes from this same logic, but I think that could get really messy, especially if you want the sim licenses to go onto another machine.

We basically need to solve this issue in the coming weeks, one way or another. If we could customize the scheduler / task dequeuing mechanism, we could tie that directly into the license server and we wouldn’t have to worry about acquiring / returning limit stubs. Is that a possibility?

You should be able to get close to the behavior you’re looking for with a custom hybrid of the Houdini/Mantra plugin and 2 Limits. The houdini_engine Limit would have its Release Progress % set to something like 1%, and the houdini_mantra Limit would be a normal Limit.

The custom plugin would run Houdini to do the export, and then Mantra to do the rendering, so it would all occur on the same machine. The custom plugin wouldn’t do any progress reporting during the export (to keep progress at 0%), which shouldn’t be a big deal if the export only takes 2-5 minutes. Then when it finishes the export, it can bump progress to 1% so that the houdini_engine Limit gets released. When the render starts, the progress handling can simply set progress to 1% if it’s less than 1%.
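The progress handling Ryan describes can be sketched as three small mapping functions that the hypothetical custom plugin would apply before reporting progress to Deadline (the function names are illustrative, not part of any Deadline API; the scheme assumes the houdini_engine Limit’s Release Progress % is set to 1%):

```python
def export_progress(_raw_percent):
    """Report 0% for the entire export, so the houdini_engine Limit
    stub is held until the export is done."""
    return 0.0

def export_finished_progress():
    """Bump progress to 1% once the export completes, which releases
    the houdini_engine stub (Release Progress % = 1%)."""
    return 1.0

def render_progress(raw_percent):
    """Never report below 1% during the render, so the engine stub
    stays released even if Mantra reports very early progress."""
    return max(1.0, raw_percent)
```

With these mappings, the only remaining cost is the one Ryan notes: the houdini_mantra stub is held for the 2-5 minute export as well as the render.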

I think the only limitation is that the slave will hold the houdini_mantra Limit during the export, but it’s only for 2-5 minutes, and this is achievable without custom scheduling (which isn’t a possibility yet).

Would that work for you guys?

I think this would work for us; thanks for the tip, Ryan! I was not aware of the Limit Release Progress % functionality.

Will keep you posted on how this worked.

cheers
laszlo

Hello,
What was the eventual solution? The custom plugin?
Are there any alternative solutions in more recent releases that anyone knows of?
Thanks,
-Jake
