Building a dynamic Task chunk-size

Hello, my brothers/sisters/others in arms of the well-documented Deadline-API,

I am writing to you because I have a dream.

I am pretty new to Deadline, so I am looking for some ideas or hints on how to realize my idea:

A Job that dynamically changes the frames a task has to calculate when one node is done but others are still rendering.
E.g. I have 5 slaves (each performing one task) rendering with a chunk size of 5. Slaves 1, 2, 3 and 4 are done, but slave 5 is still rendering at frame 21 due to the bad luck of whatever happens in those frames.
I want slaves 1, 2, 3 and 4 to split the remaining 4 frames (22, 23, 24, 25) between each other so the job finishes sooner. So the chunk size effectively gets reduced dynamically.

I am thankful for any answer that might open up some workflows and/or scripting ideas.

Whoever answers, feel kissed.

Yours,
Heinrich


+1 for this idea, there’s quite a few issues with it though.

If you were rendering large frame ranges you'd have to calculate the number of available slaves/workers, which of course could change while this is being calculated. 1000 frames divided by 10 free nodes could become 100 free nodes 5 minutes later; do you keep recalculating?

Also, if the scene is very large, redistributing could hurt render times: pulling the scene over the network and loading it on a new node could take longer than letting the already-loaded scene continue to render.

Ideally you'd be submitting the job as a batch render doing 1 frame at a time, or using something like V-Ray offload distributed rendering, which does this kind of thing by reducing bucket size as the image completes.

If I understand you correctly, your idea is to set the chunk size to 1 and offer the job to a limited number of slaves, so Deadline automatically keeps track of it?

Yeeees and no… due to our internal workflow the chunk size has a reason and I can't just change it to 1. As you said, it would add a lot of network traffic.

That's why my idea is to redistribute frames dynamically, so the extra traffic is only produced under specific circumstances.

Hello!

Can you elaborate a bit more on the use case?

Does the above statement mean that the task is hanging indefinitely? Or are these frames just taking longer because they are more complex than the previous frames? Is this an issue with specific plugins, and which ones are you using? If your plugin supports batch rendering you can likely just submit one frame per task, because the scene is going to be held in memory.

Unfortunately there is no event we can trigger on when a task has run too long, so there is nothing built in to perform functionality like this; there are no task-based events in general. There would likely need to be a task-timeout event where you could trigger a script.

You can append frames to an existing job's frame list: AppendJobFrameRange

What you could do is trigger on an event like House Cleaning, which runs every 60 seconds (or via an alternative method of your choosing), to check all of the jobs for tasks that are taking too long. You would need to decide what determines whether a frame redistribution needs to happen. You would suspend the current task and use the AppendJobFrameRange function to redistribute/append the task's frame range as chunks of 1.
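A minimal sketch of the redistribution decision in plain Python, assuming the stuck task's chunk boundaries and current frame have already been read from the Deadline scripting API (the function names below are hypothetical helpers, not Deadline calls): compute which frames still need rendering and format them as a string suitable for appending as single-frame tasks.

```python
def remaining_frames(chunk_start, chunk_end, current_frame):
    """Frames of a chunk that still need rendering, excluding the
    frame the stuck slave is currently working on."""
    return list(range(max(current_frame + 1, chunk_start), chunk_end + 1))


def as_single_frame_range(frames):
    """Format frames as a comma-separated range string; combined with
    a chunk size of 1 this yields one new task per frame."""
    return ",".join(str(f) for f in frames)


# A chunk covering frames 21-25; the slave is still busy on frame 21.
frames = remaining_frames(21, 25, 21)   # [22, 23, 24, 25]
print(as_single_frame_range(frames))    # "22,23,24,25"
```

The resulting string would then be passed to AppendJobFrameRange after suspending the original task, matching the 5-slave example from the top of the thread.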

Regards,

Charles

Thanks for your answer!

I tested some stuff with the housecleaning event you talked about. Like here
I didn't know about this event. It's actually quite cool and offers a lot of things! So it could lead me to the solution you introduced.
While testing I encountered one problem, though. How is it possible to see a traceback, or do manual logging, when working with this housecleaning event? Here it says that job events and reports are stored in the reports of jobs or slaves. Also, I ticked "House Cleaning" -> "Run House Cleaning in a Separate Process".

When I opened this separate housecleaning log file there was no entry from my script at all. `self.LogInfo("I like hazelnuts")` didn't do anything either.

from Deadline.Events import *
import taskTimeout


def GetDeadlineEventListener():
    """
    This is the function that Deadline calls to get an instance of the
    main DeadlineEventListener class.
    :return:
    """
    return ScheduledEvent()


def CleanupDeadlineEventListener(deadlinePlugin):
    """
    This is the function that Deadline calls when the event plugin is
    no longer in use so that it can get cleaned up.
    :param deadlinePlugin:
    :return:
    """
    deadlinePlugin.Cleanup()


class ScheduledEvent(DeadlineEventListener):
    """
    This is the main DeadlineEventListener class for ScheduledEvent.
    """

    def __init__(self):
        # Set up the event callbacks here
        self.OnHouseCleaningCallback += self.OnHouseCleaning

    def Cleanup(self):
        del self.OnHouseCleaningCallback

    def OnHouseCleaning(self):
        # Check which checks should be done. GetBooleanConfigEntry avoids
        # the string "False" evaluating as truthy below.
        timeout_active = self.GetBooleanConfigEntry("TimeOutActive")

        self.LogInfo("looking for hazelnuts")  # <-- nothing to find in the house cleaning log

        # Process the timeout check.
        if timeout_active:
            self.LogInfo("hazelnuts timeout")  # <-- nothing to find in the house cleaning log
            time_out_multiplier = self.GetConfigEntry("TimeOutMultiplier")
            job_progress_before_check = self.GetConfigEntry("JobProgressBeforeCheck")
            min_frames_job = self.GetConfigEntry("MinFramesJob")
            start_time_offset_in_seconds = self.GetConfigEntry("startTimeOffset")
            # Note: these two were chained assignments before, which silently
            # overwrote start_time_offset_in_seconds.
            test_job_name = self.GetConfigEntry("TestJob")
            do_test = self.GetConfigEntry("DoTest")

            taskTimeout.checkTasks(time_out_multiplier, job_progress_before_check,
                                   min_frames_job, start_time_offset_in_seconds,
                                   test_job_name, do_test)
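Since LogInfo output from scheduled housecleaning can be hard to locate, a simple file-based fallback logger helps confirm the event actually fires. This is a hypothetical debugging helper, not part of the Deadline API; it just appends timestamped lines to a file in the temp directory, independent of where Deadline routes its own log output:

```python
import datetime
import os
import tempfile

# Hypothetical debug helper: write to a known file so output survives
# regardless of which process (Pulse or a separate housecleaning
# process) runs the event plugin.
LOG_PATH = os.path.join(tempfile.gettempdir(), "housecleaning_debug.log")


def debug_log(message):
    """Append a timestamped line to the debug log file."""
    stamp = datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S")
    with open(LOG_PATH, "a") as handle:
        handle.write("%s %s\n" % (stamp, message))


debug_log("OnHouseCleaning fired")
```

Calling `debug_log(...)` at the top of `OnHouseCleaning` would have shown immediately whether the callback was being invoked at all, which turned out to be the actual problem here.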

Have you checked the Pulse log, if you have Pulse running?

Yep, it's running, and no, there is no entry in the Pulse log. I just realized that when I trigger housecleaning manually via the Monitor, it works. But the scheduled housecleaning doesn't.

:frowning:

It works now! I had to restart Pulse, and since Pulse didn't have the same directory I can store modules in on its search path, the import failed…