AWS Thinkbox Discussion Forums

Deadline - Frames getting stuck in a loop or completing with 0 bytes

Hi everyone,
I’ve written a script and added it to Deadline’s Repository (version 10.2) because I was having the following problems:
1- a few frames get stuck in a loop and never finish rendering, for no apparent reason (Nuke and Blender);
2- a few frames complete in a few seconds with a 0 KB file (Blender only).

The script is in a custom folder under the repository’s “events” folder.
The script is as follows:


from __future__ import absolute_import
import os
import time
from Deadline.Events import DeadlineEventListener
from Deadline.Scripting import PathUtils, RepositoryUtils

def GetDeadlineEventListener():
    return MyDeadlineEventListener()

class MyDeadlineEventListener(DeadlineEventListener):
    def __init__(self):
        self.OnJobStartedCallback += self.OnJobStarted
        self.OnTaskCompletedCallback += self.OnTaskCompleted

    def OnJobStarted(self, job):
        if job.JobStatus == "Rendering":
            self.ScheduleTaskChecking(job)

    def OnTaskCompleted(self, task):
        job = RepositoryUtils.GetJob(task.JobId)
        if job.JobStatus == "Rendering" and task.TaskStatus == "Completed":
            self.ScheduleTaskChecking(job)

    def ScheduleTaskChecking(self, job):
        jobFramesFolder = job.JobOutputDirectories[0]  # Folder of the rendered frames
        completedTasks = RepositoryUtils.GetJobTasks(job, True, "Completed")  # List of completed tasks
        averageTime = self.CalculateAverageTime(completedTasks)  # Average time of the completed tasks

        for task in completedTasks:
            if task.TaskOutputFileByteSize == 0:  # Check whether the task's final output size is zero bytes
                RepositoryUtils.RequeueTask(task)  # Requeue the task

        if len(completedTasks) >= 5:  # Check whether at least five tasks have completed
            renderingTasks = RepositoryUtils.GetJobTasks(job, True, "Rendering")  # List of tasks currently rendering

            for task in renderingTasks:
                if self.IsTaskExceedingTimeLimit(task, averageTime):  # Check whether the render time exceeds the limit
                    task.TaskStatus = "Suspended"  # Suspend the task
                    RepositoryUtils.RequeueTask(task)  # Requeue the task

        self.ScheduleCallback(30)  # Re-run the check every 30 seconds

    def CalculateAverageTime(self, tasks):
        totalSeconds = 0

        for task in tasks:
            taskSeconds = (task.TaskEndTime - task.TaskStartTime).TotalSeconds
            totalSeconds += taskSeconds

        return totalSeconds / len(tasks)

    def IsTaskExceedingTimeLimit(self, task, averageTime):
        taskSeconds = (DateTime.Now - task.TaskStartTime).TotalSeconds
        return taskSeconds > (averageTime * 4)

    def ScheduleCallback(self, interval):
        deadlinePlugin = GetDeadlinePlugin()
        deadlinePlugin.SetScriptCallbackDeadlineCommand(interval)

Its parameter file looks like this:


[State]
Type=Enum
Items=Global Enabled;Disabled
Category=Options
CategoryOrder=0
CategoryIndex=0
Label=State
Default=Disabled
Description=If Global, all jobs and Workers will trigger the events for this plugin. If Disabled, no events are triggered for this plugin.

[FrameReloaderEvent]
Type=enum
Values=On Job Started;On Job Finished;On Job Started and On Job Finished
Category=Options
CategoryOrder=0
Index=1
Default=On Job Started
Label=Perform Frame Check
Description=It analyses frames at 0 KB or stuck in a loop and requeues them.


This is the menu I’ve built

I really have no clue how to fix it. It has to be something simple, but I can’t figure it out.
Thank you in advance

AB

We’d have to see task reports to help figure out what’s causing Nuke and Blender to create 0 byte files. I’d suggest going through the troubleshooting guide with a couple of task reports to see if you can re-create the failure.

Unless there’s an issue in your event plugin, in which case could you elaborate on what’s going wrong?

Hi, I’ve done the troubleshooting and it is inconclusive unfortunately.
There is nothing wrong with Deadline or the software involved; it is more likely that I have problems with the server the data is being written to.
I needed a quick fix but I can’t understand why it isn’t working…
No ideas?

Without task reports it’s hard to say, but if writes to your storage disks are failing I’d check the S.M.A.R.T. results for signs of incoming failure.
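While the storage is being investigated, the 0 KB symptom can also be confirmed directly on disk, independently of Deadline. The following is only a minimal sketch in plain Python: the function name `find_zero_byte_files` is hypothetical, and it assumes the frames for a job are written as regular files into a single output folder.

```python
import os

def find_zero_byte_files(folder):
    """Return the sorted paths of regular files in `folder` that are 0 bytes."""
    empty = []
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        # Skip sub-folders; only regular files count as rendered frames here.
        if os.path.isfile(path) and os.path.getsize(path) == 0:
            empty.append(path)
    return empty
```

Running it against the job's output folder gives a list of frames to requeue by hand, and if empty files keep appearing after a clean re-render, that points further toward the storage side.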

Also, I just realized you’re hooking into OnTaskCompleted, which isn’t an event we provide. The full list is here; we don’t have any task-based events, and that’ll be why ScheduleTaskChecking is never getting run. You could instead look into automatic task timeouts, which do a similar check to what you’ve set up.
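For comparison, the rule the original script was trying to apply (flag a task once it has run longer than four times the average of at least five completed tasks) can be written as plain Python. This is a sketch of that rule only; it does not use the Deadline API, and the function names and the `multiplier` and `min_samples` parameters are hypothetical.

```python
def timeout_threshold(completed_seconds, multiplier=4, min_samples=5):
    """Return the time limit in seconds, or None if there are too few samples."""
    if len(completed_seconds) < min_samples:
        return None
    average = sum(completed_seconds) / len(completed_seconds)
    return average * multiplier

def is_overdue(elapsed_seconds, completed_seconds, multiplier=4, min_samples=5):
    """True if a running task has exceeded the limit derived from finished tasks."""
    limit = timeout_threshold(completed_seconds, multiplier, min_samples)
    return limit is not None and elapsed_seconds > limit
```

With five completed tasks of 60 seconds each, the limit is 240 seconds, so a task that has been running for 300 seconds would be flagged and one at 200 seconds would not.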

Thanks, I appreciate your help so much!
I’ll look into it right away!

Have a great day!


What does the error below mean?

Error: FailRenderException : Renderer returned non-zero error code -1073740791. Check the renderer’s output.
at Deadline.Plugins.DeadlinePlugin.FailRender(String message) (Python.Runtime.PythonException)

Hello @Ayyappan1986

It depends on what render application you are using. I need to look at the full job report to troubleshoot it further. Share the full job report, here’s how: Controlling Jobs — Deadline 10.3.0.10 documentation

You can look at our docs for the meanings of similar errors.
After Effects: After Effects — Deadline 10.3.0.10 documentation
SoftImage: Softimage — Deadline 10.3.0.10 documentation
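As a side note on reading the number itself: assuming the render ran on a Windows machine, a large negative exit code like this one is usually a Windows NTSTATUS value printed as a signed 32-bit integer. A small sketch of the conversion (the helper name `to_ntstatus_hex` is hypothetical):

```python
def to_ntstatus_hex(exit_code):
    """Reinterpret a signed 32-bit exit code as an unsigned hex value."""
    return "0x{:08X}".format(exit_code & 0xFFFFFFFF)

print(to_ntstatus_hex(-1073740791))  # 0xC0000409
```

Microsoft documents 0xC0000409 as STATUS_STACK_BUFFER_OVERRUN, which indicates that the render application itself crashed. It does not say why, so the full job report is still needed to narrow it down.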

Hi zainali, how are you?

We are trying to render Nuke jobs on a machine with 256 GB of RAM and a high-end CPU, but a render that should take 2 minutes keeps running for more than 7 hours. The same jobs render fine on a machine with 48 GB of RAM. We have been trying to find the cause but have not been able to resolve it. Could you please look into it? I’m attaching the render logs as well.

Hi team, we have been waiting for your reply for a long time. Could you please let us know whether you have found a solution? If so, please share it so we can render as many shots as possible during this crunch.

Sorry about the delay - for urgent issues please call us at 1-866-419-0283 ext 2 from 9am to 5pm Central Standard Time, or use the ticket system at https://awsthinkbox.zendesk.com/, they’re both better avenues to highlight urgency.

It looks like there’s something failing in your remote connection server - there are nginx “Bad Gateway” errors coming up when the Worker is attempting to push status updates. Have you got an nginx set up in front of multiple RCS’ to balance load?

The render logs seem ok, Nuke reports it successfully outputs files.

Check your nginx logs, those should explain what’s causing issues communicating with your RCS.
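If the nginx access log is large, a short script can pull out just the 502 responses. This is a rough sketch that assumes the log uses nginx's default "combined" format, where the status code follows the quoted request line; the function name and the sample request path are hypothetical, and the error log is still the better place to look for the cause.

```python
import re

# In the combined format the status code follows the closing quote of the request.
STATUS_AFTER_REQUEST = re.compile(r'" (\d{3}) ')

def find_bad_gateway(lines):
    """Return the log lines whose HTTP status code is 502 (Bad Gateway)."""
    hits = []
    for line in lines:
        match = STATUS_AFTER_REQUEST.search(line)
        if match and match.group(1) == "502":
            hits.append(line.rstrip("\n"))
    return hits

# Hypothetical sample lines, for illustration only.
sample = [
    '10.0.0.5 - - [12/Mar/2024:10:15:32 +0000] "POST /example HTTP/1.1" 502 157 "-" "example-agent"',
    '10.0.0.5 - - [12/Mar/2024:10:15:40 +0000] "POST /example HTTP/1.1" 200 12 "-" "example-agent"',
]
for hit in find_bad_gateway(sample):
    print(hit)
```

The timestamps of the matching lines can then be compared against the times in the Worker log where the status updates failed.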

Also I realize you cannot upload .txt files to the forums, but you can put them into a .zip file instead of having to make all these screenshots.
