Hello, I’ll explain my problem.
I know that to create event scripts in Deadline Monitor, you need two files: the .py and the .param files.
With that said, here is what I need: on my render farm, when a job (with its task IDs and selected frames) is sent to “test_pool” and rendering exceeds 15 minutes, the job should be requeued automatically to avoid stalls, hangs, and similar problems. I am not sure whether this is feasible. I have tried several things, and although the event is detected, it doesn’t fully work.
I’ve been trying with things similar to this:
.py:
from Deadline.Events import DeadlineEventListener
from Deadline.Scripting import RepositoryUtils

def GetDeadlineEventListener():
    return CustomRetryEvent()

def CleanupDeadlineEventListener(deadlinePlugin):
    deadlinePlugin.Cleanup()

class CustomRetryEvent(DeadlineEventListener):
    def __init__(self):
        self.OnJobRunningCallback += self.OnJobRunning

    def Cleanup(self):
        del self.OnJobRunningCallback

    def OnJobRunning(self, job):
        # Specific nodes
        specific_nodes = ["render09"]
        # Specific pool
        specific_pool = "test_2d"
        # Time limit in seconds (45 minutes)
        time_limit = 45 * 60
        if job.JobPool.lower() == specific_pool:
            tasks = RepositoryUtils.GetJobTasks(job, True)
            for task in tasks:
                # Check whether the task is running on one of the specific nodes
                if task.TaskStatus == "Rendering" and task.TaskSlaveName.lower() in specific_nodes:
                    if task.TaskRenderTime.TotalSeconds > time_limit:
                        # Requeue the task
                        RepositoryUtils.RequeueTasks(job, [task])
.param:
EventName=CustomRetryEvent
ScriptFile=CustomRetryEvent.py
Enabled=True
Hi
I’ve reviewed your code and the documentation, but I couldn’t find an OnJobRunningCallback mentioned anywhere. In the past, I faced a similar situation and ended up writing an external watchdog program, which worked fine for me. However, I agree that running it internally would be easier and cleaner. You might want to try using OnJobStartedCallback or OnSlaveStartedCallback instead.
https://docs.thinkboxsoftware.com/products/deadline/10.3/1_User%20Manual/manual/event-plugins.html
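One thing to keep in mind with OnJobStartedCallback: as I understand the docs, it fires once, when the job starts rendering, so at that moment no task can have been rendering for 15 minutes yet. A time limit needs a check that runs repeatedly, which is what my external watchdog did. Below is a minimal sketch of just the selection logic, not the real program. TaskSnapshot and tasks_to_requeue are hypothetical names, and the snapshots are hand-made data standing in for whatever you read from Deadline.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class TaskSnapshot:
    """Hypothetical stand-in for the fields a watchdog would read per task."""
    task_id: int
    pool: str
    status: str
    render_seconds: float


def tasks_to_requeue(tasks: Iterable[TaskSnapshot], pool: str, limit_seconds: float) -> List[int]:
    """Return IDs of tasks in `pool` that are rendering past the time limit."""
    wanted = pool.lower()
    return [
        t.task_id
        for t in tasks
        if t.pool.lower() == wanted
        and t.status == "Rendering"
        and t.render_seconds > limit_seconds
    ]


# Hand-made sample data (assumption: one snapshot per task, taken on each poll).
snapshots = [
    TaskSnapshot(0, "test_2d", "Rendering", 120.0),    # still within the limit
    TaskSnapshot(1, "test_2d", "Rendering", 1000.0),   # over 15 minutes
    TaskSnapshot(2, "test_2d", "Completed", 2000.0),   # finished, ignore
    TaskSnapshot(3, "other_pool", "Rendering", 5000.0),  # different pool, ignore
]

overdue = tasks_to_requeue(snapshots, "test_2d", 15 * 60)
print(overdue)
```

The watchdog would run this on a timer and pass the returned IDs to whatever requeue call you use. Keeping the filter separate from the Deadline calls makes it easy to test without a farm.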
Hi, thank you for answering.
I changed it to OnJobStartedCallback, but it still doesn’t respond. I don’t know if the problem is in the .param.
from Deadline.Events import DeadlineEventListener
from Deadline.Scripting import RepositoryUtils

def GetDeadlineEventListener():
    return CustomRetryEvent()

def CleanupDeadlineEventListener(deadlinePlugin):
    deadlinePlugin.Cleanup()

class CustomRetryEvent(DeadlineEventListener):
    def __init__(self):
        self.OnJobStartedCallback += self.OnJobStarted

    def Cleanup(self):
        del self.OnJobStartedCallback

    def OnJobStarted(self, job):
        # Specific pool
        specific_pool = "test_2d"
        # Time limit in seconds (15 minutes)
        time_limit = 15 * 60
        if job.JobPool.lower() == specific_pool:
            tasks = RepositoryUtils.GetJobTasks(job, True)
            for task in tasks:
                # Check whether the task is rendering and has exceeded the time limit
                if task.TaskStatus == "Rendering":
                    if task.TaskRenderTime.TotalSeconds > time_limit:
                        # Requeue the task
                        RepositoryUtils.RequeueTasks(job, [task.TaskId])
.param:
EventName=CustomRetryEvent
ScriptFile=CustomRetryEvent.py
Enabled=True
Would setting a job timeout automatically work for you?
I’d do it in an OnJobSubmitted event, and use this event as an example. The only issue is I’m not finding a way in the API to set the “OnTaskTimeout=Requeue” property on an already created job.
As for troubleshooting your existing event - your .param isn’t correct. Check out this page for an example and a list of valid options.
Hi Justin, thanks for the help.
The problem with job timeout is that it applies to everything, and I only want to apply it to a specific pool.
Oh, you’re thinking of automatic job timeout, that’ll be set farm-wide. But the other one I posted can be set on a job-by-job basis.
You could do it manually in the Monitor by double clicking the job and going to ‘timeouts’ and adjusting settings there if you’d like to test first without having to write code.
I would also add some logging so you can see what’s happening in the console.
    def OnJobStarted(self, job):
        self.LogInfo(f"OnJobStarted {job.JobId}")
        # Specific pool
        specific_pool = "test_2d"
        # Time limit in seconds (15 minutes)
        time_limit = 15 * 60
Sample .param file
[State]
Type=Enum
Items=Global Enabled;Opt-In;Disabled
Label=State
Default=Disabled
Description=Time out
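If you want a quick sanity check of the file layout before dropping it into the repository, something like the sketch below may help. It uses Python's standard configparser only as a rough structural check. That is my assumption, not how Deadline itself reads .param files, and load_param is a hypothetical helper. It shows that the sample above parses into a [State] section, while the flat key=value file posted earlier has no section header at all.

```python
import configparser


def load_param(text):
    """Parse .param-style text into {section: {key: value}} (structural check only)."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep key case as written (Type, Items, ...)
    parser.read_string(text)
    return {section: dict(parser[section]) for section in parser.sections()}


# The sample .param from this thread.
SAMPLE = """[State]
Type=Enum
Items=Global Enabled;Opt-In;Disabled
Label=State
Default=Disabled
Description=Time out
"""

# The flat file posted earlier in the thread, with no [Section] header.
FLAT = """EventName=CustomRetryEvent
ScriptFile=CustomRetryEvent.py
Enabled=True
"""

settings = load_param(SAMPLE)
items = settings["State"]["Items"].split(";")
print(items)

try:
    load_param(FLAT)
    flat_has_sections = True
except configparser.MissingSectionHeaderError:
    flat_has_sections = False
print(flat_has_sections)
```

This only tells you whether the file is shaped like the sample (named sections with Type, Label, Default, and so on). The list of valid options still comes from the documentation.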