[Solved] Callbacks not getting called

We have a custom event script set up. It lives in the repository's custom events folder with the following structure:

$DEADLINE/custom/events/Update/
    Update.dlinit
    Update.py

The contents of Update.dlinit are:

Enabled=True
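
Since the thread suspects a missing "incantation", one quick sanity check is the plugin layout itself. Below is a minimal sketch that assumes (not confirmed in this thread) that Deadline expects the folder, the .py file, and the .dlinit file to share one name, as they do in the layout above, and that the .dlinit contains Enabled=True. The helper check_event_plugin_layout is hypothetical and not part of Deadline.

```python
import os


def check_event_plugin_layout(plugin_dir):
    """Return a list of problems found in an event plugin folder.

    Assumption: the folder name, the .py file, and the .dlinit file all
    share one name (e.g. Update/Update.py and Update/Update.dlinit), and
    the .dlinit contains Enabled=True.
    """
    problems = []
    name = os.path.basename(os.path.normpath(plugin_dir))
    for ext in (".py", ".dlinit"):
        if not os.path.isfile(os.path.join(plugin_dir, name + ext)):
            problems.append("missing " + name + ext)

    dlinit = os.path.join(plugin_dir, name + ".dlinit")
    if os.path.isfile(dlinit):
        settings = {}
        with open(dlinit) as fh:
            # Parse simple key=value lines; skip anything without "=".
            for line in fh:
                if "=" in line:
                    key, value = line.split("=", 1)
                    settings[key.strip()] = value.strip()
        if settings.get("Enabled", "").lower() != "true":
            problems.append("Enabled=True not set in " + name + ".dlinit")
    return problems
```

Calling it on the repository's custom/events/Update folder should return an empty list when the files are named consistently and the plugin is enabled.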

The contents of Update.py are:

from subprocess import (check_output as runcmd, STDOUT, CalledProcessError)
from Deadline.Events import *

## DeadlineAPI is the stand alone python module.
from DeadlineAPI.DeadlineConnect import DeadlineCon as Connect

all_systems = "all"
aws_linux_group = "linux"
aws_windows_group = "windows"
aws_linux_pool = "linux"
aws_windows_pool = "windows"

command = "asgupdate"


######################################################################
## This is the function that Deadline calls to get an instance of the
## main DeadlineEventListener class.
######################################################################
def GetDeadlineEventListener():
    return Update()


######################################################################
## This is the function that Deadline calls when the event plugin is
## no longer in use so that it can get cleaned up.
######################################################################
def CleanupDeadlineEventListener(deadlinePlugin):
    deadlinePlugin.Cleanup()


######################################################################
## This is the main DeadlineEventListener class for UpdateAWS.
######################################################################
class Update(DeadlineEventListener):
    def __init__(self):
        # Set up the event callbacks here
        self.OnJobSubmittedCallback += self.OnJobSubmittedOrResumed
        self.OnJobResumedCallback += self.OnJobSubmittedOrResumed
        self.OnSlaveStartedCallback += self.OnSlaveStarted
        self.OnSlaveStoppedCallback += self.OnSlaveStopped
        self.OnSlaveIdleCallback += self.OnSlaveIdle
        self.OnSlaveStalledCallback += self.OnSlaveStalled
        self.conn = Connect("localhost", 8080)

    def Cleanup(self):
        del self.OnJobSubmittedCallback
        del self.OnJobResumedCallback
        del self.OnSlaveStartedCallback
        del self.OnSlaveStoppedCallback
        del self.OnSlaveIdleCallback
        del self.OnSlaveStalledCallback
        del self.conn
        self.LogInfo(
            "Cleanup called ...<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<")

    def OnJobSubmittedOrResumed(self, job):
        # Job is an http://docs.thinkboxsoftware.com/products/deadline/7.1/2_Scripting%20Reference/class_deadline_1_1_jobs_1_1_job.html
        self.LogInfo(
            "{}.OnJobSubmittedOrResumed fired, attempting to update desired capacity <<<<<<<<<<<<<<<<<<<<<<<".format(
                self.__class__))
        try:
            runcmd([command, "-op", "job", "-group", job["Plug"]],
                   stderr=STDOUT)
        except CalledProcessError as e:
            self.LogInfo("Failed to run deadliner, exit code {}: {}".format(
                e.returncode, e.output))

    def OnSlaveStarted(self, slave_name):
        # Slave name is a string
        self.LogInfo(
            "{}.OnSlaveStarted fired, attempting to update desired capacity <<<<<<<<<<<<<<<<<<<<<<<".format(
                self.__class__))
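        slave = self.get_slave_info_settings(slave_name)
        is_windows = slave.SlaveInfo.MachineOperatingSystem == "Windows"
        # Use the module-level pool/group names instead of building the
        # literal strings "aws_linux_pool" etc.
        pool = aws_windows_pool if is_windows else aws_linux_pool
        group = aws_windows_group if is_windows else aws_linux_group
        # AddPoolToSlave/AddGroupToSlave need the slave name as well as the pool/group.
        self.LogInfo(str(self.conn.Slaves.AddPoolToSlave(slave_name, pool)))
        self.LogInfo(str(self.conn.Slaves.AddGroupToSlave(slave_name, group)))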

    def OnSlaveStopped(self, slave_name):
        # Slave name is a string
        self.LogInfo(
            "{}.OnSlaveStopped fired, attempting to update desired capacity <<<<<<<<<<<<<<<<<<<<<<<".format(
                self.__class__))
        self.delete_slave(slave_name)

    def OnSlaveIdle(self, slave_name):
        # Slave name is a string
        self.LogInfo(
            "{}.OnSlaveIdle fired, attempting to update desired capacity <<<<<<<<<<<<<<<<<<<<<<<".format(
                self.__class__))
        self.delete_slave(slave_name)

    def OnSlaveStalled(self, slave_name):
        # Slave name is a string
        self.LogInfo(
            "{}.OnSlaveStalled fired, attempting to update desired capacity <<<<<<<<<<<<<<<<<<<<<<<".format(
                self.__class__))
        self.delete_slave(slave_name)

    def get_slave_info_settings(self, slave_name):
        # slave_name is a string
        return self.conn.Slaves.GetSlaveInfoSettings(slave_name)

    def is_aws_node(self, slave_name):
        # slave_name is a string
        snig = self.conn.Slaves.GetSlaveNamesInGroup
        if slave_name in snig(aws_windows_group) or slave_name in snig(
                aws_linux_group):
            return True
        return False

    def delete_slave(self, slave_name):
        ## We only delete the slave if it is an aws node. See the README.
        if self.is_aws_node(slave_name):
            res = self.conn.Slaves.DeleteSlave(slave_name)
            self.LogInfo("Deleted slave {}: {}".format(slave_name, res))
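
The except branch in OnJobSubmittedOrResumed relies on the attributes of subprocess.CalledProcessError: returncode holds the exit status, and output holds the captured text (including stderr, because of stderr=STDOUT). A misspelled attribute there would raise AttributeError inside the handler itself. Here is a minimal sketch of the same pattern as a standalone helper. run_and_report is hypothetical, and a child Python process stands in for the asgupdate command.

```python
import sys
from subprocess import STDOUT, CalledProcessError, check_output


def run_and_report(cmd):
    """Run cmd, merging stderr into stdout, and return (succeeded, message)."""
    try:
        output = check_output(cmd, stderr=STDOUT)
        return True, output.decode("utf-8", "replace").strip()
    except CalledProcessError as e:
        # returncode holds the exit status; output holds the combined
        # stdout/stderr because of stderr=STDOUT.
        text = e.output.decode("utf-8", "replace").strip()
        return False, "exit code {}: {}".format(e.returncode, text)


# Stand-in for a failing "asgupdate" call: a child Python process that
# writes to stderr and exits with status 3.
failing = [sys.executable, "-c",
           "import sys; sys.stderr.write('no such group'); sys.exit(3)"]
print(run_and_report(failing))  # → (False, 'exit code 3: no such group')
```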

The plugin seems to get loaded, since the Cleanup method gets called, but the other callbacks never do. At least, they never reach their LogInfo statements, and no error is logged either.

I am very new to Deadline, so I am sure there is a missing incantation or misspelling or something. Does anyone see anything or have any guidance for me?

We’ll need to have someone review the code to see why it might not be working. Just for my own understanding, can you explain what the script is supposed to be doing?

Hi,

I took a look at your event plugin, and you’re trying to use our Standalone Python API (a wrapper around our RESTful HTTP API) while you’re inside Deadline. That’s unnecessary, as anything executing inside the Deadline environment already has access to our Scripting API:
docs.thinkboxsoftware.com/produc … dline.html

However, I realised that what you’re essentially trying to get an event plugin to do (start/stop/delete VM instances in the AWS cloud based on pool/queue capacity) is exactly what our “Balancer” application has been designed to do! It also has an abstracted Python API that lets you build your own “Balancer Algorithm”, or “Balancer Plugin” as we call it.

Balancer:
docs.thinkboxsoftware.com/produc … ancer.html

Balancer Plugin API:
docs.thinkboxsoftware.com/produc … ugins.html

Good to know this is redundant. I’ll look at converting it to use the built-in Scripting API instead of the Standalone Python API once we have the callbacks getting called as expected. Currently they aren’t called: the LogInfo method never gets exercised in the callbacks, although it does in the cleanup method.

We much appreciate the links to the balancer plugin and API. However, we aren’t looking to alter instances directly, but to alter the auto scaling group’s desired capacity and then let the ASG handle starting/stopping spot instances.

Do you have any ideas why the callbacks aren’t getting called?

Again, I am sure I must be missing something in the magic incantation – a config directive or a case issue or something. The cleanup method gets called but not a single callback does. Thoughts about that?

I think the callbacks are executing, it’s just your use of our built-in LogInfo command is getting borked.

I’m out of the office for the next few days, but at a guess, I would say that the self.LogInfo function probably doesn’t like the use of “.format”. If you want to use that format (pun intended!), I would just use a standard “print” command, like in this example:
github.com/ThinkboxSoftware/Dea … tyClamp.py

We have a few examples of event plugins, if you need to compare your code / sanity check yourself here:
github.com/ThinkboxSoftware/Dea … tom/events

Thanks, Mike Owen!

I’ll convert the LogInfo statements. That hadn’t occurred to me, but I am also new to IronPython, and I can see where that could cause issues. I should get to this in the next day or so and will update you here when I do. In the meantime, enjoy your days out of the office, and Happy Thanksgiving if you celebrate it :)

I added a “{}”.format(“foo”) style string format to the LogInfo statement in the cleanup method, and it still logs fine. Regardless, I changed everything else to use %-style string interpolation for substitutions. Tomorrow we’ll run a couple of jobs and see if the callbacks work…
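
That experiment can be reproduced in plain Python. Both .format and %-interpolation are evaluated before LogInfo is ever called, so LogInfo receives an ordinary string either way, which is consistent with the switch making no difference. A small check, using a made-up slave name:

```python
# Both expressions are evaluated by Python before any logging call happens,
# so a logger such as LogInfo receives the same plain string either way.
slave_name = "render01"  # hypothetical slave name for illustration
formatted = "Deleted slave {}: {}".format(slave_name, "Success")
interpolated = "Deleted slave %s: %s" % (slave_name, "Success")

print(formatted)                  # → Deleted slave render01: Success
print(formatted == interpolated)  # → True
```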

Switching from string formatting to string interpolation made no difference: none of the callbacks were executed.

I can call the attached handler method directly from the cleanup method, and it logs that it is trying to execute. Since I am calling it with the wrong argument type (a string instead of the expected job), it then fails, as expected. So it seems the callbacks really aren’t being executed by Deadline, and I still don’t know why.

Any more thoughts when you get back to the office?

What exact version of Deadline and OS are you running? The EventCallbackListener is indeed working in Deadline, since your event script is being executed: the cleanup function is logging for you. Can you share your latest code? All the examples on our GitHub site work just fine, so you could use them as the most basic reference to see what may be wrong with your script:
github.com/ThinkboxSoftware/Dea … tom/events

To test something simple like “OnJobFinished”, right-click a job in your queue and mark it as complete. Any StdOut will then be displayed in the “Console” panel in the Monitor, and a log report will be generated in the job’s Reports panel. Does anything stand out there?
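
To run that OnJobFinished test in isolation, a stripped-down listener makes it easy to see whether any callback fires at all. The following is a minimal sketch modeled on the structure of Update.py. The class name JobFinishedLogger is hypothetical, the fallback _Event and DeadlineEventListener classes are local stand-ins for testing outside Deadline (they are not part of Deadline’s API), and the sketch has not been verified against a specific Deadline version.

```python
# Inside Deadline, Deadline.Events provides DeadlineEventListener. Outside
# Deadline, a small stand-in (hypothetical, for local testing only) is used.
try:
    from Deadline.Events import DeadlineEventListener
except ImportError:
    class _Event(object):
        """Stand-in for a Deadline callback slot: supports += and fire()."""

        def __init__(self):
            self.handlers = []

        def __iadd__(self, handler):
            self.handlers.append(handler)
            return self

        def fire(self, *args):
            for handler in self.handlers:
                handler(*args)

    class DeadlineEventListener(object):
        """Stand-in base class: creates *Callback slots on first use."""

        def __getattr__(self, name):
            if name.endswith("Callback"):
                event = _Event()
                setattr(self, name, event)
                return event
            raise AttributeError(name)

        def LogInfo(self, message):
            print(message)


def GetDeadlineEventListener():
    return JobFinishedLogger()


def CleanupDeadlineEventListener(deadlinePlugin):
    deadlinePlugin.Cleanup()


class JobFinishedLogger(DeadlineEventListener):
    def __init__(self):
        # Register a single handler, using the same += pattern as Update.py.
        self.OnJobFinishedCallback += self.OnJobFinished
        self.last_message = None

    def Cleanup(self):
        del self.OnJobFinishedCallback

    def OnJobFinished(self, job):
        # JobName and JobId are properties of Deadline's Job object.
        self.last_message = "OnJobFinished fired for %s (%s)" % (
            job.JobName, job.JobId)
        print(self.last_message)  # StdOut, shown in the Monitor's Console panel
        self.LogInfo(self.last_message)
```

Outside Deadline, calling GetDeadlineEventListener().OnJobFinishedCallback.fire(job) with any object that has JobName and JobId attributes exercises the handler; inside Deadline, marking a job complete should trigger it.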

Until last Friday we were running the latest 7.1.x.x code. On Friday it was upgraded to run 7.2.0.18. I now see the callbacks getting called. So it “seems” that the upgrade fixed it.

I’ll modify the plugin to stop using the Standalone Python API, and I’ll post any further questions in a new thread, as this one is now solved :)

Thanks for your help Mike Owen!

Cool. I’m glad it’s now working for you.

The EventListener’s logging methods are definitely broken in 7.1. Glad to hear they’re fixed in 7.2 though.