AWS Thinkbox Discussion Forums

Slaves don't update status with startup Event

If I run this startup Event plugin the slaves never come online. If you connect to the slave log they think they’re online and waiting for jobs but they aren’t in the monitor as online.

Also when I enable this event it errors in the monitor.
ConfigSlave.zip (1.24 KB)

I’ve made some tweaks to your event plugin. See attached. This is working fine for me (no errors, and the Slave always comes online). I do get an RPC error, but that is networking/Windows related, and it shows that the plugin is indeed executing fine from Deadline’s point of view.

What errors do you get?

ConfigSlave.zip (1.29 KB)

Same error when enabling:

An unexpected error occured while Saving Event Plugin Settings: The given key was not present in the dictionary. (Deadline.Plugins.PluginException)
   at Deadline.events.SandboxedEventsmanager.a(DeadlineMessage A_O)
   at Deadline.Events.SandboxedEventManager.CheckForUpdates()
   at Deadline.Monitor.WorkItems.SaveEventPLuginSettingsWI.InternalDoWork()
   at Deadline.Monitor.MonitorWorkItem.DoWork()

Also I do get the output of the event (usually) but it still doesn’t show up as a slave in the monitor.

2016-11-03 11:52:24: BEGIN - RENDER-VM-01\renderadmin
2016-11-03 11:52:24: Deadline Slave 8.0 [v8.0.7.3 Release (f33fcb7d3)]
2016-11-03 11:52:25: Scanning for auto configuration
2016-11-03 11:52:26: Auto Configuration: A ruleset has been received from Pulse
2016-11-03 11:52:26: Connecting to repository
2016-11-03 11:52:28: Info Thread - Created.
2016-11-03 11:52:41: Hello World: Render-VM-01

That’s the last of the log. It never does anything after that.

Hmmm, I’ve tested this quite a bit this morning and afternoon on 8.1.5.3, which is in line with 8.0.11.1, and I’ve never hit any error at all. I wonder if the event sandbox has either died or been orphaned, and you simply need to update to 8.0.11.1, as per the release notes in 8.0.10.4:

docs.thinkboxsoftware.com/produc … e-8-0-10-4

Can you update?

No change.

2016-11-03 12:41:28: BEGIN - RENDER-VM-11\renderadmin
2016-11-03 12:41:28: Deadline Slave 8.0 [v8.0.11.1 Release (75400c3ee)]
2016-11-03 12:41:29: Scanning for auto configuration
2016-11-03 12:41:30: Auto Configuration: A ruleset has been received from Pulse
2016-11-03 12:41:30: Connecting to repository
2016-11-03 12:41:31: Info Thread - Created.
2016-11-03 12:41:36: Hello World: render-vm-11

I should note: this is apparently limited to running as a service. I just ran it as a regular desktop app and it worked fine. I shut it down, started it back up as a service, and got the same non-connection.

Ah. Run as Service. Have you tried not using subprocess - shell=True?
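For anyone following along, here is a minimal sketch of what dropping shell=True can look like with Python's standard subprocess module. The helper name run_without_shell is hypothetical and is not taken from the attached plugin, and the stand-in command simply runs the Python interpreter rather than getmac. Pointing stdin at the null device is my assumption about what is sensible for a process launched from a service, not something confirmed in this thread.

```python
import os
import subprocess
import sys


def run_without_shell(args):
    """Run a command given as an argument list; return (exit_code, output_text)."""
    with open(os.devnull, "rb") as devnull:
        process = subprocess.Popen(
            args,                      # a list of arguments, so no shell parses the command
            shell=False,
            stdin=devnull,             # assumption: never wait on console input
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # fold stderr into the captured output
        )
        output, _ = process.communicate()
    return process.returncode, output.decode("utf-8", "replace")


if __name__ == "__main__":
    # Stand-in command; swap in the real executable and its arguments as a list.
    code, text = run_without_shell([sys.executable, "-c", "print('Hello World')"])
    print(code, text.strip())
```

The executable and each argument are passed as separate list items, which is what lets the call work without a shell in between.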

docs.thinkboxsoftware.com/produc … a15c781df7

I’ve not tested this, but see if this helps?

ConfigSlave.py.zip (831 Bytes)

No dice.

I’ve actually managed to hit the “The given key was not present in the dictionary.” multiple times now. It seems to be caused by making changes to the param file while developing a plugin, and should be unrelated to the Slave not starting.

Try re-naming Mike’s variant to “ConfigSlave2” and see if it helps. If so, I’mma need a database dump of the “EventSettings” collection to see if yours is broken like mine is. You can just run “mongodump --port=27080” in the database’s bin directory and e-mail it to me directly (edwin@thinkboxsoftware.com). I’ll add it to the internal tracking issue we’ve got for this.

Can you just pull the existing database dump from a week ago? I was getting the error months ago so it should be ‘up to date’.

Renaming the event to ‘2’ does fix the problem.

EDIT: Both problems, the error message and the failure to show up online.

Hmm, the reason it’s not failing anymore is apparently that the ‘2’ version of the script needed an update before it would function, so it was erroring out before executing the script.

v2 doesn’t error in the monitor but does still fail on slave startup.

Sorry about not seeing your posts until now.

Yeah, it looks like the problem is caused by duplicate entries for the plugin’s settings in the database. We’re going to be fixing that in 9.0 when we can change the DB format. For the time being though, we need to find out how we’re making duplicates. For me, I think it’s because I copied the script over from my main folder to my ‘custom/events’ folder.

Here’s your DB entries for the plugin before the rename:

{
    "_id" : "57226b3119418f19044501a3",
    "Name" : "ConfigSlave",
    "Icon" : null,
    "Limits" : [ ],
    "DlInit" : {
        "Enabled" : "False",
        "RenderPool" : "",
        "RenderGroup" : "",
        "State" : "Disabled"
    }
}
{
    "_id" : "57226b312eceb80538c35252",
    "Name" : "ConfigSlave",
    "Icon" : null,
    "Limits" : [ ],
    "DlInit" : {
        "Enabled" : "False",
        "RenderPool" : "",
        "RenderGroup" : "",
        "State" : "Disabled"
    }
}

They’re identical, so I’m blaming this on the duplicates.
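If you want to check your own dump for the same condition, a rough sketch along these lines would flag plugin names that appear more than once. It assumes the documents have already been loaded into plain Python dicts shaped like the two entries above; the function name find_duplicate_names and the third sample entry are made up for illustration, and nothing here talks to the database.

```python
from collections import Counter


def find_duplicate_names(documents):
    """Return the sorted plugin names that appear in more than one document."""
    counts = Counter(doc.get("Name") for doc in documents)
    return sorted(
        name for name, count in counts.items()
        if name is not None and count > 1
    )


if __name__ == "__main__":
    # Trimmed copies of the two entries above, plus one hypothetical entry.
    docs = [
        {"_id": "57226b3119418f19044501a3", "Name": "ConfigSlave"},
        {"_id": "57226b312eceb80538c35252", "Name": "ConfigSlave"},
        {"_id": "hypothetical-id", "Name": "ConfigSlave2"},
    ]
    print(find_duplicate_names(docs))
```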

Also, this issue is on the sprint, so I’m expecting it to be fixed soon assuming things go smoothly.

Interesting. It doesn’t explain, though, why slaves can’t report being online with either plugin.

I wonder if it’s how long it takes to run getmac. We’ve hit issues with WMIC calls that can take upwards of 15 minutes to return (caused some problems with the NIC/Disk usage reporting in Deadline 7.1).

I’m going to modify your example to have a 3 minute pause in it to see if it causes the Slave some trouble. Can you try changing the command to something like an echo? Also, we should probably be fixing this MAC address issue if it’s something that’ll benefit everyone.
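As a way to test the slow-command theory outside of Deadline, here is a small sketch that runs a command with an upper bound on how long it may take. It assumes Python 3 (the interpreter bundled with a given Deadline version may differ), the name run_with_timeout is hypothetical, and the two demo commands are stand-ins that launch the Python interpreter instead of echo or getmac.

```python
import subprocess
import sys


def run_with_timeout(args, timeout_seconds):
    """Return the command's output text, or None if it did not finish in time."""
    try:
        completed = subprocess.run(
            args,
            stdin=subprocess.DEVNULL,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            timeout=timeout_seconds,  # the child is killed once this elapses
        )
    except subprocess.TimeoutExpired:
        return None
    return completed.stdout.decode("utf-8", "replace")


if __name__ == "__main__":
    quick = [sys.executable, "-c", "print('echo')"]
    slow = [sys.executable, "-c", "import time; time.sleep(30)"]
    print(repr(run_with_timeout(quick, 30)))  # finishes well inside the limit
    print(repr(run_with_timeout(slow, 2)))    # gives up after about 2 seconds
```

Returning None on a timeout lets the caller decide whether to carry on without the result instead of blocking indefinitely.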

Is this resolved fully in latest builds?

Doesn’t seem to be working yet. It hangs on "ProcessUtils.WaitForExit ( process, -1 )".

Hmm. Have you tried replacing the getmac command with say, c:\windows\system32\cmd.exe /c echo to see if it works with some other tool?

Everything on the Deadline side looks good. I’m just wondering if getmac is the issue. Maybe we can call ipconfig /all and grab the MAC address for the particular machine the Slave is starting on.
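To make the ipconfig /all idea concrete, here is a sketch of pulling the MAC addresses out of that command's text output. It assumes the English-locale "Physical Address" label; the function name extract_mac_addresses and the sample text (including its addresses) are invented for illustration and are not output from any machine in this thread.

```python
import re

# Matches lines such as "   Physical Address. . . . . . . . . : 00-11-22-33-44-55"
# (assumes the English-locale label printed by ipconfig /all).
PHYSICAL_ADDRESS = re.compile(
    r"Physical Address[ .]*:\s*([0-9A-Fa-f]{2}(?:-[0-9A-Fa-f]{2}){5})\s*$"
)


def extract_mac_addresses(ipconfig_text):
    """Return every MAC address found in ipconfig /all output, in order, upper-cased."""
    macs = []
    for line in ipconfig_text.splitlines():
        match = PHYSICAL_ADDRESS.search(line)
        if match:
            macs.append(match.group(1).upper())
    return macs


if __name__ == "__main__":
    # Made-up sample text shaped like ipconfig /all output.
    sample = (
        "Ethernet adapter Ethernet:\n"
        "   Description . . . . . . . . . . . : Example Adapter\n"
        "   Physical Address. . . . . . . . . : 00-11-22-33-44-55\n"
        "   DHCP Enabled. . . . . . . . . . . : Yes\n"
    )
    print(extract_mac_addresses(sample))
```

A machine with several adapters will return several addresses, so the caller would still need to choose which one applies.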

The problem is we’re running our farm on Hyper-V, so the GetMac call is getting the MAC of the host system for WOL, not the VM. We want the host machine to shut down when no VMs are running and then start up when one of its VMs is activated.

Doesn’t the SpawnProcess python command require an exe and command and startupdir?
