GetSlavesInfoSettings throwing Errors

BlueLegend78 · October 19, 2018, 2:57pm

Hi,

In the standalone API, the Slaves.Slaves.GetSlavesInfoSettings function is supposed to select all slaves if no arguments are given. However, when I call it with no arguments, I end up with this error:

Error: The given SlaveInfo and SlaveSettings must be for the same Slave. (System.ArgumentException)

So I assumed that one or more of the slaves are corrupted somehow. To debug, I made a list of every slave in the slave tab and passed that into GetSlavesInfoSettings, and it worked as intended. I also tried the other two functions, Slaves.GetSlaveInfos() and Slaves.GetSlavesSettings(), without any arguments, and both work as intended as well. The number of results returned from those two functions is also consistent with the number of slaves we currently have on the farm. What could be causing the first function to error out?

We are currently using Deadline 7.2
Thanks!

eamsler · October 19, 2018, 4:38pm

That is some obscure Deadline trivia there. Do you have multiple Slaves on the same machine?

It’s rare the SlaveInfo (updates regularly and includes RAM/IP/etc) exists without SlaveSettings (overrides, CPU affinity, etc). I have this weird feeling there was a bug back then…

Can you paste your test code and I’ll run it over here in 10.0 land?

BlueLegend78 · October 30, 2018, 3:24pm

Hey, sorry for the late response.

The code would be just GetSlavesInfoSettings() from the standalone api. That alone would already give the error above.

I found some more information here.
If I right-click a Deadline task in the Monitor, I get the following error on the console. I get the same error if I click on connect to slave log.

2018-10-30 11:15:45: Traceback (most recent call last):
2018-10-30 11:15:45: File "DeadlineMonitor\UI\Controls\TaskListControl.py", line 1173, in selectedSlaveInfoSettings
2018-10-30 11:15:45: ArgumentException: The given SlaveInfo and SlaveSettings must be for the same Slave.
2018-10-30 11:15:45: at Deadline.Slaves.SlaveInfoSettings..ctor(SlaveInfo slaveInfo, SlaveSettings slaveSettings)
2018-10-30 11:15:45: at Deadline.StorageDB.MongoDB.MongoSlaveStorage.GetSlaveInfoSettings(Boolean invalidateCache)
2018-10-30 11:15:45: Traceback (most recent call last):
2018-10-30 11:15:45: File "DeadlineUI\UI\Commands\TaskCommands.py", line 1663, in InnerUpdate
2018-10-30 11:15:45: File "DeadlineMonitor\UI\Controls\TaskListControl.py", line 1160, in selectedSlaveInfos
2018-10-30 11:15:45: TypeError: argument 2 to map() must support iteration

Another error I see is when I first open Deadline Monitor. This appears on the console:

2018-10-30 11:26:29: Error occurred while updating slave cache: The given SlaveInfo and SlaveSettings must be for the same Slave. (System.ArgumentException)

This might also be related: under job properties → machine limit, we see these hash codes in the slave list, even though they are not listed in the slave window.

eamsler · October 30, 2018, 4:40pm

Ah, that Slave list is interesting… Here’s how you can delete them from the database:

First, run “C:\DeadlineDatabase10\mongo\application\bin\deadline_mongo.bat”

Then search for the Slaves you want to remove by running the following:

use deadline10db;
db.SlaveSettings.find({_id: "[random_letters_and_numbers]"}, {"Name":1}).pretty()

Then remove it with this:

db.SlaveSettings.deleteOne({_id: "[random_letters_and_numbers]"})

You’ll want to replace “[random_letters_and_numbers]” with the actual IDs you’re seeing in the “Slave List” there. If you’ve removed them successfully, it will show output like the following (note the “1” on “deletedCount”; this example happens to be from a SlaveInfo delete, but the result has the same shape):

> db.SlaveInfo.deleteOne({_id: "MyMachine"})
{ "acknowledged" : true, "deletedCount" : 1 }

BlueLegend78 · October 30, 2018, 4:58pm

Here is a snippet of code that I just used for testing. If I just call GetSlavesInfoSettings() without any arguments, it errors. But if I call it with a list of every slave on the farm, it is fine. I’m wondering if removing those 2 corrupted slaves from the database would fix this.

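from pprint import pprint

# `connection` is assumed to be an already-created standalone API connection object.

# One slave name per line, as copied from the slave tab.
raw = """L924
L925
L926
L927
L928
L929
…
…
…
every slave on deadline """

slave_list = raw.split("\n")
print(len(slave_list), slave_list)

foo = connection.Slaves.GetSlavesInfoSettings(slave_list)  # this is okay
# foo = connection.Slaves.GetSlavesInfoSettings()  # this will error

pprint(foo)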

kwatts · October 30, 2018, 7:30pm

Hey Edwin,

If we look in SlaveInfo, there is a count of 418 slaves, which matches the number of slaves that show up in the slaves panel.
If we take a look at SlaveSettings, there are 812 entries; from the looks of it, all the extra entries have Mongo object IDs as their IDs.

Are we safe to remove all the extra SlaveSettings entries?
Of course, we will back up the collection first.

Cheers
Kym
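
For anyone wanting to repeat the comparison described above, here is a minimal sketch of it in plain Python. It assumes you have already pulled the list of IDs out of each collection yourself; the function name `find_unpaired_ids` and the sample IDs are made up for illustration and are not part of Deadline.

```python
def find_unpaired_ids(info_ids, settings_ids):
    """Return (settings_only, info_only) as sorted lists of string IDs."""
    info = set(str(i) for i in info_ids)
    settings = set(str(s) for s in settings_ids)
    # Entries present in one collection but missing from the other.
    return sorted(settings - info), sorted(info - settings)


# Made-up IDs for illustration; real ones would come from the two collections.
info_ids = ["l924", "l925", "l926"]
settings_ids = ["l924", "l925", "l926", "5bd8a1f2c3e4d5a6b7c8d9e0"]

settings_only, info_only = find_unpaired_ids(info_ids, settings_ids)
print("SlaveSettings without SlaveInfo:", settings_only)
print("SlaveInfo without SlaveSettings:", info_only)
```

Anything that lands in the first list is a candidate for cleanup, after the backup.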

eamsler · October 31, 2018, 3:33pm

Yeah, please do clean out those extra entries.

I have no idea why you’re seeing those. I’ve adjusted my instructions above to account for that.

I’m not seeing it on my side here. I’ll see if our production farm has any of these wayward object ids.

kwatts · October 31, 2018, 6:43pm

Confirmed that blowing them all away fixes the issue.

eamsler · November 1, 2018, 2:50pm

I have some details around this too. The current guess is that if you create a SlaveSettings object and save it, it will likely not have its “_id” set and MongoDB will grant it that usual ObjectID.

Now, my guess at the moment is that this is happening via the web service, as that bypasses some of the core code. @BlueLegend do you have any standalone API code that modifies Slave Settings in any way? I know you’re both using the Standalone API, so I have to assume that’s the commonality here.
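
If that guess is right, the stray entries should stand out because their IDs look like MongoDB-generated ObjectIDs (24 hexadecimal characters when printed) rather than machine names. A rough sketch for sorting a list of IDs on that basis follows; the helper names and sample IDs are hypothetical, and a Slave whose real name happens to be exactly 24 hex characters would be misclassified.

```python
import re

# An ObjectID prints as exactly 24 hexadecimal characters.
OBJECT_ID_RE = re.compile(r"[0-9a-fA-F]{24}")


def looks_like_object_id(value):
    """True if the value's string form has the shape of an ObjectID."""
    return OBJECT_ID_RE.fullmatch(str(value)) is not None


def split_ids(ids):
    """Split IDs into (named, generated) lists, preserving order."""
    named = [i for i in ids if not looks_like_object_id(i)]
    generated = [i for i in ids if looks_like_object_id(i)]
    return named, generated


# Made-up sample: two machine names and one generated-looking ID.
named, generated = split_ids(["l924", "5bd8a1f2c3e4d5a6b7c8d9e0", "l925"])
print("named:", named)
print("generated:", generated)
```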

BlueLegend78 · January 10, 2019, 10:28pm

Yup, I just did it again. Turns out I was using the standalone API to run Slaves.SaveSlaveSettings on a SlaveInfoSettings dictionary by mistake. This would cause a crash in the database for a few moments and generate all those extra entries.
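
One way to avoid repeating that mistake is a small guard in front of the save call. This is only a sketch: `check_slave_settings` is a hypothetical helper, and it assumes the combined dictionary wraps its parts under "Info" and "Settings" keys, which should be checked against what your API version actually returns. The "_id" check follows from the explanation above that a settings object saved without one gets a generated ID.

```python
def check_slave_settings(candidate):
    """Return candidate unchanged, or raise ValueError if it looks wrong."""
    if not isinstance(candidate, dict):
        raise ValueError("expected a dict of Slave settings")
    # Assumed shape of a combined SlaveInfoSettings dict: wrapper keys.
    if "Info" in candidate or "Settings" in candidate:
        raise ValueError("this looks like a combined SlaveInfoSettings dict")
    slave_id = candidate.get("_id")
    if not isinstance(slave_id, str) or not slave_id:
        raise ValueError("settings dict has no usable '_id'")
    return candidate


# Intended use (not executed here, needs a live connection):
# connection.Slaves.SaveSlaveSettings(check_slave_settings(settings))
```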