.17 Update - WoL not working

delineator · June 29, 2018, 2:30pm

Hey all, there might be a bug in the .17 update. WoL and power management worked fine yesterday (on .16 - was able to wake up machines via the “start machine” command).

However, in .17, I’m getting this error:

Failure: Could not send Wake-on-LAN message to 192.168.1.81 because: is not a valid MAC address Parameter name: macAddress (System.ArgumentException) (System.Exception)

It appears to be trying to send the WoL packet to the local IP instead of the MAC address, but I’m not sure why. Pulse is up and running, btw.

eamsler · June 29, 2018, 4:27pm

I’m hoping it’s just a configuration issue. Could you check that Slave’s settings and make sure there isn’t an IP address in the MAC Address box?

delineator · June 29, 2018, 6:03pm

Nope, both overrides are blank. WoL does still work (using standalone WoL utilities), so it seems to be an issue with mixing up MAC/IP addresses in .17?

edit: it’s something with pulse and running launcher as a service. investigating now

delineator · June 29, 2018, 7:31pm

nope, reverting to the non-service install of launcher did not fix it. Seems like it’s an issue with .17 - tried it on various machines running pulse to eliminate variables. I have been having issues with pulse “running” but not actually running and not performing pending job scans. Not sure if it’s all related to .17 pulse or a separate WoL bug or what

eamsler · July 2, 2018, 2:01pm

I’ll file an issue for when the guys are back tomorrow. Canada Day long weekend at the office.

Doing some thinking here, I don’t think it’s mixing up MAC and IP but is instead saying it’s trying to wake X by using a MAC address, then it’s saying it’s not valid… I tried breaking my MAC address in the UI and that didn’t work. I’ll see what the dev team say.
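For reference, here is a minimal Python sketch of what a WoL sender has to do, which may help show why an empty or malformed MAC value fails before any packet is sent. This is not Deadline’s code; the function names (`parse_mac`, `build_magic_packet`, `send_wol`) and the default broadcast address and port are assumptions made for illustration only.

```python
import re
import socket


def parse_mac(mac: str) -> bytes:
    """Return the 6 raw bytes of a MAC address, or raise ValueError.

    Accepts "00:11:22:33:44:55", "00-11-22-33-44-55" or "001122334455".
    An empty string or an IP such as "192.168.1.81" is rejected.
    """
    digits = re.sub(r"[:\-]", "", mac.strip())
    if not re.fullmatch(r"[0-9A-Fa-f]{12}", digits):
        raise ValueError(f"{mac!r} is not a valid MAC address")
    return bytes.fromhex(digits)


def build_magic_packet(mac: str) -> bytes:
    """Build the magic packet: 6 bytes of 0xFF, then the MAC repeated 16 times."""
    return b"\xff" * 6 + parse_mac(mac) * 16


def send_wol(mac: str, broadcast: str = "255.255.255.255", port: int = 9) -> None:
    """Broadcast the magic packet over UDP (address and port are assumed defaults)."""
    packet = build_magic_packet(mac)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(packet, (broadcast, port))


if __name__ == "__main__":
    # Hypothetical MAC address; replace with the target machine's own.
    send_wol("00:11:22:33:44:55")
```

The point of the sketch is that the target is identified by its MAC address alone. If the lookup for that MAC comes back empty, validation fails no matter which IP the message mentions.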

eamsler · July 3, 2018, 2:36pm

Hmm. No luck here reproducing with SP17 with just my machine.

You’re using Pulse to forward the request because you’re connecting from home, correct? I remember discussions before about it, but I wanted to make sure. We did change things now to properly forward through Pulse after your suggestion, so that’s probably what’s up here.

Update: As soon as I redirected through Pulse I got the same:
[attachment=0]2018-07-03 09_40_20-Remote Commands.png[/attachment]

We’ll get this fixed up.

eamsler · July 4, 2018, 6:11pm

Okay, while we’ve got my regression case solved, we’re not exactly sure if we’ve got yours solved, Jay.

Would you have some time today to give me a call and we can try messing around with that feature? It’d be nice if you could upgrade first in case we managed to get it in the SP.

The short of the fix is that we were looking up the machine based on the host name, and given that you don’t have overrides I’m not sure it’ll help.

delineator · July 5, 2018, 4:09pm

To follow up, I updated to .17.5 (was running .17.4 previously), and WoL is now working. Not sure how to explain it, but it seems good now. Everything else is exactly the same (vpn, same repo settings, same RCS settings, launcher installed not as a service, etc.).

Anywho, one problem down, on to testing the next one!

eamsler · July 5, 2018, 4:24pm

Great! I’ll close out the issue then.

delineator · July 5, 2018, 7:41pm

So, weird quick update: I switched over to launcher as a service and got some crazy issues like pulse running multiple instances, crashing out, etc. Switched back, seemed ok, but still got some disconnection issues. Weird.

I made my own machine the prime pulse, and it seemed ok, but then it would run a power management check (which I have set up to put machines to sleep), and my machine would turn off. huh. In super user mode, any attempt to put any machine to sleep (which has worked before) puts MY own workstation to sleep.

Not sure what to make of this, but obviously I have to disable all the power management stuff for right now so I can get some work done. It’s clearly, for some reason, routing sleep and pulse commands to whatever machine is the prime?

eamsler · July 6, 2018, 2:43pm

Now, the sleep call is particularly weird and I’ll try and drive with that and see what I can figure out. It’s running through a mechanism that shouldn’t have been touched by the WoL changes, but we’ll see what goes on here. I did have trouble getting my machine to go to sleep using Pulse the other day so maybe there is some sort of routing issue…

Update:

I tried some experiments by creating two fake machines with IP addresses of “10.1.2.3” and “255.255.255.255”, and also tried with and without Pulse running and with and without redirection enabled. The only thing that came from that was this error message:

could not resolve slave name (MyMachine) to IP address. The machine may not exist on the network.

What happens if you ping the machines you’re trying to sleep? What IP addresses does it come up with?
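If it’s easier than reading ping output, a small Python sketch like the one below lists every address a hostname resolves to, split by IPv4 and IPv6. It only uses the standard library; the helper name `resolve_all` and the hostname `render-node-01` are placeholders I made up, so swap in your real machine names.

```python
import socket


def resolve_all(hostname: str) -> dict:
    """Return the addresses a hostname resolves to, grouped by IP version."""
    labels = {socket.AF_INET: "IPv4", socket.AF_INET6: "IPv6"}
    found = {"IPv4": [], "IPv6": []}
    # Restrict to one socket type so each address is not reported several times.
    for family, _type, _proto, _canon, sockaddr in socket.getaddrinfo(
        hostname, None, type=socket.SOCK_STREAM
    ):
        label = labels.get(family)
        if label and sockaddr[0] not in found[label]:
            found[label].append(sockaddr[0])
    return found


if __name__ == "__main__":
    # Hypothetical hostnames; replace with the Pulse machine and a target Slave.
    for name in ("render-node-01", socket.gethostname()):
        try:
            print(name, resolve_all(name))
        except socket.gaierror as exc:
            print(name, "could not be resolved:", exc)
```

Running it on the machine that sends the command and comparing the result for the Pulse machine against a target Slave should show whether the names resolve to different addresses, and whether the system is preferring IPv6 entries.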

delineator · July 6, 2018, 3:47pm

Yeah, just confirmed - I tried to restart a machine thru monitor, and it restarted whatever machine is running prime on Pulse. The weird thing is the IP is listed correctly for the slave in monitor (no hostname or IP overrides at all). However, when I sent the command, the remote command panel popped up, and under the machine name column, it pointed to the Pulse IP address, not the IP address of the target.

Pinging hostnames, I get an IPv6 address for both the pulse machine and the machine I was trying to restart - which is odd because I have completely disabled IPv6 on my router. Again, I’m not very good with networking stuff, but something seems odd?

eamsler · July 9, 2018, 3:47pm

Apparently we got it reproduced here. I think my test failed too soon to see the issue. I’m following up with the dev team on getting this guy fixed.

I’ll also see what workarounds we can provide in the meantime.