
[7.1.0.27] closed VMs showing as stalled in Monitor

After a job is done and the Balancer has spun the VMs down, they still show up in red as stalled. Shouldn’t they disappear if they no longer exist? I have red in my ledger…er…Monitor.

-ctj

Indeed they should. Thanks for the report. We are investigating.

Eric looked into this, and that behavior would be expected when launching and terminating VMs from the Cloud panel, since that does not automatically refresh the Slaves list. However, the Balancer should clean up the Slaves list after terminating VM instances. Were you running the Balancer when you saw this behavior? If so, do you see anything that looks like an error message in the Deadline Console or the Balancer log?

Thanks for the response, folks. Yes, that’s with the Balancer. I don’t see any log errors. We upgraded to 7.1.0.30 this morning and still have the same issue, btw.

Just to clarify: the VMs are stopped by the Balancer, but they are still showing in the Monitor slave list as ‘stalled’, even though they are no longer with us.

Thanks for the info. We are continuing to look into this.

Hey ctj,

I have a couple of questions about your setup. How long is it taking for your instances to shut down? I ask because it’s possible the Balancer is deleting the slave when it terminates the instance, but the instance takes too long to shut down and the slave reports its status again before shutting down, re-creating its entry. You’d probably notice it being removed and then coming back, though.

Also, are you doing anything to change the hostname of the instance? That could cause a problem for the Balancer when it tries to remove those slave entries.

Thanks,
Eric

Morning Eric,

I have a couple of instances with a last status update of 2015/04/24, and they are still in my Monitor as stalled. We are not changing the names of the instances between when the Balancer launches them and when it is supposed to get rid of them. The Balancer is spinning them down just fine, so I would assume it would remove them just fine as well.

-ctj

Hey ctj,

Can you check your Balancer log for when it requested those instances be shut down and compare that time with the last status update time? If the status time is significantly later than the deletion time, it may be that the slaves are taking too long to shut down and are updating their status after the Balancer has deleted their entries in the slave list.
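
Something like this quick Python sketch is what I mean (the timestamp format and the two example times are just guesses for illustration, not the actual Balancer log format):

```python
from datetime import datetime

# Assumed timestamp format for illustration -- check it against your
# Balancer log and the Monitor's "Last Status Update" column.
TIME_FORMAT = "%Y/%m/%d %H:%M:%S"

def updated_after_termination(termination_time, last_status_time):
    """Return True if the slave reported status after the Balancer
    asked for its instance to be terminated (the suspected race)."""
    terminated = datetime.strptime(termination_time, TIME_FORMAT)
    last_status = datetime.strptime(last_status_time, TIME_FORMAT)
    return last_status > terminated

# Hypothetical example: termination requested at 10:15:02, but the slave
# still phoned home at 10:16:40, which would re-create its entry.
print(updated_after_termination("2015/04/24 10:15:02",
                                "2015/04/24 10:16:40"))  # True
```

If the second time is consistently later, that would line up with the race I described above.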

I’m really not sure what else it could be. If there were hostname weirdness, you’d have extra dummy entries in the slave list. Hopefully we can get to the bottom of this soon.

Thanks,
Eric

After some modifications, we’re no longer getting the ‘stalled’ or ‘idle’ nodes, but they are still hanging around as ‘offline’. Granted, that’s a great improvement, but I’d love the Balancer to delete them entirely.

baby steps.

-ctj

Hey Chris,

Looks like there was a bug with my original fix. The fix will be in the next 7.1 release, which sounds like it’ll be out tomorrow.

Thanks for testing this stuff out and thanks for all your patience.

Eric
