Worker shutdown by spot event plugin

Hi,

I am not sure if this is a bug or my lack of understanding of the Spot Event Plugin. I have the Spot Event Plugin enabled for two specific groups in Deadline. I also have the idle timeout set to 8 minutes, after which a spot instance is terminated.

I have an on-demand instance running a worker for a third group that is not part of the Spot Event Plugin configuration. But the plugin is also stopping the worker process on that instance after 8 minutes. Is the Spot Event Plugin supposed to stop the worker even though it is not in any of the groups in the spot configuration?

Thanks

So I’ve gone poking into the Spot.py file that drives the Spot Event Plugin (SEP) on my 10.1.9.2 install.

Looking at the OnSlaveInfoUpdated function, we’re assuming that any worker that can successfully hit the AWS metadata endpoint is an AWS worker, and therefore under the control of the SEP. That explains why it’s killing your On-Demand machine: it runs on AWS, so it can reach the metadata endpoint even though its group isn’t in the spot configuration.

If you’re comfortable with Python and our API, you could add a check for your On-Demand Worker’s Group at the top of OnSlaveInfoUpdated so it gets ignored by the SEP.
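As a rough idea of what that check could look like, here is a minimal sketch of the group test on its own. The helper name is_excluded_from_sep and the group name "ondemand" are made up for illustration, and how you get the Worker’s group list inside Spot.py is an assumption you’d need to verify against the scripting reference for your Deadline version.

```python
# Hypothetical group name(s); replace with the group(s) your
# On-Demand Workers actually belong to.
EXCLUDED_GROUPS = {"ondemand"}


def is_excluded_from_sep(worker_groups, excluded_groups=EXCLUDED_GROUPS):
    """Return True if the Worker is in any group the SEP should ignore."""
    # Normalise case and whitespace so "OnDemand" and "ondemand" match.
    normalized = {group.strip().lower() for group in worker_groups}
    return any(group.strip().lower() in normalized for group in excluded_groups)


# Intended use, at the very top of OnSlaveInfoUpdated in Spot.py
# (not runnable outside Deadline; the lookup is an assumption,
# e.g. something like RepositoryUtils.GetSlaveSettings(name, True).SlaveGroups):
#
#     if is_excluded_from_sep(worker_groups):
#         return  # leave this Worker alone
```

Returning early before any of the idle-shutdown logic runs means the rest of the function doesn’t need to change. Keep a copy of the original Spot.py, since an upgrade may overwrite the edit.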

Let me know if the Worker’s group is the best way to identify that worker or if there’s a better way, and whether you need a hand making that change to the Spot.py file.

Hi Justin,

I finally got a chance to add the check to the OnSlaveInfoUpdated method, and it is working well. I changed it to ignore any instances in a particular group. Thanks for your help.
