Spot Plugin Not Handling Limits Well Across Groups

Hi there,

We’re running into issues with the Spot event plugin not handling limits that are shared across groups very well. Is anyone else facing this, and if so, how are you working around it?

Example issue scenario we’re facing:

Jobs that require a limit, for example “arnold”, can be submitted to one of two groups, 3d_medium or 3d_high, depending on the machine specs required to complete the job.
When job A is submitted to 3d_medium, the Spot plugin will scale that group’s Spot fleet until all of the “arnold” limits are in use.
Now job B is submitted to 3d_high, with a higher priority than job A. As all the “arnold” limits are in use, the Spot plugin won’t scale up the 3d_high fleet. Only when enough of job A has finished, and workers in 3d_medium become idle and spin down, can 3d_high start spinning up.
The real issue is that when we’re busy, users continually submit to 3d_medium, so 3d_high never gets a chance to spin up.

It seems like the Spot plugin needs to take into account the priority of jobs in other groups that require the same limit when determining the target capacity of a group’s Spot fleet.
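
To make that concrete, here is a rough Python sketch of the kind of priority-aware split I have in mind. It is not the Spot plugin's actual code and it does not use the Deadline API. The Job class, the target_capacities function, and the assumption that each queued task needs one worker holding one limit are all made up for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    group: str
    priority: int       # higher value = more important
    pending_tasks: int  # queued tasks; assumed one limit per task


def target_capacities(jobs, limit_total):
    """Split a shared limit across groups, highest-priority jobs first.

    Hypothetical helper: returns {group: target fleet capacity}.
    """
    targets = defaultdict(int)
    remaining = limit_total
    # sorted() is stable, so jobs with equal priority keep their input order.
    for job in sorted(jobs, key=lambda j: j.priority, reverse=True):
        if remaining <= 0:
            break
        granted = min(job.pending_tasks, remaining)
        targets[job.group] += granted
        remaining -= granted
    return dict(targets)


jobs = [
    Job("A", "3d_medium", priority=50, pending_tasks=100),
    Job("B", "3d_high", priority=80, pending_tasks=30),
]
print(target_capacities(jobs, limit_total=40))
# {'3d_high': 30, '3d_medium': 10}
```

With 40 “arnold” limits, job B's group would get a target of 30 and job A's group would be capped at 10, rather than 3d_medium taking all 40 because it was submitted first.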

This is a particular issue if your farm is fully cloud-based and relies on the Spot plugin, with no supplementary static instances.

Keen to hear whether others are facing this and how they’re resolving it.