This one is a bit weird to explain, but some of our slaves have been losing their group/pool assignments.
It always happens to the same slaves, in our case the last 35 slaves on our list.
Sometimes they lose their assignments on their own at random. It’s also sometimes repeatable: if I edit a group of slaves (e.g. right click > modify slave properties > exclude jobs in none group), then after I click OK the pools and/or groups are often blank on those last 35 slaves.
We upgraded through a lot of beta builds recently, so this week we cleared the database and started from scratch. After setting up the repository fresh we’re still having the issue, but now it’s only the last 13 slaves. So far it’s happened 3 times, each time on those same nodes. We probably had more slaves online prior to the database refresh. Is it possible this behavior only happens with a large number of slaves (> 230)?
Let me know if you guys have any ideas to try or any useful info I can gather.
We’ll be upgrading to beta 5 this week and will report back if anything changes.
Hey Brian,
Just to confirm, were you seeing this behavior with a 6.2 beta version, or were you still on 6.1? We had made some tweaks in 6.2 that we thought would fix this problem, so if you were seeing this with a previous 6.2 beta, then obviously we still have some work to do.
The problem seems to be related to the way the slave’s settings are cached in the Monitor. In 6.1 and earlier, the slave’s state would be loaded first, followed by the slave’s settings (because they’re stored in separate collections in the database). So in that initial loading phase, the slaves would have a default slave settings object, which would contain empty pools and groups lists. If the slave’s settings were modified and committed during this initial loading phase, then the previous settings would get wiped.
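To make the sequence concrete, here is a minimal Python sketch of that race. It is not Deadline's actual code: `SlaveSettings`, `database`, `cache`, and the slave name `node-231` are all hypothetical stand-ins, and the "commit" is reduced to copying the cached object back into a dictionary.

```python
from dataclasses import dataclass, field


@dataclass
class SlaveSettings:
    # A default-constructed object has empty pools and groups lists.
    pools: list = field(default_factory=list)
    groups: list = field(default_factory=list)
    excluded_groups: list = field(default_factory=list)


def copy_settings(s):
    return SlaveSettings(list(s.pools), list(s.groups), list(s.excluded_groups))


# Hypothetical stand-in for the settings collection in the database.
database = {"node-231": SlaveSettings(pools=["render"], groups=["linux"])}

# Hypothetical stand-in for the Monitor's in-memory cache.
cache = {}


def load_states(names):
    # State arrives first, so each slave gets a placeholder settings object.
    for name in names:
        cache[name] = SlaveSettings()


def load_settings():
    # The real settings arrive later and replace the placeholders.
    for name, settings in database.items():
        cache[name] = copy_settings(settings)


def modify_and_commit(name, excluded_group):
    # Edit whatever is in the cache, then write the whole object back.
    settings = cache[name]
    settings.excluded_groups.append(excluded_group)
    database[name] = copy_settings(settings)


load_states(["node-231"])
modify_and_commit("node-231", "none")  # edit lands inside the loading window
load_settings()

print(database["node-231"])
```

Because the edit is applied to the placeholder object and the whole object is written back, the stored pools and groups end up empty even though only the exclusion list was meant to change.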
We thought we could address this in 6.2 by loading the slave settings first, and then the slave states. That way, in theory, any slaves you see in the list during the initial loading phase will already have had their settings object loaded.
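A rough Python sketch of the settings-first ordering, under stated assumptions: the two collections are modeled as plain dictionaries, the names `load_settings_first`, `settings_collection`, and `state_collection` are made up for illustration, and skipping a slave whose settings are missing is an assumption of this sketch rather than confirmed Monitor behavior.

```python
def load_settings_first(settings_collection, state_collection):
    # Step 1: load every settings object before anything is shown.
    settings_cache = {name: dict(s) for name, s in settings_collection.items()}

    # Step 2: load states; a slave only enters the list once its
    # real settings are available, never with a default placeholder.
    slave_list = {}
    for name, state in state_collection.items():
        settings = settings_cache.get(name)
        if settings is None:
            # Assumption for this sketch: hold the slave back instead of
            # showing it with empty pools and groups.
            continue
        slave_list[name] = {"state": state, "settings": settings}
    return slave_list


# Hypothetical data: node-232 has a state record but no settings yet.
settings_collection = {"node-231": {"pools": ["render"], "groups": ["linux"]}}
state_collection = {"node-231": "Idle", "node-232": "Idle"}

slave_list = load_settings_first(settings_collection, state_collection)
print(slave_list)
```

With this ordering, anything a user can select and modify in the list already carries its stored pools and groups, so committing an edit cannot write back an empty default.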
Cheers,
Ryan
Hey Ryan, I’m sorry, I should have specified: this was still on 6.1 (release build). Sounds good! We’ll be upgrading to 6.2 in the coming days.
Just confirming, as expected this issue is fixed with 6.2.
We’ve been on 6.2.0.23 now for a few days, smooth sailing so far on Linux & OSX!
Thanks Ryan & Thinkbox!
Awesome! Thanks so much for confirming!