AWS Thinkbox Discussion Forums

Loss of power = scrambled slave settings

Weird…

Just to confirm, are slave settings actually getting wiped, or is it just a display issue in your Monitor? The pending changes you are referring to are really just a difference between what your Monitor thinks the slave settings are and what the slaves think they are.

Can you post the last 5 monitor logs from your machine? I’m wondering if maybe you’re having connectivity issues to the database, which results in your display being messed up. It shouldn’t really have anything to do with the launcher…

Thanks!
Ryan

Okay, well, the previous times this happened, I didn’t know about it until a compositor told me that no 2D jobs were rendering. By the time I caught it, all of the pools had been completely wiped. So yes, it’s actually wiping the pools and other settings. That was 3 times ago, and since then I’ve figured out that you can prevent a total loss of the settings by closing the monitor and/or the launcher. Laszlo and I verified that the pending wipes could be seen on multiple computers, including his workstation, and other artists could see them too.

After closing my monitor/launcher and re-opening them everything was reverted (machines that should have been disabled were again disabled, and pools, groups, descriptions were normal again). I’m totally confused how this would happen, but if it’s a monitor display bug it’s happening everywhere, and if I had ignored it I’m sure I’d be getting a call from some artists by now about their jobs sitting idle.
deadlinemonitor-LAPRO2056-2014-02-06-0000.log (1008 Bytes)
deadlinemonitor-LAPRO2056-2014-02-05-0001.log (15.2 KB)
deadlinemonitor-LAPRO2056-2014-02-05-0000.log (77 KB)

Here’s the other two logs.
deadlinemonitor-LAPRO2056-2014-02-06-0002.log (588 Bytes)
deadlinemonitor-LAPRO2056-2014-02-06-0001.log (895 Bytes)

Just wanted to point out again that this sort of thing appears to be kickstarted when either Robert or I start the monitor when we come into work in the morning. I’m wondering if there are some stale commands sitting in my launcher that get released when the monitor is started, or something like that. All of our workstations are supposed to kick into render mode each night, and upon entering farm mode the Deadline monitor is killed, but the launcher remains. Maybe that helps narrow down the search.

We may actually have a lead on what could be causing this. The fact that multiple monitors were showing it helped to steer us in the right direction.

We double checked our slave auto-updating code, and currently we load all the slave states first, and then the slave settings (this is because they’re stored in separate documents in the database). So when you launch the Monitor, the slaves will initially have “default” slave settings until those settings are loaded in. This could explain the problem you’re seeing, and if you were to modify any slave settings during this initial load (before the slave settings get loaded in), that could explain why they might get wiped.
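To make the suspected race concrete, here is a minimal Python sketch of it. This is not Deadline's actual code: `SlaveSettings`, `MonitorCache`, and `simulate_early_edit` are hypothetical names, and the sketch assumes that saving a change writes the whole cached settings object back to the database, which is not confirmed in this thread.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SlaveSettings:
    pools: List[str] = field(default_factory=list)
    groups: List[str] = field(default_factory=list)
    description: str = ""


class MonitorCache:
    """Hypothetical Monitor-side cache; not Deadline's real class."""

    def __init__(self) -> None:
        self.settings: Dict[str, SlaveSettings] = {}

    def load_states(self, slave_names: List[str]) -> None:
        # States arrive first, so every slave gets placeholder default settings.
        for name in slave_names:
            self.settings.setdefault(name, SlaveSettings())

    def load_settings(self, db: Dict[str, SlaveSettings]) -> None:
        # The real settings arrive later and replace the placeholders.
        for name, stored in db.items():
            self.settings[name] = SlaveSettings(
                list(stored.pools), list(stored.groups), stored.description
            )

    def save_description(
        self, db: Dict[str, SlaveSettings], name: str, description: str
    ) -> None:
        # ASSUMPTION: a save writes the whole cached object back, not one field.
        cached = self.settings[name]
        cached.description = description
        db[name] = SlaveSettings(
            list(cached.pools), list(cached.groups), cached.description
        )


def simulate_early_edit() -> SlaveSettings:
    # Hypothetical database contents for a single slave.
    db = {"render01": SlaveSettings(["2d"], ["nuke"], "Comp node")}
    monitor = MonitorCache()
    monitor.load_states(["render01"])  # settings not loaded yet
    monitor.save_description(db, "render01", "edited during load")
    return db["render01"]
```

Under that assumption, an edit made before `load_settings` runs saves the placeholder defaults, so the stored pools and groups come back empty.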

We think the solution to this would be to load the slave settings first, and then load the slave states afterwards. It would mean that on initial load, the slaves will show up in an “unknown” state, but at least their settings will be accurate, and should in theory fix this problem. After making this change, we’re going to populate our test database with a few thousand slaves to see if we can reproduce what you’re seeing.
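As a rough illustration of the proposed ordering, the Python sketch below loads settings first and fills in states afterwards. The names `SlaveRow` and `load_settings_then_states` and the dictionary-shaped inputs are assumptions made for the example, not Deadline's real data model.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class SlaveRow:
    name: str
    pools: List[str] = field(default_factory=list)
    state: str = "Unknown"  # shown until the state document is loaded


def load_settings_then_states(
    settings_docs: Dict[str, List[str]],
    state_docs: Dict[str, str],
) -> Tuple[Dict[str, Tuple[List[str], str]], Dict[str, SlaveRow]]:
    """Build rows from settings first, then apply states.

    Returns a snapshot taken between the two passes plus the final rows,
    so the intermediate "Unknown" state can be inspected.
    """
    rows: Dict[str, SlaveRow] = {}

    # Pass 1: settings. Pools are accurate from the moment a row exists.
    for name, pools in settings_docs.items():
        rows[name] = SlaveRow(name=name, pools=list(pools))

    snapshot = {name: (list(row.pools), row.state) for name, row in rows.items()}

    # Pass 2: states. Only the state field changes; settings are untouched.
    for name, state in state_docs.items():
        if name in rows:
            rows[name].state = state

    return snapshot, rows
```

With this ordering, a row that is visible mid-load has an "Unknown" state but never carries default settings, so an early edit has nothing wrong to write back.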

Cheers,
Ryan

That’s great, I’m glad you guys have some leads. Let me know if you need any more logs or info; I’m glad to help.

Robert noticed this happened again soon after he logged in around 12:35 today. This time, it appeared to affect only a portion of the machines that have been offline since the last version. They have their status, description, pools, and groups blanked out. I’m attaching a screenshot to better show what is happening, along with two monitor logs from my machine from the time this happened.
deadlinemonitor-LAPRO2056-2014-02-07-0002.log (673 Bytes)
deadlinemonitor-LAPRO2056-2014-02-07-0001.log (5.81 KB)

And here are the same logs from Robert’s machine.
deadlinemonitor-LAPRO2047-2014-02-07-0002.log (670 Bytes)
deadlinelauncher-LAPRO2047-2014-02-07-0003.log (3.31 KB)

Hey guys, just wanted to follow up on this one. We have not experienced this sort of thing since upgrading to 6.2.

Thanks again for all the assistance! :smiley:

That’s great to hear! Thanks for letting us know.

Cheers,
Ryan
