AWS Thinkbox Discussion Forums

Loss of power = scrambled slave settings

We had another power outage over the weekend that affected the entire farm. Once the power came back, most of the slave settings had been lost or, in some cases, apparently randomized. 100% of our secondary slaves had lost descriptions, pools, groups, and even status (disabled slaves were now enabled). A large number of our primary slaves also lost these settings completely; some lost only part of their settings, while others were given totally new settings, e.g. a different pool assignment. On a small handful of machines, secondary slaves were spontaneously created where there hadn’t been any secondary slaves in the first place.

While looking at the log files of some of these slaves, it looks like all of the changes were coming from my workstation. It’s strange, because I’m fairly confident that my machine didn’t even have the monitor open during this time (our workstations kill the monitor upon entering render farm mode and it would have been in render mode during this period of time).

I’m told that we lost power around 2:45 pm on Saturday, and the power didn’t come back on for a couple of hours. The log entries show that I blanked out the pools at 8:35 pm on every slave I have looked at.

2013/12/23 11:55:17 scanlinevfx LAPRO0203 (LAPRO0203\ScanlineVFX): Modified Pool list to: [2dshared, python]
2013/12/23 11:57:14 scanlinevfx LAPRO0203 (LAPRO0203\ScanlineVFX): Modified Pool list to: [2dshared, python]
2013/12/23 11:58:28 jon.bird LAPRO2056 (LAPRO2056\ScanlineVfx_user): Modified Group list to: [nuke, python]
2013/12/23 12:12:12 jon.bird LAPRO2056 (LAPRO2056\ScanlineVfx_user): Modified Pool list to: [2dshared, python]
2013/12/23 16:34:59 jon.bird LAPRO2056 (LAPRO2056\ScanlineVfx_user): Modified Pool list to: [2dshared, python]
2013/12/23 17:47:08 viet.nguyen lapro2009.scanlinevfxla.com (lapro2009.scanlinevfxla.com\viet): Modified Pool list to: [2dshared, python]
2013/12/26 16:11:56 viet.nguyen lapro2009.scanlinevfxla.com (lapro2009.scanlinevfxla.com\viet): Modified Pool list to: [2dshared, python]
2013/12/31 14:45:42 jon.bird LAPRO2056 (LAPRO2056\ScanlineVfx_user): Modified Pool list to: [2dshared, python]
2013/12/31 17:16:26 robert.crowther VCPRO1007 (SCANLINEVFXLA\robert.crowther): Modified Pool list to: [2dshared, python]
2013/12/31 18:02:33 robert.crowther VCPRO1007 (SCANLINEVFXLA\robert.crowther): Modified Pool list to: [2dshared, python]
2014/01/11 20:35:24 jon.bird LAPRO2056 (LAPRO2056\ScanlineVfx_user): Modified Pool list to: []
2014/01/11 20:42:44 jon.bird LAPRO2056 (LAPRO2056\ScanlineVfx_user): Modified Group list to: [python, nuke]
2014/01/11 20:52:45 jon.bird LAPRO2056 (LAPRO2056\ScanlineVfx_user): Modified Pool list to: [all, python, 2dshared]
2014/01/13 11:15:46 jon.bird LAPRO2056 (LAPRO2056\ScanlineVfx_user): Modified Pool list to: [all, python, 2dshared]
2014/01/13 14:28:34 jon.bird LAPRO2056 (LAPRO2056\ScanlineVfx_user): Viewed Slave History.

Can you check the Monitor logs on your machine to see if they indicate if your Monitor was running or not? You can launch your Monitor and then select Help -> Explore Log Folder, and then see if there are logs from the weekend. The “Modified Pool list to” messages only come from the Pools Management dialog, and only if OK is pressed, so I can’t see how these could have come from anywhere else except your Monitor…
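For anyone sifting through slave history by hand, here is a minimal sketch of how the history lines quoted in this thread could be parsed to flag every entry that sets the pool list to []. The regular expressions are inferred from the pasted lines only, not from any official Deadline log specification, and the names HistoryEntry, parse_history_line, pool_list and find_pool_wipes are made up for illustration.

```python
import re
from datetime import datetime
from typing import Iterable, List, NamedTuple, Optional

# Layout inferred from the history lines pasted in this thread (assumption):
# "YYYY/MM/DD HH:MM:SS user machine (account): action"
HISTORY_RE = re.compile(
    r"^(?P<ts>\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) "
    r"(?P<user>\S+) (?P<machine>\S+) \((?P<account>[^)]*)\): "
    r"(?P<action>.*)$"
)
POOL_RE = re.compile(r"^Modified Pool list to: \[(?P<pools>.*)\]$")


class HistoryEntry(NamedTuple):
    timestamp: datetime
    user: str
    machine: str
    action: str


def parse_history_line(line: str) -> Optional[HistoryEntry]:
    """Return a parsed entry, or None for lines that do not match the layout."""
    m = HISTORY_RE.match(line.strip())
    if m is None:
        return None
    ts = datetime.strptime(m.group("ts"), "%Y/%m/%d %H:%M:%S")
    return HistoryEntry(ts, m.group("user"), m.group("machine"), m.group("action"))


def pool_list(entry: HistoryEntry) -> Optional[List[str]]:
    """Return the pool names for a pool-change entry, or None for other actions."""
    m = POOL_RE.match(entry.action)
    if m is None:
        return None
    return [p.strip() for p in m.group("pools").split(",") if p.strip()]


def find_pool_wipes(lines: Iterable[str]) -> List[HistoryEntry]:
    """Collect every entry whose action sets the pool list to []."""
    wipes = []
    for line in lines:
        entry = parse_history_line(line)
        if entry is not None and pool_list(entry) == []:
            wipes.append(entry)
    return wipes
```

Running find_pool_wipes over the saved history of each slave gives the timestamp, user and machine for every wipe, which makes it easier to line them up against Monitor start and stop times.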

Looks like you were right, the monitor was running. Sorry for the late response. This is all I could find in the logs that relate to the power outage. I narrowed down the timeframe of the power failure to around 2:54 pm on Saturday the 11th. Is there any other log I can dig in that would help narrow this down?

deadlinemonitor-LAPRO2056-2014-01-11-0000.log

2014-01-11 14:24:43: BEGIN - LAPRO2056\ScanlineVfx_user
2014-01-11 14:24:43: Error occurred while updating slave cache: Timeout waiting for a MongoConnection. (System.TimeoutException)
2014-01-11 15:08:24: Error occurred while updating job cache: Timeout waiting for a MongoConnection. (System.TimeoutException)
2014-01-11 15:08:26: Error occurred while updating pulse cache: Timeout waiting for a MongoConnection. (System.TimeoutException)

We could take a look at the mongodb log from that time to see what commands the database was receiving. The log will probably be pretty big, so if you just upload +/- 30 minutes or so around the time the slave settings were wiped, that should be sufficient.

Cheers,
Ryan

Started up Monitor this morning and found that almost all of the slaves we had set as disabled and offline were enabled again but were missing the description and comment. Looked in the slave history and it didn’t show history of it being enabled. But there was history of Jbird and myself changing pools which we did not do. The history log shows:

2014/01/11 20:52:42 jon.bird LAPRO2056 (LAPRO2056\ScanlineVfx_user): Slave disabled
2014/01/13 11:15:42 jon.bird LAPRO2056 (LAPRO2056\ScanlineVfx_user): Modified Pool list to: []
2014/01/14 14:35:25 jon.bird LAPRO2056 (LAPRO2056\ScanlineVfx_user): Modified Pool list to: []
2014/01/14 15:06:10 robert.crowther LAPRO2047 (LAPRO2047\robert.crowther): Modified Pool list to: []
2014/01/15 14:49:22 robert.crowther LAPRO2047 (LAPRO2047\robert.crowther): Modified Pool list to: []
2014/01/15 14:53:48 robert.crowther LAPRO2047 (LAPRO2047\robert.crowther): Modified Pool list to: []
2014/01/15 18:17:02 jon.bird LAPRO2056 (LAPRO2056\ScanlineVfx_user): Modified Pool list to: []
2014/01/15 18:19:24 jon.bird LAPRO2056 (LAPRO2056\ScanlineVfx_user): Modified Pool list to: []
2014/01/16 11:15:37 jon.bird LAPRO2056 (LAPRO2056\ScanlineVfx_user): Modified Pool list to: []
2014/01/20 10:36:15 jon.bird LAPRO2056 (LAPRO2065\ScanlineVfx_user): Modified Pool list to: []
2014/01/20 10:55:20 jon.bird LAPRO2056 (LAPRO2056\ScanlineVfx_user): Modified Pool list to: []
2014/01/20 11:02:44 jon.bird LAPRO2056 (LAPRO2056\ScanlineVfx_user): Modified Pool list to: []
2014/01/20 12:29:01 jon.bird LAPRO2056 (LAPRO2056\ScanlineVfx_user): Modified Pool list to: []
2014/01/20 15:50:45 jon.bird LAPRO2056 (LAPRO2056\ScanlineVfx_user): Modified Pool list to: []

Most of our secondary slaves have also had their description and comments erased. The pools and groups were wiped clean on over half of them as well. The ones that kept their pools and groups were given the [] pool.

We didn’t have a power outage last night so I’m not too sure what could have caused this.

Today we noticed what I think are two distinct issues.

#1 All of our slaves that were both 1) disabled in Deadline and 2) on machines that were offline had their pools, groups, descriptions, and comments removed, and the slaves were re-enabled.

#2 All secondary slaves had their information removed as well.

Here is a log that shows what happened today on 100% of the secondary slaves running. The log also shows what happened on the 11th/12th, which was our power outage. Today I came in to work around 9:45 am and opened my monitor. The monitor remained open all day up to this point. Now when I look at the logs, it appears I cleared out all of the pools on the secondary slaves. My workstation is lapro2056.

2013/12/06 16:57:46 jon.bird LAPRO2056 (LAPRO2056\ScanlineVfx_user): Modified Pool list to: [all, 2dshared, python]
2013/12/06 17:25:57 jon.bird LAPRO2056 (LAPRO2056\ScanlineVfx_user): Modified Pool list to: [all, 2dshared, python]
2013/12/08 16:04:50 stephan.trojansky LAPRO1044 (LAPRO1044\ScanlineVFX): Slave disabled
2013/12/09 11:42:16 stephan.trojansky LAPRO1044 (LAPRO1044\ScanlineVFX): Slave enabled
2013/12/09 12:41:05 jon.bird LAPRO2056 (LAPRO2056\ScanlineVfx_user): Modified Pool list to: [all, 2dshared, python]
2013/12/12 22:33:24 stephan.trojansky LAPRO1044 (LAPRO1044\ScanlineVFX): Modified Pool list to: [all, 2dshared, python]
2013/12/13 14:47:03 jon.bird LAPRO2056 (LAPRO2056\ScanlineVfx_user): Modified Pool list to: [all, 2dshared, python]
2013/12/13 16:01:37 jon.bird LAPRO2056 (LAPRO2056\ScanlineVfx_user): Modified Pool list to: [all, 2dshared, python]
2013/12/16 14:08:03 jon.bird LAPRO2056 (LAPRO2056\ScanlineVfx_user): Modified Pool list to: [all, 2dshared, python]
2013/12/17 16:42:54 viet.nguyen lapro2009.scanlinevfxla.com (lapro2009.scanlinevfxla.com\viet): Modified Pool list to: [all, 2dshared, python]
2013/12/19 18:13:49 robert.crowther VCPRO1007 (SCANLINEVFXLA\robert.crowther): Modified Pool list to: [all, 2dshared, python]
2013/12/20 13:40:54 viet.nguyen lapro2009.scanlinevfxla.com (lapro2009.scanlinevfxla.com\viet): Modified Pool list to: []
2013/12/20 14:20:02 scanlinevfx LAPRO0203 (LAPRO0203\ScanlineVFX): Modified Pool list to: []
2013/12/20 14:20:13 scanlinevfx LAPRO0500 (LAPRO0500\scanlinevfx): Modified Pool list to: []
2013/12/20 14:21:11 scanlinevfx LAPRO0203 (LAPRO0203\ScanlineVFX): Modified Pool list to: []
2013/12/20 14:22:09 scanlinevfx LAPRO0500 (LAPRO0500\scanlinevfx): Modified Pool list to: []
2013/12/20 14:22:19 scanlinevfx LAPRO0203 (LAPRO0203\ScanlineVFX): Modified Pool list to: []
2013/12/20 14:24:01 scanlinevfx LAPRO0203 (LAPRO0203\ScanlineVFX): Modified Pool list to: [all, python, 2dshared]
2013/12/20 14:24:05 scanlinevfx LAPRO0500 (LAPRO0500\scanlinevfx): Modified Pool list to: []
2013/12/20 14:27:02 scanlinevfx LAPRO0203 (LAPRO0203\ScanlineVFX): Modified Pool list to: []
2013/12/20 14:27:05 scanlinevfx LAPRO0500 (LAPRO0500\scanlinevfx): Modified Pool list to: []
2013/12/20 14:29:00 scanlinevfx LAPRO0203 (LAPRO0203\ScanlineVFX): Modified Pool list to: []
2013/12/20 14:29:25 scanlinevfx LAPRO0500 (LAPRO0500\scanlinevfx): Modified Pool list to: []
2013/12/20 14:30:19 scanlinevfx LAPRO0203 (LAPRO0203\ScanlineVFX): Modified Pool list to: []
2013/12/20 14:31:32 scanlinevfx LAPRO0500 (LAPRO0500\scanlinevfx): Modified Pool list to: []
2013/12/20 14:32:00 scanlinevfx LAPRO0203 (LAPRO0203\ScanlineVFX): Modified Pool list to: [all, python, 2dshared]
2013/12/20 14:33:58 scanlinevfx LAPRO0500 (LAPRO0500\scanlinevfx): Modified Pool list to: []
2013/12/20 14:35:42 scanlinevfx LAPRO0500 (LAPRO0500\scanlinevfx): Modified Pool list to: [all, python, 2dshared]
2013/12/20 14:37:33 scanlinevfx LAPRO0203 (LAPRO0203\ScanlineVFX): Modified Pool list to: [all, python, 2dshared]
2013/12/20 14:37:37 scanlinevfx LAPRO0500 (LAPRO0500\scanlinevfx): Modified Pool list to: [all, python, 2dshared]
2013/12/20 14:38:57 scanlinevfx LAPRO0203 (LAPRO0203\ScanlineVFX): Modified Pool list to: [all, python, 2dshared]
2013/12/20 14:41:00 jon.bird LAPRO2056 (LAPRO2056\ScanlineVfx_user): Modified Pool list to: [all, python, 2dshared]
2013/12/20 14:50:25 jon.bird LAPRO2056 (LAPRO2056\ScanlineVfx_user): Modified Group list to: [nuke, python]
2013/12/20 15:30:00 jon.bird LAPRO2056 (LAPRO2056\ScanlineVfx_user): Modified Pool list to: [all, python, 2dshared]
2013/12/20 18:03:22 jon.bird LAPRO2056 (LAPRO2056\ScanlineVfx_user): Modified Pool list to: [all, python, 2dshared]
2013/12/20 18:22:05 jon.bird LAPRO2056 (LAPRO2056\ScanlineVfx_user): Modified Pool list to: [all, python, 2dshared]
2013/12/20 18:39:19 robert.crowther VCPRO1007 (SCANLINEVFXLA\robert.crowther): Modified Pool list to: [all, python, 2dshared]
2013/12/23 11:53:54 scanlinevfx LAPRO0203 (LAPRO0203\ScanlineVFX): Modified Pool list to: [all, python, 2dshared]
2013/12/23 11:55:17 scanlinevfx LAPRO0203 (LAPRO0203\ScanlineVFX): Modified Pool list to: [all, python, 2dshared]
2013/12/23 11:57:14 scanlinevfx LAPRO0203 (LAPRO0203\ScanlineVFX): Modified Pool list to: [all, python, 2dshared]
2013/12/23 12:12:12 jon.bird LAPRO2056 (LAPRO2056\ScanlineVfx_user): Modified Pool list to: [all, python, 2dshared]
2013/12/23 16:34:59 jon.bird LAPRO2056 (LAPRO2056\ScanlineVfx_user): Modified Pool list to: [all, python, 2dshared]
2013/12/23 17:47:09 viet.nguyen lapro2009.scanlinevfxla.com (lapro2009.scanlinevfxla.com\viet): Modified Pool list to: [all, python, 2dshared]
2013/12/26 16:11:57 viet.nguyen lapro2009.scanlinevfxla.com (lapro2009.scanlinevfxla.com\viet): Modified Pool list to: [all, python, 2dshared]
2013/12/31 14:45:42 jon.bird LAPRO2056 (LAPRO2056\ScanlineVfx_user): Modified Pool list to: [all, python, 2dshared]
2013/12/31 17:16:27 robert.crowther VCPRO1007 (SCANLINEVFXLA\robert.crowther): Modified Pool list to: [all, python, 2dshared]
2013/12/31 18:02:35 robert.crowther VCPRO1007 (SCANLINEVFXLA\robert.crowther): Modified Pool list to: [all, python, 2dshared]
2014/01/11 20:35:24 jon.bird LAPRO2056 (LAPRO2056\ScanlineVfx_user): Modified Pool list to: []
2014/01/11 20:44:39 jon.bird LAPRO2056 (LAPRO2056\ScanlineVfx_user): Modified Group list to: [python, nuke]
2014/01/11 20:52:46 jon.bird LAPRO2056 (LAPRO2056\ScanlineVfx_user): Modified Pool list to: [all, python, 2dshared]
2014/01/13 11:15:46 jon.bird LAPRO2056 (LAPRO2056\ScanlineVfx_user): Modified Pool list to: [all, python, 2dshared]
2014/01/14 14:35:29 jon.bird LAPRO2056 (LAPRO2056\ScanlineVfx_user): Modified Pool list to: [all, python, 2dshared]
2014/01/14 15:06:16 robert.crowther LAPRO2047 (SCANLINEVFXLA\robert.crowther): Modified Pool list to: [all, python, 2dshared]
2014/01/15 14:49:26 robert.crowther LAPRO2047 (SCANLINEVFXLA\robert.crowther): Modified Pool list to: [all, python, 2dshared]
2014/01/15 14:53:52 robert.crowther LAPRO2047 (SCANLINEVFXLA\robert.crowther): Modified Pool list to: [all, python, 2dshared]
2014/01/15 18:17:05 jon.bird LAPRO2056 (LAPRO2056\ScanlineVfx_user): Modified Pool list to: [all, python, 2dshared]
2014/01/15 18:19:27 jon.bird LAPRO2056 (LAPRO2056\ScanlineVfx_user): Modified Pool list to: [all, python, 2dshared]
2014/01/16 11:15:40 jon.bird LAPRO2056 (LAPRO2056\ScanlineVfx_user): Modified Pool list to: [all, python, 2dshared]
2014/01/20 10:36:18 jon.bird LAPRO2056 (LAPRO2056\ScanlineVfx_user): Modified Pool list to: [all, python, 2dshared]
2014/01/20 10:55:24 jon.bird LAPRO2056 (LAPRO2056\ScanlineVfx_user): Modified Pool list to: [all, python, 2dshared]
2014/01/20 11:02:48 jon.bird LAPRO2056 (LAPRO2056\ScanlineVfx_user): Modified Pool list to: [all, python, 2dshared]
2014/01/20 12:29:05 jon.bird LAPRO2056 (LAPRO2056\ScanlineVfx_user): Modified Pool list to: [all, python, 2dshared]
2014/01/20 15:50:50 jon.bird LAPRO2056 (LAPRO2056\ScanlineVfx_user): Modified Pool list to: [all, python, 2dshared]
2014/01/21 13:43:28 jon.bird LAPRO2056 (LAPRO2056\ScanlineVfx_user): Modified Group list to: [nuke, python]
2014/01/21 13:45:44 jon.bird LAPRO2056 (LAPRO2056\ScanlineVfx_user): Modified Pool list to: []
2014/01/21 13:49:16 jon.bird LAPRO2056 (LAPRO2056\ScanlineVfx_user): Modified Pool list to: []
2014/01/21 13:52:12 jon.bird LAPRO2056 (LAPRO2056\ScanlineVfx_user): Modified Pool list to: [all, python, 2dshared]
2014/01/21 13:57:56 jon.bird LAPRO2056 (LAPRO2056\ScanlineVfx_user): Modified Pool list to: [all, python, 2dshared]
2014/01/21 14:03:44 jon.bird LAPRO2056 (LAPRO2056\ScanlineVfx_user): Modified Pool list to: [all, python, 2dshared]

I’m working on getting you the mongo log. It looks like the log hasn’t rotated since the loss of power, so I’m searching a 44 gig file. I’ll post what I find.
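For pulling a +/- 30 minute window out of a log this large, a streaming filter avoids loading the file into memory. The sketch below is only an illustration: it assumes each log entry starts with a timestamp of the form YYYY-MM-DDTHH:MM:SS and that the file is in chronological order. The mongod log layout depends on the MongoDB version, so iso_prefix_timestamp is a hypothetical parser that may need to be swapped for one matching the real file. All function names here are made up.

```python
from datetime import datetime, timedelta
from typing import Callable, Iterable, Iterator, Optional


def iso_prefix_timestamp(line: str) -> Optional[datetime]:
    """Hypothetical parser: assumes a line starts with YYYY-MM-DDTHH:MM:SS."""
    try:
        return datetime.strptime(line[:19], "%Y-%m-%dT%H:%M:%S")
    except ValueError:
        return None  # continuation line or unknown layout


def lines_in_window(
    lines: Iterable[str],
    center: datetime,
    margin: timedelta = timedelta(minutes=30),
    parse_ts: Callable[[str], Optional[datetime]] = iso_prefix_timestamp,
) -> Iterator[str]:
    """Yield lines whose timestamp falls within center +/- margin.

    Lines without a timestamp inherit the decision made for the last
    timestamped line, so multi-line entries stay together.
    """
    start, end = center - margin, center + margin
    keep = False
    for line in lines:
        ts = parse_ts(line)
        if ts is not None:
            if ts > end:
                break  # relies on the log being in chronological order
            keep = ts >= start
        if keep:
            yield line


def extract_window(src_path: str, dst_path: str, center: datetime,
                   margin: timedelta = timedelta(minutes=30)) -> int:
    """Copy the matching lines to dst_path and return how many were written."""
    count = 0
    with open(src_path, "r", encoding="utf-8", errors="replace") as src, \
         open(dst_path, "w", encoding="utf-8") as dst:
        for line in lines_in_window(src, center, margin):
            dst.write(line)
            count += 1
    return count
```

A call such as extract_window("mongod.log", "window.log", datetime(2014, 1, 11, 20, 35)) would write only the entries around the 20:35 wipe. The file names are placeholders, and if clocks jumped around the outage the early break should be removed so the whole file is scanned.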

I’m also wondering why there are so many log entries of me modifying the pools. That is alarming, because that’s surely not coming from me. I have no reason to be doing that all of the time.

Okay, it JUST happened again. Robert gracefully shut down his monitor and then did a reboot. As soon as it came back up, everything was removed again. Here is one of the logs from a primary slave that had been disabled and offline when Robert rebooted his machine. All of our secondary slaves still have a pending removal of the pools right now.

2014/01/16 11:15:46 jon.bird LAPRO2056 (LAPRO2056\ScanlineVfx_user): Modified Pool list to: [all, python, deliveries, 3d, 2d]
2014/01/20 10:36:24 jon.bird LAPRO2056 (LAPRO2056\ScanlineVfx_user): Modified Pool list to: [all, python, deliveries, 3d, 2d]
2014/01/20 10:55:29 jon.bird LAPRO2056 (LAPRO2056\ScanlineVfx_user): Modified Pool list to: [all, python, deliveries, 3d, 2d]
2014/01/20 11:02:53 jon.bird LAPRO2056 (LAPRO2056\ScanlineVfx_user): Modified Pool list to: [all, python, deliveries, 3d, 2d]
2014/01/20 12:29:11 jon.bird LAPRO2056 (LAPRO2056\ScanlineVfx_user): Modified Pool list to: [all, python, deliveries, 3d, 2d]
2014/01/20 15:50:55 jon.bird LAPRO2056 (LAPRO2056\ScanlineVfx_user): Modified Pool list to: [all, python, deliveries, 3d, 2d]
2014/01/21 10:20:24 robert.crowther LAPRO2047 (SCANLINEVFXLA\robert.crowther): Slave disabled
2014/01/21 13:45:50 jon.bird LAPRO2056 (LAPRO2056\ScanlineVfx_user): Modified Pool list to: []
2014/01/21 13:49:22 jon.bird LAPRO2056 (LAPRO2056\ScanlineVfx_user): Modified Pool list to: []
2014/01/21 13:52:19 jon.bird LAPRO2056 (LAPRO2056\ScanlineVfx_user): Modified Pool list to: []
2014/01/21 13:58:03 jon.bird LAPRO2056 (LAPRO2056\ScanlineVfx_user): Modified Pool list to: [all, python, deliveries, 3d, 2d]
2014/01/21 14:03:50 jon.bird LAPRO2056 (LAPRO2056\ScanlineVfx_user): Modified Pool list to: [all, python, deliveries, 3d, 2d]
2014/01/21 14:09:32 jon.bird LAPRO2056 (LAPRO2056\ScanlineVfx_user): Modified Group list to: [max2012_64, maya, nuke, python]

Here’s another log from a machine that just had all its settings removed. There is nothing that shows the machine being re-enabled. Robert rebooted his machine around 15:45 today and we disabled the machine again at 15:49.

2014/01/13 10:15:12 jon.bird LAPRO2056 (LAPRO2056\ScanlineVfx_user): Slave disabled
2014/01/13 11:15:40 jon.bird LAPRO2056 (LAPRO2056\ScanlineVfx_user): Modified Pool list to: [python, 3d, 2d]
2014/01/14 14:35:21 jon.bird LAPRO2056 (LAPRO2056\ScanlineVfx_user): Modified Pool list to: [python, 3d, 2d]
2014/01/14 15:06:08 robert.crowther LAPRO2047 (SCANLINEVFXLA\robert.crowther): Modified Pool list to: [python, quicktime, 3d, 2d]
2014/01/15 14:49:20 robert.crowther LAPRO2047 (SCANLINEVFXLA\robert.crowther): Modified Pool list to: [python, 3d, deliveries, 2d]
2014/01/15 14:53:46 robert.crowther LAPRO2047 (SCANLINEVFXLA\robert.crowther): Modified Pool list to: [python, deliveries, 3d, 2d]
2014/01/15 18:16:59 jon.bird LAPRO2056 (LAPRO2056\ScanlineVfx_user): Modified Pool list to: [all, python, deliveries, 3d, 2d]
2014/01/15 18:19:22 jon.bird LAPRO2056 (LAPRO2056\ScanlineVfx_user): Modified Pool list to: [all, python, deliveries, 3d, 2d]
2014/01/15 18:20:11 jon.bird LAPRO2056 (LAPRO2056\ScanlineVfx_user): Slave Exclude None Pool modified to: ‘True’
Slave Exclude None Group modified to: ‘True’
2014/01/16 11:15:34 jon.bird LAPRO2056 (LAPRO2056\ScanlineVfx_user): Modified Pool list to: [all, python, deliveries, 3d, 2d]
2014/01/20 10:36:11 jon.bird LAPRO2056 (LAPRO2056\ScanlineVfx_user): Modified Pool list to: [all, python, deliveries, 3d, 2d]
2014/01/20 10:55:18 jon.bird LAPRO2056 (LAPRO2056\ScanlineVfx_user): Modified Pool list to: [all, python, deliveries, 3d, 2d]
2014/01/20 11:02:42 jon.bird LAPRO2056 (LAPRO2056\ScanlineVfx_user): Modified Pool list to: [all, python, deliveries, 3d, 2d]
2014/01/20 12:28:57 jon.bird LAPRO2056 (LAPRO2056\ScanlineVfx_user): Modified Pool list to: [all, python, deliveries, 3d, 2d]
2014/01/20 15:50:44 jon.bird LAPRO2056 (LAPRO2056\ScanlineVfx_user): Modified Pool list to: [all, python, deliveries, 3d, 2d]
2014/01/21 10:20:24 robert.crowther LAPRO2047 (SCANLINEVFXLA\robert.crowther): Slave disabled
2014/01/21 13:45:38 jon.bird LAPRO2056 (LAPRO2056\ScanlineVfx_user): Modified Pool list to: []
2014/01/21 13:49:10 jon.bird LAPRO2056 (LAPRO2056\ScanlineVfx_user): Modified Pool list to: []
2014/01/21 13:52:06 jon.bird LAPRO2056 (LAPRO2056\ScanlineVfx_user): Modified Pool list to: []
2014/01/21 13:57:51 jon.bird LAPRO2056 (LAPRO2056\ScanlineVfx_user): Modified Pool list to: [all, python, deliveries, 3d, 2d]
2014/01/21 14:03:37 jon.bird LAPRO2056 (LAPRO2056\ScanlineVfx_user): Modified Pool list to: [all, python, deliveries, 3d, 2d]
2014/01/21 14:09:31 jon.bird LAPRO2056 (LAPRO2056\ScanlineVfx_user): Modified Group list to: [max2012_64, maya, nuke, python]
2014/01/21 15:49:37 robert.crowther LAPRO2047 (SCANLINEVFXLA\robert.crowther): Slave disabled

On my monitor, for the secondary slaves I see a pending removal of the pools, which looks like this.
all,python,2dshared []

Robert’s monitor doesn’t show the brackets at the end.

I hope I’m correct in assuming that machines that have a pending pool assignment are waiting for the slave to become idle before making the change. In these cases, even though the slave is idle it is not changing the pools. If I stop the slave and then relaunch it, the change takes effect immediately and the pools are removed.

For the time being, Robert and I will be logging out of super user mode before closing the monitor. I hope that at least reduces the frequency of this problem.

I tried closing down my monitor so that maybe all of these pending pool changes wouldn’t go through. After launching the Monitor again the brackets on the pool assignments were gone, and I successfully closed and re-opened a slave without its pools being lost. Not sure what’s going on here, but I would really like to have multiple super users without having to worry about all of the info being lost because of it.

Which version of Deadline 6.1 are you guys currently running? A bug was fixed in 6.1.54343 (RC 1) that could cause slave settings that haven’t been modified to be re-saved when clicking OK in the Pool Management dialog. I wouldn’t expect the bug to wipe anything, but regardless it would be good to know which version you guys are running.

In the meantime, I’m trying to figure out what would cause this behavior…

Thanks!
Ryan

We’re using v6.1.0.54178

Thanks! I would definitely recommend upgrading then to see if the changes we made for RC 1 help here. At the very least, it should cut down on the number of pool changes in the slaves’ history, since a history entry will only be logged for slaves when their pool list is actually modified. If the problem still occurs, there will be less to sift through.
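Until the upgrade is in place, the same reduction can be approximated on history that has already been collected. This is a rough sketch, not how Deadline itself does it: it keeps a pool entry only when the list differs from the previous pool entry for that slave. The regular expression is inferred from the lines pasted in this thread, the function name real_pool_changes is made up, and the comparison is order-sensitive on the assumption that pool order is meaningful.

```python
import re
from typing import Iterable, List, Tuple

# Inferred from the pasted history lines (assumption, not an official format).
POOL_CHANGE_RE = re.compile(
    r"^(?P<prefix>.*?): Modified Pool list to: \[(?P<pools>.*)\]$"
)


def real_pool_changes(lines: Iterable[str]) -> List[Tuple[str, List[str]]]:
    """Return (prefix, pools) for entries that differ from the previous pool entry.

    The first pool entry is always kept as the baseline. Entries for other
    actions (groups, enable/disable, and so on) are ignored.
    """
    previous = None
    changes = []
    for line in lines:
        m = POOL_CHANGE_RE.match(line.strip())
        if m is None:
            continue
        pools = [p.strip() for p in m.group("pools").split(",") if p.strip()]
        if pools != previous:  # order-sensitive comparison (assumption)
            changes.append((m.group("prefix"), pools))
            previous = pools
    return changes
```

Applied to one slave's history, this leaves only the points where the pool list really changed, such as the switch to [] and the later restore.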

Just a heads up that we should be releasing RC 3 later today, so if you are planning to upgrade, you might as well wait until RC 3 is available.

Cheers,
Ryan

Oh yes, we’re going to do the upgrade. We’re waiting for a good time, might happen on Sunday, might happen mid-week.

Okay so this is happening again this morning. I can see right now, pending changes to the pools/groups all over the farm. We’re losing descriptions, pools, groups, statuses… We are upgraded to RC3 but still experiencing random blackouts of our slave settings. What should I do?

Here’s a screenshot of the pending wipes of the slave settings. There is nothing appearing in the slave history. You can see these pending changes on multiple machines, including those of non-super users.

Do you guys have any custom scripts that use deadlinecommand or the Deadline scripting API to change any slave settings (like pools or groups)? That’s the only thing we can think of that could affect the slave settings like this, since any other saving of the slave settings requires user interaction (like in the pools dialog or the slave settings dialog).

Cheers,
Ryan

I have a handful of scripts for Deadline API, but none of them interact with the slaves. I’ll give you a full list of the scripts we’ve been using.

Job scripts:
1 - A script that increases job priority and machine limit
2 - A script that increases task timeout
3 - Another script that adjusts machine limit and priority
4 - One more script that adjusts machine limit and priority

Slave scripts:
Nothing that makes any actual changes, but just send emails with lists of selected slaves, or copy the slaves to clipboard.

As for Deadlinecommand, we have not really been using that. I wanted to make sure that these pending wipes wouldn’t go through, so I closed the monitor and re-opened it (that worked last time), but this time when I re-opened the monitor there were still some pending changes to the slaves. I decided to close the monitor again, and also exit the launcher. After I re-opened the monitor a 3rd time, it looked like nothing had happened. It not only stopped the pending wipes, but it reverted almost everything it had done. This makes me feel like it’s a bug with the launcher on my local box.
