Deadline Slave takes ages to connect to the repository


#1

Hey,

Happy new year to all of you.

We recently updated some of our render nodes to Windows 10 (1809 - fresh installation) and installed all of our host applications. Deadline runs in version 10.0.23.4 and everything seems to work. What is odd is that the slave takes up to 5 minutes until its started since the process of “Connecting to Repository” seems to be stuck somehow.

I disabled the Windows Firewall, the client can connect to the Repository in the explorer without any delay.

Does anyone of you experienced a similar behaviour of Windows 10 or maybe had this in the past? I’m a bit stuck now and don’t know where to look.

Thanks in advance.


#2

Hello!

Can you please send a slave log from a machine that takes 5 minutes to connect? Hopefully the log will indicate where the hold up is.

https://docs.thinkboxsoftware.com/products/deadline/10.0/1_User%20Manual/manual/logs.html#application-logs

Thanks!

Charles


#3

Hey Charles,

here is the last log from one of these machines:

2019-01-03 18:39:58: BEGIN - BLADE-2D-110\render
2019-01-03 18:39:58: Operating System: Windows 10 Pro for Workstations
2019-01-03 18:39:58: CPU Architecture: x64
2019-01-03 18:39:58: CPUs: 4
2019-01-03 18:39:58: Video Card: Microsoft Basic Display Adapter
2019-01-03 18:39:58: Deadline Slave 10.0 [v10.0.23.4 Release (e0d42871d)]
2019-01-03 18:39:58: Auto Configuration: No auto configuration for Repository Path could be detected, using local configuration
2019-01-03 18:41:52: Info Thread - Created.
2019-01-03 18:42:00: INFO: Synchronizing with network Font folder ‘\nyx\DL_Fonts’
2019-01-03 18:42:00: INFO: Synchronizing to local Font folder ‘C:\Users\render\AppData\Local\Temp\DeadlineFonts’
2019-01-03 18:46:23: INFO: On Slave Started
2019-01-03 18:46:23: Auto Configuration: Picking configuration based on: blade-2d-110 / 10.40.30.110
2019-01-03 18:46:23: Auto Configuration: Match found for rule render nodes (IPMatch)
2019-01-03 18:46:23: Auto Configuration: A ruleset has been found

Seems to be the Font sync?


#4

The fonts are being sync’d to a temp folder which means they can be cleaned up which would require syncing each time that happens. In Tools > Configure Events > FontSync you can disable the temp folder option.


#5

I did that and also set the ‘Skip existing fonts’ to TRUE but it still needs around 4 minutes until it’s sync’d.

> 2019-01-03 19:26:35:  BEGIN - BLADE-2D-110\render
> 2019-01-03 19:26:35:  Operating System: Windows 10 Pro for Workstations
> 2019-01-03 19:26:35:  CPU Architecture: x64
> 2019-01-03 19:26:35:  CPUs: 4
> 2019-01-03 19:26:35:  Video Card: Microsoft Basic Display Adapter
> 2019-01-03 19:26:35:  Deadline Slave 10.0 [v10.0.23.4 Release (e0d42871d)]
> 2019-01-03 19:26:35:  Auto Configuration: A ruleset has been received from Pulse
> 2019-01-03 19:26:36:  Info Thread - Created.
> 2019-01-03 19:26:42:  INFO: Synchronizing with network Font folder '\\nyx\DL_Fonts'
> 2019-01-03 19:26:42:  INFO: Synchronizing to local Font folder 'C:\WINDOWS\Temp'
> 2019-01-03 19:30:22:  INFO: On Slave Started
> 2019-01-03 19:30:22:  Auto Configuration: Picking configuration based on: blade-2d-110 / 10.40.30.110
> 2019-01-03 19:30:22:  Auto Configuration: Match found for rule render nodes (IPMatch)
> 2019-01-03 19:30:22:  Auto Configuration: A ruleset has been found

I checked on one of the old machines and it only takes around 30 seconds. That’s a bit weird, isn’t it?

I mean the font folder is around 600MB but I’m wondering why there is such a big difference in the sync time.


#6

Have you confirmed if the sync is working for the machine that takes 4 minutes, maybe there is an error that is not being printed.

One of the older machines is a different version of Deadline? Or just a different WIndows version?


#7

I’ve created a new DeadlineFonts folder on that node where the fonts are being sync’d to. I’ve started the slave and watched the fonts being copied. The actual copy time was only around 25 seconds, after that it did just nothing for another 8 minutes.

2019-01-03 22:07:42:  INFO: Synchronizing to local Font folder 'C:\WINDOWS\Temp\DeadlineFonts'
2019-01-03 22:16:16:  INFO: On Slave Started

I’ve restarted the slave application again to see if the fonts are copied again or just being sync’d and it seems the sync worked this time.

2019-01-03 22:25:15:  INFO: Synchronizing to local Font folder 'C:\WINDOWS\Temp\DeadlineFonts'
2019-01-03 22:25:20:  INFO: On Slave Started

After restarting the machine, the slave start-up takes 4 minutes again. The fonts itself are not touched / copied again, but I can’t see what is happening during this time.

2019-01-03 22:28:40:  INFO: Synchronizing to local Font folder 'C:\WINDOWS\Temp\DeadlineFonts'
2019-01-03 22:32:22:  INFO: On Slave Started

The older machines are the same setup, same Deadline version, just Windows 7 instead of 10.


#8

Just to confirm if the issue is with the Font Sync event or not. Can you disable the event plugin and run your tests again to see if there are still long load times?


#9

I’ve tested the start-up time without the Font Sync event and it starts almost instantly.

2019-01-08 00:22:25:  Info Thread - Created.
2019-01-08 00:22:31:  INFO: On Slave Started

#10

A little more input here and kind of a new issue.
I setup another new Windows 10 node and the font sync shows a different issue (at least I think it is one):

2019-01-10 14:50:34:  BEGIN - BLADE-3D-135\render
2019-01-10 14:50:34:  Operating System: Windows 10 Pro
2019-01-10 14:50:34:  CPU Architecture: x64
2019-01-10 14:50:34:  CPUs: 24
2019-01-10 14:50:34:  Video Card: Microsoft Basic Display Adapter
2019-01-10 14:50:34:  Deadline Slave 10.0 [v10.0.23.4 Release (e0d42871d)]
2019-01-10 14:50:34:  Auto Configuration: A ruleset has been received from Pulse
2019-01-10 14:50:36:  Info Thread - Created.
2019-01-10 14:50:44:  INFO: Synchronizing with network Font folder '\\nyx\DL_Fonts'
2019-01-10 14:50:44:  An error occurred in the "OnSlaveStarted" function in events plugin 'FontSync': AttributeError : 'FontSyncListener' object has no attribute 'FailRender' (Python.Runtime.PythonException)
2019-01-10 14:50:44:    File "none", line 44, in SlaveStarted
2019-01-10 14:50:44:    File "none", line 71, in SyncFont
2019-01-10 14:50:44:   (Deadline.Events.DeadlineEventPluginException)
2019-01-10 14:50:44:     at Deadline.Events.DeadlineEventPlugin.b(String aoi, Exception aoj)
2019-01-10 14:50:44:     at Deadline.Events.DeadlineEventPlugin.OnSlaveStarted(String slaveName)
2019-01-10 14:50:44:     at Deadline.Events.DeadlineEventManager.OnSlaveStarted(String slaveName, DataController dataController)
2019-01-10 14:50:44:  ---------- Inner Stack Trace (Python.Runtime.PythonException) ----------
2019-01-10 14:50:44:    File "none", line 44, in SlaveStarted
2019-01-10 14:50:44:    File "none", line 71, in SyncFont
2019-01-10 14:50:45:  INFO: On Slave Started
2019-01-10 14:50:45:  Auto Configuration: Picking configuration based on: blade-3d-135 / 10.40.30.135
2019-01-10 14:50:45:  Auto Configuration: Match found for rule render nodes (IPMatch)
2019-01-10 14:50:45:  Auto Configuration: A ruleset has been found

What is it actually complaining about?


#11

Hmm. So, the problem here is that our reporting on the failure is trying to call a function that doesn’t exist when the Slave is starting up. The error it’s trying to report is thrown when it can’t detect the folder in play here, namely “\nyx\DL_Fonts”.

Are your shares password protected? You may need to add authentication information into Windows so that the user who is running the Slave doens’t need to provide a password while connecting.


#12

Weird. The authentication information is already in place.
If I connect to the share on the slave I can access it without any authentification.

I’ll have a look if I may have overlooked something…


#13

Found the issue.

It was not the share it couldn’t connect to but the Local Windows Font Folder where the fonts get copied to.
I created a new “DeadlineFonts” folder in C:\Windows\Temp\ when I tested something with Charles last week. This slave simply did not have the folder yet. The sync works now but it needs some time again after the sync has been finished until it starts up.

2019-01-10 18:19:26:  BEGIN - BLADE-3D-135\render
2019-01-10 18:19:26:  Operating System: Windows 10 Pro
2019-01-10 18:19:26:  CPU Architecture: x64
2019-01-10 18:19:26:  CPUs: 24
2019-01-10 18:19:26:  Video Card: Microsoft Basic Display Adapter
2019-01-10 18:19:26:  Deadline Slave 10.0 [v10.0.23.4 Release (e0d42871d)]
2019-01-10 18:19:27:  Auto Configuration: A ruleset has been received from Pulse
2019-01-10 18:19:28:  Info Thread - Created.
2019-01-10 18:19:33:  INFO: Synchronizing with network Font folder '\\nyx\DL_Fonts'
2019-01-10 18:19:33:  INFO: Synchronizing to local Font folder 'C:\WINDOWS\Temp\DeadlineFonts'
2019-01-10 18:24:24:  INFO: On Slave Started
2019-01-10 18:24:24:  Auto Configuration: Picking configuration based on: blade-3d-135 / 10.40.30.135
2019-01-10 18:24:24:  Auto Configuration: Match found for rule render nodes (IPMatch)
2019-01-10 18:24:24:  Auto Configuration: A ruleset has been found

It is not a major problem and we could “live” with it but I’d be intrested where this delay comes from, since the slave start-up on our “old” setup on Windows 7 only took a few seconds after the sync has been completed. Not sure if it is OS related but that is the only obvious change I can think of…

Any ideas?


#14

It may be the broadcast message we send to tell all windows to sync:

Note that the regfont tool mentioned in that thread isn’t required now that we’re broadcasting that message.

Just to make sure it’s the same problem, can you check a few other logs? I’m expecting the gap length for that call to complete to be random. I’m not able to access some old cases at the moment, but I’ll try and take a look on your next response.


#15

Hey Edwin,

Thank you for your reply.

I tidied up our fonts folder (since there were also some pdf’s, some weird font formats and some hidden files in there) and created a new temp font folder on the root level of that node.

When I did the first (fresh) sync it only took 40 seconds until the slave was up.

2019-01-14 13:23:08: INFO: Synchronizing to local Font folder ‘C:\WINDOWS\Temp’
2019-01-14 13:23:49: INFO: On Slave Started

The second sync took only 5 seconds and the slave was up and running.

2019-01-14 13:26:03: INFO: Synchronizing to local Font folder ‘C:\WINDOWS\Temp’
2019-01-14 13:26:08: INFO: On Slave Started

I switched the the temp folder back to the old one, cleaned it up as well and started the slave again and voilà… No delay anymore. So I guess it was something within that font folder which caused this weird behaviour. Also tested it on a second node with the same result.

I’ll have a look if I can figure out which file it was that caused the delay. Maybe it is helpful for someone else.