AWS Thinkbox Discussion Forums

connecting to slave log - long wait

Sometimes it takes in the range of 5+ minutes to connect to a slave log. Not sure why,… I would say 9/10 times, it takes longer to connect to the slave log than to render the image. Making log hunting a bit of a challenge…

What i do now to illustrate the issue:

  • click connect to slave log
  • wait 10-20 seconds
  • when nothing happens, remote desktop into the machine
  • click connect to slave log on the local slave
  • wait 10-20 seconds
  • when nothing happens, navigate to the log folder
  • open the latest log, find the task
  • refresh the file as it renders
  • frame finishes
  • local log opens
  • close it
  • close remote desktop
  • remote log opens on my workstation

I think this is related somehow to how deadline loads its libraries in .net

We have several site packages folders in the site directory listing, and it seems that every single dll deadline tries to import goes through all those searchpaths, then after it didnt find them, it goes and finds it right next to the executable…

For example, i get the following dlls being searched for in each and every site package folder we have:

System.IEquatable.dll
System.IEquatable.exe
System.IObservable.dll
System.IObservable.exe
System.IObserver.dll
System.IObserver.exe
System.IProgress.dll
System.IProgress.exe
System.Lazy.dll
System.Lazy.exe

For example:

4:58:47.3417266 PM deadlinecommandbg.exe 9440 CreateFile C:\Program Files\Thinkbox\Deadline6\bin\python26.zip\System.IObservable.dll PATH NOT FOUND
4:58:47.3418889 PM deadlinecommandbg.exe 9440 CreateFile C:\Program Files\Thinkbox\Deadline6\bin\python26.zip\System.IObservable.exe PATH NOT FOUND
4:58:47.3427226 PM deadlinecommandbg.exe 9440 CreateFile C:\Program Files\Thinkbox\Deadline6\bin\Dlls\System.IObservable.dll NAME NOT FOUND
4:58:47.3435198 PM deadlinecommandbg.exe 9440 CreateFile C:\Program Files\Thinkbox\Deadline6\bin\Dlls\System.IObservable.exe NAME NOT FOUND
4:58:47.3443004 PM deadlinecommandbg.exe 9440 CreateFile C:\Program Files\Thinkbox\Deadline6\bin\lib\System.IObservable.dll NAME NOT FOUND
4:58:47.3450743 PM deadlinecommandbg.exe 9440 CreateFile C:\Program Files\Thinkbox\Deadline6\bin\lib\System.IObservable.exe NAME NOT FOUND
4:58:47.3452342 PM deadlinecommandbg.exe 9440 CreateFile C:\Program Files\Thinkbox\Deadline6\bin\lib\plat-win\System.IObservable.dll PATH NOT FOUND
4:58:47.3453783 PM deadlinecommandbg.exe 9440 CreateFile C:\Program Files\Thinkbox\Deadline6\bin\lib\plat-win\System.IObservable.exe PATH NOT FOUND
4:58:47.3464574 PM deadlinecommandbg.exe 9440 CreateFile C:\Program Files\Thinkbox\Deadline6\bin\lib\lib-tk\System.IObservable.dll NAME NOT FOUND
4:58:47.3474940 PM deadlinecommandbg.exe 9440 CreateFile C:\Program Files\Thinkbox\Deadline6\bin\lib\lib-tk\System.IObservable.exe NAME NOT FOUND
4:58:47.3481557 PM deadlinecommandbg.exe 9440 CreateFile C:\Program Files\Thinkbox\Deadline6\bin\System.IObservable.dll NAME NOT FOUND
4:58:47.3487957 PM deadlinecommandbg.exe 9440 CreateFile C:\Program Files\Thinkbox\Deadline6\bin\System.IObservable.exe NAME NOT FOUND
4:58:47.3496251 PM deadlinecommandbg.exe 9440 CreateFile C:\Program Files\Thinkbox\Deadline6\bin\lib\site-packages\System.IObservable.dll NAME NOT FOUND
4:58:47.3504754 PM deadlinecommandbg.exe 9440 CreateFile C:\Program Files\Thinkbox\Deadline6\bin\lib\site-packages\System.IObservable.exe NAME NOT FOUND

Then it starts with our internal network searchpaths:
4:58:47.3654572 PM deadlinecommandbg.exe 9440 CreateFile \s2.scanlinevfxla.com\exchange\software\managed\pythonScripts\sitecustomize\System.IObservable.dll NAME NOT FOUND
4:58:47.3798876 PM deadlinecommandbg.exe 9440 CreateFile \s2.scanlinevfxla.com\exchange\software\managed\pythonScripts\sitecustomize\System.IObservable.exe NAME NOT FOUND
4:58:49.0057182 PM deadlinecommandbg.exe 9440 CreateFile \inferno2\exchange1\software\managed\pythonScripts\sitecustomize\System.IObservable.dll BAD NETWORK NAME
4:58:50.5982457 PM deadlinecommandbg.exe 9440 CreateFile \inferno2\exchange1\software\managed\pythonScripts\sitecustomize\ BAD NETWORK NAME
4:58:52.9255490 PM deadlinecommandbg.exe 9440 CreateFile \inferno2\exchange1\software\managed\pythonScripts\sitecustomize\System.IObservable.exe BAD NETWORK NAME
4:58:54.5392811 PM deadlinecommandbg.exe 9440 CreateFile \inferno2\exchange1\software\managed\pythonScripts\sitecustomize\ BAD NETWORK NAME
4:58:55.3482471 PM deadlinecommandbg.exe 9440 CreateFile \inferno2.scanlinevfxla.com\exchange\software\managed\pythonScripts\sitecustomize\System.IObservable.dll NAME NOT FOUND

etc. It goes on for a while. Every dll takes about 5-20 seconds, and there are hundreds (each class separately)

Thenonce it failed to find that file:

4:59:04.0629254 PM deadlinecommandbg.exe 9440 CreateFile C:\Program Files\Thinkbox\Deadline6\bin\Dlls\System.IObservable.exe NAME NOT FOUND
4:59:04.0637100 PM deadlinecommandbg.exe 9440 CreateFile C:\Program Files\Thinkbox\Deadline6\bin\UI\System.IObservable.dll NAME NOT FOUND
4:59:04.0644803 PM deadlinecommandbg.exe 9440 CreateFile C:\Program Files\Thinkbox\Deadline6\bin\UI\System.IObservable.exe NAME NOT FOUND

It moves on to the next one…:

4:59:04.0658324 PM deadlinecommandbg.exe 9440 CreateFile C:\Program Files\Thinkbox\Deadline6\bin\python26.zip\System.IObserver.dll PATH NOT FOUND
4:59:04.0659899 PM deadlinecommandbg.exe 9440 CreateFile C:\Program Files\Thinkbox\Deadline6\bin\python26.zip\System.IObserver.exe PATH NOT FOUND

Yeah this looks to be a side-effect of the way Python.NET tries to load modules. Whenever an import is made, it checks Python first, to see if it can find a Python module that matches (which essentially amounts to a file exists check); I’ve always thought this was kind of dumb, but it never seemed to have enough impact on start-up to justify tinkering with code that was that deep (until now).

I guess it gets a lot worse when you have more site packages :frowning:. I’ll have a look at switching the order of this import process to make it faster (hopefully for the next 7 beta build).

Cheers,
Jon

Thanks!

Hey Laszlo,

After doing some poking around, I do believe we can make the load times better on our end, but I’ve also not been able to replicate the same crazy load times you’re seeing; even if there’s a non-existent network path in my PYTHONPATH. Looking at your procmon logs, it seems the real killer are the attempts from this particular share:

After some googlin’, it seems “BAD NETWORK NAME” can be returned when there’s a folder in the hierarchy that doesn’t have read permissions… Can you guys check on your end to see if one of the folders in the “\inferno2\exchange1\software\managed\pythonScripts\sitecustomize” chain is lacking proper (ie: read) permissions?

Cheers,
Jon

Its not a python path issue, create a sitecustomize.py in libs/site-packages, and then add some slow network drives to site.addsitedir.

Then you will see the behavior im referring to.

On the particular machine i copied the log from, the windows opentext (nfs) client is not installed, so that network path is unavailable. Its just one out of 5 paths though, so its not the biggest contributor, but yes, its a factor.

Yeah, I added a bunch of network shares to my site config for testing (I used a .pth file, but that shouldn’t be any different; either way they get added to sys.path), but never got to multiple minutes like you seem to be seeing. I plan on stripping out the main source of these calls in Python.NET completely, unless I find an issue with that, which should speed things up a lot.

I just wanted to make sure that the one share wasn’t what was causing the majority of your headaches, since it seems the failed calls to that share in particular were taking ~1.5s each (as opposed to the ~10-20ms per share I’m seeing here).

No, its a share that exists on all farm machines, but not workstations. I started debugging the issue on farm boxes, but i happened to do the packet trace on a workstation which did not have that share. It certainly is slower on boxes missing that link…

If it’s a share that doesn’t exist everywhere, you can try switching to adding paths through .pth file in the site-packages folder (or at least that one specifically). It checks to make sure the path actually exists before adding it to the sys.path:
docs.python.org/2/library/site.html

Might speed things up on your workstations, at least :slight_smile:

I’ve been sitting here watching deadline slave take 10+ mins to launch…

2014-08-29 11:19:35: Deadline Slave 6.2 [v6.2.1.33 R (1e480b6c5)]
2014-08-29 11:29:17: Auto Configuration: A ruleset has been received

Would be great if this could be patched at one point

Hey Laszlo,

Given that this affects python imports, it’s a pretty wide-reaching change, so I’m not sure that it will make it into a public release of 6.2.

That said, since the changes are entirely restricted to the Python.NET dll, it should just be a drop-in replacement for anyone who needs it; I’ve re-compiled it with my changes, targeting python 2.6, and attached it here. Try dropping it into your client install folder on a machine that’s taking a while, and let me know how it goes!

Cheers,
Jon
Python.Runtime.zip (41.6 KB)

Thanks Jon, will be testing this today!!

I wont lie. This is F*N FANTASTIC!!! Deadline monitor now starts in <3 seconds (vs 2-3 mins previously)!!!

I have tears in my eyes guys. This is fantastic. I assumed the slow startup was due to database loads, the amount of data, whatever.

But now it starts almost instantly.

I want to dance (i wont)

That’s great news! I was definitely hoping for that kind of speedup, but was afraid of promising anything :slight_smile:

Definitely be mindful of any weird import errors that weren’t there before though, I tested most of the cases I could think of but there might still be some weird import flow that might not work with my changes. So let me know if you run into any of those!

Cheers,
Jon

I tried a couple of our scripts, and they all seem to have imported fine, but in any case, i notified our wranglers if they run into anything.

I’m restarting the monitor now just for fun.

Privacy | Site terms | Cookie preferences