long startup time

Hi,

our big multicore linux rendermachine is doing strange things with nuke renders for a while now. It renders the frames, the rendertime is normal, but the startup time is abnormally long, 20-25s, while on other machine it is only 3-4sec. What is included in the startup time, that could last for so long?
Snippet from log:

0: INFO: Process exit code: 0
0: Render time for frame: 27.783 s
0: Total time for task: 38.502 s

Another strange thing, that as I looking at the terminal of the slave, listing the log lines, it displays an error, that is not included in the deadline log:

...0: Got task: 000_050_001_37998998_00112_113-113
0: Plugin already loaded: Nuke
0: Task timeout is disabled.
0: Job already loaded: EON_100_001-co_cp_v007.nk
0: Plugin rendering frame(s): 113
ERROR: List of process IDs must follow p.

What could it be?

Thanks,
Gabor

Hi Gabor,

Which version of Deadline are you running? This sounds like a known problem in Deadline 4.0 that was fixed in 4.1.

Cheers,

  • Ryan

I installed Deadline yesterday to give it another go. (I was on the Mac beta a while back while everything was still running on Frantics site.)
I have this problem as well. Startup times are very long (average 43sec) for Nuke scripts, but only via Deadline. Manual terminal commands and submitting the scripts to Smedge (with the same command line settings) takes literally seconds.

The job report log output doesn’t reveal any errors (at least to me):

=======================================================
Log Message
=======================================================
0: Task timeout is disabled.
0: Loaded job: farmtest.nk (999_050_999_41ad93cb)
0: INFO: StartJob: initializing script plugin Nuke
0: INFO: About: Nuke Plugin for Deadline
0: INFO: Scrubbing the LD and DYLD LIBRARY paths
0: INFO: Prepping OFX cache
0: INFO: Checking Nuke temp path: /var/tmp/nuke-u501
0: INFO: Path already exists
0: INFO: OFX cache prepped
0: Plugin rendering frame(s): 1-10
0: INFO: Any stdout that matches the regular expression "READY FOR INPUT" will be handled as appropriate
0: INFO: Any stdout that matches the regular expression ".*ERROR:.*" will be handled as appropriate
0: INFO: Any stdout that matches the regular expression ".*Error:.*" will be handled as appropriate
0: INFO: Any stdout that matches the regular expression ".* seconds to execute" will be handled as appropriate
0: INFO: Any stdout that matches the regular expression ".* took [0-9]*\.[0-9]* seconds" will be handled as appropriate
0: INFO: Stdout Handling Enabled: True
0: INFO: Popup Handling Enabled: True
0: INFO: Using Process Tree: True
0: INFO: Hiding DOS Window: True
0: INFO: Creating New Console: False
0: INFO: Render Executable: "/Applications/Nuke6.1v2/Nuke6.1v2.app/Contents/MacOS/Nuke6.1v2"
0: INFO: Using 1 threads for rendering
0: INFO: Render Argument: -V -x -m 1 "/var/folders/Fs/FsnxBpwKECKs0vvL4S4pC++++TI/-Tmp-/farmtest_thread0.nk" 1,10
0: INFO: Startup Directory: "/Applications/Nuke6.1v2/Nuke6.1v2.app/Contents/MacOS"
0: INFO: Process Priority: BelowNormal
0: INFO: Process is now running
0: STDOUT: Nuke 6.1v2, 64 bit, built Oct  1 2010.
0: STDOUT: Copyright (c) 2010 The Foundry Visionmongers Ltd.  All Rights Reserved.
0: STDOUT: Loading /Applications/Nuke6.1v2/Nuke6.1v2.app/Contents/MacOS/plugins/init.tcl
0: STDOUT: Loading /Applications/Nuke6.1v2/Nuke6.1v2.app/Contents/MacOS/plugins/init.py
0: STDOUT: Loading /Applications/Nuke6.1v2/Nuke6.1v2.app/Contents/MacOS/plugins/setenv.tcl
0: STDOUT: Loading /Applications/Nuke6.1v2/Nuke6.1v2.app/Contents/MacOS/plugins/Tracker3.dylib
0: STDOUT: Loading /Applications/Nuke6.1v2/Nuke6.1v2.app/Contents/MacOS/plugins/formats.tcl
0: STDOUT: Loading /Users/alex/.nuke/init.py
0: STDOUT: Loading /Applications/Nuke6.1v2/Nuke6.1v2.app/Contents/MacOS/plugins/getenv.tcl
0: STDOUT: Loading /Applications/Nuke6.1v2/Nuke6.1v2.app/Contents/MacOS/plugins/tifReader.tcl
0: STDOUT: Loading /Applications/Nuke6.1v2/Nuke6.1v2.app/Contents/MacOS/plugins/tiffReader.dylib
0: STDOUT: Loading /Applications/Nuke6.1v2/Nuke6.1v2.app/Contents/MacOS/plugins/ColorCorrect.dylib
0: STDOUT: Loading /Applications/Nuke6.1v2/Nuke6.1v2.app/Contents/MacOS/plugins/jpegWriter.dylib
0: STDOUT: Writing /Volumes/.../farmtest.0001.jpg .1.2.3.4.5.6.7.8.9
0: STDOUT: Writing /Volumes/.../farmtest.0001.jpg took 0.14 seconds
...
0: STDOUT: Total render time: 1.21 seconds
0: STDOUT: Allocated 25.2MB, 2% of usage limit of 1.07GB, sbrk = 4.41GB.
0: STDOUT: free_*() calls: 0, new_handler() cleanups: 0.
0: STDOUT: Cache Report
0: STDOUT: ===========================================
0: STDOUT: Current Size: 000 B
0: STDOUT: Max Size: 644.245 MB
0: STDOUT: Percentage Used: 0
0: STDOUT: Read4: 24.9MB 1920x1080 rgb 100% w 10
0: STDOUT: Read4: 259kB tiff reader strip 3x1920x45 w 100
0: INFO: Process exit code: 0


=======================================================
Log Details
=======================================================
Log Date/Time = Dec 08/10  14:09:50
Frames = 1-10

Slave Machine = Happy
Slave Version = v4.1.0.43205 R

Plugin Name = Nuke

Sounds like there is still a performance issue with our path mapping check, which is done prior to the render. Can you possibly post an example Nuke script that takes reproduces this long startup time so that we can investigate further?

In the meantime, you can disable this check to see if it helps (which it should). Go to \your\repository\plugins\Nuke and open Nuke.py in a text editor. Find this line:

		CheckPathMappingInFileAndReplaceSeparator( sceneFilename, self.TempSceneFilename, "\\", "/" )

Place a ‘#’ in front of it and save the file, like this:

		#CheckPathMappingInFileAndReplaceSeparator( sceneFilename, self.TempSceneFilename, "\\", "/" )

Now submit a new Nuke job and let us know if the long startup time persists.

Cheers,

  • Ryan

Hi Ryan,

I didn’t even get that far today. I restarted today (which fixed a few problems I had in my other thread), but now Nuke renders don’t work anymore. Below is the error log. Something about a QtOpenGL version mismatch.
Happy_00000_20101209130041.errorReport (3.57 KB)

However, using the exact same commandline Deadline was going to use works fine.

/Applications/Nuke6.1v2/Nuke6.1v2.app/Contents/MacOS/Nuke6.1v2 -V -x -m 1 /private/var/folders/Fs/FsnxBpwKECKs0vvL4S4pC++++TI/-Tmp-/farmtest_thread0.nk 1,10

Trying to disable the code line you mentioned doesn’t work, as now the Deadline submitter window doesn’t show up anymore in Nuke. When I uncomment it, the window works again.

It is as simple as it gets in my test case. Two file sequences go into a switch, takes switches from one sequence to the other at some time and the whole thing gets written to an EXr sequence on the same disk. Nothing fancy at all in any way.

Hmm, it gets more curious. I switched back to the standard X11 and now Nuke renders fine, but Maya has problems. (see other thread, don’t want to pollute this one)
But I still cannot get that commented code line to work. When I comment it out, I now get the submission window, but the renders on the farm error out, because the file doesn’t get saved properly, or so it seems.

0: STDOUT: Can't read /var/folders/Fs/FsnxBpwKECKs0vvL4S4pC++++TI/-Tmp-/farmtest_thread0.nk: No such file or directory 0: STDOUT: Loading /Volumes/Drobo01/Archiv/Projekte/x_Pipeline/x_AppPlugins/Nuke/plugins/template.nk 0: STDOUT: There are no Write operators in this script 0: INFO: Process exit code: 1 0: An exception occurred: Exception during render: An error occurred in RenderTasks(): Error in CheckExitCode(): Renderer returned non-zero error code, 1. Check the log for more information. at Deadline.Plugins.ScriptPlugin.RenderTasks (Int32 startFrame, Int32 endFrame, System.String& outMessage) [0x00000] in <filename unknown>:0 (Deadline.Plugins.RenderPluginException) 0: Unloading plugin: Nuke Scheduler Thread - Render Thread 0 threw an error: Scheduler Thread - Exception during render: An error occurred in RenderTasks(): Error in CheckExitCode(): Renderer returned non-zero error code, 1. Check the log for more information. at Deadline.Plugins.ScriptPlugin.RenderTasks (Int32 startFrame, Int32 endFrame, System.String& outMessage) [0x00000] in <filename unknown>:0
But with the path checking enabled, the renders run fine, apart from the high startup times. Which, given in my case, are extreme, since I am pretty much just passing through frames and the actual rendertime is negligible.

In fact, I’ll put the Maya error here as well, since it turns out that it is probably related. It reads like there is no file for Maya to render either:
Happy_00000_20101209145012.errorReport (3.46 KB)

Color me confused. I switched back to XQuartz and now Maya renders under XQuartz also report that filename unknown error. Nothing else has changed, but it seems Deadline gets worse with every restart. The only thing that works now, albeit with long startup times, are Nuke renders. No matter which X11 version I am using. And I have a feeling that X11 has nothing to do with scripts not working as expected, right?

The first thing you should do is completely remove the xquartz version of X11 and reinstall X11 that came on your OSX install disk. You’re right that X11 has nothing to do with the render errors you’re getting, but we have not testing Deadline with xquartz, and removing it should hopefully remove a potential problem. Also, which version of Mono do you have installed? Deadline 4.1 requires the latest version of Mono 2.6 (which is currently 2.6.7).

The nuke errors are my bad. By commenting out that line, the Nuke plugin no longer creates the temp scene file, which is why you get the error about the missing Nuke script. After the line you placed the ‘#’ in front of, add this additional line:

		File.Copy( sceneFilename, self.TempSceneFilename, True )

So now the modified code should look like this:

		#CheckPathMappingInFileAndReplaceSeparator( sceneFilename, self.TempSceneFilename, "\\", "/" )
		File.Copy( sceneFilename, self.TempSceneFilename, True )

Save the file and try to render again. Also, I know you’re test Nuke script is simple, but can you post it anyways? We can’t reproduce these long start up times on our end, so maybe it’s something specific about your script.

Finally, let’s leave Maya alone for now and just focus on Nuke and Deadline’s stability. Also, we’ll keep the Maya discussion to your other thread:
viewtopic.php?f=11&t=4424

Cheers,

  • Ryan

Ok, I will keep you posted on my progress. And sorry for spamming you with two separate threads.

My Mono version is Mono JIT compiler version 2.8.1 (tarball Mon Nov 22 09:52:37 MST 2010) downloaded from the Mono site.

I will upload the Nuke script and a couple of sample frames tonight and give you the link.

I disabled the path check and now startup times have halved to 15-19sec. Definitely an improvement. Still a bit slow, but then you are doing some additional checking which isn’t done in manual submission and via Smedge, which pretty much shoots off a simple command line as well as far as I can see.

Thanks for checking this. You should definitely downgrade to 2.6.7, since Deadline is not supported under 2.8.x at this time.

Sounds good. Thanks!

Yeah, there are some additional sanity checks we do (like prepping the OFX cache), but I’m surprised it’s still this slow for you. Maybe we could make the OFX cache prep an optional plugin setting.

I see, I was under the impression that the versions listed in the docs are meant to be minimum versions. Maybe you should think about adjusting the docs to say that you need a specific version. Well, I guess they already do that and I misinterpreted the minimum requirements.

Maybe change:

to

Or something to that extent. It might also just be me being a non-native English speaker.

I’m not working from the most efficient file server at the moment (a Drobo 4 HDD raid with a R/W rate of about 30-40MB/s attached via FireWire). Maybe that has something to do with the slow sanity checks?

I’m also uploading the Nuke test bundle at the moment, but we have a pretty slow network connection at home so it will take a while. It should be up in about 3 hours from now. It contains the Nuke 6.1v2x script and 20 frames of each file sequence I was doing the test with.
alexanderkucera.com/filechute/farmtest.rar (66,8MB) - Edit: It’s up now.

Ok, I’ve downgraded to Mono 2.6.7. Everything works (or doesn’t) as before.

Those sanity checks, how important are they to the operation in a small farm? I mean, we are looking at maybe 6-8 machines in the beginning. And one central server that has all the plugins, etc. Maintenance for such a small farm should be manageable and (maybe) doesn’t really need all the sanity checks all the time. Don’t really know. What do you think?

Thanks for the test script. We ran it on one of our Macs, and it only took a second to perform the path mapping check and all the other sanity checks. Here are the specs for our Mac:

Version: 10.6.4
Processor: 2 GHz Intel Core 2 Duo
Memory: 2 GB

I’ve attached a modified Nuke plugin that doesn’t do the path mapping (like your current working copy), and it also removes the other sanity checks. To install, go to \your\repository\plugins\Nuke and unzip the attached file. Then submit a new Nuke job and see if it makes a difference.

Cheers,

Hi Ryan,

sorry for my dropping off the face face of the earth for such a long time. First I was sick, then on vacation and then we moved. We are still not completely set up, so I am unable to test this further at the moment. I just wanted to let you know that I haven’t lost my interest in Deadline as our farm manager, I just haven’t gotten back to it yet. I will update this thread as soon as possible again.

All the best,
Alex

No worries! :slight_smile:

Hi Ryan,

I am back in the office and have run a quick test with that new Nuke plugin. Everything is unchanged. Average startup times are still 14 seconds while render times are 6 seconds. Slave delay between tasks is set to 1 as are pretty much all other delay or wait settings. I have Pulse running with a repository check frequency of 5 seconds.

I am working on a 8 core MacPro with 2.26GHz, 12GB RAM and OS X 10.6.6. Deadline is running with X11 2.3.6 and Mono is now 2.6.7. The repository is on a relatively slow NAS (Drobo connected via Firewire, but it is very full and quite sluggish at the moment). Just for giggles, I copied all files locally and ran the Nuke script again. Still the same speed. In fact, rendering speed increases to 2 seconds, but startup time stays at an average of 14 seconds.

Is there any debugging I could enable to help you track this down?

Would you be willing to join the 5.0 beta program to see if this problem still persists? We have made additional improvements to the path mapping code that the Nuke plugin uses, and it would be interesting to see if they help your situation at all. If not, at least we have an up-to-date starting point that we can work from to improve things.

This one still baffles me, since we can’t reproduce it (nor has anyone reported a slowdown since 4.0).

If you want to join, see this announcement for details:
viewtopic.php?f=10&t=4497

Cheers,

  • Ryan