Remote control of slaves and crashing slaves/launcher

Hi guys, I installed Deadline yesterday to give it another go. (I was on the Mac beta a while back, while everything was still running on Frantic's site, but never got around to setting Deadline up for our farm until now.)

While general rendering works fine (have only tested Maya and Nuke so far), I have a couple of problems that are potential show stoppers.

I work on OS X 10.6.5 with the latest Deadline release, 4.1.0.43205, and at the moment on just a single machine to test things out, so networking isn't really an issue. The repository is also connected directly, since this would be the machine used for the repository in the future. /Applications/Deadline and the repository folder have all permissions set to 777, so there should not be any write or read permission errors.
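For anyone wanting to double-check the permissions claim on their own setup, here is a minimal Python sketch. It walks a folder and lists anything that is not readable and writable for user, group, and other. The function name `restricted_entries` is made up for illustration, and the folder to check is passed on the command line rather than hard-coded.

```python
import os
import stat
import sys


def restricted_entries(root):
    """Return (path, mode) pairs under root lacking rw for user, group, or other."""
    required = 0o666  # read + write bits for user, group, and other
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        # Check the directory itself, then every file directly inside it.
        for path in [dirpath] + [os.path.join(dirpath, f) for f in filenames]:
            try:
                mode = stat.S_IMODE(os.stat(path).st_mode)
            except OSError:
                continue  # e.g. a broken symlink; skipped in this sketch
            if mode & required != required:
                found.append((path, oct(mode)))
    return found


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_perms.py /Applications/Deadline
    for path, mode in restricted_entries(sys.argv[1]):
        print(mode, path)
```

An empty result only says the mode bits look open; it does not rule out ACLs or ownership quirks on a network volume.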

  • Like a few other people, I am not able to start the Deadline .apps. They bounce twice and then quit. For now I am resorting to either calling the executables in *.app/Contents/MacOS in Terminal (the path is set correctly and they launch by just calling their name) or just double-clicking them. Launching other tools from within the apps works fine. No logs get produced by either app when they bounce and die. I have attached the Console output of a Launcher.app failure, though. Launcherapp.log.zip (9.08 KB)

  • As noted in another thread, I experience really long startup times when submitting a Nuke script.

  • And the most severe problem I have is that I am unable to pause or delete a render without killing the launcher and slave on that machine. Basically, I pause a job and nothing happens until the next Slave refresh. Then the slave notices that the render has been canceled or resubmitted, quits the render, decides to take a vacation and quits as well. And for good measure it takes its buddy DeadlineLauncher with it. Pulse and Monitor keep running. Below is a typical log output from the slave:

0: STDOUT: RCI 0.2 info : allocated 561 MB, max resident 561 MB, RSS 1821332 MB
0: STDOUT: PHEN 0.2 info : -----------------------------------------------
0: STDOUT: PHEN 0.2 info : mayabase version 10.8, compiled on Sep 6 2010.
0: STDOUT: PHEN 0.2 info : -----------------------------------------------
Scheduler Thread - Cancelling task because task filename "/Volumes/.../DeadlineRepository/jobs/999_050_999_0c1a216f/Rendering/999_050_999_0c1a216f_00000_640-640.Happy" could not be found, it was likely requeued
sending cancel task command to plugin
0: In the process of canceling current task: ignoring exception thrown by PluginLoader
Killed
Happy:~ alex$

Any help with any of these topics would be greatly appreciated.

Thanks for the Launcher console log. The problem seems to be related to Carbon:
developer.apple.com/carbon/tipsa … aunchError

When you installed Deadline, did you leave the option to use Mono’s X11 driver enabled or did you disable it? We find everything to be much more stable with the X11 driver, so if you disabled it when you initially installed, try installing again and leave it enabled to see if that makes a difference.

I’ve replied in that thread.

This is really strange. It’s one thing if the slave is crashing, but I can’t think of any reason why it would take down the launcher with it. The first thing we should do is enable verbose logging. In the Repository Options (which can be accessed from the Tools menu in the Monitor while in super user mode), find the Logging section and enable Slave Verbose Logging. Then restart the slave so that it recognizes the changes immediately.

Then submit a job, let the slave pick it up, and then suspend the job from the Monitor. When everything explodes, go to this folder on the slave machine:
/Applications/Deadline/Resources/logs

Find the most recent slave log and launcher log and post them. We’ll take a look to see if we can figure out what’s going on.
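If the logs folder is crowded, a small script can pick out the newest files. This is only a sketch: it assumes the log names start with "deadlineslave" and "deadlinelauncher", and the helper name `newest_log` is made up for illustration.

```python
import os

# Log folder on the slave machine, as given above; adjust if installed elsewhere.
LOG_DIR = "/Applications/Deadline/Resources/logs"


def newest_log(log_dir, prefix):
    """Return the most recently modified file in log_dir starting with prefix, or None."""
    if not os.path.isdir(log_dir):
        return None
    # Assumption: log file names begin with the given prefix.
    files = [
        os.path.join(log_dir, name)
        for name in os.listdir(log_dir)
        if name.startswith(prefix)
    ]
    files = [p for p in files if os.path.isfile(p)]
    return max(files, key=os.path.getmtime) if files else None


if __name__ == "__main__":
    for prefix in ("deadlineslave", "deadlinelauncher"):
        print(prefix, "->", newest_log(LOG_DIR, prefix))
```

It goes by modification time rather than the date in the file name, so it should pick the log that was written to last.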

Thanks!

- Ryan

Hi Ryan,

sorry for my somewhat slow reply, I was out of the office this morning.

I left the X11 driver enabled. In fact, I also tried the Carbon driver yesterday, but that was horribly slow and ugly, so I switched back to the X11 version. However, I am not using the X11 that is supplied with OS X. I am using XQuartz, which is a few versions ahead of the “official” X11 and is what Apple's X11 is based on. A search for Mono and XQuartz doesn't reveal any incompatibilities, but I'll try disabling XQuartz to see if that helps.

The last problem (slave crashing) seems to have fixed itself with a restart. I am able to suspend and restart jobs without issues now.

Actually, the slave and the launcher are still crashing, but not every single time like yesterday. Unfortunately, the verbose log looks exactly the same as the normal log to me.
deadlineslave(Happy)-2010-12-09-0000.log.zip (38.2 KB)
deadlinelauncher(Happy)-2010-12-09-0000.log.zip (569 Bytes)

Hmm, as reported in the Nuke thread, I switched back to regular X11, but now the submissions and file paths seem to get screwed up. I cannot render Maya files anymore. And since the slave/launcher crashes only really happened with the Maya renders, I am not able to investigate those crashes either. Maya doesn’t get that far with regular X11. Tricky business.

One thing I have noticed with XQuartz running is that the slave crash only occurs after Maya has actually started rendering. When I suspend a render early on, while Maya is still loading files and parsing the scene, the suspend works fine.

We’re going to focus on Nuke and Deadline’s stability first in this thread:
viewtopic.php?f=11&t=4377&start=0

Once we sort that stuff out, we’ll return to this thread to try and figure out the Maya problem.

Cheers,

- Ryan

Hi Ryan,

I just wanted to let you know that I upgraded to the release version of Deadline 5 and everything is working fine now. I am able to render Maya, Nuke and modo 5 files without problems.

The only thing that tends to happen after a couple of minutes of use (use as in browsing the menus, clicking on things, etc.; it doesn't happen when the window just sits there) is that the menus of the Monitor start flickering, and from that moment on the Monitor is as good as dead from an interaction point of view. I can close it, but it won't restart. I have to quit X11 and start everything up again for the Monitor to work again.

I don’t have a log file that shows any errors at the moment. When I stumble over that flicker issue again, I will upload the log here.

Glad to hear things have improved since upgrading!

Sounds good!

Cheers,

- Ryan