AWS Thinkbox Discussion Forums

Disappearing deadline launcher

On most of our slaves, the Deadline Launcher disappears after a day or two… This makes remote management very painful, as it requires a wrangler to waste a day remoting into each machine manually to start the launcher / slave back up.

We get no crash reports in the Event Viewer, and nothing out of the ordinary in the launcher logs…

For example:

2013-10-03 15:57:22: BEGIN - LAPRO0431\scanlinevfx
2013-10-03 15:57:22: Deadline Launcher 6.1 [v6.1.0.52622 R]
2013-10-03 15:57:25: Local version file: C:\Program Files\Thinkbox\Deadline6\bin\Version
2013-10-03 15:57:25: Network version file: \inferno2\deadline\repository6\bin\Windows\Version
2013-10-03 15:57:25: Comparing version files…
2013-10-03 15:57:25: Version files match
2013-10-03 15:57:25: Launching Slave:
2013-10-03 15:57:25: Launcher Thread - Launcher thread initializing…
2013-10-03 15:57:25: Remote Administration is now enabled
2013-10-03 15:57:25: Launcher Thread - Remote administration is enabled
2013-10-03 15:57:25: Launcher Thread - Launcher thread listening on port 17060

At some point after that the launcher disappeared, because it wasn't there when I just remoted in. I started it again and got this in the log:

2013-10-04 17:16:13: BEGIN - LAPRO0431\scanlinevfx
2013-10-04 17:16:13: Deadline Launcher 6.1 [v6.1.0.52622 R]
2013-10-04 17:16:14: Local version file: C:\Program Files\Thinkbox\Deadline6\bin\Version
2013-10-04 17:16:14: Network version file: \inferno2\deadline\repository6\bin\Windows\Version
2013-10-04 17:16:14: Comparing version files…
2013-10-04 17:16:14: Version files match
2013-10-04 17:16:14: Launching Slave:
2013-10-04 17:16:14: Launcher Thread - Launcher thread initializing…
2013-10-04 17:16:14: Remote Administration is now enabled
2013-10-04 17:16:14: Launcher Thread - Remote administration is enabled
2013-10-04 17:16:14: Launcher Thread - Launcher thread listening on port 17060

I can't find anything Deadline-related in the Event Viewer between yesterday and today… :\

Are there any verbose logging options for the launcher that we could turn on to see what's going on?
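Until there is more logging, one way to spot missing launchers without remoting into every box is to probe the port the launcher reports in its log (17060 above). Below is a minimal Python sketch, assuming that port is reachable from the wrangler's machine; the helper names are hypothetical and the host list is your own. An open port only suggests the launcher is up; it doesn't prove which process is listening.

```python
import socket

# Port taken from the launcher log ("listening on port 17060"); assumed to be
# reachable from the machine running this check.
LAUNCHER_PORT = 17060


def launcher_port_open(host, port=LAUNCHER_PORT, timeout=2.0):
    """Return True if something accepts a TCP connection on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unresolvable host names.
        return False


def find_missing_launchers(hosts, port=LAUNCHER_PORT):
    """Return the hosts whose launcher port did not accept a connection."""
    return [h for h in hosts if not launcher_port_open(h, port)]
```

For example, `find_missing_launchers(["LAPRO0431", "VCPRO1014"])` returns the subset of those machines that did not answer within the timeout, which gives a short list to remote into.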

There is actually a memory leak in the beta 6 launcher that will be fixed in beta 7. It’s likely that this is causing the launcher to die.

Cheers,

  • Ryan

Ah awesome, hopefully that fixes the issue!

Sitting on nails waiting for beta 7 :)

Ha, I was just looking at my taskbar thinking “that’s weird, my launcher is gone.” We’ll update today.

So far so good! Out of the 40 or so machines I checked, the launcher was running on every single one of them!

Bad news: the launcher is gone again on about 50% of the machines running 53080 :(

Hmm, I’m guessing there is nothing in the Launcher log or event viewer again? This is so strange…

We've had the same version of the Launcher running on a test machine for a week now, with no signs of memory issues (so at least the memory leak has been fixed). We also have an unhandled exception handler in the Launcher, so if an exception were being thrown, it should be caught and written to the log.

Here’s something to try if you’re willing. Maybe pick 5 or 10 machines, close the launcher if it’s still running, and then open a command prompt and run the launcher with the -console flag:

"C:\Program Files\Thinkbox\Deadline6\bin\deadlinelauncher.exe" -console

This will attach a Windows console to the launcher so that everything the launcher prints out goes to stdout. If it is throwing an exception that’s not making it to the logs, it should at least show up in the console. Then if one of these launchers dies, send us the console output and we’ll take a look.
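If the console window itself disappears along with the launcher, its output is lost. A minimal Python sketch that keeps a copy on disk, assuming `deadlinelauncher.exe -console` writes to a stdout that can be piped (not verified here for an app that attaches its own Windows console): it runs a command, timestamps each output line into a log file, and records the exit code when the process ends. `run_and_capture` is a hypothetical helper name.

```python
import subprocess
from datetime import datetime


def _stamp():
    return datetime.now().strftime("%Y-%m-%d %H:%M:%S")


def run_and_capture(cmd, log_path, cwd=None):
    """Run cmd, append each output line to log_path with a timestamp,
    and log the exit code when the process ends. Returns the exit code."""
    with open(log_path, "a", encoding="utf-8") as log:
        proc = subprocess.Popen(
            cmd,
            cwd=cwd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # merge stderr so stack traces are kept too
            text=True,
            errors="replace",
        )
        for line in proc.stdout:
            log.write(f"{_stamp()}: {line.rstrip()}\n")
            log.flush()  # keep the file current even if this wrapper is killed
        code = proc.wait()
        log.write(f"{_stamp()}: process exited with code {code}\n")
    return code
```

For example: `run_and_capture([r"C:\Program Files\Thinkbox\Deadline6\bin\deadlinelauncher.exe", "-console"], "launcher_console.log")`. The final "exited with code" line shows whether the process ended on its own and with what status.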

Thanks!

  • Ryan

I’ll try the console option!

Do you think it would make sense to add a keepalive log message to the launcher that fires every 5 minutes or so with some stats? We could turn on this verbose logging on ~100 slaves, then see what happens.

Yeah, we’ll add some log information when the Launcher updates its repository options, which is every 5 minutes already.
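A keepalive line like this is cheap to add. The following is a minimal Python sketch, not the Launcher's actual code: a background thread that logs a few stats every `interval_s` seconds (300 s matches the 5-minute repository-options refresh mentioned above). The logger name and the stats chosen are hypothetical. If the keepalive lines stop, the last one bounds when the process died.

```python
import logging
import os
import threading
import time

log = logging.getLogger("launcher_heartbeat")  # hypothetical logger name
log.setLevel(logging.INFO)


def start_heartbeat(interval_s=300.0):
    """Log a keepalive line every interval_s seconds until the returned
    event is set. Returns (thread, stop_event)."""
    stop = threading.Event()
    started = time.monotonic()

    def run():
        # Event.wait() returns False on timeout and True once stop is set.
        while not stop.wait(interval_s):
            log.info("Keepalive - pid %d, uptime %.0f s, threads %d",
                     os.getpid(), time.monotonic() - started,
                     threading.active_count())

    thread = threading.Thread(target=run, name="heartbeat", daemon=True)
    thread.start()
    return thread, stop
```

Attach a file handler to the `launcher_heartbeat` logger (or the root logger) so the lines land next to the rest of the log.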

I'm running console mode on 46 machines now; I'll keep you posted!

Well, the console helped narrow it down a bit. On the machines where the launcher went missing, the console was also missing :)

So it seems that the launcher does not survive a reboot. I tried rebooting a couple of slaves, and the launcher would start, trigger the slave startup, then disappear right away. When I go into these boxes later, I see the slave running, but no launcher.

Well that’s odd. The launcher always starts up on our test machines and stays up, so I can’t imagine why it closes itself on your machines after launching the slave.

I’m assuming you guys have your nodes set up to auto-login? Also, what does the DeadlineLauncher value in HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion\Run look like? It should just contain the full path to deadlinelauncher.exe.
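To check that entry on many machines, a script can read the value and test whether it is only a path to deadlinelauncher.exe. A minimal Python sketch: `read_run_entry()` uses the standard `winreg` module (Windows only) and assumes the value is named `DeadlineLauncher` as described above; `run_entry_looks_ok()` is a hypothetical helper that accepts the path with or without surrounding quotes and rejects anything else, including extra arguments.

```python
import ntpath

RUN_KEY = r"SOFTWARE\Microsoft\Windows\CurrentVersion\Run"
VALUE_NAME = "DeadlineLauncher"  # value name as described in this thread


def run_entry_looks_ok(value):
    """True if the Run value is just a full path to deadlinelauncher.exe."""
    path = value.strip()
    if len(path) >= 2 and path[0] == path[-1] == '"':
        path = path[1:-1]  # allow a quoted path
    # Extra arguments or a different program change the basename.
    return (ntpath.basename(path).lower() == "deadlinelauncher.exe"
            and ntpath.isabs(path))


def read_run_entry():
    """Read the DeadlineLauncher value from HKLM (Windows only)."""
    import winreg  # only available on Windows
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, RUN_KEY) as key:
        value, _value_type = winreg.QueryValueEx(key, VALUE_NAME)
    return value
```

On a slave, `run_entry_looks_ok(read_run_entry())` gives a quick yes/no; a missing value raises `FileNotFoundError`.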

I've now seen this with my own eyes on my workstation.

I started the launcher, it started the slave, and then the launcher disappeared right away.

Yes, auto-login is set up, and the registry entry only contains the launcher path.

If I start the launcher by clicking on the icon, it works.

If i start it like this:

runas /user:VCPRO1014\Scanlinevfx_user "c:\Program Files\Thinkbox\Deadline6\bin\deadlinelauncher.exe"

It pops up, starts the slave, then disappears. 100% reproducible.

It disappears with psexec as well:

psexec.exe -u VCPRO1014\Scanlinevfx_user "C:\Program Files\Thinkbox\Deadline6\bin\deadlinelauncher.exe"

It pops up, starts the slave, and then the launcher is gone.

Try changing directories to “C:\Program Files\Thinkbox\Deadline6\bin” before running those commands. Currently, the Deadline apps require the current working directory to be the bin folder so that they can find some DLLs. I’m checking to see if there is a way to avoid this requirement.
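From a script, the same workaround means starting the launcher with its bin folder as the working directory. A minimal Python sketch using the `cwd` argument of `subprocess.Popen`; the install path is the one used in this thread and the helper names are hypothetical. (With PsExec, the `-w` switch sets the working directory of the remote process.)

```python
import subprocess
from pathlib import PureWindowsPath

# Default install path from this thread; adjust if Deadline lives elsewhere.
LAUNCHER = r"C:\Program Files\Thinkbox\Deadline6\bin\deadlinelauncher.exe"


def launcher_command(exe_path=LAUNCHER, console=False):
    """Return (argv, working_dir), where working_dir is the exe's bin folder."""
    bin_dir = str(PureWindowsPath(exe_path).parent)
    argv = [exe_path] + (["-console"] if console else [])
    return argv, bin_dir


def start_launcher(exe_path=LAUNCHER, console=False):
    argv, bin_dir = launcher_command(exe_path, console)
    # cwd=bin_dir is the scripted equivalent of "cd" into the bin folder
    # first, so the launcher can find the DLLs it expects there.
    return subprocess.Popen(argv, cwd=bin_dir)
```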

The only difference I see between the log of a regular startup and an interrupted one is that after the “Launcher thread initializing” entry, the interrupted one quits, while the good one goes on to log “Remote Administration is now enabled”.

Good startup (via manual icon start):

2013-10-22 12:05:30: BEGIN - VCPRO1014\ScanlineVfx_user
2013-10-22 12:05:30: Deadline Launcher 6.1 [v6.1.0.53080 R]
2013-10-22 12:05:32: Local version file: C:\Program Files\Thinkbox\Deadline6\bin\Version
2013-10-22 12:05:32: Network version file: \inferno2.scanlinevfxla.com\deadline\repository6\bin\Windows\Version
2013-10-22 12:05:32: Comparing version files…
2013-10-22 12:05:32: Version files match
2013-10-22 12:05:32: Launching Slave:
2013-10-22 12:05:32: Launcher Thread - Launcher thread initializing…
2013-10-22 12:05:33: Remote Administration is now enabled
2013-10-22 12:05:33: Launcher Thread - Remote administration is enabled
2013-10-22 12:05:33: Launcher Thread - Launcher thread listening on port 17060

Bad startup (starts slave, then disappears):
2013-10-22 12:05:15: BEGIN - VCPRO1014\Administrator
2013-10-22 12:05:15: Deadline Launcher 6.1 [v6.1.0.53080 R]
2013-10-22 12:05:18: Local version file: C:\Program Files\Thinkbox\Deadline6\bin\Version
2013-10-22 12:05:18: Network version file: \inferno2.scanlinevfxla.com\deadline\repository6\bin\Windows\Version
2013-10-22 12:05:18: Comparing version files…
2013-10-22 12:05:18: Version files match
2013-10-22 12:05:18: Launching Slave:
2013-10-22 12:05:18: Launcher Thread - Launcher thread initializing…
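Since a bad startup stops right after the "Launcher thread initializing" line and never reaches "listening on port", logs collected from many slaves can be scanned for that pattern. A minimal Python sketch that classifies only the last session in a log (everything from the final `BEGIN` line on), using marker strings copied from the logs above; `last_session_status` is a hypothetical name. It won't catch a launcher that started cleanly and died later, and a launcher that is still starting up will also look interrupted.

```python
# Marker strings copied from the launcher logs in this thread.
BEGIN = " BEGIN - "
INIT = "Launcher thread initializing"
LISTENING = "Launcher thread listening on port"


def last_session_status(log_text):
    """Classify the most recent launcher session in a log.

    Returns "ok" if the session reached the listening line, "interrupted"
    if it initialized the launcher thread but never started listening,
    and "unknown" otherwise.
    """
    lines = log_text.splitlines()
    begins = [i for i, line in enumerate(lines) if BEGIN in line]
    if not begins:
        return "unknown"
    session = lines[begins[-1]:]
    if any(LISTENING in line for line in session):
        return "ok"
    if any(INIT in line for line in session):
        return "interrupted"
    return "unknown"
```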

Bingo…!

I can't do this via the registry Run entry, though…

We found a way to work around this that will be included in beta 9. When the Deadline applications start up, they simply set their current working directory to their bin folder, and then Deadline can find those DLLs just fine.
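The Deadline change itself isn't shown in the thread. A minimal Python sketch of the same idea: first thing at startup, the program makes the folder it lives in the current working directory, so finding its DLLs no longer depends on how it was launched (icon, Run key, `runas` or `psexec`). `chdir_to_own_folder` is a hypothetical name; for a frozen executable you would pass `sys.executable` rather than rely on `sys.argv[0]`.

```python
import os
import sys


def chdir_to_own_folder(entry_path=None):
    """Make the folder containing the running program the working directory.

    entry_path defaults to sys.argv[0]. Returns the new working directory.
    """
    path = entry_path or sys.argv[0]
    bin_dir = os.path.dirname(os.path.abspath(path))
    os.chdir(bin_dir)
    return bin_dir
```

Changing the working directory affects the whole process. On Windows, the Win32 `SetDllDirectory` function adds a folder to the DLL search path without touching the working directory, which is a narrower alternative.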
