We have been running a cron job that restarts Pulse every 30 minutes because of issues with the dependency checking. Tonight this process failed.
After many successful restarts, the launcher began failing to start Pulse with the following exception:
2014-08-31 21:00:05: Local version file: /opt/Thinkbox/Deadline6/bin/Version
2014-08-31 21:00:05: Network version file: /mnt/isila/deadline/repository6/bin/Linux/Version
2014-08-31 21:00:05: Comparing version files...
2014-08-31 21:00:05: Version files match
2014-08-31 21:00:05: Launching Pulse
2014-08-31 21:00:05: Launcher Thread - Responded with: Success|
2014-08-31 21:00:05: Failed to spawn process "/opt/Thinkbox/Deadline6/bin/deadlinepulse" with "-nogui " arguments
2014-08-31 21:00:05: Exception Details
2014-08-31 21:00:05: Win32Exception -- ApplicationName='/opt/Thinkbox/Deadline6/bin/deadlinepulse', CommandLine='-nogui ', CurrentDirectory='/opt/Thinkbox/Deadline6/bin'
2014-08-31 21:00:05: Win32Exception.NativeErrorCode: 14
2014-08-31 21:00:05: ExternalException.ErrorCode: -2147467259 (mono-io-layer-error (-2147467259))
2014-08-31 21:00:05: Exception.Source: System
2014-08-31 21:00:05: Exception.TargetSite: Boolean Start_noshell(System.Diagnostics.ProcessStartInfo, System.Diagnostics.Process)
2014-08-31 21:00:05: Exception.Data: ( )
2014-08-31 21:00:05: Exception.StackTrace:
2014-08-31 21:00:05: at System.Diagnostics.Process.Start_noshell (System.Diagnostics.ProcessStartInfo startInfo, System.Diagnostics.Process process) [0x00000] in <filename unknown>:0
2014-08-31 21:00:05: at System.Diagnostics.Process.Start_common (System.Diagnostics.ProcessStartInfo startInfo, System.Diagnostics.Process process) [0x00000] in <filename unknown>:0
2014-08-31 21:00:05: at System.Diagnostics.Process.Start () [0x00000] in <filename unknown>:0
2014-08-31 21:00:05: at (wrapper remoting-invoke-with-check) System.Diagnostics.Process:Start ()
2014-08-31 21:00:05: at FranticX.Processes.Process2.SpawnProcess (System.Diagnostics.ProcessStartInfo startInfo) [0x00000] in <filename unknown>:0
It had been unable to restart Pulse for ~2 hours before anyone noticed.
Once the launcher itself was restarted, Pulse started running again as well.
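Since the failure went unnoticed for about two hours, one option is to have the same cron schedule scan the launcher log for the "Failed to spawn process" line. Below is a minimal Python sketch of that idea. The function name `find_spawn_failures` and the regular expression are illustrative assumptions based only on the log excerpt above; they are not part of Deadline, and where the launcher writes its log is left to the caller.

```python
import re

# Pattern modelled on the log line quoted in this thread, e.g.
# 2014-08-31 21:00:05: Failed to spawn process "/path/to/bin" with "-nogui " arguments
# (assumption: other launcher versions may word this line differently)
SPAWN_FAILURE = re.compile(
    r'^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}): Failed to spawn process "([^"]+)"'
)


def find_spawn_failures(lines):
    """Return (timestamp, executable) pairs for every spawn failure found."""
    failures = []
    for line in lines:
        match = SPAWN_FAILURE.match(line.strip())
        if match:
            failures.append((match.group(1), match.group(2)))
    return failures
```

The caller would pass in the lines of the launcher log (for example `open(path).readlines()`) and send an alert if the returned list is non-empty.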
According to a quick Google search, that Win32 error code supposedly maps to ERROR_OUTOFMEMORY (“Not enough storage is available to complete this operation”).
Maybe there is a memory leak in the launcher? Try issuing that command to the launcher a couple hundred times and see what happens. Since this happened a couple of days ago, I can't check the RAM usage from that night anymore. But I have a feeling it will happen again in the next 1-2 days, so if I get another midnight call, I'll try to remember to take a RAM snapshot.
Current RAM usage (after 2 days of running):
root 9262 0.3 0.1 4479988 54776 pts/2 Sl Aug31 6:15 /opt/mono-2.10.9/bin/mono-sgen --gc=sgen --runtime=v4.0 /opt/Thinkbox/Deadline6/bin/deadlinelauncher.exe -nogui
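To have a record of memory usage ready the next time this fails, the line above could be logged periodically and parsed. Here is a minimal Python sketch, assuming the line comes from `ps aux` (columns USER, PID, %CPU, %MEM, VSZ, RSS, TTY, STAT, START, TIME, COMMAND, with VSZ and RSS in KiB). The helper name `parse_ps_line` is hypothetical. Read that way, the snapshot shows roughly 53 MiB resident and about 4.3 GiB of virtual address space.

```python
def parse_ps_line(line):
    """Extract memory figures from one `ps aux` style line.

    Assumes the standard 11-column `ps aux` layout; the command column
    may itself contain spaces, so only the first 10 gaps are split.
    """
    fields = line.split(None, 10)
    if len(fields) < 11:
        raise ValueError("not a ps aux line: %r" % line)
    return {
        "user": fields[0],
        "pid": int(fields[1]),
        "vsz_kb": int(fields[4]),   # virtual memory size, KiB
        "rss_kb": int(fields[5]),   # resident set size, KiB
        "command": fields[10],
    }
```

Appending these values with a timestamp to a file every few minutes would show whether the resident or the virtual size is what grows between launcher restarts.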
Thanks for the info! Definitely looks like a leak, and we’ve logged this as a bug. I’m guessing that it’s either a remote control issue, or that the launcher isn’t cleaning up resources it has for previous pulse processes.
I don't think the command was in fact sent. If I trigger the same operation from another machine via Deadline Monitor, it actually ‘times out waiting for reply’.