AWS Thinkbox Discussion Forums

Launcher/slave memory issues and slave failures

I think the launcher’s memory issues have already been reported and documented, and I’m pretty sure that’s what caused this fun one, but I thought I’d post it anyway, since it took me a while to figure out why things were going wrong.

From the slave log:

2014-09-09 21:01:53: 0: STDOUT: bbcp: 140909 04:01:52 276366% done; 1.9 MB/s, avg 1.5 MB/s 2014-09-09 21:02:02: 0: STDOUT: bbcp: 140909 04:02:01 277651% done; 484.4 MB/s, avg 1.5 KB/s 2014-09-09 21:02:03: Listener Thread - OnConnect: SocketException occured: The socket is not connected 2014-09-09 21:02:03: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 2014-09-09 21:02:04: Exception Details 2014-09-09 21:02:04: SocketException -- The socket is not connected 2014-09-09 21:02:04: SocketException.ErrorCode: 10057 (The socket is not connected) 2014-09-09 21:02:04: SocketException.SocketErrorCode: NotConnected (10057) 2014-09-09 21:02:04: Win32Exception.NativeErrorCode: 10057 2014-09-09 21:02:04: Exception.Source: System 2014-09-09 21:02:04: Exception.TargetSite: System.Net.EndPoint get_RemoteEndPoint() 2014-09-09 21:02:04: Exception.Data: ( ) 2014-09-09 21:02:04: Exception.StackTrace: 2014-09-09 21:02:04: at System.Net.Sockets.Socket.get_RemoteEndPoint () [0x00000] in <filename unknown>:0 2014-09-09 21:02:04: at Deadline.ListenerThread.OnConnect (IAsyncResult ar) [0x00000] in <filename unknown>:0 2014-09-09 21:02:04: <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< 2014-09-09 21:02:04: Listener Thread - OnConnect: Restarting Listener Thread because it exited prematurely!! (System.Exception) 2014-09-09 21:02:04: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 2014-09-09 21:02:04: Exception Details 2014-09-09 21:02:04: Exception -- Restarting Listener Thread because it exited prematurely!! 2014-09-09 21:02:04: Exception.Data: ( ) 2014-09-09 21:02:04: Exception.StackTrace: 2014-09-09 21:02:04: (null) 2014-09-09 21:02:04: <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< 2014-09-09 21:02:04: Listener Thread - Waiting 5 seconds to restart the thread. 
2014-09-09 21:02:11: 0: STDOUT: bbcp: 140909 04:02:10 281508% done; 1.4 MB/s, avg 1.5 MB/s 2014-09-09 21:02:13: WARNING: msgLength is very large (369295360) and is probably a garbage response. 2014-09-09 21:02:13: WARNING: To avoid out of memory issues msgLength will be clamped to 10000. 2014-09-09 21:02:20: 0: STDOUT: bbcp: 140909 04:02:19 283821% done; 872.0 MB/s, avg 1.5 KB/s 2014-09-09 21:02:29: 0: STDOUT: bbcp: 140909 04:02:28 287678% done; 1.4 MB/s, avg 1.5 MB/s 2014-09-09 21:02:33: WARNING: msgLength is very large (369295360) and is probably a garbage response. 2014-09-09 21:02:33: WARNING: To avoid out of memory issues msgLength will be clamped to 10000. 2014-09-09 21:02:39: 0: STDOUT: bbcp: 140909 04:02:37 291534% done; 1.4 MB/s, avg 1.5 MB/s 2014-09-09 21:02:43: WARNING: msgLength is very large (1195725856) and is probably a garbage response. 2014-09-09 21:02:43: WARNING: To avoid out of memory issues msgLength will be clamped to 10000. 2014-09-09 21:02:47: 0: STDOUT: bbcp: 140909 04:02:46 296418% done; 1.8 MB/s, avg 1.5 MB/s 2014-09-09 21:02:56: 0: STDOUT: bbcp: 140909 04:02:55 299503% done; 1.1 MB/s, avg 1.5 MB/s 2014-09-09 21:03:04: WARNING: msgLength is very large (369295360) and is probably a garbage response. 2014-09-09 21:03:04: WARNING: To avoid out of memory issues msgLength will be clamped to 10000. 2014-09-09 21:03:07: 0: STDOUT: bbcp: 140909 04:03:05 304645% done; 1.7 MB/s, avg 1.5 MB/s 2014-09-09 21:03:14: WARNING: msgLength is very large (369295360) and is probably a garbage response. 2014-09-09 21:03:14: WARNING: To avoid out of memory issues msgLength will be clamped to 10000. 2014-09-09 21:03:15: 0: STDOUT: bbcp: 140909 04:03:14 307987% done; 1.2 MB/s, avg 1.5 MB/s 2014-09-09 21:03:24: WARNING: msgLength is very large (1212501072) and is probably a garbage response. 2014-09-09 21:03:24: WARNING: To avoid out of memory issues msgLength will be clamped to 10000. 
2014-09-09 21:03:24: 0: STDOUT: bbcp: 140909 04:03:23 311586% done; 1.3 MB/s, avg 1.5 MB/s 2014-09-09 21:03:34: 0: STDOUT: bbcp: 140909 04:03:32 313900% done; 872.0 MB/s, avg 1.5 KB/s 2014-09-09 21:03:35: Listener Thread - Exception (::ffff:10.20.100.66): Number overflow. 2014-09-09 21:03:35: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 2014-09-09 21:03:39: Exception Details 2014-09-09 21:03:39: OverflowException -- Number overflow. 2014-09-09 21:03:39: Exception.Source: franticx 2014-09-09 21:03:39: Exception.TargetSite: System.Byte[] ReadBytes(System.Net.Sockets.NetworkStream, Int32, Int32) 2014-09-09 21:03:39: Exception.Data: ( ) 2014-09-09 21:03:39: Exception.StackTrace: 2014-09-09 21:03:39: at (wrapper managed-to-native) object:__icall_wrapper_mono_array_new_specific (intptr,int) 2014-09-09 21:03:39: at FranticX.Net.SocketUtils.ReadBytes (System.Net.Sockets.NetworkStream networkStream, Int32 length, Int32 timeoutMilliseconds) [0x00000] in <filename unknown>:0 2014-09-09 21:03:39: at FranticX.Net.SocketUtils.RecvMessage (System.Net.Sockets.NetworkStream networkStream, Int32 timeoutMilliseconds, Int32 maxMessageLength) [0x00000] in <filename unknown>:0 2014-09-09 21:03:39: at FranticX.Net.SocketUtils.RecvMessage (System.Net.Sockets.NetworkStream networkStream, Int32 timeoutMilliseconds) [0x00000] in <filename unknown>:0 2014-09-09 21:03:39: at Deadline.ListenerThread.WaitForMessage (System.Net.Sockets.NetworkStream networkStream, Int32 timeoutMilliseconds) [0x00000] in <filename unknown>:0 2014-09-09 21:03:39: at Deadline.ListenerThread.OnConnect (IAsyncResult ar) [0x00000] in <filename unknown>:0 2014-09-09 21:03:39: <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< 2014-09-09 21:03:39: WARNING: msgLength is very large (33554944) and is probably a garbage response. 2014-09-09 21:03:39: WARNING: To avoid out of memory issues msgLength will be clamped to 10000. 
2014-09-09 21:03:43: 0: STDOUT: bbcp: 140909 04:03:41 317756% done; 1.4 MB/s, avg 1.5 MB/s 2014-09-09 21:03:49: WARNING: msgLength is very large (1514297439) and is probably a garbage response. 2014-09-09 21:03:49: WARNING: To avoid out of memory issues msgLength will be clamped to 10000. 2014-09-09 21:03:52: 0: STDOUT: bbcp: 140909 04:03:50 320070% done; 872.0 MB/s, avg 1.5 KB/s
And then the slave went down.
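The msgLength values in the warnings above look less random once they are viewed as raw bytes. The following is a minimal sketch, assuming (this is not confirmed anywhere in the thread) that the listener reads a 4-byte big-endian length prefix; the helper names are made up for illustration.

```python
import struct

# msgLength values copied from the WARNING lines in the slave log above.
SUSPECT_LENGTHS = [369295360, 1195725856, 1212501072, 33554944, 1514297439]


def length_as_bytes(value):
    """Return the four bytes that decode to `value` as a big-endian uint32.

    Big-endian is an assumption; the log does not state the byte order.
    """
    return struct.pack(">I", value)


def describe(value):
    """Format one value as: decimal, hex bytes, printable ASCII (dots otherwise)."""
    raw = length_as_bytes(value)
    hex_part = " ".join("{:02x}".format(b) for b in raw)
    text = "".join(chr(b) if 32 <= b < 127 else "." for b in raw)
    return "{:>10}  {}  {}".format(value, hex_part, text)


if __name__ == "__main__":
    for v in SUSPECT_LENGTHS:
        print(describe(v))
```

Under that assumption, 1195725856 corresponds to the bytes "GET " and 1212501072 to "HELP", while 369295360 begins with 0x16 0x03, which is how an SSL/TLS handshake record starts. That would be consistent with something other than a Deadline client talking to the listener port, although the log alone does not confirm it, and it says nothing about the memory growth itself.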

This is what I found in the launcher log:

2014-09-09 21:02:10: Launcher Thread - OnConnect: SocketException occured: The socket is not connected 2014-09-09 21:02:10: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 2014-09-09 21:02:10: Exception Details 2014-09-09 21:02:10: SocketException -- The socket is not connected 2014-09-09 21:02:10: SocketException.ErrorCode: 10057 (The socket is not connected) 2014-09-09 21:02:10: SocketException.SocketErrorCode: NotConnected (10057) 2014-09-09 21:02:10: Win32Exception.NativeErrorCode: 10057 2014-09-09 21:02:10: Exception.Source: System 2014-09-09 21:02:10: Exception.TargetSite: System.Net.EndPoint get_RemoteEndPoint() 2014-09-09 21:02:10: Exception.Data: ( ) 2014-09-09 21:02:10: Exception.StackTrace: 2014-09-09 21:02:10: at System.Net.Sockets.Socket.get_RemoteEndPoint () [0x00000] in <filename unknown>:0 2014-09-09 21:02:10: at Deadline.Launcher.LauncherThread.OnConnect (IAsyncResult ar) [0x00000] in <filename unknown>:0 2014-09-09 21:02:10: <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< 2014-09-09 21:02:10: Launcher Thread - OnConnect: Restarting Launcher Thread because it exited prematurely!! (System.Exception) 2014-09-09 21:02:10: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 2014-09-09 21:02:10: Exception Details 2014-09-09 21:02:10: Exception -- Restarting Launcher Thread because it exited prematurely!! 2014-09-09 21:02:10: Exception.Data: ( ) 2014-09-09 21:02:10: Exception.StackTrace: 2014-09-09 21:02:10: (null) 2014-09-09 21:02:10: <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< 2014-09-09 21:02:10: Launcher Thread - Waiting 5 seconds to restart the thread. 2014-09-09 21:02:15: ::ffff:10.20.100.66 has connected 2014-09-09 21:02:26: Launcher Thread - SimpleSocketException (::ffff:10.20.100.66): timed out while waiting for data. 
Connection will be closed 2014-09-09 21:02:26: ::ffff:10.20.100.66 has connected 2014-09-09 21:02:51: Launcher Thread - SimpleSocketException (::ffff:10.20.100.66): timed out while waiting for data. Connection will be closed 2014-09-09 21:02:51: ::ffff:10.20.100.66 has connected 2014-09-09 21:03:08: Launcher Thread - SimpleSocketException (::ffff:10.20.100.66): timed out while waiting for data. Connection will be closed 2014-09-09 21:03:08: ::ffff:10.20.100.66 has connected 2014-09-09 21:03:43: Launcher Thread - SimpleSocketException (::ffff:10.20.100.66): timed out while waiting for data. Connection will be closed 2014-09-09 21:03:43: ::ffff:10.20.100.66 has connected 2014-09-09 21:03:53: Launcher Thread - SimpleSocketException (::ffff:10.20.100.66): timed out while waiting for commands. Connection will be closed 2014-09-09 21:03:53: ::ffff:10.20.100.66 has connected 2014-09-09 21:04:16: Launcher Thread - SimpleSocketException (::ffff:10.20.100.66): timed out while waiting for data. Connection will be closed 2014-09-09 21:04:16: ::ffff:10.20.100.66 has connected 2014-09-09 21:04:16: Updating Repository options 2014-09-09 21:04:17: - Remote Administration: enabled 2014-09-09 21:04:17: - Automatic Updates: enabled 2014-09-09 21:04:31: Launcher Thread - SimpleSocketException (::ffff:10.20.100.66): timed out while waiting for data. Connection will be closed 2014-09-09 21:04:31: ::ffff:10.20.100.66 has connected 2014-09-09 21:08:54: Launcher Thread - SimpleSocketException (::ffff:10.20.100.66): timed out while waiting for data. Connection will be closed 2014-09-09 21:08:54: ::ffff:10.20.100.66 has connected 2014-09-09 21:09:04: Launcher Thread - SimpleSocketException (::ffff:10.20.100.66): timed out while waiting for data. 
Connection will be closed 2014-09-09 21:09:04: ::ffff:10.20.100.66 has connected 2014-09-09 21:09:17: Updating Repository options 2014-09-09 21:09:17: - Remote Administration: enabled 2014-09-09 21:09:17: - Automatic Updates: enabled 2014-09-09 21:10:31: Launcher Thread - SimpleSocketException (::ffff:10.20.100.66): timed out while waiting for data. Connection will be closed 2014-09-09 21:10:31: ::ffff:10.20.100.66 has connected 2014-09-09 21:10:31: Launcher Thread - Exception (::ffff:10.20.100.66): Number overflow. 2014-09-09 21:10:31: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 2014-09-09 21:10:31: Exception Details 2014-09-09 21:10:31: OverflowException -- Number overflow. 2014-09-09 21:10:31: Exception.Source: franticx 2014-09-09 21:10:31: Exception.TargetSite: System.Byte[] ReadBytes(System.Net.Sockets.NetworkStream, Int32, Int32) 2014-09-09 21:10:31: Exception.Data: ( ) 2014-09-09 21:10:31: Exception.StackTrace: 2014-09-09 21:10:31: at (wrapper managed-to-native) object:__icall_wrapper_mono_array_new_specific (intptr,int) 2014-09-09 21:10:31: at FranticX.Net.SocketUtils.ReadBytes (System.Net.Sockets.NetworkStream networkStream, Int32 length, Int32 timeoutMilliseconds) [0x00000] in <filename unknown>:0 2014-09-09 21:10:31: at FranticX.Net.SocketUtils.RecvMessage (System.Net.Sockets.NetworkStream networkStream, Int32 timeoutMilliseconds, Int32 maxMessageLength) [0x00000] in <filename unknown>:0 2014-09-09 21:10:31: at Deadline.Launcher.LauncherThread.WaitForCommands (System.Net.Sockets.NetworkStream networkStream, Int32 timeoutMilliseconds) [0x00000] in <filename unknown>:0 2014-09-09 21:10:31: at Deadline.Launcher.LauncherThread.OnConnect (IAsyncResult ar) [0x00000] in <filename unknown>:0 2014-09-09 21:10:31: <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< 2014-09-09 21:10:31: ::ffff:10.20.100.66 has connected 2014-09-09 21:10:32: Launcher Thread - Exception (::ffff:10.20.100.66): Out of memory 2014-09-09 21:10:32: 
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 2014-09-09 21:10:32: Exception Details 2014-09-09 21:10:32: OutOfMemoryException -- Out of memory 2014-09-09 21:10:32: Exception.Source: franticx 2014-09-09 21:10:32: Exception.TargetSite: System.Byte[] ReadBytes(System.Net.Sockets.NetworkStream, Int32, Int32) 2014-09-09 21:10:32: Exception.Data: ( ) 2014-09-09 21:10:32: Exception.StackTrace: 2014-09-09 21:10:32: at (wrapper managed-to-native) object:__icall_wrapper_mono_array_new_specific (intptr,int) 2014-09-09 21:10:32: at FranticX.Net.SocketUtils.ReadBytes (System.Net.Sockets.NetworkStream networkStream, Int32 length, Int32 timeoutMilliseconds) [0x00000] in <filename unknown>:0 2014-09-09 21:10:32: at FranticX.Net.SocketUtils.RecvMessage (System.Net.Sockets.NetworkStream networkStream, Int32 timeoutMilliseconds, Int32 maxMessageLength) [0x00000] in <filename unknown>:0 2014-09-09 21:10:32: at Deadline.Launcher.LauncherThread.WaitForCommands (System.Net.Sockets.NetworkStream networkStream, Int32 timeoutMilliseconds) [0x00000] in <filename unknown>:0 2014-09-09 21:10:32: at Deadline.Launcher.LauncherThread.OnConnect (IAsyncResult ar) [0x00000] in <filename unknown>:0 2014-09-09 21:10:32: <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< 2014-09-09 21:10:32: ::ffff:10.20.100.66 has connected 2014-09-09 21:10:50: Launcher Thread - SimpleSocketException (::ffff:10.20.100.66): timed out while waiting for data. Connection will be closed 2014-09-09 21:10:50: ::ffff:10.20.100.66 has connected 2014-09-09 21:11:00: Launcher Thread - SimpleSocketException (::ffff:10.20.100.66): timed out while waiting for data. 
Connection will be closed 2014-09-09 21:11:00: ::ffff:10.20.100.66 has connected 2014-09-09 21:11:00: Launcher Thread - Exception (::ffff:10.20.100.66): Out of memory 2014-09-09 21:11:00: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 2014-09-09 21:11:00: Exception Details 2014-09-09 21:11:00: OutOfMemoryException -- Out of memory 2014-09-09 21:11:00: Exception.Source: franticx 2014-09-09 21:11:00: Exception.TargetSite: System.Byte[] ReadBytes(System.Net.Sockets.NetworkStream, Int32, Int32) 2014-09-09 21:11:00: Exception.Data: ( ) 2014-09-09 21:11:00: Exception.StackTrace: 2014-09-09 21:11:00: at (wrapper managed-to-native) object:__icall_wrapper_mono_array_new_specific (intptr,int) 2014-09-09 21:11:00: at FranticX.Net.SocketUtils.ReadBytes (System.Net.Sockets.NetworkStream networkStream, Int32 length, Int32 timeoutMilliseconds) [0x00000] in <filename unknown>:0 2014-09-09 21:11:00: at FranticX.Net.SocketUtils.RecvMessage (System.Net.Sockets.NetworkStream networkStream, Int32 timeoutMilliseconds, Int32 maxMessageLength) [0x00000] in <filename unknown>:0 2014-09-09 21:11:00: at Deadline.Launcher.LauncherThread.WaitForCommands (System.Net.Sockets.NetworkStream networkStream, Int32 timeoutMilliseconds) [0x00000] in <filename unknown>:0 2014-09-09 21:11:00: at Deadline.Launcher.LauncherThread.OnConnect (IAsyncResult ar) [0x00000] in <filename unknown>:0 2014-09-09 21:11:00: <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< 2014-09-09 21:11:00: ::ffff:10.20.100.66 has connected 2014-09-09 21:11:00: Launcher Thread - Exception (::ffff:10.20.100.66): Out of memory 2014-09-09 21:11:00: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 2014-09-09 21:11:00: Exception Details 2014-09-09 21:11:00: OutOfMemoryException -- Out of memory 2014-09-09 21:11:00: Exception.Source: franticx 2014-09-09 21:11:00: Exception.TargetSite: System.Byte[] ReadBytes(System.Net.Sockets.NetworkStream, Int32, Int32) 2014-09-09 21:11:00: 
Exception.Data: ( ) 2014-09-09 21:11:00: Exception.StackTrace: 2014-09-09 21:11:00: at (wrapper managed-to-native) object:__icall_wrapper_mono_array_new_specific (intptr,int) 2014-09-09 21:11:00: at FranticX.Net.SocketUtils.ReadBytes (System.Net.Sockets.NetworkStream networkStream, Int32 length, Int32 timeoutMilliseconds) [0x00000] in <filename unknown>:0 2014-09-09 21:11:00: at FranticX.Net.SocketUtils.RecvMessage (System.Net.Sockets.NetworkStream networkStream, Int32 timeoutMilliseconds, Int32 maxMessageLength) [0x00000] in <filename unknown>:0 2014-09-09 21:11:00: at Deadline.Launcher.LauncherThread.WaitForCommands (System.Net.Sockets.NetworkStream networkStream, Int32 timeoutMilliseconds) [0x00000] in <filename unknown>:0 2014-09-09 21:11:00: at Deadline.Launcher.LauncherThread.OnConnect (IAsyncResult ar) [0x00000] in <filename unknown>:0 2014-09-09 21:11:00: <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< 2014-09-09 21:11:00: ::ffff:10.20.100.66 has connected 2014-09-09 21:11:10: Launcher Thread - SimpleSocketException (::ffff:10.20.100.66): timed out while waiting for commands. Connection will be closed 2014-09-09 21:11:10: ::ffff:10.20.100.66 has connected 2014-09-09 21:11:20: Launcher Thread - SimpleSocketException (::ffff:10.20.100.66): timed out while waiting for data. Connection will be closed 2014-09-09 21:11:20: ::ffff:10.20.100.66 has connected 2014-09-09 21:11:31: Launcher Thread - SimpleSocketException (::ffff:10.20.100.66): timed out while waiting for data. Connection will be closed 2014-09-09 21:11:31: ::ffff:10.20.100.66 has connected 2014-09-09 21:11:41: Launcher Thread - SimpleSocketException (::ffff:10.20.100.66): timed out while waiting for commands. Connection will be closed 2014-09-09 21:11:41: ::ffff:10.20.100.66 has connected 2014-09-09 21:11:51: Launcher Thread - SimpleSocketException (::ffff:10.20.100.66): timed out while waiting for data. 
Connection will be closed 2014-09-09 21:12:26: ::ffff:10.20.100.66 has connected 2014-09-09 21:12:36: Launcher Thread - SimpleSocketException (::ffff:10.20.100.66): timed out while waiting for commands. Connection will be closed 2014-09-09 21:12:36: ::ffff:10.20.100.66 has connected 2014-09-09 21:12:36: Launcher Thread - Exception (::ffff:10.20.100.66): Number overflow. 2014-09-09 21:12:36: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 2014-09-09 21:12:36: Exception Details 2014-09-09 21:12:36: OverflowException -- Number overflow. 2014-09-09 21:12:36: Exception.Source: franticx 2014-09-09 21:12:36: Exception.TargetSite: System.Byte[] ReadBytes(System.Net.Sockets.NetworkStream, Int32, Int32) 2014-09-09 21:12:36: Exception.Data: ( ) 2014-09-09 21:12:36: Exception.StackTrace: 2014-09-09 21:12:36: at (wrapper managed-to-native) object:__icall_wrapper_mono_array_new_specific (intptr,int) 2014-09-09 21:12:36: at FranticX.Net.SocketUtils.ReadBytes (System.Net.Sockets.NetworkStream networkStream, Int32 length, Int32 timeoutMilliseconds) [0x00000] in <filename unknown>:0 2014-09-09 21:12:36: at FranticX.Net.SocketUtils.RecvMessage (System.Net.Sockets.NetworkStream networkStream, Int32 timeoutMilliseconds, Int32 maxMessageLength) [0x00000] in <filename unknown>:0 2014-09-09 21:12:36: at Deadline.Launcher.LauncherThread.WaitForCommands (System.Net.Sockets.NetworkStream networkStream, Int32 timeoutMilliseconds) [0x00000] in <filename unknown>:0 2014-09-09 21:12:36: at Deadline.Launcher.LauncherThread.OnConnect (IAsyncResult ar) [0x00000] in <filename unknown>:0 2014-09-09 21:12:36: <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< 2014-09-09 21:12:36: ::ffff:10.20.100.66 has connected 2014-09-09 21:12:36: Launcher Thread - Exception (::ffff:10.20.100.66): Number overflow. 
2014-09-09 21:12:36: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 2014-09-09 21:12:36: Exception Details 2014-09-09 21:12:36: OverflowException -- Number overflow. 2014-09-09 21:12:36: Exception.Source: franticx 2014-09-09 21:12:36: Exception.TargetSite: System.Byte[] ReadBytes(System.Net.Sockets.NetworkStream, Int32, Int32) 2014-09-09 21:12:36: Exception.Data: ( ) 2014-09-09 21:12:36: Exception.StackTrace: 2014-09-09 21:12:36: at (wrapper managed-to-native) object:__icall_wrapper_mono_array_new_specific (intptr,int) 2014-09-09 21:12:36: at FranticX.Net.SocketUtils.ReadBytes (System.Net.Sockets.NetworkStream networkStream, Int32 length, Int32 timeoutMilliseconds) [0x00000] in <filename unknown>:0 2014-09-09 21:12:36: at FranticX.Net.SocketUtils.RecvMessage (System.Net.Sockets.NetworkStream networkStream, Int32 timeoutMilliseconds, Int32 maxMessageLength) [0x00000] in <filename unknown>:0 2014-09-09 21:12:36: at Deadline.Launcher.LauncherThread.WaitForCommands (System.Net.Sockets.NetworkStream networkStream, Int32 timeoutMilliseconds) [0x00000] in <filename unknown>:0 2014-09-09 21:12:36: at Deadline.Launcher.LauncherThread.OnConnect (IAsyncResult ar) [0x00000] in <filename unknown>:0 2014-09-09 21:12:36: <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< 2014-09-09 21:12:36: ::ffff:10.20.100.66 has connected 2014-09-09 21:12:36: Launcher Thread - Exception (::ffff:10.20.100.66): Number overflow. 2014-09-09 21:12:36: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 2014-09-09 21:12:36: Exception Details 2014-09-09 21:12:36: OverflowException -- Number overflow. 
2014-09-09 21:12:36: Exception.Source: franticx 2014-09-09 21:12:36: Exception.TargetSite: System.Byte[] ReadBytes(System.Net.Sockets.NetworkStream, Int32, Int32) 2014-09-09 21:12:36: Exception.Data: ( ) 2014-09-09 21:12:36: Exception.StackTrace: 2014-09-09 21:12:36: at (wrapper managed-to-native) object:__icall_wrapper_mono_array_new_specific (intptr,int) 2014-09-09 21:12:36: at FranticX.Net.SocketUtils.ReadBytes (System.Net.Sockets.NetworkStream networkStream, Int32 length, Int32 timeoutMilliseconds) [0x00000] in <filename unknown>:0 2014-09-09 21:12:36: at FranticX.Net.SocketUtils.RecvMessage (System.Net.Sockets.NetworkStream networkStream, Int32 timeoutMilliseconds, Int32 maxMessageLength) [0x00000] in <filename unknown>:0 2014-09-09 21:12:36: at Deadline.Launcher.LauncherThread.WaitForCommands (System.Net.Sockets.NetworkStream networkStream, Int32 timeoutMilliseconds) [0x00000] in <filename unknown>:0 2014-09-09 21:12:36: at Deadline.Launcher.LauncherThread.OnConnect (IAsyncResult ar) [0x00000] in <filename unknown>:0 2014-09-09 21:12:36: <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< 2014-09-09 21:12:36: ::ffff:10.20.100.66 has connected 2014-09-09 21:12:36: Launcher Thread - Exception (::ffff:10.20.100.66): Number overflow. 2014-09-09 21:12:36: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 2014-09-09 21:12:36: Exception Details 2014-09-09 21:12:36: OverflowException -- Number overflow. 
2014-09-09 21:12:36: Exception.Source: franticx 2014-09-09 21:12:36: Exception.TargetSite: System.Byte[] ReadBytes(System.Net.Sockets.NetworkStream, Int32, Int32) 2014-09-09 21:12:36: Exception.Data: ( ) 2014-09-09 21:12:36: Exception.StackTrace: 2014-09-09 21:12:36: at (wrapper managed-to-native) object:__icall_wrapper_mono_array_new_specific (intptr,int) 2014-09-09 21:12:36: at FranticX.Net.SocketUtils.ReadBytes (System.Net.Sockets.NetworkStream networkStream, Int32 length, Int32 timeoutMilliseconds) [0x00000] in <filename unknown>:0 2014-09-09 21:12:36: at FranticX.Net.SocketUtils.RecvMessage (System.Net.Sockets.NetworkStream networkStream, Int32 timeoutMilliseconds, Int32 maxMessageLength) [0x00000] in <filename unknown>:0 2014-09-09 21:12:36: at Deadline.Launcher.LauncherThread.WaitForCommands (System.Net.Sockets.NetworkStream networkStream, Int32 timeoutMilliseconds) [0x00000] in <filename unknown>:0 2014-09-09 21:12:36: at Deadline.Launcher.LauncherThread.OnConnect (IAsyncResult ar) [0x00000] in <filename unknown>:0 2014-09-09 21:12:36: <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< 2014-09-09 21:12:36: ::ffff:10.20.100.66 has connected 2014-09-09 21:12:36: Launcher Thread - Exception (::ffff:10.20.100.66): Number overflow. 2014-09-09 21:12:36: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 2014-09-09 21:12:36: Exception Details 2014-09-09 21:12:36: OverflowException -- Number overflow. 
2014-09-09 21:12:36: Exception.Source: franticx 2014-09-09 21:12:36: Exception.TargetSite: System.Byte[] ReadBytes(System.Net.Sockets.NetworkStream, Int32, Int32) 2014-09-09 21:12:36: Exception.Data: ( ) 2014-09-09 21:12:36: Exception.StackTrace: 2014-09-09 21:12:36: at (wrapper managed-to-native) object:__icall_wrapper_mono_array_new_specific (intptr,int) 2014-09-09 21:12:36: at FranticX.Net.SocketUtils.ReadBytes (System.Net.Sockets.NetworkStream networkStream, Int32 length, Int32 timeoutMilliseconds) [0x00000] in <filename unknown>:0 2014-09-09 21:12:36: at FranticX.Net.SocketUtils.RecvMessage (System.Net.Sockets.NetworkStream networkStream, Int32 timeoutMilliseconds, Int32 maxMessageLength) [0x00000] in <filename unknown>:0 2014-09-09 21:12:36: at Deadline.Launcher.LauncherThread.WaitForCommands (System.Net.Sockets.NetworkStream networkStream, Int32 timeoutMilliseconds) [0x00000] in <filename unknown>:0 2014-09-09 21:12:36: at Deadline.Launcher.LauncherThread.OnConnect (IAsyncResult ar) [0x00000] in <filename unknown>:0 2014-09-09 21:12:36: <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
And then once the slave went down, it continued running “normally”.
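Both the OverflowException and the OutOfMemoryException in the traces above are raised from the array allocation inside ReadBytes, which suggests the buffer is sized from a length read off the wire. As a rough illustration only (this is Python, not the FranticX code, and the names and the big-endian signed prefix are assumptions), a length-prefixed reader can reject an implausible length before allocating anything:

```python
import io
import struct

# Mirrors the clamp value mentioned in the slave log; any sane cap would do.
MAX_MESSAGE_LENGTH = 10000


class BadFrameError(Exception):
    """Raised when a length prefix is truncated or out of range."""


def read_exact(stream, count):
    """Read exactly `count` bytes from a buffered stream, or raise on a short read."""
    data = stream.read(count)
    if data is None or len(data) != count:
        raise BadFrameError("expected {} bytes, stream ended early".format(count))
    return data


def recv_message(stream, max_length=MAX_MESSAGE_LENGTH):
    """Read one length-prefixed message, validating the length first.

    The prefix is treated as a signed 32-bit big-endian integer (an
    assumption), so a garbage prefix can be negative as well as huge.
    """
    (length,) = struct.unpack(">i", read_exact(stream, 4))
    if length < 0 or length > max_length:
        # Refuse instead of clamping: the peer is probably not speaking
        # this protocol, so the rest of the stream cannot be trusted.
        raise BadFrameError("implausible message length {}".format(length))
    return read_exact(stream, length)


if __name__ == "__main__":
    good = struct.pack(">i", 5) + b"hello"
    print(recv_message(io.BytesIO(good)))
    try:
        recv_message(io.BytesIO(b"GET / HTTP/1.0\r\n"))
    except BadFrameError as exc:
        print("rejected:", exc)
```

Rejecting the frame and closing the connection is a different choice from the clamp the slave log describes. Clamping keeps reading from a stream whose framing is already suspect, whereas rejecting drops the connection at the first bad prefix.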

Also, this machine has (had) two slaves running on it. I restarted the one that went down, and found that after a fresh start, it’s using around 1/10th the memory of the other one (which has been running for a month).

49752 1060824 mono --runtime=v4.0 /usr/local/Thinkbox/Deadline6/bin/deadlineslave.exe -nogui -name sv-sync01-02
436812 2571020 mono --runtime=v4.0 /usr/local/Thinkbox/Deadline6/bin/deadlineslave.exe -nogui -name sv-sync01-01

So it appears the launcher isn’t the only process that leaks.
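To put a number on the "around 1/10th" figure, the two process lines above can be compared directly. This is a small sketch that assumes the two unlabeled leading columns are resident and virtual size in KiB (as in `ps -o rss,vsz,args`); the thread does not say which columns were printed, so the field names here are guesses.

```python
# The two process lines quoted above, one per slave instance.
FRESH = ("49752 1060824 mono --runtime=v4.0 "
         "/usr/local/Thinkbox/Deadline6/bin/deadlineslave.exe -nogui -name sv-sync01-02")
OLD = ("436812 2571020 mono --runtime=v4.0 "
       "/usr/local/Thinkbox/Deadline6/bin/deadlineslave.exe -nogui -name sv-sync01-01")


def parse_ps_line(line):
    """Split a line into two numeric columns and the command.

    The column meanings (RSS and VSZ in KiB) are an assumption.
    """
    first, second, command = line.split(None, 2)
    return {"rss_kib": int(first), "vsz_kib": int(second), "command": command}


def ratio(fresh, old, key):
    """Return fresh/old for the given numeric field."""
    return fresh[key] / float(old[key])


if __name__ == "__main__":
    fresh, old = parse_ps_line(FRESH), parse_ps_line(OLD)
    for key in ("rss_kib", "vsz_kib"):
        print("{}: fresh/old = {:.3f}".format(key, ratio(fresh, old, key)))
```

On these two lines the first column differs by a factor of about 8.8 and the second by about 2.4. A single snapshot cannot distinguish a leak from a large but stable working set, so sampling the same columns periodically for the long-running slave would be more conclusive.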

We currently aren’t aware of any slave memory leaks, so this is a new one. There are a couple of things that come to mind:

  • In Mono 2.10, I believe the default garbage collector is Boehm, and in 3.4 it has been replaced with an improved garbage collector called SGen. This means that Deadline 7 would use the improved garbage collector.
  • Because the Python environment isn’t sandboxed, it could grow over time if everything isn’t cleaned up. If that were the case, then the Python sandboxing we are doing in Deadline 8 would solve this problem.

We’ve had the Deadline 7 slave running for months now on some Linux boxes here, and they don’t show any signs of leaking, which is why I thought it could be the Python environment (since your pipeline is based on using heavily customized scripts with Deadline). Of course, that could also point to the new garbage collector. Maybe when you get a chance to test Deadline 7, we could see if the problem persists?

Well, we actually run all of our jobs via our own process stack that gets spawned anew for each task, so there shouldn’t be any leaks coming from there. Granted, that still leaves our Deadline plugin class, its associated JobPreLoad.py, and our EventListener class as possible culprits, but as far as I know, I’ve implemented all of the appropriate/available cleanup hooks. The event listener does use the .NET Mongo driver that ships with Deadline directly, but all of the objects created by that are scoped to a single listener method, and as far as I understand it, those should all be managed by the .NET runtime.

So I can’t remember if this was logged in another thread, but the launcher definitely has some kind of a leak…

Yup, we’re aware of it. Thanks!
