AWS Thinkbox Discussion Forums

socket leak in 8.0.17.1?

Hi there,

I’ve noticed that if I leave the Deadline Monitor open for a few days, my machine becomes unable to initiate any new network connections. The typical error message looks like this:

Error in TightVNC Viewer: An operation on a socket could not be performed because the system lacked sufficient buffer space or because a queue was full.

The symptoms range from being unable to send emails (the mail client complains about network connectivity to the SMTP server) to failures opening RDP or VNC sessions, sending remote Deadline commands, etc.
If I close the Deadline Monitor, the situation returns to normal. If I reopen it, it takes about a day or two until the machine reaches this state again.

(We only recently updated from 8.0.12.4 to 8.0.17.1, so it seems that something changed between those releases…)

cheers
laszlo

Some of the Monitor log:

2017-06-12 02:34:54:  Time to connect to Repository: 1.451 s
2017-06-12 02:34:54:  Time to check user account: 15.000 ms
2017-06-12 02:34:54:  Time to purge old logs and temp files: 16.000 ms
2017-06-12 02:34:55:  Time to synchronize plugin icons: 749.000 ms
2017-06-12 02:34:55:  WARNING: Encountered the following error while initializing the Event Sandbox: 'An operation on a socket could not be performed because the system lacked sufficient buffer space or because a queue was full'.
2017-06-12 02:34:55:  Falling back to embedded Event Manager -- this may result in stale Python environments being used.
2017-06-12 02:34:55:  Scanline_NukeCacheCleanupEventListener.__init__
2017-06-12 02:34:55:  Enabling render stat mail
2017-06-12 02:34:59:  Time to initialize main window: 4.056 s
2017-06-12 02:35:00:  Main Window shown
2017-06-12 02:35:00:  Time to show main window: 374.000 ms
2017-06-12 02:35:00:  Error occurred while updating slave cache: An operation on a socket could not be performed because the system lacked sufficient buffer space or because a queue was full 172.18.4.58:27017 (System.Net.Sockets.SocketException)
2017-06-12 02:35:00:  Error occurred while updating proxy server cache: An operation on a socket could not be performed because the system lacked sufficient buffer space or because a queue was full 172.18.4.58:27017 (System.Net.Sockets.SocketException)
2017-06-12 02:35:00:     at System.Net.Sockets.Socket.DoConnect(EndPoint endPointSnapshot, SocketAddress socketAddress)
2017-06-12 02:35:00:     at System.Net.Sockets.Socket.Connect(EndPoint remoteEP)
2017-06-12 02:35:00:     at System.Net.Sockets.TcpClient.Connect(IPEndPoint remoteEP)
2017-06-12 02:35:00:     at MongoDB.Driver.Internal.MongoConnection.Open()
2017-06-12 02:35:00:     at MongoDB.Driver.Internal.MongoConnection.SendMessage(BsonBuffer buffer, Int32 requestId)
2017-06-12 02:35:00:     at MongoDB.Driver.Internal.MongoConnection.SendMessage(MongoRequestMessage message)
2017-06-12 02:35:00:     at MongoDB.Driver.Operations.QueryOperation`1.GetFirstBatch(IConnectionProvider connectionProvider)
2017-06-12 02:35:00:     at MongoDB.Driver.Operations.QueryOperation`1.Execute(IConnectionProvider connectionProvider)
2017-06-12 02:35:00:     at Deadline.StorageDB.MongoDB.MongoProxyServerStorage.GetModifiedProxyServers(ProxyServerInfoSettings[]& modifiedProxyServers, String[]& deletedProxyServerIds, Nullable`1 lastSettingsAutoUpdate, Nullable`1 lastInfoAutoUpdate, Nullable`1 lastDeletionAutoUpdate)
2017-06-12 02:35:00:     at Deadline.StorageDB.ProxyServerStorage.a(Object o)
2017-06-12 02:35:00:  Error occurred while updating limit group cache: An operation on a socket could not be performed because the system lacked sufficient buffer space or because a queue was full 172.18.4.58:27017 (System.Net.Sockets.SocketException)
2017-06-12 02:35:00:     at System.Net.Sockets.Socket.DoConnect(EndPoint endPointSnapshot, SocketAddress socketAddress)
2017-06-12 02:35:00:     at System.Net.Sockets.Socket.Connect(EndPoint remoteEP)
2017-06-12 02:35:00:     at System.Net.Sockets.TcpClient.Connect(IPEndPoint remoteEP)
2017-06-12 02:35:00:     at MongoDB.Driver.Internal.MongoConnection.Open()
2017-06-12 02:35:00:     at MongoDB.Driver.Internal.MongoConnection.SendMessage(BsonBuffer buffer, Int32 requestId)
2017-06-12 02:35:00:     at MongoDB.Driver.Internal.MongoConnection.SendMessage(MongoRequestMessage message)
2017-06-12 02:35:00:     at MongoDB.Driver.Operations.QueryOperation`1.GetFirstBatch(IConnectionProvider connectionProvider)
2017-06-12 02:35:00:     at MongoDB.Driver.Operations.QueryOperation`1.Execute(IConnectionProvider connectionProvider)
2017-06-12 02:35:00:     at Deadline.StorageDB.MongoDB.MongoLimitGroupStorage.GetModifiedLimitGroups(LimitGroup[]& modifiedLimitGroups, String[]& deletedLimitGroupIds, Nullable`1 lastAutoUpdate, Nullable`1 lastDeletionUpdate)
2017-06-12 02:35:00:     at Deadline.StorageDB.LimitGroupStorage.a(Object o)
2017-06-12 02:35:00:  Error occurred while updating job cache: An operation on a socket could not be performed because the system lacked sufficient buffer space or because a queue was full 172.18.4.58:27017 (System.Net.Sockets.SocketException)
2017-06-12 02:35:00:     at System.Net.Sockets.Socket.DoConnect(EndPoint endPointSnapshot, SocketAddress socketAddress)

What does netstat -nab net you? It should show the open TCP sessions, the ports in play, and which process owns each one. It’s probably the Monitor, but I’m wondering if the Launcher is to blame here too.
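If the raw netstat listing is too long to eyeball, a small script can tally it. What follows is a minimal Python sketch, not anything that ships with Deadline. It assumes the output of `netstat -ano` (numeric addresses with the owning PID in the last column, which is easier to parse than the multi-line `-b` output), and the function names are made up for illustration.

```python
import subprocess
from collections import Counter


def count_tcp_by_pid(netstat_text):
    """Count TCP rows per (PID, state) in `netstat -ano` style text."""
    counts = Counter()
    for line in netstat_text.splitlines():
        fields = line.split()
        # A TCP row has five columns: proto, local, remote, state, PID.
        # UDP rows and header lines have a different shape and are skipped.
        if len(fields) == 5 and fields[0] == "TCP" and fields[4].isdigit():
            state, pid = fields[3], int(fields[4])
            counts[(pid, state)] += 1
    return counts


def run_netstat():
    """Return live `netstat -ano` output (Windows; hypothetical helper)."""
    result = subprocess.run(
        ["netstat", "-ano"], capture_output=True, text=True, check=True
    )
    return result.stdout


def print_top(counts, n=10):
    """Print the n busiest (PID, state) pairs, largest first."""
    for (pid, state), total in counts.most_common(n):
        print(f"PID {pid:>6}  {state:<12} {total}")
```

Calling `print_top(count_tcp_by_pid(run_netstat()))` on the affected machine would list the processes holding the most TCP rows; the PID can then be matched against Task Manager or Process Explorer.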

We bumped the MongoDB driver up a version or two in 9.0 I think… I can check for some noted bugs there.

Thx Edwin, I’ll run a test when I hit this issue again and report back!

So, after some really quick Googling, there have been a few connection leaks in the Mongo Profiler code, but I didn’t find anything specific to regular DB connections.

Feel free to send over some early results. You shouldn’t have more than 10 open connections to the database (in fact it should be closer to 6), so if you’re already an order of magnitude out, that would be helpful to know right away. I’ll need to see if Jon can make use of core dumps here to view what’s keeping those sockets open, but feel free to make a full dump if you’re in the neighborhood of Process Explorer anyways.
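To compare against the 6 to 10 database connections mentioned above, one could count only the rows whose remote endpoint is the MongoDB server. This is a rough Python sketch that assumes text in the `netstat -ano` column layout. The endpoint 172.18.4.58:27017 is taken from the log earlier in the thread, the limit of 10 comes from the paragraph above, and the helper names are hypothetical.

```python
from collections import Counter

DB_ENDPOINT = "172.18.4.58:27017"  # database address seen in the Monitor log
EXPECTED_MAX = 10                  # upper bound quoted in this thread


def db_connection_states(netstat_text, remote=DB_ENDPOINT):
    """Tally TCP states for rows whose remote address equals `remote`."""
    states = Counter()
    for line in netstat_text.splitlines():
        fields = line.split()
        # Columns assumed: proto, local, remote, state, PID.
        if len(fields) == 5 and fields[0] == "TCP" and fields[2] == remote:
            states[fields[3]] += 1
    return states


def looks_leaky(states, limit=EXPECTED_MAX):
    """True when the total row count exceeds the expected maximum."""
    return sum(states.values()) > limit
```

The per-state breakdown may matter as much as the total: many ESTABLISHED rows would suggest connections being held open, while a pile of TIME_WAIT rows would suggest connections being opened and closed in quick succession.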

Fun aside: A “long time ago” (Deadline 6?) we used to have a big issue with the MongoDB server on OS X and FreeBSD where the kernel would leak sockets and memory, but that’s not a client side problem.

I’ll report back in detail, but for now it seems that it was just a coincidence that this issue surfaced right after a Deadline update. It looks like it may be related to the OpenText NFS driver for Windows. Still investigating though!
