Lots of Mongo errors

No idea what is going on or how to debug that infernal MongoDB, but I get a lot of the errors below.
MongoDB is running on CentOS 6.4 with SELinux set to permissive (for debugging) and the firewall disabled; the Deadline repo dir is shared through Samba.
I am connecting to the D6 repo with a Win7 D6 client.

2013-07-26 17:42:26: Error occurred while updating task cache: An unexpected error occurred while interacting with the database (LANDLORD:27017):
2013-07-26 17:42:26: Unable to read data from the transport connection: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond. (FranticX.Database.DocumentException)
2013-07-26 17:42:54: Error occurred while updating job cache: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond 192.168.123.100:27017 (System.Net.Sockets.SocketException)
2013-07-26 17:42:57: Traceback (most recent call last):
2013-07-26 17:42:57: File "DeadlineUI\UI\Importers\CloudListImporter.py", line 71, in run
2013-07-26 17:42:57: File "DeadlineUI\UI\Importers\CloudListImporter.py", line 111, in poll
2013-07-26 17:42:57: DocumentException: An unexpected error occurred while interacting with the database (LANDLORD:27017):
2013-07-26 17:42:57: Unable to read data from the transport connection: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.
2013-07-26 17:42:57: at c.a(MongoServer A_0, Exception A_1)
2013-07-26 17:42:57: at Deadline.StorageDB.MongoDB.MongoSettingsStorage.GetCloudSettings(Boolean invalidateCache)
2013-07-26 17:42:58: Traceback (most recent call last):
2013-07-26 17:42:58: File "DeadlineMonitor\UI\Forms\MainWindow.py", line 781, in changeMonitorOptions
2013-07-26 17:42:58: DocumentException: An unexpected error occurred while interacting with the database (LANDLORD:27017):
2013-07-26 17:42:58: Unable to read data from the transport connection: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.
2013-07-26 17:42:58: at c.a(MongoServer A_0, Exception A_1)
2013-07-26 17:42:58: at Deadline.StorageDB.MongoDB.MongoUserStorage.SaveUser(UserInfo userInfo)

Could it be another case of Realtek NIC rubbish? (I have replaced a Realtek adapter with a dedicated Intel NIC before, because databases and Samba seem to kill those NICs.)
Or am I missing something obvious?

Sven

Okay, I’m starting to think it’s that Realtek rubbish again (why they even add this absolute crap hardware to boards is beyond me). I’m just going out on a limb and ordering a couple of good Intels to see what that does.

Oh good gravy, yeah. It looks like something’s going wrong in transport.

It’s clearly resolving, and there must have been a route previously if it worked at all before. A flaky NIC is the best explanation here.
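
If you want numbers before swapping hardware, a repeated plain TCP connect against the database port can show how often the connection actually drops. This is only a rough sketch: the host name LANDLORD and port 27017 come from the log above, while `probe`, `summarize`, and the attempt/timeout values are made up for illustration and are not part of Deadline or MongoDB.

```python
import socket
import time


def probe(host, port, attempts=20, timeout=3.0, pause=1.0):
    """Open and close a TCP connection `attempts` times.

    Returns a list of (ok, detail) tuples, where detail is the connect
    time in seconds on success or the error text on failure.
    """
    results = []
    for _ in range(attempts):
        start = time.monotonic()
        try:
            # Plain TCP connect only; no MongoDB protocol is spoken here.
            with socket.create_connection((host, port), timeout=timeout):
                results.append((True, time.monotonic() - start))
        except OSError as exc:  # covers timeouts, refusals, DNS failures
            results.append((False, str(exc)))
        time.sleep(pause)
    return results


def summarize(results):
    """Return (number of failed attempts, total attempts)."""
    failures = sum(1 for ok, _ in results if not ok)
    return failures, len(results)
```

Calling `summarize(probe("LANDLORD", 27017))` from the Win7 client would give a failed/total count. Intermittent failures while the mongod process stays up would point at the network path rather than at MongoDB itself, though it cannot tell you which NIC is at fault.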

Just getting back to this, letting you know this particular problem has been solved.

As suspected, it was the Realtek/Broadcom/Marvell NICs in the servers.

The solution was simple: I ordered a couple of NICs from a company that actually knows what they are doing (Intel Pro GT/1000G cards in this case) and burned those unholy Realtek NICs with fire.
So when the network is acting up, check that there are no Realtek/Marvell NICs on the network (these things degraded performance on the ENTIRE network, even for machines not connected), and replace them with something reliable.
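
For anyone who wants to do that check on a Linux box without opening the case, here is a minimal sketch that filters `lspci` output for Ethernet controllers from the vendors named above. It assumes the usual `lspci` line layout ("slot Ethernet controller: vendor model"); the vendor list and the function name `suspect_nics` are my own picks for illustration.

```python
# Vendors taken from this thread; adjust to taste.
SUSPECT_VENDORS = ("realtek", "marvell", "broadcom")


def suspect_nics(lspci_output, vendors=SUSPECT_VENDORS):
    """Return the Ethernet controller lines of `lspci` text that name a listed vendor."""
    hits = []
    for line in lspci_output.splitlines():
        lowered = line.lower()
        # Only look at network adapters, then match the vendor case-insensitively.
        if "ethernet controller" in lowered and any(v in lowered for v in vendors):
            hits.append(line.strip())
    return hits
```

Save the output of `lspci` on each server to a text file and pass its contents to `suspect_nics`; any line it returns is a card worth swapping out first.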

Now on to the next problem.