AWS Thinkbox Discussion Forums

mongo connection error

On some machines, on occasion, we get this error:

Error occurred while updating network settings: An error occurred while trying to connect to the Database (deadline.scanlinevfxla.com:27017). It is possible that the Mongo Database server is incorrectly configured, currently offline, or experiencing network issues. (FranticX.Database.DatabaseConnectionException)
Error: An error occurred while trying to connect to the Database (deadline.scanlinevfxla.com:27017). It is possible that the Mongo Database server is incorrectly configured, currently offline, or experiencing network issues. (FranticX.Database.DatabaseConnectionException)
at b.a(MongoServer A_0, Exception A_1)
at Deadline.StorageDB.MongoDB.MongoUserStorage.GetUser(String userName, Boolean invalidateCache)
at Deadline.StorageDB.MongoDB.MongoUserStorage.UserExists(String userName)
at Deadline.Controllers.DataController.JobFromJobInfo(IDictionary2 jobInfo, StringWriter warningWriter)
at Deadline.Submission.SubmissionUtils.SubmitNewJob(IDictionary2 htInfo, StringCollection2 auxillarySubmissionFileNames)
at Deadline.Submission.SubmissionUtils.SubmitNewJob(String[] args)
at Deadline.Submission.Submit.Perform(String[] args)

This affects only that particular machine (deadline is up, running on all slaves, deadline.scanlinevfxla.com is accessible). After a couple of minutes, it starts working again…
Happens maybe 2-3 times a day, on random artist machines. Any ideas?

It looks like a network issue, especially since it seems to clear itself up after a couple of minutes. When this happens, is the machine able to access deadline.scanlinevfxla.com? You could try connecting to the following url in a web browser to see if you can connect to mongo:

http://deadline.scanlinevfxla.com/28017

This is just the web interface to the mongo process.

Cheers,

  • Ryan

I think you mean:

http://deadline.scanlinevfxla.com:28017

Yup, thanks Gavin!

I tried that, and when the machine is popping the error, the website is also inaccessible.

Although, I can ping the machine just fine… any ideas?

This happens on occasion, but is quite widespread (happens on slaves, workstations…)
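A successful ping only shows that the host answers ICMP. It says nothing about whether the mongo process is accepting TCP connections on its port. A minimal sketch of a port check using only Python's standard library is below. The helper name `can_connect` and the 3 second timeout are assumptions chosen for illustration; the host and ports are the ones mentioned in this thread.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the name and opens a TCP connection.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


# Example usage on an affected machine (not run here):
#   can_connect("deadline.scanlinevfxla.com", 27017)  # database port
#   can_connect("deadline.scanlinevfxla.com", 28017)  # web interface port
```

If ping works but this returns False for 27017, the problem is more likely on the mongo side (for example refused connections) than in basic network reachability.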

I’m wondering if the mongodb server has reached its maximum number of connections. This is from the mongodb docs regarding the maximum number of connections:

We currently don’t enforce a maximum when our installer installs mongo, so unless you guys have set up mongo yourselves with a limit, the limit will be based on your system. Maybe you could post your mongodb log? If it’s rejecting connections for some reason, it should be logging it. It should also give an indication how many connections are open.

I ran some quick tests here, and determined how many connections each Deadline application can have open at a time:

Monitor: 6 connections
Slave: 2 connections
Launcher: 1 connection
Pulse: 2 connections

So generally, that means 3 connections per render node (launcher + slave) and 7 connections per workstation (launcher + monitor).

Cheers,

  • Ryan

I took over the task from Laszlo since his daughter was born last Friday.

Indeed it looks like the connection limitation has been reached. The Deadline log says: “09:48:45 [initandlisten] connection refused because too many open connections: 819”

When I tried to open mongodb.log, I found that it had grown to 22GB. :astonished: Does anybody know how I can limit the size of the log file?

Thanks for checking the mongodb log. Which OS do you guys have mongodb running on again? If it’s Linux, these docs should be helpful:
docs.mongodb.org/manual/administration/ulimit/

Apparently, the default file limit settings on Linux aren’t ideal for mongodb. There are some recommended settings at the bottom of the page that you guys could try.

If you’re using a different OS for mongodb, let me know!

I’ll also look into the log file size issue in the meantime. A few betas ago, we added an additional “--quiet” command line argument to try and reduce the amount of logging, but it would only apply to clean mongo installs. You can check your mongodb setup to see if you are using the quiet argument or not.

Some additional info. Mongodb’s maximum number of connections is 80% of the available file descriptors. So for a limit of 1024, which I think is the default for some Linux systems, the maximum number of mongo connections is 819 (80% of 1024 is 819.2, rounded down), so that would explain why you’re seeing that number in the mongo log.
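The arithmetic behind that 819 can be sketched in a couple of lines. This only encodes the 80% rule as described in this thread; the function name is made up for illustration, the 4096 value is an arbitrary second example, and any other cap that mongod may apply is ignored.

```python
def max_mongo_connections(fd_limit):
    """Connection ceiling under the 80% rule, rounded down.

    Integer arithmetic (x * 8 // 10) avoids floating point rounding.
    """
    return fd_limit * 8 // 10


print(max_mongo_connections(1024))  # 819, the number seen in the log
print(max_mongo_connections(4096))  # 3276, for an arbitrary higher limit
```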

Thank you for your response! I will look into the link. We are using Linux (CentOS 6.3).

We are running MongoDB with the standard configuration, where the --quiet argument is set by default. I’m looking into the logRotate command to see how I can automate this. It looks like I will end up with a cron job doing this, since MongoDB does not support it yet.

Cool. Hopefully using ulimit to increase the upper limit is the solution.

We’ve also run some tests here with caps on the maximum number of connections per Deadline application. We’ve tested with the Monitor at 2 connections and the Slave at 1 connection and found no impact on performance whatsoever. So we’ll be including these caps with beta 16. That should help keep the connection count down.

These will be the new counts in beta 16:

Monitor: 2 connections
Slave: 1 connection
Launcher: 1 connection
Pulse: 2 connections

So generally, that means 2 connections per render node (launcher + slave) and 3 connections per workstation (launcher + monitor).
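To get a feel for what the caps mean against a connection ceiling like 819, here is a rough estimate in Python. The per-application counts are the ones posted in this thread (before and after the beta 16 caps). The farm size of 200 render nodes, 50 workstations and 1 Pulse is purely hypothetical, and the sketch assumes every application holds its maximum number of connections at the same time.

```python
# Per-application connection counts quoted in this thread.
OLD_COUNTS = {"monitor": 6, "slave": 2, "launcher": 1, "pulse": 2}
NEW_COUNTS = {"monitor": 2, "slave": 1, "launcher": 1, "pulse": 2}


def total_connections(counts, render_nodes, workstations, pulses=1):
    """Worst-case connection count for a farm (illustrative estimate only)."""
    per_render_node = counts["launcher"] + counts["slave"]
    per_workstation = counts["launcher"] + counts["monitor"]
    return (render_nodes * per_render_node
            + workstations * per_workstation
            + pulses * counts["pulse"])


# Hypothetical farm: 200 render nodes, 50 workstations, 1 Pulse.
print(total_connections(OLD_COUNTS, 200, 50))  # 952, above a ceiling of 819
print(total_connections(NEW_COUNTS, 200, 50))  # 552, below it
```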

I followed the link you posted earlier and adjusted the limits according to the recommendations for the MongoDB (docs.mongodb.org/manual/administ … d-settings).

ulimit -n 64000
ulimit -u 32000

So far we have had no further problems. I will watch it and let you know how it works.
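Since ulimit settings apply to a shell and the processes started from it, it can be worth confirming which limits a process actually sees. A small sketch using Python's standard `resource` module (Unix only) is below; the function name is an assumption for illustration. Note that it reports the limits of the Python process itself, not of mongod. On Linux, the limits of a running mongod can be read from /proc/<pid>/limits.

```python
import resource


def current_limits():
    """Return (soft, hard) limits, comparable to `ulimit -n` and `ulimit -u`."""
    return {
        "open_files": resource.getrlimit(resource.RLIMIT_NOFILE),
        "processes": resource.getrlimit(resource.RLIMIT_NPROC),
    }


for name, (soft, hard) in current_limits().items():
    # RLIM_INFINITY means the limit is unlimited.
    shown = ["unlimited" if v == resource.RLIM_INFINITY else v for v in (soft, hard)]
    print(f"{name}: soft={shown[0]} hard={shown[1]}")
```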

That’s great news! Keep us posted!

Yeah, it looks like we solved it. Last night it went over 819 connections without any errors.

Regarding the huge log file, I figured out that the “--quiet” option was not activated as I had thought. After setting this parameter, it was no longer a problem. The cron job covers the rest.

Thank you for your support!

Q. For users running Mongodb on a Windows machine - will they need to configure a scheduled task to ensure the logs don’t get too big?
Or should the --quiet option be keeping the log within reasonable parameters in the future?
(just curious)

From what I’ve read, the --quiet option should keep the log size reasonable. Our repository installer also no longer includes the --logappend option, which makes mongodb start a new log when it is restarted, rather than append to the existing log.

Cheers,

  • Ryan

The “--quiet” argument reduced the log a lot. Besides that, I use the information from this page (docs.mongodb.org/manual/tutorial … log-files/) to rotate the log on a daily basis.
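For anyone wanting to script the rotation: the MongoDB log rotation tutorial describes sending SIGUSR1 to the mongod process as one way to trigger it on Linux. A minimal Python sketch that a daily cron job could call is below. The function names and the pid file path in the comment are assumptions for illustration and will differ per install; this is not the exact script used in this thread.

```python
import os
import signal


def read_pid(pidfile):
    """Read a process id from a pid file."""
    with open(pidfile) as f:
        return int(f.read().strip())


def request_log_rotation(pid):
    """Send SIGUSR1 to the given process to request a log rotation."""
    os.kill(pid, signal.SIGUSR1)


# Example usage from a daily cron job (hypothetical pid file path):
#   request_log_rotation(read_pid("/var/run/mongodb/mongod.pid"))
```

The script has to run as a user that is allowed to signal the mongod process, and old log files still need to be deleted or archived separately.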

I just want to do a follow up on this to sum it up. In general it would be useful to have a little bit more information regarding the MongoDB installation inside the Deadline documentation. When I first set up the MongoDB I followed the steps of their documentation (docs.mongodb.org/manual/installation/).

It is very straightforward, but it doesn’t cover two points which were very important to us and which are not obvious if you are not familiar with MongoDB.

  • The default connection limits of Linux have to be changed. MongoDB recommends different settings, but they are not enforced.
  • The quiet parameter is not set in “/etc/init.d/mongodb”. The verbose logging is helpful at the beginning, but very annoying when Deadline is running in production.

Yup, our plan is to definitely mention the connection limits of Linux (and Mac OSX) in our documentation.

The --quiet argument is now included in the mongodb file that our repository installer installs. This was a more recent change though, and it’s likely that the version of the repository installer you guys used didn’t include it yet.

Cheers,

  • Ryan

Hi guys, I am having serious issues trying to connect Deadline on a Mac to my repository. This is the info I currently have:

Deadline configuration error: An error occurred while trying to connect to the Database (node01-pc:27017). It is possible that the Mongo Database server is incorrectly configured, currently offline, or experiencing network issues.
Full error: Unable to connect to server Node01-pc:27017: No such host is known.

I would really appreciate some kind of help, as I am not too experienced with this kind of technical network stuff, and I would be very happy to learn too.
Best regards, and if you can explain it for dummies, that would be great!

thanks!,

best regards :slight_smile:
