Do we need to do any special configuration for Deadline to use MongoDB replica sets?
Also, the MongoDB documentation mentions that replica sets can be used to increase read capacity:
“In some cases, you can use replication to increase read capacity. Clients have the ability to send read and write operations to different servers.”
No, nothing special needs to be done to use replica sets.
However, you can’t use them to increase read capacity. The main reason is that reads from the secondary members of a replica set are eventually consistent, meaning they won’t always match the primary server. That’s where we would like to see sharding come into play (to spread the load). Unfortunately, because Deadline uses some server-side JavaScript, it can’t work with sharding yet. This is something we want to address in Deadline 7.
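To illustrate the eventual consistency point, here is a toy in-memory model. It is only a sketch and does not talk to MongoDB: the class name, the `replicate()` step, and the job key are made up for the example. It shows how a read from a secondary can return an older value than the primary until replication catches up.

```python
from collections import deque


class ToyReplicaSet:
    """Toy model (not MongoDB): one primary, one secondary fed by an oplog."""

    def __init__(self):
        self.primary = {}
        self.secondary = {}
        self.oplog = deque()  # writes not yet applied on the secondary

    def write(self, key, value):
        # Writes always go to the primary and are queued for replication.
        self.primary[key] = value
        self.oplog.append((key, value))

    def replicate(self):
        # The secondary catches up by applying queued operations in order.
        while self.oplog:
            key, value = self.oplog.popleft()
            self.secondary[key] = value

    def read(self, key, from_secondary=False):
        source = self.secondary if from_secondary else self.primary
        return source.get(key)


rs = ToyReplicaSet()
rs.write("job42", "Queued")
rs.replicate()
rs.write("job42", "Rendering")  # not replicated yet

stale = rs.read("job42", from_secondary=True)  # secondary still has the old state
fresh = rs.read("job42")                       # primary is always current
rs.replicate()
caught_up = rs.read("job42", from_secondary=True)
print(stale, fresh, caught_up)
```

For something like a render queue, acting on the stale value (for example, picking up a job that another machine has already started) is the kind of problem that makes secondary reads risky.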
If replica sets can’t be used to reduce load and sharding does not work, then we can’t really make Deadline scale better without beefing up the primary machine, correct?
Sidenote: can replica sets be used to live-migrate the database to another machine? Something like:
1. Set up a secondary machine.
2. Declare the two machines as a replica set.
3. Remove the original primary (essentially making the previous secondary the new primary).
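The steps above can be sketched in terms of the replica set configuration document that MongoDB keeps. In the mongo shell this would normally be done with the `rs.initiate()`, `rs.add()`, `rs.stepDown()` and `rs.remove()` helpers; the Python below only builds the before and after configuration documents so the change is visible. The set name `deadline-rs` and the host names `old-db` and `new-db` are placeholders, and none of this has been tested against a Deadline repository.

```python
import copy


def initial_config(set_name, hosts):
    """Build a replica set config document listing the given hosts."""
    return {
        "_id": set_name,
        "version": 1,
        "members": [{"_id": i, "host": h} for i, h in enumerate(hosts)],
    }


def without_member(config, host):
    """Return a copy of config with `host` removed and the version bumped."""
    new_config = copy.deepcopy(config)
    new_config["members"] = [
        m for m in new_config["members"] if m["host"] != host
    ]
    if len(new_config["members"]) == len(config["members"]):
        raise ValueError(f"{host} is not a member of {config['_id']}")
    # A reconfiguration needs a higher version number than the current one
    # (the rs.reconfig() shell helper increments it for you).
    new_config["version"] += 1
    return new_config


# Steps 1 and 2: both machines in one set (placeholder host names).
before = initial_config("deadline-rs", ["old-db:27017", "new-db:27017"])

# Step 3: once new-db has synced and the old primary has stepped down,
# the old machine is dropped from the configuration.
after = without_member(before, "old-db:27017")
print(after)
```

The order matters: the new machine should finish its initial sync, and the old primary should step down so the new machine is elected, before the old one is removed.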
We made a small test setup with a replica set and simulated a failure.
The secondary Mongo box was successfully elected primary within seconds; however, Deadline never picked up on that. It has been timing out ever since we took down the original primary.
Does Deadline need any special configuration to make this work?
During the simulated failure, the Monitor threw these errors:
2013-12-10 15:38:43: Error occurred while updating job cache: Unable to read data from the transport connection: An existing connection was forcibly closed by the remote host. (System.IO.IOException)
2013-12-10 15:38:44: Error occurred while updating slave cache: Unable to connect to server deadline03.scanlinevfxla.com:27017: Unable to read data from the transport connection: An existing connection was forcibly closed by the remote host.. (MongoDB.Driver.MongoConnectionException)
2013-12-10 15:38:47: Error occurred while updating slave reports: An error occurred while trying to connect to the Database (deadline03.scanlinevfxla.com:27017). It is possible that the Mongo Database server is incorrectly configured, currently offline, blocked by a firewall, or experiencing network issues.
2013-12-10 15:38:47: Full error: Unable to connect to server deadline03.scanlinevfxla.com:27017: No connection could be made because the target machine actively refused it 172.18.1.107:27017. (FranticX.Database.DatabaseConnectionException)
2013-12-10 15:38:48: Error occurred while updating task cache: An error occurred while trying to connect to the Database (deadline03.scanlinevfxla.com:27017). It is possible that the Mongo Database server is incorrectly configured, currently offline, blocked by a firewall, or experiencing network issues.
2013-12-10 15:38:48: Full error: Unable to connect to server deadline03.scanlinevfxla.com:27017: No connection could be made because the target machine actively refused it 172.18.1.107:27017. (FranticX.Database.DatabaseConnectionException)
2013-12-10 15:38:49: Error occurred while updating job reports: An error occurred while trying to connect to the Database (deadline03.scanlinevfxla.com:27017). It is possible that the Mongo Database server is incorrectly configured, currently offline, blocked by a firewall, or experiencing network issues.
2013-12-10 15:38:49: Full error: Unable to connect to server deadline03.scanlinevfxla.com:27017: No connection could be made because the target machine actively refused it 172.18.1.107:27017. (FranticX.Database.DatabaseConnectionException)
2013-12-10 15:38:50: Error occurred while updating limit group cache: Unable to connect to server deadline03.scanlinevfxla.com:27017: No connection could be made because the target machine actively refused it 172.18.1.107:27017. (MongoDB.Driver.MongoConnectionException)
However, once the other machine was elected primary, it was still failing with similar errors:
2013-12-10 15:39:45: Error occurred while updating pulse cache: Unable to connect to server deadline03.scanlinevfxla.com:27017: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond 172.18.1.107:27017. (MongoDB.Driver.MongoConnectionException)
2013-12-10 15:40:27: Error occurred while updating Cloud Instances: An error occurred while trying to connect to the Database (deadline03.scanlinevfxla.com:27017). It is possible that the Mongo Database server is incorrectly configured, currently offline, blocked by a firewall, or experiencing network issues.
2013-12-10 15:40:27: Full error: Unable to connect to server deadline03.scanlinevfxla.com:27017: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond 172.18.1.107:27017. (FranticX.Database.DatabaseConnectionException)
2013-12-10 15:40:27: at Deadline.StorageDB.MongoDB.MongoDBUtils.HandleException(MongoServer server, Exception ex)
2013-12-10 15:40:27: at Deadline.StorageDB.MongoDB.MongoCloudStorage.GetCloudRegions(Boolean invalidateCache)
2013-12-10 15:40:27: at Deadline.StorageDB.CloudStorage.UpdateData()
>client.test.test.find_one()
Exception: AutoReconnect: could not connect to deadline03.scanlinevfxla.com:27017: [Errno 10061] No connection could be made because the target machine actively refused it
10 seconds later, when deadline01 became the primary, I tried again:
>client.test.test.find_one()
And it was fine.
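The behaviour above (one failed call during the election, then success) is what a simple retry loop handles. Here is a minimal sketch of one. The `retry` helper and the fake `flaky_find_one` function are made up for the example, and `ConnectionError` stands in for the driver's exception so the snippet runs without a database. With PyMongo you would pass `pymongo.errors.AutoReconnect` instead.

```python
import time


def retry(operation, retry_on, attempts=5, delay=2.0, sleep=time.sleep):
    """Call operation(); if it raises retry_on, wait and try again."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            return operation()
        except retry_on as error:
            last_error = error
            # Give the replica set time to elect a new primary.
            if attempt < attempts - 1:
                sleep(delay)
    raise last_error


# Stand-in for client.test.test.find_one(): fails twice, then succeeds,
# imitating queries issued while the election is still in progress.
calls = {"count": 0}


def flaky_find_one():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("primary not elected yet")
    return {"_id": 1}


result = retry(flaky_find_one, ConnectionError, attempts=5, delay=0)
print(result, calls["count"])
```

This only helps when the client knows about more than one member of the set. If it was only ever given the address of the machine that went down, retrying will keep hitting the dead host.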
Is there a replica set name I should be using to make Deadline recognize it as such?
You should just need to specify both servers, separated by a semicolon, in the dbConnect.xml file in the settings folder in the Repository. For example:
We should add support for the Replica Set Name property though, which will allow you to set up replica sets without needing to specify every single host in the dbConnect.xml file: docs.mongodb.org/manual/referenc … on-string/
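For reference, this is roughly what a standard MongoDB connection string with a replica set name looks like. It is the generic driver format, not the dbConnect.xml format: it separates hosts with commas, whereas dbConnect.xml uses semicolons as described above. The helper function, the host names, and the set name `deadline-rs` are placeholders for illustration.

```python
from urllib.parse import urlencode


def replica_set_uri(hosts, replica_set=None, database=""):
    """Build a mongodb:// URI from (host, port) pairs."""
    if not hosts:
        raise ValueError("at least one host is required")
    # The standard connection string lists the seed hosts comma-separated.
    host_list = ",".join(f"{host}:{port}" for host, port in hosts)
    uri = f"mongodb://{host_list}/{database}"
    if replica_set:
        # The replicaSet option tells the driver which set to expect.
        uri += "?" + urlencode({"replicaSet": replica_set})
    return uri


uri = replica_set_uri(
    [("deadline01.example.com", 27017), ("deadline03.example.com", 27017)],
    replica_set="deadline-rs",
)
print(uri)
# mongodb://deadline01.example.com:27017,deadline03.example.com:27017/?replicaSet=deadline-rs
```

A driver such as PyMongo accepts a URI like this in `MongoClient(uri)` and can then discover the other members of the set from whichever listed host it manages to reach.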
Cool, I’ll test this today! The XML answers a question I forgot to ask: what happens if a machine starts a new process (deadlinecommand) that can’t query the node list from the main primary defined… But if they are all in the XML, that’s not a problem!