Hi there,
Over the last 6-8 months of using replica sets for the repository, we had a couple of isolated outage incidents. One thing I’ve noticed that is a bit odd: after a failure, the replica set member never recovers on its own. I left it in the RECOVERING state for 24+ hours, and it never came back. The only way to bring it back online is to manually clear all the db folders and have it do a full initial sync.
This is a bit worrying. Did you guys test this behavior? It means that even after simple maintenance (a reboot, etc.), we have to fully resync the replica set members. Could it be that the nature of the Deadline db traffic just makes recovery impossible?
I found this:
docs.mongodb.org/manual/tutorial … et-member/
The behavior you’re seeing would indicate that the replica set member has become stale because it’s too far behind. Increasing the oplog size would probably help here:
docs.mongodb.org/manual/tutorial … et-member/
After an outage, do you bring back the primary or the secondary first? If the primary comes up first, and a lot of oplog writes happen before the secondary comes up, perhaps that would explain this? Based on the docs I’ve read, MongoDB sets aside 5% of the available disk space for the oplog on Linux, so I’d be curious to know how big your oplog is…
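To put some numbers on that default, here’s a rough sketch of the 5%-of-disk allocation and the replication window it buys you at a given write rate. The disk size and write rate below are hypothetical, just to illustrate the arithmetic:

```python
# Sketch of MongoDB's default oplog sizing on 64-bit Linux
# (5% of free disk space) and the replication window it provides.
# The free_disk_gb and write-rate figures here are hypothetical.

def default_oplog_gb(free_disk_gb: float) -> float:
    """Default oplog allocation: 5% of free disk space."""
    return free_disk_gb * 0.05

def oplog_window_hours(oplog_gb: float, write_rate_gb_per_hour: float) -> float:
    """How long the oplog covers before old entries get overwritten."""
    return oplog_gb / write_rate_gb_per_hour

disk = 500.0                              # hypothetical free disk space, GB
oplog = default_oplog_gb(disk)            # -> 25.0 GB
window = oplog_window_hours(oplog, 2.0)   # at ~2 GB/hour of writes -> 12.5 h
print(f"default oplog: {oplog:.0f} GB, covers ~{window:.1f} hours")
```

A secondary that stays down longer than that window falls off the back of the oplog and goes stale.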
Cheers,
Ryan
The primary in this case never went down, only the secondary (it seems mongo crashed while the machine stayed up and running).
Our current oplog covers ~20 hrs right now (we average around 2 GB / hour). I allocated 32 GB for the oplog … that’s pretty large already, considering the default. What setting do you usually recommend?
How long was the secondary down for? If your current oplog covers 20 hours, and the secondary was down longer than that, that would explain why it ended up in a stale state.
I guess you want your oplog size to be able to cover the maximum window of time between when a secondary could go down and when it can be brought back online. So maybe 24-48 hours worth? That ends up being ~64 GB based on what you’re reporting…
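The sizing above can be sketched as a one-line calculation, using the ~2 GB/hour write rate reported in this thread:

```python
# Required oplog size for a target recovery window, given the
# write rate reported in this thread (~2 GB of oplog per hour).

def required_oplog_gb(window_hours: float,
                      write_rate_gb_per_hour: float = 2.0) -> float:
    return window_hours * write_rate_gb_per_hour

# A 24-48 hour window at ~2 GB/hour:
low, high = required_oplog_gb(24), required_oplog_gb(48)
print(f"24h -> {low:.0f} GB, 48h -> {high:.0f} GB")  # 48 GB to 96 GB
# ~64 GB sits in the middle of that range (about a 32-hour window).
```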
It was down longer than the oplog period in this case.
The main issue for me, though, is not that it didn’t warm-restart, but that the server never recovered after that. We waited 24+ hours for it to catch up, and it never did. Even with an out-of-date oplog, shouldn’t it still recover, even if it takes longer? Am I wrong?
This is from the first link I posted:
So it’s not possible for the secondary to catch up in this case, because some of the oplog entries it needs have already been overwritten on the primary.
So doesn’t mongo simply force a full resync in this case? It was just reporting ‘recovering’, suggesting that it was still trying to recover.
Never mind, I’ve fully read the documentation now. I guess my problem is that mongo never says ‘this member is now stale’, but instead sits in a status suggesting it’s in a valid state.
In your tests, how large were your oplogs? I feel that 32 GB should be plenty… to have 3 days of recovery time (say, an unattended weekend outage) would require 110 GB+.
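For what it’s worth, a full three days at the weekday average rate quoted earlier in the thread (~2 GB/hour) would come to even more than that; the 110 GB+ figure works out to a somewhat lighter average rate, which seems plausible for a weekend:

```python
# Three-day (unattended weekend) window at the thread's reported rates.
hours = 3 * 24
full_rate_gb = hours * 2.0   # 144 GB at the ~2 GB/hour weekday average
print(f"{hours} hours at 2 GB/hour -> {full_rate_gb:.0f} GB")
# The 110 GB+ estimate corresponds to an average of roughly
# 1.5 GB/hour over the weekend (110 / 72 ~= 1.53).
```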
When we tested replica sets, we didn’t test at the scale you guys are operating at, so we never had to increase our oplog size. Considering how write-heavy Deadline is, I guess it’s not entirely surprising how large the oplog can get.
When things are a bit slower, I’ll test a ‘within oplog time threshold’ outage to see how it behaves. We’ve had situations like that previously, and if memory serves me well, I had to do a full resync regardless of whether the oplog was still valid or not.
Might be worth a mention in the docs if that is the case?
Yup, definitely worth mentioning. We’re actually in the process of writing internal docs regarding replica sets and other MongoDB-related stuff, with the plan that this will eventually be added to our user guide. We’ll be sure to mention this in those docs.
Cheers,
Ryan