Hi there,
We have noticed that the MongoDB locking and queue numbers are considerably higher in Deadline 8 than in 7.
Any ideas?
Connection count has nearly doubled (d7 is already off).
Is this due to the multithreaded nature of the Monitor? Or maybe the Python sandboxes holding their own connections?
More than likely the connection count is due to the sandboxes holding their own connections, which they do.
You can check to see how much of an impact the sandboxes (at least on the Slave) have by temporarily turning it off in the Repository options, under Slave Settings. Note that this only disables sandboxing for the application plugins – the event plugins will still be running in a sandbox with this setting off.
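If you want a quick before/after number while toggling that setting, something like this rough pymongo sketch (assuming the Database is reachable on the default host/port) will print the server's connection counts:

    # Snapshot the connection counts so you can compare before and after
    # disabling the Slave sandbox. Assumes pymongo and a mongod on localhost:27017.
    from pymongo import MongoClient

    client = MongoClient("localhost", 27017)
    status = client.admin.command("serverStatus")

    conns = status["connections"]
    print("current connections:  ", conns["current"])
    print("available connections:", conns["available"])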
If that has no noticeable impact, it might be something changed in the Monitor that made it more spammy. I’ll have a look at comparing the two.
In the meantime, we are upgrading to mongo 3.0.12. Do you think that would make any difference if we are still using split dbs with 8?
Currently on 2.6.9
The locking seems to come entirely from the Deadline8_Reports database, btw.
Second highest is limits.
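In case it's useful, here's a rough pymongo sketch of how the per-database lock timings can be pulled out of serverStatus (this assumes MongoDB 2.6 / MMAPv1, where the locks section is keyed by database name; adjust host/port to your setup):

    # List per-database lock time from serverStatus (MongoDB 2.6 / MMAPv1).
    from pymongo import MongoClient

    client = MongoClient("localhost", 27017)
    locks = client.admin.command("serverStatus").get("locks", {})

    for name, info in locks.items():
        held = info.get("timeLockedMicros", {})
        # lowercase r/w are database-level read/write lock time, in microseconds
        print(f"{name:30s} r={held.get('r', 0):>15} w={held.get('w', 0):>15}")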
Interesting, that should help narrow it down. What's the breakdown of read vs write lock on that DB? I don't think we changed much of anything with our report logging, so I imagine this is probably due to the Monitor spamming unnecessary queries to one of the Report collections. I'll keep you posted.
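If you want to see exactly which queries are hitting that DB in the meantime, you could turn on the MongoDB profiler for it. A rough pymongo sketch (the database name here is taken from your post and may need adjusting; the same thing works from the mongo shell with db.setProfilingLevel):

    from pymongo import MongoClient

    client = MongoClient("localhost", 27017)
    reports_db = client["Deadline8_Reports"]   # adjust to your actual reports DB name

    reports_db.command("profile", 2)           # level 2 = log every operation
    # ... let the Monitor run for a bit, then look at what was captured:
    for op in reports_db["system.profile"].find().sort("ts", -1).limit(10):
        print(op.get("op"), op.get("ns"), op.get("millis"), "ms")
    reports_db.command("profile", 0)           # turn the profiler back off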
In terms of having SplitDB on w/ MongoDB 3.0, it shouldn’t affect anything to have it on. It’s just defaulted to off now because the collection-level lock kind of eliminates the need to have separate DBs to begin with.
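For reference, a quick way to check which storage engine (and therefore which lock granularity) a 3.0 server is using, sketched with pymongo:

    from pymongo import MongoClient

    client = MongoClient("localhost", 27017)
    status = client.admin.command("serverStatus")

    # "storageEngine" only appears in 3.0+; older servers are MMAPv1 with db-level locks.
    # MMAPv1 on 3.0 locks at the collection level; WiredTiger locks at the document level.
    engine = status.get("storageEngine", {}).get("name", "mmapv1 (pre-3.0)")
    print("storage engine:", engine)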
I get this in the stats:
deadline8db_Reports
    timeLockedMicros
        R: 7180525
        W: 0
        r: 2390957996
        w: 111293336004
    timeAcquiringMicros
        R: 11732086
        W: 0
        r: 64729519966
        w: 1183814283256
The locking is maxed out flat now at around 200%… (the effective lock rates are ~3-4x worse than they were for Deadline 6).
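To answer the read vs write question from those numbers: nearly all of the time held under lock on that DB is write lock. Quick arithmetic on the figures pasted above:

    r_locked = 2_390_957_996        # time held under database-level read lock (r), microseconds
    w_locked = 111_293_336_004      # time held under database-level write lock (w), microseconds

    write_share = w_locked / (r_locked + w_locked)
    print(f"write lock share: {write_share:.1%}")   # ~97.9% of the lock time is writes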
I’ve found a couple of changes made to the report code that might be causing this, and the much larger amount of time spent under write lock seems to corroborate it.
I’m hoping I can revert that easily enough, and get another build out sooner rather than later.
Thanks Jon for the quick find! Could the higher lock counts cause some of the other issues we are seeing? (like machine limit stubs getting stuck, inconsistent job/task states)
Yeah, there honestly weren’t a lot of changes to the DB code that dealt with reports, so there aren’t that many potential culprits. I’ve stripped out the offending queries (they were largely unneeded). From my tests, Slaves running the new code will prevent older versions of the Monitor from updating reports properly, but I imagine that would be a small concern (you’d just need to roll out the new version on artist machines as well). On the other hand, I found no issues with the new code reading old reports.
In terms of this bug causing other issues, I suppose it’s possible. I would hope that the split databases keep the impact mostly centered around reports, but it’s hard to know for sure. If you guys are looking to get your hands on this fix ASAP to roll it out, I can probably get a build going today and get it to you when it’s done.
That would be great, thanks Jon. I would love to eliminate this as a factor.
Hey guys, just wanted to see if this fix is in the latest 8.0 release.
Yes, the fix for the reports collections is in the latest 8.0 release.