Monitor performance

LaszloSebo · August 15, 2017, 12:59pm

The Deadline Monitor refactoring for Deadline 8 was a life saver for us. In the past 1-2 years, however, our farm and job counts have grown, and it seems that we are hitting Deadline’s usable performance limit again. We are getting more and more artist complaints about extremely slow Monitor performance.

Please look at what could be improved. With job counts regularly between 15k and 25k and a slave count over 3.3k, things are getting pretty unbearable… :-\

eamsler · August 15, 2017, 3:30pm

I had about 160,000 idle jobs in my queue for a while and it responded alright, but they just sat doing nothing. I’m guessing it might be the amount of data churn that’s syncing from the database.

Any guesses as to where we should concentrate our efforts?

LaszloSebo · August 15, 2017, 4:44pm

I’ll run it through a profiler and see if anything obvious pops up! It’s also a bit random: sometimes it’s fairly OK for me and most people, but then we have a few folks for whom it takes 30-40 seconds to switch between jobs. We usually start by making sure the job candidate filter is off, but most people know not to leave that option switched on.

eamsler · August 16, 2017, 2:09pm

Oh! I bet the candidate filter is still written in Python. One day I need to sit down and write a job right-click script to handle that in better detail…

I haven’t done C# profiling on Windows. How deep are you going to run the profiling? Any amount would be great, but at the C/Kernel level gets kinda tricky.

LaszloSebo · August 18, 2017, 10:38am

Honestly, previously I simply decompiled the GUI code and injected a Python profiler… Not sure how possible that still is with the new Monitor, but it might still work OK.

eamsler · August 18, 2017, 5:36pm

Yeah, should be fine. AFAIK there hasn’t been big churn there.

LaszloSebo · August 24, 2017, 12:03pm

I did a short Deadline Monitor session of clicking about. It seems that the biggest speed hit comes from JobListControl, SlaveListProxyModel, and JobListProxyModel, all related to filters:

   Ordered by: cumulative time

          ncalls  tottime  percall  cumtime  percall filename:lineno(function)
            6432    0.025    0.000   41.219    0.006 UI/Controls/JobListControl.py:1368(quickFilterUpdateTimeout)
             240   25.239    0.105   41.172    0.172 UI/Controls/JobListControl.py:1374(updateQuickFilters)
         1029303   12.355    0.000   27.865    0.000 DeadlineUI/Models/SlaveListProxyModel.py:295(filterAcceptsRow)
   535597/334156   10.273    0.000   22.618    0.000 DeadlineUI/Models/JobListProxyModel.py:274(filterAcceptsRow)
     74187/74106    0.588    0.000   22.098    0.000 {method 'emit' of 'PyQt5.QtCore.pyqtBoundSignal' objects}

I’ve attached the full profile results. I’ll leave it on for a few days and see whether these stats change much.
deadlinestats.txt (94.1 KB)
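
For anyone who wants to collect a similar table on their own code, here is a minimal sketch using Python’s standard cProfile and pstats modules. It is not the hook used inside the Monitor: filter_accepts_row, update_quick_filters and profile_call are hypothetical stand-ins written for this example, and the row data is made up. It only shows how a report sorted by cumulative time, like the one above, can be produced.

```python
import cProfile
import io
import pstats


def filter_accepts_row(row, needle):
    # Hypothetical stand-in for a per-row filter callback.
    return needle in row


def update_quick_filters(rows, needle):
    # Hypothetical stand-in for a filter refresh that visits every row.
    return [row for row in rows if filter_accepts_row(row, needle)]


def profile_call(func, *args, top=5):
    """Run func under cProfile; return (result, report sorted by cumulative time)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(top)
    return result, buffer.getvalue()


# Made-up data: 20,000 fake job names, filtered by a substring.
rows = ["job_%05d" % i for i in range(20000)]
matches, report = profile_call(update_quick_filters, rows, "job_1999")
print(report)
```

In the report, ncalls written as two numbers separated by a slash means total calls over primitive (non-recursive) calls, tottime excludes time spent in sub-calls, and cumtime includes it.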

LaszloSebo · August 25, 2017, 10:14am

Attached some more stats from a longer session. Similar results (most time spent in filters).
deadlinestats.txt (92.7 KB)

eamsler · August 25, 2017, 5:47pm

Thanks for all the profiling info Laszlo! I’ve moved it into the dev system.

It looks like things there are pretty lean, but it could be that moving more code into C++ would speed things up a little more. Waiting on the dev team to comment here though.

LaszloSebo · August 30, 2017, 10:48am

Thanks Edwin!
Is there a way we can escalate the issue to gain priority? I imagine this is not an ‘afternoon job’…

eamsler · August 30, 2017, 2:04pm

Good point. I’ll chase the guys here.

LaszloSebo · September 12, 2017, 6:14pm

Hi Edwin,

Any news on whether this could make it on the roadmap?

cheers
laszlo

eamsler · September 13, 2017, 2:07pm

We’ve got it in the system but the queue is pretty full. I’ve bumped the issue internally and I’ll go ask RR about it.