Not sure what it might be (maybe PyQt5?), and since this isn't really possible to measure I don't have exact metrics, but the Deadline 7 Monitor seems more sluggish than Deadline 6's.
Menus appear with a slight delay (1-3 seconds), moving splitters is slower, moving between menu items is slow, opening settings dialogs takes seconds, etc. Is this a known issue?
For example, when clicking on menus, I first see the drop shadow being drawn, and then the items show up. In Deadline 6, it's instant.
We currently only have 30-40 jobs in D7 (albeit a lot of slaves), but even with 200x that many jobs, the Deadline 6 Monitor feels faster.
We’ll look into this ASAP, but I just want to check if you have dynamic sorting and filtering enabled in the Monitor panels in v7, but not v6, as that could explain the difference in performance.
Also, how many slaves do you currently have in your v7 Monitor?
Dynamic sorting is on in both versions. I tried turning it off in D7, but the slow response time didn't really change. Even with all slave panels closed, it was still somewhat slower than D6 with 4+ slave panels open.
I'll try to record a Camtasia video if I get some time today.
Attached is a Camtasia recording. Note that while making this video I noticed a couple of things: first, the responsiveness depends on the machine you are using.
My workstation is much slower than this one (but it didn't have Camtasia installed, so I had to use another box). The delays between the slow periods on my workstation are shorter, maybe 10 seconds or so, and when the menu comes up, you can see it being drawn.
Also, on my workstation the menus have drop shadows. Note that the drop shadows don't appear in this relatively faster Monitor in the Camtasia recording. Weird…
So maybe some of the GUI features work on some machines (making them slower), but not on others? deadline7_monitor_menu_speed.mov (8.06 MB)
The settings are near-identical; some were set back to defaults because we anticipate better performance from D7 splitting the databases (we have severe locking/Mongo queue issues with the default settings in D6).
Green shows update intervals that are actually longer in D7 than in D6.
Red shows update intervals that are shorter in D7 compared to D6 (note that all of the shorter ones should only affect the Mongo DB, not the Monitor, since the Monitor-facing intervals are the same or longer).
Thanks for sharing your settings! I think the problem is a result of the shorter interval for slave info updates (in red). This setting does impact the Monitor: with a Monitor slave update interval of 30s and a slave info interval of 20s, the Monitor will be updating every active slave on every refresh (which is 2000+ updates).
Can you try bumping the slave info updates (in red) to 60s and see if that helps? At least then, on average, the Monitor will only be processing half the number of active slaves on each update.
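As a back-of-envelope check, the effect of that interval change looks roughly like this (a sketch with illustrative numbers; `updates_per_refresh` is a hypothetical helper, not Deadline code):

```python
def updates_per_refresh(num_slaves, info_interval_s, monitor_interval_s):
    """Rough average of slave-info records the Monitor processes per
    refresh, assuming slave updates are spread uniformly over time."""
    # Fraction of slaves whose info has changed since the last refresh.
    changed_fraction = min(1.0, monitor_interval_s / info_interval_s)
    return int(num_slaves * changed_fraction)

# 20s info interval: every slave has fresh info on every 30s refresh.
print(updates_per_refresh(2000, 20, 30))  # -> 2000
# 60s info interval: on average only half the slaves per refresh.
print(updates_per_refresh(2000, 60, 30))  # -> 1000
```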
Would it be worth having different slave update times for different slave states? Say, an idle machine could update much less frequently than a machine that's actively rendering.
In addition, maybe only update the status if there are significant changes.
For example, while memory/CPU usage might change all the time, rendering status changes less frequently; version/pools/limit groups/whitelists are essentially one-off changes that happen very infrequently; and some of the fields almost never change (CPU count, speed, total memory, operating system, graphics card, etc.).
Is all slave data currently updated by the slave and then rerequested by the monitor? Or just information that can potentially change?
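A minimal sketch of the "only send what changed" idea (the field names here are hypothetical, not Deadline's actual slave info schema):

```python
def diff_state(prev, curr):
    """Return only the fields whose values changed since the last update."""
    return {key: val for key, val in curr.items() if prev.get(key) != val}

prev = {"status": "Idle", "cpu_usage": 12, "cpu_count": 16, "total_memory_gb": 64}
curr = {"status": "Rendering", "cpu_usage": 87, "cpu_count": 16, "total_memory_gb": 64}

# Static fields (cpu_count, total_memory_gb) drop out of the payload.
print(diff_state(prev, curr))  # -> {'status': 'Rendering', 'cpu_usage': 87}
```

The trade-off is that the receiver has to merge deltas into the last known state, which adds bookkeeping on both ends.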
We had actually attempted to do this for 7.0 (the initial public release actually did this). However, during our cloud scaling tests, we discovered that it performed quite poorly compared to sending the full slave info state. That's why we put out the 7.0.1 patch release so quickly to address this.
Also, this wouldn’t impact the Monitor’s performance because checks are already made to only update data for a cell that has changed.
We already have a couple ideas on how we can improve the performance of the list controls in the Monitor, and they’re currently on the roadmap for 8.0.
Did some time profiling of the GUI in D7 using cProfile, to try to narrow down why interactivity (moving splitters, etc.) might be slow.
In this session, I opened the Monitor for about 45 seconds, and then dragged the splitter between the job and slave panels up and down (I get only about 2-4 fps doing that) constantly until shutdown:
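For reference, the setup was along these lines (a generic cProfile sketch, not the exact session; the lambda is a stand-in workload for interacting with the Monitor):

```python
import cProfile
import io
import pstats

def profile_report(workload, top=10):
    """Run a callable under cProfile and return the top entries
    sorted by cumulative time as a string."""
    profiler = cProfile.Profile()
    profiler.enable()
    workload()
    profiler.disable()

    stream = io.StringIO()
    pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(top)
    return stream.getvalue()

# Stand-in workload; in practice this was ~45s of Monitor interaction.
print(profile_report(lambda: sum(i * i for i in range(100000))))
```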
In the static session, after UpdateModel (which runs in the background, so is expected to take a long time), the next entry is {method 'emit' of 'PyQt5.QtCore.pyqtBoundSignal' objects}, which I'm guessing is signals to the GUI to update. This all makes sense.
However, when the splitters are being moved around, around 40% of the time is spent doing the following. I'm guessing not much can be done about this, right? (It's really weird that endInsertRows gets called when nothing is changing, just the size of the window):
Based on my mini research, most of the Monitor GUI speed limitations are due to PyQt.
Do you see any chance of moving to C++/Qt in the future for performance? The sluggishness can be a real downer :-\
In the previous log, I was able to optimize the function call to:
C:\Program Files\Thinkbox\Deadline7\bin\UI\ThinkboxUI\Models\MultiColumnSortFilterProxyModel.py:67(headerData)
by caching the sourceModel in an internal variable (it basically went from needing 5-6 seconds to needing 1 second for about the same number of function calls):
The rest of the functions don't seem to be optimizable any further. I think the overhead of Python is the real killer here: hundreds of thousands of small function calls, etc.
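The change is essentially the standard "cache an expensive lookup" pattern. A toy illustration (class and attribute names are stand-ins, not the actual MultiColumnSortFilterProxyModel code):

```python
class SourceModel:
    def __init__(self, headers):
        self.headers = headers

class CachingProxy:
    """Toy proxy model that caches its source model instead of
    re-fetching it on every headerData() call."""
    def __init__(self, source):
        self._source = source
        self._cached_source = None
        self.source_model_calls = 0  # instrumentation for the sketch

    def sourceModel(self):
        # In PyQt this crosses the Python/C++ boundary, which adds up
        # when called hundreds of thousands of times.
        self.source_model_calls += 1
        return self._source

    def headerData(self, section):
        if self._cached_source is None:
            self._cached_source = self.sourceModel()
        return self._cached_source.headers[section]

proxy = CachingProxy(SourceModel(["Name", "Status", "CPU"]))
for _ in range(100000):
    proxy.headerData(1)
print(proxy.source_model_calls)  # -> 1 instead of 100000
```

Note that in a real proxy model the cached reference would need to be invalidated whenever the source model is swapped out (e.g. in setSourceModel).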
For Deadline 8, we currently plan on looking at moving the list controls and their data models into a separate C++ library, similar to how the node view works. The theory is that this will speed things up because all of the heavy computation will be done in C++ instead of Python. If this works, the rest of the Python code can be left alone, since the vast majority of the computation happens in the list controls.
Thanks Ryan, I think this will make a huge difference for us! It seems like most of the time is spent in a crazy number of calls between the models and the controls. That's all expected, of course, but the added overhead of Python/PyQt may make it slower than it could be with the amount of data we have.