What version are you using? Are you on b8 yet?
cb
Yeah, it's beta 8.
The job candidate filter is better, but I still keep it off to make things faster.
That is to be expected. We have to do some significant computation to figure out which slaves we can and can’t show based on pools, groups, machine lists, limits, bad slave detection, etc, which will certainly slow things down a bit. When the feature was implemented way-back-when, it was meant to be a debugging tool, and not something that was always on. As Jon has mentioned in the past, we recommend only turning it on when you need it.
We are making some gains in improving the speed at which the data is refreshed, which should be ready for beta 10 (I know, we still have to release beta 9, but that should happen this week). We’re trying to be much smarter about how we update, including only updating the actual cells in the tree view that have changed, and so far things are looking pretty promising.
Cheers,
Great to hear, Ryan, thanks for the update! I wanted to send some Camtasia recordings, but it's probably better to just wait for beta 10, it seems.
Does the update frequency take into account when the last update started?
Say the interval is 8 seconds: an update starts at 10:00:00 and takes 5 seconds. Would the Monitor then wait only the remaining 3 seconds before starting the next update? My Monitor feels like it gets locked in a constant loop of updates; I sometimes get less than a one-second window to actually operate anything in it.
I'll try reinstalling the old beta later.
(I'm getting similar feedback from other users, by the way. For Joe Scarr it takes about 8 seconds or so to enter his names to filter down to those jobs; he said it's surprisingly slow to work with.)
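For illustration, the two update-scheduling policies the question above distinguishes can be sketched like this (a hypothetical sketch, not Deadline's actual implementation):

```python
import time

def fixed_gap_loop(update, gap_seconds, cycles):
    """Start the timer only AFTER each update finishes: a 5 s update
    plus an 8 s gap means the next update starts 13 s after the last
    one began, and the UI always gets the full gap to breathe."""
    for _ in range(cycles):
        update()
        time.sleep(gap_seconds)

def fixed_interval_loop(update, interval_seconds, cycles):
    """Aim for one update per interval, measured from each update's
    START: a 5 s update on an 8 s interval leaves a 3 s idle window,
    and an update that overruns the interval leaves no window at all."""
    for _ in range(cycles):
        started = time.monotonic()
        update()
        remaining = interval_seconds - (time.monotonic() - started)
        if remaining > 0:
            time.sleep(remaining)
```

Under the second policy, a refresh that takes nearly as long as the interval leaves almost no idle window, which would match the sub-second windows described above.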
This is an issue that should be greatly helped by the changes we’re making for beta 10.
OK, cool. In the meantime, I uploaded a general-usage Camtasia recording. This instance of Deadline is "not that bad"; it's been pretty good on my machine. Some other users sometimes see minute-long delays.
Thanks for posting that video. That is completely unacceptable, and beta 10 should bring significant improvements to what you're experiencing. The primary issue has been that the "background" thread we were creating the list items in was written in Python, so it was competing with the main UI thread for the Python GIL, which is why you see the slowdown. In beta 10, this has all been moved out of Python, so Python is now only responsible for drawing the items, and we have been very impressed with the performance so far.
We’re hoping to get beta 10 out later this week.
Cheers,
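The GIL contention Ryan describes is easy to reproduce in plain Python. A minimal sketch (nothing here is Deadline code, just two CPU-bound pure-Python workloads contending for the interpreter lock):

```python
import threading
import time

def cpu_bound(n):
    # Pure-Python busy work: it holds the GIL except at switch
    # intervals, so it starves any other Python thread, including
    # a thread trying to service a UI.
    total = 0
    for i in range(n):
        total += i * i
    return total

def timed(fn, *args):
    start = time.perf_counter()
    fn(*args)
    return time.perf_counter() - start

N = 2_000_000

# Baseline: the "UI" workload running alone.
alone = timed(cpu_bound, N)

# Same workload while a "background" Python thread also burns CPU:
# both threads share one GIL, so the foreground work slows down.
background = threading.Thread(target=cpu_bound, args=(N,))
background.start()
contended = timed(cpu_bound, N)
background.join()

print(f"alone: {alone:.3f}s, contended: {contended:.3f}s")
```

Moving the heavy work out of Python bytecode, into native code that releases the GIL or into a separate process, removes this contention, which is the essence of the beta 10 change described above.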
Thanks Ryan!
By the way, I know you guys tested it with 50K jobs. But is it possible that a simulated environment didn't put nearly the same load on the database server as real-life load with 20K jobs, thousands of active tasks, jobs with hundreds of logs, ~900 active slaves, and 120 clients constantly pinging it?
I ask because the mongod process is almost always at 100%+ core usage, Pulse is also constantly near 90-100% CPU usage, and there are deadlinecommand processes running on the Pulse machine that also use 100-200% CPU.
The Monitor's speed has become much worse in the last couple of days; it's almost completely unusable now. Like, 20-second delays between clicks. Can't wait for beta 10…
Laszlo,
We have other clients in your size range in production, but they aren't reporting any issues like yours, so I have to presume it comes down to a different environment. One of the clients I'm thinking of came from Deadline 5.x, and 6.0 was a huge performance increase for them. They have more slaves in one physical site than you do, but I'm not certain of their job count, tasks, etc.
What is your Mongo server running on, hardware-wise? Forgive me if you've sent that in before. It's possible to scale both vertically and horizontally with Mongo (caveat: you can't scale horizontally with Deadline… yet). Scaling vertically means, of course, faster metal in that server; scaling horizontally would be something like sharding, distributing the load over multiple machines. I'm confident we can deal with that, but based on the database logs you sent us, the delays in Mongo haven't been our concern; it seems to be performing really well. I have to say that I think the CPU usage isn't really telling the story.
Keep sending us the log data from Mongo when you get into these crunch issues, and we'll look at it to make sure that's still the case.
I'll let Ryan respond to the Pulse CPU usage and the deadlinecommand process usage.
cb
It will be interesting to see if this improves in beta 10. I imagine this is because of all the job archiving and dependency scripts that are being run, and we’ve now moved these into separate threads for beta 10. The deadlinecommand processes are probably the housecleaning processes, and in beta 10, they will just be dependency checking processes.
Cool. Sorry for keeping this topic hot, but it's a hot topic internally. Obviously, the discussion is kind of moot until we get a chance to try beta 10.
Is there a way I could send you a database snapshot? That way you could import it and see whether it's simply the database, or a combination of the machine count, job activity, etc.
We could even set up a parallel database/repository that has no slaves, just the data, and see if that's any faster.
Actually, maybe it's just the client… most of the time when the Monitor is hanging, it's using 100% of a core.
Just to confirm, Mongo is using 100% in this case (not the Monitor)? This would be a good time to grab the database statistics from the Help menu in the Monitor and post them!
Mongo is almost always at 100%+ CPU usage.
By "client" I meant the Deadline Monitor itself using 100%. I can grab some more stats, though, the next time Mongo is at 100%+.
Thanks! The stats show that while lock time is pretty low (only 2%), there are currently 35 readers waiting for a read lock. Mongo should be able to flush that queue pretty quickly, though. Other than that, despite Mongo running at high CPU usage, the stats look pretty good. I guess we really need to wait and see how beta 10 performs (we're hoping to release it tomorrow).
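For reference, figures like these come in the shape of mongod's `serverStatus` `globalLock` section. A small sketch of how the 2% lock-time figure is derived; the field names follow MongoDB's output, but the values here are made up to mirror the numbers quoted above:

```python
# Hypothetical snapshot in the shape of mongod's serverStatus() output.
# Field names follow MongoDB's globalLock section; the values are made
# up to mirror the ~2% lock time and 35 queued readers discussed above.
server_status = {
    "globalLock": {
        "totalTime": 3_600_000_000,  # microseconds since mongod started
        "lockTime": 72_000_000,      # microseconds the global lock was held
        "currentQueue": {"readers": 35, "writers": 0},
    }
}

global_lock = server_status["globalLock"]
lock_pct = 100.0 * global_lock["lockTime"] / global_lock["totalTime"]
queued_readers = global_lock["currentQueue"]["readers"]
print(f"lock time: {lock_pct:.1f}%, queued readers: {queued_readers}")
```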