deleting jobs

LaszloSebo · January 31, 2013, 10:05pm

Deleting jobs currently seems to take a looooong time. Not sure if this is expected; I tried deleting about 20-30 jobs all at once.

Note that my machine is in a different location than the repository. Could that be the reason, since the job/report files are being deleted remotely?

LaszloSebo · January 31, 2013, 10:06pm

Deleting 4 jobs takes about 40 seconds or so.

rrussell · February 1, 2013, 4:03pm

Yeah, it’s the reports that cause this slowdown. In beta 12, deleting a job will no longer include deleting the reports or auxiliary files. Those will eventually be cleaned up by the housecleaning code.

If you are running Pulse, housecleaning is performed pretty regularly. If Pulse isn’t running, the slaves will eventually get to them between tasks.
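The deferred deletion described above can be sketched roughly as follows. This is only an illustration of the general "flag now, clean up later" pattern, not Deadline's actual code: `Job`, `JobStore`, `delete`, and `houseclean` are hypothetical names, and the store is a plain in-memory dictionary.

```python
from dataclasses import dataclass, field


@dataclass
class Job:
    job_id: str
    report_files: list = field(default_factory=list)
    deleted: bool = False  # flag set by the fast delete path


class JobStore:
    """Hypothetical in-memory stand-in for a job repository."""

    def __init__(self):
        self.jobs = {}

    def add(self, job):
        self.jobs[job.job_id] = job

    def delete(self, job_id):
        # Fast path: only flag the job. Reports and auxiliary
        # files are left in place for a later cleanup pass.
        self.jobs[job_id].deleted = True

    def visible_jobs(self):
        # Flagged jobs disappear from listings immediately.
        return [j.job_id for j in self.jobs.values() if not j.deleted]

    def houseclean(self, remove_file):
        # Slow path: purge flagged jobs and their report files.
        # `remove_file` is supplied by the caller (e.g. os.remove).
        flagged = [job_id for job_id, job in self.jobs.items() if job.deleted]
        for job_id in flagged:
            for path in self.jobs[job_id].report_files:
                remove_file(path)
            del self.jobs[job_id]
        return len(flagged)
```

With this split, the cost a user sees when deleting is a single flag update per job, while the file removal cost is paid whenever the cleanup pass runs.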

LaszloSebo · February 1, 2013, 8:16pm

I am not sure if Pulse is running, to be honest. I’ll have to double check =)

Deadline 6 is so fast, we might not even have installed Pulse.

Cool, thanks for changing this behavior for beta 12. We anticipate job counts of up to 10-20k, so being able to mass delete will be necessary.

rrussell · February 1, 2013, 8:20pm

Pulse is no longer needed for the performance boost. In fact, it’s not used as a proxy at all anymore. Here’s all it does now:

Perform regular cleanup operations (release job dependencies, delete jobs marked for removal, etc). Note that the slaves will do this themselves if Pulse isn’t running, but it’s more random.
Power management. Redundancy for temperature checking is built into the slaves, but the other areas of Power Management require Pulse to be running.
Statistics gathering. Job statistics are handled by the slaves, but stats for the slaves and the repository in general require Pulse.
Slave Throttling (so that only a certain number of slaves can load a job at once, which can help network bandwidth).

So it’s still nice to have running, but Deadline’s performance is no longer dependent on it.
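To illustrate the throttling idea from the list above (only a certain number of slaves loading a job at once), here is a minimal single-process sketch using a semaphore. It is an assumption-laden simplification: real throttling is coordinated across machines through Pulse, and `LoadThrottle`, `load`, and `run_workers` are hypothetical names, not Deadline APIs.

```python
import threading


class LoadThrottle:
    """Hypothetical limiter: at most `limit` workers load a job at once."""

    def __init__(self, limit):
        self._slots = threading.BoundedSemaphore(limit)
        self._lock = threading.Lock()
        self.active = 0  # workers currently loading
        self.peak = 0    # highest concurrency observed

    def load(self, copy_job_files):
        # Block until a slot is free, then run the expensive load.
        with self._slots:
            with self._lock:
                self.active += 1
                self.peak = max(self.peak, self.active)
            try:
                copy_job_files()
            finally:
                with self._lock:
                    self.active -= 1


def run_workers(throttle, worker_count, copy_job_files):
    # Start `worker_count` threads that all try to load at the same time.
    threads = [
        threading.Thread(target=throttle.load, args=(copy_job_files,))
        for _ in range(worker_count)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
```

For example, `run_workers(LoadThrottle(2), 8, some_copy_function)` lets eight workers queue up while never allowing more than two of them to copy files simultaneously.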

LaszloSebo · March 29, 2013, 8:56pm

Using beta 17, deleting jobs still seems to be fairly slow.

I am doing a regular cleanup and trying to delete ~4600 jobs. It’s been going for about 3h 40min so far. There is nothing in the logs and I don’t see any progress bar, so I’m not sure how long it will take, but I would expect it to be much faster, in the range of a couple of seconds.

LaszloSebo · March 29, 2013, 11:50pm

It hit the 4hr mark now; I think I’ll kill it.

Deleting even just a couple (~8) is also taking ages… it makes the Monitor just hang. So for now our only real option is to leave them :\

LaszloSebo · March 30, 2013, 12:08am

Deleting a single job takes about 30-50 seconds from a workstation.

If I delete it on the machine that hosts the Mongo database (and Pulse), it’s about 15 seconds… :\

jgaudet · April 1, 2013, 5:02pm

Hey Laszlo,

We’ll have another look at this to see if we can replicate it, though at this point we suspect it might be related to your other issue of Mongo using up a ton of CPU. At the very least, we could probably do the deletion in the background, so that the Monitor isn’t locked up while you wait for jobs to delete.

Cheers,

Jon
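The background-deletion idea in the post above might look something like the sketch below: the UI thread only queues job IDs and returns, while a worker thread performs the slow deletes. This is a generic pattern under stated assumptions, not how the Monitor is implemented; `BackgroundDeleter`, `request_delete`, and the `delete_job` callback are hypothetical.

```python
import queue
import threading


class BackgroundDeleter:
    """Hypothetical helper that runs slow deletes off the UI thread."""

    def __init__(self, delete_job):
        self._delete_job = delete_job  # slow per-job delete callback
        self._pending = queue.Queue()
        self.failed = []               # job IDs whose delete raised
        worker = threading.Thread(target=self._run, daemon=True)
        worker.start()

    def request_delete(self, job_ids):
        # Called from the UI thread: returns as soon as the IDs are queued.
        for job_id in job_ids:
            self._pending.put(job_id)

    def _run(self):
        # Worker loop: process queued deletes one at a time, in order.
        while True:
            job_id = self._pending.get()
            try:
                self._delete_job(job_id)
            except Exception:
                self.failed.append(job_id)
            finally:
                self._pending.task_done()

    def wait(self):
        # Optional: block until every queued delete has been processed.
        self._pending.join()
```

A single worker keeps deletes in submission order; the trade-off is that total deletion time is unchanged, only the interface stays responsive while it happens.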

jgaudet · April 1, 2013, 5:50pm

After looking into this a bit more, I was able to replicate this with a large number of jobs. I don’t think it was quite as pronounced as what you’re seeing, but it locked up my Monitor for ~20-30 mins for 5,000 jobs. At this point I suspect that it’s related to Event Plugins, so I’ll have a go at improving the way we call/check the event plugins.

Cheers,

Jon
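The thread does not say exactly what the event plugin overhead was. One common way this kind of cost arises, shown here purely as an assumed illustration, is re-loading the plugin list for every job instead of once per batch. `EventDispatcher`, `notify_each`, and `notify_batch` are hypothetical names, and `load_plugins` stands in for whatever expensive discovery step exists.

```python
class EventDispatcher:
    """Hypothetical dispatcher for job-deleted callbacks."""

    def __init__(self, load_plugins):
        # `load_plugins` returns a list of callables taking a job ID.
        self._load_plugins = load_plugins
        self.load_count = 0  # how many times plugins were (re)loaded

    def _load(self):
        self.load_count += 1
        return self._load_plugins()

    def notify_each(self, job_ids):
        # Naive version: plugins are re-loaded for every deleted job.
        for job_id in job_ids:
            for plugin in self._load():
                plugin(job_id)

    def notify_batch(self, job_ids):
        # Batched version: plugins are loaded once for the whole batch.
        plugins = self._load()
        for job_id in job_ids:
            for plugin in plugins:
                plugin(job_id)
```

For a batch of N jobs the naive version performs N loads and the batched version performs one, while both invoke every plugin once per job.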

MikeOwen · April 1, 2013, 8:51pm

FYI…
I think Ryan put a new event plugin in for “onJobDeleted” when I mentioned it the other day…which will be really useful for me, but not if it kills performance!

jgaudet · April 1, 2013, 10:38pm

I’ve made improvements to how it loads the Event Plugins in this case, but that doesn’t seem to account for all of the delay I’m seeing when deleting a large number of jobs. I’ll do some more tweaking.

Jon

cbond · April 7, 2013, 8:48pm

I like the idea of this happening in the background as well. The point is being able to manage 100k jobs reasonably.
Obviously we just have to make it fast…
cb