resume failed job

LaszloSebo · May 10, 2013, 12:22am

Hi there,

I'm a bit fuzzy as to how this works now. In Deadline 5, when a job accumulated 100 errors (or whatever your limit was), it would get failed. To resume it, I would normally clear its errors, then resume.

However, now if I clear the error reports (which takes a LOOONG time, btw…) and then resume the job, it gets failed again right away.

So how am I supposed to do this without upping the error limit? Once the job error reports are cleared, the Error counter in the job list shows 0, yet the Task error timers are showing values in the 30-40s (with no corresponding reports…).

cheers,
laszlo

rrussell · May 10, 2013, 2:44pm

Hey Laszlo,

I can’t reproduce this behavior. However, I did discover a bug in RC2 that prevented job failure detection from working at all.

I’m guessing you guys are still on RC1. The bug I ran into will be fixed in RC3, so you might want to wait until that’s available (which should be next week). If you still see this behavior in RC3, let us know!

Cheers,

Ryan

cbond · May 10, 2013, 3:33pm

What about the slow error report deletion?

rrussell · May 10, 2013, 3:39pm

I think that’s something we’ll have to look into post-6.0. I’m sure the slowness is because it’s deleting physical log files off the repository. We’d have to change this so that deleting a report just sets a "deleted" flag, and the background housecleaning process removes the files from disk later. We’ll definitely make this a priority for 6.1 though.
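
As a rough illustration of the flag-now, clean-up-later idea, here is a minimal sketch. It is not Deadline's actual code: the `Report` and `ReportStore` names, the in-memory list standing in for the database, and the method names are all assumptions made up for this example.

```python
import os
from dataclasses import dataclass


@dataclass
class Report:
    """Hypothetical error report record that points at a log file on disk."""
    job_id: str
    log_path: str
    deleted: bool = False  # soft-delete flag, set when the user clears reports


class ReportStore:
    """Hypothetical in-memory stand-in for the report database."""

    def __init__(self):
        self.reports = []

    def add(self, job_id, log_path):
        report = Report(job_id, log_path)
        self.reports.append(report)
        return report

    def clear_job_reports(self, job_id):
        """Fast path: only flag the job's reports, no disk I/O at all."""
        count = 0
        for report in self.reports:
            if report.job_id == job_id and not report.deleted:
                report.deleted = True
                count += 1
        return count

    def visible_reports(self, job_id):
        """Reports the UI would still show (flagged ones are hidden)."""
        return [r for r in self.reports
                if r.job_id == job_id and not r.deleted]

    def housecleaning(self):
        """Slow path, meant to run later in the background:
        remove the log files of flagged reports and drop their records."""
        removed = 0
        remaining = []
        for report in self.reports:
            if report.deleted:
                if os.path.exists(report.log_path):
                    os.remove(report.log_path)
                removed += 1
            else:
                remaining.append(report)
        self.reports = remaining
        return removed
```

With something along these lines, clearing a job's reports would return as soon as the flags are set, and the actual file removal would happen whenever the housecleaning pass next runs.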

cbond · May 10, 2013, 3:49pm

Great. Think async!

LaszloSebo · May 13, 2013, 4:51pm

Cool, thanks! Will wait for RC3 and report back if I bump into this again!

cheers,
l