Feature Request: Execute on Failure

anon89511579 · March 19, 2013, 10:51pm

I would definitely be interested in executing code on slaves with X consecutive failures.

Events.DeadlineEventListener’s OnJobFailedCallback seems the closest fit, except that by the time it fires the job is already dead, which is too late to act on the issue. I can track what failed and how often myself, but I would need an OnJobErrorCallback instead.

The perfect use case would be a single slave lighting up the queue with red: it takes an environment dump and a current usage stat dump, checks whether the machine can reach network resources, tries to resolve the issues, and/or finally reboots. If it continues to fail, we can disable it programmatically.
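
To make the tracking side concrete, here is a rough sketch of the per-slave bookkeeping I have in mind. It is plain Python and does not touch the Deadline API: the class name, the method names, the thresholds, and the action strings are all hypothetical, and it assumes something like the requested OnJobErrorCallback exists to call record_error.

```python
from collections import defaultdict


class ConsecutiveFailureTracker:
    """Counts consecutive errors per slave and suggests an escalation step.

    Hypothetical helper for illustration only; not part of Deadline.
    """

    def __init__(self, threshold=3, disable_after=6):
        # threshold: consecutive errors before diagnostics should run
        # disable_after: consecutive errors before the slave should be disabled
        self.threshold = threshold
        self.disable_after = disable_after
        self._counts = defaultdict(int)

    def record_success(self, slave):
        # Any successful task breaks the streak for that slave.
        self._counts.pop(slave, None)

    def record_error(self, slave):
        # Returns "none", "diagnose", or "disable" based on the current streak.
        self._counts[slave] += 1
        streak = self._counts[slave]
        if streak >= self.disable_after:
            return "disable"
        if streak >= self.threshold:
            return "diagnose"
        return "none"
```

An error callback would call record_error with the slave name and act on the result: "diagnose" would trigger the dumps, network check, and reboot described above, and "disable" would take the slave out of the queue.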

rrussell · March 20, 2013, 12:13pm

We’ll add this to the wish list. We definitely see how this could be useful.

Cheers,

Ryan