Hey all,
Lately we’ve been looking at Job Reports and their potential to really balloon the size of the DB if left unattended. There are several things we can do to reduce the impact of this, and we are definitely going to do what we can – for example, in Beta 13, individual reports should be taking up half the amount of space in the DB that they used to.
However, there are also a few other things we’ve discussed doing that may be a bit more controversial, and we wanted to bring them up here so that we could have a discussion about them before we do anything. So here goes!
First, we’re talking about doing away with Requeue Reports. They use up a lot of space relative to their usefulness; we figured that Requeues should all be logged in the Job History (we’ll make sure they are if they aren’t already), and that an individual report for each Task is kind of unnecessary.
Second, we discussed capping Render Logs to only 1 per task (in other words, we would only keep the latest Render Log for each task). The rationale here is that only the latest one of these should be relevant.
Third, we are considering forcing Job Failure Detection on, probably with a very generous default threshold (say, 500-1000 errors). This is mostly to prevent Slaves from generating an unbounded number of errors on an overnight Job, or when no one is paying attention to that particular Job at the time. I feel this is particularly important because we’ve dramatically lowered the time a Slave waits in between Jobs, so in the worst case (a Job failing right away), Error Reports can grow out of control much more quickly than they did in Deadline 5.
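To make the effect of a threshold concrete, here is a minimal Python sketch. It is not Deadline code: the function name, the `error_limit` parameter, and the idea of modelling each attempt as a simple success/failure flag are all assumptions made purely for illustration.

```python
import itertools

def error_reports_generated(task_outcomes, error_limit):
    """Count the Error Reports produced before the Job is marked as failed.

    task_outcomes: iterable of booleans, True for a successful attempt.
    error_limit:   hypothetical failure-detection threshold.
    """
    errors = 0
    for succeeded in task_outcomes:
        if succeeded:
            continue
        errors += 1  # each failed attempt produces one Error Report
        if errors >= error_limit:
            break  # Job is marked as failed, so no further attempts are made
    return errors

# Worst case: a Job that fails immediately on every attempt, forever.
always_failing = itertools.repeat(False)
worst_case = error_reports_generated(always_failing, error_limit=500)
```

Without the threshold, the loop over `always_failing` would never end, which is the overnight scenario described above. With it, the number of reports is bounded by `error_limit`.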
Finally (and this is probably the most controversial/impactful change we’ve discussed), we are thinking of putting a hard cap on the number of Error Reports we keep around per Job. How (or even if) we do this is mostly open to discussion. There are a lot of different options here; we could enforce the cap on a per-Task or per-Slave basis, or just for the overall Job. There’s also a choice in which reports we keep; we could either keep only the first X reports (and no more until they get deleted), or we could keep cycling through reports and only keep the latest X reports. No matter how you slice it, there are reports that aren’t going to be kept around; the challenge here is making sure we are keeping a good distribution of the relevant ones to cover end-user needs.
Now, all of these are up for debate; we haven’t decided for sure to implement any of them. I wanted to bring them up here first (especially the last one) to get your opinions on the matter. We want to make sure we’re accounting for the most common use-cases of Job Reports before doing anything drastic!
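For anyone weighing the two retention options, here is a rough Python sketch of how they differ. None of this is Deadline code: the function names, the `cap` parameter, and the `(slave, message)` tuples are made up for illustration only.

```python
from collections import defaultdict, deque

def keep_first(reports, cap):
    """Keep the first `cap` reports; anything arriving later is dropped."""
    kept = []
    for report in reports:
        if len(kept) < cap:
            kept.append(report)
    return kept

def keep_latest(reports, cap):
    """Keep cycling: only the most recent `cap` reports survive."""
    kept = deque(maxlen=cap)  # the oldest entry is evicted automatically
    for report in reports:
        kept.append(report)
    return list(kept)

def keep_latest_per_key(reports, cap, key):
    """Apply the 'latest X' cap separately per group (e.g. per Slave)."""
    groups = defaultdict(lambda: deque(maxlen=cap))
    for report in reports:
        groups[key(report)].append(report)
    return {k: list(v) for k, v in groups.items()}

# Hypothetical reports as (slave, message) pairs, in arrival order.
reports = [
    ("slave-a", "error-1"), ("slave-a", "error-2"), ("slave-a", "error-3"),
    ("slave-b", "error-4"), ("slave-a", "error-5"), ("slave-b", "error-6"),
]

first_two = keep_first(reports, 2)
latest_two = keep_latest(reports, 2)
per_slave = keep_latest_per_key(reports, 1, key=lambda r: r[0])
```

With a Job-wide cap of 2, "first X" keeps only reports from slave-a, while "latest X" keeps the most recent report from each Slave simply because of arrival order. A per-Slave cap guarantees every Slave is represented, at the cost of a larger total.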
Of course, any other ideas to reduce Job Reports’ footprint in the DB are definitely welcome!
NOTE: Keep in mind for this discussion that the bulk of each report is actually stored in (compressed) log files in the Repository; the main thing that affects the size of the DB is the sheer number of reports.
Cheers,
- Jon