Data orphaned in temp directories on slave

We had a problem last night where a slave machine completely filled its local storage. On examination, it looks like the Deadline job wasn’t cleaning up after itself properly: the job’s temp directory on the machine contained a pile of orphaned copies of the scene file. For instance, there were 14 copies of the scene file for frame 66 (the task was attempted that many times, failing each time). I haven’t seen this before, so I’m assuming something isolated to this case caused it. Has anyone else seen this? Is there any way to add a check to make sure the temp files are removed?

It’s actually intentional, but we’ve found that it’s not a great plan on more constrained systems.

The idea was that, while a job is running, any re-mapped files we might need to pull again would still be on disk.

We have a discussion going internally about how we’d support cleaning things up between tasks. It’s a bit tricky at the moment because a job-error event is distinct from a job-cancelled event, and neither has a hook that’s called where we could do the deletion, so we’re stuck relying on the Slave to clean things up when it picks up a new job. The only thing I can suggest in the meantime is to lower the number of errors that can happen on a particular Slave via failure detection:
docs.thinkboxsoftware.com/produc … -detection

That should at least limit the number of scene files a specific Slave can generate for a job. The folder should clear out when the Slave moves on to the next one.

Perfect, thanks! I turned that on, so hopefully we won’t see the problem anymore.