Hello,
I am running into a problem where, if an Arnold task gets requeued (either manually or through automatic task cancellation when a job exceeds its error limit), it leaves behind a .tmp file. These files can fill up the machine’s local storage and cause “No space left on device” errors on every subsequent task that worker picks up. I have run several tests that confirm this is what’s happening. When a task completes or fails, there is no problem: the files get cleaned up. But when it gets requeued, the file remains. Very annoying.
I am trying to write an event plugin that would trigger when a task gets canceled or requeued and clean up the files, but I can’t find any callback that serves this purpose. There is ‘OnJobError’, which triggers when a single task fails, and there is ‘OnJobRequeued’, which triggers when the job itself gets requeued, but nothing for when an individual task gets requeued. Post-task scripts will not work either, because they are also not triggered when a task is requeued or canceled.
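For context, the cleanup itself is simple; what's missing is a hook to run it from. Roughly, this is the Python I would want to run on the worker. It is only a sketch: the function name, the `*.tmp` pattern, the directory, and the age threshold are my own placeholders, not anything from Arnold or the render manager. The age check is there so it doesn't delete the .tmp file of a render that is still in progress on the same machine.

```python
import time
from pathlib import Path


def cleanup_stale_tmp_files(temp_dir, max_age_seconds=3600.0, now=None):
    """Delete *.tmp files in temp_dir not modified within max_age_seconds.

    Hypothetical helper: returns the list of removed paths. Files that
    vanish or cannot be removed (locked, permission denied) are skipped.
    """
    now = time.time() if now is None else now
    removed = []
    for path in Path(temp_dir).glob("*.tmp"):
        try:
            if not path.is_file():
                continue
            # Only delete files old enough that no running render still owns them.
            if now - path.stat().st_mtime >= max_age_seconds:
                path.unlink()
                removed.append(path)
        except OSError:
            # Leave anything we can't inspect or delete for the next run.
            continue
    return removed


if __name__ == "__main__":
    # Placeholder path: the real location depends on how the worker is set up.
    for p in cleanup_stale_tmp_files("/tmp/arnold_render", max_age_seconds=6 * 3600):
        print(f"removed {p}")
```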
Has anyone run into this problem before? If it’s not possible to do this with callbacks, has anybody figured out another way to execute code whenever a task gets requeued? This issue is causing my company a lot of headaches right now.
Thanks,
-KJB