We have a number of jobs that, as of today, can’t be deleted, either from the Monitor or using deadlinecommand. The job’s user has tried, and I have also tried as the super user. The only error I get in the Monitor console is:
2015-11-11 10:23:48: Traceback (most recent call last):
2015-11-11 10:23:48: File "DeadlineUI\UI\Commands\JobCommands.py", line 1381, in InnerExecute
2015-11-11 10:23:48: DocumentException: Failed to Delete one or more Jobs, see application log for details.
2015-11-11 10:23:48: at Deadline.StorageDB.MongoDB.MongoJobStorage.DeleteJobs (System.String[] jobIDs) [0x00000] in <filename unknown>:0
2015-11-11 10:23:48: at Deadline.Controllers.DataController.DeleteJobs (Deadline.Jobs.Job[] jobs) [0x00000] in <filename unknown>:0
2015-11-11 10:23:48: at (wrapper managed-to-native) System.Reflection.MonoMethod:InternalInvoke (System.Reflection.MonoMethod,object,object[],System.Exception&)
2015-11-11 10:23:48: at System.Reflection.MonoMethod.Invoke (System.Object obj, BindingFlags invokeAttr, System.Reflection.Binder binder, System.Object[] parameters, System.Globalization.CultureInfo culture) [0x00000] in <filename unknown>:0
I have restarted the Monitor numerous times, I’ve tried editing some of the job properties, and I’ve changed the job’s frame range to try to get Deadline to snap out of it, but nothing has worked. I’ve diffed the job’s Mongo document against that of a job I can delete just fine, and there’s nothing notable. I’ve also checked the Monitor log and found nothing useful.
Something weird is definitely going on. When I try to delete some of these jobs, all of their task documents seem to get blown away. I didn’t think that was supposed to happen until the job was fully purged from the queue.
The job still shows up in the Jobs pane, and I get the aforementioned error message in the console, but clicking on it causes the Tasks pane to display “Loading tasks…” forever.
This doesn’t always happen though. With some of them, nothing happens to the task documents when I try to delete them… I just get the error message.
If I remember right, others at Luma had previously reported this. Assuming it’s the same issue, the problem happens when a job only gets partially deleted (due to a failed call, or the Monitor being killed mid-delete, or something similar). Basically, the delete tries to insert the job into the “DeletedJobs” collection, which keeps jobs around in case you want to undelete them, but the job is already in there, so the insert fails.
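To make the failure mode concrete, here is a toy Python model of it. This is only a sketch of the behaviour described above, not Deadline’s actual code: the class, the method names, the `interrupt` flag and the exception are all made up for illustration, and plain dicts stand in for the Mongo collections.

```python
class DuplicateKeyError(Exception):
    """Hypothetical stand-in for a duplicate-key insert failure."""


class ToyRepository:
    """In-memory model: dicts play the role of the database collections."""

    def __init__(self):
        self.jobs = {}          # live jobs, keyed by job ID
        self.deleted_jobs = {}  # stand-in for the "DeletedJobs" collection

    def delete_job(self, job_id, interrupt=False):
        # Step 1: archive the job so that it could be undeleted later.
        if job_id in self.deleted_jobs:
            raise DuplicateKeyError(f"{job_id} is already archived")
        self.deleted_jobs[job_id] = self.jobs[job_id]
        if interrupt:
            # Simulates a failed call or a killed Monitor between the steps.
            return
        # Step 2: remove the live job.
        del self.jobs[job_id]


repo = ToyRepository()
repo.jobs["job-1"] = {"name": "example"}

# The first delete is cut off after archiving, so the job is half-deleted.
repo.delete_job("job-1", interrupt=True)

# Every later attempt now fails at step 1, and the live job never goes away.
try:
    repo.delete_job("job-1")
    retry_failed = False
except DuplicateKeyError:
    retry_failed = True
```

In this model the job stays in both places at once, which is why retrying the delete, even as the super user, keeps producing the same error.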
This should be fixed in the latest version of 7.2, but the workaround we found back then was to set “Deleted Job Purging” to 0 hours in Repository Options -> Job Settings -> Cleanup, and then run a HouseCleaning operation. This should get rid of the ‘deleted’ copy of the job, and allow you to delete the actual job.
Another alternative would be to try installing the latest 7.2 client somewhere, and deleting the jobs from there.
One thing to be careful of, which we discovered last time: if you try to delete multiple jobs at the same time, and at least one of them has this problem, the bug will ‘spread’ to the other jobs as well (because it causes them to fail mid-delete).
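Here is a rough Python sketch of how that spreading could work, and of why purging the ‘deleted’ copies clears it up. It assumes a batch delete archives every job before removing any of them; that ordering is my guess for illustration, not something confirmed about Deadline’s internals. All function names here are hypothetical, and dicts stand in for the collections.

```python
class DuplicateKeyError(Exception):
    """Hypothetical stand-in for a duplicate-key insert failure."""


def delete_jobs(jobs, deleted_jobs, job_ids):
    """Toy batch delete: archive every job first, then remove them all."""
    for job_id in job_ids:
        if job_id in deleted_jobs:
            # One stuck job aborts the whole batch part-way through.
            raise DuplicateKeyError(job_id)
        deleted_jobs[job_id] = jobs[job_id]
    for job_id in job_ids:
        del jobs[job_id]


def purge_deleted_jobs(deleted_jobs):
    """Toy version of purging archived jobs during housecleaning."""
    deleted_jobs.clear()


jobs = {"healthy": {"name": "A"}, "stuck": {"name": "B"}}
deleted_jobs = {"stuck": {"name": "B"}}  # left over from an earlier partial delete

try:
    delete_jobs(jobs, deleted_jobs, ["healthy", "stuck"])
except DuplicateKeyError:
    pass

# "healthy" was archived but never removed, so it is now stuck as well.
spread = "healthy" in jobs and "healthy" in deleted_jobs

# Purging the archived copies lets the same batch delete go through.
purge_deleted_jobs(deleted_jobs)
delete_jobs(jobs, deleted_jobs, ["healthy", "stuck"])
```

Under those assumptions, deleting suspect jobs one at a time keeps a stuck job from taking healthy ones down with it.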
Hey Jon, I definitely remember the issue you’re talking about, and the virus-like spreading does fit the bill here. However, it seems that some part of Deadline reconciled something in the repository over the weekend, as the jobs in question are now gone. Unless this keeps happening, I guess it can probably be chalked up to gremlins.
Got it. It was likely the HouseCleaning that took care of this; the workaround I mentioned would just have sped up a process that happens naturally.