So that error in particular comes up when there is a network disruption that prevents looking up the actual job report. This happens fairly often when a mounted drive that holds the repo and DB is unmounted for some reason. That message is what the Monitor displays when that happens, and isn’t a report in and of itself.
We have been fighting this same error for months and getting nowhere with it. It started when we upgraded from DL 6.0 to 6.2 on the same network hardware/switches/NAS/etc., so it’s not necessarily YOUR setup, but something neither Thinkbox nor my engineers have been able to track down.
That said, the issue that brings me to search for this string today is that this error is no longer being caught by DL, and it is showing the task as complete when it clearly is not. How can I flag DL to show this as an error and to requeue the given task?
Could this be an issue not with the NAS, but with MongoDB and the server it is on? Another question: I noticed the default number of available DB connections is 30. We have 200 nodes on our farm, not including workstations. Shouldn’t I change this to something higher, like, well…90?
My guess is that if you are using network drives, they are becoming unmounted. It might also be that the log folders were somehow deleted by 5.2 or an engineer, but that would have been a problem in 6.0 as well, and a non-mapped drive would really explain why the job failed in the first place.
If it doesn’t go away, it may be excessive load on the file server, and we have a solution for that too, but this post is going to be long enough.
Now for the ‘why did this happen?’!
Over the lifetime of a task, Deadline stores all of the output of the program it ran for rendering in its local log directory as “task_X-machine-XXXX.log”. You can likely still find it there: use the ‘Help’ menu on the Slave to find the log folder, then check the date/times of the files. When the task is done, Deadline streams that log through bzip2 compression into a new file in the Repository’s strange directory structure (designed so millions of logs wouldn’t end up in a single folder).
If that copy fails (the directory doesn’t exist, the drive isn’t mounted, the server is too busy), we write the message you saw to the database so you can know why the file isn’t there. Interestingly, we write a message to the database either way so we have something to show in the ‘title’ column of the log viewer. On success, we write “Render Log”, which isn’t very exciting.
I don’t think we overwrite those task files, so they should all still be there; instead we increment the four-digit number at the end. But I do know that we should delete them after 30 days.
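To make that flow concrete, here is a minimal sketch of the same idea in Python. This is not Deadline’s actual code: the paths, the sharding scheme, the function name, and the “return a message” stand-in for writing to the database are all hypothetical. It only illustrates the “compress locally, copy into a sharded Repository folder, record a message on failure” sequence described above.

```python
# Minimal sketch of the flow described above -- NOT Deadline's actual code.
# Paths, the sharding scheme, and the failure-message handling are hypothetical.
import bz2
import os
import shutil

def archive_task_log(local_log_path, repo_root, job_id, task_id):
    """Compress a finished task's local log into a sharded repository folder."""
    # Shard by the first characters of the job ID so millions of logs
    # don't end up in a single directory (the idea mentioned above).
    shard = os.path.join(repo_root, "reports", "jobs", job_id[:2], job_id)
    dest = os.path.join(shard, "task_%s.log.bz2" % task_id)
    try:
        os.makedirs(shard, exist_ok=True)
        with open(local_log_path, "rb") as src, bz2.open(dest, "wb") as dst:
            shutil.copyfileobj(src, dst)   # stream through bzip2, don't load it all
        return "Render Log"                # the boring success title
    except OSError as exc:
        # Drive unmounted, folder missing, or server too busy:
        # record why the compressed log isn't there.
        return "Could not archive task log: %s" % exc
```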
How do you change the title of the event inside the Job Report view?
For some jobs we have a lot of events running with them, and it would make debugging much easier if we could see in the Job Report view which event is failing. That’s why we want to give them specific ‘title’ names.
The ‘title’ label is a bit of a misnomer. The text shown there is whatever was passed to “DeadlinePlugin.FailRender()”. It should really be called ‘error summary’.
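So the string you pass in is the only handle you get on that column today. A minimal sketch, assuming a Python job plugin in the usual DeadlinePlugin style: only FailRender() itself comes from the explanation above, while the class skeleton, callback name, and output check are abbreviated/hypothetical.

```python
# Abbreviated sketch, not a complete plugin: only FailRender() is taken from
# the explanation above; the callback wiring and output_exists() helper are
# hypothetical. This only runs inside Deadline's own Python environment.
from Deadline.Plugins import DeadlinePlugin

class MyPlugin(DeadlinePlugin):
    def CheckOutput(self):
        if not self.output_exists():  # hypothetical check for missing frames
            # Whatever string is passed here becomes the report's 'title',
            # so prefix it with the stage/event name to tell reports apart.
            self.FailRender("CopyToArchive step: expected output frame missing")
```

Prefixing the event or stage name that way at least makes the report list scannable until a real title field exists.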
It would be cool to have a ‘real’ title column as a feature in the new version.
So for now we have to live with opening each report to figure out which one came from which event process.