AWS Thinkbox Discussion Forums

Getting tons of Job Corrupted reports

I have no idea why this is happening, but I’m getting a lot of Job Corrupted reports in my inbox when Deadline finishes jobs. It keeps reporting these jobs as corrupted, for no apparent reason.

There is no network issue and all paths are accessible. The drive that Deadline and the jobs are on is an SSD that is not fully utilised and serves only Deadline.

I really have no idea where this might be coming from, which is why I’m reporting it here. Any thoughts?

Deadline 5.1.46114
Server: Windows Server 2008 x64
Slaves: Windows 7 Pro x64
All in a domain on a 1 Gbps line

Hey Loocas,

I’m sure you’re used to the drill by now, but could you pack one of those jobs up sans scene file and throw it at our support e-mail (support@thinkboxsoftware.com)? I’d like to put it in our farm and see why Deadline complains that it’s corrupt.

Considering it’s multiple jobs, I doubt the SSD has gone bad.

I hope it will become clearer on our end with one of your job files.

Thanks

Sent, thank you. :slight_smile:

So, funny story… They’re not corrupt on my side.

Now the real fun starts. I’ll check the source code for hints as to what the cause might be.

Yeah, that’s the thing. They don’t show as corrupted in the Monitor either, but I’m getting tons of these:

[Screenshot: corrupted_jobs.png]

Well, it seems that in the vast majority of cases where Deadline can’t read a job’s XML files, it marks the job as corrupt.
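If that is what’s happening, one quick check is to try parsing every job file yourself and see which ones fail. The sketch below is a hypothetical Python helper, not part of Deadline. It assumes each job sits at `<repository>\jobs\<job_id>\<job_id>.job` (the layout seen in the reports in this thread) and that these files are plain XML, which is inferred from the comment above rather than from Deadline documentation.

```python
import sys
import xml.etree.ElementTree as ET
from pathlib import Path


def find_unreadable_jobs(jobs_dir):
    """Return (path, error) pairs for .job files that cannot be opened or parsed as XML.

    Assumption: the layout is <jobs_dir>/<job_id>/<job_id>.job, as in the reports in this
    thread, and each .job file is a standalone XML document.
    """
    failures = []
    for job_file in sorted(Path(jobs_dir).glob("*/*.job")):
        try:
            ET.parse(job_file)  # raises ET.ParseError on malformed or truncated XML
        except (ET.ParseError, OSError) as exc:  # OSError covers permission/sharing errors
            failures.append((str(job_file), str(exc)))
    return failures


if __name__ == "__main__":
    # Pass the repository's jobs folder, e.g. E:\Deadline\jobs (example path only).
    if len(sys.argv) == 2:
        for path, error in find_unreadable_jobs(sys.argv[1]):
            print(f"{path}: {error}")
    else:
        print("usage: find_unreadable_jobs.py <jobs_dir>")
```

If this reports nothing while the notifier keeps complaining, the files are probably fine at rest and the failures are transient.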

My first guess is that it’s somehow a permissions problem, but that’s unlikely. So, more questions, I guess:

  1. Have you checked the access permissions on those corrupt jobs? (just being thorough)
  2. Is every job failing this way?
  3. Do the Slaves actually finish the render?

This is definitely an odd one. Hopefully Ryan can give some clear input here.

Deadline is running as a service in super-admin mode (logged on as me personally; no special users were created for Deadline).

I can’t say it’s every job, but many jobs seem to be failing like this. A lot of them.

Yes, everything is fine. In fact, it seems I’m getting these errors only AFTER the jobs have finished. I haven’t seen this error on a job that is currently rendering.

Ryan’s guess was that maybe Pulse is having trouble reading/writing the XML after the fact. Could you turn verbose logging on and maybe throw us a log for that? I’m hoping something jumps out at us instead of this random guessing.
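To separate a read that races with a write from a genuinely broken file, a reader can retry a failed parse a few times before flagging the file. This is a minimal Python sketch of that idea, not how Deadline or Pulse actually reads job files; `read_xml_with_retries`, the attempt count, and the delay are all hypothetical choices.

```python
import time
import xml.etree.ElementTree as ET


def read_xml_with_retries(path, attempts=3, delay_seconds=0.5):
    """Parse an XML file, retrying failed reads before giving up.

    Hypothetical helper. Returns (root_element, None) on success, or
    (None, last_error) once every attempt has failed. Both XML parse errors and
    OS-level read errors (e.g. a file locked by another process) are retried.
    """
    last_error = None
    for attempt in range(attempts):
        try:
            return ET.parse(path).getroot(), None
        except (ET.ParseError, OSError) as exc:
            last_error = exc
            if attempt < attempts - 1:
                time.sleep(delay_seconds)  # give a concurrent writer time to finish
    return None, last_error
```

If a file that failed once parses cleanly on a later attempt, that points towards a timing or locking issue on the share rather than real corruption.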

Verbose logging for Pulse? I already have that enabled.

Which log exactly do you need?

The Pulse log would be ideal. I’ve sent a response through the ticket system so you can include the log simply by replying to that e-mail.

I’m really looking for anything that stands out in terms of file access errors. I’m not sure yet what specifically is causing this, so we’ll see if the log provides more clues.

Found the logs, here they are: http://www.duber.cz/dump/pulse_logs.zip Hope this helps.

I also saw a LOT of corrupted job reports (thousands) the first day after I installed the beta and launched my first OSX slave. As near as I can tell, these were not for jobs that were actually queued or active, but for old jobs.

I restarted everything, deleted the corrupt jobs and nothing’s screamed again since. But I haven’t turned the OSX slave back on yet. Any new thoughts on what happened?

–Rob

We’re still not sure what’s happening, but I thought I’d share some stats on those logs:

Counts of how many times Deadline complained about a specific corrupt job:

    1 \Deadline\jobs\999_050_999_2e7688b9\999_050_999_2e7688b9.job
    1 \Deadline\jobs\999_050_999_5b2df106\999_050_999_5b2df106.job
    2 \Deadline\jobs\999_055_999_602cd1b3\999_055_999_602cd1b3.job
    2 \Deadline\jobs\999_070_999_10f268ba\999_070_999_10f268ba.job
    1 \Deadline\jobs\999_070_999_21b62aff\999_070_999_21b62aff.job
    2 \Deadline\jobs\999_070_999_26719333\999_070_999_26719333.job
    1 \Deadline\jobs\999_070_999_47b69e56\999_070_999_47b69e56.job
    1 \Deadline\jobs\999_070_999_670a6f38\999_070_999_670a6f38.job
    1 \Deadline\jobs\999_080_999_06d62cdb\999_080_999_06d62cdb.job
    2 \Deadline\jobs\999_080_999_50214e37\999_080_999_50214e37.job
    1 \Deadline\jobs\999_080_999_6d14bd21\999_080_999_6d14bd21.job
    2 \Deadline\jobs\999_080_999_6d8d28ad\999_080_999_6d8d28ad.job
    1 \Deadline\jobs\999_080_999_7237a106\999_080_999_7237a106.job
    1 \Deadline\jobs\999_085_999_23f0dc9a\999_085_999_23f0dc9a.job
    1 \Deadline\jobs\999_085_999_4c502938\999_085_999_4c502938.job
    1 \Deadline\jobs\999_085_999_5ef0a083\999_085_999_5ef0a083.job
    1 \Deadline\jobs\999_090_999_128df6dd\999_090_999_128df6dd.job
    2 \Deadline\jobs\999_090_999_1ccfe3cb\999_090_999_1ccfe3cb.job
    1 \Deadline\jobs\999_090_999_21c318b3\999_090_999_21c318b3.job
    1 \Deadline\jobs\999_090_999_3042383e\999_090_999_3042383e.job
    2 \Deadline\jobs\999_090_999_44a6b371\999_090_999_44a6b371.job
    2 \Deadline\jobs\999_090_999_54a334fc\999_090_999_54a334fc.job
    2 \Deadline\jobs\999_095_999_233555fd\999_095_999_233555fd.job
    1 \Deadline\jobs\999_095_999_42f65459\999_095_999_42f65459.job
    1 \Deadline\jobs\999_099_999_6ddb4359\999_099_999_6ddb4359.job

Each of these lines ended with the same text: "(Job has been corrupted, it is recommended that this job be removed from the repository".

Other corrupt notice counts:

    1 E:\Deadline\limitGroups\999_050_999_510482b0.limitGroup
    1 E:\Deadline\limitGroups\999_050_999_55e56d12.limitGroup
    1 E:\Deadline\limitGroups\999_055_999_2a039ae8.limitGroup
    1 E:\Deadline\limitGroups\999_055_999_602cd1b3.limitGroup
    1 E:\Deadline\limitGroups\999_060_999_06f0d1d5.limitGroup
    2 E:\Deadline\limitGroups\999_060_999_4ebf44bc.limitGroup
    3 E:\Deadline\limitGroups\999_070_999_10f268ba.limitGroup
    1 E:\Deadline\limitGroups\999_070_999_21b62aff.limitGroup
    2 E:\Deadline\limitGroups\999_070_999_26719333.limitGroup
    1 E:\Deadline\limitGroups\999_070_999_26e16cb1.limitGroup
    3 E:\Deadline\limitGroups\999_070_999_39c2fe5d.limitGroup
    2 E:\Deadline\limitGroups\999_070_999_40274f23.limitGroup
    1 E:\Deadline\limitGroups\999_070_999_47b69e56.limitGroup
    1 E:\Deadline\limitGroups\999_070_999_4b2cc537.limitGroup
    1 E:\Deadline\limitGroups\999_070_999_53933c4c.limitGroup
    4 E:\Deadline\limitGroups\999_070_999_5ab4b033.limitGroup
    5 E:\Deadline\limitGroups\999_070_999_5abc6748.limitGroup
    5 E:\Deadline\limitGroups\999_070_999_670a6f38.limitGroup
    3 E:\Deadline\limitGroups\999_070_999_6b5e12a2.limitGroup
    1 E:\Deadline\limitGroups\999_075_999_21016deb.limitGroup
    1 E:\Deadline\limitGroups\999_075_999_287af065.limitGroup
    2 E:\Deadline\limitGroups\999_075_999_3913d852.limitGroup
    1 E:\Deadline\limitGroups\999_075_999_75baa257.limitGroup
    1 E:\Deadline\limitGroups\999_080_999_0555d38c.limitGroup
    1 E:\Deadline\limitGroups\999_080_999_06d62cdb.limitGroup
    3 E:\Deadline\limitGroups\999_080_999_2a06667d.limitGroup
    2 E:\Deadline\limitGroups\999_080_999_313eb76f.limitGroup
    2 E:\Deadline\limitGroups\999_080_999_35df3427.limitGroup
    1 E:\Deadline\limitGroups\999_080_999_4f8dbdfd.limitGroup
    2 E:\Deadline\limitGroups\999_080_999_5ace2ae3.limitGroup
    2 E:\Deadline\limitGroups\999_080_999_5f7a582c.limitGroup
    2 E:\Deadline\limitGroups\999_080_999_6b367671.limitGroup
    2 E:\Deadline\limitGroups\999_080_999_6b5d42e2.limitGroup
    1 E:\Deadline\limitGroups\999_080_999_6d14bd21.limitGroup
    2 E:\Deadline\limitGroups\999_080_999_6d8d28ad.limitGroup
    2 E:\Deadline\limitGroups\999_085_999_25ce0b12.limitGroup
    1 E:\Deadline\limitGroups\999_085_999_4c502938.limitGroup
    1 E:\Deadline\limitGroups\999_090_999_128df6dd.limitGroup
    2 E:\Deadline\limitGroups\999_090_999_1ccfe3cb.limitGroup
    3 E:\Deadline\limitGroups\999_090_999_1cec402e.limitGroup
    3 E:\Deadline\limitGroups\999_090_999_21c318b3.limitGroup
    2 E:\Deadline\limitGroups\999_090_999_2903bd12.limitGroup
    4 E:\Deadline\limitGroups\999_090_999_2e3855e0.limitGroup
    1 E:\Deadline\limitGroups\999_090_999_2eda9d6b.limitGroup
    3 E:\Deadline\limitGroups\999_090_999_3042383e.limitGroup
    4 E:\Deadline\limitGroups\999_090_999_304b9ecd.limitGroup
    4 E:\Deadline\limitGroups\999_090_999_3bccbd41.limitGroup
    3 E:\Deadline\limitGroups\999_090_999_3cb88b87.limitGroup
    1 E:\Deadline\limitGroups\999_090_999_42ee486d.limitGroup
    1 E:\Deadline\limitGroups\999_090_999_44a6b371.limitGroup
    1 E:\Deadline\limitGroups\999_090_999_47df3098.limitGroup
    2 E:\Deadline\limitGroups\999_090_999_54a334fc.limitGroup
    4 E:\Deadline\limitGroups\999_090_999_64fcceed.limitGroup
    1 E:\Deadline\limitGroups\999_090_999_7612d88b.limitGroup
    1 E:\Deadline\limitGroups\999_095_999_42f65459.limitGroup
    2 E:\Deadline\limitGroups\furnace.limitGroup
    1 E:\Deadline\limitGroups\nuke.limitGroup
    4 E:\Deadline\pulse\Rammstein\Rammstein.pulseInfo

Each of these was logged as "minor exception: Error in file: <path> (System.Exception)".

For the curious, on machines with the GNU tools, here are the commands I used:

    cat * | grep exception | grep -iv "limitGroup" | grep -iv "Rammstein.pulseInfo" | cut -d ":" -f 8 | sort | uniq -c
    cat * | grep exception | grep -v "Deadline.Jobs.JobCorruptedException" | cut -d "-" -f4 | sort | uniq -c
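Since the repository server here runs Windows Server 2008, where the GNU tools are not installed by default, here is a rough Python equivalent of those two pipelines. It is a sketch under assumptions: the log line formats are inferred from the excerpts above (a `\Deadline\jobs\...\<id>.job` path on corrupt-job lines, `Error in file: <path>` on the other notices), paths are assumed to contain no spaces, and the script name in the usage message is hypothetical.

```python
import re
import sys
from collections import Counter
from pathlib import Path

# Assumed formats, inferred from the excerpts in this thread, not from Deadline docs.
# The drive letter is dropped from job paths, mirroring the first list above.
JOB_PATH = re.compile(r"(\\Deadline\\jobs\\\S+?\.job)")
ERROR_FILE = re.compile(r"Error in file: (\S+)")  # assumes paths contain no spaces


def count_corruption_reports(lines):
    """Return (corrupt-job counts, file-error counts) from Pulse log lines."""
    corrupt_jobs = Counter()
    file_errors = Counter()
    for line in lines:
        if "exception" not in line:  # same case-sensitive filter as `grep exception`
            continue
        error_match = ERROR_FILE.search(line)
        if error_match:
            file_errors[error_match.group(1)] += 1
            continue
        job_match = JOB_PATH.search(line)
        if job_match:
            corrupt_jobs[job_match.group(1)] += 1
    return corrupt_jobs, file_errors


def main(argv):
    if len(argv) != 2:
        print("usage: count_corruption.py <pulse_log_dir>")
        return
    lines = []
    for log_file in sorted(Path(argv[1]).iterdir()):  # like `cat *`
        if log_file.is_file():
            lines.extend(log_file.read_text(errors="replace").splitlines())
    corrupt_jobs, file_errors = count_corruption_reports(lines)
    for counter in (corrupt_jobs, file_errors):
        for path, count in sorted(counter.items()):  # like `sort | uniq -c`
            print(count, path)


if __name__ == "__main__":
    main(sys.argv)
```

Like `sort | uniq -c`, it prints each count followed by the path, sorted by path, but it keeps only the path rather than the rest of the log line.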

Is there anything I can do to further test this?

I’m rendering some more shots currently and, again, getting corrupted job errors after the jobs have finished rendering.

At this point, we don’t believe so. The corruption errors are quite odd, though, since there seem to be a number of read errors on the limitGroup and pulseInfo files in your repository share.
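One way to see how closely the two kinds of errors line up is to cross-reference the IDs from the two count lists posted earlier in the thread. This is an illustrative Python snippet, not part of the original discussion. It assumes that a limitGroup file named after a job ID belongs to that job, which is inferred only from the matching names.

```python
# Job IDs taken from the corrupt-job counts posted earlier in this thread.
CORRUPT_JOB_IDS = set("""
999_050_999_2e7688b9 999_050_999_5b2df106 999_055_999_602cd1b3 999_070_999_10f268ba
999_070_999_21b62aff 999_070_999_26719333 999_070_999_47b69e56 999_070_999_670a6f38
999_080_999_06d62cdb 999_080_999_50214e37 999_080_999_6d14bd21 999_080_999_6d8d28ad
999_080_999_7237a106 999_085_999_23f0dc9a 999_085_999_4c502938 999_085_999_5ef0a083
999_090_999_128df6dd 999_090_999_1ccfe3cb 999_090_999_21c318b3 999_090_999_3042383e
999_090_999_44a6b371 999_090_999_54a334fc 999_095_999_233555fd 999_095_999_42f65459
999_099_999_6ddb4359
""".split())

# limitGroup file names (without extension) from the "Error in file" counts.
# Assumption: a limitGroup named after a job ID belongs to that job.
LIMIT_GROUP_ERRORS = set("""
999_050_999_510482b0 999_050_999_55e56d12 999_055_999_2a039ae8 999_055_999_602cd1b3
999_060_999_06f0d1d5 999_060_999_4ebf44bc 999_070_999_10f268ba 999_070_999_21b62aff
999_070_999_26719333 999_070_999_26e16cb1 999_070_999_39c2fe5d 999_070_999_40274f23
999_070_999_47b69e56 999_070_999_4b2cc537 999_070_999_53933c4c 999_070_999_5ab4b033
999_070_999_5abc6748 999_070_999_670a6f38 999_070_999_6b5e12a2 999_075_999_21016deb
999_075_999_287af065 999_075_999_3913d852 999_075_999_75baa257 999_080_999_0555d38c
999_080_999_06d62cdb 999_080_999_2a06667d 999_080_999_313eb76f 999_080_999_35df3427
999_080_999_4f8dbdfd 999_080_999_5ace2ae3 999_080_999_5f7a582c 999_080_999_6b367671
999_080_999_6b5d42e2 999_080_999_6d14bd21 999_080_999_6d8d28ad 999_085_999_25ce0b12
999_085_999_4c502938 999_090_999_128df6dd 999_090_999_1ccfe3cb 999_090_999_1cec402e
999_090_999_21c318b3 999_090_999_2903bd12 999_090_999_2e3855e0 999_090_999_2eda9d6b
999_090_999_3042383e 999_090_999_304b9ecd 999_090_999_3bccbd41 999_090_999_3cb88b87
999_090_999_42ee486d 999_090_999_44a6b371 999_090_999_47df3098 999_090_999_54a334fc
999_090_999_64fcceed 999_090_999_7612d88b 999_095_999_42f65459 furnace nuke
""".split())

both = sorted(CORRUPT_JOB_IDS & LIMIT_GROUP_ERRORS)
jobs_only = sorted(CORRUPT_JOB_IDS - LIMIT_GROUP_ERRORS)
print(f"{len(both)} of {len(CORRUPT_JOB_IDS)} corrupt-job IDs also had a limitGroup read error")
print("corrupt-job IDs with no matching limitGroup error:", ", ".join(jobs_only))
```

On these lists it reports 17 of the 25 corrupt-job IDs as also having a limitGroup read error, which is consistent with, though it does not prove, a shared read problem on the repository share.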

Our recommendation for now is to unsubscribe from the corrupt job notifications, since their signal-to-noise ratio is so poor.

I’m not sure yet how we can debug this. So far we’ve tried reproducing it with no luck.

If it’s any relief, the corrupt job system is likely going to be deprecated in Deadline 6.

:slight_smile: Yeah, I was actually wondering what this error is good for. I mean, if a job is corrupted and therefore doesn’t report correctly, or anything else for that matter, it would still show up in the Monitor as some sort of error.

And if it happens on a finished job, I don’t care about that too much.
