Slaves delete jobs automatically

Hello,

Since we updated Deadline to 3.1, we have been seeing some “strange” behaviour: some slaves delete completed jobs automatically. The corresponding entries in our history log file are as follows:


Apr 20/09 08:54:16 C2QUAD-001 C2QUAD-001: Empty Trash Bin: Deleted job “TDX-P4-MorphingTEST-Links” because the job was in a corrupted state.
Apr 20/09 08:54:17 C2QUAD-001 C2QUAD-001: Empty Trash Bin: Deleted job “TDX-P2b-CR_TSX-Bahn-Maske” because the job was in a corrupted state.
Apr 20/09 08:54:17 C2QUAD-001 C2QUAD-001: Empty Trash Bin: Deleted job “TDX-P2b-CL-TSX” because the job was in a corrupted state.
Apr 20/09 08:54:18 C2QUAD-001 C2QUAD-001: Empty Trash Bin: Deleted job “TDX-P2b-CL-DEM” because the job was in a corrupted state.
Apr 20/09 08:54:18 C2QUAD-001 C2QUAD-001: Empty Trash Bin: Deleted job “TDX-P2b-CR_Erde” because the job was in a corrupted state.
Apr 20/09 08:54:19 C2QUAD-001 C2QUAD-001: Empty Trash Bin: Deleted job “TDX-P2b-CR_DEM” because the job was in a corrupted state.

I checked some of these jobs minutes before this happened, and none of the completed jobs were corrupt or deleted.

Do you have any idea why this happens and how I can prevent slaves from deleting jobs automatically?

Thank you very much in advance

Best regards

Thorsten

Hi Thorsten,

Is C2QUAD-001 the only machine that is doing this? If so, it could be that the machine doesn’t have full access to the repository, and thus it is thinking jobs are corrupt when they are not (likely because the slave can’t access all the job files or read them in).

Maybe in the next release, we’ll add a repository option to turn off the automatic deletion of corrupted jobs. That way in circumstances like this, you can turn off the feature until the problem is figured out.

Cheers,

- Ryan

Hi Ryan,

So far, 3 different machines have deleted jobs automatically. Here are some other examples from the Deadline history file:

Apr 15/09 07:22:59 wcs C2D-006: Empty Trash Bin: Deleted job “Gal_InstAllBlend_R_300-330” because the job was in a corrupted state.
Apr 17/09 10:11:58 C2QUAD-004 C2QUAD-004: Empty Trash Bin: Deleted job “WolkenTEST-Ozon08” because the job was in a corrupted state.
Apr 20/09 09:10:50 C2QUAD-001 C2QUAD-001: Empty Trash Bin: Deleted job “TDX-P2b-CR_TSX” because the job was in a corrupted state.
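For anyone who wants to tally which slaves are doing the deleting, here is a minimal Python sketch that counts these entries per machine. It assumes the history lines look exactly like the ones quoted above and that the name directly before the first colon is the slave that performed the deletion; that reading of the fields is a guess from the examples, not something taken from the Deadline documentation. The names `LINE_RE` and `count_corrupt_deletions` are made up for this example.

```python
import re
from collections import Counter

# Pattern guessed from the quoted log lines (an assumption, not an official format):
# "<Mon DD/YY> <HH:MM:SS> <name> <name>: Empty Trash Bin: Deleted job “<job>” because ..."
LINE_RE = re.compile(
    r'^(?P<date>\w{3} \d{2}/\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) '
    r'(?P<first>\S+) (?P<second>\S+): Empty Trash Bin: '
    r'Deleted job [“"](?P<job>.+?)[”"] '
    r'because the job was in a corrupted state\.$'
)


def count_corrupt_deletions(lines):
    """Count 'corrupted state' deletions per machine.

    The machine is taken to be the name right before the first colon,
    which is an assumption based on the sample lines.
    """
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:
            counts[match.group("second")] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "Apr 15/09 07:22:59 wcs C2D-006: Empty Trash Bin: Deleted job "
        "“Gal_InstAllBlend_R_300-330” because the job was in a corrupted state.",
        "Apr 17/09 10:11:58 C2QUAD-004 C2QUAD-004: Empty Trash Bin: Deleted job "
        "“WolkenTEST-Ozon08” because the job was in a corrupted state.",
    ]
    for machine, count in count_corrupt_deletions(sample).most_common():
        print(machine, count)
```

Running this over the whole history file would show whether the deletions are concentrated on a few slaves or spread across the farm.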

All machines have full access to the repository. We never had this behavior in 3.0. After updating the repository to v 3.1, we had some slaves that didn’t upgrade correctly to 3.1 (using the automatic upgrade option), so on these machines we removed and reinstalled the slave software manually. I can’t tell whether these 3 machines were among them, but I am 90% sure that C2D-006 was not.

Any other ideas? Would it be possible to have the option to turn off the automatic deletion in the next SP?

Best regards

Thorsten

Adding the option to turn this off would be very straightforward, so we will add it to the next SP release. In the meantime, it might just be best to remove this machine from the queue.

Since the permissions aren’t the problem, the only other thing I can think of is that those 3 slaves are potentially running a different version of Deadline. Do the version numbers match exactly with the machines that do work? For whatever reason, these machines think the job is corrupt, and even with the option to turn off the autodelete of corrupted jobs, these machines still won’t be able to participate in the rendering if they think the jobs are corrupt. If the permissions are good and the versions are the same, then I’m not sure what else the problem could be…

Cheers,

- Ryan

Hi Ryan,

You may be right about the different versions of the Deadline slave. All clients report v 3.1.0.5390 in the Monitor, but during a manual uninstall the client software reported v 3.0.0.32934. So it seems that the automatic update didn’t work on all machines.
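In case it helps someone with the same problem, this is roughly how the two version numbers could be compared per machine once they have been collected by hand. It is only a sketch: the function names and the machine names `slave-a` and `slave-b` are placeholders, and nothing here queries Deadline itself. The two version strings are the ones mentioned above.

```python
def parse_version(version):
    """Turn a dotted version string such as '3.1.0.5390' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def find_version_mismatches(reported, installed):
    """Return the machines whose reported and installed versions differ.

    Both arguments map machine name -> version string and are collected
    manually (for example from the Monitor and from the installer).
    """
    mismatches = {}
    for machine, reported_version in reported.items():
        installed_version = installed.get(machine)
        if installed_version is None:
            continue  # nothing known about the installed version on this machine
        if parse_version(reported_version) != parse_version(installed_version):
            mismatches[machine] = (reported_version, installed_version)
    return mismatches


if __name__ == "__main__":
    # Placeholder machine names; the version strings come from this thread.
    reported = {"slave-a": "3.1.0.5390", "slave-b": "3.1.0.5390"}
    installed = {"slave-a": "3.0.0.32934", "slave-b": "3.1.0.5390"}
    for machine, (rep, inst) in find_version_mismatches(reported, installed).items():
        print(f"{machine}: reports {rep} but installer shows {inst}")
```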

Best regards

Thorsten