Temporary files filling hard drives


#1

Hi there!

I noticed recently that the hard drives on the render nodes are being filled with temporary files from Deadline jobs. You can see the content in the screenshot:

The location is C:\Users\USER_NAME\AppData\Local\Temp\

and the content of some of these folders looks like this:

Please notice that the zip file contains the same .max file that is already uncompressed in the same folder.
All jobs have the flag LocalRendering=0

I have Pulse running and performing house cleaning regularly.
Any thoughts?

Thank you very much


#2

Following this.
We have hit this before.


#3

This used to be a problem when the Proxy (pre-RCS) compressed data to send along the pipe. We created temporary folders to generate the zip stream instead of just creating it in memory. We changed that to a memory-only approach sometime before 9.0 if I recall correctly.

What version of Deadline are you running there?


#4

The version is 10.0.21.5.

Are there any logs or reports that would be useful for you?


#5

I think so. If you can grab some of the render node’s logs I think that would be good. Sorry, for some reason I latched onto the old Proxy’s compression strategy when you explicitly mentioned that this is a render node problem.

I’d use the creation time on those folders to inform which logs and where in them to look. Are you using the RCS in your setup? If you are, is a direct connection possible just as an A/B comparison to see if it makes a difference?
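As a rough sketch of that triage step (plain Python, all names here are illustrative, not anything Deadline ships), you could list the leftover temp folders with their creation times so each one can be matched against log timestamps:

```python
from datetime import datetime
from pathlib import Path

def folders_by_creation_time(temp_root):
    """Return (creation_time, folder) pairs for every subfolder of
    temp_root, oldest first, for matching against log timestamps.
    Note: st_ctime is creation time on Windows, but inode change
    time on Linux."""
    pairs = [
        (datetime.fromtimestamp(p.stat().st_ctime), p)
        for p in Path(temp_root).iterdir()
        if p.is_dir()
    ]
    return sorted(pairs)

# Example (path is illustrative):
# for created, folder in folders_by_creation_time(r"C:\Users\USER_NAME\AppData\Local\Temp"):
#     print(created, folder.name)
```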


#6

Hi eamsler,

Sure thing. I have a log from the machine that also runs the primary Pulse, so we can have both logs from the same machine.

This is the content of the folder: (screenshot)

Name of the folder:
35b940e0-e57b-4550-be4f-fbc1b2e00256

Date of creation:
20-11-2018 16:51

In the render slave log there appears to be some sort of problem when archiving the file.
Logs.zip (447.4 KB)

Unfortunately I am not currently using RCS. If there is anything else I can send you, please let me know.

Thank you very much!


#7

Neat! Here’s the error:

2018-11-20 16:51:17:  Error occurred while archiving job with ID 5beedbdf48ba194728a4dceb: Specified argument was out of the range of valid values.
2018-11-20 16:51:17:  Parameter name: size (System.ArgumentOutOfRangeException)
2018-11-20 16:51:17:     at ICSharpCode.SharpZipLib.Zip.ZipEntry.set_Size(Int64 value)
2018-11-20 16:51:17:     at ICSharpCode.SharpZipLib.Zip.ZipOutputStream.CloseEntry()
2018-11-20 16:51:17:     at ICSharpCode.SharpZipLib.Zip.ZipOutputStream.Finish()
2018-11-20 16:51:17:     at ICSharpCode.SharpZipLib.Zip.Compression.Streams.DeflaterOutputStream.Close()
2018-11-20 16:51:17:     at FranticX.IO.Compression.Zip.CompressFilesToStream(String[] inputFilenames, String outputFilename, CompressionLevel compressionLevel, Boolean failOnError)
2018-11-20 16:51:17:     at FranticX.IO.Compression.Zip.CompressFiles(String[] inputFilenames, String outputFilename, CompressionLevel compressionLevel, Boolean failOnError)
2018-11-20 16:51:17:     at Deadline.StorageDB.JobStorage.ArchiveJobs(Job[] jobs, String archiveFolderOverride, Boolean useAltPath, String alternativePath)
2018-11-20 16:59:06:  Skipping thermal shutdown check because it is not required at this time

So, that must mean there’s some finite limit on the size of the zip files we can create. My guess is 4 GB, with something internally using 32-bit integers or similar, so I created a Command Script job (which writes instructions to a text file) and used it to create a 60 GB sparse file as a test:

cd C:\DeadlineRepository10\jobs\5bfd99adc3ca4f223c6793a0
del commandsfile.txt
fsutil file createnew commandsfile.txt 0xf00000000

That did create a folder for this: (username obfuscated)
(screenshot)

So far it’s working (which I expect means it made it past where yours failed), but compressing 60 GB of zeros is taking a while. I’ll let this cook and report back later.

The error you’re seeing isn’t unheard of in the library we’re using.

Not sure why it’s failing, but I’ll cut a ticket for us to roll back the temporary folder.

Update: I got the same error when it was all said and done:

2018-11-27 13:37:24:  Error occurred while archiving job with ID 5bfd99adc3ca4f223c6793a0: Specified argument was out of the range of valid values.
2018-11-27 13:37:24:  Parameter name: size (System.ArgumentOutOfRangeException)
2018-11-27 13:37:24:     at ICSharpCode.SharpZipLib.Zip.ZipEntry.set_Size(Int64 value)
2018-11-27 13:37:24:     at ICSharpCode.SharpZipLib.Zip.ZipOutputStream.CloseEntry()
2018-11-27 13:37:24:     at ICSharpCode.SharpZipLib.Zip.ZipOutputStream.Finish()
2018-11-27 13:37:24:     at ICSharpCode.SharpZipLib.Zip.Compression.Streams.DeflaterOutputStream.Close()
2018-11-27 13:37:24:     at FranticX.IO.Compression.Zip.CompressFilesToStream(String[] inputFilenames, String outputFilename, CompressionLevel compressionLevel, Boolean failOnError)
2018-11-27 13:37:24:     at FranticX.IO.Compression.Zip.CompressFiles(String[] inputFilenames, String outputFilename, CompressionLevel compressionLevel, Boolean failOnError)
2018-11-27 13:37:24:     at Deadline.StorageDB.JobStorage.ArchiveJobs(Job[] jobs, String archiveFolderOverride, Boolean useAltPath, String alternativePath)
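For what it’s worth, the ~4 GB guess matches the classic (non-Zip64) zip format, whose headers record file sizes in unsigned 32-bit fields, so a writer that doesn’t emit Zip64 records has to reject anything larger. A minimal sketch of that ceiling (the helper name is made up):

```python
# Classic zip headers store compressed/uncompressed sizes in
# unsigned 32-bit fields, so 0xFFFFFFFF (4 GiB - 1) is the ceiling;
# the Zip64 extension widens these fields to 64 bits.
ZIP32_MAX_SIZE = 0xFFFF_FFFF

def fits_classic_zip(size_bytes: int) -> bool:
    """True if a file of this size can be recorded without Zip64."""
    return 0 <= size_bytes <= ZIP32_MAX_SIZE

print(fits_classic_zip(3 * 1024**3))  # True: a 3 GiB scene fits
print(fits_classic_zip(5 * 1024**3))  # False: a 5 GiB scene does not
```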

#8

Thanks for the research!
Right now I am periodically auto-deleting the temporary folders, but I’ll try to find more offending samples and check the file size of the .max files. It is certainly not unusual for us to have 4 GB+ files.
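That size check could be sketched roughly like this (plain Python; the path, suffix, and helper name are assumptions based on this thread):

```python
from pathlib import Path

FOUR_GIB = 4 * 1024**3

def find_oversized_scenes(root, suffix=".max", limit=FOUR_GIB):
    """Walk `root` recursively and return scene files whose
    on-disk size exceeds `limit`, sorted by path."""
    return sorted(
        p for p in Path(root).rglob(f"*{suffix}")
        if p.is_file() and p.stat().st_size > limit
    )

# Example (path is illustrative):
# for p in find_oversized_scenes(r"C:\Users\USER_NAME\AppData\Local\Temp"):
#     print(p, p.stat().st_size)
```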


#9

Something you can do to lessen the burden of the cleanup process would be to run Pulse, which would perform the archiving operation on one dedicated machine… that could have a really big disk. :smiley:


#10

That makes a lot of sense… :slight_smile: Thank you!


#11

I can confirm that the error occurs every time I archive a file over 4 GB. After manually archiving this test file I got this error in the local launcher log file:


2018-12-03 15:10:07:  Error occurred while archiving job with ID 5c0536cf48ba193e30aa6a51: Specified argument was out of the range of valid values.
2018-12-03 15:10:07:  Parameter name: size (System.ArgumentOutOfRangeException)
2018-12-03 15:10:07:     at ICSharpCode.SharpZipLib.Zip.ZipEntry.set_Size(Int64 value)
2018-12-03 15:10:07:     at ICSharpCode.SharpZipLib.Zip.ZipOutputStream.CloseEntry()
2018-12-03 15:10:07:     at ICSharpCode.SharpZipLib.Zip.ZipOutputStream.Finish()
2018-12-03 15:10:07:     at ICSharpCode.SharpZipLib.Zip.Compression.Streams.DeflaterOutputStream.Close()
2018-12-03 15:10:07:     at FranticX.IO.Compression.Zip.CompressFilesToStream(String[] inputFilenames, String outputFilename, CompressionLevel compressionLevel, Boolean failOnError)
2018-12-03 15:10:07:     at FranticX.IO.Compression.Zip.CompressFiles(String[] inputFilenames, String outputFilename, CompressionLevel compressionLevel, Boolean failOnError)
2018-12-03 15:10:07:     at Deadline.StorageDB.JobStorage.ArchiveJobs(Job[] jobs, String archiveFolderOverride, Boolean useAltPath, String alternativePath)


Running Pulse made it possible to confine the offending folders to one machine, so I no longer have full hard drives on the workstations. Thanks for that :slight_smile:

Should we wait for the resolution of the ticket for further info?


#12

It’s been logged and I’m working on getting it slotted in.

I don’t have a great solution at the moment because the process will fail regardless of the archiving mechanism (event script, etc.). The cleanest workaround I can think of right now is to have an event script clean out any scene files it finds that are greater than 4 GB before the archiving takes place, so at least you can keep the logs. That’s not ideal, though, as the jobs would be broken.
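A plain-Python sketch of that cleanup step (the Deadline event wiring is omitted, and every name here is illustrative, not the actual event API):

```python
from pathlib import Path

def prune_oversized_scene_files(job_folder, limit=4 * 1024**3):
    """Delete scene files larger than `limit` from a job folder so
    the remaining small files (logs, info files) can still be zipped
    by a non-Zip64 writer. Returns the names of the deleted files."""
    removed = []
    for p in Path(job_folder).iterdir():
        if p.is_file() and p.suffix.lower() == ".max" and p.stat().st_size > limit:
            p.unlink()
            removed.append(p.name)
    return sorted(removed)
```

The trade-off noted above applies: the job can no longer be restored from the archive once its scene file is gone.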


#13

Thanks for the info. We’ll wait then. :pause_button:

It is not a big problem for us right now, since we have other redundant backups that we can use in case we need to recover a rendered job, and keeping the issue confined to a single machine makes it easy to handle.
I’ll be waiting for further info.

Thanks!