Temporary files filling hard drives


#1

Hi there!

I noticed recently that the hard drives on the render nodes are being filled with temporary files from Deadline jobs. You can see the content in the screenshot:

The location is C:\Users\USER_NAME\AppData\Local\Temp\

and the content of some of these folders looks like this:

Please notice that the zip file contains the same .max file that is already uncompressed in the same folder.
All jobs have the flag LocalRendering=0

I have Pulse running and performing house cleaning regularly.
Any thoughts?

Thank you very much


#2

Following this.
We have hit this before.


#3

This used to be a problem when the Proxy (pre-RCS) compressed data to send along the pipe. We created temporary folders to generate the zip stream instead of just creating it in memory. We changed that to a memory-only approach sometime before 9.0 if I recall correctly.

What version of Deadline are you running there?


#4

The version is 10.0.21.5.

Are there any logs or reports that would be useful for you?


#5

I think so. If you can grab some of the render node’s logs I think that would be good. Sorry, for some reason I latched onto the old Proxy’s compression strategy when you explicitly mentioned that this is a render node problem.

I’d use the creation time on those folders to inform which logs and where in them to look. Are you using the RCS in your setup? If you are, is a direct connection possible just as an A/B comparison to see if it makes a difference?
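As a rough sketch of that triage step (plain Python, all names here are illustrative, not anything Deadline ships), you could list the leftover temp folders with their creation times so each one can be matched against log timestamps:

```python
from datetime import datetime
from pathlib import Path

def folders_by_creation_time(temp_root):
    """Return (creation_time, folder) pairs for every subfolder of
    temp_root, oldest first, for matching against log timestamps.
    Note: st_ctime is creation time on Windows, but inode change
    time on Linux."""
    pairs = [
        (datetime.fromtimestamp(p.stat().st_ctime), p)
        for p in Path(temp_root).iterdir()
        if p.is_dir()
    ]
    return sorted(pairs)

# Example (path is illustrative):
# for created, folder in folders_by_creation_time(r"C:\Users\USER_NAME\AppData\Local\Temp"):
#     print(created, folder.name)
```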


#6

Hi eamsler,

Sure thing. I have a log from the machine that also runs the primary Pulse, so we can have both logs from the same machine.

This is the content of the folder: (screenshot)

Name of the folder:
35b940e0-e57b-4550-be4f-fbc1b2e00256

Date of creation:
20-11-2018 16:51

In the render slave log there appears to be some sort of problem when archiving the file.
Logs.zip (447.4 KB)

Unfortunately I am not currently using RCS. If there is anything else I can send you, please let me know.

Thank you very much!


#7

Neat! Here’s the error:

2018-11-20 16:51:17:  Error occurred while archiving job with ID 5beedbdf48ba194728a4dceb: Specified argument was out of the range of valid values.
2018-11-20 16:51:17:  Parameter name: size (System.ArgumentOutOfRangeException)
2018-11-20 16:51:17:     at ICSharpCode.SharpZipLib.Zip.ZipEntry.set_Size(Int64 value)
2018-11-20 16:51:17:     at ICSharpCode.SharpZipLib.Zip.ZipOutputStream.CloseEntry()
2018-11-20 16:51:17:     at ICSharpCode.SharpZipLib.Zip.ZipOutputStream.Finish()
2018-11-20 16:51:17:     at ICSharpCode.SharpZipLib.Zip.Compression.Streams.DeflaterOutputStream.Close()
2018-11-20 16:51:17:     at FranticX.IO.Compression.Zip.CompressFilesToStream(String[] inputFilenames, String outputFilename, CompressionLevel compressionLevel, Boolean failOnError)
2018-11-20 16:51:17:     at FranticX.IO.Compression.Zip.CompressFiles(String[] inputFilenames, String outputFilename, CompressionLevel compressionLevel, Boolean failOnError)
2018-11-20 16:51:17:     at Deadline.StorageDB.JobStorage.ArchiveJobs(Job[] jobs, String archiveFolderOverride, Boolean useAltPath, String alternativePath)
2018-11-20 16:59:06:  Skipping thermal shutdown check because it is not required at this time

So, that must mean there’s some finite limit on the size of the zip files we can create. My guess is 4 GB, with something internally using 32-bit integers or similar, so I created a Command Script job (which writes instructions to a text file) and used it to create a 60 GB sparse file as a test:

cd C:\DeadlineRepository10\jobs\5bfd99adc3ca4f223c6793a0
del commandsfile.txt
fsutil file createnew commandsfile.txt 0xf00000000

That did create a folder for this: (username obfuscated)
(screenshot)

So far it’s working (which I expect means it made it past where yours failed), but compressing 60 GB of zeros is taking a while. I’ll let this cook and report back later.

The error you’re seeing isn’t unheard of in the library we’re using.

Not sure why it’s failing, but I’ll cut a ticket for us to roll back the temporary folder.

Update: I got the same error when it was all said and done:

2018-11-27 13:37:24:  Error occurred while archiving job with ID 5bfd99adc3ca4f223c6793a0: Specified argument was out of the range of valid values.
2018-11-27 13:37:24:  Parameter name: size (System.ArgumentOutOfRangeException)
2018-11-27 13:37:24:     at ICSharpCode.SharpZipLib.Zip.ZipEntry.set_Size(Int64 value)
2018-11-27 13:37:24:     at ICSharpCode.SharpZipLib.Zip.ZipOutputStream.CloseEntry()
2018-11-27 13:37:24:     at ICSharpCode.SharpZipLib.Zip.ZipOutputStream.Finish()
2018-11-27 13:37:24:     at ICSharpCode.SharpZipLib.Zip.Compression.Streams.DeflaterOutputStream.Close()
2018-11-27 13:37:24:     at FranticX.IO.Compression.Zip.CompressFilesToStream(String[] inputFilenames, String outputFilename, CompressionLevel compressionLevel, Boolean failOnError)
2018-11-27 13:37:24:     at FranticX.IO.Compression.Zip.CompressFiles(String[] inputFilenames, String outputFilename, CompressionLevel compressionLevel, Boolean failOnError)
2018-11-27 13:37:24:     at Deadline.StorageDB.JobStorage.ArchiveJobs(Job[] jobs, String archiveFolderOverride, Boolean useAltPath, String alternativePath)
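For what it’s worth, the ~4 GB guess matches the classic (non-Zip64) zip format, whose headers record file sizes in unsigned 32-bit fields, so a writer that doesn’t emit Zip64 records has to reject anything larger. A minimal sketch of that ceiling (the helper name is made up):

```python
# Classic zip headers store compressed/uncompressed sizes in
# unsigned 32-bit fields, so 0xFFFFFFFF (4 GiB - 1) is the ceiling;
# the Zip64 extension widens these fields to 64 bits.
ZIP32_MAX_SIZE = 0xFFFF_FFFF

def fits_classic_zip(size_bytes: int) -> bool:
    """True if a file of this size can be recorded without Zip64."""
    return 0 <= size_bytes <= ZIP32_MAX_SIZE

print(fits_classic_zip(3 * 1024**3))  # True: a 3 GiB scene fits
print(fits_classic_zip(5 * 1024**3))  # False: a 5 GiB scene does not
```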

#8

Thanks for the research!
Right now I am periodically auto-deleting the temporary folders, but I’ll try to find more offending samples and check the file size of the .max files. It is certainly not unusual for us to have 4 GB+ files.
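That size check could be sketched roughly like this (plain Python; the path, suffix, and helper name are assumptions based on this thread):

```python
from pathlib import Path

FOUR_GIB = 4 * 1024**3

def find_oversized_scenes(root, suffix=".max", limit=FOUR_GIB):
    """Walk `root` recursively and return scene files whose
    on-disk size exceeds `limit`, sorted by path."""
    return sorted(
        p for p in Path(root).rglob(f"*{suffix}")
        if p.is_file() and p.stat().st_size > limit
    )

# Example (path is illustrative):
# for p in find_oversized_scenes(r"C:\Users\USER_NAME\AppData\Local\Temp"):
#     print(p, p.stat().st_size)
```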


#9

Something you can do to lessen the burden of the cleanup process would be to run Pulse, which would perform the archiving operation on one dedicated machine… that could have a really big disk. :smiley:


#10

That makes a lot of sense… :slight_smile: Thank you!


#11

I can confirm that the error occurs every time I archive a file over 4 GB. After manually archiving this test file I got this error in the local launcher log file:


2018-12-03 15:10:07:  Error occurred while archiving job with ID 5c0536cf48ba193e30aa6a51: Specified argument was out of the range of valid values.
2018-12-03 15:10:07:  Parameter name: size (System.ArgumentOutOfRangeException)
2018-12-03 15:10:07:     at ICSharpCode.SharpZipLib.Zip.ZipEntry.set_Size(Int64 value)
2018-12-03 15:10:07:     at ICSharpCode.SharpZipLib.Zip.ZipOutputStream.CloseEntry()
2018-12-03 15:10:07:     at ICSharpCode.SharpZipLib.Zip.ZipOutputStream.Finish()
2018-12-03 15:10:07:     at ICSharpCode.SharpZipLib.Zip.Compression.Streams.DeflaterOutputStream.Close()
2018-12-03 15:10:07:     at FranticX.IO.Compression.Zip.CompressFilesToStream(String[] inputFilenames, String outputFilename, CompressionLevel compressionLevel, Boolean failOnError)
2018-12-03 15:10:07:     at FranticX.IO.Compression.Zip.CompressFiles(String[] inputFilenames, String outputFilename, CompressionLevel compressionLevel, Boolean failOnError)
2018-12-03 15:10:07:     at Deadline.StorageDB.JobStorage.ArchiveJobs(Job[] jobs, String archiveFolderOverride, Boolean useAltPath, String alternativePath)


Running Pulse made it possible to confine the offending folders to one machine, so I no longer have full hard drives on the workstations. Thanks for that :slight_smile:

Should we wait for the resolution of the ticket for further info?


#12

It’s been logged and I’m working on getting it slotted in.

I don’t have a great solution at the moment because the process will fail regardless of the archiving mechanism (event script, etc.). The cleanest workaround I can think of right now is to have an event script clean out any scene files it finds that are greater than 4 GB before the archiving takes place, so at least you can keep the logs. That’s not ideal, though, as the jobs would be broken.
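A plain-Python sketch of that cleanup step (the Deadline event wiring is omitted, and every name here is illustrative, not the actual event API):

```python
from pathlib import Path

def prune_oversized_scene_files(job_folder, limit=4 * 1024**3):
    """Delete scene files larger than `limit` from a job folder so
    the remaining small files (logs, info files) can still be zipped
    by a non-Zip64 writer. Returns the names of the deleted files."""
    removed = []
    for p in Path(job_folder).iterdir():
        if p.is_file() and p.suffix.lower() == ".max" and p.stat().st_size > limit:
            p.unlink()
            removed.append(p.name)
    return sorted(removed)
```

The trade-off noted above applies: the job can no longer be restored from the archive once its scene file is gone.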


#13

Thanks for the info. We’ll wait then. :pause_button:

It is not a big problem for us right now, since we have other redundant backups that we can use in case we need to recover a rendered job, and keeping the issue confined to a single machine makes it easy to handle.
I’ll be waiting for further info.

Thanks!