Slow submission of Max scene with many assets

We are submitting a scene with ~14k linked XMesh files, and submission takes 3 to 4 minutes, which hangs Max for quite a long time.

I observed that the Asset Tracker enumerates the assets before submission, as soon as SMTDFunctions.SubmitJob is called. The same enumeration also happens while the SMTD dialog is open, even without submitting.
I tried to disable this by setting SMTDSettings.SubmitExternalFileMode = 0, which indeed stopped the enumeration from being printed, but the submission took just as long regardless.

I also tried a few other options:

SMTDSettings.AssetsAutoUpdateList = false
SMTDSettings.AssetsPreCacheFiles = false
SMTDSettings.AssetSyncAllFiles = false

but none of them helped.

My current thinking is that it takes so long because the resources are still being enumerated for the AWS asset list that shows up in the job info. I haven't dug that deep into the SMTD code yet (12k lines of MAXScript… mind blown).

Is there a way to selectively disable this behavior in cases like this?

Edit: Found it! On line 5675 of SMTD_functions, there is an unconditional resolving of assets from the Asset Tracker, with SubmitExternalFileMode set to 3. This should really be optional, especially since I am not using AWS in this case.

Right now there isn’t a way to disable the collection of assets for cloud rendering (output as AWSAssetFile% entries), even when the Pre-Caching is disabled. Sounds like I should add an option to not include the assets in the job file, to speed things up for users who don’t use cloud rendering.

Sorry about that, I will add it to the ToDo list.

In the meantime, you can look for the comment

-- WRITE ASSETS FOR CLOUD SYNC HERE
in the SubmitMaxToDeadline_Functions.ms file in your Repository, and comment out the very next line, which sets AssetsResolved to the list collected from the Asset Tracker.
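
As a rough sketch, the edit just means prefixing that next line with "--" so it becomes a comment; the assignment below is only a hypothetical stand-in for whatever the real line in your file looks like:

-- WRITE ASSETS FOR CLOUD SYNC HERE
-- AssetsResolved = theAssetTrackerFiles -- hypothetical stand-in; the added "--" in front disables the AWS asset collection
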
I suspect that both the collecting (how long does it take to update the Asset tab in the UI?) and the writing of hundreds of thousands of lines to the job file are contributing to the delay.

For your info:

SMTDSettings.AssetsAutoUpdateList
Controls whether to refresh the assets list in the UI automatically on changes.
When off, the user can refresh manually if the update takes too long.
But it has no effect on the submission process.

SMTDSettings.AssetsPreCacheFiles
Controls whether to call DeadlineCommand with the JobID to try to push files to AWS.
The Job contains all these AWSAssetFile entries, and DeadlineCommand reads them from the job info and tries to pre-cache them in a background process using the AWS Portal Asset Sync service.
When it is off, the job running on AWS will still make pull requests for these assets regardless.
Including the AWSAssetFile entries would also let us add custom scripts to re-sync data after the job has been submitted…

SMTDSettings.AssetSyncAllFiles
Controls whether to collect all files in a sequence, or just the ones needed for the current frame range.
For example, if you have saved 101 frames in an XMesh sequence, but the render dialog is set to render only frames 1 to 10, only the 10 frames needed will be collected. This respects all timing controls like Offset, Playback Graph, Custom Range limits etc.
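
To make the frame-range behavior concrete, here is a purely illustrative MAXScript snippet using the standard render-range globals (the frame numbers mirror the example above; this is a sketch, not SMTD code):

rendTimeType = 3 -- Time Output set to Range
rendStart = 1
rendEnd = 10
SMTDSettings.AssetSyncAllFiles = false -- only the 10 XMesh frames needed for frames 1 to 10 get collected
-- with SMTDSettings.AssetSyncAllFiles = true, all 101 saved frames of the sequence would be listed instead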

Thanks for the info. Sounds like we’ll have to add an option ourselves in the meantime.

You don’t have any upstream submission process, do you?

I can add the option myself, and give you the files. Shouldn’t take me too long… :sunglasses:

Oh it’s already done, but thank you. We’ll wait for the official release :slight_smile:

Great!

Out of curiosity, can you share with me the times reported in the SMTD Log when you switch to the Asset tab and update the list?
I would like to understand how long it takes the Asset Tracker and the MAXScript code scanning for XMesh sequences to resolve your scene.
I had made some improvements to the XMesh collection code (it was even slower in the first Deadline 10 builds last year). I wonder if I should be looking into speeding up the AWSAssetFile%= writing to the output file by building a stringStream in memory, and dumping it into the JOB file in one call…
Also, how many frames did you have in those 14K XMesh Loaders? How many data channels on average per sequence? Assuming 100 frames per sequence and 7 XMDAT files per frame (verts, faces, velocity, smoothing, texture mapping, matIDs, edgevis) with unique data, plus the .xmesh header for each frame, this would be 14,000 sequences × 100 frames × 8 files = 11,200,000 lines. Even if some of those data channels are reused, I would still expect about 5 million lines in the metadata… :open_mouth:

I took a single XMesh Loader with 101 frames (363 files in total showing in the Asset Tracker), which resolved in 0.178 seconds.
Then I added a for loop with 14K iterations to the SMTDFunctions so it would write the same data 14,000 times (over 5 million lines of metadata), and added a timestamp() to see how long it would take. It has been running for 10 minutes so far and is still not done submitting. I will see if writing the data to memory and then out in one call could speed this up…
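
For reference, the timing itself is just the usual timestamp() pattern, roughly:

st = timestamp()
-- ...the 14,000 iterations of the asset-writing code being measured go here...
format "Elapsed: % seconds\n" ((timestamp() - st) / 1000.0)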

Still want to know how long it takes to resolve your XMesh Loader assets, as I don’t want to build 14K unique XMesh sequences :smiley:

So my test created a 625MB job file, roughly 39x larger than the maximum allowed document size in MongoDB (16MB), and the submission therefore failed. It took 22 minutes to generate. I will reduce the loop a lot and see if I can speed up the metadata writing…

Switching the file I/O to a DotNet function call reduced the time to write the 625MB file from 22 minutes to 54 seconds. The majority of that time (47 seconds) was spent in the 14,000 iterations of my test FOR loop formatting to a StringStream, and only 6.7 seconds was the actual writing to disk… :sunglasses:
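
The gist of the change, as a minimal sketch with made-up variable names (assetPaths and theJobInfoFile are placeholders, not the actual SMTD identifiers, and the key numbering is illustrative):

-- hypothetical inputs standing in for the real SMTD data
assetPaths = #(@"\\server\project\cache\mesh_0001.xmesh", @"\\server\project\cache\mesh_0001_verts.xmdat")
theJobInfoFile = @"C:\temp\test_job_info.job"
-- build all the AWSAssetFile%= lines in memory first...
ss = StringStream ""
for i = 1 to assetPaths.count do
(
	format "AWSAssetFile%=%\n" (i-1) assetPaths[i] to:ss
)
-- ...then append them to the job info file in a single DotNet call instead of line-by-line MAXScript file I/O
(dotNetClass "System.IO.File").AppendAllText theJobInfoFile (ss as String)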

Attached is a WIP functions script file with the following fixes:

  • Added a property SMTDSettings.AssetsIncludeFilesInJob to control the saving of the AWS assets with the JOB file. Defaults to true. I have not exposed it in the UI yet, but you can set it in your custom scripts; set it to false to suppress the writing (see the one-liner after this list).

You can also set the flag to false in the global SubmitMaxToDeadline_StickySettings.ini file, and in the global SubmitMaxToDeadline_Defaults.ini file to enforce this across the whole company:

[Assets]
AssetsIncludeFilesInJob=false
  • Sped up the saving of AWS Assets using a DotNet function.
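
On the MAXScript side, a minimal usage sketch (assuming the WIP functions file is installed in your Repository):

SMTDSettings.AssetsIncludeFilesInJob = false -- suppress writing the AWSAssetFile% entries into the .job file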

Please let me know how it performs…

Thanks, Bobo, for all the good work.
We’re going to give it a spin.

I mean, it’s not very reasonable to expect this much data to be handled gracefully; I just did not make the connection to the AWS asset collection right away. It’s awesome that you sped this part up.

I cannot copy-paste from the work machine, but here is a summary: there are something like 20 XMesh loaders with 1790 data files and 179 headers, the grand total comes to around 14,300 files, and this takes 177 seconds to resolve, which isn’t actually that bad. I suspect that what was freezing Max was the file writing and formatting, as you pointed out above.

Do you think there should be some automation magic to detect whether the job is targeted at AWS and then decide whether to write the AWSAssetFile entries (with a manual override, of course)?

Side question: any chance SMTD could be tracked in a git (or whatever VCS) repo?

I originally misread your post and incorrectly assumed 14K LOADERS :slight_smile: This made it possible to uncover a possible limitation with > 150,000 asset files.

I hope that with the new code, the 177 seconds will be more like 7 seconds… Let me know!

Yes, it was suggested in internal discussions that we could query Deadline using deadlineCommand to check whether an Infrastructure was launched on AWS. Without one, AWS Portal sync would make no sense. However, the metadata in the job file could be useful for customers who might want to launch an Infrastructure after the fact and still have the job function correctly if picked up by EC2 instances. So we might add automatic detection, and still allow the user to force the inclusion of the AWSAssetFile entries if they feel they will need them later…

Obviously we track it internally, but that does not help you much :slight_smile:
I will have to ask around to see if we can do anything public-facing.

But shouldn’t there be an explicit way to define a job as running on AWS? Or do you find this too coupled to the underlying infrastructure?
I think a checkbox like “this will execute on AWS” isn’t too much hassle… or perhaps something more sophisticated, but declaring something like this upfront may save some trouble.

Alternatively, do something with pools. I’m not sure whether a pool carries any knowledge of where its infrastructure is (or is going to be).

Honestly, I don’t have sufficient experience with Deadline, so I’m just throwing ideas here.

Thanks! I am hoping for a git repo with tagged releases :smiley:
If the problem is IP, some build automation could help copy over just what users need, ensuring other code won’t leak out of the internal repo.
