AWS Thinkbox Discussion Forums

AssetCrawler_Server.py....still foggy

I understand I need this running on a small instance as a daemon of sorts, but:

  • how does it know what to crawl? Is it crawling my ZFS file system and comparing it to my local file system?

  • If so, the mappings are not going to match, so does it read in the path re-mappings from Deadline 7 (DL7) so it sees //disks/nas0/ (local) as //disks/zfsvol1/nas0/ (cloud)?

  • How does it know what project folders to crawl? Is it based on the job going through the balancer?

Yes, we have a VPN connection between the ZFS server and our local network.

-ctj

The Asset Crawler doesn’t crawl folders in advance. It waits for a request from Balancer in the form of a list of full asset paths to be checked. It first applies path mapping (per the settings in the repository) to each path, then checks for the assets and reports the results back to Balancer. The Asset Crawler also briefly caches its findings for better efficiency.
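In rough terms, that request/check cycle might look like the sketch below. The mapping table, cache TTL, and function names here are illustrative assumptions, not the actual crawler internals:

```python
import os
import time

# Hypothetical path mappings, mirroring what the repository's path
# mapping settings might contain (local prefix -> cloud prefix).
PATH_MAPPINGS = {
    "//disks/nas0/": "//disks/zfsvol1/nas0/",
}

# Simple result cache: mapped path -> (exists, timestamp). The real
# crawler caches briefly; 30 seconds here is an arbitrary choice.
_cache = {}
CACHE_TTL = 30.0

def map_path(path):
    """Apply the first matching prefix mapping to an asset path."""
    for local_prefix, cloud_prefix in PATH_MAPPINGS.items():
        if path.startswith(local_prefix):
            return cloud_prefix + path[len(local_prefix):]
    return path

def check_assets(paths):
    """Return {original_path: exists} for a list of requested asset paths."""
    results = {}
    now = time.time()
    for path in paths:
        mapped = map_path(path)
        cached = _cache.get(mapped)
        if cached is not None and now - cached[1] < CACHE_TTL:
            # Recently checked: reuse the cached answer.
            results[path] = cached[0]
            continue
        exists = os.path.exists(mapped)
        _cache[mapped] = (exists, now)
        results[path] = exists
    return results
```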

gotcha. so I’ll need to add the assets to the job file:

RequiredAssets=<assetPath,assetPath,assetPath>
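For example, a small helper could build that line from whatever asset list the scene-gathering script produces (the helper name and the deduplication step are illustrative, not part of Deadline itself):

```python
def required_assets_line(asset_paths):
    """Build the RequiredAssets= line for a Deadline job file.

    Deduplicates while preserving order, then joins with commas to
    match the RequiredAssets=<assetPath,assetPath,assetPath> syntax.
    """
    seen = []
    for path in asset_paths:
        if path not in seen:
            seen.append(path)
    return "RequiredAssets=" + ",".join(seen)
```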

Then they should show up as asset dependencies and the crawler will look for them?

-ctj

Exactly. Don’t hesitate to give me a shout if you have any questions.

Ok, I have a Python script gathering all asset paths from the scene and populating the RequiredAssets field of the job file.

Two more questions about the crawler:

  • it has no idea if the cloud asset is older than the local asset, right?

  • Is there a way to flag a job status if the crawler runs for XX:XX amount of time, e.g. the rsync job failed, and the assets aren’t ever going to be there until someone fixes the problem?

I’m assuming the answer is “it’s a Python script, add what you want to it,” but before I do that, I wanted to check whether that wheel had already been invented.
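For what it’s worth, the “flag after XX:XX” idea from the second question could be sketched as a poll-with-timeout loop. Everything here is hypothetical: `check_fn` and `fail_fn` stand in for whatever existence check and job-flagging hook get wired in:

```python
import time

def wait_for_assets(paths, check_fn, fail_fn, timeout=3600.0, poll=30.0):
    """Poll until all paths pass check_fn, or give up after timeout seconds.

    If the timeout expires (e.g. the rsync job failed and the assets are
    never going to arrive), fail_fn is called with the missing paths so
    the job can be flagged instead of waiting forever.
    """
    deadline = time.time() + timeout
    missing = list(paths)
    while time.time() < deadline:
        missing = [p for p in paths if not check_fn(p)]
        if not missing:
            return True  # everything arrived
        time.sleep(poll)
    fail_fn(missing)  # e.g. mark the job failed, listing what never showed up
    return False
```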

Thanks for all the help,

-ctj

Hi Chris,

Both of your assumptions are correct. The Asset Crawler’s file checking is not very sophisticated, and you would need to implement any additional functionality, such as awareness of sync delays, yourself.
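One minimal way to add that awareness, assuming the crawler host can see both sides over the VPN, is a modification-time comparison. This is a sketch, not crawler functionality; the function name and tolerance value are assumptions:

```python
import os

def cloud_copy_is_stale(local_path, cloud_path, tolerance=1.0):
    """Return True if the cloud copy is missing or older than the local one.

    tolerance (seconds) absorbs small clock and filesystem timestamp
    differences between the ZFS server and the local NAS; tune as needed.
    """
    if not os.path.exists(cloud_path):
        return True  # not synced yet
    if not os.path.exists(local_path):
        return False  # nothing local to compare against
    return os.path.getmtime(cloud_path) < os.path.getmtime(local_path) - tolerance
```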

As I’ve mentioned, we have some new asset-related functionality in the works. Stay tuned!

As we move forward, the crawler is looking more like just a safety blanket. We’re adding an rsync job ahead of every cloud render job, with a job dependency so the render job depends on the rsync job. The rsync job will sync the asset dependencies listed in the render job’s properties.

This will all work swimmingly, I’m sure.
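The rsync-then-render chaining described above could be expressed through Deadline job info files. The sketch below only builds the render job’s info text; the key names (`JobDependencies`, `RequiredAssets`) and the example job id are assumptions drawn from this thread, so verify them against your Deadline version’s documentation before relying on them:

```python
def render_job_info(name, rsync_job_id, required_assets):
    """Build job info file text for a render job that waits on an rsync job.

    rsync_job_id is the id Deadline returned when the rsync job was
    submitted; listing it under JobDependencies makes the render job
    wait for the sync to finish.
    """
    lines = [
        "Name=" + name,
        # The render job should not start until the pre-sync job finishes.
        "JobDependencies=" + rsync_job_id,
        # The Asset Crawler then double-checks these as the safety blanket.
        "RequiredAssets=" + ",".join(required_assets),
    ]
    return "\n".join(lines) + "\n"
```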

-ctj

The safety blanket is a good analogy - we don’t want the Balancer to spin up machines only to discover an asset is missing. After doing some tests and incurring costs on the spin-up time, it’s clear that it can be a necessity.

cb
