AWS Thinkbox Discussion Forums

Cloud Repo

Edwin, I actually had a question about that: if most of the compute and file swapping is taking place on AWS, then would it not make sense to locate the database in the cloud? I’m just thinking that our biggest bottleneck is our upload bandwidth, so if AWS needs to suck up GBs of files come render time, over and over again, maybe it would just make sense to locate the database in the cloud.

Also, if the database were in the cloud, couldn’t this make multi-site or offsite submittals a bit easier (wouldn’t have to open up our database or connect to it via vpn)?

I’m just getting into this right now and messing around with it, so I’m trying to figure out which approach scales better.

I can’t quite follow that, either.

The DeadlineDatabase can run anywhere; AFAIK it does not suck up gigabytes of traffic, more like a few megabytes (and even that would be a lot).

Are you perhaps meaning to say the asset sync?
(As in all your job files -> EXR/DPX sequences, scene files, textures etc)

I’m asking about performance. Would we see a substantial performance hit on Deadline if we ran the database in the cloud? What are the average data rates? Will the UI hang with high latency? What sort of performance should we expect if we moved our repo to the cloud, given we aren’t Pixomondo with 10,000 jobs running concurrently?

Also kind of hoping for Deadline Repositories as a SaaS cloud feature in the future, where Thinkbox charges something like $5/month to host our repo in the cloud, preconfigured with license forwarders etc.

More considerations: we would still need to run a License Forwarder on the intranet of each set of render nodes, right? (One in the cloud, one on premises, one at a freelancer’s location, etc.)

And the Remote Connection Server could be used instead of needing file/folder permissions to the repository?

Hey guys! Sorry it’s been so long. Time to dive into this stuff since there are some great questions here!

Most of Deadline’s compute is distributed and duplicated within the client apps. The DB is just a really fast persistent key storage that allows some filtering. Great stuff, but no smarts. That means to do anything with Deadline we need to write bits in there and read them out (though the Monitor has a local copy of almost everything).

There are three parts to a Deadline farm essentially: Database, Repository, and some place to stuff your scenes and assets. Usually a file server, but the AWS Portal that is shipping in 10.0 uses a local file server plus S3.

If your Database and Repository are in the cloud, everyone has an equal share of that resource and so everyone is equally delayed (it’s not much, but it depends on what your ping latency is through your Internet service provider). If the DB and Repo are sitting next to you, you’ll see things like simple submission go quickly. There’s a happy medium in the Deadline Proxy (RCS in Deadline 10.0), where things like Repository downloads or database reads can be cached.
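If you want a rough feel for what that round trip costs, time a handful of TCP connects to your Database host from a workstation and again from a render node, then compare. This is only a sketch (the hostname and port below are placeholders for wherever your Database actually lives), and it measures connection latency rather than real query times:

# rough latency probe: HOST/PORT are placeholders for your own Database machine
import socket
import time

HOST = "mongo.example.com"   # wherever your Deadline Database lives
PORT = 27017                 # default MongoDB port; yours may differ

samples = []
for _ in range(10):
    start = time.time()
    conn = socket.create_connection((HOST, PORT), timeout=5)
    conn.close()
    samples.append((time.time() - start) * 1000.0)

print("TCP connect round trip: min %.1f ms, avg %.1f ms"
      % (min(samples), sum(samples) / len(samples)))

Every extra chunk of latency gets paid on each round trip the clients make, which is why submission feels snappier when the DB is next door.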

Performance is another bit of fun. Depending on where the Proxy lives, you can save yourself either Repo download time (if the Proxy/RCS sits close to the Repo) or Database access time (if it sits near the nodes). Unfortunately we still don’t support one Proxy feeding into another one, but I’ll keep bringing that up. ;D

The metrics for performance are trickier. The queries the Slaves make for data will return every job which is ‘queued’ or ‘rendering’, so it’ll depend on how much unstarted / in-progress work there is. The Monitor will pull any data at all that changed, so larger Slave counts can affect data rates, as can the number of currently rendering jobs. All that to say, “it depends” and “I don’t think we have data rates written down yet”. Definitely, if you have fewer local machines than you do cloud resources, keeping the DB close to the most consumers is a grand idea!
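If you want harder numbers than my “it depends” for your own farm, MongoDB can tell you roughly how big the job documents are. This is back-of-the-envelope only, and the database and collection names below are guesses rather than anything official, so check what your deployment actually created:

# back-of-the-envelope only: the database and collection names here are guesses,
# so point it at whatever your Deadline install actually created
from pymongo import MongoClient

client = MongoClient("mongodb://mongo.example.com:27017")   # placeholder host
db = client["deadline10db"]                                  # database name is an assumption

stats = db.command("collstats", "Jobs")                      # collection name is an assumption
count = stats.get("count", 0)
avg_size = stats.get("avgObjSize", 0)
print("%d job documents, ~%d bytes each, so a poll that pulled them all would move ~%.1f KB"
      % (count, avg_size, count * avg_size / 1024.0))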

Not sure about the Repo being a SaaS service, but MongoDB is getting hosted all over the place. At least according to the Googles.

For licensing, you’ll need a VPN because we don’t have workstation licenses in the Marketplace.

A couple of holes I’m noticing: no more auto-upgrade. It would be great if the Slaves could auto-upgrade over the RCS.

Also it would be really nice if all of the licensing just flowed through On-Demand and RCS.

We’ve been trying to get away from bundling everything into a single app. I agree that it’s easier to set things up when it’s one monolithic app like Pulse was back in the day, but we started hitting problems when users ran custom scripts through it and slowed or crashed Pulse. We’ve been trying to move to a Unix/microservices approach since then.

I think it deserves taking a step back and thinking through, though: the pros and cons of how we’ve organized the existing pieces.

The big con I can see is that there are so many interdependencies now for a functioning system that you can be in a soft-failed state very easily. You have to run down the entire list:

RCS, AWS Link, AWS Asset Server, Mongo, File Share, License Server, HTTPS proxy, License Forwarder, firewall, license port forwarder in AWS Link, SSH certificate, AWS keys, environment variables, asset directory path, S3 bucket, AWS permissions, PFX files, PFX file directory, database SSL certificate, database SSL password, etc… If something fails, you have six different features to diagnose. It used to be a short debug list when something went wrong. Now… I can’t imagine migrating a deployment to another machine without something going wrong. It’s incredibly fragile. Remember when Deadline had one requirement: an accessible file share?
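Even a dumb “is anything listening?” pass over that list turns into its own little project. Something like the sketch below is about the minimum I’d want; every hostname and port in it is a placeholder for whatever your setup actually uses, and an “OK” only proves the port answers, not that the service behind it is healthy:

# reachability sketch only: every hostname and port below is a placeholder for
# whatever your own deployment uses, and "OK" just means the port answered
import socket

services = {
    "Remote Connection Server": ("rcs.example.local", 8080),
    "MongoDB":                  ("mongo.example.local", 27017),
    "FlexLM license server":    ("license.example.local", 27000),
    "License Forwarder":        ("license.example.local", 443),
    "File share (SMB)":         ("fileserver.example.local", 445),
}

for name, (host, port) in services.items():
    try:
        socket.create_connection((host, port), timeout=3).close()
        print("OK      %-25s %s:%d" % (name, host, port))
    except socket.error as exc:
        print("FAILED  %-25s %s:%d (%s)" % (name, host, port, exc))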

Deploying Deadline means running more services, and juggling more inter-dependencies and configuration, than the rest of our infrastructure combined. I think the only way to feel comfortable deploying Deadline now would be to hire an IT person to oversee it (which obviously isn’t happening).

Thanks for the observation man. I’ll make sure the right people see this.

I think Gavin brings up a valid concern for many users. We aren’t dealing with the same number of moving parts, for various reasons, but I can see how the configuration complexity could result in a net negative in terms of long-term sustainability, ability to upgrade to new versions, etc.

That said, I also want to mention that the move away from monolithic applications has been a welcome one for us, and I wouldn’t really want to see that trend reversed outright.

Just to jump onto this: I just tried the AWS integration stuff and I’m taking a big “nope” step away from it. As I think someone else mentioned, there are something like two additional servers that need to be set up and run as always-on services to talk with AWS, plus several levels of configuration both locally and in the cloud, plus asset management, plus license management, plus just dealing with AWS’s inane complexities.

This just strikes me as more of a tech demo right now; there is no way I can spend time deploying this for our firm, which is a shame, because I feel smaller firms are the ones that could most benefit from some flexible cloud horsepower. But right now, even understanding all the steps, this is ridiculously convoluted.

Delineator,

If you haven’t already, I think you should take a few moments to chat with support to get assistance with setup. Our documentation makes it appear much more complicated than it is (and we are working on making that better as we speak!). I set it up from scratch on my laptop the other day and was rendering with Max + Arnold on machines far more powerful than my laptop, getting as many as I needed. It felt totally magical: assets were just handled, licensing was handled, and my local machine could contribute to the same work.

That was the goal here: to allow a system that isn’t two separate farms. If you want a simpler approach, we can assist you in setting up a second, all-in-cloud infrastructure, but if you want a collaborative, flexible, and secure workflow that operates in a hybrid fashion, that’s what Deadline 10 enables.
I would ask that you work with us to make it better, easier, and aligned with your workflow!

cheers

cb

g -

When you break it into a list like this, it may appear that way, but let’s remember that much of this is still necessary for any facility to operate even without cloud (asset servers, Mongo, file shares, license server, firewall, etc.). As well, some of those are common to on-prem: if you store a texture on your C: drive and don’t have it mapped on the render farm, it’s not going to work… we all learned that at some point in our careers, and learned best practices. There is a lot of change here, which may appear at the outset as scary/different/wrong, but two things will happen: we will learn and users will learn. That means the documentation, the experience, and the fundamental understanding of what to do and what not to do will improve for all of us.

cheers

cb

[quote=“im_thatoneguy”]
The big con I can see is that there are so many interdependencies now for a functioning system that you can be in a soft-failed state very easily…RCS, AWS Link, AWS Asset, Mongo, File Share, License Server, https proxy, License Forwarder, firewall, license port forwarder in AWS link, SSH certificate, AWS keys, Environment variables, Asset directory path, S3 Bucket, AWS permission, PFX files, PFX File directory, Database SSL certificate, Database SSL password, etc…
[/quote]

N - I actually see it differently. We are able to update the online infrastructure and AMIs we provide without the customer having to do anything. Indeed, the other AWS services can continue to improve (launch times, new features). As an example, we implemented per-second billing on EC2 Linux instances and the user didn’t have to do anything to get that benefit; as well, we have updated AMIs coming with various software configurations, and again, the user just gets the advantage of that.

Now, if they don’t want the new AMIs, they can keep using their own. That’s a key flexibility.

I’ll go back to the documentation: the configuration appears a lot more difficult than it needs to be. It will get better!

cb

We don’t upgrade our Domain Controller every 3 months, though. And since we use LDAP/Active Directory, we actually haven’t touched our configuration in nearly 10 years except to add and remove users from the employee group.

Even if we do deploy all of those things, we would have to deploy and maintain a second copy just for Deadline. So for Deadline licensing alone we’re maintaining: a FlexLM server, a License Forwarder, an AWS forwarder, a license file, a license file for each 3rd-party license, a Remote Connection Server, a reverse proxy, and an entire SSL/HTTPS setup. Not to mention I need to keep all of that up to date so that I don’t get Equifaxed. I also have about four different ports instead of one to keep track of, to make sure the firewall is working properly, as well as remembering which port is for which service when configuring Slaves, etc.

  1. You have to install a FlexLM server.
  2. You have to install your Thinkbox.lic file.
  3. You need to set up your Active Directory firewall settings.

– Done with local Deployment –

  1. You need to create a folder for your 3PL_Certs.
  2. You need to run the deadlinelicforwarder once and tell it where your certs folder is.
  3. You need to edit your ini file to launch the forwarder on startup.
  4. You probably need to change the port number, because 443 is “SSL”, so if you have any services which host a website, like say… the IIS default page, you’ll get a socket conflict. (Or NGINX… or any other HTTPS reverse proxy that you’ll need later…) See the quick port check after this list.
  5. You need to add 443 to your firewall… no wait! Don’t! That opens up every web server that supports HTTPS to the outside world!
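A quick way to spot that kind of socket conflict before you go chasing it through log files is a throwaway bind test like the one below. This is just a sketch: run it on the machine that will host the service, and adjust the port list to whatever you actually plan to use.

# port-conflict sketch: try to bind the ports the forwarder/proxy/RCS want; if
# the bind fails, something else already owns that port on this machine
# (note: on Linux/macOS, binding ports below 1024 needs root)
import socket

for port in (443, 8080):   # 443 for the forwarder/reverse proxy, 8080 for the RCS later on
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        s.bind(("0.0.0.0", port))
        print("port %d looks free" % port)
    except socket.error as exc:
        print("port %d is already taken: %s" % (port, exc))
    finally:
        s.close()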

Ok… go buy your license time.

  1. Place the 3rd-party cert in your license folder and restart your license forwarder.

– Done with local UBL –

  1. Enable the Remote Connection Server in your ini.
  2. Change your Remote Connection Server’s port, because you probably already have something on 8080 (HTTP).
  3. Reserve a URL for your web server: netsh http add urlacl url=http://+:8888/ user=USERNAME
  4. [Really steps 12-30 of this saga: set up a reverse proxy. God have mercy on your soul. Install OpenSSL, generate a key, install Python on the server, generate a code, register the key, create a public key, sign the key locally, etc…]
  5. Open port 443. Wait… you didn’t leave the license forwarder service on 443, did you? Better go back and change it if you did, or configure your reverse proxy to use a different HTTPS port.

– Done with theoretically giving a license proxy access –

  1. Assuming you already have an AWS account, you’ve already set up the policy rules, and you’ve already created an AWS user with said policy rules… run the AWS Portal Link installer.

  2. Now, if everything is working perfectly and you have no plugins which also need to be forwarded to AWS, you’re good!

  3. … if it’s not all good, you need to start working backwards through everything to figure out why a slave isn’t starting…

And that still ignores the fact that we’re essentially amateurs trying to deploy an internet facing hardened service with access to our whole internal network. :0 Is our SSL certificate properly salted? I don’t know, I’m just throwing out words that sound important! Also none of your plugins work. So have fun setting that up as well.

The alternative is to just set up a single VPN connection to the cloud and you’re done. It’s encrypted, it uses your existing Active Directory/LDAP/RADIUS authorization, and IT staff hardened it 10 years ago, with automatic updates through Windows.

That’s the camp I’ve fallen into for now, until I have a chance to have Thinkbox walk me through the entire AWS setup. I am no novice when it comes to tech stuff, but just reading through this thread has my brain tied in knots, and I just can’t pull myself off projects right now to deal with it.

Don’t get me wrong, I will take cbond’s advice and set up some time with Thinkbox so they can walk me through the setup process; I just can’t afford the time distraction right now.

On the topic of this:

Can we clone the Deadline Repository and work from both the clone and the original (not for backup, but for redundancy purposes)?
We would ensure the clone has real-time syncing enabled.

Would that work?

(Example: round-robin load balancing between two file servers, or having a cloud repo and a local repo.)

Just a quick update: I went through it again this afternoon in preparation for some assistance from the Thinkbox folks. The write-ups do seem a bit more streamlined, but the outcome is the same (as in, it doesn’t work). I have no idea where the kink in the chain is, but I’m hoping I have enough of the framework up and running that Thinkbox can help me out.

The main issue SEEMS to center around the portal server and portal link (these are different?). The console says something like:

2017-11-09 16:47:14: PYTHON: Traceback (most recent call last):
2017-11-09 16:47:14: PYTHON:   File "c:\FranticRegressions\DL_Main\git_git.thinkbox.corp.amazon.com_deadline_deadline\DeadlineProject\DeadlineUI\Forms\ConfigureDAWSSettings.py", line 279, in createNewBucket
2017-11-09 16:47:14: PYTHON: System.FormatException: An invalid IP address was specified.
2017-11-09 16:47:14: PYTHON:   at System.Net.IPAddress.InternalParse(String ipString, Boolean tryParse)
2017-11-09 16:47:14: PYTHON:   at Deadline.AWS.Setup.NewSetup(String accessKey, String secretKey, String[] ips, String region)
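(Reading that, NewSetup is being handed a list of IP strings and choking on one of them, so one thing worth ruling out is a bad entry in whatever got typed into that settings dialog. A throwaway check like the sketch below, with the list swapped for your own values, flags anything that isn’t a plain IP address.)

# throwaway sanity check: replace candidate_ips with whatever IPs you entered in
# the AWS Portal settings dialog; stray spaces, hostnames, or CIDR suffixes get
# reported here instead of blowing up inside the setup dialog
import ipaddress

candidate_ips = ["203.0.113.10", " 203.0.113.11", "office-gateway", "10.0.0.0/24"]  # example values

for entry in candidate_ips:
    try:
        ipaddress.ip_address(entry)
        print("OK   %r" % entry)
    except ValueError as exc:
        print("BAD  %r (%s)" % (entry, exc))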

and the portal link log says:

Traceback (most recent call last):
  File "C:\PROGRA~2\Thinkbox\AWSPOR~1\awsportallinklib\ssh_tunnel_dispatcher.py", line 199, in _handle_ssh_tunnel_params
    ssh_tunnel_params = self._get_ssh_tunnel_params()
  File "C:\PROGRA~2\Thinkbox\AWSPOR~1\awsportallinklib\ssh_tunnel_dispatcher.py", line 349, in _get_ssh_tunnel_params
    "GetTunnelParams.py")
  File "C:\Program Files\Thinkbox\Deadline10\bin\lib\subprocess.py", line 573, in check_output
    raise CalledProcessError(retcode, cmd, output=output)
CalledProcessError: Command '['C:\\Program Files\\Thinkbox\\Deadline10\\bin\\deadlinecommand', '-ExecuteScriptNoGui', 'C:\\PROGRA~2\\Thinkbox\\AWSPOR~1\\GetTunnelParams.py']' returned non-zero exit status 1
1510264254.289000 2017-11-09 16:50:54,289 [C:\PROGRA~2\Thinkbox\AWSPOR~1\awsportallinklib\ssh_tunnel_dispatcher.py:_handle_ssh_tunnel_params:203] [root] [9904] [Dummy-1] [WARNING] [SSHTunnelDispatcher] unable to get ssh tunnel params; fallback with empty list
1510264264.291000 2017-11-09 16:51:04,290 [C:\PROGRA~2\Thinkbox\AWSPOR~1\awsportallinklib\ssh_tunnel_dispatcher.py:_run:128] [root] [9904] [Dummy-1] [INFO] [SSHTunnelDispatcher] is running ...
1510264264.773000 2017-11-09 16:51:04,773 [C:\PROGRA~2\Thinkbox\AWSPOR~1\awsportallinklib\ssh_tunnel_dispatcher.py:_handle_ssh_tunnel_params:202] [root] [9904] [Dummy-1] [ERROR] Command '['C:\\Program Files\\Thinkbox\\Deadline10\\bin\\deadlinecommand', '-ExecuteScriptNoGui', 'C:\\PROGRA~2\\Thinkbox\\AWSPOR~1\\GetTunnelParams.py']' returned non-zero exit status 1

and the aws portal asset server log says:

Traceback (most recent call last):
  File "C:\PROGRA~2\Thinkbox\AWSPOR~2\awsportalassetserverservice.py", line 46, in main
    'FileTransferDirectory')
  File "C:\PROGRA~2\Thinkbox\AWSPOR~2\s3backedcachelib\deadline_util.py", line 67, in get_repository_option
    output = _run_deadline_command('GetAWSPortalSetting', option)
  File "C:\PROGRA~2\Thinkbox\AWSPOR~2\s3backedcachelib\deadline_util.py", line 59, in _run_deadline_command
    raise DeadlineMissingRepositoryOptionError(args[1])
DeadlineMissingRepositoryOptionError
1510264140.310000 2017-11-09 16:49:00,309 [C:\PROGRA~2\Thinkbox\AWSPOR~2\awsportalassetserverservice.py:main:54] [root] [17796] [Dummy-1] [ERROR] Will retry in 10 seconds.
1510264140.525000 2017-11-09 16:49:00,525 [C:\PROGRA~2\Thinkbox\AWSPOR~2\awsportalassetserverservice.py:main:52] [root] [17796] [Dummy-1] [ERROR] Could not read options from Deadline.
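The portal link failure only says that deadlinecommand exited with status 1, which hides the real error. One way to dig a level deeper (just a sketch, with the command line lifted straight from the log above) is to re-run that exact call yourself and look at its output:

# re-run the failing call from the portal link log so we can actually see its
# stdout/stderr instead of just "returned non-zero exit status 1"
import subprocess

cmd = [r"C:\Program Files\Thinkbox\Deadline10\bin\deadlinecommand",
       "-ExecuteScriptNoGui",
       r"C:\PROGRA~2\Thinkbox\AWSPOR~1\GetTunnelParams.py"]

proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
output, _ = proc.communicate()
print("exit code: %d" % proc.returncode)
print(output.decode("utf-8", "replace"))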

I sorta get all nostalgic working through this; it brings back memories of trying to set up Deadline initially and configure it from scratch.

Also do we really need to uninstall and reinstall the AWS Link every time there is a security update?
