I’m setting up the whole AWS Portal system and I’m having issues with Redshift standalone. All the EC2 stuff is working fine, I can start the Spot Fleets etc.
But I’m getting errors while rendering redshift standalone files:
The .rs files are placed in the right bucket by the asset server, and if I manually dig around and rename one of the files to .rs it renders fine locally using the Redshift command line. So the assets in the bucket seem to be fine.
But the render slaves get a mangled(?) version, or at least something with a wrong descriptor, as shown in the above log.
That is some excellent diagnosing there! Agreed it must be corrupted due to the SGMTMRKR in the error.
If the file is coming back from S3 and rendering fine, it does seem like the cache on the render node has a bad copy… We store it on a local volume for faster access, but re-saving it should cause a re-sync up to S3 and back to the local volume.
Has this happened more than once? We’re downloading it to the volume using Amazon libraries, then providing it to Redshift through some magic. Is there a good way for us to reliably reproduce this one? We don’t have data corruption issues often (or they’re not reported).
Now that I read it again… ‘SGMTMRKR’ could be 8-character shorthand for ‘segment marker’… that might be a clue?
Anyways… it happens every time. I’ve started/stopped the whole EC2 infrastructure a couple of times during testing, and I assume that’s the equivalent of a fresh start cache-wise. The only thing I didn’t do is empty the bucket; I’ll give that a go later when I get back to it.
As far as reproducing it… it’s just a simple RS standalone submission, to rule things out here is my test sequence of rs files, nothing pretty, just a torus, light and a camera: _assets.zip (355.4 KB)
I’ve emptied out the bucket and it got more strange… the first few frames rendered fine and I thought it was fixed… and then after a few frames it was the same SGMTMRKR problem again…
This is all using a single render slave, so it’s not a case of some machines working and others not…
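One way to narrow down where the corruption creeps in would be to checksum each .rs file on the submitting machine and again on the render node's local volume. Below is a rough sketch of that idea, not anything Deadline ships: the function names and the two directory arguments are made up for illustration, and it assumes the cached copies keep the same file names as the originals.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_mismatches(reference_dir, cached_dir, pattern="*.rs"):
    """List file names whose cached copy is missing or differs from the reference.

    reference_dir and cached_dir are hypothetical locations: the folder the
    job was submitted from and the folder the render node reads from.
    """
    mismatches = []
    for reference in sorted(Path(reference_dir).glob(pattern)):
        cached = Path(cached_dir) / reference.name
        if not cached.is_file() or sha256_of(reference) != sha256_of(cached):
            mismatches.append(reference.name)
    return mismatches
```

If the list comes back empty for frames that still fail, the files on disk are probably intact and the problem is more likely in how they are read than in the transfer.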
Hmm. Good point about “Segment Marker”. I kind of glossed over that.
Your version of Deadline is pretty old at this point (we’re at 10.0.20 now), but if it’s happening reliably, we should try to upgrade Redshift on the AMI to match what you have in-office. There are some docs on how to create a custom AMI over here.
The base images are also designed to match whatever version of Deadline you’re currently on, but I’m not sure if we’ve upgraded Redshift since SP12. If you do make a custom image, it may be worth upgrading Deadline first so you’ll have more of a runway for your efforts. You’d want to stop the AWS Portal services, upgrade Deadline, then install the new AWS Portal components.
Ok, I’m going to upgrade everything to the latest version and see what happens
I’m also tempted to go for a custom AMI based on the Thinkbox base AMI so I can have some control… but a few things that aren’t clear at the moment:
-I can’t find a base Redshift Standalone AMI, only a Redshift+Maya one… are those the same?
-I’m planning on using Redshift on-demand through Deadline, is that supported using a customized base AMI? (So the Redshift usage is burning credits per hour and Deadline handles all the licensing etc)
-When Redshift releases an update/new version what’s the usual time frame before that becomes available in one of the official Thinkbox AMIs?
Got a bit further… now I get this error when rendering to Redshift Standalone:
Loading: /mnt/Data/CCloutput_assets0e07af1e79b51dbb794d5f065b4c793e/test_0000.rs
2018-09-03 18:34:59: 0: STDOUT: Failed to load proxy ‘/mnt/Data/CCloutput_assets0e07af1e79b51dbb794d5f065b4c793e/test_0000.rs’. Proxy version mismatch. Found version ‘46’, current version is ‘44’.
So I guess that means you guys need to update the RS version on the AMI?
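For anyone scanning a lot of task logs for this, the two version numbers can be pulled out of the message with a regular expression. This is only a sketch built from the single log line quoted above: the function name is made up, and the reading that a higher "found" number means the file was exported by a newer Redshift build than the one on the render node is my assumption, not something the log states.

```python
import re

# Matches the message quoted above; accepts straight or curly single quotes.
MISMATCH = re.compile(
    r"Proxy version mismatch\. Found version ['‘’](\d+)['‘’], "
    r"current version is ['‘’](\d+)['‘’]"
)


def parse_proxy_mismatch(line):
    """Return (found_version, current_version) as ints, or None if no match."""
    match = MISMATCH.search(line)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


line = (
    "STDOUT: Failed to load proxy ‘test_0000.rs’. Proxy version mismatch. "
    "Found version ‘46’, current version is ‘44’."
)
found, current = parse_proxy_mismatch(line)
if found > current:
    # Assumption: the file format is newer than what the node's build reads.
    print("file format", found, "is newer than the node's", current)
```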
What Redshift version are you using? If there is a version mismatch you are able to create a custom AMI with the version you need. You’d just load up our base Redshift AMI and then install the version of Redshift you need. Here is a document that covers creating a custom AMI.
I’m working on the custom AMI right now… my l337 linux skills are a bit rusty but I think I can manage
I took the maya+RS base ami since that’s what the portal starts up if you select ‘redshift’.
While I’m at it, is there a magic command to upgrade Deadline itself to the latest version? The AMI is ‘still’ on 10.0.17.3 iirc and locally I’m on 20.0 now.
At the moment I recommend starting the AMI you are modifying from the latest version that matches your local installation instead of upgrading Deadline on the AMI. Sometimes there may be fixes on the AMI itself that may not be implemented through an upgrade of the Deadline Client.
Thanks for the heads-up! Ahh, by looking a bit more carefully at the AMI list from user 357466774442 I found this one, which has the right Deadline version on it… I must have missed that one.
Deadline Slave Base Image Linux 10.0.20.1 with Maya 2017_Update4 and Redshift 2.5.62 2018-08-24T155329Z - ami-09924e55c29f637c7
After updating RS I got an error when running redshiftCmdLine:
STDOUT: /usr/redshift/bin/redshiftCmdLine: error while loading shared libraries: libgomp.so.1: cannot open shared object file: No such file or directory
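libgomp.so.1 is the GNU OpenMP runtime, so this error means the dynamic loader on the image can't find it. To see whether anything else is missing in one go, the output of `ldd` can be filtered for "not found" entries. A small sketch, assuming a Linux image with `ldd` available; the helper names are made up and the binary path is the one from the error above.

```python
import subprocess


def missing_libraries(ldd_output):
    """Return the library names that ldd reports as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        if "=>" in line and "not found" in line:
            missing.append(line.split("=>")[0].strip())
    return missing


def check_binary(path):
    """Run ldd on a binary and list its unresolved shared libraries."""
    result = subprocess.run(
        ["ldd", path], capture_output=True, text=True, check=False
    )
    return missing_libraries(result.stdout)


# Example with canned ldd-style output, so it runs without the real binary:
sample = (
    "\tlinux-vdso.so.1 (0x00007ffd1a5f2000)\n"
    "\tlibgomp.so.1 => not found\n"
    "\tlibc.so.6 => /lib64/libc.so.6 (0x00007f1c2a000000)\n"
)
print(missing_libraries(sample))
```

On the actual node it would be `check_binary("/usr/redshift/bin/redshiftCmdLine")`. Whatever it lists then needs installing through the image's package manager before saving the custom AMI.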