Optimizing render times for Cinema 4D + Redshift (AWS Portal Workers)

Hi folks!

I’m currently experimenting with rendering C4D+Redshift scenes on various AWS Portal instance types. I’m trying to optimize render times using the Cinema4DBatch plugin, so the scene file stays loaded in memory between frames. From my tests, it’s clear that rendering .rs files using the Redshift Standalone plugin is faster, but from a QoL standpoint we’d prefer to render the .c4d scenes directly instead of exporting proxies first.

Since we’re using Cinema4DBatch, all of the frames after the first one render very quickly. However, there’s a lot of “setup” involved in rendering Frame 0 that gets cached. It’s this part of the rendering process that I’d like to speed up as much as possible.

I ran the same task on a few different instance types and summarized the times for various “chunks” of the process:

=======================================================
Worker Information
=======================================================
Operating System: Amazon Linux release 2 (Karoo)
CPUs: 16
Memory Usage: 6.7 GB / 62.1 GB (10%)
Free Disk Space: 9.811 GB
Video Card: Amazon.com, Inc. Device 1111 (x1 GPU)
Instance Type: g4dn.4xlarge
Startup Time: 1:05
Render Time: 17:33
Task Time: 18:38

0:04 – Deadline job start, launch Cinema4DBatch plugin
0:03 – Cinema 4D Commandline launch
0:01 – Could not initialize OpenGL
0:01 – Redshift initialization
0:52 – Redshift load C4d scene
0:03 – Redshift path mapping
0:01 – Deadline start render

0:19 – Redshift scanning scene, updating lights
0:08 – Redshift scanning materials
0:23 – Redshift Extracting Geometry, Mesh Creation, Mesh Geometry Update, Acquire License, etc
4:10 – Redshift process textures
0:01 – Redshift preparing materials and shaders
0:00 – Redshift allocating GPU mem and VRAM
11:57 – Redshift rendering blocks
0:01 – Redshift apply post effects and end render
0:08 – Redshift return license and free GPU memory
0:17 – Redshift context unlocked render
0:09 – Deadline finish Cinema4D task

=======================================================
Worker Information
=======================================================
Operating System: Amazon Linux release 2 (Karoo)
CPUs: 64
Memory Usage: 10.3 GB / 480.3 GB (2%)
Free Disk Space: 10.240 GB
Video Card: Cirrus Logic GD 5446
Instance Type: p3.16xlarge
Startup Time: 0:54
Render Time: 3:04
Task Time: 3:58

0:05 – Deadline job start, launch Cinema4DBatch plugin
0:09 – Cinema 4D Commandline launch
0:18 – Could not initialize OpenGL
0:09 – Redshift initialization
0:06 – Redshift load C4d scene
0:02 – Redshift path mapping
0:02 – Deadline start render

0:08 – Redshift scanning scene, updating lights
0:08 – Redshift scanning materials
0:07 – Redshift Extracting Geometry, Mesh Creation, Mesh Geometry Update, Acquire License, etc
1:11 – Redshift process textures
0:07 – Redshift preparing materials and shaders
0:06 – Redshift allocating GPU mem and VRAM
0:43 – Redshift rendering blocks
0:02 – Redshift apply post effects and end render
0:08 – Redshift return license and free GPU memory
0:06 – Redshift context unlocked render
0:16 – Deadline finish Cinema4D task

=======================================================
Worker Information
=======================================================
Operating System: Windows Server 2016 Datacenter
CPUs: 64
Memory Usage: 17.4 GB / 488.0 GB (3%)
Free Disk Space: 91.781 GB
Video Card: NVIDIA Tesla V100-SXM2-16GB (x8 GPU)
Instance Type: p3.16xlarge
Startup Time: 4:49
Render Time: 7:42
Task Time: 12:31

0:12 – Deadline job start, launch Cinema4DBatch plugin
2:30 – Cinema 4D Commandline launch
0:25 – Could not initialize OpenGL
0:30 – Redshift initialization
0:56 – Redshift load C4d scene
0:05 – Redshift path mapping
0:16 – Deadline start render

0:19 – Redshift scanning scene, updating lights
2:23 – Redshift scanning materials
0:05 – Redshift Extracting Geometry, Mesh Creation, Mesh Geometry Update, Acquire License, etc
1:08 – Redshift process textures
1:21 – Redshift preparing materials and shaders
0:06 – Redshift allocating GPU mem and VRAM
0:44 – Redshift rendering blocks
0:06 – Redshift apply post effects and end render
0:05 – Redshift return license and free GPU memory
1:23 – Redshift context unlocked render
0:02 – Deadline finish Cinema4D task

(Full logs here: render-logs.zip (69.4 KB) )

Of course, the actual bulk of the Redshift render is way faster on the p3 instance, and the same instance type is much faster on Linux than on Windows. But it’s curious to me that some parts actually run slower on the p3.16xlarge than on the g4dn.4xlarge. Is this added time mostly due to interfacing with multiple GPUs instead of a single GPU?

I understand that preprocessing Redshift textures would help alleviate some of the setup time here, but are there other aspects to the AWS portal instance (or the render job) that I should be considering?

How much do the EBS Volume Types on my AMI, or the proximity of my AWS Infrastructure Region, affect the speed of these processes?

Thanks for the help 🙂

I cannot give you a precise answer on why the g4dn might be faster in some parts, but you have to keep in mind that the g4dn’s CPU / RAM are roughly equivalent to the c5/m5 EC2 instance generation (Cascade Lake-based Xeons like the c5, but clocked at 2.5 GHz like the m5), while the p3’s non-GPU hardware is like the m4 generation (Broadwell-based). So there can be a difference in performance on the CPU side which could affect the preparation of the scene. But I am not really seeing much of that in the examples you included…

The EBS volume attached to the instances should be the same, but the EBS bandwidths supported by g4dn and p3 are different. The g4dn.4xlarge offers only 4,750 Mbps, while p3.16xlarge supports 14 Gbps (with a G!). So I would expect loading assets to be faster.

On an EBS note, when you are building your AMIs, I would suggest switching to the latest gp3 SSDs as they are both faster and cheaper, and you can adjust the IOPS independently from the volume size.

The proximity of the AWS Region plays some role for the uploading of the assets, but that is not captured in the render logs, as it runs in a separate process via the AWS Portal Asset Server. You should find the largest region near you, with the lowest Spot prices - it will pay off.

Regarding the size of the instances, the g4dn.xlarge, 2xlarge, 4xlarge, and 8xlarge all offer the same single T4 GPU. Given the amount of RAM you are using in these tests, it might be worth trying to render on a g4dn.xlarge instead of a 4xlarge. While the p3.16xlarge is over 16x faster in the GPU phase (the actual buckets being rendered), the g4dn.xlarge is about 45x cheaper on EC2 Spot on Linux (using Ireland as an example)! The g4dn.4xlarge is only 19.7x cheaper. You could lower your bill significantly by finding the smallest g4dn instance that has enough RAM to handle your jobs. And since the remaining stages depend less on GPU performance, the difference in total render times is only about 4.7x, so the g4dn totally makes sense.
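To make the price-performance argument above concrete, here is a quick back-of-the-envelope sketch. The Spot prices below are illustrative assumptions chosen to match the ratios quoted in this post (about 45x and 19.7x cheaper), not live AWS quotes; always check the current Spot price history for your region:

```python
# Rough price-performance comparison. Prices are ASSUMED values that
# reproduce the ratios discussed in this thread (Ireland, Linux Spot),
# not current AWS pricing.

SPOT_PRICE = {            # USD per hour (assumed)
    "g4dn.xlarge":  0.16,
    "g4dn.4xlarge": 0.37,
    "p3.16xlarge":  7.34,
}

# Relative GPU-phase speed, normalized to a single T4 (from the ~16x claim).
RELATIVE_SPEED = {
    "g4dn.xlarge":  1.0,
    "g4dn.4xlarge": 1.0,   # same single T4 GPU as the xlarge
    "p3.16xlarge":  16.0,
}

def cost_per_unit_of_work(instance):
    """Dollars spent per normalized unit of GPU work on Spot."""
    return SPOT_PRICE[instance] / RELATIVE_SPEED[instance]

for name in SPOT_PRICE:
    print(f"{name:13s} ${cost_per_unit_of_work(name):.3f} per unit of GPU work")
```

Even with the p3.16xlarge being 16x faster, the g4dn.xlarge still comes out far cheaper per unit of GPU work under these assumed prices, which is the whole argument for right-sizing down to the smallest instance with enough RAM.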

Note that the p3.2xlarge is also a lot more cost effective than the 16xlarge. Of course it is about 8x slower in the GPU stage, but it is also 8x cheaper. With the CPU and EBS-bound stages blurring the lines, it might be worth waiting longer for the frame, but paying less for it.

Food for thought…

Thanks for the thoughtful and detailed response, @Bobo!

Good point with the gp3 volume, I’ll make sure to check that.

With the g4dn.xlarge instance, let’s say we had a render that took 50 hours. The EC2 costs would only be $7.89 but wouldn’t the UBL costs for all the extra hours almost eliminate the savings?

Say we were just using Redshift standalone and not consuming C4D UBL as well, that would be $30 in UBL, for a total of $37.89. On a p3.16xlarge instance, the render would be closer to 5 hours, so $36.72 + $3 in UBL for a total of $39.72. I know UBL can only be purchased in buckets of at least 50 hours, but it sort of seems like the p3 instances are the best option?

This is a very valid point, but in the case of Redshift, you don’t have to use UBL if you have permanent floating licenses, as we have an agreement with the developers of Redshift to allow Bring Your Own License (BYOL). So if you intend to run a specific number of instances with Redshift, you could purchase perpetual floating licenses for that number of instances. While the cost of a Redshift license is quite high ($600 for a floating license, $300 per year to renew the Support and Maintenance), spread across many years it would be orders of magnitude cheaper than equivalent UBL.

Also note that Redshift UBL gets cheaper with increased package size, so there is another variable to take into account.

At the highest price of $0.60 per hour, rendering 24/7 for a whole year would cost $5,256. At the lowest price of $0.179 per hour, it would be $1,568. Compared to $600 for the first year and $300 for subsequent years of a perpetual license, UBL is simply too expensive for non-stop rendering; it is good for occasional, spiky rendering. At the highest UBL price, $600 buys you 1,000 hours, or 41.66 days, of rendering. With the largest package, $600 buys nearly 140 days with one machine.
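For anyone who wants to double-check the arithmetic, here is the same comparison as a small Python sketch. The per-hour UBL prices and license costs are the ones quoted in this thread; current pricing may differ:

```python
# Sanity-checking the UBL-vs-perpetual cost arithmetic from this thread.
# All prices are as quoted in the discussion, not current list prices.

HOURS_PER_YEAR = 24 * 365

ubl_high = 0.60     # USD/hour, smallest UBL package
ubl_low  = 0.179    # USD/hour, largest UBL package
perpetual_first_year = 600   # floating license
perpetual_renewal    = 300   # yearly support & maintenance

# Rendering 24/7 for a year on UBL:
print(f"High UBL, 1 year 24/7: ${ubl_high * HOURS_PER_YEAR:,.0f}")
print(f"Low UBL,  1 year 24/7: ${ubl_low  * HOURS_PER_YEAR:,.2f}")

# Hours (and days) of rendering that $600 buys at each UBL price:
print(f"$600 at high UBL: {600 / ubl_high:.0f} h "
      f"({600 / ubl_high / 24:.2f} days)")
print(f"$600 at low UBL:  {600 / ubl_low:.0f} h "
      f"({600 / ubl_low / 24:.1f} days)")
```

Running it reproduces the figures above: $5,256 and $1,568 for a year of 24/7 UBL, versus 1,000 hours (41.67 days) or nearly 140 days for the $600 that a perpetual license costs.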

My main objection was the use of a g4dn.4xlarge instead of a g4dn.xlarge, because both have the same T4 GPU. A p3.2xlarge is about 4x faster in pure GPU performance, so it will reduce the UBL cost a few times if you use UBL.

A p3.16xlarge will be about 8x faster than a p3.2xlarge in pure GPU performance, but its pricing does not scale in lockstep with the performance, and the available capacity is significantly lower. This is because smaller instances are VMs running on the same hardware as the largest instance: one physical machine can hold either eight 2xlarge instances or a single 16xlarge p3 instance, and for that reason there are a lot more smaller instances than large ones in all AWS Regions.

A benefit of a faster instance, besides the lower UBL cost, is that there is a lower chance of interruption. If a frame takes 50 hours to render on a g4dn.xlarge, 15 hours on a p3.2xlarge, and under 2 hours on a p3.16xlarge, the probability of losing the instance mid-render goes down as the render time gets shorter. But at the same time, larger p3 instances tend to have a higher probability of being interrupted due to lower capacity, so you are balancing a positive against a negative. For that reason, the p3.2xlarge, which has a lower chance of capacity-related interruption but higher performance than a g4dn, is my favorite middle ground for GPU rendering…

There is no way around C4D UBL: if you need to render in C4D with Redshift, you must use UBL.

If you can get enough p3.16xlarge instances, go for them! 🙂

Loving the deep dive on this. Thanks again!

One more detail that might escape some users as it is not super obvious, unless you read the AWS documentation:

When using Amazon EC2 Spot, interruptions are handled as follows:

  • If an instance is reclaimed within the first hour, you are not charged for that hour at all on both Linux and Windows.
  • If an instance is reclaimed after the first hour, you are charged for the completed full hours on Windows, or for the exact seconds the instance has run on Linux.

Now let’s say your render time on a p3.16xlarge is 30 minutes per frame. You launch some instances, and one of them gets interrupted around the 20th minute. In that case, you wasted some UBL licensing, but you don’t pay for the instance at all. If you get interrupted around the 40th minute, you still will not be charged for the instance, AND you got a frame rendered for free (except for the UBL cost). Now say you render three frames in 90 minutes, and on the 4th frame a Windows instance gets interrupted: you pay for the first full hour, which covers the first two frames, but not for the third frame, because that partial second hour is not charged. If you are on Linux, you will pay for the exact seconds the instance ran, so you don’t win much.
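The billing rules above can be captured in a small model. This is only a sketch of the policy as described in this thread (first hour free on interruption, per-second billing on Linux, full-hour billing on Windows); verify the current rules against the AWS Spot documentation before relying on it:

```python
# Approximate model of the EC2 Spot billing behavior described in this
# thread. The rules encoded here are ASSUMPTIONS taken from the discussion,
# not a definitive statement of AWS billing policy.

def spot_charge(hourly_price, minutes_run, os, interrupted_by_aws):
    """Approximate EC2 Spot charge for an instance that ran `minutes_run`."""
    hours = minutes_run / 60.0
    if interrupted_by_aws:
        if hours < 1.0:
            return 0.0                        # first hour free on interruption
        if os == "windows":
            return hourly_price * int(hours)  # completed full hours only
        return hourly_price * hours           # Linux: exact seconds
    # Self-terminated: Linux bills per second, Windows per full hour started.
    if os == "windows":
        return hourly_price * max(1, -(-int(minutes_run) // 60))
    return hourly_price * hours

# Interrupted at 40 minutes: nothing to pay, and one 30-minute frame is done.
print(spot_charge(7.34, 40, "linux", True))     # 0.0
# Windows interrupted at 100 minutes: only the first full hour is charged.
print(spot_charge(7.34, 100, "windows", True))  # 7.34
```

With 30-minute frames, the 100-minute Windows case pays for one hour (two frames) and gets the third frame free, which is exactly the scenario described above.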

In other words, it is a good idea to try to keep your frame times under an hour to fit within this policy and save money even if some of your render nodes get interrupted. This gives a boost to the p3.16xlarge case even if capacity is low and you get more interruptions…
