Benchmarking Blender on RenderStreet, dual CPU and quad GPU

This February, the Blender Institute published a set of files that are used internally by the Cycles developers for testing purposes. They also released the configuration of the workstations they tested these files on, along with the render times for each configuration and scene.

We often get asked how fast our servers are, so we decided to give these files a test run and publish the numbers here for everyone to see. This should give you an idea of the rendering speed we offer and how our machines perform.

Remember that the numbers are for a single server, and there are hundreds available to render your jobs when needed.

Update: a new benchmark article, including numbers for Blender 2.80 and hybrid (CPU+GPU) rendering, can be found here.

RenderStreet’s test machines had the following configurations:

  • CPU: Dual Intel Xeon E5-2680, 16 cores, 32 threads
  • GPU: Dual NVidia K520, a total of 4 GPUs per server

For reference, we are adding the scores from the fastest machines available at the Blender Institute, which have the configurations listed below. These configurations also provide a good basis for comparison with an AMD GPU and a Mac server. As a side note, the AMD GPU had a few issues during their tests – more on that at the end of the article.

  • CPU station 1: Studio Intel Server, Dual Xeon E5-2697 v3, 28 cores, 56 threads
  • CPU station 2: Studio MacPro, 8 core Xeon E5, 16 threads
  • GPU station 1: EVGA GTX980
  • GPU station 2: AMD R9 Fury

Here are the scenes, all rendered with their default settings. The RenderStreet tests were done with the official Blender 2.77 version.
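For readers who want to time the same scenes on their own hardware, here is a minimal sketch of a timing wrapper. It assumes the `blender` executable is on the PATH and relies on Blender's `-b` (run in background) and `-f` (render a frame) command-line options. The helper names and the `bmw27.blend` file name are placeholders for illustration, not part of our test setup, and the measured wall-clock time includes Blender's startup and scene loading.

```python
import subprocess
import time


def blender_command(blend_file, frame=1, executable="blender"):
    """Build a headless render command: -b runs without the UI, -f renders one frame."""
    return [executable, "-b", blend_file, "-f", str(frame)]


def format_hms(seconds):
    """Format a duration as h:mm:ss, the unit used in the charts."""
    total = int(round(seconds))
    hours, rest = divmod(total, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"


def time_command(command):
    """Run a command to completion and return its wall-clock duration in seconds."""
    start = time.perf_counter()
    subprocess.run(command, check=True, stdout=subprocess.DEVNULL)
    return time.perf_counter() - start


# Example (requires Blender and a local copy of the scene file):
# elapsed = time_command(blender_command("bmw27.blend"))
# print("Render time:", format_hms(elapsed))
```

Running each scene a few times and keeping the best result helps reduce noise from disk caching and background processes.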

1. BMW27, 960 x 540 px, 1,225 Samples (35 squared)

This is an evolution of Mike Pan’s famous scene, which takes a bit longer to render compared to the first version. It’s still the ‘lightest’ image from the entire test by far.

RenderStreet Blender Cycles benchmark 1 - BMW27

Render time in h:mm:ss, lower is better

In this case the render speeds and scaling are pretty much as expected for both the CPUs and the GPUs, which is no surprise given that the scene is not very challenging to render.

2. Classroom, 1,920 x 1,080 px, 300 Samples

Created by Christophe Seux, this is a very good interior rendering sample, with nice illumination and a good level of detail.

RenderStreet Blender Cycles benchmark 2 - Classroom

Render time in h:mm:ss, lower is better

For the Classroom scene, the render times begin to scale unevenly with the hardware. The difference between the Studio server and our CPU server grows wider, and the GPU scaling between a single GTX 980 and two of them (our GPU servers are roughly equivalent to two Titan boards) is no longer linear.
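To put a number on "no longer linear", scaling efficiency can be expressed as the measured speedup divided by the number of devices. The sketch below parses times in the h:mm:ss format used in the charts. The function names are our own, and the calculation is only clean when the devices being compared are identical; across different GPU models it is a rough indicator at best.

```python
def parse_hms(text):
    """Convert an h:mm:ss (or mm:ss) string to seconds."""
    seconds = 0
    for part in text.strip().split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def scaling_efficiency(single_time, multi_time, device_count):
    """Return speedup / device_count; 1.0 means perfectly linear scaling."""
    speedup = parse_hms(single_time) / parse_hms(multi_time)
    return speedup / device_count
```

As an illustration with made-up times (not results from this test), `scaling_efficiency("0:10:00", "0:06:00", 2)` gives about 0.83: two devices deliver a 1.67x speedup instead of the ideal 2x.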

3. Fishy Cat, 1,002 x 460 px, 1,000 Samples

The splash screen image from Blender 2.74, created by Manu Järvinen. It has a lot of hair, but it’s a fast render.

RenderStreet Blender Cycles benchmark 3 - Fishy Cat

Render time in h:mm:ss, lower is better

The Fishy Cat scene brings back the balance in the CPU area, but the GPU performance is again uneven – this time, in favor of our quad GPU setup. This is a rather unexpected result: given the small size of the image, we anticipated less efficient use of the four GPUs.

4. Koro, 720 x 1,280 px, 500 Samples

Our favorite character from the Caminandes series doesn’t need any other intro. Lots of hair here, too.

RenderStreet Blender Cycles benchmark 4 - Koro

Render time in h:mm:ss, lower is better

The results are clearly in favor of CPU rendering. Even the slowest CPU machine in the test, the MacPro, performs better than the most powerful GPU server.

5. Pabellon Barcelona, 1,280 x 720 px, 1,000 Samples

A nice exterior architecture render by Ludwig Mies van der Rohe/Claudio Andres/Hamza Cheggour. It has a generous amount of reflections and a number of difficult elements.

RenderStreet Blender Cycles benchmark 5 - Pabellon Barcelona

Render time in h:mm:ss, lower is better

The pavilion scene renders as expected on the GPUs, and once again shows uneven results for the CPU renders.

6. Victor, 2,048 x 858 px, 600 Samples

The two main characters from Cosmos Laundromat make for the most challenging scene in this batch.

RenderStreet Blender Cycles benchmark 6 - Victor

Render time in h:mm:ss, lower is better

Because this scene's footprint is so large (over 5 GB), the render is CPU only. The results are pretty much as expected, with the dual CPU machines posting the best times.
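The CPU-only fallback comes down to a simple rule: in this generation of Cycles, a GPU render needs the whole scene to fit in the memory of each GPU. A minimal sketch of that check follows. The function name is hypothetical, and the model assumes memory is not pooled across GPUs, so every card needs its own full copy of the scene.

```python
def pick_render_device(scene_footprint_gb, gpu_memory_gb):
    """Return "GPU" if the scene fits in one GPU's memory, otherwise "CPU".

    Assumption: memory is not shared between GPUs, so adding more cards
    does not help a scene that is too large for a single one.
    """
    if scene_footprint_gb <= gpu_memory_gb:
        return "GPU"
    return "CPU"
```

For example, a scene slightly over 5 GB checked against a card with 4 GB of memory (an illustrative figure) returns "CPU", regardless of how many such cards the server holds.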

Closing thoughts:

The Xeon E5-2697 v3 workstation is the fastest one in the majority of the tests. However, one CPU costs $2,700 at the time of writing this article, which makes the entire workstation cost over $6,000. Hardly the best bang for the buck, under any circumstances. As a comparison, a GeForce 980Ti is priced at $670, and a 12 GB Titan X will set you back approximately $1,600. And if you do need to render on CPU, there are options with a better cost/performance ratio.
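One rough way to compare "bang for the buck" is to multiply the hardware price by the render time: the lower the product, the better the value. The sketch below uses the prices quoted above, but the render times are hypothetical placeholders chosen only to show the calculation. Substitute the measured times for the scene you care about.

```python
def cost_time_product(price_usd, render_seconds):
    """Price multiplied by render time; lower means better value."""
    return price_usd * render_seconds


def rank_by_value(options):
    """Sort (name, price_usd, render_seconds) tuples from best to worst value."""
    return sorted(options, key=lambda o: cost_time_product(o[1], o[2]))


# Prices come from the article; render times are PLACEHOLDERS, not measurements.
options = [
    ("Two Xeon E5-2697 v3 CPUs", 2 * 2700, 300),
    ("GeForce 980Ti", 670, 400),
    ("Titan X 12 GB", 1600, 350),
]

for name, price, seconds in rank_by_value(options):
    print(f"{name}: {cost_time_product(price, seconds):,} USD-seconds")
```

This deliberately ignores power draw, the rest of the workstation, and scenes that do not fit in GPU memory, so treat it as a first-pass comparison only.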

Our GPUs perform very well in the majority of the scenes, leading the pack as expected. But if we look at the Koro scene, the balance tips heavily towards the CPU machines. My guess is that the fur in this file is causing the speed drop on the GPU, which means that there is still room for improvement in Cycles GPU support.

Our CPU servers pack a good punch as well, so we've got you covered should you wish (or need) to render on CPU.

The AMD GPU is struggling, both in terms of performance and in terms of output. The original results file linked from the Blender Dev Blog mentions issues with half of the scenes: a white windshield on the BMW, no fur for Koro and a missing texture on the pool for Pabellon, with all renders made in Blender 2.77 RC. At this moment the market price for the R9 Fury is in the same league as a GeForce 980Ti, and given those results I would say this is still a game best played by NVidia.

A couple of days ago, Sergey made a performance update to the BVH build algorithm, making it multi-threaded. As a result, in some cases the BVH takes significantly less time to calculate. It would be interesting to see what impact the update has on the render times in this test.

You can find the demos on the Blender Developer Blog here, including the link to the full result set on the Institute workstations.

Happy blending!

Marius
Passionate about technology and constantly working on making a difference, Marius is RenderStreet's CEO.