Benchmarking Blender on RenderStreet, dual CPU and quad GPU

This February, the Blender Institute published a set of files that are used internally by the Cycles developers for testing purposes. They also released the configuration of the workstations they tested these files on, along with the render times for each configuration and scene.

We are often asked how fast our servers are, so we gave these files a test run and put the numbers here for everyone to see. This should give you an idea of the rendering speed we offer and how our machines perform. Remember that the numbers are for a single server, and there are hundreds available to render your jobs when needed.

RenderStreet’s test machines had the following configurations:

  • CPU: Dual Intel Xeon E5-2680, 16 cores, 32 threads
  • GPU: Dual NVidia K520, a total of 4 GPUs per server

For reference, we are adding the scores from the fastest machines available at the Blender Institute, which have the configurations listed below. These configurations also provide a good basis for comparison with an AMD GPU and a Mac server. As a side note, the AMD GPU had a few issues during their tests; more on that at the end of the article.

  • CPU station 1: Studio Intel Server, Dual Xeon E5-2697 v3, 28 cores, 56 threads
  • CPU station 2: Studio MacPro, 8 core Xeon E5, 16 threads
  • GPU station 1: EVGA GTX980
  • GPU station 2: AMD R9 Fury

Here are the scenes, all rendered with their default settings. The RenderStreet tests were done with the official Blender 2.77 version.

1. BMW27, 960 x 540 px, 1,225 Samples (35 squared)

This is an evolution of Mike Pan’s famous scene, which takes a bit longer to render compared to the first version. It’s still the ‘lightest’ image from the entire test by far.

RenderStreet Blender Cycles benchmark 1 - BMW27

Render time in h:mm:ss, lower is better

In this case the render speeds and scaling are pretty much as expected, both for the CPUs and the GPUs. There are no notable surprises, since this is not a very challenging scene to render.

2. Classroom, 1,920 x 1,080 px, 300 Samples

Created by Christophe Seux, this is a very good interior rendering sample, with nice illumination and a good level of detail.

RenderStreet Blender Cycles benchmark 2 - Classroom

Render time in h:mm:ss, lower is better

For the Classroom scene, the render times begin to scale unevenly with the hardware. The difference between the Studio server and our CPU server grows wider, and the GPU scaling between a single GTX 980 and two of them (our GPU servers are roughly equivalent to two Titan boards) is no longer linear.
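One way to put a number on "no longer linear" is to compare the measured speedup with the ideal one. The sketch below is only an illustration: the helper names are made up for this example, and the two times passed in at the end are placeholders, not values from the charts. Substitute the h:mm:ss figures you want to compare.

```python
def hms_to_seconds(text):
    """Convert an 'h:mm:ss' render time, as used in the charts, to seconds."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def scaling_efficiency(single_time, multi_time, device_ratio):
    """Fraction of the ideal linear speedup that was actually achieved.

    device_ratio is how many times more hardware the faster setup has,
    for example 2.0 when comparing two boards against one.
    """
    speedup = hms_to_seconds(single_time) / hms_to_seconds(multi_time)
    return speedup / device_ratio


# Placeholder times, NOT measurements from this benchmark.
print(scaling_efficiency("0:20:00", "0:12:30", 2.0))  # prints 0.8
```

A result of 1.0 would mean perfectly linear scaling; anything below that shows how much of the extra hardware is lost to overhead for that particular scene.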

3. Fishy Cat, 1,002 x 460 px, 1,000 Samples

The splash screen image from Blender 2.74, created by Manu Järvinen. It has a lot of hair, but it’s a fast render.

RenderStreet Blender Cycles benchmark 3 - Fishy Cat

Render time in h:mm:ss, lower is better

The Fishy Cat scene brings back the balance in the CPU area, but the GPU performance is again uneven – this time, in favor of our quad GPU setup. It’s quite an unexpected result, given the small size of the image, which we expected to make less optimal use of the 4 GPUs.

4. Koro, 720 x 1,280 px, 500 Samples

Our favorite character from the Caminandes series doesn’t need any other intro. Lots of hair here, too.

RenderStreet Blender Cycles benchmark 4 - Koro

Render time in h:mm:ss, lower is better

The results are clearly in favor of CPU rendering. Even the slowest CPU in the test, the MacPro one, performs better than the most powerful GPU server.

5. Pabellon Barcelona, 1,280 x 720 px, 1,000 Samples

A nice exterior architecture render by Ludwig Mies van der Rohe/Claudio Andres/Hamza Cheggour. It has a generous amount of reflections and a number of difficult elements.

RenderStreet Blender Cycles benchmark 5 - Pabellon Barcelona

Render time in h:mm:ss, lower is better

The pavilion scene renders as expected on the GPUs, and once again shows uneven results for the CPU renders.

6. Victor, 2,048 x 858 px, 600 Samples

The two main characters from Cosmos Laundromat make for the most challenging scene in this batch.

RenderStreet Blender Cycles benchmark 6 - Victor

Render time in h:mm:ss, lower is better

Because of the size of this scene’s footprint (over 5GB), the render is CPU only. The results are pretty much as expected, with the dual CPU machines having the best results.
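The reason a scene like this ends up CPU only can be sketched as a simple memory check. This assumes, as far as I know was the case for Cycles at the time, that every GPU needs its own full copy of the scene, so the memory of several boards does not add up. The function name and the 4 GB per-GPU figure below are placeholders for illustration; check the actual specification of your cards.

```python
def fits_on_every_gpu(scene_gb, vram_per_gpu_gb):
    """Return True only if each GPU can hold the full scene on its own.

    Assumption: memory is not pooled across GPUs, so the smallest card
    decides whether the scene can be rendered on the GPU at all.
    """
    return all(scene_gb <= vram for vram in vram_per_gpu_gb)


# The article gives the footprint as "over 5GB"; 5.0 is used as a lower bound.
# The 4.0 GB per-GPU values are hypothetical, not taken from the article.
print(fits_on_every_gpu(5.0, [4.0, 4.0, 4.0, 4.0]))  # prints False
```

Under this assumption, adding more GPUs of the same size never helps with a scene that is too large for one of them.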

Closing thoughts:

The Xeon E5-2697 v3 workstation is the fastest one in the majority of the tests. However, one CPU costs $2,700 at the time of writing this article, which makes the entire workstation cost over $6,000. Hardly the best bang for the buck, under any circumstances. As a comparison, a GeForce 980Ti is priced at $670, and a 12 GB Titan X will set you back approximately $1,600. And if you do need to render on CPU, there are options with a better cost/performance ratio.
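To make the bang-for-the-buck argument concrete, here is a rough sketch using only the prices quoted above. It compares component prices alone and ignores the rest of the machine, power and licensing, so treat it as a back-of-the-envelope check. The function names are invented for this example.

```python
# Prices quoted in the article (USD, at the time of writing).
XEON_E5_2697_V3 = 2700   # one CPU; the workstation uses two
GEFORCE_980TI = 670
TITAN_X_12GB = 1600


def cost_time_product(price_usd, render_seconds):
    """Dollars multiplied by seconds per frame; lower is better."""
    return price_usd * render_seconds


def break_even_speedup(expensive_price, cheap_price):
    """How many times faster the expensive part must render
    to match the cheaper part on cost_time_product."""
    return expensive_price / cheap_price


# Two Xeons against one 980Ti, component prices only.
print(round(break_even_speedup(2 * XEON_E5_2697_V3, GEFORCE_980TI), 2))  # prints 8.06
```

In other words, on this simple metric the pair of CPUs would have to render a scene about eight times faster than a single 980Ti to be the better buy.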

Our GPUs perform very well in the majority of the scenes, leading the pack as expected. But in the Koro scene, the balance tips heavily toward the CPU machines. My guess is that the fur in this file is causing the speed drop on the GPU, which means that there is still room for improvement in Cycles' GPU support.

Our CPU servers pack a good punch as well, so we've got you covered should you wish (or need) to render on CPU.

The AMD GPU is struggling, both in terms of performance and in terms of output. The original results file linked from the Blender Dev Blog mentions issues with half of the scenes: a white windshield on the BMW, no fur for Koro and a missing texture on the pool for Pabellon, all rendered with Blender 2.77 RC. At the moment the market price for the R9 Fury is in the same league as a GeForce 980Ti, and given those results I would say this is still a game best played by NVidia.

A couple of days ago, Sergey made a performance update to the BVH compute algorithm, making it multi-threaded. As a result, in some cases the BVH takes significantly less time to calculate. It would be interesting to see what impact the update has for the render times in this test.

You can find the demos on the Blender Developer Blog here, including the link to the full result set on the Institute workstations.

Happy blending!

Marius
Passionate about technology and constantly working on making a difference, Marius is RenderStreet's CEO.
  • Great Article! I’m building my own 64 core render node, and was really curious to know how something like this would compare to a High End GPU, so looks like I made the right choice. 🙂
    It's worth highlighting that, even though GPUs are cheaper, they still can’t carry something like 1TB of RAM, as a good server board does. That’s why I’ll probably stick to the CPU route for a long time yet.
    Cheers!

    • A 64 core render node sounds promising indeed. I would suggest going the Intel route, as they offer the best performance at the moment. And I assume you’d like to future-proof it as much as possible.

      Regarding the CPU vs. GPU choice, it really depends on what kind of projects you work on. We render a lot of professional work and both the CPUs and the GPUs provide good value to the respective studios. And the current trend is to add more and more memory to the GPUs. Plus, I have yet to see a 1TB project 🙂

      Anyway, I’m glad the article helped. Good luck with the build and post the numbers here when it’s ready.

      • Yeah, Intel is the way to go. I have here a quad Opteron 6282SE that I got really cheap online, and also a dual E5 2695 v3 on the go. But if I had to start from scratch today, the new v4 Xeons look really sexy indeed!

        And agreed about the CPU vs GPU, I still love to use a GPU for lookdev, it’s just way more fun to work with, but add a bunch of 6-8k UDIM tiles and GPU isn’t an option anymore.
        That’s why I started to invest in some more robust render nodes at first, but I didn’t expect to see the dual socket setups outperforming a GPU like a GTX980, that was a heartwarming surprise. I’ll surely add my benchmarks here as soon as I get all the remaining parts. 🙂

        (PS: I can’t say about 1TB, but I already had to deal with some point clouds from PhotoScan peaking at around 400gb! But other than that, I think a porsche would be a better investment :P)
        Cheers!

  • Jacob Perl

    Newbie here. Just so I understand correctly, your bronze plan would give us roughly 10x the speed of the RenderStreet GPU results shown here?

    • The Bronze plan gives you access to 10 servers simultaneously, CPU or GPU. So yes, if you launch a longer animation – or several still images – you’ll get 10x the speed shown here. Also, if you launch a large still image render using the split function, several servers will be allocated to it – depending on the image size and complexity. But if you launch only a still image without the split function activated, only a single server will be allocated to it. Does this answer your question?