3D Rendering on Apple Silicon, CPU&GPU

jmho · Dec 7, 2021

The latest patch is still under review, so don't expect Metal support in the dailies until that's merged in.

It's also important to remember that as far as I can tell this Metal patch is the quickest and dirtiest way to get Metal up and running so don't expect blazing performance right off the bat. OptiX is very mature at this point and HIP is in a fully released state, while Metal support is very much a work in progress.

I'd expect Metal support to be relatively slow and buggy for a month or two while it's still being worked on, and I feel kinda bad for Michael Jones who has done an absolutely incredible job getting this stuff done so quickly, and the second it's out in it's very alpha state he's going to get so much garbage - not from the people who actually use Blender on Mac who will be really happy having usable render times, but from the benchmark people moaning that the metal port is disappointing and Apple sucks blah blah because the M1 Max isn't able to beat a 3060 the second it hits alpha (or likely ever, because the M1 Max doesn't have RT h/w)

jmho · Dec 7, 2021

mi7chy said:
Furthermore, the non-default settings influence render times. And, disabling denoise and noise threshold not only make the render results look terrible but also make the AMD render times about 38% worse on Sprite Fright render demo.

I think this is probably because Nvidia cards have access to the much faster AI-based OptiX de-noiser instead of the standard Intel one that everyone else has to use.

De-noising with a threshold means that the better the de-noiser, the less raytracing you have to do. I'm assuming that rather than deal with the question of is it fair that Nvidia cards get an advantage (using OptiX de-noising) or a disadvantage (not using it) they just decided to remove de-noising all together.

l0stl0rd · Dec 7, 2021

mi7chy said:
16.39s - 3060 70W mobile (OptiX Blender 3.0)
20.57s - reference 6900xt (HIP Blender 3.0)
29s - 2070 Super (OptiX)
31s - 3060 70W mobile (OptiX Blender 2.93)
48s - M1 Max 24GPU (Metal Blender 3.1 alpha source build)
51s - 2070 Super (CUDA)
2.04m - Mac Mini M1 (Metal Blender 3.1 alpha source build)
3:55.81m - AMD 5800H base clock no-boost and no-PBO overclock (CPU Blender 3.0)
5:51.06m - MBA M1 (CPU Blender 3.0)
*will have 5950x CPU result up soon

1:18:34 M1Pro GPU 3.1 + patch
I wonder how much it will change when the patch gets more “final”.

ww1971 · Dec 7, 2021

jmho said:
The latest patch is still under review, so don't expect Metal support in the dailies until that's merged in.

It's also important to remember that as far as I can tell this Metal patch is the quickest and dirtiest way to get Metal up and running so don't expect blazing performance right off the bat. OptiX is very mature at this point and HIP is in a fully released state, while Metal support is very much a work in progress.

I'd expect Metal support to be relatively slow and buggy for a month or two while it's still being worked on, and I feel kinda bad for Michael Jones who has done an absolutely incredible job getting this stuff done so quickly, and the second it's out in it's very alpha state he's going to get so much garbage - not from the people who actually use Blender on Mac who will be really happy having usable render times, but from the benchmark people moaning that the metal port is disappointing and Apple sucks blah blah because the M1 Max isn't able to beat a 3060 the second it hits alpha

I expect it will be sometime before that happen. Who would want to try a buggy metal add on on an early blender build?

jujoje · Dec 7, 2021

jmho said:
I think this is probably because Nvidia cards have access to the much faster AI-based OptiX de-noiser instead of the standard Intel one that everyone else has to use.

De-noising with a threshold means that the better the de-noiser, the less raytracing you have to do. I'm assuming that rather than deal with the question of is it fair that Nvidia cards get an advantage (using OptiX de-noising) or a disadvantage (not using it) they just decided to remove de-noising all together.

The Nvidia de-noiser has, at least in my testing, always been pretty terrible compared to Intel or Renderman; it was good for lookedev because it's iterative and fast, but shouldn't be let near final renders as it tends to artefact and cross frame denoising sucks. It's fast but it's not good; definitely has its uses but feel it's been overhyped.

The render test should always be to the converged render and not denoised so that's definitely the right choice there. That they changed the resolution and the noise threshold is somewhat baffling though.

Boil · Dec 7, 2021

mi7chy said:
16.39s - 3060 70W mobile (OptiX Blender 3.0)
20.57s - reference 6900xt (HIP Blender 3.0)
29s - 2070 Super (OptiX)
31s - 3060 70W mobile (OptiX Blender 2.93)
48s - M1 Max 24GPU (Metal Blender 3.1 alpha source build)
51s - 2070 Super (CUDA)
2.04m - Mac Mini M1 (Metal Blender 3.1 alpha source build)
3:55.81m - AMD 5800H base clock no-boost and no-PBO overclock (CPU Blender 3.0)
5:51.06m - MBA M1 (CPU Blender 3.0)

l0stl0rd said:
1:18:34 M1Pro GPU 3.1 + patch
I wonder how much it will change when the patch gets more “final”.

From 48s to 1:18:34...?

jmho said:
The latest patch is still under review, so don't expect Metal support in the dailies until that's merged in.

It's also important to remember that as far as I can tell this Metal patch is the quickest and dirtiest way to get Metal up and running so don't expect blazing performance right off the bat. OptiX is very mature at this point and HIP is in a fully released state, while Metal support is very much a work in progress.

I'd expect Metal support to be relatively slow and buggy for a month or two while it's still being worked on, and I feel kinda bad for Michael Jones who has done an absolutely incredible job getting this stuff done so quickly, and the second it's out in it's very alpha state he's going to get so much garbage - not from the people who actually use Blender on Mac who will be really happy having usable render times, but from the benchmark people moaning that the metal port is disappointing and Apple sucks blah blah because the M1 Max isn't able to beat a 3060 the second it hits alpha (or likely ever, because the M1 Max doesn't have RT h/w)

Hoping for a solid release of a Metalified Blender when the mid/high-end Mac minis drop next Spring...?!?

ww1971 said:
I expect it will be sometime before that happen. Who would want to try a buggy metal add on on an early blender build?

All the haters from the PC 3D world who want to come in and poop on macOS, Apple silicon, & Metal...?

hefeglass · Dec 7, 2021

jmho said:
The latest patch is still under review, so don't expect Metal support in the dailies until that's merged in.

It's also important to remember that as far as I can tell this Metal patch is the quickest and dirtiest way to get Metal up and running so don't expect blazing performance right off the bat. OptiX is very mature at this point and HIP is in a fully released state, while Metal support is very much a work in progress.

I'd expect Metal support to be relatively slow and buggy for a month or two while it's still being worked on, and I feel kinda bad for Michael Jones who has done an absolutely incredible job getting this stuff done so quickly, and the second it's out in it's very alpha state he's going to get so much garbage - not from the people who actually use Blender on Mac who will be really happy having usable render times, but from the benchmark people moaning that the metal port is disappointing and Apple sucks blah blah because the M1 Max isn't able to beat a 3060 the second it hits alpha (or likely ever, because the M1 Max doesn't have RT h/w)

I am ecstatic that this is already almost usable..and the improvement over cpu is almost 5x in this early release. I wasnt expecting RTX level of performance..we will see how the youtube benchmarkers react to the performance. For my personal use, its obviously not ready for primetime..but I am looking forward to being able to do quick renders on my machine before sending to my desktop for a full animation render. Will be the perfect workflow.
I also noticed that you can get away with very low render passes and use denoising and get really quick decent renders on the m1 gpu.
One scene that seemed to perform the same on the m1 max gpu and rtx 3070 was "junk shop" got 23 seconds on both machines after multiple runs. Interesting that optix doesnt have many gains on that scene.

l0stl0rd · Dec 7, 2021

Boil said:
From 48s to 1:18:34...?

yes I can check again… 16 cores.

l0stl0rd · Dec 7, 2021

hefeglass said:
I am ecstatic that this is already almost usable..and the improvement over cpu is almost 5x in this early release. I wasnt expecting RTX level of performance..we will see how the youtube benchmarkers react to the performance. For my personal use, its obviously not ready for primetime..but I am looking forward to being able to do quick renders on my machine before sending to my desktop for a full animation render. Will be the perfect workflow.
I also noticed that you can get away with very low render passes and use denoising and get really quick decent renders on the m1 gpu.
One scene that seemed to perform the same on the m1 max gpu and rtx 3070 was "junk shop" got 23 seconds on both machines after multiple runs. Interesting that optix doesnt have many gains on that scene.

yes I think I had about 44 sec on the junk shop with 16 cores, which makes sense if the was the 32 core version.

Boil · Dec 7, 2021

mi7chy said:
16.39s - 3060 70W mobile (OptiX Blender 3.0)
20.57s - reference 6900xt (HIP Blender 3.0)
29s - 2070 Super (OptiX)
31s - 3060 70W mobile (OptiX Blender 2.93)
48s - M1 Max 24GPU (Metal Blender 3.1 alpha source build)
51s - 2070 Super (CUDA)
2.04m - Mac Mini M1 (Metal Blender 3.1 alpha source build)
3:55.81m - AMD 5800H base clock no-boost and no-PBO overclock (CPU Blender 3.0)
5:51.06m - MBA M1 (CPU Blender 3.0)

l0stl0rd said:
1:18:34 M1Pro GPU 3.1 + patch
I wonder how much it will change when the patch gets more “final”.

Boil said:
From 48s to 1:18:34...?

l0stl0rd said:
yes I can check again… 16 cores.

My bad, I did not register that it was a M1 Pro SoC with the 16-core GPU...!

And then I was thinking, "Wow, 8 GPU cores make that much difference...?"

But then I realized the real difference might be the 200GB/s UMA on the M1 Pro versus the 400GB/s UMA on the M1 Max...?

l0stl0rd · Dec 7, 2021

Boil said:
My bad, I did not register that it was a M1 Pro SoC with the 16-core GPU...!

And then I was thinking, "Wow, 8 GPU cores make that much difference...?"

But then I realized the real difference might be the 200GB/s UMA on the M1 Pro & the 400GB/s UMA on the M1 Max...?

yes that is the question, seems to make less of a difference on the junk shop scene.

But is is still close, 16 to 24 is 1.5 times more cores.

So it should have been from 72 sec ( from the 48 sec) but took 78 sec.

Then again we can only really say for sure once that version is really out or at least beta.

Xiao_Xi · Dec 8, 2021

Blender rendering devs have published the meeting minutes of yesterday meeting, but it seems there is no relevant news regarding the Metal backend.

2021-12-7 Blender Rendering Meeting

Attendees Brecht Van Lommel (Blender) Kévin Dietrich (Blender) William Leeson (Blender) Thomas Dinges (Blender) Patrick Mours (NVIDIA) Roman Zulak (NVIDIA) Brian Savery (AMD) Michael Jones (Apple) Feng Xie (Facebook) Notes Cycles 3.0 has been released (Brecht). For 3.1 there are a few more...

devtalk.blender.org

jeanlain · Dec 8, 2021

Here's a comparison between M1 variants and a RTX 3060 (mobile I suppose), using redshift.

I expected the Apple GPUs to do better than that. The Geforce is almost twice faster than the M1 Max. Wasn't redshift supposed to be ported to Metal by Apple engineers?

It seems abundantly clear that the M1 Max is no match for the best discrete mobile GPUs.

jmho · Dec 8, 2021

Redshift is using hardware raytracing on Nvidia cards, so it's actually a fairly decent showing for the M1 Max to only take twice as long.

I was going to say a better comparison would be AMD cards, but it looks like Redshift doesn't support AMD cards at all (apart from on macOS via Metal)

l0stl0rd · Dec 8, 2021

jmho said:
Redshift is using hardware raytracing on Nvidia cards, so it's actually a fairly decent showing for the M1 Max to only take twice as long.

I was going to say a better comparison would be AMD cards, but it looks like Redshift doesn't support AMD cards at all (apart from on macOS via Metal)

Had a look in the redshift forums the M1 Max is about as a 5700 XT… which was to be expected (consider Tflops).

It would be equal to a 3060 month if it had rt cores but it does not.

An 1080 as the 1080 Ti is slightly faster.

jeanlain · Dec 8, 2021

jmho said:
Redshift is using hardware raytracing on Nvidia cards, so it's actually a fairly decent showing for the M1 Max to only take twice as long.

Would it be faster than the 3060 if the geforce did not use its specialised hardware? It is a lot slower than the 3060 while Apple says it should almost match an RTX 3080 mobile.

Apple should have added a footnote:
*not on compute tasks nor ray tracing, or anything except games leveraging TBDR (none of which exists on the Mac, except BG3) and workflows where unified memory can compensate for the M1 Max lower compute power (~10 TFLOPS vs 19 TFLOPS for the best mobile discrete GPUs)

It really looks like the "industry standard benchmark" they used for the comparison was GFXBench.

l0stl0rd · Dec 8, 2021

jeanlain said:
Would it be faster than the 3060 if the geforce did not use its specialised hardware? It is a lot slower than the 3060 while Apple says it should almost match an RTX 3080 mobile.

Apple should have added a footnote:
*not on compute tasks nor ray tracing, or anything except games leveraging TBDR (none of which exists on the Mac, except BG3) and in some workflows where unified memory can compensate its lower compute power (~10 TFLOPS vs 19 TFLOPS for the best mobile discrete GPUs)

It really looks like the "industry standard benchmark" they used for the comparison was GFXBench.

Yeah well apple and benchmarks, but don’t get me wrong it is still a nice performance for an integrated gpu.
They really need some raytracing acceleration however if they want to compete in rendering speed.

They way it looks at the moment even a 64 Core gpu would barely match a 3060 or the 6900xt

Still waiting for the final 3.1 blender to draw conclusions.

Xiao_Xi · Dec 8, 2021

jeanlain said:
It seems abundantly clear that the M1 Max is no match for the best discrete mobile GPUs.

It seems GPU performance is linearly proportional to consumption, and Apple couldn't find any trick to beat Nvidia in performance per watt.

treehuggerpro · Dec 8, 2021

I tried to evaluate where the M1s would sit against the Mac Pro graphics options with Redshift, because it's a new option in our Cad. The figure @l0stl0rd posted is correct in the Vultures benchmark, for the moment. But there's more to come here, which Maxon has indicated in the Redshift Forum, and as others have noted here. Once the bucket / block size of the benchmark is adjusted to suit the M1s available memory, they are expecting around a 30 percent bump in performance, which will pull that time down around the 7:00m mark.

mr_roboto · Dec 8, 2021

Xiao_Xi said:
It seems GPU performance is linearly proportional to consumption, and Apple couldn't find any trick to beat Nvidia in performance per watt.

On the contrary, their GPUs lead in perf/W by quite a bit. However, Apple doesn't want to design portable Macs around enough watts to compete with the hottest Nvidia GPUs.

For example: RTX 3060 offers 12.8 single precision shader TFLOPs at 170W TDP for the whole card (I think), M1 Max 10.4 at ~50W for the GPU part of the SoC plus (handwavy number alert but I have to guess at this figure) 10W for RAM.

So that's 12.8/170 = 0.075 TF/W versus 10.4/60 = 0.173 TF/W. Apple's winning TFLOPs/W by about 2.3x. It's not unexpected - Nvidia's on Samsung 8nm, which is not as good a process as TSMC 5nm. It's not just process tech either, Apple's had a very long string of extremely power efficient GPUs prior to M1. It's what you do when you're designing for phones and tablets.

The other complicating factor is that for graphics, with properly optimized native Metal software, Apple's GPU architecture makes much more efficient use of its available TFLOPs than Nvidia and AMD architecture GPUs.

Xiao_Xi · Dec 8, 2021

mr_roboto said:
So that's 12.8/170 = 0.075 TF/W versus 10.4/60 = 0.173 TF/W. Apple's winning TFLOPs/W by about 2.3x

What software can take advantage of it?

So far, Redshift (3D rendering ) and Apple's Tensorflow fork (deep learning) don't perform as good as those numbers.

I think Blender 3.2/3.3 will be a good benchmark to compare Nvidia, AMD and Apple GPUs as their engineers are working on optimizations for their GPUs.

diamond.g · Dec 9, 2021

mr_roboto said:
On the contrary, their GPUs lead in perf/W by quite a bit. However, Apple doesn't want to design portable Macs around enough watts to compete with the hottest Nvidia GPUs.

For example: RTX 3060 offers 12.8 single precision shader TFLOPs at 170W TDP for the whole card (I think), M1 Max 10.4 at ~50W for the GPU part of the SoC plus (handwavy number alert but I have to guess at this figure) 10W for RAM.

So that's 12.8/170 = 0.075 TF/W versus 10.4/60 = 0.173 TF/W. Apple's winning TFLOPs/W by about 2.3x. It's not unexpected - Nvidia's on Samsung 8nm, which is not as good a process as TSMC 5nm. It's not just process tech either, Apple's had a very long string of extremely power efficient GPUs prior to M1. It's what you do when you're designing for phones and tablets.

The other complicating factor is that for graphics, with properly optimized native Metal software, Apple's GPU architecture makes much more efficient use of its available TFLOPs than Nvidia and AMD architecture GPUs.

I wonder when we will get a breakdown of what Apples GPUs look like. AFAIK no one has ever detailed the GPU blocks on Apples hardware. Example of Turing below.

hefeglass · Dec 10, 2021

have you guys seen this review of the new GPUs from nvidia?

Xiao_Xi · Dec 13, 2021

It's official. Cycles has a Metal GPU backend!

Cycles Apple Metal device feedback

Cycles now has a Metal GPU backend, contributed by Apple. Metal GPU rendering is currently supported on Apple Silicon computers running macOS 12.2 Apple computers with AMD graphics card running macOS 12.3 The implementation is in an early state. Performance optimizations and support for Intel...

devtalk.blender.org

metapunk2077fail · Dec 13, 2021

jeanlain said:
Here's a comparison between M1 variants and a RTX 3060 (mobile I suppose), using redshift.

I expected the Apple GPUs to do better than that. The Geforce is almost twice faster than the M1 Max. Wasn't redshift supposed to be ported to Metal by Apple engineers?

It seems abundantly clear that the M1 Max is no match for the best discrete mobile GPUs.

He didn't bench them on battery like others did. That's when the RTX performance crashes through the floor.

3D Rendering on Apple Silicon, CPU&GPU

macrumors 6502a

macrumors 6502a

macrumors 6502

macrumors regular

macrumors 6502

macrumors 68040

macrumors 6502a

macrumors 6502

macrumors 6502

macrumors 68040

macrumors 6502

macrumors 68000

macrumors 68020

macrumors 6502a

macrumors 6502

Attachments

macrumors 68020

macrumors 6502

macrumors 68000

macrumors regular

macrumors 6502a

macrumors 68000

macrumors G5

macrumors 6502a

macrumors 68000

macrumors 6502a

Our Staff