3D Rendering on Apple Silicon, CPU&GPU

Andropov · Mar 15, 2022

terminator-jq said:
Yep, not optimized yet. In other GPU-based tasks and benchmarks, the M1 Max reaches 3070m and in some rare cases even 3080m levels of performance.

GPU rendering has been almost exclusively developed for Nvidia GPUs for the past 5+ years so it’s not surprising that Apple Silicon (and AMD for that matter) are falling behind and need extra development to show their true performance.

If Apple was already getting twice the level of rendering performance before the official Blender 3.1 release then we should be seeing something closer to 3070m performance once the optimizations are built in and especially if the neural network is used for de-noising.

I don't know how much render time is spent de-noising the image, but unless it's a big percentage of the total frame rendering time, I think there are other things that are going to speed it up more.

From the Apple engineer's post on the Blender forum, I take it that while it's now running on Metal, there are significant architecture-dependent optimizations still to be made. He mentions that it's already somewhat optimized because it's not copying data back and forth thanks to the Unified Memory Architecture, but that comes for 'free' with the Metal implementation. There are other (structural) changes that should improve the GPU performance further, without needing to offload work to additional hardware.

For example, since the post explicitly mentions the rendering path still being closely tied to the Nvidia/CUDA model and not optimized for Apple's architecture, maybe they haven't started trying to reorder and merge the rendering passes that only need access to tile memory. The TBDR design of Apple's GPUs means that some rendering passes that needed to be separate on IMR GPUs can now be merged into a single pass, greatly reducing pipeline change overhead and providing much faster memory access, especially if they can fit in 'memoryless' render targets that would need to be copied to VRAM on IMR GPUs but can reside only in tile memory on TBDR GPUs. That IMHO could be a huge performance boost, but it takes a lot of time and deep knowledge of the whole rendering process.
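As a rough illustration of the memory-traffic argument, here is a toy model (an assumption for illustration, not Cycles or Metal code). The function name, the resolution and the 8 bytes per pixel are all hypothetical; it only counts the bytes moved for one intermediate render target that one pass writes and the next pass reads.

```python
# Toy model (an assumption, not Cycles or Metal code): estimate the device
# memory traffic caused by one intermediate render target that is written
# by one pass and read back by the next.

def intermediate_traffic_bytes(width, height, bytes_per_pixel, memoryless):
    """Bytes moved to and from device memory for one intermediate target."""
    if memoryless:
        # Merged passes on a TBDR GPU: the target lives only in tile memory.
        return 0
    # Separate passes: the target is stored once, then loaded once.
    return 2 * width * height * bytes_per_pixel


if __name__ == "__main__":
    # Hypothetical 1920x1080 target at 8 bytes per pixel (e.g. RGBA16F).
    separate = intermediate_traffic_bytes(1920, 1080, 8, memoryless=False)
    merged = intermediate_traffic_bytes(1920, 1080, 8, memoryless=True)
    print(f"separate passes: {separate / 1e6:.1f} MB per frame")
    print(f"merged passes:   {merged / 1e6:.1f} MB per frame")
```

The model ignores caches, compression and everything else a real GPU does, so it only shows the direction of the effect, not its size.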

iPadified · Mar 15, 2022

Andropov said:
I don't know how much render time is spent de-noising the image, but unless it's a big percentage of the total frame rendering time, I think there are other things that are going to speed it up more.

From the Apple engineer's post on the Blender forum, I take it that while it's now running on Metal, there are significant architecture-dependent optimizations still to be made. He mentions that it's already somewhat optimized because it's not copying data back and forth thanks to the Unified Memory Architecture, but that comes for 'free' with the Metal implementation. There are other (structural) changes that should improve the GPU performance further, without needing to offload work to additional hardware.

For example, since the post explicitly mentions the rendering path still being closely tied to the Nvidia/CUDA model and not optimized for Apple's architecture, maybe they haven't started trying to reorder and merge the rendering passes that only need access to tile memory. The TBDR design of Apple's GPUs means that some rendering passes that needed to be separate on IMR GPUs can now be merged into a single pass, greatly reducing pipeline change overhead and providing much faster memory access, especially if they can fit in 'memoryless' render targets that would need to be copied to VRAM on IMR GPUs but can reside only in tile memory on TBDR GPUs. That IMHO could be a huge performance boost, but it takes a lot of time and deep knowledge of the whole rendering process.

For my short renders, de-noising takes 3-5 seconds irrespective of the render time. See post above. Expecting optimised AMD or M1 Metal rendering already is unrealistic. Cycles was built with CUDA in mind, so no wonder it performs better on NVIDIA cards. Looking forward to 3.2.
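To put a number on why a faster denoiser alone cannot help much, here is a small sketch using Amdahl's law. The 4 s denoise time is the midpoint of the 3-5 s range above; the 84 s total is borrowed from the M1 Pro 1:24 figure quoted elsewhere in this thread purely for illustration, and pairing the two is an assumption. The function names are made up for this example.

```python
def overall_speedup(denoise_s, total_s, denoise_speedup):
    """Amdahl's law: only the denoising share of the frame gets faster."""
    p = denoise_s / total_s  # fraction of the frame spent denoising
    return 1.0 / ((1.0 - p) + p / denoise_speedup)


def max_speedup(denoise_s, total_s):
    """Upper bound: denoising becomes instantaneous."""
    return total_s / (total_s - denoise_s)


if __name__ == "__main__":
    # Assumed figures: 4 s of denoising inside an 84 s render.
    print(f"10x faster denoiser: {overall_speedup(4, 84, 10):.3f}x overall")
    print(f"instant denoiser:    {max_speedup(4, 84):.3f}x overall")
```

Under those assumed numbers, even an instantaneous denoiser gives only about a 5% overall gain, which matches the point that the bigger wins have to come from the path tracing itself.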

Comparing GPUs is nearly impossible as the performance is so strongly linked to the software implementation.

leman · Mar 15, 2022

iPadified said:
Comparing GPUs is nearly impossible as the performance is so strongly linked to the software implementation.

Yep. And that, at the end of the day, is why common GPU APIs just don't work. What is the utility of a common API if different vendors require different algorithmic approaches anyway?

mi7chy · Mar 15, 2022

sunny5 said:
5 months isn't really enough, and even the Apple developer admitted that they need to keep optimizing it. Besides, many 3D applications have been optimized over several years, you know.

If you know it takes several years then why do you keep asking why it's not optimized? Just wait a few years. Might actually be more than a few years considering render times have gotten slower on M1 over time.

MBA M1 Blender BMW
2m48s 12/2021 3.1 alpha
2m59s 3/2022 3.1 release

Meanwhile, 3060 is still consistently >10x faster than MBA M1, 3x faster than M1 Max 32GPU and expected to be 2x faster than M1 Ultra 64GPU.

70W mobile 3060
16s 12/2021 3.0 release
16s 3/2022 3.1 release
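For anyone who wants to check the ratios, here is a minimal sketch that converts the times posted above to seconds and divides them. Only the four timings come from the post; the helper name is made up for this example.

```python
def to_seconds(text):
    """Convert a time such as '2m59s' or '16s' to whole seconds."""
    minutes, _, rest = text.rpartition("m")
    return int(minutes or 0) * 60 + int(rest.rstrip("s"))


# Timings as posted above (Blender BMW scene).
m1_alpha = to_seconds("2m48s")    # MBA M1, 3.1 alpha
m1_release = to_seconds("2m59s")  # MBA M1, 3.1 release
rtx3060 = to_seconds("16s")       # 70W mobile 3060, 3.1 release

if __name__ == "__main__":
    print(f"3060 vs M1 (alpha):   {m1_alpha / rtx3060:.1f}x")
    print(f"3060 vs M1 (release): {m1_release / rtx3060:.1f}x")
    print(f"M1 release vs alpha:  {(m1_release / m1_alpha - 1) * 100:.1f}% slower")
```

With these inputs the 3060 comes out roughly 10.5x to 11.2x faster than the MBA M1, and the M1 release time is about 6.5% slower than the alpha time.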

iPadified · Mar 15, 2022

mi7chy said:
If you know it takes several years then why do you keep asking why it's not optimized? Just wait a few years. Might actually be more than a few years considering render times have gotten slower on M1 over time.

MBA M1 Blender BMW
2m48s 12/2021 3.1 alpha
2m59s 3/2022 3.1 release

Meanwhile, 3060 is still consistently >10x faster than MBA M1, 3x faster than M1 Max 32GPU and expected to be 2x faster than M1 Ultra 64GPU.

70W mobile 3060
16s 12/2021 3.0 release
16s 3/2022 3.1 release

I downloaded the BMW scene from blender.org. It does not seem that the BMW scene supports Metal, does it? So are you only running the CPU on your M1, or a non-Metal implementation on the GPU?

In my private test file, the alpha and the 3.1 release gave very similar results using Metal on the M1 Pro. Metal was significantly faster using both CPU and GPU on the M1 Pro.

mi7chy · Mar 15, 2022

iPadified said:
I downloaded the BMW scene from blender.org. It does not seem that the BMW scene supports Metal, does it?

BMW does support GPU since with just CPU it's even slower at 5m58s.

jmho · Mar 15, 2022

Rendering the BMW on my M1 Max takes about 50 seconds, and the GPU utilisation is about 31%.

Looking forward to them getting that closer to 100%.
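As a very rough upper bound on what full utilisation could mean, here is a sketch that assumes (unrealistically) that render time scales inversely with GPU utilisation. That linear scaling is purely an assumption; real utilisation counters do not map to throughput this cleanly. The function name is made up for this example.

```python
def ideal_time_s(measured_s, utilisation):
    """Render time if the same work ran at 100% utilisation.

    Assumes time is inversely proportional to utilisation, which is an
    idealisation and not something the measurements above establish.
    """
    if not 0.0 < utilisation <= 1.0:
        raise ValueError("utilisation must be in (0, 1]")
    return measured_s * utilisation


if __name__ == "__main__":
    # Figures from the post above: about 50 s at about 31% utilisation.
    print(f"idealised time: {ideal_time_s(50, 0.31):.1f} s")
```

Treat the result as a best case under that assumption, not a prediction.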

Xiao_Xi · Mar 15, 2022

iPadified said:
M1 Pro GPU+CPU Metal 1:24 (silent of course)

iMac 8-core i7 (10700K) and 5700 GPU
CPU only: Loud fan 2:50
GPU only: Silent(!) 0:54
GPU+CPU: Loud fan 0:50

Looks like the 5700 is rather competitive compared to the M1 pro

According to Blender benchmarks:
5700 XT - 750
M1 Max (GPU) - 700
M1 Pro (GPU) - 360
M1 (GPU) - 200
M1 Max - 190
M1 Pro - 175
M1 - 115

Blender - Open Data

Blender Open Data is a platform to collect, display and query the results of hardware and software performance tests - provided by the public.

opendata.blender.org
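To make the list easier to compare, here is a small sketch that expresses each score relative to the 5700 XT. The scores and labels are copied from the list above as posted; the dictionary and function names are made up for this example.

```python
# Scores as listed above (higher is better); labels kept as posted.
scores = {
    "5700 XT": 750,
    "M1 Max (GPU)": 700,
    "M1 Pro (GPU)": 360,
    "M1 (GPU)": 200,
    "M1 Max": 190,
    "M1 Pro": 175,
    "M1": 115,
}


def relative_to(baseline, table):
    """Return each score divided by the baseline entry's score."""
    base = table[baseline]
    return {name: value / base for name, value in table.items()}


if __name__ == "__main__":
    for name, ratio in relative_to("5700 XT", scores).items():
        print(f"{name:<14} {ratio:.2f}")
```

On these numbers the M1 Max GPU lands at about 93% of the 5700 XT, and the M1 Pro GPU at a little under half.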

sunny5 · Mar 15, 2022

mi7chy said:
If you know it takes several years then why do you keep asking why it's not optimized? Just wait a few years. Might actually be more than a few years considering render times have gotten slower on M1 over time.

MBA M1 Blender BMW
2m48s 12/2021 3.1 alpha
2m59s 3/2022 3.1 release

Meanwhile, 3060 is still consistently >10x faster than MBA M1, 3x faster than M1 Max 32GPU and expected to be 2x faster than M1 Ultra 64GPU.

70W mobile 3060
16s 12/2021 3.0 release
16s 3/2022 3.1 release

And you claimed that they already spent 5 months. "so how much longer does it need to be optimized?" Now you are saying that it will take several years? How ironic.

iPadified · Mar 15, 2022

mi7chy said:
BMW does support GPU since with just CPU it's even slower at 5m58s.

Now it works. I could not load the scene directly by double-clicking; the Metal tab was not visible in that case. Loading it from within Blender worked better.
59s on 5700
2:43 CPU 10700k

At last the GPU is worth something in Blender.

mi7chy · Mar 15, 2022

iPadified said:
59s on 5700

That's actually good for a nearly 3 year old GPU. For comparison, $579 AMD RX6800 takes 30s on BMW so 2x faster.

diamond.g · Mar 15, 2022

mi7chy said:
That's actually good for a nearly 3 year old GPU. For comparison, $579 AMD RX6800 takes 30s on BMW so 2x faster.

A 6700 (XT) would be a better comparison (since the 6800 has more CUs than the 5700).

JimmyjamesEU · Mar 15, 2022

mi7chy said:
If you know it takes several years then why do you keep asking why it's not optimized? Just wait a few years. Might actually be more than a few years considering render times have gotten slower on M1 over time.

MBA M1 Blender BMW
2m48s 12/2021 3.1 alpha
2m59s 3/2022 3.1 release

Meanwhile, 3060 is still consistently >10x faster than MBA M1, 3x faster than M1 Max 32GPU and expected to be 2x faster than M1 Ultra 64GPU.

70W mobile 3060
16s 12/2021 3.0 release
16s 3/2022 3.1 release

Wow, sorry to see how little improvement there has been for the 3060 over the course of a year. Worrying times for Nvidia. Truly x86 and a dgpu separated by an anemic pci bus is a legacy platform.

leman · Mar 15, 2022

mi7chy said:
That's actually good for a nearly 3 year old GPU. For comparison, $579 AMD RX6800 takes 30s on BMW so 2x faster.

Where did you see an RX6800 for $579?

mi7chy · Mar 15, 2022

jmho said:
Rendering the BMW on my M1 Max takes about 50 seconds

Is that M1 Max 32GPU? Previously recorded result was 43 seconds so if that's the case it's gotten slower from alpha to release. Wonder what changed to drop performance.

JimmyjamesEU · Mar 15, 2022

mi7chy said:
Is that M1 Max 32GPU? Previously recorded result was 43 seconds so if that's the case it's gotten slower from alpha to release. Wonder what changed to drop performance.

Too much development probably. They never have spent 5 months doing it. 3 1/2 is the optimum amount of dev work on something like Blender.

sirio76 · Mar 15, 2022

mi7chy said:
If you know it takes several years then why do you keep asking why it's not optimized? Just wait a few years. Might actually be more than a few years considering render times have gotten slower on M1 over time.

MBA M1 Blender BMW
2m48s 12/2021 3.1 alpha
2m59s 3/2022 3.1 release

Meanwhile, 3060 is still consistently >10x faster than MBA M1, 3x faster than M1 Max 32GPU and expected to be 2x faster than M1 Ultra 64GPU.

70W mobile 3060
16s 12/2021 3.0 release
16s 3/2022 3.1 release

You keep posting this stuff as if Blender were the benchmark and the only renderer available; as a matter of fact, it is the slowest GPU engine and the least optimized for M1. In Redshift, for example, the Ultra should be near a 3080, but silent, with tons more memory, and using a fraction of the power.

sunny5 · Mar 15, 2022

sirio76 said:
You keep posting this stuff as if Blender were the benchmark and the only renderer available; as a matter of fact, it is the slowest GPU engine and the least optimized for M1. In Redshift, for example, the Ultra should be near a 3080, but silent, with tons more memory, and using a fraction of the power.

Do you have any Redshift results with the M1 Max?

diamond.g · Mar 15, 2022

leman said:
Where did you see an RX6800 for $579?

Every Thursday AMD sells them for MSRP…

terminator-jq · Mar 15, 2022

sunny5 said:
And you claimed that they already spent 5 months. "so how much longer does it need to be optimized?" Now you are saying that it will take several years? How ironic.

You gotta realize Nvidia basically owned the entire GPU rendering market from the start…

So for the last 5+ years, basically all GPU rendering engines were developed exclusively for Nvidia cards. AMD didn’t really push into GPU rendering until the release of their ProRender engine, and even then most GPU engines were still being developed only for CUDA. Having Apple directly help with development should speed things up, but it’s still going to take longer than just 5 months to get results closer to what the M1 Max is really capable of.

jmho · Mar 15, 2022

mi7chy said:
Is that M1 Max 32GPU? Previously recorded result was 43 seconds so if that's the case it's gotten slower from alpha to release. Wonder what changed to drop performance.

Yeah, it's the 32 core. It's to be expected - the first pass is going to be just getting stuff to work, then the next pass is going to be getting things to be correct and have all the expected features, and yes that will probably increase render times (which is where we are now), and then the final pass will be optimizing while still maintaining correctness.

It's the sane and logical thing to do. Nobody wants a fast, incorrect renderer that tries to get more "correct" over time. They want a correct renderer that gets faster over time while maintaining its correctness, and thus usability, from day 1.

iPadified · Mar 16, 2022

mi7chy said:
That's actually good for a nearly 3 year old GPU. For comparison, $579 AMD RX6800 takes 30s on BMW so 2x faster.

It is surprisingly good for an iMac. If I were doing 3D full time, I would go PC due to software and NVIDIA. For general purpose use and the odd 3D visualisation, the M1 Pro and 2020 iMac are good enough and the OS is much more pleasant. My workflow is typically CAD (Fusion 360) and visualisation of structures using Blender (Modo's payment structure was too cumbersome, although it was so easy to get nice results).

I am happy that Apple takes 3D somewhat seriously. I am surprised they completely missed this side of creativity and got stuck with video, photography and music. Looking forward to hardware RT in MX at some point.

If Intel/AMD had offered cheaper CPUs with more cores 10 years ago, NVIDIA would have had more trouble entering the market for compute-intensive tasks.

jujoje · Mar 17, 2022

Going to ask a stupid question, so totally expecting a stupid answer: I gave Blender 3.2 a go on 12.3 with my AMD Vega 64, tried the Alabs scene, and discovered two things:

1. USD kinda works in blender 3.2 but they haven't implemented the shaders (USD shade), so everything rendered as grey shaded. Small steps but getting there.

2. Performance was laughably bad; turned on Cycles -> Metal -> GPU and I can render the scene with full textures and lights faster on CPU in Houdini than I can render what is effectively an AO pass in Cycles.

Is Cycles Metal support limited to more modern GPUs (the Vega 64 is definitely a bit crusty these days)? There was some talk about which GPUs were going to be supported at one stage, but I haven't found a definitive answer...

jujoje · Mar 17, 2022

iPadified said:
I am happy that Apple takes 3D somewhat seriously. I am surprised they completely missed this side of creativity and got stuck with video, photography and music. Looking forward to hardware RT in MX at some point.

It seems a pretty big omission on their part; I guess the current push seems to be hoping to remedy this in time for AR to become a thing (you've got to create the assets somewhere).

Pressure · Mar 17, 2022

sirio76 said:
You keep posting this stuff as if Blender were the benchmark and the only renderer available; as a matter of fact, it is the slowest GPU engine and the least optimized for M1. In Redshift, for example, the Ultra should be near a 3080, but silent, with tons more memory, and using a fraction of the power.

And remember that the default benchmark is using 128x128 blocks which isn’t optimal for M1.

The developer Panos Zompolas mentions it here.

Please note that the benchmark is using 128x128 blocks which are not ideal to the M1 chips (due to their pretty high latencies). So the benchmark (as it stands today) might not show you particularly good scaling. The same is true for the M1 Pro/Max - but to a lesser extent.

As a stopgap solution I might look at adding a blocksize parameter to the benchmark. Or force it to 256 or larger if apple silicon is detected.
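The stopgap described in that quote could look something like the sketch below. This is a hypothetical illustration, not Redshift's code: the function name, its parameters and the detection method are all assumptions. It relies only on Python's standard `platform` module, and note that a Python process running under Rosetta reports `x86_64`, so it would not be detected.

```python
import platform


def pick_block_size(default=128, apple_silicon_min=256, system=None, machine=None):
    """Return the block size to use, raising it on Apple silicon.

    `system` and `machine` can be passed explicitly (handy for testing);
    otherwise they are read from the running interpreter.
    """
    system = system or platform.system()
    machine = machine or platform.machine()
    is_apple_silicon = system == "Darwin" and machine == "arm64"
    if is_apple_silicon:
        # Force at least the larger block size, as the quote suggests.
        return max(default, apple_silicon_min)
    return default


if __name__ == "__main__":
    print("block size on this machine:", pick_block_size())
```

Keeping the default as an explicit parameter also covers the other option mentioned in the quote, a user-supplied block size.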

The scores from rendering the Moana Island scene with Redshift are very promising.

2 x 1080 Ti = 77m
2 x 2080 Ti = 34m:17s
1 x 3090 = 21m:45s
2 x 3090 = 12m:44s
M1 Max (64GB) = 28m:27s
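For a quick comparison, here is a sketch that converts the times above to seconds and expresses each one relative to the M1 Max. The timings are copied from the list as posted; the helper and variable names are made up for this example.

```python
def to_seconds(text):
    """Convert a time such as '34m:17s' or '77m' to whole seconds."""
    minutes, _, seconds = text.partition("m")
    seconds = seconds.strip(":s")
    return int(minutes) * 60 + int(seconds or 0)


# Moana Island render times as listed above (lower is better).
times = {
    "2 x 1080 Ti": to_seconds("77m"),
    "2 x 2080 Ti": to_seconds("34m:17s"),
    "1 x 3090": to_seconds("21m:45s"),
    "2 x 3090": to_seconds("12m:44s"),
    "M1 Max (64GB)": to_seconds("28m:27s"),
}

if __name__ == "__main__":
    m1_max = times["M1 Max (64GB)"]
    for name, secs in times.items():
        # A ratio above 1 means that setup took longer than the M1 Max.
        print(f"{name:<14} {secs:>5} s  {secs / m1_max:.2f}x the M1 Max time")
```

On these figures the M1 Max finishes ahead of both dual-card Turing and Pascal setups, while a single 3090 is roughly 1.3x faster.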
