
iPadified

macrumors 68020
Apr 25, 2017
2,014
2,257
Early benchmark scores put the 40-core M3 Max at similar performance to the RTX 4070 laptop.

Device Name                         | Blender Version | Median Score | Number of Benchmarks
NVIDIA GeForce RTX 4070 Laptop GPU  | 4.0.0           | 3449.21      | 5
Apple M3 Max (GPU - 40 cores)       | 4.0.0           | 3417.29      | 1
NVIDIA GeForce RTX 4060 Laptop GPU  | 4.0.0           | 3245.13      | 3
NVIDIA GeForce RTX 3070 Laptop GPU  | 4.0.0           | 3102.84      | 2
Apple M3 Max (GPU - 30 cores)       | 4.0.0           | 2942.11      | 1
Is that fast enough as a workstation before sending it to a dedicated render machine? With my humble needs I cannot tell.

I usually visualise research results, and the latest one gave a publication-grade result after a 20 s render and denoise on an M1 Pro with Blender 3.6. The images look as nice as Modo's internal renderer, mainly thanks to the good denoising.
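To put the gap into percentages, here's a quick bit of arithmetic on the medians quoted above (nothing beyond the numbers already in the table):

```python
# Relative performance from the Blender Open Data medians quoted above
# (higher score = faster); this is purely arithmetic on the table's numbers.
scores = {
    "NVIDIA RTX 4070 Laptop GPU": 3449.21,
    "Apple M3 Max (40 cores)": 3417.29,
    "NVIDIA RTX 4060 Laptop GPU": 3245.13,
    "NVIDIA RTX 3070 Laptop GPU": 3102.84,
    "Apple M3 Max (30 cores)": 2942.11,
}

baseline = scores["NVIDIA RTX 4070 Laptop GPU"]
for name, score in scores.items():
    print(f"{name:28s} {score:8.2f}  {score / baseline:6.1%} of the RTX 4070 Laptop")
```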
 

leman

macrumors Core
Oct 14, 2008
19,520
19,670
Is that fast enough as a workstation before sending it to a dedicated render machine? With my humble needs I cannot tell.

I usually visualise research results, and the latest one gave a publication-grade result after a 20 s render and denoise on an M1 Pro with Blender 3.6. The images look as nice as Modo's internal renderer, mainly thanks to the good denoising.

I am not qualified to say whether it's sufficient or not, but the M3 Pro should be comparable to the M1 Ultra and 3x faster than the M1 Pro in these tasks.

Quite an achievement when one considers that it only has 12% more shader cores and runs at a 10% higher clock.
 
  • Like
Reactions: ikir

Rafterman

Contributor
Apr 23, 2010
7,267
8,809
For raw graphics performance, the 40-core M3 Max is rated at 14.2 TFLOPS. In comparison, the 38-core M2 Max is rated at 13.5 TFLOPS, and an Xbox Series X is about 12.5 TFLOPS.

It seems the increase from the M2 to the M3, with two extra GPU cores, was negligible.
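For what it's worth, those headline FP32 TFLOPS figures are basically core count x lanes per core x 2 (FMA) x clock. A rough sanity check, assuming 128 FP32 lanes per Apple GPU core and a ~1.39 GHz clock (both my assumptions, not from the post):

```python
# Back-of-the-envelope check of those headline FP32 TFLOPS figures.
# Assumed (not from the post): 128 FP32 lanes per Apple GPU core,
# an FMA counted as 2 FLOPs per lane per cycle, ~1.39 GHz GPU clock.
ALUS_PER_CORE = 128
FLOPS_PER_LANE_PER_CYCLE = 2  # fused multiply-add = 1 mul + 1 add

def fp32_tflops(cores: int, clock_ghz: float) -> float:
    # cores * lanes * FLOPs/cycle gives FLOPs per clock; * GHz gives GFLOPS, /1000 gives TFLOPS
    return cores * ALUS_PER_CORE * FLOPS_PER_LANE_PER_CYCLE * clock_ghz / 1000

print(fp32_tflops(40, 1.39))  # ~14.2, the quoted 40-core M3 Max figure
print(fp32_tflops(38, 1.39))  # ~13.5, the quoted 38-core M2 Max figure
```

That the same assumed clock reproduces both quoted figures is consistent with the "negligible" reading: two extra cores at an essentially unchanged clock.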
 

Xiao_Xi

macrumors 68000
Oct 27, 2021
1,627
1,101
True, there is some strange stuff going on with Blender on the Mac anyway: why does the same scene on a Mac use more memory than the Windows RAM and VRAM combined?
One of the Apple developers working on Blender has posted:
Yes, the more GPU cores (volume of work that can be done in parallel), the more memory needs to be allocated to allow that to happen.

In addition to that, differences in VRAM usage on Apple Silicon machines are expected, and are based on the amount of memory the machine has, along with the generation of GPU. There's an algorithm which looks to leverage more UMA to increase performance when the machine has it available. This algorithm could be refactored to consider the memory consumed by the asset itself, where an out-of-memory would otherwise occur.
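Reading between the lines of that explanation, the heuristic being described might look something like the toy sketch below. Every name and number in it is hypothetical; it only illustrates the trade-off of spending spare unified memory on parallelism while backing off before the scene itself no longer fits:

```python
# Toy illustration of the trade-off described above: use spare unified memory
# to keep more GPU cores busy, but back off before the scene itself no longer
# fits. All names and numbers here are hypothetical, not Blender's code.
def working_set_bytes(gpu_cores: int, unified_memory: int, scene_bytes: int) -> int:
    per_core_budget = 256 * 1024 * 1024      # assumed extra budget per GPU core
    wanted = gpu_cores * per_core_budget     # more cores -> more memory wanted
    headroom = unified_memory - scene_bytes  # what the scene's own data leaves free
    return max(0, min(wanted, headroom))

# e.g. a 40-core GPU in a 64 GB machine rendering a 20 GB scene
print(working_set_bytes(40, 64 * 2**30, 20 * 2**30) / 2**30, "GiB extra working set")
```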
 
  • Like
Reactions: ikir

Gloor

macrumors 65816
Apr 19, 2007
1,025
733
How do you justify the M3 Pro being comparable to the M1 Ultra?

That is simply not realistic. Or are you talking only about single-core CPU? The M1 Ultra should finish renders faster than the M3 Pro when a CPU+GPU render is in action. Maybe an M5 Pro can beat the M1 Ultra, but the M3 Pro? Not likely.
I am not qualified to say whether it's sufficient or not, but the M3 Pro should be comparable to the M1 Ultra and 3x faster than the M1 Pro in these tasks.

Quite an achievement when one considers that it only has 12% more shader cores and runs at a 10% higher clock.
 

leman

macrumors Core
Oct 14, 2008
19,520
19,670
How do you justify the M3 Pro being comparable to the M1 Ultra?

That is simply not realistic. Or are you talking only about single-core CPU? The M1 Ultra should finish renders faster than the M3 Pro when a CPU+GPU render is in action. Maybe an M5 Pro can beat the M1 Ultra, but the M3 Pro? Not likely.

I am talking about GPU renders. It is possible that the M1 Ultra will be slightly faster than the M3 Pro in CPU+GPU, but not by much. In the Blender CPU benchmark the M1 Ultra leads the M3 Pro by 160 points. If we add the CPU and GPU scores together, the M1 Ultra is just 10% faster. Quite a leap in 18 months…
 

leman

macrumors Core
Oct 14, 2008
19,520
19,670
For raw graphics performance, the 40-core M3 Max is rated at 14.2 TFLOPS. In comparison, the 38-core M2 Max is rated at 13.5 TFLOPS, and an Xbox Series X is about 12.5 TFLOPS.

It seems the increase from the M2 to the M3, with two extra GPU cores, was negligible.

It seems so, but the M3 also has a massively redesigned GPU core. It can now issue FP32+FP16 or FP32+INT simultaneously, which in some cases can double performance. Also, the new resource allocation system (Dynamic Caching) can massively improve occupancy, making better use of the shader cores. That's why we saw a 50-60% improvement in Blender even with RT disabled.

The GPU transition from M1/M2 to M3 is roughly comparable to Nvidia's transition from Pascal to Turing.
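To illustrate why co-issuing FP32 with FP16/INT "in some cases can double performance", a toy throughput model (not a hardware simulation) is enough; the doubling only shows up when the instruction mix approaches 50/50:

```python
# Toy throughput model of dual issue (not a hardware simulation): a shader
# with a given mix of FP32 and INT work takes fp32 + int cycles on a core
# that issues one instruction per cycle, but only max(fp32, int) cycles on a
# core that can co-issue one FP32 and one INT instruction per cycle.
def dual_issue_speedup(fp32_fraction: float) -> float:
    int_fraction = 1.0 - fp32_fraction
    single_issue_cycles = fp32_fraction + int_fraction   # normalised to 1.0
    dual_issue_cycles = max(fp32_fraction, int_fraction)
    return single_issue_cycles / dual_issue_cycles

for f in (1.0, 0.75, 0.5):
    print(f"{f:.0%} FP32 / {1 - f:.0%} INT -> {dual_issue_speedup(f):.2f}x")
```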
 

leman

macrumors Core
Oct 14, 2008
19,520
19,670
Wait, where did you get this information from?

If true, it definitely sounds like a “Turing” upgrade indeed.


I still haven't gotten around to writing a microbenchmark to test this, but it's on my to-do list.
 

name99

macrumors 68020
Jun 21, 2004
2,407
2,309
Early benchmark scores put the 40-core M3 Max at similar performance to the RTX 4070 laptop.

Device Name                         | Blender Version | Median Score | Number of Benchmarks
NVIDIA GeForce RTX 4070 Laptop GPU  | 4.0.0           | 3449.21      | 5
Apple M3 Max (GPU - 40 cores)       | 4.0.0           | 3417.29      | 1
NVIDIA GeForce RTX 4060 Laptop GPU  | 4.0.0           | 3245.13      | 3
NVIDIA GeForce RTX 3070 Laptop GPU  | 4.0.0           | 3102.84      | 2
Apple M3 Max (GPU - 30 cores)       | 4.0.0           | 2942.11      | 1
In terms of features, what we know about nVidia is that:
(1) we get basic ray tracing (traversal of the BVH and box/triangle tests) in Turing 2018

(2) we get support for motion blur (allow nodes of the BVH to move by small [linear in time] amounts and test against possibly moved node) in Ampere 2020

(3) we get support for opacity maps (a fast way to fake ray tracing of complex geometry like leaves) in Ada 2023
(4) we get support for facet maps (a different way to "fake" complex micro-geometry; less faking this way!)
(5) we get support for SER (shader execution reordering, i.e. handling ray divergence)

Apple certainly has support for (1) and (5).
The Apple ray tracing API added support for (2) years ago. I would guess Apple HW supports it, but???
Apple may or may not have support for (3) and (4). [Both would require some API changes, I think, and so even if the HW support is present, we may not know about it until WWDC. Experts correct me if I'm wrong.]

If we assume that Apple is (in some way) using IMG's ray tracing hardware (bought and changed the IP or whatever), then Apple also creates the BVH in hardware, whereas nVidia still does this "in software" (I think that means on the CPU).
Perhaps the Ada Ray Tracing updates also gave nVidia hardware BVH creation?
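For anyone unfamiliar with what the fixed-function units in (1) actually accelerate, here's a toy Python version of the ray/box "slab" test; hardware effectively runs something like this for every BVH node a ray visits, plus triangle tests at the leaves:

```python
# Toy version of the ray/box "slab" test that the fixed-function units in (1)
# accelerate; a GPU effectively evaluates this for every BVH node a ray visits.
def ray_hits_aabb(origin, inv_dir, box_min, box_max) -> bool:
    t_near, t_far = 0.0, float("inf")
    for o, inv, lo, hi in zip(origin, inv_dir, box_min, box_max):
        t0, t1 = (lo - o) * inv, (hi - o) * inv
        if t0 > t1:
            t0, t1 = t1, t0                  # keep the interval ordered
        t_near, t_far = max(t_near, t0), min(t_far, t1)
    return t_near <= t_far                   # the ray overlaps the box

# A ray from the origin along +x against a unit box centred at (5, 0, 0);
# 1e30 stands in for 1/0 on the axes the ray doesn't move along.
print(ray_hits_aabb((0, 0, 0), (1.0, 1e30, 1e30), (4.5, -0.5, -0.5), (5.5, 0.5, 0.5)))
```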
 
  • Like
Reactions: Xiao_Xi

name99

macrumors 68020
Jun 21, 2004
2,407
2,309
For raw graphics performance, the 40-core M3 Max is rated at 14.2 TFLOPS. In comparison, the 38-core M2 Max is rated at 13.5 TFLOPS, and an Xbox Series X is about 12.5 TFLOPS.

It seems the increase from the M2 to the M3, with two extra GPU cores, was negligible.
That's as dumb as saying that a 6 GHz x86 chip MUST BE twice as fast as a 3 GHz Apple chip because it has twice the GHz!

If you think the number of FMAs determines the overall performance of a GPU, you know nothing about the field.
 

komuh

macrumors regular
May 13, 2023
126
113
It seems so, but the M3 also has a massively redesigned GPU core. It can now issue FP32+FP16 or FP32+INT simultaneously, which in some cases can double performance. Also, the new resource allocation system (Dynamic Caching) can massively improve occupancy, making better use of the shader cores. That's why we saw a 50-60% improvement in Blender even with RT disabled.

The GPU transition from M1/M2 to M3 is roughly comparable to Nvidia's transition from Pascal to Turing.
I'm pretty sure FP32/FP16+INT was the case even on M1 already; not sure about FP16+FP32.

Never mind, it's only FP32/16+INT in specific cases, not universally, so it can indeed be a big deal, especially for quantisation.
 
Last edited:

name99

macrumors 68020
Jun 21, 2004
2,407
2,309
I'm pretty sure FP32/FP16+INT was the case even on M1 already; not sure about FP16+FP32.

Never mind, it's only FP32/16+INT in specific cases, not universally, so it can indeed be a big deal, especially for quantisation.
How? There is no superscalar dispatch in any of the current leading-edge designs.

nV does it with their time-multiplexing scheme (32-wide warps, 16-wide hardware, so it takes two cycles to dispatch a warp, and on the second cycle you can dispatch to a different execution pipe).
AMD does it by aggressive use of SIMD2 instructions (which can of course only give you FP16+FP16 or FP32+FP32).

Apple does it by dispatching warps from two independent tasks. This is feasible for them in a way that it isn't for nV or AMD because they have a much larger pool of warps & threadblocks available for scheduling. They already had about 1.5x as many in M2 courtesy of the larger register file; with Dynamic Caching even more are usually available.
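A toy model (my sketch, not Apple's actual scheduler) of why a larger pool of resident warps makes it easier to fill that second issue slot:

```python
import random

# Toy model (not Apple's scheduler): each cycle a resident warp is "ready"
# with some probability, and the core can dual-issue only if it finds a ready
# warp in each of two independent tasks. A bigger resident pool means more
# cycles where both issue slots get filled.
def dual_issue_rate(resident_warps: int, p_ready: float = 0.3, cycles: int = 50_000) -> float:
    half = resident_warps // 2               # assume the pool is split between two tasks
    hits = 0
    for _ in range(cycles):
        task_a = any(random.random() < p_ready for _ in range(half))
        task_b = any(random.random() < p_ready for _ in range(half))
        hits += task_a and task_b
    return hits / cycles

for pool in (8, 12, 24):                     # e.g. "about 1.5x as many", then more again
    print(pool, f"{dual_issue_rate(pool):.2f}")
```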
 

thunng8

macrumors 65816
Feb 8, 2006
1,032
417
I spent a few hours earlier looking for exactly that, and I didn’t see anything in bug reports, devtalk, or benchmark commits. I’m not super familiar with their code base, but nothing jumped out at me WRT “rebalancing” or “increased rendering quality.” Looks like a regression to me; this release touched a lot of things, so that’s my current opinion.

@thunng8 Inquiring minds would like to know the basis of your claim. TIA 🙂
I did read it a while ago, but I cannot find the link anymore. In any case, there have been lots of changes to the rendering pipeline, so that's what we have now.
 

Xiao_Xi

macrumors 68000
Oct 27, 2021
1,627
1,101
How do you justify M3 Pro comparable to M1 Ultra?
As far as GPU-only rendering is concerned, the 18-core M3 Pro will most likely sit between the 48-core M1 Ultra and the 64-core M1 Ultra. Unfortunately, there are no scores yet for either of these M1 Ultras, so we'll have to wait a bit longer.

Device Name                         | Blender Version | Median Score | Number of Benchmarks
Apple M3 Max (GPU - 40 cores)       | 4.0.0           | 3417.29      | 9
Apple M2 Ultra (GPU - 76 cores)     | 4.0.0           | 3263.2       | 2
Apple M3 Max (GPU - 30 cores)       | 4.0.0           | 2851.21      | 5
Apple M3 Pro (GPU - 18 cores)       | 4.0.0           | 1510.37      | 7
Apple M3 Pro (GPU - 14 cores)       | 4.0.0           | 1436.99      | 2
 

Xiao_Xi

macrumors 68000
Oct 27, 2021
1,627
1,101
Tom's Hardware has compared rendering in Blender 4.0 and 3.6 on several Nvidia, AMD and Intel GPUs, and Blender 4.0 takes longer to render than Blender 3.6 on all but one GPU.

GPU           | Blender v3.6.0 Geomean | Blender v4.0.0 Geomean | Percent Change
RTX 4090      | 4213.9                 | 3685.5                 | -12.5%
RTX 4080      | 3119.6                 | 2790.2                 | -10.6%
RTX 4070      | 1943.2                 | 1743.5                 | -10.3%
RTX 4060      | 1160.3                 | 1036.1                 | -10.7%
RTX 3090 Ti   | 2228.0                 | 1943.2                 | -12.8%
RX 7900 XTX   | 1252.9                 | 1260.7                 | 0.6%
RX 7900 XT    | 1144.1                 | 1094.9                 | -4.3%
RX 7800 XT    | 752.6                  | 720.8                  | -4.2%
RX 7600       | 422.5                  | 394.3                  | -6.7%
Arc A770 16GB | 696.5                  | 679.0                  | -2.5%
Arc A750      | 693.7                  | 672.7                  | -3.0%
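For clarity, the "Percent Change" column is just the relative change of the 4.0.0 geomean versus the 3.6.0 geomean, e.g. for the 4090; the geomean example below uses made-up per-scene scores, not Tom's raw data:

```python
from math import prod

# The "Percent Change" column is the relative change of the v4.0.0 geomean
# versus the v3.6.0 geomean, e.g. for the RTX 4090 row:
old, new = 4213.9, 3685.5
print(f"{(new - old) / old:+.1%}")   # -12.5%

# And a geomean itself is the n-th root of the product of the per-scene
# scores (illustrative numbers below, not Tom's Hardware's raw data):
scene_scores = [5200.0, 3900.0, 3700.0]
print(prod(scene_scores) ** (1 / len(scene_scores)))
```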
 
  • Like
Reactions: l0stl0rd

Adult80HD

macrumors 6502a
Nov 19, 2019
701
837
These are the scores from my M2 Ultra and 16" M3 Max; both with 128GB of RAM.
[Attached screenshot: Blender benchmark scores for the M2 Ultra and 16" M3 Max, 2023-11-16]
 

Adult80HD

macrumors 6502a
Nov 19, 2019
701
837
Tom's Hardware has compared rendering in Blender 4.0 and 3.6 on several Nvidia, AMD and Intel GPUs, and Blender 4.0 takes longer to render than Blender 3.6 on all but one GPU.

GPU           | Blender v3.6.0 Geomean | Blender v4.0.0 Geomean | Percent Change
RTX 4090      | 4213.9                 | 3685.5                 | -12.5%
RTX 4080      | 3119.6                 | 2790.2                 | -10.6%
RTX 4070      | 1943.2                 | 1743.5                 | -10.3%
RTX 4060      | 1160.3                 | 1036.1                 | -10.7%
RTX 3090 Ti   | 2228.0                 | 1943.2                 | -12.8%
RX 7900 XTX   | 1252.9                 | 1260.7                 | 0.6%
RX 7900 XT    | 1144.1                 | 1094.9                 | -4.3%
RX 7800 XT    | 752.6                  | 720.8                  | -4.2%
RX 7600       | 422.5                  | 394.3                  | -6.7%
Arc A770 16GB | 696.5                  | 679.0                  | -2.5%
Arc A750      | 693.7                  | 672.7                  | -3.0%
Interestingly, the results on Blender's page online are MUCH higher for the 4090, and when I tested my rendering machine it landed in the same range as Tom's Hardware is showing here. Then I updated the driver and it suddenly jumped into the 11,000 range like most of the results online on Blender's page. There must have been some big change in the driver.
 
  • Like
Reactions: richinaus

Standard

macrumors 6502
Jul 8, 2008
296
59
Canada
Hey folks. My new loaded M3 Max will be arriving soon! I will be doing extensive character/creature work for film/games and will document my experience. Previously, the M2 was absolutely great and ran circles around my old workstation. Looking forward to seeing the extra horsepower this will give, and hopefully my findings will be helpful to other artists who wish to use a Mac for specifically this type of work.
 

Gloor

macrumors 65816
Apr 19, 2007
1,025
733
Maya?


Hey folks. My new loaded M3 Max will be arriving soon! I will be doing extensive character/creature work for film/games and will document my experience. Previously, the M2 was absolutely great and ran circles around my old workstation. Looking forward to seeing the extra horsepower this will give, and hopefully my findings will be helpful to other artists who wish to use a Mac for specifically this type of work.
 

Gloor

macrumors 65816
Apr 19, 2007
1,025
733
Perfect, I'm curious about Maya. That's what I do most of the time, so I wonder how good the performance is there. :)



Yes, Maya with Arnold, Houdini with Karma, and Marmoset with RTX. Waiting on Unreal to allow Alembic hairs. I'll likely also dive into Redshift now that this new hardware is here. Will be covering lots of displacement, UDIMs, XYZ, grooming, etc.
 
  • Like
Reactions: RobertoDLV