
iPadified

macrumors 68020
Apr 25, 2017
2,014
2,257
Early benchmark scores put the 40-core M3 Max at similar performance to the RTX 4070 laptop.

Device Name                         | Blender Version | Median Score | Number of Benchmarks
NVIDIA GeForce RTX 4070 Laptop GPU  | 4.0.0           | 3449.21      | 5
Apple M3 Max (GPU - 40 cores)       | 4.0.0           | 3417.29      | 1
NVIDIA GeForce RTX 4060 Laptop GPU  | 4.0.0           | 3245.13      | 3
NVIDIA GeForce RTX 3070 Laptop GPU  | 4.0.0           | 3102.84      | 2
Apple M3 Max (GPU - 30 cores)       | 4.0.0           | 2942.11      | 1
Is that fast enough as a workstation before sending it to a dedicated render machine? With my humble needs I cannot tell.

I usually visualise research results, and the latest one gave a publication-grade result after a 20 s render and denoise on an M1 Pro with Blender 3.6. The images look as nice as Modo's internal renderer, mainly thanks to the good denoising.
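To put the gap into percentages, here's a quick bit of arithmetic on the medians quoted above (nothing beyond the numbers already in the table):

```python
# Relative performance from the Blender Open Data medians quoted above
# (higher score = faster); this is purely arithmetic on the table's numbers.
scores = {
    "NVIDIA RTX 4070 Laptop GPU": 3449.21,
    "Apple M3 Max (40 cores)": 3417.29,
    "NVIDIA RTX 4060 Laptop GPU": 3245.13,
    "NVIDIA RTX 3070 Laptop GPU": 3102.84,
    "Apple M3 Max (30 cores)": 2942.11,
}

baseline = scores["NVIDIA RTX 4070 Laptop GPU"]
for name, score in scores.items():
    print(f"{name:28s} {score:8.2f}  {score / baseline:6.1%} of the RTX 4070 Laptop")
```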
 

leman

macrumors Core
Oct 14, 2008
19,520
19,670
Is that fast enough as a workstation before sending it to a dedicated render machine? With my humble needs I cannot tell.

I usually visualise research results, and the latest one gave a publication-grade result after a 20 s render and denoise on an M1 Pro with Blender 3.6. The images look as nice as Modo's internal renderer, mainly thanks to the good denoising.

I am not qualified to say whether it's sufficient or not, but the M3 Pro should be comparable to the M1 Ultra and 3x faster than the M1 Pro in these tasks.

Quite an achievement when one considers that it only has 12% more shader cores and runs at a 10% higher clock.
 
  • Like
Reactions: ikir

Rafterman

Contributor
Apr 23, 2010
7,267
8,809
For raw graphics performance, the 40-core M3 Max is rated at 14.2 TFLOPS. In comparison, the 38-core M2 Max is rated at 13.5 TFLOPS, and an Xbox Series X is about 12.5 TFLOPS.

It seems the increase from the M2 to the M3, with two extra GPU cores, was negligible.
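For what it's worth, those headline FP32 TFLOPS figures are basically core count x lanes per core x 2 (FMA) x clock. A rough sanity check, assuming 128 FP32 lanes per Apple GPU core and a ~1.39 GHz clock (both my assumptions, not from the post):

```python
# Back-of-the-envelope check of those headline FP32 TFLOPS figures.
# Assumed (not from the post): 128 FP32 lanes per Apple GPU core,
# an FMA counted as 2 FLOPs per lane per cycle, ~1.39 GHz GPU clock.
ALUS_PER_CORE = 128
FLOPS_PER_LANE_PER_CYCLE = 2  # fused multiply-add = 1 mul + 1 add

def fp32_tflops(cores: int, clock_ghz: float) -> float:
    # cores * lanes * FLOPs/cycle gives FLOPs per clock; * GHz gives GFLOPS, /1000 gives TFLOPS
    return cores * ALUS_PER_CORE * FLOPS_PER_LANE_PER_CYCLE * clock_ghz / 1000

print(fp32_tflops(40, 1.39))  # ~14.2, the quoted 40-core M3 Max figure
print(fp32_tflops(38, 1.39))  # ~13.5, the quoted 38-core M2 Max figure
```

That the same assumed clock reproduces both quoted figures is consistent with the "negligible" reading: two extra cores at an essentially unchanged clock.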
 

Xiao_Xi

macrumors 68000
Oct 27, 2021
1,627
1,101
True, there is some strange stuff going on with Blender on the Mac anyway: why does the same scene on a Mac use more memory than the Windows RAM and VRAM combined?
One of the Apple developers working on Blender has posted:
Yes, the more GPU cores (volume of work that can be done in parallel), the more memory needs to be allocated to allow that to happen.

In addition to that, differences in VRAM usage on Apple Silicon machines are expected, and are based on the amount of memory the machine has, along with the generation of GPU. There's an algorithm which looks to leverage more UMA to increase performance when the machine has it available. This algorithm could be refactored to consider the memory consumed by the asset itself, where an out-of-memory would otherwise occur.
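Reading between the lines of that explanation, the heuristic being described might look something like the toy sketch below. Every name and number in it is hypothetical; it only illustrates the trade-off of spending spare unified memory on parallelism while backing off before the scene itself no longer fits:

```python
# Toy illustration of the trade-off described above: use spare unified memory
# to keep more GPU cores busy, but back off before the scene itself no longer
# fits. All names and numbers here are hypothetical, not Blender's code.
def working_set_bytes(gpu_cores: int, unified_memory: int, scene_bytes: int) -> int:
    per_core_budget = 256 * 1024 * 1024      # assumed extra budget per GPU core
    wanted = gpu_cores * per_core_budget     # more cores -> more memory wanted
    headroom = unified_memory - scene_bytes  # what the scene's own data leaves free
    return max(0, min(wanted, headroom))

# e.g. a 40-core GPU in a 64 GB machine rendering a 20 GB scene
print(working_set_bytes(40, 64 * 2**30, 20 * 2**30) / 2**30, "GiB extra working set")
```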
 
  • Like
Reactions: ikir

Gloor

macrumors 65816
Apr 19, 2007
1,025
733
How do you justify the M3 Pro being comparable to the M1 Ultra?

That is simply not realistic. Or are you talking only about single-core CPU? The M1 Ultra should finish renders faster than the M3 Pro when a CPU+GPU render is in action. Maybe an M5 Pro can beat the M1 Ultra, but the M3 Pro? Not likely.
I am not qualified to say whether it's sufficient or not, but the M3 Pro should be comparable to the M1 Ultra and 3x faster than the M1 Pro in these tasks.

Quite an achievement when one considers that it only has 12% more shader cores and runs at a 10% higher clock.
 

leman

macrumors Core
Oct 14, 2008
19,520
19,670
How do you justify the M3 Pro being comparable to the M1 Ultra?

That is simply not realistic. Or are you talking only about single-core CPU? The M1 Ultra should finish renders faster than the M3 Pro when a CPU+GPU render is in action. Maybe an M5 Pro can beat the M1 Ultra, but the M3 Pro? Not likely.

I am talking about GPU renders. It is possible that the M1 Ultra will be slightly faster than the M3 Pro in CPU+GPU, but not by much. In the Blender CPU benchmark the M1 Ultra leads the M3 Pro by 160 points. If we add the CPU and GPU scores together, the M1 Ultra is just 10% faster. Quite a leap in 18 months…
 

leman

macrumors Core
Oct 14, 2008
19,520
19,670
For raw graphics performance, the 40-core M3 Max is rated at 14.2 TFLOPS. In comparison, the 38-core M2 Max is rated at 13.5 TFLOPS, and an Xbox Series X is about 12.5 TFLOPS.

It seems the increase from the M2 to the M3, with two extra GPU cores, was negligible.

It seems so, but the M3 also has a massively redesigned GPU core. It can now issue FP32+FP16 or FP32+INT simultaneously, which in some cases can double performance. Also, the new resource allocation system (Dynamic Caching) can massively improve occupancy, making better use of the shader cores. That's why we saw a 50-60% improvement in Blender even with RT disabled.

The GPU transition from M1/M2 to M3 is roughly comparable to Nvidia's transition from Pascal to Turing.
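To illustrate why co-issuing FP32 with FP16/INT "in some cases can double performance", a toy throughput model (not a hardware simulation) is enough; the doubling only shows up when the instruction mix approaches 50/50:

```python
# Toy throughput model of dual issue (not a hardware simulation): a shader
# with a given mix of FP32 and INT work takes fp32 + int cycles on a core
# that issues one instruction per cycle, but only max(fp32, int) cycles on a
# core that can co-issue one FP32 and one INT instruction per cycle.
def dual_issue_speedup(fp32_fraction: float) -> float:
    int_fraction = 1.0 - fp32_fraction
    single_issue_cycles = fp32_fraction + int_fraction   # normalised to 1.0
    dual_issue_cycles = max(fp32_fraction, int_fraction)
    return single_issue_cycles / dual_issue_cycles

for f in (1.0, 0.75, 0.5):
    print(f"{f:.0%} FP32 / {1 - f:.0%} INT -> {dual_issue_speedup(f):.2f}x")
```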
 

leman

macrumors Core
Oct 14, 2008
19,520
19,670
Wait, where did you get this information from?

If true, it definitely sounds like a “Turing” upgrade indeed.


I still haven't gotten around to writing a microbenchmark to test this, but it's on my to-do list.
 

name99

macrumors 68020
Jun 21, 2004
2,407
2,309
Early benchmark scores put the 40-core M3 Max at similar performance to the RTX 4070 laptop.

Device Name                         | Blender Version | Median Score | Number of Benchmarks
NVIDIA GeForce RTX 4070 Laptop GPU  | 4.0.0           | 3449.21      | 5
Apple M3 Max (GPU - 40 cores)       | 4.0.0           | 3417.29      | 1
NVIDIA GeForce RTX 4060 Laptop GPU  | 4.0.0           | 3245.13      | 3
NVIDIA GeForce RTX 3070 Laptop GPU  | 4.0.0           | 3102.84      | 2
Apple M3 Max (GPU - 30 cores)       | 4.0.0           | 2942.11      | 1
In terms of features, what we know about nVidia is that:
(1) we get basic ray tracing (traversal of the BVH and box/triangle tests) in Turing 2018

(2) we get support for motion blur (allow nodes of the BVH to move by small [linear in time] amounts and test against possibly moved node) in Ampere 2020

(3) we get support for opacity maps (a fast way to fake ray tracing of complex geometry like leaves) in Ada 2023
(4) we get support for facet maps (a different way to "fake" complex micro-geometry; less faking this way!)
(5) we get support for SER (shader execution reordering, i.e. handling ray divergence)

Apple certainly has support for (1) and (5).
The Apple ray tracing API added support for (2) years ago. I would guess Apple HW supports it, but???
Apple may or may not have support for (3) and (4). [Both would require some API changes, I think, and so even if the HW support is present, we may not know about it until WWDC. Experts correct me if I'm wrong.]

If we assume that Apple is (in some way) using IMG's ray tracing hardware (bought and changed the IP or whatever), then Apple also creates the BVH in hardware, whereas nVidia still does this "in software" (I think that means on the CPU).
Perhaps the Ada Ray Tracing updates also gave nVidia hardware BVH creation?
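For anyone unfamiliar with what the fixed-function units in (1) actually accelerate, here's a toy Python version of the ray/box "slab" test; hardware effectively runs something like this for every BVH node a ray visits, plus triangle tests at the leaves:

```python
# Toy version of the ray/box "slab" test that the fixed-function units in (1)
# accelerate; a GPU effectively evaluates this for every BVH node a ray visits.
def ray_hits_aabb(origin, inv_dir, box_min, box_max) -> bool:
    t_near, t_far = 0.0, float("inf")
    for o, inv, lo, hi in zip(origin, inv_dir, box_min, box_max):
        t0, t1 = (lo - o) * inv, (hi - o) * inv
        if t0 > t1:
            t0, t1 = t1, t0                  # keep the interval ordered
        t_near, t_far = max(t_near, t0), min(t_far, t1)
    return t_near <= t_far                   # the ray overlaps the box

# A ray from the origin along +x against a unit box centred at (5, 0, 0);
# 1e30 stands in for 1/0 on the axes the ray doesn't move along.
print(ray_hits_aabb((0, 0, 0), (1.0, 1e30, 1e30), (4.5, -0.5, -0.5), (5.5, 0.5, 0.5)))
```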
 
  • Like
Reactions: Xiao_Xi

name99

macrumors 68020
Jun 21, 2004
2,407
2,309
For raw graphics performance, the 40-core M3 Max is rated at 14.2 TFLOPS. In comparison, the 38-core M2 Max is rated at 13.5 TFLOPS, and an Xbox Series X is about 12.5 TFLOPS.

It seems the increase from the M2 to the M3, with two extra GPU cores, was negligible.
That's as dumb as saying that a 6 GHz x86 chip MUST BE twice as fast as a 3 GHz Apple chip because it has twice the GHz!

If you think the number of FMAs determines the overall performance of a GPU, you know nothing about the field.
 

komuh

macrumors regular
May 13, 2023
126
113
It seems so, but the M3 also has a massively redesigned GPU core. It can now issue FP32+FP16 or FP32+INT simultaneously, which in some cases can double performance. Also, the new resource allocation system (Dynamic Caching) can massively improve occupancy, making better use of the shader cores. That's why we saw a 50-60% improvement in Blender even with RT disabled.

The GPU transition from M1/M2 to M3 is roughly comparable to Nvidia's transition from Pascal to Turing.
I'm pretty sure FP32/FP16+INT was the case even on M1 already; not sure about FP16+FP32.

Never mind, it's only FP32/16+INT in specific cases, not universally, so it can indeed be a big deal, especially for quantisation.
 
Last edited:

name99

macrumors 68020
Jun 21, 2004
2,407
2,309
I'm pretty sure FP32/FP16+INT was the case even on M1 already; not sure about FP16+FP32.

Never mind, it's only FP32/16+INT in specific cases, not universally, so it can indeed be a big deal, especially for quantisation.
How? There is no superscalar dispatch in any of the current leading-edge designs.

nV does it with their time-multiplexing scheme (32-wide warps, 16-wide hardware, so it takes two cycles to dispatch a warp, and on the second cycle you can dispatch to a different execution pipe).
AMD does it by aggressive use of SIMD2 instructions (which can of course only give you FP16+FP16 or FP32+FP32).

Apple does it by dispatching warps from two independent tasks. This is feasible for them in a way that it isn't for nV or AMD because they have a much larger pool of warps & threadblocks available for scheduling. They already had about 1.5x as many in M2 courtesy of the larger register file; with Dynamic Caching even more are usually available.
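A toy model (my sketch, not Apple's actual scheduler) of why a larger pool of resident warps makes it easier to fill that second issue slot:

```python
import random

# Toy model (not Apple's scheduler): each cycle a resident warp is "ready"
# with some probability, and the core can dual-issue only if it finds a ready
# warp in each of two independent tasks. A bigger resident pool means more
# cycles where both issue slots get filled.
def dual_issue_rate(resident_warps: int, p_ready: float = 0.3, cycles: int = 50_000) -> float:
    half = resident_warps // 2               # assume the pool is split between two tasks
    hits = 0
    for _ in range(cycles):
        task_a = any(random.random() < p_ready for _ in range(half))
        task_b = any(random.random() < p_ready for _ in range(half))
        hits += task_a and task_b
    return hits / cycles

for pool in (8, 12, 24):                     # e.g. "about 1.5x as many", then more again
    print(pool, f"{dual_issue_rate(pool):.2f}")
```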
 

thunng8

macrumors 65816
Feb 8, 2006
1,032
417
I spent a few hours earlier looking for exactly that, and I didn’t see anything in bug reports, devtalk, or benchmark commits. I’m not super familiar with their code base, but nothing jumped out at me WRT “rebalancing” or “increased rendering quality.” Looks like a regression to me; this release touched a lot of things, so that’s my current opinion.

@thunng8 Inquiring minds would like to know the basis of your claim. TIA 🙂
I did read it a while ago, but I cannot find the link anymore. In any case, there have been lots of changes to the rendering pipeline, so that's what we have now.
 

Xiao_Xi

macrumors 68000
Oct 27, 2021
1,627
1,101
How do you justify M3 Pro comparable to M1 Ultra?
As far as GPU-only rendering is concerned, the 18-core M3 Pro will most likely sit between the 48-core M1 Ultra and the 64-core M1 Ultra. Unfortunately, there are no scores yet for either of these M1 Ultras, so we'll have to wait a bit longer.

Device Name                         | Blender Version | Median Score | Number of Benchmarks
Apple M3 Max (GPU - 40 cores)       | 4.0.0           | 3417.29      | 9
Apple M2 Ultra (GPU - 76 cores)     | 4.0.0           | 3263.2       | 2
Apple M3 Max (GPU - 30 cores)       | 4.0.0           | 2851.21      | 5
Apple M3 Pro (GPU - 18 cores)       | 4.0.0           | 1510.37      | 7
Apple M3 Pro (GPU - 14 cores)       | 4.0.0           | 1436.99      | 2
 

Xiao_Xi

macrumors 68000
Oct 27, 2021
1,627
1,101
Tom's Hardware has compared rendering in Blender 4.0 and 3.6 on several Nvidia, AMD and Intel GPUs, and Blender 4.0 takes longer to render than Blender 3.6 on all but one GPU.

GPU           | Blender v3.6.0 Geomean | Blender v4.0.0 Geomean | Percent Change
RTX 4090      | 4213.9                 | 3685.5                 | -12.5%
RTX 4080      | 3119.6                 | 2790.2                 | -10.6%
RTX 4070      | 1943.2                 | 1743.5                 | -10.3%
RTX 4060      | 1160.3                 | 1036.1                 | -10.7%
RTX 3090 Ti   | 2228.0                 | 1943.2                 | -12.8%
RX 7900 XTX   | 1252.9                 | 1260.7                 | 0.6%
RX 7900 XT    | 1144.1                 | 1094.9                 | -4.3%
RX 7800 XT    | 752.6                  | 720.8                  | -4.2%
RX 7600       | 422.5                  | 394.3                  | -6.7%
Arc A770 16GB | 696.5                  | 679.0                  | -2.5%
Arc A750      | 693.7                  | 672.7                  | -3.0%
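For clarity, the "Percent Change" column is just the relative change of the 4.0.0 geomean versus the 3.6.0 geomean, e.g. for the 4090; the geomean example below uses made-up per-scene scores, not Tom's raw data:

```python
from math import prod

# The "Percent Change" column is the relative change of the v4.0.0 geomean
# versus the v3.6.0 geomean, e.g. for the RTX 4090 row:
old, new = 4213.9, 3685.5
print(f"{(new - old) / old:+.1%}")   # -12.5%

# And a geomean itself is the n-th root of the product of the per-scene
# scores (illustrative numbers below, not Tom's Hardware's raw data):
scene_scores = [5200.0, 3900.0, 3700.0]
print(prod(scene_scores) ** (1 / len(scene_scores)))
```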
 
  • Like
Reactions: l0stl0rd

Adult80HD

macrumors 6502a
Nov 19, 2019
701
837
These are the scores from my M2 Ultra and 16" M3 Max; both with 128GB of RAM.
[Attached screenshot: Blender benchmark scores for the M2 Ultra and 16" M3 Max, 2023-11-16]
 

Adult80HD

macrumors 6502a
Nov 19, 2019
701
837
Tom's Hardware has compared rendering in Blender 4.0 and 3.6 on several Nvidia, AMD and Intel GPUs, and Blender 4.0 takes longer to render than Blender 3.6 on all but one GPU.

GPU           | Blender v3.6.0 Geomean | Blender v4.0.0 Geomean | Percent Change
RTX 4090      | 4213.9                 | 3685.5                 | -12.5%
RTX 4080      | 3119.6                 | 2790.2                 | -10.6%
RTX 4070      | 1943.2                 | 1743.5                 | -10.3%
RTX 4060      | 1160.3                 | 1036.1                 | -10.7%
RTX 3090 Ti   | 2228.0                 | 1943.2                 | -12.8%
RX 7900 XTX   | 1252.9                 | 1260.7                 | 0.6%
RX 7900 XT    | 1144.1                 | 1094.9                 | -4.3%
RX 7800 XT    | 752.6                  | 720.8                  | -4.2%
RX 7600       | 422.5                  | 394.3                  | -6.7%
Arc A770 16GB | 696.5                  | 679.0                  | -2.5%
Arc A750      | 693.7                  | 672.7                  | -3.0%
Interestingly, the results on Blender's page online are MUCH higher for the 4090, and when I tested my rendering machine it landed in the same range as Tom's Hardware is showing here. Then I updated the driver and it suddenly jumped into the 11,000 range like most of the results online on Blender's page. There must have been some big change in the driver.
 
  • Like
Reactions: richinaus

Standard

macrumors 6502
Jul 8, 2008
296
59
Canada
Hey folks. My new loaded M3 Max will be arriving soon! I will be doing extensive character/creature work for film/games and will document my experience. Previously, the M2 was absolutely great and ran circles around my old workstation. Looking forward to seeing the extra horsepower this will give, and hopefully my findings will be helpful to other artists who wish to use a Mac for specifically this type of work.
 

Gloor

macrumors 65816
Apr 19, 2007
1,025
733
Maya?


Hey folks. My new loaded M3 Max will be arriving soon! I will be doing extensive character/creature work for film/games and will document my experience. Previously, the M2 was absolutely great and ran circles around my old workstation. Looking forward to seeing the extra horsepower this will give, and hopefully my findings will be helpful to other artists who wish to use a Mac for specifically this type of work.
 

Gloor

macrumors 65816
Apr 19, 2007
1,025
733
Perfect, I'm curious about Maya. That's what I do most of the time, so I wonder how good the performance is there. :)



Yes, Maya with Arnold, Houdini with Karma, and Marmoset with RTX. Waiting on Unreal to allow Alembic hairs. I'll likely also dive into Redshift now that this new hardware is here. Will be covering lots of displacement, UDIMs, XYZ, grooming, etc.
 
  • Like
Reactions: RobertoDLV