
Beau10

macrumors 65816
Apr 6, 2008
1,406
732
US based digital nomad
Another point to note is that the memory bandwidth of the M3 Max is already 400 GB/s, versus the 273 GB/s of the M4 Pro. No software currently makes use of anything even approaching 200 GB/s, but one might assume some will, and when that happens the M3 Max will beat out the M4 Pro.

LLM inference speed is closely tied to, and limited by, memory bandwidth. Running local models is the overwhelming use case that makes this spec important.

After the first token, an M3 Max will generate tokens ~47% faster than an M4 Pro, and an M4 Max ~37% faster than an M3 Max.
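To make the bandwidth-to-throughput link concrete, here is the usual back-of-the-envelope estimate as a small sketch. It assumes the decode step is fully memory-bound and that the entire weight set is streamed once per generated token; the 7 GB model size is an illustrative assumption, not a measurement.

```swift
// Rough decode-speed ceiling: every generated token streams the full weight
// set from memory, so tokens/sec ≈ memory bandwidth / model size in bytes.
let bandwidthGBps: [(chip: String, gbps: Double)] = [
    ("M4 Pro", 273),
    ("M3 Max", 400),
    ("M4 Max", 546),
]
let modelSizeGB = 7.0   // assumed: ~7B-parameter model with 8-bit weights

for (chip, gbps) in bandwidthGBps {
    let tokensPerSec = gbps / modelSizeGB
    print("\(chip): ~\(Int(tokensPerSec)) tok/s ceiling")
}
// 400/273 ≈ 1.47 and 546/400 ≈ 1.37, which is where the ~47% and ~37% figures come from.
```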
 

name99

macrumors 68020
Jun 21, 2004
2,407
2,308
I ran all three versions of the GB6 AI Benchmark, and the scores are vastly different for each version of the benchmark:

(MBP 14" M2 Max)

CPU:
4152 Single Precision Score
6702 Half Precision Score
5707 Quantized Score

GPU:
12632 Single Precision Score
13815 Half Precision Score
10733 Quantized Score

Neural Engine:
4132 Single Precision Score
20807 Half Precision Score
23059 Quantized Score
(a) You need to say what version of the OS you used. Performance has changed notably between macOS 15 and 15.1 (and perhaps also 15.2). Also give the version of GB6, since things change when it gets recompiled against the latest macOS. Basically, the software is changing fast enough that it's an important part of the score.

(b) The Neural Engine has ZERO support for FP32 (i.e. "single precision"). When you force macOS or iOS to run single precision on the Neural Engine, it falls back to the CPU (which means AMX/SME). That's why the ANE FP32 number is essentially the same as the CPU FP32 number.
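For anyone who wants to see that fallback in their own code, here is a minimal Core ML sketch (the model path is a placeholder, not a real file): requesting .cpuAndNeuralEngine is a preference, not a guarantee, and layers the ANE can't run (such as FP32 ops) are scheduled on the CPU instead.

```swift
import Foundation
import CoreML

// Minimal sketch: ask Core ML to prefer the Neural Engine.
// .cpuAndNeuralEngine is a request, not a guarantee; any layer the ANE
// can't execute (e.g. FP32 ops) is silently run on the CPU instead.
let config = MLModelConfiguration()
config.computeUnits = .cpuAndNeuralEngine   // alternatives: .all, .cpuOnly, .cpuAndGPU

// Placeholder path to a compiled model bundle (.mlmodelc).
let modelURL = URL(fileURLWithPath: "/path/to/SomeModel.mlmodelc")

do {
    let model = try MLModel(contentsOf: modelURL, configuration: config)
    print(model.modelDescription)
} catch {
    print("Failed to load model: \(error)")
}
```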
 

zarathu

macrumors 6502a
May 14, 2003
652
362
The Neural Engine isn't the only thing doing AI acceleration; the GPUs do a lot of it depending on the specific workload. That's where the newer generation sees a difference, especially moving up to the Pro and Max versions.

You'd have to see what the apps actually use; you can check for yourself whether an app is using the GPU or the NE with 'mactop' from Homebrew.
See my current update on Apple silicon. As to what this means, the only app that my M1 Pro does NOT run very quickly is Topaz 3. I will have to wait to see what the speed increase there is with Topaz. The only one doing those comparisons is ART IS RIGHT on YouTube.
 

Homy

macrumors 68030
Jan 14, 2006
2,502
2,450
Sweden
Thanks to its large unified memory, the M4 Max runs circles around the RTX 4090 and even the RTX 6000 Ada with the 72B Qwen LLM.

[Attached screenshot: Skärmavbild 2024-11-09 kl. 05.23.02.png]
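The unified-memory point is easy to sanity-check with rough numbers. The sketch below uses an assumed 8-bit quantization of the 72B model (not taken from the screenshot): the weights alone outgrow the discrete cards' VRAM, forcing offload to system RAM, while a high-memory M4 Max can keep the whole model GPU-accessible.

```swift
// Back-of-the-envelope: do the model weights fit in GPU-accessible memory?
// Assumption: Qwen 72B at 8-bit quantization ≈ 72e9 params × 1 byte ≈ 72 GB,
// with KV cache and runtime overhead on top of that.
let paramCount = 72.0e9
let bytesPerParam = 1.0                     // assumed 8-bit weights
let weightsGB = paramCount * bytesPerParam / 1e9

let memory: [(device: String, gb: Double)] = [
    ("RTX 4090", 24),                       // dedicated VRAM
    ("RTX 6000 Ada", 48),                   // dedicated VRAM
    ("M4 Max, 128 GB config", 128),         // unified memory, visible to the GPU
]

for (device, gb) in memory {
    let verdict = gb > weightsGB ? "holds the full model" : "must offload to system RAM"
    print("\(device) (\(Int(gb)) GB): \(verdict) for ~\(Int(weightsGB)) GB of weights")
}
```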


 