Another point to note is that the memory bandwidth of the M3 Max is already 400 GB/s, versus 273 GB/s for the M4 Pro. Most software doesn't use anywhere near even 200 GB/s, but one might assume it eventually will, and workloads that do will favor the M3 Max over the M4 Pro.
LLM inference speed is closely tied to, and limited by, memory bandwidth. Running local models is the overwhelming use case that makes this spec important.
After the first token, an M3 Max will generate tokens ~47% faster than an M4 Pro, and an M4 Max ~37% faster than an M3 Max.
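Those percentages fall straight out of the bandwidth ratios, because decode speed is roughly memory bandwidth divided by the bytes streamed per token (the model weights). A minimal sketch of that back-of-envelope math, assuming Apple's published bandwidth figures and a hypothetical ~4.5 GB quantized model (both illustrative, not benchmarks):

```python
# Rough upper bound on decode speed: each generated token streams
# (approximately) all model weights through memory once, so
# tokens/sec ~ bandwidth / model size. Real throughput is lower.

def est_tokens_per_sec(bandwidth_gb_s: float, model_size_gb: float) -> float:
    """Bandwidth-bound estimate of token generation speed."""
    return bandwidth_gb_s / model_size_gb

# Published memory bandwidth figures (GB/s)
chips = {"M4 Pro": 273, "M3 Max": 400, "M4 Max": 546}
model_gb = 4.5  # assumed: ~8B-parameter model, 4-bit quantized

for name, bw in chips.items():
    print(f"{name}: ~{est_tokens_per_sec(bw, model_gb):.0f} tok/s (upper bound)")

# The relative speedups depend only on the bandwidth ratios,
# which is where the ~47% and ~37% figures come from.
print(f"M3 Max over M4 Pro: {400 / 273 - 1:.1%}")
print(f"M4 Max over M3 Max: {546 / 400 - 1:.1%}")
```

The absolute tok/s numbers shift with model size and quantization, but the chip-to-chip ratios do not, which is why bandwidth alone predicts the relative speedups.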