Apple M1 CPU & GPU speed is very disappointing

cmaier · Apr 27, 2021

robco74 said:
I'm interested to see how M1 performs playing global thermonuclear war.

Do you want to play a game?

thedocbwarren · Apr 27, 2021

cmaier said:
Do you want to play a game?

It wins by not playing.

mi7chy · Apr 27, 2021

robco74 said:
I'm interested to see how M1 performs playing global thermonuclear war.

I've played that on the Apple II so it doesn't take much.

AlexChess · Apr 28, 2021

Hi!
Would it be possible to use the 16 Neural Engine cores of the M1 to speed up chess engine search?
(Sorry, this question may already have been answered; I haven't read all the previous comments.)

AlexChess

Gudi · Apr 28, 2021

Opening all applications at once is the kind of performance benchmark I trust.

BetaPro · Apr 29, 2021

Hi OP, your desktop CPU must have far more performance cores than the M1 has, which should explain the difference.

And for SF NNUE, it is indeed slower than its x86 counterparts due to weaker SIMD units. But when I ran the benchmark for the classical (non-NNUE) version of SF, the single-thread performance actually matched the strongest x86 CPUs.

Obviously it won't be able to compete with a 5950X, given far fewer cores and no SMT.

BetaPro · Apr 29, 2021

Also want to add that Stockfish is absolutely a good benchmark for CPUs. As a matter of fact, SPEC (both 2006 and 2017) contains Sjeng as the chess engine benchmark, which runs a similar algorithm to Stockfish.

cmaier · Apr 29, 2021

BetaPro said:
Also want to add that Stockfish is absolutely a good benchmark for CPUs. As a matter of fact, SPEC (both 2006 and 2017) contains Sjeng as the chess engine benchmark, which runs a similar algorithm to Stockfish.

You just disproved your point.

SPEC is a good benchmark. Chess is not. Chess is a good benchmark component. But most computing has a very different profile than chess algorithms. That’s why good benchmarks blend the results of many different kinds of computing activity.

leman · Apr 30, 2021

BetaPro said:
And for SF NNUE, it is indeed slower than its x86 counterparts due to weaker SIMD units. But when I ran the benchmark for the classical (non-NNUE) version of SF, the single-thread performance actually matched the strongest x86 CPUs.

Just to make it clear, M1’s SIMD units are not weaker than those of x86 CPUs. Yes, M1 only supports 128-bit SIMD, but it has four 128-bit units and they operate with low latency. Most modern x86 CPUs can do two 256-bit AVX2 operations per clock; M1 can do four 128-bit ones, which has the same net result (512 bits per clock either way). And if you use 128-bit operations (e.g. SSE), M1 will be faster.

Now, newest Intel cores do have two 512-bit units, so they can achieve higher throughput under certain conditions and with software specifically written for them.
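As a back-of-the-envelope sketch of the arithmetic above, peak SIMD width per clock is just the number of units times the vector width. The unit counts are the ones quoted in this post; the helper name is made up for illustration, and the model deliberately ignores clock speed, latency, and memory bandwidth, so treat it as a rough comparison rather than a performance prediction.

```python
# Rough model: peak SIMD bits processed per clock = SIMD units x vector width.
# Unit counts are taken from the post above; real throughput also depends on
# clock speed, instruction latency, and memory bandwidth, which are ignored here.

def simd_bits_per_clock(units: int, width_bits: int) -> int:
    """Peak number of vector bits a core can operate on in one clock."""
    return units * width_bits

configs = {
    "M1 Firestorm, 4 x 128-bit NEON": simd_bits_per_clock(4, 128),
    "x86, 2 x 256-bit AVX2": simd_bits_per_clock(2, 256),
    "x86, 2 x 512-bit AVX-512": simd_bits_per_clock(2, 512),
    # Assumption from the post: 128-bit (SSE) code only fills 128 bits
    # of each of the two x86 units.
    "x86, 2 units running 128-bit SSE": simd_bits_per_clock(2, 128),
}

for name, bits in configs.items():
    print(f"{name}: {bits} bits/clock")
```

Under these assumptions the M1 and a two-unit AVX2 core tie, AVX-512 doubles that, and 128-bit-only code favors the M1.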

I don’t know exactly how Stockfish uses SIMD, but if M1 doesn’t perform well there, my initial assumption would be that it’s simply not well optimized for ARM Neon. M1 usually does incredibly well on number-crunching benchmarks.

BetaPro · Apr 30, 2021

cmaier said:
You just disproved your point.

SPEC is a good benchmark. Chess is not. Chess is a good benchmark component. But most computing has a very different profile than chess algorithms. That’s why good benchmarks blend the results of many different kinds of computing activity.

If you define benchmarking like that (must have varied workloads), then I definitely agree with you. It's also why I hate that many x86 fans think that Cinebench is a good benchmark...

BetaPro · Apr 30, 2021

leman said:
Just to make it clear, M1’s SIMD units are not weaker than those of x86 CPUs. Yes, M1 only supports 128-bit SIMD, but it has four 128-bit units and they operate with low latency. Most modern x86 CPUs can do two 256-bit AVX2 operations per clock; M1 can do four 128-bit ones, which has the same net result (512 bits per clock either way). And if you use 128-bit operations (e.g. SSE), M1 will be faster.

Now, newest Intel cores do have two 512-bit units, so they can achieve higher throughput under certain conditions and with software specifically written for them.

I don’t know exactly how Stockfish uses SIMD, but if M1 doesn’t perform well there, my initial assumption would be that it’s simply not well optimized for ARM Neon. M1 usually does incredibly well on number-crunching benchmarks.

SIMD is a recent addition to Stockfish, and yeah, it probably isn't well optimized for Neon yet. I'm not certain whose SIMD is better, but the current observation is that M1 loses more performance than x86 after the switch to NNUE. Maybe I should look into optimizing it for M1.

crazy dave · Apr 30, 2021

BetaPro said:
If you define benchmarking like that (must have varied workloads), then I definitely agree with you. It's also why I hate that many x86 fans think that Cinebench is a good benchmark...

Yeah, it depends on *why* one is benchmarking: throw a bunch of varied workloads together to make a general statement about the CPU’s performance (as the OP tried to do, but with only one workload), or benchmark processes in a specific workflow (or close to it) that the user cares about because that’s what they use a computer for (closer to what the OP actually did).

Both are valid, but different. The latter is more immediately relevant but also brittle - a workflow may change over time and programs may get further optimized. The former is less specific to a specific user’s needs, but is more likely to be robust over time and different tasks.

That’s how I view the utility of benchmarking anyway.

BigMcGuire · Apr 30, 2021

The idea of being able to have a chess engine crunch my games without getting superheated is really attractive to me. I went 100% laptop in 2015, and that's something I've really missed from having a desktop: the ability to have a chess engine go 100% and watch it without frying myself.

I'm on Mac now too, but I haven't really had time to find a good chess engine interface. I miss the days of having Deep Fritz do massive chess engine tournaments like @AlexChess does.

It's been a long time. I used to have 20-30 chess engines go against each other.

Back in the day I had a computer on ICC with Winboard, even took some open source code (with the author's permission) and ran a custom engine on ICC for years.

This MacBook is the first laptop computer I'm not against pushing because of temps.

redshift27 · Apr 30, 2021

BetaPro said:
SIMD is a recent addition to Stockfish, and yeah, it probably isn't well optimized for Neon yet. I'm not certain whose SIMD is better, but the current observation is that M1 loses more performance than x86 after the switch to NNUE. Maybe I should look into optimizing it for M1.

Please do. Just head over to Stockfish on GitHub and have a look at the NEON-specific code. NNUE and the matrix calculations it requires are described in this excellent document: https://github.com/glinscott/nnue-pytorch/blob/f6a2e30d9393a7a8e62f0e3f8bfeecdf84b373c0/docs/nnue.md

BigMcGuire said:
I'm on Mac now too, but I haven't really had time to find a good chess engine interface.

One option for a good (paid) interface is to run ChessBase or Fritz as the GUI under Parallels / ARM Windows, which works pretty well, and use inBetween.exe and the UCI interface to connect to the Stockfish chess engine running natively on the M1 under macOS. Obviously, if you are striving for every CPU cycle there are inefficiencies here, but for simple analysis this can work very well.
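For anyone curious what a GUI is doing when it talks to the engine over UCI, here is a minimal sketch in Python. It uses only the standard UCI text commands (`uci`, `isready`, `position`, `go`, `quit`) and is not how ChessBase or inBetween.exe are implemented; the function names and the engine path are placeholders, so point it at wherever your own Stockfish binary lives.

```python
import subprocess
from typing import Optional

def parse_bestmove(line: str) -> Optional[str]:
    """Return the move from a UCI 'bestmove' line, or None for any other line."""
    parts = line.split()
    if len(parts) >= 2 and parts[0] == "bestmove":
        return parts[1]
    return None

def analyse(engine_path: str, depth: int = 12) -> Optional[str]:
    """Ask a UCI engine for its best move from the starting position.

    engine_path is a placeholder: pass the path to your own engine binary.
    """
    proc = subprocess.Popen(
        [engine_path],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        text=True,
        bufsize=1,
    )

    def send(command: str) -> None:
        proc.stdin.write(command + "\n")
        proc.stdin.flush()

    def wait_for(token: str) -> str:
        # Read engine output line by line until one starts with the token.
        for line in proc.stdout:
            if line.strip().startswith(token):
                return line.strip()
        raise RuntimeError(f"engine exited before sending {token!r}")

    try:
        send("uci")
        wait_for("uciok")        # engine has identified itself
        send("isready")
        wait_for("readyok")      # engine is ready for commands
        send("position startpos")
        send(f"go depth {depth}")
        best = parse_bestmove(wait_for("bestmove"))
        send("quit")
        return best
    finally:
        # Make sure no engine process is left running.
        if proc.poll() is None:
            proc.kill()
```

Calling `analyse("/path/to/stockfish")` with a real path should return a move in coordinate notation. A bridge like inBetween essentially relays this same text stream between the Windows GUI and the native engine.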

Leela (lc0) struggles compared to Stockfish on M1 hardware as it thrives on a powerful GPU which Apple Silicon does not offer. CoreML and the Apple Neural Engine do not currently offer a satisfactory solution to this.

mi7chy · Apr 30, 2021

AMD Zen 4 will get 5nm, AVX512, DDR5, etc. so it'll be even faster. M1 only looks good compared to Intel but has too many trade-offs compared to AMD.

jdb8167 · Apr 30, 2021

mi7chy said:
AMD Zen 4 will get 5nm, AVX512, DDR5, etc. so it'll be even faster. M1 only looks good compared to Intel but too many trade offs compared to AMD.

Apple is rumored to be already fabbing the successor to the M1. Should be available by summer. When does AMD start doing 5 nm?

leman · Apr 30, 2021

mi7chy said:
AMD Zen 4 will get 5nm, AVX512, DDR5, etc. so it'll be even faster. M1 only looks good compared to Intel but too many trade offs compared to AMD.

Firestorm offers the same performance as the fastest Zen 3 configuration at 1/4 the power consumption. AMD's main selling point is that they offer more CPU cores in their budget models.

dmccloud · Apr 30, 2021

mi7chy said:
Funny how people selectively preach dorkbench when M1 does well but downplay a more relevant real world workload when M1 doesn't. Chess has been a relevant workload and benchmark going back to IBM Deep Blue to current DeepMind AlphaZero that topple all the grandmaster human players.

There was maybe one intelligent response trying to understand what the bottleneck is.

A benchmark written to take advantage of Intel-specific instructions that has not been optimized for the M1 (and does not even take full advantage of the M1's capabilities) is a poor measuring stick regardless of whether chess is a "relevant workload" or not. For me, that is meaningless since I don't play chess and have no desire to start. What I DO use the computer for is web development, graphic design, Photoshop, web browsing and some gaming, and the M1 is more than up to the task for all of those purposes and more. What's funny is how people selectively preach benchmarks that can't even properly utilize the M1 as somehow being relevant.

dmccloud · Apr 30, 2021

mi7chy said:
AMD Zen 4 will get 5nm, AVX512, DDR5, etc. so it'll be even faster. M1 only looks good compared to Intel but too many trade offs compared to AMD.

AMD can't move to 5nm until Apple moves to 3nm, since Apple booked up all 5nm capacity for the foreseeable future and also has preregistered for 3nm.

mi7chy · Apr 30, 2021

dmccloud said:
AMD can't move to 5nm until Apple moves to 3nm, since Apple booked up all 5nm capacity for the foreseeable future and also has preregistered for 3nm.

That's BS along the lines of 8GB on M1 equals 16GB, M1 iGPU is zero copy buffer and x64 is not, M1 iGPU equal high end discrete graphics, etc. Reality is AMD makes CPUs and GPUs for the 90%+ PC marketshare and only 7nm fab can provide that capacity. Nothing stopping AMD from allocating 5nm or next gen node capacity at TSMC but less of a need since their current 7nm surpasses 5nm M1 performance.

hagjohn · Apr 30, 2021

cmaier said:
Do you want to play a game?

Ever since I saw WarGames in 1983, my default answer has always been no.

I think M1 has been pretty successful for Apple's first desktop CPU. I have an M1 Mini and I love mine. I am not disappointed in the least.

leman · Apr 30, 2021

mi7chy said:
Reality is AMD makes CPUs and GPUs for the 90%+ PC marketshare and only 7nm fab can provide that capacity.

This is an excellent point that many posters here tend to neglect. Supply constraints are real. Apple can only do what it does because it targets a specific market niche.

mi7chy said:
Nothing stopping AMD from allocating 5nm or next gen node capacity at TSMC but less of a need since their current 7nm surpasses 5nm M1 performance.

This is more debatable. Yes, AMD's premium desktop CPUs are obviously faster than the entry-level, low-power M1 (duh). But those enthusiast-class CPUs are also expensive and in short supply. I’m also not sure that booking 5nm capacity is that simple: Apple did buy almost all of it to produce iPhone chips, and they have a higher budget for fab expenses than AMD. There is a reason why all other 5nm products on the market currently are from Samsung fabs.

Jorbanead · Apr 30, 2021

mi7chy said:
their current 7nm surpasses 5nm M1 performance.

AMD 7nm and M1 are not comparable. M1 outperforms AMD when you look at performance per watt. M1 is a 10-15 watt chip. Can you show me a 10-15W AMD chip that outperforms M1 in single-thread performance?

thedocbwarren · Apr 30, 2021

Jorbanead said:
AMD 7nm and M1 are not comparable. M1 outperforms AMD when you look at performance per watt. M1 is a 10-15 watt chip. Can you show me a 10-15W AMD chip that outperforms M1 in single-thread performance?

And even if they were, Apple isn't using them.

Gnattu · Apr 30, 2021

mi7chy said:
Reality is AMD makes CPUs and GPUs for the 90%+ PC marketshare

If you call others’ statements BS, then this is also BS.

By the way, outperforming by (more than) doubling the power is not an architectural advantage, period.

You can deny M1’s performance if you want, and I do agree AMD has an extremely solid offering. I’m using a 5900X and a 3900X and they are fast, but if you let me choose which chip I want to put in a sub-4-pound laptop, I’d choose the Apple M1. The 5800U is great, but it does use more power. I’m glad to help you find out if this statement is true.
