Apple M1 CPU & GPU speed is very disappointing

leman · Dec 2, 2021

Leifi said:
I think most people expected the M1, M1 Max, etc. to perform closer to its "hype-factor" from fanboys running native compiled versions of these kinds of high-performance apps.

Which they do, yes. You still have to develop software for them though.

Leifi said:
For people who just use things like Numbers, Pages and Safari apps this is of course completely irrelevant, but if you are into chess it's sad that Apple currently underperforms big-time!

You still don’t get it, do you? The reason for poor performance of chess engines is primarily lack of software support. When you get some motivated devs with native hardware to improve and tune the code, the performance will undoubtedly improve. You also need to understand that high-performance ARM desktops are something entirely new. People have been writing and optimizing low-level software for x86 for decades. Skill in low-level optimization for ARM is much less common, and the software people have been writing for ARM-based phones is usually far less complex.

The bottom line here is yes, Apple Silicon is currently a poor choice for chess, due to lack of mature software. Other domains where you actually have mature software, such as content creation, software development, and certain data analysis/stats workloads, show excellent performance.

Taz Mangus · Dec 2, 2021

Let's summarize this thread. The truth was finally revealed after more than 20 pages. For anyone who cares, the reason why the OBSCURE Stockfish chess game benchmark runs like crap on the M1 Apple silicon is that the code is highly optimized for x86 and ARM Linux but not for Apple Silicon. Might as well have written a forever-loop: both will run down the battery and neither will give any meaningful results on the M1.

The sad part is the ignorance that was being presented as the truth, which is very disingenuous. It is easy to blame what you don't understand.

The title of this thread should have been something like:

Need help understanding why the StockFish chess game benchmark runs so poorly on the Apple M1 CPU & GPU.

Leifi · Dec 2, 2021

leman said:
You still don’t get it, do you? The reason for poor performance of chess engines is primarily lack of software support. When you get some motivated devs with native hardware to improve and tune the code, the performance will undoubtedly improve.

You state this as if it were an indisputable fact. It is not a fact if you cannot provide any solid proof that optimized M1 chess-engine code can outperform similarly optimized 5800H x86, AVX2, or AVX-512 code. Period. I personally don't believe there is a snowball's chance in hell, and no current benchmarks at hand indicate this. As long as no skilled developers using M1 are even willing to put their money where their heart is, this is all guesswork on your side in terms of possible performance improvement potential.

Let's say you could optimize SF code for an M1 by 50% (which I think is extremely over-optimistic): you would still be far behind the pack compared to AMD mobile CPUs, and suggesting you could "optimize" to beat a CUDA version on a 3060 for the inference of large NNs is just moronic at best.

Leifi · Dec 2, 2021

Taz Mangus said:
... runs like crap on the M1 Apple silicon is that the code is highly optimized for x86 and ARM Linux but not for Apple Silicon.

The more likely explanation is...

When comparing M1 using programs that have been highly optimized for various CPU architectures, Apple silicon does seem to fall pretty flat. The M1 only seems to shine when highly optimized M1 code is compared to non-optimized, generic x86 code that does not take full advantage of all cores, the architecture, and the special instructions available for these CPUs. If both are optimized, the M1 always seems to kinda s*ck, sadly... I would be very happy if it didn't.

JMacHack · Dec 2, 2021

Leifi said:
You state this as if it were an indisputable fact. It is not a fact if you cannot provide any solid proof that optimized M1 chess-engine code can outperform similarly optimized 5800H x86, AVX2, or AVX-512 code. Period. I personally don't believe there is a snowball's chance in hell, and no current benchmarks at hand indicate this. As long as no skilled developers using M1 are even willing to put their money where their heart is, this is all guesswork on your side in terms of possible performance improvement potential.

Let's say you could optimize SF code for an M1 by 50% (which I think is extremely over-optimistic): you would still be far behind the pack compared to AMD mobile CPUs, and suggesting you could "optimize" to beat a CUDA version on a 3060 for the inference of large NNs is just moronic at best.

If we’re throwing out anything speculative, then I’d ignore your assertion that the M1, which compares favorably in every single other metric to its competition, is somehow naturally an order of magnitude inferior because of some inherent flaw.

Leifi said:
The more likely explanation is...

When comparing M1 with programs that have been highly optimized for various CPU architectures, Apple silicon falls down pretty flat. The M1 only shines when comparing highly optimized M1-code compared to nonoptimized, generic x86 code not taking full advantage of all cores and that architecture, and special instructions. If both are optimized M1 kinda sucks, sadly... I would be very happy if it didn't.

This fails on basic ****ing logic. A CPU architecture that has been out for a single year somehow has nearly all other benchmarks except Stockfish chess better optimized than x86, which has been the only desktop architecture for a decade and a half?

Leifi · Dec 2, 2021

JMacHack said:
If we’re throwing out anything speculative, then I’d ignore your assertion that the M1, which compares favorably in every single other metric to its competition

Well that's just a blatant lie.. It doesn't compare favorably in every other benchmark. No.

Which benchmark with a load similar to a chess engine's, running x86-optimized multi-threaded code, can you point to where the M1 beats, for example, a Ryzen 5900? Please just refer us to that benchmark instead of pulling "Ars Technica"-style flawed logic on us.

ingambe · Dec 2, 2021

I’m not 100% sure, but doesn’t Stockfish use an alpha-beta algorithm under the hood?
If so, it might be very easy to test it on a simpler code base without architecture-specific optimization to see whether the M1 architecture is the bottleneck for some reason or whether it’s software optimization.

But if it’s alpha-beta, I would be very surprised. Just an educated guess, but the huge cache + very high memory bandwidth should help a lot here.
I’m working on an event-based simulation and the M1 shines: we get 3 times better performance on a 13” M1 compared to an i7-8000-series laptop.
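
For anyone who wants to try the "simpler code base" idea, here is a minimal sketch of generic alpha-beta pruning next to plain minimax. It is not Stockfish's search (which adds many refinements plus a neural-network evaluation); the toy tree `TREE` and the node counters are made up purely for illustration. Being pure Python with no SIMD or architecture-specific paths, the same script could in principle be timed on both machines as a rough sanity check.

```python
from typing import List, Union

# A tree is either a leaf score (int) or a list of child subtrees.
Tree = Union[int, List["Tree"]]


def minimax(node: Tree, maximizing: bool, counter: List[int]) -> int:
    """Plain minimax: visits every node of the tree."""
    counter[0] += 1
    if isinstance(node, int):
        return node
    values = [minimax(child, not maximizing, counter) for child in node]
    return max(values) if maximizing else min(values)


def alphabeta(node: Tree, maximizing: bool, alpha: float, beta: float,
              counter: List[int]) -> float:
    """Alpha-beta: same value as minimax, but skips branches that cannot matter."""
    counter[0] += 1
    if isinstance(node, int):
        return node
    if maximizing:
        best = float("-inf")
        for child in node:
            best = max(best, alphabeta(child, False, alpha, beta, counter))
            alpha = max(alpha, best)
            if alpha >= beta:
                break  # the minimizing parent already has a better option
        return best
    best = float("inf")
    for child in node:
        best = min(best, alphabeta(child, True, alpha, beta, counter))
        beta = min(beta, best)
        if alpha >= beta:
            break  # the maximizing parent already has a better option
    return best


# Hypothetical toy tree, depth 3, scores from the root player's point of view.
TREE: Tree = [[[3, 5], [6, 9]], [[1, 2], [0, -1]]]

mm_nodes, ab_nodes = [0], [0]
mm_value = minimax(TREE, True, mm_nodes)
ab_value = alphabeta(TREE, True, float("-inf"), float("inf"), ab_nodes)
print("minimax:", mm_value, "nodes visited:", mm_nodes[0])
print("alpha-beta:", ab_value, "nodes visited:", ab_nodes[0])
```

Both searches should agree on the root value, with alpha-beta visiting fewer nodes. A real test would need a much larger tree and a proper timing harness.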

JMacHack · Dec 2, 2021

Leifi said:
Well that's just a blatant lie.. It doesn't compare favorably in every other benchmark. No.

Which benchmark with a load similar to a chess engine's, running x86-optimized multi-threaded code, can you point to where the M1 beats, for example, a Ryzen 5900? Please just refer us to that benchmark instead of pulling "Ars Technica"-style flawed logic on us.

AnandTech has a comprehensive list of benchmarks in which the M1 (non-Pro or Max) performed quite favorably against its x86 rivals, even some running through Rosetta. (As has been posted before in this thread.)

The 2020 Mac Mini Unleashed: Putting Apple Silicon M1 To The Test

www.anandtech.com

Now, to prevent shifting of goalposts, let’s examine your posts:

Leifi said:
The more likely explanation is...

When comparing M1 using programs that have been highly optimized for various CPU architectures, Apple silicon does seem to fall pretty flat. The M1 only seems to shine when highly optimized M1 code is compared to non-optimized, generic x86 code that does not take full advantage of all cores, the architecture, and the special instructions available for these CPUs. If both are optimized, the M1 always seems to kinda s*ck, sadly... I would be very happy if it didn't.

This post asserts: the M1 performs well in only certain highly optimized tests.
Which is wrong.

The post quoted at the top, which has since been deleted (lol), shifts the goalposts to “chess benchmarks”, naturally to narrow the goal to the benchmark shown at the beginning of this thread, the Stockfish chess benchmark. I won’t even comment on this because multiple people have given intelligent and detailed answers as to why this specific benchmark shows the M1 underperforming (which you ignore).

And to top it off, you’ve ignored any argument that uses basic logic without technical arguments that goes against your original argument. Which was that the M1, somehow, contrary to all evidence, contrary to the experience of users and reviewers, only performs in specific optimized tasks.

In fact, the opposite is true. The M1 underperforms in this single chess benchmark, which apparently has eluded everyone across the tech press and anyone who has bought an M1, except of course, you and the original poster of this thread.

JimmyjamesEU · Dec 2, 2021

Leifi said:
Well that's just a blatant lie.. It doesn't compare favorably in every other benchmark. No.

What is the proof that this is a lie?

JMacHack · Dec 2, 2021

JimmyjamesEU said:
What is the proof that this is a lie?

Stockfish Chess Benchmark. All other benchmarks clearly are optimized for Apple Silicon and therefore cannot be trusted.

JimmyjamesEU · Dec 2, 2021

JMacHack said:
Stockfish Chess Benchmark. All other benchmarks clearly are optimized for Apple Silicon and therefore cannot be trusted.

Accusing someone of lying seems unfair. The other options are: the accused is correct; the accused is unaware of other evidence.

As it stands, the overwhelming number of tests I’ve seen back up the idea that the M1 is extremely competitive. It’s outrageous to suggest that’s a lie.

leman · Dec 2, 2021

Leifi said:
You state this as if it were an indisputable fact.

Of course I am not, it’s just common sense. I mean, you are comparing fine-tuned, detailed AVX2 code to early, suboptimal NEON code (as far as I can tell there is only one instance where Stockfish uses NEON and yes, that code is probably suboptimal).

Leifi said:
It is not a fact if you cannot provide any solid proof that optimized M1 chess-engine code can outperform similarly optimized 5800H x86, AVX2, or AVX-512 code.

How am I supposed to prove it? You are basically asking me to write an optimal implementation. I am not going to waste my time on that unless I am compensated. We already established that.

Leifi said:
As long as no skilled developers using M1 are even willing to put their money where their heart is, this is all guesswork from your side in terms of possible performance improvement potential.

Just like from yours. Of course, on my side we have plenty of micro benchmarks that show the M1’s excellent performance.

Leifi said:
Let's say you could optimize SF code for an M1 by 50% (which I think is extremely over-optimistic): you would still be far behind the pack compared to AMD mobile CPUs, and suggesting you could "optimize" to beat a CUDA version on a 3060 for the inference of large NNs is just moronic at best.

Who is talking about beating a 3060? Where did you get that from? But yeah, I have little doubt that the M1 will beat Zen 3 on properly optimized code, both core for core and in perf per watt.

JMacHack · Dec 2, 2021

JimmyjamesEU said:
Accusing someone of lying seems unfair. The other options are: the accused is correct; the accused is unaware of other evidence.

As it stands, the overwhelming number of tests I’ve seen back up the idea that the M1 is extremely competitive. It’s outrageous to suggest that’s a lie.

I’m just pointing out the assertion in question, which is:
“The M1 only performs favorably in specific optimized workloads, in all else it is inferior to x86.”

The assertion is based on the evidence of the Stockfish Chess Benchmarks, in which Apple Silicon performs an order of magnitude below comparable x86 cpus.

Other, much more intelligent people than I have examined why this is the case, and determined that the benchmark itself performs abnormally slowly on Apple Silicon, due to specific optimizations in favor of x86 and no similar optimizations for Apple Silicon.

While some might use this as evidence for the original claim, that the M1 is inferior when no optimizations are given, the fact that the tech press and users almost unanimously praise the performance of Apple Silicon, and that in every other benchmark Apple Silicon performs favorably, points towards the benchmark being the issue.

That is, unless there’s a grand conspiracy to make Apple Silicon seem more performant than x86, and everyone has been duped or is part of said conspiracy.

Leifi · Dec 2, 2021

JimmyjamesEU said:
What is the proof that this is a lie?

The proof is already posted earlier in this thread, with a bunch of benchmarks in which the M1 gets beaten.

The claim was that "M1 compares favorably in every single other metric to its competition" -....

I can give you an additional one..

The geometric means of all openbenchmarking.org benchmarks between AMD 4800U and Apple M1
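
For reference, a geometric mean over per-test ratios can be computed as in the sketch below. The ratios are invented placeholders, not openbenchmarking.org results, and `geometric_mean_ratio` is a hypothetical helper name. The sketch only illustrates how the aggregate is formed, including how much a single very low ratio moves it.

```python
import math
from typing import Dict


def geometric_mean_ratio(ratios: Dict[str, float]) -> float:
    """Geometric mean of per-test performance ratios (chip A / chip B)."""
    if not ratios:
        raise ValueError("need at least one ratio")
    if any(r <= 0 for r in ratios.values()):
        raise ValueError("ratios must be positive")
    log_sum = sum(math.log(r) for r in ratios.values())
    return math.exp(log_sum / len(ratios))


# Made-up ratios for illustration only: >1 means chip A is faster.
balanced = {"test_a": 2.0, "test_b": 0.5, "test_c": 1.0}
with_outlier = dict(balanced, test_d=0.1)  # one test where chip A is 10x slower

print("balanced:", geometric_mean_ratio(balanced))
print("with one outlier:", geometric_mean_ratio(with_outlier))
```

The takeaway is that the aggregate depends heavily on which tests are included, so it is worth checking the individual results behind any single summary number.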

JimmyjamesEU · Dec 2, 2021

JMacHack said:
I’m just pointing out the assertion in question, which is:
“The M1 only performs favorably in specific optimized workloads, in all else it is inferior to x86.”

The assertion is based on the evidence of the Stockfish Chess Benchmarks, in which Apple Silicon performs an order of magnitude below comparable x86 cpus.

Other, much more intelligent people than I have examined why this is the case, and determined that the benchmark itself performs abnormally slowly on Apple Silicon, due to specific optimizations in favor of x86 and no similar optimizations for Apple Silicon.

While some might use this as evidence for the original claim, that the M1 is inferior when no optimizations are given, the fact that the tech press and users almost unanimously praise the performance of Apple Silicon, and that in every other benchmark Apple Silicon performs favorably, points towards the benchmark being the issue.

That is, unless there’s a grand conspiracy to make Apple Silicon seem more performant than x86, and everyone has been duped or is part of said conspiracy.

I understand your point, and agree.

It seems perfectly clear that, given the overwhelmingly positive results on other tests and benchmarks, the onus is entirely on the person claiming the extraordinary. That means the person claiming chess optimisation is a special case has to prove their case. No one else is obliged to prove anything.

JimmyjamesEU · Dec 2, 2021

Leifi said:
The proof is already posted earlier in this thread, with a bunch of benchmarks in which the M1 gets beaten.

The claim was that "M1 compares favorably in every single other metric to its competition" -....

I can give you an additional one..

The geometric means of all openbenchmarking.org benchmarks between AMD 4800U and Apple M1

View attachment 1922044

You claimed it was a lie. I’m asking you to substantiate your claim that the person you accuse is deliberately misleading. Please show your proof.

JimmyjamesEU · Dec 2, 2021

Leifi said:
The proof is already posted earlier in this thread, with a bunch of benchmarks in which the M1 gets beaten.

The claim was that "M1 compares favorably in every single other metric to its competition" -....

I can give you an additional one..

The geometric means of all openbenchmarking.org benchmarks between AMD 4800U and Apple M1

View attachment 1922044

You might want to investigate why few reputable people cite openbenchmarking.org or Phoronix. Most of those who do are Linux enthusiasts.

bcortens · Dec 2, 2021

Leifi said:
You state this as if it were an indisputable fact. It is not a fact if you cannot provide any solid proof that optimized M1 chess-engine code can outperform similarly optimized 5800H x86, AVX2, or AVX-512 code. Period. I personally don't believe there is a snowball's chance in hell, and no current benchmarks at hand indicate this. As long as no skilled developers using M1 are even willing to put their money where their heart is, this is all guesswork on your side in terms of possible performance improvement potential.

Let's say you could optimize SF code for an M1 by 50% (which I think is extremely over-optimistic): you would still be far behind the pack compared to AMD mobile CPUs, and suggesting you could "optimize" to beat a CUDA version on a 3060 for the inference of large NNs is just moronic at best.

As someone who bothered to look at the SF code: the AVX code path is declared, right in the code, to be the newest, most optimized path. It also looks like it does some work to try to dispatch multiple vectors when multiple units are present, but I haven't checked this so I may be wrong. This is why I conclude that you are comparing an optimized code path to an unoptimized one. I suspect that SF doesn't use more than 1 of the M1's NEON engines (but I'd have to do more work than I have time for to be sure). If this is true, then SF is potentially 4x slower at vector processing on the M1 than the M1 is capable of.

bcortens · Dec 2, 2021

Leifi said:
The proof is already posted earlier in this thread, with a bunch of benchmarks in which the M1 gets beaten.

The claim was that "M1 compares favorably in every single other metric to its competition" -....

I can give you an additional one..

The geometric means of all openbenchmarking.org benchmarks between AMD 4800U and Apple M1

View attachment 1922044

When I started looking at those benchmarks, I found that most of them are really not optimized for the Mac or the M1, and some don't even have proper ARM support. Despite that, the M1 is doing pretty respectably, considering it only has 4 P-cores to the 4800U's 8.

bcortens · Dec 2, 2021

JMacHack said:
AnandTech has a comprehensive list of benchmarks in which the M1 (non-Pro or Max) performed quite favorably against its x86 rivals, even some running through Rosetta. (As has been posted before in this thread.)

The 2020 Mac Mini Unleashed: Putting Apple Silicon M1 To The Test

www.anandtech.com

Now, to prevent shifting of goalposts, let’s examine your posts:

This post asserts: the M1 performs well in only certain highly optimized tests.
Which is wrong.

The post quoted at the top, which has since been deleted (lol), shifts the goalposts to “chess benchmarks”, naturally to narrow the goal to the benchmark shown at the beginning of this thread, the Stockfish chess benchmark. I won’t even comment on this because multiple people have given intelligent and detailed answers as to why this specific benchmark shows the M1 underperforming (which you ignore).

And to top it off, you’ve ignored any argument that uses basic logic without technical arguments that goes against your original argument. Which was that the M1, somehow, contrary to all evidence, contrary to the experience of users and reviewers, only performs in specific optimized tasks.

In fact, the opposite is true. The M1 underperforms in this single chess benchmark, which apparently has eluded everyone across the tech press and anyone who has bought an M1, except of course, you and the original poster of this thread.

I hope your lunch of glue and crayons is delicious.

It would seem that Leifi only respects open benchmarks that are basically unmaintained on Apple Silicon; he considers these a fair evaluation for some reason...

leman · Dec 2, 2021

Leifi said:
The proof is already posted earlier in this thread, with a bunch of benchmarks in which the M1 gets beaten.

Your “bunch of benchmarks” are fundamentally flawed, which has been pointed out to you again and again. The Phoronix test suite has its uses, but it’s not a representative suite and the testing has not been done in a systematic fashion. Your result is based on a handful of scattered benchmarks using oddball tools, most of which are not optimized for the platform. This is not how you test the performance of a platform.

leman · Dec 2, 2021

bcortens said:
It would seem that Leifi only respects open benchmarks that are basically unmaintained on Apple Silicon; he considers these a fair evaluation for some reason...

That’s the thing. Nobody uses open benchmark. And the Phoronix test suite is a random collection of things mostly geared towards showcasing Linux performance. The concept is great - collect as many tools as possible to have a better overview - but if you do that you also need to consider the maturity of the software. In all Phoronix tests, macOS usually loses big time, because it’s being dragged down by all the terribly performing OpenCL tools nobody cares about.

JimmyjamesEU · Dec 2, 2021

leman said:
That’s the thing. Nobody uses open benchmark. And the Phoronix test suite is a random collection of things mostly geared towards showcasing Linux performance. The concept is great - collect as many tools as possible to have a better overview - but if you do that you also need to consider the maturity of the software. In all Phoronix tests, macOS usually loses big time, because it’s being dragged down by all the terribly performing OpenCL tools nobody cares about.

Plus Phoronix is a joke in the Linux community. It's not seen as a bastion of benchmarking.

jeanlain · Dec 2, 2021

I'm not sure if they were part of open benchmark, but I remember some random algorithms tested by Phoronix on which the M1 performed much better under Rosetta than using native ARM code, because this code was just not optimised at all: no NEON, nothing. Rosetta did a better job at translating SIMD x86 code.

EDIT: I can't find these results. Maybe I misinterpreted them, but I remember they were discussed here or elsewhere.

Taz Mangus · Dec 2, 2021

Leifi said:
The more likely explanation is...

When comparing M1 using programs that have been highly optimized for various CPU architectures, Apple silicon does seem to fall pretty flat. The M1 only seems to shine when highly optimized M1 code is compared to non-optimized, generic x86 code that does not take full advantage of all cores, the architecture, and the special instructions available for these CPUs. If both are optimized, the M1 always seems to kinda s*ck, sadly... I would be very happy if it didn't.

Your ignorance is showing in how you are posting. You are being disingenuous and obtuse. It has already been established that the SF TensorFlow backend is not using the Metal API, which means that the more accurate statement would be:

When comparing M1 using the SF code, which has been highly optimized for x86 and ARM Linux and not for the Apple M1, the SF code falls flat on its face on the M1.
