Status
Not open for further replies.

leman

macrumors Core
Oct 14, 2008
19,521
19,678
I think most people expected the M1, M1 Max, etc. to perform closer to its "hype-factor" from fanboys running native compiled versions of these kinds of high-performance apps.

Which they do, yes. You still have to develop software for them though.

For people who just use things like Numbers, Pages and Safari apps this is of course completely irrelevant, but if you are into chess it's sad that Apple currently underperforms big-time!

You still don’t get it, do you? The reason for poor performance of chess engines is primarily lack of software support. When you get some motivated devs with native hardware to improve and tune the code, the performance will undoubtedly improve. You also need to understand that high performance ARM desktops is something entirely new. People have been writing and optimizing low-level software for x86 for decades. Skill in optimizing for low-level ARM is much less common and the software people have been writing for ARM-based phones is usually far less complex.

The bottom line here is yes, Apple Silicon is currently a poor choice for chess, due to lack of mature software. Other domains, such as content creation, software development, certain data analysts/stats workloads, where you actually have mature software, show excellent performance.
 

Taz Mangus

macrumors 604
Mar 10, 2011
7,815
3,504
Let's summarize this thread. The truth was finally revealed after more than 20 pages, in this thread. For anyone who cares, the reason why the OBSCURE StockFish chess game benchmark runs like crap on the M1 Apple silicon, is because the code is highly optimized for X86 and ARM Linux but not for Apple Silicon. Might as well have written a forever-loop, both will run down the battery and both will not give any meaningful results on the M1.

The sad part is the ignorance that was being perpetrated as the truth, very disingenuous. Easy to blame what you don't understand.

The title of this thread should have been something like:

Need help understanding why the StockFish chess game benchmark runs so poorly on the Apple M1 CPU & GPU.​

 
Last edited:

Leifi

macrumors regular
Nov 6, 2021
128
121
You still don’t get it, do you? The reason for poor performance of chess engines is primarily lack of software support. When you get some motivated devs with native hardware to improve and tune the code, the performance will undoubtedly improve.

You state this as if it were an indisputable fact. It is not a fact if you cannot provide any solid proof that optimized M1 chess-engine code can outperform similarly optimized x86, AVX2, or AVX-512 code on a 5800H. Period. I personally don't believe there is a snowball's chance in hell, and no current benchmarks at hand indicate this. As long as no skilled developers using M1 are even willing to put their money where their heart is, this is all guesswork on your side in terms of possible performance improvement potential.

Let's say you could optimize SF code for an M1 by 50% (which I think is extremely over-optimistic): you would still be far back in the pack compared to AMD mobile CPUs, and suggesting you could "optimize" to beat a CUDA version on a 3060 for the inference of large NNs is just moronic at best.
 
  • Like
Reactions: Appletoni

Leifi

macrumors regular
Nov 6, 2021
128
121
... runs like crap on the M1 Apple silicon, is because the code is highly optimized for X86 and ARM Linux but not for Apple Silicon.

The more likely explanation is...

When comparing M1 using programs that have been highly optimized for various CPU architectures, Apple silicon does seem to fall pretty flat. The M1 only seems to shine when highly optimized M1 code is compared with non-optimized, generic x86 code that does not take full advantage of all cores, the architecture, and the available special instructions for these CPUs. If both are optimized, the M1 always seems to kinda s*ck, sadly... I would be very happy if it didn't.
 
Last edited:

JMacHack

Suspended
Mar 16, 2017
1,965
2,424
You state this as if it were an indisputable fact. It is not a fact if you cannot provide any solid proof that optimized M1 chess-engine code can outperform similarly optimized x86, AVX2, or AVX-512 code on a 5800H. Period. I personally don't believe there is a snowball's chance in hell, and no current benchmarks at hand indicate this. As long as no skilled developers using M1 are even willing to put their money where their heart is, this is all guesswork on your side in terms of possible performance improvement potential.

Let's say you could optimize SF code for an M1 by 50% (which I think is extremely over-optimistic): you would still be far back in the pack compared to AMD mobile CPUs, and suggesting you could "optimize" to beat a CUDA version on a 3060 for the inference of large NNs is just moronic at best.
If we’re throwing out anything speculative, then I’d ignore your assertion that the M1, which compares favorably in every single other metric to its competition, is somehow naturally an order of magnitude inferior because of some inherent flaw.

The more likely explanation is...

When comparing M1 with programs that have been highly optimized for various CPU architectures, Apple silicon falls down pretty flat. The M1 only shines when comparing highly optimized M1-code compared to nonoptimized, generic x86 code not taking full advantage of all cores and that architecture, and special instructions. If both are optimized M1 kinda sucks, sadly... I would be very happy if it didn't.
This fails on basic ****ing logic. A CPU architecture that has been out for a single year somehow has nearly all other benchmarks except Stockfish chess better optimized than x86, which has been the only desktop architecture for a decade and a half?
 
Last edited by a moderator:

Leifi

macrumors regular
Nov 6, 2021
128
121
If we’re throwing out anything speculative, then I’d ignore your assertion that the M1, which compares favorably in every single other metric to its competition

Well that's just a blatant lie.. It doesn't compare favorably in every other benchmark. No.

Which benchmark with a load similar to a chess engine's, using x86-optimized multi-threaded code, can you point to where the M1 beats, for example, a Ryzen 5900? Please just refer us to that benchmark instead of pulling an "ars-Technica" flawed logic on us :)
 
Last edited:
  • Like
Reactions: Appletoni

ingambe

macrumors 6502
Mar 22, 2020
320
355
I'm not 100% sure, but doesn't Stockfish use an alpha-beta algorithm under the hood?
If so, it might be very easy to test it on a simpler code base without architecture-specific optimization, to see whether the M1 architecture is the bottleneck for some reason or whether it's software optimization.

But if it's alpha-beta, I would be very surprised. Just an educated guess, but the huge cache plus very high memory bandwidth should help a lot here.
I'm working on an event-based simulation and the M1 shines: we get 3 times better performance on a 13" M1 compared to an i7-8000 series laptop.
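
A minimal sketch of the test suggested above, assuming nothing about Stockfish's actual code: a plain alpha-beta search over a synthetic game tree, with no architecture-specific optimization, that reports nodes searched per second. The names `leaf_value`, `alphabeta`, and `bench`, and the toy evaluation function, are hypothetical and chosen only for illustration.

```python
import time


def leaf_value(path):
    # Deterministic toy score derived from the move path.
    # Hypothetical stand-in for a real position evaluation.
    h = 0
    for move in path:
        h = (h * 31 + move + 7) % 1009
    return h - 504


def alphabeta(path, depth, alpha, beta, maximizing, branching, stats):
    # Plain alpha-beta search; stats["nodes"] counts every visited node.
    stats["nodes"] += 1
    if depth == 0:
        return leaf_value(path)
    if maximizing:
        best = float("-inf")
        for move in range(branching):
            score = alphabeta(path + (move,), depth - 1, alpha, beta,
                              False, branching, stats)
            best = max(best, score)
            alpha = max(alpha, best)
            if alpha >= beta:
                break  # cutoff: the minimizing parent will avoid this line
        return best
    best = float("inf")
    for move in range(branching):
        score = alphabeta(path + (move,), depth - 1, alpha, beta,
                          True, branching, stats)
        best = min(best, score)
        beta = min(beta, best)
        if alpha >= beta:
            break  # cutoff: the maximizing parent will avoid this line
    return best


def bench(depth=6, branching=6):
    # Returns (root value, nodes visited, nodes per second).
    stats = {"nodes": 0}
    start = time.perf_counter()
    value = alphabeta((), depth, float("-inf"), float("inf"),
                      True, branching, stats)
    elapsed = time.perf_counter() - start
    nodes_per_second = stats["nodes"] / elapsed if elapsed > 0 else float("inf")
    return value, stats["nodes"], nodes_per_second


if __name__ == "__main__":
    value, nodes, nps = bench()
    print(f"value={value} nodes={nodes} nodes/s={nps:.0f}")
```

Running the same script on both machines would only give a rough, interpreter-bound comparison; a compiled port of the same loop would be closer to what a chess engine stresses, but the structure of the test would stay the same.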
 
  • Like
Reactions: Appletoni

JMacHack

Suspended
Mar 16, 2017
1,965
2,424
Well that's just a blatant lie.. It doesn't compare favorably in every other benchmark. No.

Which benchmark with a load similar to a chess engine's, using x86-optimized multi-threaded code, can you point to where the M1 beats, for example, a Ryzen 5900? Please just refer us to that benchmark instead of pulling an "ars-Technica" flawed logic on us :)
Anandtech has a comprehensive list of benchmarks in which the M1 (non pro or max) performed quite favorably against its x86 rivals, even some running through rosetta. (As has been posted before in this thread)


Now, to prevent shifting of goalposts, let’s examine your posts:

The more likely explanation is...

When comparing M1 using programs that have been highly optimized for various CPU architectures, Apple silicon does seem to fall pretty flat. The M1 only seems to shine when highly optimized M1 code is compared with non-optimized, generic x86 code that does not take full advantage of all cores, the architecture, and the available special instructions for these CPUs. If both are optimized, the M1 always seems to kinda s*ck, sadly... I would be very happy if it didn't.
This post asserts: the M1 performs well in only certain highly optimized tests.
Which is wrong.

The post quoted at top, which has since been deleted (lol), above shifts the goalposts to “chess benchmarks”. Naturally, to narrow the goal to the benchmark shown at the beginning of this thread, the stockfish chess benchmark. I won’t even comment on this because there’s been multiple people giving intelligent and detailed answers as to why this specific benchmark shows the M1 underperforming (which you ignore).

And to top it off, you’ve ignored any argument that uses basic logic without technical arguments that goes against your original argument. Which was that the M1, somehow, contrary to all evidence, contrary to the experience of users and reviewers, only performs in specific optimized tasks.

In fact, the opposite is true. The M1 underperforms in this single chess benchmark, which apparently has eluded everyone across the tech press and anyone who has bought an M1, except of course, you and the original poster of this thread.
 
Last edited by a moderator:

JimmyjamesEU

Suspended
Jun 28, 2018
397
426
Stockfish Chess Benchmark. All other benchmarks clearly are optimized for Apple Silicon and therefore cannot be trusted.
Accusing someone of lying seems unfair. The other options are: the accused is correct; the accused is unaware of other evidence.

As it stands, the overwhelming number of tests I’ve seen back up the idea that the M1 is extremely competitive. It’s outrageous to suggest that’s a lie.
 

leman

macrumors Core
Oct 14, 2008
19,521
19,678
You state this as if it were an indisputable fact..

Of course I am not, it's just common sense. I mean, you are comparing fine-tuned, detailed AVX2 code to early, suboptimal NEON code (as far as I can tell there is only one instance where Stockfish uses NEON and yes, that code is probably suboptimal).

It is not a fact if you cannot provide any solid proof that optimized M1 chess-engine code can outperform similarly optimized x86, AVX2, or AVX-512 code on a 5800H.

How am I supposed to prove it? You are basically asking me to write an optimal implementation. I am not going to waste my time on that unless I am compensated. We already established that.
As long as no skilled developers using M1 are even willing to put their money where their heart is, this is all guesswork on your side in terms of possible performance improvement potential.

Just like on yours. Of course, on my side we have plenty of microbenchmarks that show the M1's excellent performance.

Let's say you could optimize SF code for an M1 by 50% (which I think is extremely over-optimistic): you would still be far back in the pack compared to AMD mobile CPUs, and suggesting you could "optimize" to beat a CUDA version on a 3060 for the inference of large NNs is just moronic at best.

Who is talking about beating a 3060? Where did you get that from? But yeah, I have little doubt that the M1 will beat Zen 3 on properly optimized code, both core for core and in perf per watt.
 

JMacHack

Suspended
Mar 16, 2017
1,965
2,424
Accusing someone of lying seems unfair. The other options are: the accused is correct; the accused is unaware of other evidence.

As it stands, the overwhelming number of tests I’ve seen back up the idea that the M1 is extremely competitive. It’s outrageous to suggest that’s a lie.
I’m just pointing out the assertion in question, which is:
“The M1 only performs favorably in specific optimized workloads, in all else it is inferior to x86.”

The assertion is based on the evidence of the Stockfish Chess Benchmarks, in which Apple Silicon performs an order of magnitude below comparable x86 cpus.

Other, much more intelligent people than I, have examined why this is the case, and determined that the benchmark itself performs abnormally slow on Apple Silicon, due to specific optimizations in favor of x86, and no such similar optimizations for Apple Silicon.

While some might use this as evidence to assert the original claim, that the M1 is inferior with no optimizations given, the fact that the tech press and users almost unanimously praise the performance of Apple Silicon, and that in every other benchmark Apple Silicon performs favorably, points towards the benchmark being the issue.

That is unless, there’s a grand conspiracy to make Apple Silicon seem more performant over x86, and everyone has been duped or is part of said conspiracy.
 
  • Like
Reactions: Appletoni

Leifi

macrumors regular
Nov 6, 2021
128
121
What is the proof that this is a lie?

The proof is already posted in this thread earlier with a bunch of benchmarks that the M1 gets beaten..

The claim was that "M1 compares favorably in every single other metric to its competition" -....


I can give you an additional one..


The geometric means of all openbenchmarking.org benchmarks between AMD 4800U and Apple M1

APple_4800.jpg
 
  • Like
Reactions: Appletoni

JimmyjamesEU

Suspended
Jun 28, 2018
397
426
I’m just pointing out the assertion in question, which is:
“The M1 only performs favorably in specific optimized workloads, in all else it is inferior to x86.”

The assertion is based on the evidence of the Stockfish Chess Benchmarks, in which Apple Silicon performs an order of magnitude below comparable x86 cpus.

Other, much more intelligent people than I, have examined why this is the case, and determined that the benchmark itself performs abnormally slow on Apple Silicon, due to specific optimizations in favor of x86, and no such similar optimizations for Apple Silicon.

While some might use this as evidence to assert the original claim, that the M1 is inferior with no optimizations given, the fact that the tech press and users almost unanimously praise the performance of Apple Silicon, and that in every other benchmark Apple Silicon performs favorably, points towards the benchmark being the issue.

That is unless, there’s a grand conspiracy to make Apple Silicon seem more performant over x86, and everyone has been duped or is part of said conspiracy.
I understand your point, and agree.

It seems perfectly clear that, given the overwhelmingly positive results on other tests and benchmarks, the onus is entirely on the person claiming the extraordinary. That means the person claiming chess optimisation is a special case has to prove their case. No one else is obliged to prove anything.
 

JimmyjamesEU

Suspended
Jun 28, 2018
397
426
The proof is already posted in this thread earlier with a bunch of benchmarks that the M1 gets beaten..

The claim was that "M1 compares favorably in every single other metric to its competition" -....


I can give you an additional one..


The geometric means of all openbenchmarking.org benchmarks between AMD 4800U and Apple M1

View attachment 1922044
You claimed it was a lie. I’m asking you to substantiate your claim. That the person you accuse is deliberately misleading. Please show your proof.
 

JimmyjamesEU

Suspended
Jun 28, 2018
397
426
The proof is already posted in this thread earlier with a bunch of benchmarks that the M1 gets beaten..

The claim was that "M1 compares favorably in every single other metric to its competition" -....


I can give you an additional one..


The geometric means of all openbenchmarking.org benchmarks between AMD 4800U and Apple M1

View attachment 1922044
You might want to investigate why few reputable people cite openbenchmarking or Phoronix. Most of those who do are Linux enthusiasts.
 

bcortens

macrumors 65816
Aug 16, 2007
1,324
1,796
Canada
You state this as if it were an indisputable fact. It is not a fact if you cannot provide any solid proof that optimized M1 chess-engine code can outperform similarly optimized x86, AVX2, or AVX-512 code on a 5800H. Period. I personally don't believe there is a snowball's chance in hell, and no current benchmarks at hand indicate this. As long as no skilled developers using M1 are even willing to put their money where their heart is, this is all guesswork on your side in terms of possible performance improvement potential.

Let's say you could optimize SF code for an M1 by 50% (which I think is extremely over-optimistic): you would still be far back in the pack compared to AMD mobile CPUs, and suggesting you could "optimize" to beat a CUDA version on a 3060 for the inference of large NNs is just moronic at best.
As someone who bothered to look at the SF code: the AVX code path is declared, right in the code, to be the newest, most optimized path. It also looks like it does some work to try to dispatch multiple vectors when multiple units are present, but I haven't checked this, so I may be wrong. This is why I conclude that you are comparing an optimized code path to an unoptimized one. I haven't yet had time to verify it, but I suspect that SF doesn't use more than one of the M1's NEON units (I'd have to do more work than I have time for to be sure). If this is true, then SF is potentially 4x slower at vector processing on the M1 than the M1 is capable of.
 

bcortens

macrumors 65816
Aug 16, 2007
1,324
1,796
Canada
The proof is already posted in this thread earlier with a bunch of benchmarks that the M1 gets beaten..

The claim was that "M1 compares favorably in every single other metric to its competition" -....


I can give you an additional one..


The geometric means of all openbenchmarking.org benchmarks between AMD 4800U and Apple M1

View attachment 1922044
When I started looking at those benchmarks, I found that most of them are really not optimized for the Mac or the M1, and some don't even have proper ARM support. Despite that, the M1 is doing pretty respectably, considering it only has 4 P-cores to the 4800U's 8.
 
  • Like
Reactions: 3Rock and JMacHack

bcortens

macrumors 65816
Aug 16, 2007
1,324
1,796
Canada
Anandtech has a comprehensive list of benchmarks in which the M1 (non pro or max) performed quite favorably against its x86 rivals, even some running through rosetta. (As has been posted before in this thread)


Now, to prevent shifting of goalposts, let’s examine your posts:


This post asserts: the M1 performs well in only certain highly optimized tests.
Which is wrong.

The post quoted at top, which has since been deleted (lol), above shifts the goalposts to “chess benchmarks”. Naturally, to narrow the goal to the benchmark shown at the beginning of this thread, the stockfish chess benchmark. I won’t even comment on this because there’s been multiple people giving intelligent and detailed answers as to why this specific benchmark shows the M1 underperforming (which you ignore).

And to top it off, you’ve ignored any argument that uses basic logic without technical arguments that goes against your original argument. Which was that the M1, somehow, contrary to all evidence, contrary to the experience of users and reviewers, only performs in specific optimized tasks.

In fact, the opposite is true. The M1 underperforms in this single chess benchmark, which apparently has eluded everyone across the tech press and anyone who has bought an M1, except of course, you and the original poster of this thread.

I hope your lunch of glue and crayons is delicious.
It would seem that Leifi only respects open benchmarks that are basically unmaintained on Apple Silicon; he considers these a fair evaluation for some reason...
 

leman

macrumors Core
Oct 14, 2008
19,521
19,678
The proof is already posted in this thread earlier with a bunch of benchmarks that the M1 gets beaten..

Your “bunch of benchmarks” are fundamentally flawed, which has been pointed out to you again and again. The Phoronix test suite has its uses, but it's not a representative suite and the testing has not been done in a systematic fashion. Your result is based on a handful of scattered benchmarks using oddball tools, most of which are not optimized for the platform. This is not how you test the performance of a platform.
 

leman

macrumors Core
Oct 14, 2008
19,521
19,678
It would seem that Leifi only respects open benchmarks that are basically unmaintained on Apple Silicon; he considers these a fair evaluation for some reason...

That's the thing. Nobody uses open benchmark. And the Phoronix test suite is a random collection of things mostly geared towards showcasing Linux performance. The concept is great - collect as many tools as possible to have a better overview - but if you do that you also need to consider the maturity of the software. In all Phoronix tests, macOS usually loses big time, because it's being dragged down by all the terribly performing OpenCL tools nobody cares about.
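
To illustrate how a couple of badly performing tests can drag an aggregate down, here is a small sketch of a geometric mean over relative scores. All numbers are made up for illustration and are not taken from openbenchmarking.org or any real measurement; `geometric_mean`, `mature_scores`, and `unported_scores` are hypothetical names.

```python
import math


def geometric_mean(ratios):
    # Geometric mean of positive ratios, computed via logarithms.
    if not ratios or any(r <= 0 for r in ratios):
        raise ValueError("ratios must be a non-empty list of positive numbers")
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


# Hypothetical relative scores (platform A divided by platform B).
# A value above 1.0 means platform A is faster on that test.
mature_scores = [1.10, 1.20, 0.95, 1.05]   # well-maintained tools (made up)
unported_scores = [0.10, 0.15]             # poorly ported tools (made up)

if __name__ == "__main__":
    print("mature only:", round(geometric_mean(mature_scores), 3))
    print("with unported tools:",
          round(geometric_mean(mature_scores + unported_scores), 3))
```

With these invented inputs, two outliers out of six tests are enough to turn an aggregate that slightly favors platform A into one that clearly favors platform B, which is why the maturity of each tool matters when reading a single summary number.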
 

JimmyjamesEU

Suspended
Jun 28, 2018
397
426
That's the thing. Nobody uses open benchmark. And the Phoronix test suite is a random collection of things mostly geared towards showcasing Linux performance. The concept is great - collect as many tools as possible to have a better overview - but if you do that you also need to consider the maturity of the software. In all Phoronix tests, macOS usually loses big time, because it's being dragged down by all the terribly performing OpenCL tools nobody cares about.
Plus, Phoronix is a joke in the Linux community. It's not seen as a bastion of benchmarking.
 

jeanlain

macrumors 68020
Mar 14, 2009
2,462
956
I'm not sure if they were part of open benchmark, but I remember some random algorithms tested by Phoronix on which the M1 performed much better under Rosetta than using native ARM code, because that code was just not optimised at all: no NEON, nothing. Rosetta did a better job at translating the SIMD x86 code.

EDIT: I can't find these results. Maybe I misinterpreted them, but I remember they were discussed here or elsewhere.
 
Last edited:

Taz Mangus

macrumors 604
Mar 10, 2011
7,815
3,504
The more likely explanation is...

When comparing M1 using programs that have been highly optimized for various CPU architectures, Apple silicon does seem to fall pretty flat. The M1 only seems to shine when highly optimized M1 code is compared with non-optimized, generic x86 code that does not take full advantage of all cores, the architecture, and the available special instructions for these CPUs. If both are optimized, the M1 always seems to kinda s*ck, sadly... I would be very happy if it didn't.
Your ignorance is showing in how you are posting. You are being disingenuous and obtuse. It has already been established that the SF TensorFlow backend is not using the Metal API, which means that the more accurate statement would be:
When comparing M1 using SF code that has been highly optimized for x86 and ARM Linux but not for the Apple M1, the SF code falls flat on its face on the M1.
 
  • Like
Reactions: ddhhddhh2