Status
Not open for further replies.

AlexChess

macrumors newbie
Apr 28, 2021
1
1
Hi!
Would it be possible to use the 16 Neural Engine cores of the Apple Silicon M1 to speed up chess engine search?
(Sorry, this question may already have been answered; I haven't read all the previous comments :))

AlexChess
 
Reactions: BigMcGuire

BetaPro

macrumors member
Oct 30, 2010
34
1
Hi OP, the desktop in your stats must have far more performance cores than the M1 has, which should explain the difference.

And for SF NNUE, it is indeed slower than its x86 counterparts due to weaker SIMD units. But when I ran the benchmark for the classical non-NNUE version of SF, the single-thread performance actually matched the strongest x86 CPUs.

Obviously it won't be able to compete with a 5950X, since it has far fewer cores and no SMT.
 

BetaPro

macrumors member
Oct 30, 2010
34
1
Also want to add that Stockfish is absolutely a good benchmark for CPUs. As a matter of fact, SPEC (both 2006 and 2017) contains Sjeng as the chess engine benchmark, which runs a similar algorithm to Stockfish.
 

cmaier

Suspended
Jul 25, 2007
25,405
33,474
California
Also want to add that Stockfish is absolutely a good benchmark for CPUs. As a matter of fact, SPEC (both 2006 and 2017) contains Sjeng as the chess engine benchmark, which runs a similar algorithm to Stockfish.

You just disproved your point.

SPEC is a good benchmark. Chess is not. Chess is a good benchmark component. But most computing has a very different profile than chess algorithms. That’s why good benchmarks blend the results of many different kinds of computing activity.
 

leman

macrumors Core
Oct 14, 2008
19,521
19,679
And for SF NNUE, it is indeed slower than its x86 counterparts due to weaker SIMD units. But when I ran the benchmark for the classical non-NNUE version of SF, the single-thread performance actually matched the strongest x86 CPUs.

Just to make it clear, M1’s SIMD units are not weaker than those of x86 CPUs. Yes, M1 only supports 128-bit SIMD, but it has 4 such units and they operate with low latency. Most modern x86 CPUs can do two 256-bit AVX2 SIMD operations per clock; M1 can do four 128-bit ones, which gives the same net result. And if you use 128-bit operations (e.g. SSE), M1 will be faster.

Now, newest Intel cores do have two 512-bit units, so they can achieve higher throughput under certain conditions and with software specifically written for them.

I don’t know exactly how Stockfish utilizes SIMD, but if M1 doesn’t perform well there, my initial assumption would be that it’s simply not well optimized for ARM Neon. M1 usually does incredibly well on number-crunching benchmarks.
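
To put rough numbers on the comparison above, here is a minimal sketch of the peak-width arithmetic. The unit counts are simply the ones quoted in this post (4 x 128-bit for M1, 2 x 256-bit for a typical AVX2 core, 2 x 512-bit for the newest Intel cores), and the "two operations per clock" figure for 128-bit x86 code is an assumption carried over from the AVX2 case. The function name is made up for illustration, and clocks, latency, and memory bandwidth are ignored entirely.

```python
def simd_bits_per_clock(units: int, width_bits: int) -> int:
    """Peak SIMD bits processed per clock: number of units times vector width."""
    return units * width_bits


# Figures as quoted in the post above (not measured here).
m1_neon = simd_bits_per_clock(units=4, width_bits=128)
x86_avx2 = simd_bits_per_clock(units=2, width_bits=256)
intel_avx512 = simd_bits_per_clock(units=2, width_bits=512)

# Assumption: the same two x86 units issue 128-bit (SSE) operations.
x86_sse = simd_bits_per_clock(units=2, width_bits=128)

print("M1 Neon     :", m1_neon, "bits/clock")
print("x86 AVX2    :", x86_avx2, "bits/clock")
print("x86 SSE     :", x86_sse, "bits/clock")
print("Intel AVX512:", intel_avx512, "bits/clock")
```

On these assumptions the M1 and an AVX2 core tie on peak width, the M1 is ahead when the x86 code is limited to 128-bit operations, and only the 512-bit units pull ahead.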
 

BetaPro

macrumors member
Oct 30, 2010
34
1
You just disproved your point.

SPEC is a good benchmark. Chess is not. Chess is a good benchmark component. But most computing has a very different profile than chess algorithms. That’s why good benchmarks blend the results of many different kinds of computing activity.

If you define benchmarking like that (must have varied workloads), then I definitely agree with you. It's also why I hate that many x86 fans think that Cinebench is a good benchmark...
 
Reactions: jdb8167

BetaPro

macrumors member
Oct 30, 2010
34
1
Just to make it clear, M1’s SIMD units are not weaker than those of x86 CPUs. Yes, M1 only supports 128-bit SIMD, but it has 4 such units and they operate with low latency. Most modern x86 CPUs can do two 256-bit AVX2 SIMD operations per clock; M1 can do four 128-bit ones, which gives the same net result. And if you use 128-bit operations (e.g. SSE), M1 will be faster.

Now, newest Intel cores do have two 512-bit units, so they can achieve higher throughput under certain conditions and with software specifically written for them.

I don’t know exactly how Stockfish utilizes SIMD, but if M1 doesn’t perform well there, my initial assumption would be that it’s simply not well optimized for ARM Neon. M1 usually does incredibly well on number-crunching benchmarks.

SIMD is a recent addition to Stockfish, and yeah, it probably isn't well optimized for Neon yet. I'm not certain whose SIMD is better, but the current observation is that M1 loses more performance than x86 after the switch to NNUE. Maybe I should look into optimizing it for M1.
 

crazy dave

macrumors 65816
Sep 9, 2010
1,453
1,229
If you define benchmarking like that (must have varied workloads), then I definitely agree with you. It's also why I hate that many x86 fans think that Cinebench is a good benchmark...

Yeah, it depends on *why* one is benchmarking: throw a bunch of varied workloads together to make a general statement about the CPU’s performance (as the OP tried to do, but with only one workload), or benchmark processes in a specific workflow (or close to it) that the user cares about because that’s what they use a computer for (closer to what the OP actually did).

Both are valid, but different. The latter is more immediately relevant but also brittle: a workflow may change over time and programs may get further optimized. The former is less tailored to a particular user’s needs, but is more likely to be robust over time and across different tasks.

That’s how I view the utility of benchmarking anyway.
 

BigMcGuire

Cancelled
Jan 10, 2012
9,832
14,032
The idea of being able to have a chess engine crunch my games without getting superheated is really attractive to me. I went 100% laptop in 2015 and that's something I really missed with a Desktop, the ability to have a chess engine go 100% and being able to watch it without frying myself.

I'm Mac now too, I haven't really had time to find a good chess engine interface. I miss the days of having Deep Fritz do massive chess engine tournaments like @AlexChess does :D It's been a long time. I used to have 20-30 chess engines go against each other.

Back in the day I had a computer on ICC with Winboard, even took some open source code (with the author's permission) and ran a custom engine on ICC for years.

This MacBook is the first laptop computer I'm not against pushing because of temps. :D
 

redshift27

macrumors newbie
Jan 6, 2017
10
9
SIMD is a recent addition to Stockfish, and yeah, it probably isn't well optimized for Neon yet. I'm not certain whose SIMD is better, but the current observation is that M1 loses more performance than x86 after the switch to NNUE. Maybe I should look into optimizing it for M1.
Please do. Just head over to Stockfish on Github and have a look at the NEON-specific code. Indeed, NNUE and the matrix calculations required therein are described in this excellent document: https://github.com/glinscott/nnue-pytorch/blob/f6a2e30d9393a7a8e62f0e3f8bfeecdf84b373c0/docs/nnue.md
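
For a feel of the kind of calculation involved, here is a toy sketch of the "efficiently updatable" first layer as I understand it from that document: the accumulator is the bias plus the weight rows of the active features, and after a move only the rows for features that changed are subtracted or added. This is not Stockfish's code. The sizes, weights, and function names below are invented for illustration, and the inner loops are the part that a real engine would vectorize with SIMD.

```python
from typing import Iterable, List, Sequence


def refresh(weights: Sequence[Sequence[int]], bias: Sequence[int],
            active: Iterable[int]) -> List[int]:
    """Recompute the accumulator from scratch: bias plus one weight row per active feature."""
    acc = list(bias)
    for feature in active:
        for i, w in enumerate(weights[feature]):
            acc[i] += w
    return acc


def update(acc: Sequence[int], weights: Sequence[Sequence[int]],
           removed: Iterable[int], added: Iterable[int]) -> List[int]:
    """Incrementally update an accumulator after a move changes a few features."""
    new_acc = list(acc)
    for feature in removed:
        for i, w in enumerate(weights[feature]):
            new_acc[i] -= w
    for feature in added:
        for i, w in enumerate(weights[feature]):
            new_acc[i] += w
    return new_acc


# Toy network: 4 input features, 2 accumulator entries (hypothetical values).
weights = [[1, 2], [3, 4], [5, 6], [7, 8]]
bias = [10, 20]

before = refresh(weights, bias, active=[0, 2])
# A "move" turns feature 2 off and feature 3 on.
after_incremental = update(before, weights, removed=[2], added=[3])
after_full = refresh(weights, bias, active=[0, 3])
print(before, after_incremental, after_full)
```

The incremental update touches two weight rows instead of every active one, which is why the speed of those add and subtract loops matters so much on any given SIMD unit.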

I'm Mac now too, I haven't really had time to find a good chess engine interface.
One option for a good (paid) interface is to run ChessBase or Fritz as the GUI under Parallels / ARM Windows, which works pretty well, and use inBetween.exe and the UCI interface to connect to the Stockfish chess engine running natively on the M1 under macOS. Obviously if you are striving for every CPU cycle there are inefficiencies here, but for simple analysis this can work very well.
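
In case it helps to see what actually travels over that UCI link, here is a minimal sketch of the plain-text commands a GUI sends and of how the engine's answer is read back. The command strings (`uci`, `isready`, `ucinewgame`, `position`, `go depth`, `bestmove`) are standard UCI; the helper function names are hypothetical, and the sketch only builds and parses text rather than launching an engine.

```python
from typing import List, Optional


def uci_analysis_commands(moves: List[str], depth: int) -> List[str]:
    """Build the UCI command lines for analysing a position reached from the start position."""
    position = "position startpos"
    if moves:
        position += " moves " + " ".join(moves)
    # A real GUI waits for "uciok" after "uci" and "readyok" after "isready".
    return ["uci", "isready", "ucinewgame", position, f"go depth {depth}"]


def parse_bestmove(line: str) -> Optional[str]:
    """Return the move from a 'bestmove ...' line, or None for any other engine output."""
    parts = line.split()
    if len(parts) >= 2 and parts[0] == "bestmove":
        return parts[1]
    return None


commands = uci_analysis_commands(["e2e4", "e7e5"], depth=20)
for command in commands:
    print(command)
```

In practice these lines would be written to the engine's standard input (for example through Python's `subprocess.Popen`) and the output read line by line until `parse_bestmove` returns a move; a bridge like the one described above is forwarding this same text between the GUI and the native engine.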

Leela (lc0) struggles compared to Stockfish on M1 hardware as it thrives on a powerful GPU which Apple Silicon does not offer. CoreML and the Apple Neural Engine do not currently offer a satisfactory solution to this.
 

mi7chy

macrumors G4
Oct 24, 2014
10,625
11,296
AMD Zen 4 will get 5nm, AVX512, DDR5, etc. so it'll be even faster. M1 only looks good compared to Intel but too many trade offs compared to AMD.
 

jdb8167

macrumors 601
Nov 17, 2008
4,859
4,599
AMD Zen 4 will get 5nm, AVX512, DDR5, etc. so it'll be even faster. M1 only looks good compared to Intel but too many trade offs compared to AMD.
Apple is rumored to be already fabbing the successor to the M1. Should be available by summer. When does AMD start doing 5 nm?
 

leman

macrumors Core
Oct 14, 2008
19,521
19,679
AMD Zen 4 will get 5nm, AVX512, DDR5, etc. so it'll be even faster. M1 only looks good compared to Intel but too many trade offs compared to AMD.

Firestorm offers the same performance as the fastest Zen 3 configuration, at 1/4 of the power consumption. AMD's main selling point is that they offer more CPU cores in their budget models.
 
Reactions: jdb8167

dmccloud

macrumors 68040
Sep 7, 2009
3,142
1,900
Anchorage, AK
Funny how people selectively preach dorkbench when M1 does well but downplay a more relevant real world workload when M1 doesn't. Chess has been a relevant workload and benchmark going back to IBM Deep Blue to current DeepMind AlphaZero that topple all the grandmaster human players.

There was maybe one intelligent response trying to understand what the bottleneck is.
A benchmark written to take advantage of Intel-specific instructions that has not been optimized for the M1 (and does not even take full advantage of the M1's capabilities) is a poor measuring stick regardless of whether chess is a "relevant workload" or not. For me, that is meaningless since I don't play chess and have no desire to start. What I DO use the computer for is web development, graphic design, Photoshop, web browsing and some gaming, and the M1 is more than up to the task for all of those purposes and more. What's funny is how people selectively preach benchmarks that can't even properly utilize the M1 as somehow being relevant.
 

dmccloud

macrumors 68040
Sep 7, 2009
3,142
1,900
Anchorage, AK
AMD Zen 4 will get 5nm, AVX512, DDR5, etc. so it'll be even faster. M1 only looks good compared to Intel but too many trade offs compared to AMD.
AMD can't move to 5nm until Apple moves to 3nm, since Apple booked up all 5nm capacity for the foreseeable future and also has preregistered for 3nm.
 

mi7chy

macrumors G4
Oct 24, 2014
10,625
11,296
AMD can't move to 5nm until Apple moves to 3nm, since Apple booked up all 5nm capacity for the foreseeable future and also has preregistered for 3nm.

That's BS along the lines of 8GB on M1 equals 16GB, M1 iGPU is zero copy buffer and x64 is not, M1 iGPU equals high-end discrete graphics, etc. Reality is AMD makes CPUs and GPUs for the 90%+ PC marketshare and only 7nm fab can provide that capacity. Nothing stopping AMD from allocating 5nm or next gen node capacity at TSMC but less of a need since their current 7nm surpasses 5nm M1 performance.
 

leman

macrumors Core
Oct 14, 2008
19,521
19,679
Reality is AMD makes CPUs and GPUs for the 90%+ PC marketshare and only 7nm fab can provide that capacity.

This is an excellent point that many posters here tend to neglect. Supply constraints are real. Apple can only do what it does because it targets a specific market niche.

Nothing stopping AMD from allocating 5nm or next gen node capacity at TSMC but less of a need since their current 7nm surpasses 5nm M1 performance.

This is more debatable. Yes, AMD premium desktop CPUs are obviously faster than the entry level low-energy M1 (duh). But those enthusiast-class CPUs are also expensive and in low supply. I’m also not sure that booking 5nm capacity is that simple: Apple did buy almost all of it to produce iPhone chips and they have higher budget for fab expenses than AMD. There is a reason why all other 5nm products on the market currently are from Samsung fabs.
 

Jorbanead

macrumors 65816
Aug 31, 2018
1,209
1,438
their current 7nm surpasses 5nm M1 performance.
AMD 7nm and M1 are not comparable. M1 outperforms AMD when you look at performance per watt. M1 is a 10-15 watt chip. Can you show me a 10-15W AMD chip that outperforms M1 in single-thread performance?
 

Gnattu

macrumors 65816
Sep 18, 2020
1,107
1,672
Reality is AMD makes CPUs and GPUs for the 90%+ PC marketshare
If you call others’ statements BS, then this is also BS.

By the way, outperforming with (more than) doubling the power is not an architecture advantage, period.

You can deny M1’s performance if you want, and I do agree AMD has extremely solid offerings. I’m using a 5900X and a 3900X and they are fast, but if you let me choose which chip I want to put in a sub-4-pound laptop, I’d choose the Apple M1. The 5800U is great, but it does use more power. I’m glad to help you find out if this statement is true.
 