Status
Not open for further replies.

jeanlain

macrumors 68020
Mar 14, 2009
2,462
956
View attachment 1763327

That what you're looking for? M1 MBP 13"
Yes, thanks. Your score is quite a bit lower than the 3024 kN/s reported on the forum I linked to.
Supposedly, the executable is M1 native since it's installed by homebrew. But your M1 is only about 35% faster than my iMac, while it should be more like 65% faster.
Can you run
Code:
file $(which stockfish)
to check if it's universal?
 

GrumpyCoder

macrumors 68020
Nov 15, 2016
2,126
2,706
Benchmark: Stockfish (chess) speed

M1 CPU = 13000 kn/s

i7 3930k overclocked (from 2011 = 10 years old) = 13000 kn/s

Others = 40000 kn/s - 80000 kn/s

Desktop CPUs = 230000 kn/s and much stronger
Stockfish scales very well with the number of CPU cores. Per-core performance matters very little: more cores = better score. Your 10-year-old CPU is 6c/12t, while the M1 has 4 high-performance and 4 high-efficiency cores. That makes the comparison useless, as it's not the actual performance that's being compared here. It's also the reason why AMD 32c/64t CPUs are on top: sheer number of cores, independent of performance.
 

BigMcGuire

Cancelled
Jan 10, 2012
9,832
14,032
Yes, thanks. Your score is quite a bit lower than the 3024 kN/s reported on the forum I linked to.
Supposedly, the executable is M1 native since it's installed by homebrew. But your M1 is only about 35% faster than my iMac, while it should be more like 65% faster.
Can you run
Code:
file $(which stockfish)
to check if it's universal?
bigmcguire@BorgCube /Applications % file $(which stockfish)
/opt/homebrew/bin/stockfish: Mach-O 64-bit executable arm64

Interesting indeed :D
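For anyone who wants to script the same check, here is a minimal Python sketch that classifies a line of `file` output. The function name `classify_macho` and the universal-binary sample line are my own assumptions for illustration; only the arm64 line comes from the output above.

```python
def classify_macho(file_output: str) -> str:
    """Classify one line of `file` output for a Mach-O executable."""
    # Drop the leading "path:" part so a path containing "arm64" cannot confuse us.
    description = file_output.split(":", 1)[-1]
    has_arm = "arm64" in description
    has_intel = "x86_64" in description
    if "universal binary" in description or (has_arm and has_intel):
        return "universal"
    if has_arm:
        return "arm64"
    if has_intel:
        return "x86_64"
    return "unknown"


# Line taken from the output posted above.
native = "/opt/homebrew/bin/stockfish: Mach-O 64-bit executable arm64"
# Hypothetical line, shaped like what `file` prints for a fat binary.
fat = ("/usr/local/bin/example: Mach-O universal binary with 2 architectures: "
       "[x86_64:Mach-O 64-bit executable x86_64] [arm64]")

print(classify_macho(native))
print(classify_macho(fat))
```

An "x86_64" result on an M1 would mean the binary runs under Rosetta 2, which is the case jeanlain wanted to rule out.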
 

BigMcGuire

Cancelled
Jan 10, 2012
9,832
14,032
Yes, thanks. Your score is quite a bit lower than the 3024 kN/s reported on the forum I linked to.
Supposedly, the executable is M1 native since it's installed by homebrew. But your M1 is only about 35% faster than my iMac, while it should be more like 65% faster.
Can you run
Code:
file $(which stockfish)
to check if it's universal?
I rebooted, unplugged 4k monitor, waited for OS fresh boot to calm down and got similar results.
 

jeanlain

macrumors 68020
Mar 14, 2009
2,462
956
bigmcguire@BorgCube /Applications % file $(which stockfish)
/opt/homebrew/bin/stockfish: Mach-O 64-bit executable arm64
Thanks. The difference from the numbers posted online may reflect different versions of Stockfish. I think they used the "classical" version while we've installed the current (NNUE) version, which currently runs slower on M1 due to missing/insufficient SIMD optimisation. However, the NNUE version has the potential to be much faster on the M1 thanks to its Neural Engine (which would require someone to implement the algorithm in Core ML, which may not happen anytime soon).
 

BigMcGuire

Cancelled
Jan 10, 2012
9,832
14,032
View attachment 1763327 Second Run: View attachment 1763330

That what you're looking for? M1 MBP 13"

My 2019 MBP 15" i7: (fans were screaming)

1619303566208.png


Wow.

M1 got 2074168 nodes/second.
 

falainber

macrumors 68040
Mar 16, 2016
3,539
4,136
Wild West
Looks like Stockfish doesn't use the Neural Engine in the M1 (which is expected, since it's a multiplatform tool apparently). If you could find a chess engine that uses the Neural Engine (using CoreML) the results might be quite different.
That would be relevant if Stockfish used a Neural Engine (or its equivalent) on x86 CPUs but does it?
 

mi7chy

macrumors G4
Oct 24, 2014
10,625
11,296
Just got back and looked into this more.

Homebrew and the stockfish formula are both native (Apple Silicon) builds.

https://brew.sh/2021/02/05/homebrew-3.0.0/

https://formulae.brew.sh/formula/stockfish#default

To compare with results from openbenchmarking.org the same parameters should be used except for thread count.

On my MBA M1 the results are in line with the OP's, but his are slightly higher, probably because he has an MBP M1 rather than an MBA M1.

4 threads
stockfish bench 128 4 24 default depth
===========================
Total time (ms) : 55575
Nodes searched : 453871217
Nodes/second : 8166823

8 threads
stockfish bench 128 8 24 default depth
===========================
Total time (ms) : 56470
Nodes searched : 661228390
Nodes/second : 11709374

Increasing the thread count beyond 8 just makes it worse, which is no surprise since the M1 doesn't support SMT.
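The two M1 runs above could be repeated with a small script along these lines. This is only a sketch: it assumes a `stockfish` binary on the PATH, and the helper names `bench_command` and `run_bench` are made up for illustration. The arguments are the ones used above: 128 MB hash, N threads, depth 24, default positions, depth-limited search.

```python
import shutil
import subprocess


def bench_command(threads, hash_mb=128, depth=24, binary="stockfish"):
    """Build the argv for: stockfish bench <hash> <threads> <depth> default depth."""
    return [binary, "bench", str(hash_mb), str(threads), str(depth), "default", "depth"]


def run_bench(threads):
    """Run one bench and return everything the engine printed."""
    result = subprocess.run(bench_command(threads), capture_output=True, text=True, check=True)
    # Keep both streams: the summary block may be written to stderr.
    return result.stdout + result.stderr


if __name__ == "__main__":
    if shutil.which("stockfish") is None:
        print("stockfish not found on PATH, nothing to run")
    else:
        for threads in (4, 8):
            output = run_bench(threads)
            # Print only the last lines, where the summary appears.
            print(f"--- {threads} threads ---")
            print("\n".join(output.strip().splitlines()[-4:]))
```

Only the thread count changes between runs, which keeps the numbers comparable with results posted elsewhere using the same parameters.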

For comparison, on an older low-end AMD 3800XT (8-core, 16-thread) CPU the results are:

8 threads
stockfish_13_win_x64_avx2.exe bench 128 8 24 default depth
===========================
Total time (ms) : 36246
Nodes searched : 653634657
Nodes/second : 18033290

16 threads
stockfish_13_win_x64_avx2.exe bench 128 16 24 default depth
===========================
Total time (ms) : 38215
Nodes searched : 999371451
Nodes/second : 26151287

Conclusion: Stockfish benefits from SMT and is also optimized for AVX2 vector instructions, and for BMI2 bit-manipulation instructions on Zen 3 (the AMD 5000X line of CPUs), which should be even faster. M1 doesn't support SMT but does have NEON vector and BITIMTR bit manipulation instructions, so it remains to be seen whether Stockfish can be further optimized for M1.
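To put numbers on the scaling, here is a rough Python sketch that parses the four summary blocks above and compares them. The labels and helper names are my own; the figures are copied from the runs in this post.

```python
import re

# Summary blocks copied from the runs above.
RUNS = {
    "M1 MBA, 4 threads": "Total time (ms) : 55575\nNodes searched : 453871217\nNodes/second : 8166823",
    "M1 MBA, 8 threads": "Total time (ms) : 56470\nNodes searched : 661228390\nNodes/second : 11709374",
    "3800XT, 8 threads": "Total time (ms) : 36246\nNodes searched : 653634657\nNodes/second : 18033290",
    "3800XT, 16 threads": "Total time (ms) : 38215\nNodes searched : 999371451\nNodes/second : 26151287",
}

FIELD = re.compile(r"^(Total time \(ms\)|Nodes searched|Nodes/second)\s*:\s*(\d+)", re.MULTILINE)


def parse_summary(text):
    """Turn one summary block into a dict of integer fields."""
    return {name: int(value) for name, value in FIELD.findall(text)}


def nodes_per_second(summary):
    """Recompute the rate from nodes and elapsed milliseconds (integer division)."""
    return summary["Nodes searched"] * 1000 // summary["Total time (ms)"]


parsed = {label: parse_summary(text) for label, text in RUNS.items()}

for label, summary in parsed.items():
    # Sanity check: the reported rate should match nodes / time.
    assert nodes_per_second(summary) == summary["Nodes/second"], label


def speedup(faster, slower):
    return parsed[faster]["Nodes/second"] / parsed[slower]["Nodes/second"]


print(f"M1 4 -> 8 threads:      {speedup('M1 MBA, 8 threads', 'M1 MBA, 4 threads'):.2f}x")
print(f"3800XT 8 -> 16 threads: {speedup('3800XT, 16 threads', '3800XT, 8 threads'):.2f}x")
print(f"3800XT vs M1 at 8:      {speedup('3800XT, 8 threads', 'M1 MBA, 8 threads'):.2f}x")
```

Going from 4 to 8 threads on the M1 (adding the efficiency cores) gives about 1.43x, and turning on SMT on the 3800XT gives about 1.45x, so neither doubling comes close to 2x.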

OP is too advanced and should be hanging out on Phoronix instead.
 

mi7chy

macrumors G4
Oct 24, 2014
10,625
11,296
OP posted results without specifying anything and attributed scores of 230000 kN/s to desktop CPUs. OP doesn't appear very advanced.

I was able to replicate his results, and I'm just an average user who started using macOS in 2/2021. Prior to that it was a Mac SE.
 

jeanlain

macrumors 68020
Mar 14, 2009
2,462
956
I was able to replicate his results, and I'm just an average user who started using macOS in 2/2021. Prior to that it was a Mac SE.
And prior to your post, we compiled and ran stockfish on our Macs. Maybe you were inspired.
We were not interested in reproducing the OP's numbers, as 8/16-thread runs are not very relevant for comparing a ~20 W CPU with 4 performance cores to a 105 W CPU (AMD 3800XT) with 8 SMT cores.

Anyway, it doesn't change the fact that the OP isn't much "advanced", as they provided no way to reproduce the results they posted. One cannot reproduce an experiment by guesswork.
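To illustrate why the power gap matters, here is a back-of-the-envelope Python sketch that divides the best posted score for each chip by the nominal wattage mentioned above. Both wattages are the rough figures from this post, not measured power draw, so treat the result as an order-of-magnitude hint only.

```python
# Best run per machine, in nodes/second, as posted earlier in the thread.
best_nps = {"M1": 11_709_374, "AMD 3800XT": 26_151_287}
# Nominal power figures quoted in this post; not measured package power.
nominal_watts = {"M1": 20, "AMD 3800XT": 105}


def nps_per_watt(chip):
    """Nodes/second divided by the nominal wattage for that chip."""
    return best_nps[chip] / nominal_watts[chip]


for chip in best_nps:
    print(f"{chip}: {nps_per_watt(chip) / 1000:.0f} kN/s per watt")

ratio = nps_per_watt("M1") / nps_per_watt("AMD 3800XT")
print(f"M1 advantage per nominal watt: {ratio:.1f}x")
```

With these assumed wattages the M1 comes out a bit over 2x ahead per watt, even though it loses on raw nodes/second.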
 

Martyimac

macrumors 68020
Aug 19, 2009
2,460
1,695
S. AZ.
I quit looking at these "benchmarks" a while back. How it performs doing the tasks we want is the real test.
My observation comparing my new MBA (7-core/256) with my iMac with Core i9 8-core @ 3.g GHz: using LibreOffice, same version on both, LibreOffice on the MBA starts up as fast as on the iMac. Keep in mind that LibreOffice on the MBA has to run through Rosetta 2.
That tells me the new M1 chips are more than enough for anything I will ever want to do.
 

mi7chy

macrumors G4
Oct 24, 2014
10,625
11,296
For comparison, here are results from a $500 Lenovo Yoga 6 with a 7nm AMD 4650U (6-core, 12-thread), which is comparable to the MBP M1 since both have a fan that only spins up under heavy load. Now I need to find a 4800U and a 5800U (8-core, 16-thread).

One thing I noticed is running a few tests on the MBA M1 tanks the battery but not on the AMD. Have to do a controlled test to verify.

12-thread
stockfish_13_win_x64_avx2.exe bench 128 12 24 default depth
===========================
Total time (ms) : 68529
Nodes searched : 830608234
Nodes/second : 12120536
 

ADGrant

macrumors 68000
Mar 26, 2018
1,689
1,059
Funny how people selectively preach dorkbench when M1 does well but downplay a more relevant real world workload when M1 doesn't. Chess has been a relevant workload and benchmark, going back to IBM Deep Blue and on to the current DeepMind AlphaZero, which toppled all the human grandmaster players.

There was maybe one intelligent response trying to understand what the bottleneck is.
I don't see how it is a relevant benchmark if it is an Intel binary running via Rosetta 2.

Also, relatively few people actually play chess. I guess those that do should stick with Intel for now.
 

TiggrToo

macrumors 601
Aug 24, 2017
4,205
8,838
Results for a $500 Lenovo Yoga 6 with a 7nm AMD 4650U (6-core, 12-thread), which is comparable to the MBP M1 since both have a fan that only spins up under heavy load. Now I need to find a 4800U and a 5800U (8-core, 16-thread).

12-thread
stockfish_13_win_x64_avx2.exe bench 128 12 24 default depth
===========================
Total time (ms) : 68529
Nodes searched : 830608234
Nodes/second : 12120536
So what? What possible meaning can you get out of this "benchmark"?

This isn't a (ahem) measuring contest.
 

robco74

macrumors 6502a
Nov 22, 2020
509
944
No, it's time to throw in the towel and admit that M1 is overhyped. I mean, if it can't dominate in that one benchmark, it just isn't worth buying. We should all abandon our Macs and switch. Surely Apple was foolish if they thought they could take on AMD and Intel.
 

leons

macrumors 6502a
Apr 22, 2009
662
344
No, it's time to throw in the towel and admit that M1 is overhyped. I mean, if it can't dominate in that one benchmark, it just isn't worth buying. We should all abandon our Macs and switch. Surely Apple was foolish if they thought they could take on AMD and Intel.
Agreed. EOF. DFTT.
 

Gnattu

macrumors 65816
Sep 18, 2020
1,107
1,671
I was able to replicate his results
Where? Your 3800XT score is not even close to the 230000 kn/s desktop CPU line; it is almost 90% lower. Either OP is using an 80-core, 160-thread x86 CPU, or OP is using a different configuration.
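The arithmetic behind that, as a quick Python sketch. It assumes throughput scales linearly with core count, which is optimistic, and uses the 16-thread 3800XT run posted earlier as the reference.

```python
claimed_kn_s = 230_000             # "Desktop CPUs" line from the OP
measured_kn_s = 26_151_287 / 1000  # 3800XT, 16 threads, converted from nodes/second
cores_used = 8                     # physical cores in the 3800XT

shortfall = 1 - measured_kn_s / claimed_kn_s
# Assumption: throughput grows linearly with core count, which is optimistic.
cores_needed = cores_used * claimed_kn_s / measured_kn_s

print(f"Shortfall: {shortfall:.1%}")
print(f"Cores needed at linear scaling: {cores_needed:.0f}")
```

That gives a shortfall of 88.6% and about 70 cores even under perfect scaling, so a machine in the 80-core class is the right ballpark for the OP's number.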

OP is too advanced and should be hanging out on Phoronix instead.
Oh, please don't. Do you expect a Phoronix thread to come with only a result and no testing configuration?

M1 doesn't support SMT

M1 has an 8-wide decoder, which is by far the widest commercialized design in the industry. Intel Skylake and AMD Zen (1, 2, 3) only have 4-wide decoders, mainly due to the complexity of x86 instructions. To utilize such a wide decoder (and the execution units in the backend) we have to rely on superscalar techniques, but combinatorial search, as used in this chess engine, usually has very sequential instruction dependencies, which makes instruction-level parallelism hard. This particular case is where SMT, which provides thread-level parallelism, shines.

We don't downplay unfavorable benchmarks, but this particular workload is one of many cases that M1 is not good at (for now). If this particular case is very relevant to the user, that user should not buy an M1-equipped Mac. A lot of the disagreement here just amounts to saying 'M1 is fast for me', because OP came in with a scary title.
 