Hi all,
You all are probably already familiar with these results, and I apologize for wasting your time if you are, but I found some of the implications intriguing.
First of all, I want to thank Sopel for his fine optimization work on Stockfish.
I decided to benchmark Stockfish 14.1 as well as Sopel's neon_opt version, which I compiled with PGO, on a few different benchmarks, both with and without nnue. The run times spanned 15 to 42 minutes on three different laptops: a 2019 15" Intel i9 MBP, a 2020 13" Apple M1 MBP, and a 2021 16" Apple M1 Max MBP.
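For anyone who wants to set up similar runs, here is a minimal sketch of the UCI commands that would configure one run. It assumes the engine is driven through the standard Stockfish UCI options "Threads", "Hash", and "Use NNUE"; the helper name uci_setup is hypothetical, and this is not necessarily how the runs in this post were launched.

```python
def uci_setup(threads, hash_mb, use_nnue):
    """Return the UCI commands that configure one benchmark run.

    Assumption: the engine exposes the Stockfish options
    "Threads", "Hash" (in MB), and "Use NNUE".
    """
    nnue_value = "true" if use_nnue else "false"
    return [
        "uci",
        f"setoption name Threads value {threads}",
        f"setoption name Hash value {hash_mb}",
        f"setoption name Use NNUE value {nnue_value}",
        "isready",
    ]


# Example: 8 threads, 8192 MB hash, classical evaluation (no nnue).
for command in uci_setup(threads=8, hash_mb=8192, use_nnue=False):
    print(command)
```

These lines would be written to the engine's standard input before starting a search or benchmark.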
The following PDF includes the summary results of these benchmarks. Both the measured values and their relative ratios are provided. [If you can't read the red text below, the tables at the end include measurements and relative ratios separated by a colon. For instance, the typical units are (nps:rel), which denotes "nodes per second:relative ratio".]
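To make the (nps:rel) notation concrete, here is a small sketch of how such entries could be produced. The choice of baseline machine is an assumption, the function name format_nps_rel is hypothetical, and the numbers are placeholders, not measurements from these benchmarks.

```python
def format_nps_rel(nps_by_machine, baseline):
    """Format each nps value as "nps:rel", where rel is the ratio
    of that value to the baseline machine's value."""
    base = nps_by_machine[baseline]
    return {
        name: f"{nps}:{nps / base:.2f}"
        for name, nps in nps_by_machine.items()
    }


# Placeholder values for illustration only (NOT results from this post).
example = {"machine_a": 1_000_000, "machine_b": 2_500_000}
print(format_nps_rel(example, baseline="machine_a"))
# {'machine_a': '1000000:1.00', 'machine_b': '2500000:2.50'}
```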
View attachment 1928161
View attachment 1928162
Some of the things to notice in the above tables:
(1) Sopel's optimizations increase the nps for both Apple Silicon as well as for Intel Silicon.
(2) The 2020 MBP 13" with the Apple M1 uses much less power than the 2021 MBP 16" Apple M1 Max, especially relative to its lower nps values.
(3) The Package Power measurements are not directly comparable between Apple Silicon and Intel, since the Intel power measurements do not include DRAM, GPUs, neural engines, etc. that are included in the Apple Silicon measurements. Still, the Apple Silicon package powers are significantly lower than Intel's while at the same time providing comparable (M1) or superior (M1 Max) nps values. [More on this topic to follow.]
(4) In my hands, the nps results are not much affected by hash table sizes between 1024 MB and 8192 MB, but I decided to use 8192 MB tables in all of the above benchmark runs.
(5) For the optimized code without nnue at the maximum number of threads for each CPU, the M1 Max had nps values 2.04 times larger than the M1 while the Intel i9 had nps values 1.04 times higher than the M1.
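One way to put points (2) and (3) on a common footing is nodes per second per watt of package power. The sketch below is only an illustration: the function name nps_per_watt is hypothetical and the inputs are placeholders, not values from the tables. As noted in (3), the Intel package power leaves out components that the Apple Silicon figure includes, so a cross-vendor comparison of this number is not like-for-like.

```python
def nps_per_watt(nps, package_watts):
    """Return search throughput per watt of package power."""
    if package_watts <= 0:
        raise ValueError("package power must be positive")
    return nps / package_watts


# Placeholder inputs for illustration only (NOT measurements from this post).
print(nps_per_watt(2_000_000, 40.0))  # 50000.0
```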
Even more intriguing were the temperatures, fan speeds, and CPU histories. The following figures show these.
View attachment 1927699
The above figure shows the 2020 MBP 13" M1's CPU histories for 8 threads (left-hand diagram), for 4 performance threads (middle diagram), and 1 thread (right-hand diagram). These histories are as expected, but more intriguing are the corresponding temperatures and fans, as shown below:
View attachment 1927701
The above figure shows the 2020 MBP 13" M1's temperatures and fan speeds for 8, 4, and 1 threads. Notice that the fan speed never reaches its maximum rate, even for the 8-thread benchmark.
Now for the 2021 MBP 16" M1 Max:
View attachment 1927705
The above histories for the 2021 MBP 16" M1 Max are for 10 threads, 8 performance threads, and 1 thread (left to right diagrams). Interestingly, Stockfish, when utilizing just 1 thread, only appeared to use the P0-Cluster (3 to 6) and not the P1-Cluster (7 to 10) of CPUs.
View attachment 1927709
The above figure shows the corresponding temperatures and fan speeds. Notice that the CPU temperatures for the performance cores basically maxed out at 100C, but the fans never got above about 41% of their maximum RPM speeds. On the other hand, when I set the fans to their maximum RPM speeds and ran the 10 thread benchmark, the following temperatures were found:
View attachment 1927713
Notice in the above figure that when the fans are set to their maximum RPMs, the CPU temperatures fall to 80C or less.
And on to the 2019 MBP 15" Intel i9:
View attachment 1927714
The above CPU histories are for the 2019 MBP 15" Intel i9 laptop using 16, 8, and 1 thread (left to right diagrams). Notice that when using just 8 Stockfish threads, the hyperthreaded character of the Intel architecture was not employed, as expected. This was also true for the benchmark using just 1 Stockfish thread.
View attachment 1927733
The above figure shows the temperatures and fan speeds for 16, 8, and 1 Stockfish thread running the benchmarks on the 2019 MBP 15" Intel i9 laptop. Notice that the fans were pegged at their maximum RPM values for both the 16 and 8 thread benchmarks, unlike these benchmarks running on Apple Silicon.
Some things to notice about the above temperature and fan speed results:
(1) Stockfish does not maximize the fans' RPM speeds on Apple Silicon for the benchmarks that I ran. [I know that it is possible to maximize the fans' speeds, however, as my own QFT/perturbation optimization code does it -- it is just that Stockfish is apparently not pushing the Apple Silicon laptops to their maximum power performance. It's probably that I just don't know how to get Stockfish to do so, i.e., I'm sure it's my fault.]
(2) The Apple Silicon laptops were essentially "quiet" during these benchmark runs while the Intel i9 laptop sounded like a jet engine (its usual fan noise, in other words).
Sorry about such a long and tedious post; I apologize.
Solouki
Edit: Replaced the PDF results file with images.