Special builds of Stockfish, cFish, etc. are already compiled and optimized to use the M1's NEON, etc.
They can use whatever they want; that does not mean the code uses the hardware optimally. For example, if your code uses a naive loop with vector instructions to accumulate a long chain of numbers, an x86 CPU with AVX2 will win: it can do twice as many additions per cycle and it runs at higher clocks, and your execution is limited by the dependency chain. But if you unroll your code so that multiple instructions can execute simultaneously, M1 will get ahead, as it has more vector processing units and a better memory subsystem.
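To make the dependency-chain point concrete, here is a minimal scalar sketch in C (not code from Stockfish or any engine; the function names `sum_naive` and `sum_unrolled4` are made up for illustration). The first loop forms one long chain where each addition waits for the previous one. The second keeps four independent accumulators, so a wide core can have several additions in flight at once. The same idea applies when each accumulator is a NEON or AVX2 vector register.

```c
#include <stddef.h>

/* Naive reduction: every addition depends on the result of the previous
 * one, so throughput is bounded by the latency of a single add. */
float sum_naive(const float *x, size_t n) {
    float acc = 0.0f;
    for (size_t i = 0; i < n; i++) {
        acc += x[i];
    }
    return acc;
}

/* Unrolled reduction with four independent accumulators (hypothetical
 * unroll factor; the best value depends on the core). The four chains
 * do not depend on each other, so they can execute in parallel. */
float sum_unrolled4(const float *x, size_t n) {
    float a0 = 0.0f, a1 = 0.0f, a2 = 0.0f, a3 = 0.0f;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        a0 += x[i];
        a1 += x[i + 1];
        a2 += x[i + 2];
        a3 += x[i + 3];
    }
    /* Combine the partial sums, then handle the leftover elements. */
    float acc = (a0 + a1) + (a2 + a3);
    for (; i < n; i++) {
        acc += x[i];
    }
    return acc;
}
```

Note that the two versions add the numbers in a different order, so with floating point they can differ in the last bits. That is also why compilers usually will not do this transformation for you unless you allow it (for example with a fast-math style option).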
But they are still less than half the speed of similarly priced, similar-sized computers.
This is not true. Benchmarks show M1 outperforming anything at the same power consumption.
The fact of the matter is that M1 isn't as fast as the usual influencer-types (fanbois) make it out to be. A similarly priced modern CPU from AMD runs circles around it. For certain use-cases, it may be "ok" for its "watt" but let's keep it real: the CPU is faster in Apple PowerPoint presentations than it is in real-life performance. It's more or less a glorified standard ARM big-little phone CPU with a focus mainly on the low-power slow "little" cores, and it relies heavily on optimized code to even be comparable to Intel/AMD these days. The CPU is overrated and underperforming (not only for chess).
Complete nonsense. I might agree that for chess engines, the performance is slightly disappointing. For most other applications, M1 has demonstrated high performance. Industry-standard benchmarks, real-world use, and specialized tests all make that very clear. You can't cherry-pick a single domain and ignore everything else.
Just compare it with amazing new stuff like the AMD 5700G and realize that anyone looking for "real" performance of CPU+GPU for the dollar should look elsewhere than the fruity un-open company these days.
AMD 5700G uses 5x as much power as M1 and has a slower GPU. It is slower than M1 in single-core and 30% faster in multi-core, despite having twice as many cores and using much more power. Yes, performance per dollar is very good. If you want performance per dollar, there are much better options out there than Macs.
No AVX2
No AVX512 (VNNI)
M1 does not need AVX2 since it has double the number of vector compute units of x86 CPUs. Modern x86 can do 2 256-bit AVX2 operations per cycle, M1 can do 4 128-bit NEON operations per cycle. And M1 SIMD operations have lower latency. There are plenty of tests showing that M1 can hold its own against AVX512.
M1 does not need VNNI since it offers multiple extensions for machine learning. It has the AMX coprocessor that is capable of performing 256 FP32 operations per cycle. A modern Intel AVX512 core can only do 16 operations per cycle. In other words, a single M1 has matrix multiplication throughput comparable to that of 16 Tiger Lake cores. Not to mention that M1 has an additional matrix multiplication unit (the NPU), which is faster still.
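As a back-of-the-envelope check of the figures above, here is a tiny C sketch (the helper `peak_bits_per_cycle` is made up for illustration, and the unit counts and widths are simply the ones quoted in this post, not measurements). It shows that 2 x 256-bit and 4 x 128-bit come out to the same peak vector width per cycle per core.

```c
/* Peak SIMD bits processed per cycle by one core, assuming every vector
 * unit can issue one full-width operation each cycle (an idealized
 * upper bound, not a benchmark result). */
unsigned peak_bits_per_cycle(unsigned vector_units, unsigned bits_per_op) {
    return vector_units * bits_per_op;
}

/* Figures as quoted in this post:
 *   x86 with AVX2: peak_bits_per_cycle(2, 256)
 *   M1 with NEON:  peak_bits_per_cycle(4, 128)
 * Both evaluate to 512 bits per cycle. */
```

So on paper the peak width is the same; the difference is that M1 reaches it with four narrower independent operations, which only pays off if the code exposes enough independent work.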
No Hyperthreading
Who cares? Hyperthreading is a hack designed to squeeze a bit more multithreaded performance out of a CPU that is unable to efficiently utilize its execution resources. M1 can have three times more instructions in flight than the fastest x86 cores; it won't benefit from hyperthreading.
…
…
…
How to compensate the losses?
Write better code. If your program runs slower on M1, you are probably doing something very wrong.