I'm interested to see how M1 performs playing global thermonuclear war.
Do you want to play a game?
It wins by not playing.
Also want to add that Stockfish is absolutely a good benchmark for CPUs. In fact, SPEC CPU (both 2006 and 2017) includes Sjeng as its chess engine benchmark, which runs an algorithm similar to Stockfish's.
And for SF NNUE, M1 is indeed slower than its x86 counterparts due to weaker SIMD units. But when I ran the benchmark for the classical, non-NNUE version of SF, the single-thread performance actually matched the strongest x86 CPUs.
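For readers unfamiliar with the algorithm family mentioned above: Sjeng and Stockfish are both built around alpha-beta game-tree search (Stockfish uses a heavily refined principal-variation-search variant with many extra pruning and move-ordering tricks). Below is a minimal negamax alpha-beta sketch over a toy tree. It assumes a hypothetical representation where a nested list is a position with children and an int is a leaf score from the point of view of the side to move there. It is not Stockfish's code; it only shows the branchy, data-dependent control flow that makes this kind of engine a distinctive CPU workload.

```python
from typing import List, Union

# Hypothetical toy tree: an int is a leaf score from the point of view of the
# side to move at that leaf; a list holds the positions reachable in one move.
Tree = Union[int, List["Tree"]]


def negamax(node: Tree, alpha: float, beta: float, visited: List[int]) -> float:
    """Return the value of `node` for the side to move, using alpha-beta pruning."""
    visited[0] += 1  # count nodes, like an engine's "nodes searched" counter
    if isinstance(node, int):
        return node
    best = float("-inf")
    for child in node:
        # The child is scored from the opponent's viewpoint, so negate its
        # value and search it with the negated, swapped window.
        score = -negamax(child, -beta, -alpha, visited)
        best = max(best, score)
        alpha = max(alpha, score)
        if alpha >= beta:
            break  # cutoff: the opponent will never allow this line
    return best
```

Calling `negamax([[3, 5], [2, 9]], float("-inf"), float("inf"), visited)` with `visited = [0]` returns 3 and visits 6 of the 7 nodes: once the first move guarantees a score of 3, the second move is refuted as soon as its leaf 2 is seen, so leaf 9 is never searched.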
You just disproved your point.
SPEC is a good benchmark. Chess is not. Chess is a good benchmark component. But most computing has a very different profile than chess algorithms. That’s why good benchmarks blend the results of many different kinds of computing activity.
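SPEC CPU, mentioned earlier in the thread, reports a suite score as the geometric mean of per-benchmark ratios. A minimal sketch of that kind of blending; the workload names and times in the usage note are made-up values for illustration, not measurements of any CPU.

```python
from statistics import geometric_mean
from typing import Dict


def composite_score(reference_s: Dict[str, float], measured_s: Dict[str, float]) -> float:
    """Blend several workloads into one score, SPEC-style.

    Each workload contributes a speed ratio (reference time / measured time);
    the geometric mean of the ratios keeps any single workload from dominating.
    """
    ratios = [reference_s[name] / measured_s[name] for name in reference_s]
    return geometric_mean(ratios)
```

For example, with hypothetical reference times of 100 s for `chess`, `compress`, and `compile`, and measured times of 50 s, 100 s, and 200 s, the ratios are 2, 1, and 0.5 and the composite is about 1.0: being twice as fast at chess is cancelled out by being twice as slow at compiling.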
Just to make it clear, M1’s SIMD units are not weaker than those of x86 CPUs. Yes, M1 only supports 128-bit SIMD, but it has four such units and they operate with low latency. Most modern x86 CPUs can do two 256-bit AVX2 operations per clock; M1 can do four 128-bit ones, which has the same net result. And if you use 128-bit operations (e.g. SSE), M1 will be faster.
Now, the newest Intel cores do have two 512-bit units, so they can achieve higher throughput under certain conditions and with software specifically written for them.
I don't know exactly how Stockfish utilizes SIMD, but if M1 doesn’t perform well there, my initial assumption would be that it’s simply not well optimized for ARM Neon. M1 usually does incredibly well on number-crunching benchmarks.
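The per-clock arithmetic in the comment above can be checked in a few lines. This sketch takes the comment's simplifying assumption at face value: every SIMD unit completes one full-width operation per clock, using the unit counts quoted in the thread. Real throughput also depends on clock speed, instruction latency, and which execution ports accept which instructions, so these are peak figures only.

```python
# Peak vector throughput per clock, assuming (as the comment does) that every
# SIMD unit completes one full-width operation each cycle.
CONFIGS = {
    "x86 AVX2 (2 x 256-bit)": (2, 256),
    "M1 Neon (4 x 128-bit)": (4, 128),
    "x86 SSE (2 x 128-bit)": (2, 128),
    "x86 AVX-512 (2 x 512-bit)": (2, 512),
}


def bits_per_clock(units: int, width_bits: int) -> int:
    return units * width_bits


def fp32_lanes_per_clock(units: int, width_bits: int) -> int:
    # Each 32-bit float occupies one lane of a vector register.
    return bits_per_clock(units, width_bits) // 32


for name, (units, width) in CONFIGS.items():
    print(f"{name}: {bits_per_clock(units, width)} bits, "
          f"{fp32_lanes_per_clock(units, width)} fp32 lanes per clock")
```

Under these assumptions, AVX2 and M1 both come out at 512 bits (16 fp32 lanes) per clock, 128-bit SSE on two units at 256 bits, and two AVX-512 units at 1024 bits, which matches the comment's reasoning.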
If you define benchmarking like that (must have varied workloads), then I definitely agree with you. It's also why I hate that many x86 fans think that Cinebench is a good benchmark...
SIMD is a recent addition to Stockfish, and yeah, it probably isn't well optimized for Neon yet. I'm not certain about whose SIMD is better, but the current observation is that M1 loses more performance than x86 after the switch to NNUE. Maybe I should look into optimizing it for M1.
Please do. Just head over to Stockfish on GitHub and have a look at the NEON-specific code. Indeed, NNUE and the matrix calculations it requires are described in this excellent document: https://github.com/glinscott/nnue-pytorch/blob/f6a2e30d9393a7a8e62f0e3f8bfeecdf84b373c0/docs/nnue.md
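The core trick described in that NNUE document is the incrementally updated accumulator: the first layer is a large, sparse linear layer, so after a move the engine only subtracts the weight rows of input features that switched off and adds those that switched on, instead of recomputing the whole layer. Those row additions and subtractions over small integer vectors are a natural target for SIMD (AVX2, Neon, etc.). A minimal pure-Python sketch, assuming tiny hypothetical sizes and made-up deterministic weights; the feature indices are illustrative, not a real HalfKP-style encoding.

```python
from typing import List, Sequence

# Hypothetical toy sizes; real NNUE inputs have tens of thousands of features.
NUM_FEATURES = 8
ACC_SIZE = 4


def make_weights(num_features: int, acc_size: int) -> List[List[int]]:
    """Made-up deterministic weights: row f is added when feature f is active."""
    return [[(f * 7 + j * 3) % 11 - 5 for j in range(acc_size)]
            for f in range(num_features)]


def refresh(active: Sequence[int], weights: List[List[int]],
            bias: Sequence[int]) -> List[int]:
    """Full recompute: bias plus the weight row of every active feature."""
    acc = list(bias)
    for f in active:
        for j, w in enumerate(weights[f]):
            acc[j] += w
    return acc


def update(acc: Sequence[int], removed: Sequence[int], added: Sequence[int],
           weights: List[List[int]]) -> List[int]:
    """Incremental update after a move: touch only the features that changed."""
    new = list(acc)
    for f in removed:
        for j, w in enumerate(weights[f]):
            new[j] -= w
    for f in added:
        for j, w in enumerate(weights[f]):
            new[j] += w
    return new


def clipped_relu(acc: Sequence[int], ceiling: int = 127) -> List[int]:
    """Clamp each value to [0, ceiling], like the quantized layers in the NNUE doc."""
    return [min(max(x, 0), ceiling) for x in acc]
```

For instance, with `weights = make_weights(NUM_FEATURES, ACC_SIZE)`, a position with hypothetical active features `[0, 3, 5]`, and a move that turns feature 3 off and feature 6 on, `update(refresh([0, 3, 5], weights, bias), [3], [6], weights)` equals `refresh([0, 5, 6], weights, bias)`, because the layer is linear. The inner loops over `j` are the part an engine would vectorize.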
I'm on Mac now too, and I haven't really had time to find a good chess engine interface.
One option for a good (paid) interface is to run ChessBase or Fritz as the GUI under Parallels / ARM Windows, which works pretty well, and use inBetween.exe and the UCI protocol to connect to the Stockfish chess engine running natively on the M1 under macOS. Obviously, if you are striving for every CPU cycle there are inefficiencies here, but for simple analysis this can work very well.
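UCI, the protocol inBetween.exe relays, is plain text over the engine's stdin/stdout, so a native engine can also be driven directly. A minimal Python sketch, assuming a UCI engine binary at a path you supply (for example `stockfish` on your PATH); the helper names are hypothetical, and only a small subset of UCI is handled (`uci`, `isready`, `position`, `go depth`, `quit`, and the `info ... score` / `bestmove` replies).

```python
import subprocess
from typing import List, Optional, Tuple

Score = Tuple[str, int]  # ("cp", centipawns) or ("mate", moves to mate)


def position_command(moves: List[str]) -> str:
    """UCI 'position' command: start position plus moves in long algebraic (e2e4)."""
    cmd = "position startpos"
    if moves:
        cmd += " moves " + " ".join(moves)
    return cmd


def parse_score(line: str) -> Optional[Score]:
    """Pull the score out of an 'info ... score cp 34 ...' or '... score mate -3 ...' line."""
    tokens = line.split()
    if not tokens or tokens[0] != "info" or "score" not in tokens:
        return None
    i = tokens.index("score")
    return tokens[i + 1], int(tokens[i + 2])


def parse_bestmove(line: str) -> Optional[str]:
    """Return the move from a 'bestmove e2e4 [ponder e7e5]' line, else None."""
    tokens = line.split()
    if len(tokens) >= 2 and tokens[0] == "bestmove":
        return tokens[1]
    return None


def analyse(engine_path: str, moves: List[str],
            depth: int) -> Tuple[Optional[str], Optional[Score]]:
    """Search a position to a fixed depth; return (best move, last reported score)."""
    proc = subprocess.Popen([engine_path], stdin=subprocess.PIPE,
                            stdout=subprocess.PIPE, text=True)

    def send(cmd: str) -> None:
        proc.stdin.write(cmd + "\n")
        proc.stdin.flush()

    send("uci")
    send("isready")
    for line in proc.stdout:  # wait until the engine has finished initialising
        if line.strip() == "readyok":
            break
    send(position_command(moves))
    send(f"go depth {depth}")
    best, score = None, None
    for line in proc.stdout:
        score = parse_score(line) or score
        best = parse_bestmove(line)
        if best is not None:
            break
    if proc.poll() is None:  # engine still running: ask it to exit cleanly
        send("quit")
    proc.wait()
    return best, score
```

Something like `analyse("stockfish", ["e2e4", "e7e5"], 18)` would return the engine's chosen move and its last reported score; the actual values depend on the engine version and hardware.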
AMD Zen 4 will get 5nm, AVX-512, DDR5, etc., so it'll be even faster. M1 only looks good compared to Intel; it has too many trade-offs compared to AMD.
Apple is rumored to already be fabbing the successor to the M1. It should be available by summer. When does AMD start doing 5nm?
Funny how people selectively preach dorkbench when M1 does well but downplay a more relevant real-world workload when M1 doesn't. Chess has been a relevant workload and benchmark from IBM's Deep Blue, which beat world champion Garry Kasparov, to DeepMind's AlphaZero today.
A benchmark written to take advantage of Intel-specific instructions that has not been optimized for the M1 (and does not even take full advantage of the M1's capabilities) is a poor measuring stick, regardless of whether chess is a "relevant workload" or not. For me, it's meaningless since I don't play chess and have no desire to start. What I DO use the computer for is web development, graphic design, Photoshop, web browsing, and some gaming, and the M1 is more than up to the task for all of those purposes and more. What's funny is how people selectively preach benchmarks that can't even properly utilize the M1 as if they were somehow relevant.
There was maybe one intelligent response trying to understand what the bottleneck is.
AMD can't move to 5nm until Apple moves to 3nm, since Apple has booked up all 5nm capacity for the foreseeable future and has also preregistered for 3nm.
Ever since I saw WarGames in 1983, my default answer to "Do you want to play a game?" has always been no.
Reality is, AMD makes CPUs and GPUs for the 90%+ PC market share, and only 7nm fabs can provide that capacity.
Nothing is stopping AMD from allocating 5nm or next-gen node capacity at TSMC, but there's less need since their current 7nm chips surpass the 5nm M1's performance.
AMD 7nm and M1 are not comparable. M1 outperforms AMD when you look at performance per watt. M1 is a 10-15 watt chip. Can you show me a 10-15W AMD chip that outperforms M1 in single-thread performance?
And even if they were, Apple isn't using them.
If you call others’ statements BS, then the claim that AMD makes CPUs and GPUs for the 90%+ PC market share is also BS.