Genuine question: Does this benchmark utilize the neural engine and machine learning features of M1? If AI is one of the purposes of this benchmark, you’d think you would want to actually test the hardware that was designed for that purpose.
I was actually waiting for the chess defender to inform us about this. Guess googling it is taking a while. But since you ask, you're spot on: the answer is no. There's also no real reason to use the NE. That's not how Stockfish works; it relies on a different approach, which is also why learning-based engines that don't depend on historical game data can be much better at playing chess. It all depends on the setup and the goal.
I remember an older paper pointing out that most, if not all, chess engines pick heuristically determined subsets of moves, play them out, and then recursively do the same for the resulting positions, over and over again. That pattern is generally a poor fit for GPUs, because recursion performance on them is pretty bad. OpenCL used to not support recursion at all, since not all hardware supports it. I mostly use CUDA for my research these days, but maybe @leman can shed some light on whether this is supported right now.
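To make the pattern concrete, here's a minimal, purely illustrative CUDA sketch of that recursive search structure (toy position, toy evaluation, fixed move count; none of this comes from an actual engine). The point is only the control flow: every level of the tree is another level of device-side recursion eating per-thread stack.

```cpp
// Toy recursive game-tree search on the GPU (hypothetical, not real engine code).
#include <cstdio>

struct Position {          // placeholder for a real board representation
    int material;          // toy evaluation input
};

__device__ int evaluate(const Position& p) {
    return p.material;     // stand-in for a real static evaluation
}

// Depth-first recursion: expand a (pretend-pruned) subset of moves, apply
// each, recurse on the new position. Each level consumes per-thread stack.
__device__ int search(Position p, int depth) {
    if (depth == 0) return evaluate(p);
    int best = -1000000;
    for (int move = 0; move < 4; ++move) {           // pruned move subset
        Position child = p;
        child.material += (move % 2 == 0) ? 1 : -1;  // toy "make move"
        int score = -search(child, depth - 1);       // recurse on new position
        if (score > best) best = score;
    }
    return best;
}

__global__ void searchKernel(int* out, int depth) {
    Position root{0};
    out[threadIdx.x] = search(root, depth);
}

int main() {
    int* d_out;
    cudaMalloc(&d_out, sizeof(int));
    searchKernel<<<1, 1>>>(d_out, 6);
    int h_out;
    cudaMemcpy(&h_out, d_out, sizeof(int), cudaMemcpyDeviceToHost);
    printf("score: %d\n", h_out);
    cudaFree(d_out);
    return 0;
}
```

Even in this toy form you can see why it maps badly to GPUs: the branching and the recursion depth are data-dependent, so threads diverge and the stack usage is hard to bound up front.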
CUDA does support recursion, however it relies heavily on the per-thread stack size, and those stacks aren't that large. That stack lives in local memory, which is backed by global memory, so deep recursion introduces further latency and makes things inefficient. That's why it's usually avoided on GPUs, especially when the depth and width of the recursion aren't known in advance. Then again, M1 features unified memory, so that should be a major benefit for such problems. So the big question is: does GPU/NE recursion work in the current Apple ecosystem? I remember errors when calling recursive functions from kernels, but that could have changed with newer versions of Metal and M1.
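For reference, on the CUDA side the per-thread stack budget is something you query and set explicitly via cudaDeviceGetLimit/cudaDeviceSetLimit with cudaLimitStackSize. The 32 KB value below is just an example, not a recommendation:

```cpp
// Inspect and raise the per-thread stack size used for device-side recursion.
#include <cstdio>

int main() {
    size_t stackSize = 0;
    cudaDeviceGetLimit(&stackSize, cudaLimitStackSize);
    printf("default per-thread stack: %zu bytes\n", stackSize);  // small by default

    // Raise the limit before launching a deeply recursive kernel. Every
    // resident thread gets this much reserved, so the total footprint
    // grows quickly with occupancy.
    cudaDeviceSetLimit(cudaLimitStackSize, 32 * 1024);
    cudaDeviceGetLimit(&stackSize, cudaLimitStackSize);
    printf("new per-thread stack: %zu bytes\n", stackSize);
    return 0;
}
```

That reserved-per-thread cost is exactly why unbounded recursion is avoided on GPUs; whether Metal on M1 offers an equivalent knob for recursive kernel calls I honestly don't know.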