I would expect Anandtech to use the best compiler for each processor. If GPU benchmarks use the best API for each GPU, why would benchmarks for CPU not use the best compiler for each CPU?
Not really equivalent. The API a particular piece of code uses often can't simply be changed on a whim (some benchmarks do support multiple APIs, often with varying degrees of optimization applied to each, but most don't). So you are indeed testing the performance of the API as much as the hardware. Some GPU benchmarks do use generic APIs like Vulkan, but that has pros and cons too. GPU performance testing is thus even more full of caveats than CPU testing. It depends on what you view as the purpose of benchmarking:
1) What is the max performance possible?
2) What are reasonable performance expectations for code yet to be written?
3) What is the actual performance on actual production code used in the wild right now?
Anandtech's testing aims for 2) and 3), and their choice of compiler and settings reflects that. If you vary the compiler, then you're testing the choice of compiler as well as the processor. However, even without changing the compiler, the same compiler can produce better or worse optimized assembly for different architectures such as ARM vs x86. Nothing is ever perfect.
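To make the confounding point concrete, here is a minimal sketch. The CPU names, compiler names, and scores are all made up purely for illustration (they are not measurements of anything). It just shows how a "same compiler for everyone" comparison and a "best compiler per CPU" comparison can rank the same two chips differently:

```python
# Hypothetical scores (higher is better), invented for illustration only.
# Keys are (cpu, compiler); none of these names refer to real products.
scores = {
    ("cpu_a", "compiler_x"): 100.0,
    ("cpu_a", "compiler_y"): 130.0,
    ("cpu_b", "compiler_x"): 110.0,
    ("cpu_b", "compiler_y"): 112.0,
}


def ratio_fixed_compiler(scores, cpu_1, cpu_2, compiler):
    """Compare two CPUs while holding the compiler constant."""
    return scores[(cpu_1, compiler)] / scores[(cpu_2, compiler)]


def best_score(scores, cpu):
    """Best score a CPU achieves across every compiler in the table."""
    return max(value for (c, _), value in scores.items() if c == cpu)


def ratio_best_compiler(scores, cpu_1, cpu_2):
    """Compare two CPUs, each using whichever compiler suits it best."""
    return best_score(scores, cpu_1) / best_score(scores, cpu_2)


fixed = ratio_fixed_compiler(scores, "cpu_a", "cpu_b", "compiler_x")
best = ratio_best_compiler(scores, "cpu_a", "cpu_b")
print(f"same compiler:    cpu_a / cpu_b = {fixed:.2f}")
print(f"best compiler ea: cpu_a / cpu_b = {best:.2f}")
```

With these invented numbers, cpu_a trails when both use compiler_x but leads when each gets its best compiler, so the second ratio reflects the compiler choice as much as the silicon.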
I should note that I was wrong: the performance delta between LLVM and ICC used to be quite large, larger than I thought ... but emphasis on used to be. The references to that gap are years old and as of now, Intel has adopted LLVM as a backend.
Intel reported in a blog this week that its adoption of the open source LLVM architecture for Intel’s C/C++ compiler is complete. The transition is part of Intel’s ongoing effort […]
www.hpcwire.com
So now I have no idea what performance deltas there might be, if any, between Intel's ICC and standard LLVM.
EDIT: I should've read more carefully: not all the optimizations were upstreamed to LLVM, so ICC still boasts a major performance increase on Intel processors. Intel claims >40% higher SPECint performance than standard clang-LLVM, which would explain these results. Thus this is all compiler shenanigans.