No, it’s standard clang 9 something. Andrei compared the results against a clang he downloaded and ran himself, and they were almost identical. Apple does do weird things, like taking the compiler from one version and the library from another, which pisses some people off because it makes the build idiosyncratic, but for the purposes of benchmarking it’s basically identical. As far as I can tell, Apple upstreams everything, since clang is basically their baby. I think Apple’s Xcode clang has a built-in flag that standard clang doesn’t, which helps a lot in one specific subtest.
Source? I'm sure they optimized the backend in particular as much as possible.
Even if that is true, it doesn't mean it's possible to get an additional 45% out of Apple's optimized Xcode compiler on the M1 too. I think it's only fair to use the best compiler available for each platform.
As for whether Apple and AMD could get similar uplifts: yes and no. Intel has been doing this for years, and as a result people have a pretty good idea of how they achieve it. What Intel is doing here is basically using a highly specialized auto-vectorizer that catches loops which would normally require you to manually rearrange your code, or add pragmas, before the compiler recognizes the opportunity to vectorize. Incredibly impressive compiler engineering (though it occasionally breaks stuff, and not all programs benefit). They have also sometimes used specialized versions of libraries that SPEC calls into, rewritten by Intel to be faster on Intel chips. So no, flags alone won’t get you there. But that’s not to say this reflects anything other than compiler differences; the others could very well get the same uplift.
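To make the "rearrange or pragma your code" point concrete, here is a minimal C sketch. It is my own illustration, not anything Intel's compiler is known to do internally, and the function names are hypothetical. In the first version the compiler has to assume `out` may overlap the inputs, so it either emits runtime overlap checks or doesn't vectorize at all. In the second, `restrict` promises no overlap and `#pragma omp simd` explicitly asks for vectorization (honored by GCC and clang with `-fopenmp` or `-fopenmp-simd`, ignored otherwise).

```c
#include <assert.h>
#include <stddef.h>

/* Plain loop: because `out` might alias `a` or `b`, the compiler must either
   add runtime overlap checks before using SIMD or leave the loop scalar. */
void add_arrays_maybe_aliased(float *out, const float *a, const float *b,
                              size_t n) {
    for (size_t i = 0; i < n; i++) {
        out[i] = a[i] + b[i];
    }
}

/* "Pragma'd" loop: `restrict` is the caller's promise that the arrays do not
   overlap, and the OpenMP SIMD pragma requests vectorization of the loop.
   The pragma only takes effect with -fopenmp or -fopenmp-simd. */
void add_arrays_simd(float *restrict out, const float *restrict a,
                     const float *restrict b, size_t n) {
    assert((out != NULL && a != NULL && b != NULL) || n == 0);
#pragma omp simd
    for (size_t i = 0; i < n; i++) {
        out[i] = a[i] + b[i];
    }
}
```

Both functions compute the same result; the difference is only in how much the compiler is allowed to assume. To check whether a loop actually got vectorized, clang's `-Rpass=loop-vectorize` and GCC's `-fopt-info-vec` print optimization remarks.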
And therein lies the rub. We don’t actually know exactly what compilation techniques they used, because it’s a closed-source variant of an open-source backend, so we can’t say for certain what is and isn’t available elsewhere. Why, for instance, would you assume Intel gets a free 48% increase but nobody else would if the same techniques were applied? Again, you’re basically measuring the difference in compilers, not hardware. If you want the most optimized version for each, then at most I’d say use the same compiler and turn on PGO (profile-guided optimization) to get the most optimized machine code for each processor; at least the compiler is still the same.
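For reference, "same compiler, PGO on" with clang looks roughly like the sketch below. The hot function and the file name `pgo_demo.c` are hypothetical stand-ins for a real benchmark; the build steps are in the header comment, and `main` is only compiled with `-DPGO_DEMO_MAIN` so the function can be reused elsewhere. The training run records that the branch is taken about 99% of the time, and the compiler can use that profile for decisions like code layout and inlining.

```c
#include <assert.h>
#include <stddef.h>
#ifdef PGO_DEMO_MAIN
#include <stdio.h>
#endif

/* Clang PGO workflow (same compiler for both builds; file names assumed):
 *   clang -O2 -DPGO_DEMO_MAIN -fprofile-instr-generate pgo_demo.c -o pgo_demo
 *   ./pgo_demo                      (training run, writes default.profraw)
 *   llvm-profdata merge -output=pgo_demo.profdata default.profraw
 *   clang -O2 -DPGO_DEMO_MAIN -fprofile-instr-use=pgo_demo.profdata \
 *         pgo_demo.c -o pgo_demo
 */

/* Hypothetical hot function: counts values below a threshold. */
int count_small(const int *values, size_t n, int threshold) {
    assert(values != NULL || n == 0);
    int count = 0;
    for (size_t i = 0; i < n; i++) {
        if (values[i] < threshold) { /* heavily skewed in the training run */
            count++;
        }
    }
    return count;
}

#ifdef PGO_DEMO_MAIN
int main(void) {
    enum { N = 1000000 };
    static int data[N];
    for (int i = 0; i < N; i++) {
        data[i] = i % 100; /* 99 of every 100 values are below 99 */
    }
    printf("%d\n", count_small(data, N, 99));
    return 0;
}
#endif
```

On macOS, the `llvm-profdata` tool that ships with Xcode can be invoked as `xcrun llvm-profdata`.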
TLDR: Intel is using a different compiler that applies Intel-specific optimizations, without any flags, that AMD and Apple don’t get, and then claims hardware wins on the basis of a compiler that really isn’t used for consumer software and likely isn’t the most widely used one for HPC either (GCC likely is). So it’s disingenuous unless Intel is selling a complete HPC solution to customers, which they aren’t here.