As long as you are not relying on any extensions explicitly (it’s not like Apple makes them visible to the user anyway), does it really matter? I would expect the behavior to be consistent across various ARM CPUs, so one should be able to develop and test on Apple and then deploy to the cloud.
This is out of my wheelhouse, so I'm just asking, but here's what I had in mind:
Suppose a developer is trying to optimize code for maximum performance, and he's comparing two ways of getting the same thing done, which we'll call code block A and code block B. It so happens that (without the dev's knowledge*) block A makes use of Apple's extended instruction set, while block B does not. As a consequence, code incorporating block A is faster on the AS Mac Pro than code using block B.
However, when deployed on the cluster (which does not have Apple's extended instruction set), B is faster than A.
Hence the AS Mac Pro would not be a good tool for doing such development testing unless it were possible to implement that same extended instruction set on the ARM cluster itself.
*You wrote that the extensions aren't visible to the user, suggesting to me that devs could be unaware of which instructions are actually used to implement their code.
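
For what it's worth, macOS does expose the standard ARM feature flags through sysctl (the hw.optional.* keys), so a dev who cares to look can see what the local chip advertises; it's Apple's proprietary additions (e.g., the undocumented AMX units) that don't show up there. Here's a minimal sketch in C of what that check looks like; the exact key names are assumptions on my part and can vary by macOS version:

```c
#include <stdio.h>
#include <sys/sysctl.h>

/* Query a boolean hw.optional.* feature flag; returns 0 if the key is absent. */
static int has_feature(const char *key) {
    int value = 0;
    size_t size = sizeof(value);
    if (sysctlbyname(key, &value, &size, NULL, 0) != 0)
        return 0;  /* key not present on this OS/CPU */
    return value;
}

int main(void) {
    /* A few standard ARM feature flags reported on Apple Silicon.
       Key names are examples and may differ across macOS versions. */
    const char *keys[] = {
        "hw.optional.arm.FEAT_FP16",
        "hw.optional.arm.FEAT_DotProd",
        "hw.optional.arm.FEAT_BF16",
        "hw.optional.armv8_crc32",
    };
    for (size_t i = 0; i < sizeof(keys) / sizeof(keys[0]); i++)
        printf("%-32s %s\n", keys[i], has_feature(keys[i]) ? "yes" : "no");
    return 0;
}
```

On the Linux side of the cluster the analogous check would be getauxval(AT_HWCAP) or /proc/cpuinfo, so at least the standard ARM features could be compared between the workstation and the nodes.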
***************
Come to think of it, if you're spending tens or hundreds of millions for a new supercomputer, wouldn't it make sense to purchase, along with it, several custom workstations (for development work) that use the same processors, and as close as possible to the same architecture, as the supercomputer itself? It seems that's the only way you could do processor-specific (e.g., accounting for the sizes of the various levels of cache) and architecture-specific optimizations using workstations.
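
On the cache-size point specifically, the basic cache geometry is at least queryable on both sides, so a dev could measure how far the workstation actually diverges from the cluster nodes. A rough sketch in C for the macOS side, assuming the usual hw.* / hw.perflevel* sysctl keys (on the Linux nodes you'd read /sys/devices/system/cpu/cpu0/cache/ instead):

```c
#include <stdio.h>
#include <stdint.h>
#include <sys/sysctl.h>

/* Read an integer-valued sysctl key; returns 0 if the key does not exist. */
static int64_t read_i64(const char *key) {
    int64_t value = 0;
    size_t size = sizeof(value);
    if (sysctlbyname(key, &value, &size, NULL, 0) != 0)
        return 0;
    return value;
}

int main(void) {
    /* Overall cache sizes, plus the per-cluster ("perflevel") values that
       Apple Silicon reports separately for P-cores and E-cores. Key names
       are assumptions and may vary by macOS version. */
    const char *keys[] = {
        "hw.cachelinesize",
        "hw.l1dcachesize",
        "hw.l2cachesize",
        "hw.perflevel0.l1dcachesize",   /* P-core cluster */
        "hw.perflevel0.l2cachesize",
        "hw.perflevel1.l2cachesize",    /* E-core cluster */
    };
    for (size_t i = 0; i < sizeof(keys) / sizeof(keys[0]); i++)
        printf("%-30s %lld bytes\n", keys[i], (long long)read_i64(keys[i]));
    return 0;
}
```

That tells you the sizes, but of course it doesn't make them match; tuning blocking factors on a machine whose caches differ from the target is still guesswork, which is your point about wanting matching hardware for development.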