You keep saying that, yet refuse to acknowledge that properly utilising hardware instructions can give an order-of-magnitude speedup, and fitting your algorithm to the hardware can have similar benefits. But the non-existent prospect of increasing Stockfish speed by 200-300% on an Apple ARMv8.4 CPU should be obvious to you if you even have a basic grasp of the coding you refer to.
I've personally seen (in my earlier programming days) a 50-100x speedup from simply changing a buffer size for file I/O (to a hard drive).
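To make the buffer-size point concrete, here is a rough Python sketch (not the code I was working on back then). It reads the same file with different chunk sizes and counts how many read requests each one needs. The function names, the 8 MiB temp file and the chunk sizes are all made up for illustration. On a modern OS the page cache hides most of the disk cost, so don't expect it to reproduce a 50-100x gap, but the difference in the number of reads shows where the overhead comes from.

```python
import os
import tempfile
import time


def read_with_chunk_size(path, chunk_size):
    """Read a whole file using unbuffered reads of chunk_size bytes.

    Returns (total_bytes, number_of_non_empty_reads).
    """
    total = 0
    calls = 0
    # buffering=0 turns off Python's own buffer, so each read() is passed
    # more or less directly to the operating system.
    with open(path, "rb", buffering=0) as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:  # empty bytes object means end of file
                break
            total += len(chunk)
            calls += 1
    return total, calls


def time_read(path, chunk_size):
    """Return (total_bytes, reads, elapsed_seconds) for one full pass."""
    start = time.perf_counter()
    total, calls = read_with_chunk_size(path, chunk_size)
    return total, calls, time.perf_counter() - start


if __name__ == "__main__":
    # Hypothetical test data: 8 MiB of zero bytes in a temporary file.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(bytes(8 * 1024 * 1024))
        path = tmp.name
    try:
        for size in (64, 4096, 1024 * 1024):
            total, calls, elapsed = time_read(path, size)
            print(f"chunk={size:>8} bytes  reads={calls:>7}  time={elapsed:.4f}s")
    finally:
        os.remove(path)
```

The tiny chunk size needs tens of thousands of reads to get through the file where the large one needs a handful, and every one of those reads has a fixed cost. The actual timings depend heavily on the machine, the OS and the storage device.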
AES instructions on Intel, for example, can give up to a 30x performance improvement for crypto. Not 30%. 30x!
Bad code can be very slow. Even "good" code running in an environment it was not written for can expose problems, or fail to exploit the machine it is running on, resulting in bad performance.