But then again, Apple Silicon runs at a much lower frequency. And there are obviously pipeline-induced delays.
Actually, 3.2 GHz is not that much lower. Also, the pipelines are much shorter as well as far wider. This is essentially how Intel's Conroe achieved ridiculous speeds at much lower clocks than anyone else when it debuted. A wider pipeline can issue more instructions each clock cycle, and a shorter one throws away less work when a branch is mispredicted and the pipeline has to be refilled. Together, that means more work gets done per clock.
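A back-of-the-envelope way to see the trade-off: throughput is roughly clock speed times instructions per cycle (IPC). The Python sketch below is a toy model, assuming a fixed share of branches, a fixed mispredict rate, and a flush penalty equal to the pipeline depth. The function names `ipc` and `throughput` and every number are hypothetical, not figures for Apple Silicon, Conroe, or any real CPU. It also ignores cache misses and the fact that real cores rarely fill every issue slot.

```python
# Toy model (illustrative only): throughput ~= clock frequency x instructions per cycle (IPC).
# All parameter values are hypothetical, not measurements of any real CPU.

def ipc(width: int, depth: int, branch_frac: float = 0.2, mispredict_rate: float = 0.05) -> float:
    """Instructions per cycle for an idealized core.

    width: instructions the core can issue per cycle (a "wider pipe").
    depth: pipeline stages; a mispredicted branch is assumed to waste about
           `depth` cycles while the pipeline refills (a "longer pipe" loses more).
    """
    base_cpi = 1.0 / width                             # best case: every issue slot is filled
    flush_cpi = branch_frac * mispredict_rate * depth  # average cycles lost to flushes, per instruction
    return 1.0 / (base_cpi + flush_cpi)


def throughput(freq_ghz: float, width: int, depth: int) -> float:
    """Billions of instructions per second = GHz x IPC."""
    return freq_ghz * ipc(width, depth)


# Hypothetical cores: a wide, short pipeline at 3.2 GHz vs a narrow, deep one at a higher clock.
wide_short = throughput(3.2, width=8, depth=13)
narrow_deep = throughput(3.8, width=3, depth=31)
print(f"wide/short:  {wide_short:.1f} G instr/s")
print(f"narrow/deep: {narrow_deep:.1f} G instr/s")
```

Under these made-up numbers the lower-clocked wide/short core comes out ahead, which is the point being argued: clock speed alone says little without IPC.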