Surprised there hasn't been any discussion yet of the problems with Apple's Accelerate.framework and NumPy/SciPy, specifically that both projects have dropped official support for it because its BLAS and LAPACK implementations are buggy and badly out of date (Accelerate.framework implements the LAPACK 3.2.1 API, released in 2009). According to the maintainers, they've had no luck getting Apple to look at the problems.
As someone who relies heavily on both packages (both directly and indirectly as dependencies of others), this sucks, because without a workaround we can't leverage the blazing speed of the AMX for matrix math in most Python code. The single-core performance is still fast enough that it's not a show-stopper, but it's something I really wish Apple would work with the devs to fix (like they did with TensorFlow and Blender).
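If you want to check what your own install is actually linked against, NumPy and SciPy both expose this at runtime. A quick sketch (the exact `show_config` output format varies by NumPy version, but the backend name shows up either way):

```python
# Check which BLAS/LAPACK backend NumPy/SciPy are using.
# If "accelerate" appears in the config, you're on Apple's
# Accelerate.framework; pip wheels normally bundle OpenBLAS instead.
import numpy as np
from scipy.linalg import lapack

np.show_config()  # prints the BLAS/LAPACK libraries NumPy was built against

# ilaver() asks the linked LAPACK library for its API version;
# Accelerate reports the ancient 3.2.x, bundled OpenBLAS something modern.
major, minor, patch = lapack.ilaver()
print(f"LAPACK {major}.{minor}.{patch}")
```

Handy for confirming whether a given environment is hitting the old Accelerate code paths before you go chasing a numerical bug.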
Leman's right though that R and Stan performance are off the charts; the huge cache and single-core speed on these chips make for some big reductions in model time (my M1 Pro smokes our lab's ~4 year old dual-Xeon server by a factor of 2 on 6-chain models). I needed to install the experimental branch of RStan/StanHeaders from GitHub to get around a weird bug that broke adapt_delta, but other than that it's been smooth and fast sailing!