@leman: That's a plausible theory; I had never thought about it! Given the nature of the work, I wouldn't be surprised if that's the case. I'm not sure about the apparent efficiency-core usage, though.
I must say I'm underwhelmed by the performance, given the hype surrounding the M1 series. On a single core, my MBP is comparable to my 2019 27" iMac (i5-8500, 6 cores). But since I can easily use all six cores on the iMac, it outperforms the MBP, at the cost of the fan blowing all the time. Thanks for your interesting theory, but I'd be pretty sad if I couldn't force the M1 Pro to use the CPU cores instead of AMX.
AMX is much faster than the regular FP pipeline, so if the performance is underwhelming, AMX isn't the cause. It could be an interfacing problem, though. Have you tried asking on the CmdStan mailing list? The people there are usually very helpful.