How can one state "lower than expected on their compute capability" for the M1 GPUs when there is nothing to compare them with in terms of Apple Silicon?! What was the actual expectation, considering nothing was available from Apple to compare against? Even if you think of the A-series chips, Blender isn't made for those, so what was the expectation when nothing existed prior to the M1 series chips to begin with?
You compare it to other GPUs with similar compute capabilities? Not sure what the confusion is?
For example, M1 Max has a theoretical peak compute throughput of 10 TFLOPs (which has been experimentally verified). But in Blender its performance is very close to that of 5-6 TFLOPs GPUs from other vendors (GTX 1660 Ti, RTX 2060). This can be explained by a less mature software implementation, by architectural limitations (whatever the nature of that limitation is), or both.
Now, M2 Max is two times (2x!!!) faster in Blender, while the hardware itself is only 30-40% faster nominally (M1 Max is 10 TFLOPs, M2 Max should be around 13 TFLOPs). Its Blender score is now very close to 15 TFLOPs Nvidia GPUs (like the mobile 3070), which is a fantastic improvement and shows that the M2 family now offers similar (or better) Blender performance per FLOP than Nvidia GPUs. This pretty much rules out the software factor (both M1 Max and M2 Max were benchmarked using the same software), so it must be due to a hardware change in the M2 family. No idea what that change was, but it allowed M2 Max to reach the same performance potential in a complex GPU workload as mature desktop GPUs. This is a huge thing for Apple, especially since Apple can get to these performance levels using 2-3x less power.
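To make the per-FLOP argument concrete, here's a quick back-of-the-envelope sketch. The numbers are the rough figures quoted in this thread (peak TFLOPs, and the TFLOPs class of the Nvidia GPU each chip roughly matches in Blender), not measured benchmark scores:

```python
# Rough per-FLOP efficiency check using the approximate figures from this thread.
# "effective" = peak TFLOPs of the Nvidia GPU class the chip matches in Blender.

chips = {
    "M1 Max": {"peak": 10.0, "effective": 5.5},   # matches ~5-6 TFLOPs GPUs (1660 Ti, 2060)
    "M2 Max": {"peak": 13.0, "effective": 15.0},  # matches ~15 TFLOPs GPUs (mobile 3070)
}

for name, t in chips.items():
    # Fraction of theoretical throughput actually showing up in Blender,
    # relative to how Nvidia's peak numbers translate to Blender scores.
    efficiency = t["effective"] / t["peak"]
    print(f"{name}: {t['peak']:.0f} TFLOPs peak, plays in the "
          f"~{t['effective']:.0f} TFLOPs class -> efficiency ~{efficiency:.2f}")

# M1 Max lands around 0.55, M2 Max above 1.0 - roughly a 2x jump in
# Blender performance per theoretical FLOP between the two generations,
# which is why a ~1.3x nominal uplift can yield a ~2x Blender score.
```

These inputs are hand-wavy by nature (peak TFLOPs is a poor proxy for renderer throughput across architectures), but they illustrate why the 2x Blender gain on a 30-40% nominal uplift points to a hardware change rather than a software one.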