The reduction of memory bandwidth almost certainly does not play a role here. In order to saturate the available bandwidth, you’d need all cores including graphic cores to be requesting data.
That's an important point!
You don't need to run on all cores to saturate the bandwidth that is actually available to your workload. Take the numbers as stated by Apple.
"We" have to understand how the die is built, how the CPU cores are connected to the RAM and the RAM controllers, and how the data bus works overall. The die is built up in packages.
(I always forget the current terms, so I'm not able to spell it out correctly in detail.)
here is the thread:
The point is, the full bandwidth Apple quotes is correct if we utilize all cores.
If we utilize fewer cores, the bandwidth available for our audio work changes.
My main app, for example, runs single-threaded only (GP, a plugin host).
So the bandwidth actually available for my audio task differs from Apple's paper specs.
On my M1, I did indeed run into this under certain circumstances (audio, real-time playback tasks): the bandwidth became the new bottleneck of my audio system. This happened while running just my one main app (the plugin host), which, as I said, runs on just one core.
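If you want a rough feel for what a single thread can pull on your own machine, something like the following Python sketch could serve as a starting point. It is not how any audio host measures anything; the function name `measure_copy_bandwidth` and the buffer size are my own assumptions, and a bulk buffer copy is only a crude stand-in for a real plugin workload. The number it prints is a ballpark for one thread, to be compared against the whole-chip figure on the spec sheet.

```python
import time

def measure_copy_bandwidth(size_mb=64, repeats=5):
    """Return the best observed single-thread copy rate in GB/s.

    Hypothetical helper for illustration only. The buffer size is an
    assumption: it should be larger than the CPU caches so the copy
    actually has to go out to RAM.
    """
    size = size_mb * 1024 * 1024
    src = bytearray(size)
    dst = bytearray(size)
    best = 0.0
    for _ in range(repeats):
        start = time.perf_counter()
        dst[:] = src  # one bulk copy, executed by a single thread
        elapsed = time.perf_counter() - start
        if elapsed <= 0:
            continue  # timer resolution too coarse for this run
        # A copy reads `size` bytes and writes `size` bytes.
        rate = (2 * size) / elapsed / 1e9
        best = max(best, rate)
    return best

if __name__ == "__main__":
    print(f"single-thread copy: {measure_copy_bandwidth():.1f} GB/s")
```

Run it with nothing else busy, then again while other apps are loading the machine, and see how far the single-thread number sits from the advertised total.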
This is all quite use-case dependent.
From what I gather, bandwidth numbers have to be seen "per package" on the die:
CPU clusters + RAM controllers + RAM clusters. Something like that.
(Please see the linked thread. Some good answers have been given there by knowledgeable people.)