Hi Cmaier!
I found the issue: I typed “decoder” but meant an 8-wide decode block. Apple Silicon cores appear to have both giant caches and an 8-wide decode. AnandTech has written about it, as has a developer. Seeing as AnandTech is the brainchild of Anand Lal Shimpi of the Apple Silicon team, I think they have a good knowledge source.
Ok, this is what it says:
Other contemporary designs such as AMD’s Zen(1 through 3) and Intel’s µarch’s, x86 CPUs today still only feature a 4-wide decoder designs (Intel is 1+4) that is seemingly limited from going wider at this point in time due to the ISA’s inherent variable instruction length nature, making designing decoders that are able to deal with aspect of the architecture more difficult compared to the ARM ISA’s fixed-length instructions.
I don’t disagree with that in principle. It says “seemingly” and “at this point in time,” so it’s not claiming there is some inherent, unsolvable problem. And someone earlier claimed that AMD said it couldn’t go wider, and I don’t think AMD ever said that.
I think it all comes back to my point: going wider would have diminishing returns, because the added pipelines wouldn’t be filled often enough to make the added hardware worth it.
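To put a rough shape on that intuition, here is a toy Python model. It is not a simulation of any real core: the distribution of "instructions ready per cycle", the decay value, and all the function names are made up for illustration. It only assumes that cycles with many independent instructions ready are rarer than cycles with few, and shows how each extra pair of slots then adds less than the pair before it.

```python
# Toy model of issue width vs. useful throughput.
# Assumption (hypothetical): the chance that exactly k instructions are
# ready in a cycle falls off geometrically with k. Real workloads differ.

def ready_distribution(max_ready=16, decay=0.7):
    """Return P(exactly k instructions ready) for k = 0..max_ready."""
    weights = [decay ** k for k in range(max_ready + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def expected_ipc(width, dist):
    """Expected instructions issued per cycle for a machine of this width.

    A cycle with k ready instructions can issue at most `width` of them.
    """
    return sum(p * min(k, width) for k, p in enumerate(dist))

def marginal_gains(widths, dist):
    """Return (width, expected IPC, gain over the previous width) rows."""
    ipcs = [expected_ipc(w, dist) for w in widths]
    previous = [0.0] + ipcs[:-1]
    return [(w, ipc, ipc - prev) for w, ipc, prev in zip(widths, ipcs, previous)]

if __name__ == "__main__":
    dist = ready_distribution()
    for width, ipc, gain in marginal_gains([2, 4, 6, 8, 10], dist):
        print(f"width {width:2d}: expected IPC {ipc:.2f} (gain {gain:.2f})")
```

Under this assumed distribution, the gain column shrinks at every step, which is the "not filled often enough" effect. A workload with more instruction-level parallelism (a larger decay value here) would shift the point where widening stops paying off.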