The underlying architecture itself is a major differentiator between Apple's chips and the x86 scene. That Engadget page lightly touches on the width of the M1, but doesn't go far enough in detailing how Apple achieved it or why it matters. The easiest way I can explain it is to picture entering a toll road such as the Pennsylvania or Florida Turnpike: if only four booths are open, traffic gets processed at a slower pace than if 6 or even 8 booths were open. But the other difference is logistical in nature. Since x86 uses variable-length instructions, each decoder (booth) has to check for the start and end of an instruction at every point in the data stream, because an instruction could begin at any byte. (This would be like having to charge each individual in the vehicle the toll instead of charging on a per-vehicle basis.) Consequently, the practical limit for the x86 architecture is 4 decoder units, something AMD has gone on the record with publicly. Since ARM uses fixed-length instructions, it is trivial to add more decoder units, as they do not have to play hide and seek with the data stream to find the instructions.
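
To make the hide-and-seek point concrete, here is a rough Python sketch of the boundary-finding problem. It is a toy model, not how any real decoder works: the 4-byte fixed width matches ARM's 32-bit instructions, but the variable-length encoding (first byte of each instruction holds its total length) and both function names are made up purely for illustration.

```python
FIXED_WIDTH = 4  # bytes per instruction in the toy fixed-length encoding


def fixed_length_starts(stream: bytes, width: int = FIXED_WIDTH) -> list[int]:
    """Start offsets for a fixed-length stream.

    Each offset depends only on the instruction's index, so decoder k can
    jump straight to k * width without looking at any other instruction.
    """
    return list(range(0, len(stream), width))


def variable_length_starts(stream: bytes) -> list[int]:
    """Start offsets for a toy variable-length stream.

    Hypothetical encoding: the first byte of each instruction is its total
    length in bytes. The start of instruction k is only known once the
    lengths of instructions 0..k-1 have been read, so the scan is serial.
    """
    starts = []
    offset = 0
    while offset < len(stream):
        starts.append(offset)
        length = stream[offset]
        if length == 0:
            raise ValueError("zero-length instruction in toy stream")
        offset += length  # the next start depends on this instruction
    return starts


if __name__ == "__main__":
    fixed = bytes(16)  # four 4-byte instructions
    variable = bytes([2, 0xAA, 3, 0xBB, 0xCC, 1, 4, 0xDD, 0xEE, 0xFF])
    print(fixed_length_starts(fixed))
    print(variable_length_starts(variable))
```

In the fixed-length case every booth knows exactly where its car begins, so adding more booths is just more of the same. In the variable-length case each start depends on everything before it, which is why a wide decoder either has to wait or speculatively check many byte positions at once.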