Given that the hype train for the M1X is going full steam and performance estimates have been high, I think it might be important to temper our expectations.
So as a thought experiment, let’s make a realistic worst-case scenario for the M1X.
1. No IPC or clock speed improvements over the M1.
2. No improvements to the number of display outputs.
3. Maximum 16GB of memory.
4. No improvements in memory speed; still uses LPDDR4.
5. More P cores, but not a big boost in performance.
6. More graphics cores, but not a huge jump in graphical performance. Maybe 50% above the M1.
Thoughts?
You forgot the "doomsday" side effect that it may spit out locusts occasionally or something.
This isn't as "worst case" as you make it out to be. This exact combination has a lower and lower probability the more items you tie together into a total conjunction.
First, the context of what we're talking about here. The "X" stands for a substantively bigger die with more stuff on it. In that context (let's say 2x P cores, drop 2 E cores, add 2-4x GPU cores, one more TB port/controller, and about a 2x increase in cache sizes), you end up with a roughly 2.2-2.5x larger die. For example, moving from a 120mm^2 die to a range around 280-300mm^2.
1. No IPC or clock improvements, but no losses either, is pretty much a "win". Generally, when you double cores, the clock speed drops: power dissipation problems, clock propagation issues, internal network latencies/"traffic jams", etc.
If Apple can get zero single-thread performance drop-off while scaling up linearly, then there is a very substantively improved "uncore" here. That is one of the principal things Apple would need for a substantially bigger die.
[ In contrast, if Apple has to lean heavily on a sub-node process shrink to get scale, that can be indicative that the "uncore" isn't really up to the task of scaling. ]
2. Moving to a 290mm^2 die would mean around a 60% increase in circumference for the die. If all of that increase had to be thrown at more RAM channels to keep the GPU+cores fed with enough bandwidth, then yeah, that would be a scaling issue.
Frankly, more "powerful" A15 cores that have even higher bandwidth needs may not solve that issue.
The easier out would be to just go to 300mm^2 for the die. That is still relatively midsize (not really a "large die" in 2021 terms: the Skylake Xeon low-core-count die was about 320mm^2; the XCC die, i.e. large, was about 697mm^2).
Apple's SoC black-hole absorption of the MBP 16"'s dGPU is going to require some hefty size increase over an iPad Pro-sized SoC. Not like a real black hole where the new matter is crushed to a singularity; the mass is going to be bigger if you "suck it inside". I guess more like "The Blob".
This can be coupled to issue 4 (more on that later).
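The perimeter claim in point 2 can be sanity-checked with a bit of geometry. A rough sketch, assuming (hypothetically) a square die so that perimeter scales with the square root of area:

```python
import math

def perimeter_growth(old_area_mm2: float, new_area_mm2: float) -> float:
    """Percent increase in perimeter for a square die, given old and new areas.

    Perimeter scales with sqrt(area), so a ~2.4x area increase yields only
    a ~55% longer edge available for I/O pads.
    """
    old_perimeter = 4 * math.sqrt(old_area_mm2)
    new_perimeter = 4 * math.sqrt(new_area_mm2)
    return (new_perimeter / old_perimeter - 1) * 100

# M1-ish die (~120 mm^2) growing to the speculated ~290 mm^2:
print(f"{perimeter_growth(120, 290):.0f}% more edge length")  # ~55%
```

So the "around 60%" figure holds for a roughly square layout; a more elongated aspect ratio would buy a bit more edge for the same area.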
3. Issues 3 and 2 happening at the same time is a bit of an oxymoron. Both problems together really don't happen with a very large increase in circumference and die size. Either you threw the whole circumference increase at more memory channels (e.g., went from 2 RAM die stacks per die to 4 or more) or you didn't. More capacity will come with more die stacks. And if you are "stuck" with just 2 die stacks, there are gobs and gobs and gobs of space for raster display controllers. If you give some of the increase to both, then you get some movement in both (smaller capacity and display deltas).
So pick one; both is unlikely. (Apple isn't likely throwing tons of off-die edge space at some huge PCI-e complex or mainstream I/O. If this is a chiplet component, then it would need some for interconnection, but a very short-range inter-chip connection would not "pad out" as much as the others because the interfaces are narrower and less power consuming.)
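The "more capacity comes with more die stacks" point above is just multiplication. A minimal sketch, assuming (based on the shipping M1 configs) 8GB per on-package RAM die stack:

```python
def max_capacity_gb(die_stacks: int, gb_per_stack: int = 8) -> int:
    """Max unified memory if capacity scales with on-package RAM die stacks.

    The 8GB-per-stack default is an assumption drawn from the M1's
    top configuration (2 stacks, 16GB); actual stack densities vary.
    """
    return die_stacks * gb_per_stack

# M1 tops out at 16GB with 2 stacks; adding stacks raises the ceiling:
for stacks in (2, 3, 4):
    print(stacks, "stacks ->", max_capacity_gb(stacks), "GB")
```

Which is why a much bigger die that still caps out at 16GB implies the perimeter budget went somewhere other than memory.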
4. If you have the same speed but also got 50-100% more memory channels (went from 8 to 12 or 16), then the same speed is far, far, far from the end of the world. You would have far more aggregate bandwidth, which is highly needed for the increased core count.
If the die size was almost constant (130 versus 120 mm^2) and the same number of die stacks were attached... yeah, that would be a problem. But the whole point of an "X" is to be a bigger one. A far bigger die and still clutching at just 2 die stacks? Why?
The aggregate bandwidth to the die stacks counts... not the speed of the individual dies.
[ Optionally: the cores don't talk to the memory directly. The memory controllers do (and they are part of the "uncore"). Just because the cores haven't changed doesn't mean the "uncore" can't move. The internal central bus has moved; the memory controllers could too. Or the controllers could already interface with a broader range of memory if willing to accept a higher power draw. The M1's use of 8 memory controllers/channels is very LPDDR5-ish in design. It isn't presenting as a strictly 4-channel design. ]
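The aggregate-bandwidth argument in point 4 is also simple arithmetic: peak bandwidth is channels times transfer rate times channel width. A sketch, assuming (as is commonly reported for the M1) eight 16-bit LPDDR4X-4266 channels; the 16-channel figure is purely illustrative:

```python
def aggregate_bandwidth_gbs(channels: int, mt_per_s: int,
                            bits_per_channel: int = 16) -> float:
    """Peak aggregate bandwidth in GB/s: channels x transfers/s x channel width.

    Divide bits by 8 for bytes, then by 1000 to go from MB/s to GB/s.
    """
    return channels * mt_per_s * bits_per_channel / 8 / 1000

# Same LPDDR4X-4266 speed per die, just more channels:
m1_like = aggregate_bandwidth_gbs(channels=8, mt_per_s=4266)    # ~68 GB/s
m1x_case = aggregate_bandwidth_gbs(channels=16, mt_per_s=4266)  # ~136 GB/s
print(f"{m1_like:.1f} GB/s -> {m1x_case:.1f} GB/s")
```

Doubling channels at an unchanged per-die speed doubles the aggregate figure, which is the point: the individual dies don't need to get faster.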
5. More P cores means more multithreaded performance.
Single-threaded performance is not some kind of huge problem for the current P cores. Go back to point one above: if there is zero drop-off in getting the same single-thread outcome with all this "extra stuff" attached to the die, then what is the problem there? That the chip could seamlessly shift between the two different types of workloads would be a significant win. Usually, to max out on one, you have to sacrifice some of the other.
6. This goes back to 2, 3, and 4. If you grew the chip by a substantive amount but attached an iPad Pro memory subsystem to it with just two RAM stacks and some deficient, overtasked raster display engine... what in the world did you spend all the extra die space on?
Again, this is one of those oxymoron situations. You stuff a doubling or quadrupling of GPU cores (and associated cache allocations) into the middle of the die to grow it much, much bigger. That causes the perimeter to grow, and then you stick next to nothing along that increased perimeter. Why? If the die grows bigger, that typically means you get more I/O. Which I/O class you allocate the increase to is flexible, but the notion that it is zero I/O and just more dense compute/cache with a bridge to "nowhere" doesn't make much sense.
If the die stacks went from 2 to 3 because other I/O tradeoffs had to be made, then yeah, you're probably going to get a sublinear performance increase with a 2x increase in GPU cores. 4x is even more likely to be sublinear.
If they only jumped to 8 GPU cores, then it would pretty likely fail at the mission of matching the current MBP 16" dGPU performance envelope. If the point was to just cover the MBP 13" four-port model, then that would be a win. But it would be losing for the MBP 16".
There are rumors of a JadeCut where the die probably is reduced (hence "cut down") and likely takes a loss on memory and/or display output. But then "M1X" would be two SKUs. The smaller one probably costs less too. If you get less and pay less, that isn't the "end of the world" either. That is the other issue: whether "M1X" is a grouping of dies or a single die.
One single die to cover the space that the current MBP 13" and MBP 16" fill, with all the associated BTO options on GPU. Doing all that with one single die is probably an issue if Apple wants to keep costs the same for similar configurations. Especially since Apple charges such a high markup for RAM: a four-RAM-die-stack SoC will cost more in part solely because of the markup inflation on the RAM soldered onto the SoC.
So yeah, the SoC to primarily cover the MBP 13" might get capped on GPU cores... grow less... and get capped at 3 RAM stacks instead of 4. GPU performance would be higher than the M1 but lower than the 4-stack variant, while also costing substantively less.