How much bigger can they really go with the die size?
One oddity I find is that this mammoth chip isn't much faster than the 5950X; as such, the 7950X will almost certainly be faster.
The bulk of the Max/Ultra die space is allocated to GPU cores. So if we test the Max/Ultra against the 5950X or 7950X on GPU compute tasks, which one is faster?
With such a huge die, Apple cannot keep adding GPU cores indefinitely, because that means continuing to grow the die. Again, I don't see much advantage to Apple making a discrete GPU, but I do think they're at a point of diminishing returns.
For what it's worth... Apple's dies are not "huge die size". You folks are comparing packages that consist of multiple different die sets aimed at different workloads. It is Apples to Oranges.
The overall Apple die packaging has limits. But what Apple has now isn't really near the limits of 'big'.
www.anandtech.com
Apple is using InFO-LSI for the M1 Ultra, which is limited to approximately 1x the reticle limit. If they shifted over to CoWoS they could do 3x (3 * ~800mm^2 -> ~2,400mm^2). Apple probably isn't going to put HBM memory stacks on this subassembly, so they can use all of that area for chiplets. Then use some less bleeding-edge packaging to tightly couple that subassembly to the LPDDR RAM modules, and done.
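The reticle arithmetic above can be checked with a few lines of Python. This is a rough sketch: the 26 mm x 33 mm exposure field is the standard full-field scanner size, while the ~800 mm^2 usable area per reticle and the 3x CoWoS multiple are the post's own estimates, not figures from TSMC documentation. The helper names are hypothetical.

```python
# Standard full-field lithography exposure field (mm).
RETICLE_W_MM = 26.0
RETICLE_H_MM = 33.0

# Usable silicon per reticle-sized region, as rounded in the post (assumption).
USABLE_PER_RETICLE_MM2 = 800.0


def reticle_area_mm2() -> float:
    """Area of one full reticle exposure field in mm^2."""
    return RETICLE_W_MM * RETICLE_H_MM


def package_budget_mm2(reticle_multiple: int,
                       usable_per_reticle: float = USABLE_PER_RETICLE_MM2) -> float:
    """Rough chiplet area available on an interposer spanning N reticle fields."""
    return reticle_multiple * usable_per_reticle


if __name__ == "__main__":
    print(f"one reticle field: {reticle_area_mm2():.0f} mm^2")
    print(f"3x reticle (post's rounding): ~{package_budget_mm2(3):.0f} mm^2")
    print(f"3x reticle (full field):      ~{3 * reticle_area_mm2():.0f} mm^2")
```

The gap between the two 3x figures is just the post's conservative rounding of one reticle down to ~800 mm^2.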
Doing twice as big as a duo (Ultra) is quite tractable (4 * ~430mm^2 -> ~1,720mm^2, call it ~1,800mm^2). It is substantially more expensive, but well inside the boundaries of TSMC packaging. It won't fit in a laptop, and Apple probably doesn't want to build desktop-only Ultras out of "laptop" duos anyway. I doubt Apple is going to try to compete on price with the mainstream generic GPU card market, so the additional expense probably will not be an issue for the customer base they'll be targeting. (Apple isn't going to have some generic GPU market 'killer' product. They don't want one either.)
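A quick sketch of the "twice an Ultra" sizing, assuming ~430 mm^2 per Max-class die (the post's figure) and the ~2,400 mm^2 3x-reticle CoWoS budget quoted earlier in the thread. It ignores die spacing, interconnect and keep-out area, so the headroom is optimistic; the function names are hypothetical.

```python
MAX_DIE_MM2 = 430.0      # approximate Max-class die area, as used in the post
BUDGET_3X_MM2 = 2400.0   # ~3x reticle budget quoted in the thread (assumption)


def multi_die_area_mm2(n_dies: int, die_mm2: float = MAX_DIE_MM2) -> float:
    """Total compute-die silicon for n identical dies (no interconnect overhead)."""
    return n_dies * die_mm2


def headroom_mm2(n_dies: int, budget_mm2: float = BUDGET_3X_MM2) -> float:
    """Budget left after placing n dies; a negative value means it does not fit."""
    return budget_mm2 - multi_die_area_mm2(n_dies)


if __name__ == "__main__":
    for name, n in (("Max", 1), ("Ultra (duo)", 2), ("2x Ultra (quad)", 4)):
        print(f"{name:16s} {multi_die_area_mm2(n):6.0f} mm^2, "
              f"headroom vs 3x budget: {headroom_mm2(n):6.0f} mm^2")
```

Four dies come to 1,720 mm^2, which the post rounds up to ~1,800 mm^2; either way it sits under the 3x budget with room to spare.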
All they need is a 'desktop' oriented Max sized building block that is meant to scale up.
Also, when N3, N2, 'N18A', and 'N16A' come along, Apple will be able to squeeze incrementally more GPU cores into that same ~1,800mm^2 space.
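The "more cores in the same area" point can be sketched as a fixed GPU area divided by a per-core area that shrinks with each node. Every number here is a hypothetical placeholder: the GPU share of the package, the starting core area, and the per-node scaling factors are illustrative inputs, not Apple or TSMC data. SRAM and analog/I/O blocks scale worse than logic, so real per-core gains tend to be smaller than headline logic-density figures.

```python
import math
from typing import Sequence


def gpu_cores_in_fixed_area(gpu_area_mm2: float,
                            core_area_mm2: float,
                            density_gains: Sequence[float]) -> int:
    """Cores that fit in a fixed GPU area after a series of node shrinks.

    density_gains holds hypothetical per-node area-scaling factors for one
    GPU core (e.g. 1.25 means the core shrinks to 1/1.25 of its prior area).
    Packing overhead is ignored.
    """
    scaled_core_area = core_area_mm2
    for gain in density_gains:
        scaled_core_area /= gain
    return math.floor(gpu_area_mm2 / scaled_core_area)


if __name__ == "__main__":
    # Hypothetical: 1,000 mm^2 of the package set aside for GPU cores,
    # 5 mm^2 per core today, and a 1.25x core-area gain per node.
    for nodes in range(3):
        print(nodes, gpu_cores_in_fixed_area(1000.0, 5.0, [1.25] * nodes))
```

The growth compounds per node, but it stays incremental: each shrink buys a fixed percentage, not a doubling.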
The 'extreme' package with all the associated LPDDR modules won't be 'desktop' small, so we're not likely to see a 'taller' Studio become the Mac Pro. They will probably need to go past the roughly 7.7 x 7.7 inch footprint of the Mini (even rotating the logic board vertically would still present problems, and once the logic board passes that critical-mass size they might as well add some PCIe slots too). The Xeon W-3200 inside the current Mac Pro is not some dainty package either. Chopping the current Mac Pro 'box' (sans feet and handles) in half would still leave them plenty of room for a big ~5,000mm^2 package.
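To put the ~5,000 mm^2 package figure in physical terms, here is a small sketch assuming a square package and a square enclosure footprint. The 7.7-inch side is the Mac mini's footprint; the package area is the post's estimate. Board, power delivery, memory routing and cooling are not modeled.

```python
import math

MM_PER_INCH = 25.4


def square_side_mm(area_mm2: float) -> float:
    """Side length of a square package with the given area."""
    return math.sqrt(area_mm2)


def footprint_fraction(package_mm2: float, enclosure_side_in: float) -> float:
    """Fraction of a square enclosure footprint taken by the package alone."""
    side_mm = enclosure_side_in * MM_PER_INCH
    return package_mm2 / (side_mm * side_mm)


if __name__ == "__main__":
    side = square_side_mm(5000.0)
    print(f"5,000 mm^2 package: ~{side:.1f} mm (~{side / MM_PER_INCH:.2f} in) per side")
    print(f"share of a 7.7 in square footprint: {footprint_fraction(5000.0, 7.7):.0%}")
```

The package itself is only about 7 cm on a side, roughly 13% of a Mini-sized footprint, so the area claim about half a Mac Pro box is easily satisfied.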
The affordability of the package (3D layout and chiplet development) is a bigger constraint than the die size. If Apple develops one desktop chiplet set shared by the Studio, a 'real' iMac Pro (not some iPad-on-a-stick design), and the Mac Pro, then it is probably doable. Apple isn't going to create a 'killer' for the multi-dGPU, very-high-end performance tiers. But with a single multi-die package they can keep up with the single-card, upper-mid to entry-high-end range over time.
AMD's upcoming 7900 and 7800 GPUs are not monolithic dies either (so it's not about die size); it is more about doing it affordably. Apple trying to pound single, laptop-optimized dies into scalable desktop-class chiplets is a bigger problem than the size of the dies. That is the round-peg-in-a-square-hole problem.
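Why chiplets are "more affordable" can be illustrated with the standard first-order Poisson yield model, Y = exp(-D0 * A). The defect density below (0.1 defects/cm^2) is a hypothetical placeholder, not a published TSMC number, and the monolithic 1,720 mm^2 case is purely illustrative since it exceeds the reticle. The sketch counts only silicon per working product; packaging, test and interconnect costs are deliberately left out, and those are exactly the package-affordability costs discussed above.

```python
import math

# Hypothetical defect density: 0.1 defects/cm^2 = 0.001 defects/mm^2.
D0_PER_MM2 = 0.001


def poisson_yield(die_mm2: float, d0_per_mm2: float = D0_PER_MM2) -> float:
    """Fraction of dies with zero defects under a simple Poisson yield model."""
    return math.exp(-d0_per_mm2 * die_mm2)


def silicon_cost_per_good_unit(total_mm2: float, n_chiplets: int,
                               d0_per_mm2: float = D0_PER_MM2) -> float:
    """Relative wafer area spent per working product.

    Splits total_mm2 into n equal chiplets that are tested before assembly
    (known-good die), so a defect scraps only one small chiplet rather than
    the whole product. Packaging and interconnect costs are ignored.
    """
    chiplet_mm2 = total_mm2 / n_chiplets
    return n_chiplets * chiplet_mm2 / poisson_yield(chiplet_mm2, d0_per_mm2)


if __name__ == "__main__":
    total = 1720.0  # four ~430 mm^2 dies' worth of silicon
    for n in (1, 2, 4):
        print(f"{n} die(s): relative silicon cost {silicon_cost_per_good_unit(total, n):8.0f}")
```

Under these assumptions, splitting the same silicon into smaller dies cuts the wafer area wasted on defects sharply; whether that saving survives the added packaging cost is the open question.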