Ah, sorry, I mean "single" GPU. I would classify the MI250 as two GPUs on one package. From the system configuration point of view, it's not that different from any other multi-GPU system that is connected via Infinity Fabric links.
What would significantly matter, though, is if AMD implemented the Infinity Fabric connections differently. If they are 2.5D/3D interposer-only implementations, they don't have to be as large as the 'long distance' Infinity Fabric implementations. The die-to-die IF connections aren't exactly like what Apple did with UltraFusion, nor like what AMD is doing elsewhere with IF.
https://www.anandtech.com/show/1705...i200-accelerator-family-cdna2-exacale-servers
" ... But it still comes with caveats, as each MI200 accelerator is presented as two GPUs, and even as fast as 4 IF links are, moving data between GPUs is still much slower than within a single, monolithic GPU. ..." (**)
An initial search for a public annotated floor plan turned up blank, but the area along the die edge where the die-to-die, fixed point-to-point connections most likely sit is clearly not the same size as the general Infinity Fabric or PCIe external-package links. Four significantly smaller IF links make a difference in die space allocation.
For example, Nvidia completely dumped NVLink off the 4080 die to allocate more space to RTv3 units.
" ... The news is coming directly from NVIDIA founder and CEO Jensen Huang, who on a call with the tech press he said that Ada Lovelace drops the NVLink connector so that the I/O room for "something else". We do know that NVIDIA has crammed a helluva lot more into Ada Lovelace GPUs, with transistors, CUDA core, and ROP counts going through the roof.
....
Jensen said that NVIDIA engineers wanted to use as much of the silicon area they had their hands on to "cram in as much AI processing as we could". We also now know: Ada Lovelace does NOT support PCIe 5.0, but NVIDIA included the Gen5 power connector. ..."
https://www.tweaktown.com/news/8860...-for-ada-lovelace-silent-death-sli/index.html
Apple is getting high bandwidth and lower latencies with UltraFusion, but to get there they are also throwing any significant high-bandwidth link off the package down the drain. There is no 'free lunch' there. If scaling to four dies means making an even bigger off-die bandwidth trade-off, then there is even less of a 'free lunch'. A path that pragmatically blocks PCIe 5 (and CXL) for a very long time would come at a cost.
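For a rough sense of scale, a back-of-envelope comparison (headline/theoretical figures only; the UltraFusion number is Apple's aggregate marketing figure, so this is order-of-magnitude at best):

```cpp
// Back-of-envelope comparison of link bandwidths. All numbers are rough,
// publicly quoted figures and may not be measured on the same basis.
#include <cstdio>

int main() {
    const double ultrafusion_gbs = 2500.0; // Apple's ~2.5 TB/s UltraFusion claim (aggregate)
    const double pcie4_x16_gbs   = 32.0;   // ~32 GB/s theoretical, per direction
    const double pcie5_x16_gbs   = 64.0;   // ~64 GB/s theoretical, per direction

    printf("UltraFusion vs PCIe 4.0 x16: ~%.0fx\n", ultrafusion_gbs / pcie4_x16_gbs);
    printf("UltraFusion vs PCIe 5.0 x16: ~%.0fx\n", ultrafusion_gbs / pcie5_x16_gbs);
    return 0;
}
```

The die-edge and I/O budget that buys that roughly 40-80x gap is exactly the budget that isn't available for fast off-package links.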
(**) In the AnandTech article's tech-specs table, the MI200 line-up shows a substantial uptick in the number of IF links versus the previous generation. How they got that without expending gobs of edge space is significant. But there is a trade-off (distance versus space).