Absolutely wrong. Btw, the Mac Pro is close; you should at least analyze your argument about obstructed RAM buses better, as that is the main workaround. The only error from me was not specifying Epyc Genoa-X, as it's the InFO-LSI example of how Apple may approach a 4x M2 Max and, later, multi-SoC M3 arrangements. The InFO-LSI substrate is way more flexible than old chiplet technology; it doesn't even require actual silicon.
Epyc Genoa doesn't have LSI 3D packaging.
AMD EPYC 7004 "Genoa" processor has been pictured, features twelve Zen4 chiplets - VideoCardz.com
The chiplets there use the same "PCB interposer" tech that AMD has used all along with the general Zen chiplet strategy.
The 3D cache variants of Genoa are not the "LSI 3D packaging" technology that TSMC does. The cache is on the 'wrong' side and the connection is only through the bond (not to another chiplet; only two chips are involved, whereas an LSI setup involves three).
If you want to hand wave at something, then the MI-300 has a kitchen sink of chip bonding techniques going on. You would have a better chance of hitting the broad side of a barn there.
That layout has significant problems. First, you really haven't accounted for where the memory packages go, and those are NOT directly attached to the same interposer that the main core dies are attached to. The core die interposer is attached to another one, similar to the "PCB interposer" that the Epyc uses, to couple to the memory chips. So if you draw a square around your thing and then try to place the memory chips outside that square, you run into substantive issues.
Second, the NUMA blow-up there is likely quite high. If you want to get from the top right corner of the top [m2 m] chip to the lower left corner of the [m2 m] chip on the hub's left, then you have to go all the way down and then over to the left. Nothing at all like a straight line, or even the semi-straight line you could do on a rectangular mesh.
Making that show up to programmers transparently as a single GPU and a smoothly coherent CPU complex is likely going to be problematic. Similar to how AMD MI-250 systems show up as two GPUs, not one, even though the dies are LSI-like coupled. Four would even more likely be in that substantial NUMA zone. (Decent chance the MI-300 isn't going to be a completely 'flat', uniform solution either, but much better than pure discrete cards.)
Finally, the last problem is that with TSMC LSI 3D packaging the connecting LSI die is completely covered by the dies being placed on top. There is no way that 4 same-sized dies are going to completely cover that hub. The edges of the [m2 m] dies will come into conflict with one another as you try to draw them closer to the center.
Furthermore, the "UltraFusion"-containing edge of the M2 die is around 22mm (the M1's is 19.96mm and the M2 series dies bloat out bigger). If that hub die is 22x22mm, that is yet another 484mm^2 die sitting there (even if it only bloats to 21mm, that's a 441mm^2 die there; again bigger than the original M1 Max). The bigger that die is, the longer the distances between the main [m2 m] dies. The longer the distances, the bigger the latencies (and power loss).
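To make the hub area arithmetic concrete, here is a rough sketch. The edge lengths are the ones quoted above; the M1 Max area constant is an assumption (a commonly reported approximate figure, not an official Apple number), and `hub_area_mm2` is just an illustrative helper name.

```python
# Rough sketch of the hub-die area argument; not measured data.
# Assumption: ~432 mm^2 is a commonly reported approximate M1 Max die area.
M1_MAX_AREA_MM2 = 432.0

def hub_area_mm2(edge_mm):
    """Area of a square hub die whose edge matches the UltraFusion edge."""
    return edge_mm * edge_mm

for edge in (22.0, 21.0):
    area = hub_area_mm2(edge)
    delta = area - M1_MAX_AREA_MM2
    print(f"{edge:.0f} mm hub edge: {area:.0f} mm^2 ({delta:+.0f} mm^2 vs assumed M1 Max)")
```

Either way, a square hub sized to the connector edge also puts roughly one hub edge of extra distance between the dies sitting on opposite sides of it.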
Pointing at the Epyc above and saying 'but, but, but the outer chiplet die that is two away from the main I/O hub of that package works' doesn't help. Yeah, AMD takes a 'make everything slower' approach to minimize the NUMA hit. CPUs are going to be much more tolerant of that than GPUs will. If you are trying to make that look like one big unified GPU to the customer, then that won't work. That is precisely why the 7900 reverses the chiplet decomposition approach (memory controllers and cache out in the chiplets, and the cores on one centralized, unified, single-die mesh).
Trying to pound the 'round peg' of the monolithic laptop chip into a > 2 chiplet design is bad design. It is just physically aggregated suboptimally.
Unobstructed but significantly longer isn't going to solve the problem. That is a 'jumping out of the frying pan and into the fire' kind of solution.
Yeah, you can stuff a monstrous > 400mm^2 die in there in the middle, but it physically pushes the core dies farther apart. Putting the largest possible thing in the middle doesn't help get the separate GPU core clusters closer together, nor the even more distant memory packages closer to the GPU cores.
EDIT: here is a link with TSMC InFO-LSI information, good for educating people here on this technology so our rants are more factual.
Advanced Packaging Part 2 - Review Of Options/Use From Intel, TSMC, Samsung, AMD, ASE, Sony, Micron, SKHynix, YMTC, Tesla, and Nvidia
Advanced packaging exists on a continuum of cost and throughput vs performance and density. Even though the demand for advanced packaging is obvious, there is an incredible number of advanced packaging types and brand names from Intel (EMIB, Foveros, Foveros Omni, Foveros Direct), TSMC (InFO-OS...
www.semianalysis.com