No doubt this was discussed ad infinitum back in the day. My impression, though, was that it had little to do with additional costs and/or diminishing markets (if either were the case, "Jade 4C-Die" would never have made it onto the leaked roadmap in the first place). Rather, it came down to engineering roadblocks: possibly some make-or-break feature they couldn't make work, but more likely inherent performance and/or efficiency limitations that compounded as they multiplied the dies. Chip-first bridge packaging worked with simple, matching two-way pairs of chips, but beyond that it became problematic. We know now that Apple was the first to bring InFO-LSI to market, so it was probably a bit of an engineering feat just to make that happen. The four-way bridge design (however that was constructed) presumably didn't work as well, so it wasn't built.
Here's what Anand Shimpi said in September 2023, possibly referring to this experience: "At the end of the day, we’re a product company. So we want to deliver, whether it’s features, performance, efficiency. If we’re not able to deliver something compelling, we won’t engage, right? ... We won’t build the chip."
Edit to add that the removal of chip-first InFO-LSI packaging from TSMC's current public relations/press release site might be a consequence of this apparent failure beyond a simple two-way bridge. Note that "InFO-LSI" still appears as part of TSMC's chip-last CoWoS-L packaging. It's not inconceivable that M4's advanced packaging will switch to that approach.
They COULD probably build an InFO-LSI 4-way design.
This patent shows one way to do it.
US20240039539A1 - Systems and methods for implementing a scalable system - Google Patents
Multi-chip systems and structures for modular scaling are described. In some embodiments an interfacing bar is utilized to couple adjacent chips. For example, a communication bar may be utilized to couple logic chips, and a memory bar may be utilized to couple multiple memory chips to a logic chip.
Imagine a central square, then put a Max chip on each side of the square. Inside the square put a "hub/router" chip.
Then an EMIB/InFO-LSI bridge connects each Max to the central router chip.
For the central hub there are some other interesting options you can imagine.
Fab it on N7 for lower price?
Fill it up with an additional 256MB (or 1GB? whatever fits!) of SLC cache?
This option allows substantially unmodified Max chips, so that's an advantage. Other designs allow for more compact packing, but then you start to use up more than one edge for Fusion, and it becomes unclear that the single chip is what you want to sell as a Max...
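To make the trade-off above concrete, here's a rough sketch in Python that models the two layouts as plain graphs: the hub layout (each Max has one bridge to a central router chip) versus a direct all-to-all layout with no hub. Everything here is my own illustration, not anything from the patent: the die names, the function names, and the idea that "one bridge = one die edge used" are all simplifying assumptions.

```python
from collections import deque
from itertools import combinations

def star(n_dies):
    """Hub layout: every die has exactly one bridge, to a central hub (assumed model)."""
    dies = [f"max{i}" for i in range(n_dies)]  # hypothetical die names
    links = {(d, "hub") for d in dies}
    return dies, links

def full_mesh(n_dies):
    """Hub-less layout: every die is bridged directly to every other die."""
    dies = [f"max{i}" for i in range(n_dies)]
    links = set(combinations(dies, 2))
    return dies, links

def neighbors(links):
    """Build an adjacency map from an undirected set of links."""
    adj = {}
    for a, b in links:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj

def hops(links, src, dst):
    """Breadth-first search: number of bridges crossed going from src to dst."""
    adj = neighbors(links)
    seen = {src}
    queue = deque([(src, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == dst:
            return dist
        for nxt in adj[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    raise ValueError(f"{dst} is not reachable from {src}")

def summarize(dies, links):
    """Bridge count, bridges per die (assumed ~ die edges used), worst die-to-die hops."""
    adj = neighbors(links)
    return {
        "bridges": len(links),
        "edges_per_die": max(len(adj[d]) for d in dies),
        "worst_case_hops": max(hops(links, a, b) for a, b in combinations(dies, 2)),
    }

print("hub :", summarize(*star(4)))
print("mesh:", summarize(*full_mesh(4)))
```

Under these assumptions the four-die hub layout needs 4 bridges and only one bridged edge per Max, at the cost of two hops between any pair of dies. The direct mesh gets that down to one hop but needs 6 bridges and three bridged edges per die, which is the "more than one edge for Fusion" problem.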
BTW another interesting set of patents is
(2023) https://patents.google.com/patent/US20230214350A1
(2023) https://patents.google.com/patent/US20240085968A1
dealing with things like power and clock sync across multiple SoCs.
We've had stuff like this before for two-way designs (i.e. Ultra) but these are for four-way designs...