TSMC’s variant of InFO that integrates an LSI is called InFO-L or InFO-LSI. It follows a similar structure, with the addition of a local silicon interconnect (LSI) intermediary chip handling communication between two chips.
TSMC describes the LSI as either an active or a passive chip, depending on the chip designer's needs and cost sensitivities.
So it does look like InFO-L can include active circuits.
InFO-LSI has size limits, though. The version Apple used for the M1 Ultra is limited to roughly 1x reticle size, around 850 mm^2 (reports vary across the 800-900 mm^2 range, depending on the tech). The M1 Max is about 420 mm^2; two of those is 840 mm^2, which likely means Apple didn't have much wiggle room. The M2 Max (and M2 and M2 Pro) are all bigger than their M1 predecessors ('more stuff' added to the die and no process shrink). TSMC has been increasing the limits of the packaging technology; CoWoS has grown over the last 2-4 years, but InFO-LSI may or may not have scaled past 1x. The precision of the interposer alignment is far more critical.
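As a rough sanity check on those numbers (a sketch only; both the reticle limit and the die area are approximate figures, not exact TSMC- or Apple-published values):

```python
# Back-of-the-envelope reticle math. Both constants are approximations.
RETICLE_LIMIT_MM2 = 850  # ~1x reticle; reports range roughly 800-900 mm^2
M1_MAX_DIE_MM2 = 420     # approximate M1 Max die area

two_dies = 2 * M1_MAX_DIE_MM2            # 840 mm^2
headroom = RETICLE_LIMIT_MM2 - two_dies  # ~10 mm^2 of slack

print(f"two M1 Max dies: {two_dies} mm^2")
print(f"headroom under ~{RETICLE_LIMIT_MM2} mm^2: {headroom} mm^2")
print(f"reticle utilization: {two_dies / RETICLE_LIMIT_MM2:.1%}")  # ~98.8%
```

That ~98.8% utilization is why there is essentially no room left over, as noted below.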
The upside of InFO-LSI is that it is substantively cheaper than the alternative 3D-stacked solution on an LSI (and way better than InFO_oS due to the narrower bump pitch, which is going to boost Perf/Watt; powering the connection gets easier. The far narrower bump pitch makes it harder to precisely combine the dies, though.)
Also, if Apple had needed something with more than two dies, they likely would have had to shift to CoWoS-LSI as the baseline packaging anyway.
So while a relatively very narrow PCIe 'shim' could go between two Max-sized dies, if the Max dies are soaking up 98+% of the reticle limit there is no room to put a third member of any substantive size into the mix.
There is another technology, CoWoS-LSI, that still uses a size-limited LSI interposer, but it is substantively more expensive and complicated.
You still have multiple dies stacked (and soldered) in a vertical fashion. [The smaller size of the LSI interposer is still in play in CoWoS-LSI; there are just multiple LSI 'routing' dies to place (the purple-ish rectangles in the image above left that overlap between the HBM and the ASIC). If there were a PCIe 'shim' between two 'Max' dies, then two LSI dies would be needed in the solution: 'Die 1' <-- UltraFusion --> 'PCIe die' <-- UltraFusion --> 'Die 2'; see the quick count below.]
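A trivial way to see that bridge count (a sketch; the die names are illustrative labels, and it simply counts adjacent pairs in the chain):

```python
# Each adjacent pair of dies in the chain needs its own LSI bridge die.
# Names are illustrative placeholders, not Apple part names.
chain = ["Die 1", "PCIe shim", "Die 2"]
lsi_bridges = len(chain) - 1  # one bridge per adjacent pair

print(" <--LSI--> ".join(chain))  # Die 1 <--LSI--> PCIe shim <--LSI--> Die 2
print(f"LSI bridge dies required: {lsi_bridges}")  # 2
```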
Adding the PCIe controller(s) to the LSI layer has issues. One is thermal: the balance has to work out, and powering PCIe communications over far larger distances isn't going to be thermally cheap. PCIe 4, 5, and 6 all make that harder (so that problem is not going away). The other major problem is that UltraFusion is so incredibly wide. There are 10,000 connections to transfer between dies (it is more unusually 'wide' than 'fast'). A x16 controller, or even an aggregate of 64 PCIe lanes, is two to three orders of magnitude less wide (quick arithmetic below). To keep latencies very low, you want those paths between chips to be quite short and quite straight. So the issue is: where is the copious empty spare area, with no channels cutting through it, in your limited-size LSI chip? The cherry on top is that the off-chip signal drivers for PCIe don't scale well with node shrinks either. So not only are there no large empty blocks, there is also 'chunky' stuff that needs to go into those blocks. [PCIe isn't making it onto the main die primarily because there is no space for it given the other priorities.]
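To put rough numbers on 'wide' (a sketch only: it treats each PCIe lane as a single link and ignores the multiple physical wires per lane, clocking, and sideband signals):

```python
# Width comparison: UltraFusion's stated connection count vs. PCIe lane counts.
ULTRAFUSION_SIGNALS = 10_000  # Apple's stated die-to-die connection count
PCIE_X16_LANES = 16           # a single x16 controller
PCIE_AGG_LANES = 64           # an aggregate of 64 lanes

print(ULTRAFUSION_SIGNALS / PCIE_X16_LANES)  # 625.0  -> nearly 3 orders of magnitude
print(ULTRAFUSION_SIGNALS / PCIE_AGG_LANES)  # 156.25 -> roughly 2 orders of magnitude
```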
You also now have connections going out the bottom of the LSI that have to pass through and connect up properly. Power also needs to be delivered up to the LSI chip.
Just because active stuff can possibly go in the LSI layer, it still primarily has to accomplish UltraFusion's mission of an 'internal die mesh to internal die mesh' connection. That is just way, way, way wider than the 'single HBM package stack to ASIC' kind of wide.
An active interposer with perhaps some limited SRAM cache, where data only comes in/out via the communication channels to the ports, has substantively fewer limitations. (Power still has to be delivered, but there is really not much other output, except perhaps some diagnostic data channels that are not normally used; nothing high performance or high bandwidth.) Similarly for adding some data protocol and policy adapters (data in/out still rides on the same paths the passive interposer had). That would help if combining two things that were custom designed to work well together.
Active isn't going to mean the 'kitchen sink' can be thrown down there and it still works well. Nor is there a ton of upside in making the LSI relatively large; TSMC probably has limits on its size as well.
This is significant. If the process can be used for, say, the implementation of a PCIe switch, it would be possible to add this capability to the interconnect fabric without changing the SoC.
The other factors get in the way of making that possible.
The bigger questionable part is whether Apple wants to keep using the laptop-optimized layout of the Max as the basic building block. That doesn't make much sense. Even if two laptop Maxes are shoehorned together, other baggage comes along with that in terms of layout and scale.
If Apple is deeply wedded to spreading the die design costs of the Mac Studio and Mac Pro over MBP 14"/16" sales, then there are other tweaks, like the two dies not being 100% identical. It is the weaving of MBP usage into the mix that generates part of the problem. Which I/O units are on the main die could simply be different for the desktop-only SoC solutions.