I sure hope Apple has a hush-hush surprise for us soon.
My gut feeling is that the AS Mac Pro is going to have its own silicon; it seems the only way they could have PCIe expansion and an adequate number of Thunderbolt & USB ports.
Different from all of the laptop dies, probably. The laptop dies are all monolithic. The Mac Pro (and Studio) dies probably are not. Because those machines are plugged in all the time, they can give slightly on Perf/Watt and take the hit of having 1-4 UltraFusion connectors in the package for a relatively small Perf/Watt retreat.
Perhaps there will be a T2 type system controller that handles the mundane ports and CPU/GPU modules that slot in (that add additional Thunderbolt/Displays and RAM; and CPU/GPU grunt.)
Neither the Intel W-3400/2400 nor the AMD Threadrippers need a 'slot' to provision PCI-e lanes out of their respective packages at all.
Perhaps the 'T2' part goes on a chiplet, so that only one needs to be added per package, but there is no need to 'slot' it at all.
Take a Max die and decompose it into two or three chiplets. For example:
'top' edge I/O -- SSD, Secure Enclave, USB, x1 PCI-e v4 lanes for basic Network/etc., HDMI/eDP out, four TBv4 controllers, one UltraFusion edge to connect back to the mesh.
'the rest' -- memory controllers and System Cache blocks, CPU cores, NPU core, GPU, display controllers, A/V de/encode, two UltraFusion edges; one on top and one on the bottom.
'bottom' edge I/O -- two x16 PCI-e v4, 1 or 2 x4 PCI-e v4 bundles, and maybe two TBv4 controllers (or an extra SATA/USB controller).
With those three building blocks, Apple could make the following products.
1. Mn Max with 6+ TB ports. [ top edge i/o + 'the rest' + flipped top edge i/o connected to bottom 'the rest' connector]
12+ CPU cores , 38+ GPU cores , etc.
Could put this in a Mac Studio where the front two USB ports don't have to be 'downgraded' to non-Thunderbolt. ( Slightly wasteful in that there is an extra SSD controller, two extra Thunderbolt controllers, and an extra HDMI/eDP out, but that is relatively small 'dead space'. )
2. Mn Ultra S with 6+ TB ports . [ top edge i/o + 'the rest' + 'the rest' + flipped top edge i/o connected to bottom 2nd 'the rest' ]
24+ CPU cores , 76+ GPU cores , etc.
Again could put this in a Mac Studio just like the M1 Ultra went into one.
3. Mn Ultra P with 4-6 TB ports. [ top edge i/o + 'the rest' + flipped 'the rest' + bottom edge I/O ]
24+ CPU cores , 76+ GPU cores , 32+ PCI-e v4 lanes , etc.
If this has 6 TB ports, it could be merged with the Ultra S; but if unused TB ports need to be dropped to get the PCI-e lanes, then dropping them is the better option. That would put a gap between the Studio and the Mac Pro.
4. Mn 3-deep P [ top edge i/o + 'the rest' + flipped 'the rest' + 'the rest' + bottom edge I/O ]
36+ CPU cores , 114+ GPU cores , 32+ PCI-e v4 lanes , etc.
Probably wouldn't scale as well as the pairing, where the first two modules have their large GPU clusters closely adjacent. The last two modules would have their two CPU groups adjacent, but the third section is also more distant. On-package workload dispatch would light up the two adjacent groupings first. When the workload spills bigger than an Ultra, then fire up the more distant additional units of that type. It is also likely easier if the more distant GPU cluster is driving a 3rd or 4th monitor and only the main cluster is tasked when there is a single primary display.
30+ CPU cores would hang much better with the middle of the W-3400 or 5000WX line-ups. 100+ GPU cores is likely enough to top a W6800X on more than a few workloads. And the RAM capacity cap is incrementally higher (along with more bandwidth available).
5. Mn 1-deep P [ top edge i/o + 'the rest' + bottom edge I/O ]
Probably not interesting to high end users, but an incrementally more affordable 'hobby box' to toss PCI-e cards into would work. [ If the Mac Pro only had 4 TB sockets, it wouldn't necessarily need the bottom edge I/O to provide the final 2. ]
[ Or, if the thermals were tightened up a bit, could put this on an add-on PCI-e card for a "Mac Card": SoC, small drive, Ethernet and display ports. If under 75W, it doesn't even need a power connector. ]
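To sanity check the core and lane counts in the list above, here is a rough Python tally of the five stacks. This is only a sketch: the class and field names are made up for illustration, the per-chiplet numbers are the floor values from the "N+" figures above (12 CPU / 38 GPU per 'the rest', four TB controllers on the top edge, two x16 PCI-e bundles on the bottom), and the "maybe two" TB controllers on the bottom edge are assumed to be present. The x1 and x4 PCI-e bundles are left out, and a 'flipped' chiplet counts the same as an unflipped one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chiplet:
    """One hypothetical building block; counts are floor values, not specs."""
    name: str
    cpu: int = 0             # CPU cores
    gpu: int = 0             # GPU cores
    tb: int = 0              # Thunderbolt controllers
    pcie_x16_lanes: int = 0  # lanes from x16 bundles only


TOP = Chiplet("top edge I/O", tb=4)
REST = Chiplet("the rest", cpu=12, gpu=38)
# Assumption: the "maybe two" TB controllers are included on the bottom edge.
BOTTOM = Chiplet("bottom edge I/O", tb=2, pcie_x16_lanes=2 * 16)

# Stacks listed top to bottom; orientation (flipped or not) does not change totals.
PRODUCTS = {
    "Mn Max": [TOP, REST, TOP],
    "Mn Ultra S": [TOP, REST, REST, TOP],
    "Mn Ultra P": [TOP, REST, REST, BOTTOM],
    "Mn 3-deep P": [TOP, REST, REST, REST, BOTTOM],
    "Mn 1-deep P": [TOP, REST, BOTTOM],
}


def totals(stack):
    """Sum the resources of a linear stack of chiplets."""
    return {
        "cpu": sum(c.cpu for c in stack),
        "gpu": sum(c.gpu for c in stack),
        "tb": sum(c.tb for c in stack),
        "pcie_x16_lanes": sum(c.pcie_x16_lanes for c in stack),
        # Each neighbouring pair in the stack needs one die-to-die connection.
        "links": len(stack) - 1,
    }


if __name__ == "__main__":
    for name, stack in PRODUCTS.items():
        print(name, totals(stack))
```

The "links" count is just the number of neighbouring pairs in the stack, so the 3-deep package lands at four die-to-die connections, which is the top of the 1-4 UltraFusion range mentioned earlier.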
It would help for 'the rest' modules to be TSMC N3 so that the total distance from 'top' to 'bottom' across the middle is substantively shorter. The smaller the gap, the more likely it will present as a 'unified and uniform' collection of cores.
The collection of silicon is different (from laptops) and yet not completely different. Some amount of reusability likely plays a major role. It is the only pragmatic way of keeping the costs affordable.