The current Max won't work for a densely packed four-tile setup (even if it had a die-to-die interconnect, which it doesn't appear to have at all). The placement of the RAM connections is wrong for that.
How do you know that? Did you see any die shots?
There were die shots in Apple's event.
https://live.arstechnica.com/apple-october-18-unleashed-event/
Might as well, since the die is just going to get X-rayed and published by third parties within a couple of days or weeks anyway.
All you need to do is look at the outer edges; memory and inter-die interconnect aren't going to be in the middle. The bottom of the Max basically has elements just like the upper right. Since those elements and the video encode/decode blocks are duplicated, there is a very high probability that the bottom is simply a copy of the top.
An inter-die interconnect would be at least as big as the memory interfaces (go look at UPI on Intel dies or Infinity Fabric on AMD dies).
The top is Thunderbolt and some other I/O. Plus there would be little to no rational reason to put the interconnect on the Jade-Chop (M1 Pro) die. Pro -> Max is just adding stuff to the "bottom" of the die. (Covered in AnandTech's article, but relatively obvious here: the top "half" of the Max (Jade) exactly matches the Pro (Jade-Chop). That is why Jade-Chop is a 'chop'. Not a literal chop, but a very minor closing off of some internal networks and just stopping, yielding a smaller die mask and die product.)
The 4-die packaging problem is that the memory I/O runs all up and down the right and left sides of the die. If you put four of these Max dies in a "square" pattern, the memory on the "inner" sides of the square ends up much farther away, since the dies are close together. If you spread the dies out so that memory can go in the middle, then the communication overhead (in power) is much higher and the latencies are uneven. (Both are bad if you are trying to maximize perf/watt.)
[ I suspect Apple needs the NPU cores to do part of the ProRes encoding. That would be why close placement and doubling would be needed for a second video encoder. The NPU cores probably aren't a perfect fit for the work, but conveniently they are still an offload from the general-purpose P cores. Perhaps that is also a contributing reason why we don't see ProRes encode in Afterburner. ]
macOS caps out at 64 threads.
It’s a kernel implementation detail that can easily be changed. Or one can go the Windows way and use CPU affinity groups.
Going from a '64-bit int' to a '128-bit int' might be conceptually easy, but the data-structure ramifications (caching and footprint), scaling, and real-time-constraint ramifications aren't. Very significant work went into moving past the 64-thread limit in Linux, various Unixes, and Windows. It is a substantive amount of work.
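A rough sketch of the data-structure change being described, assuming the scheduler tracks CPUs in a single machine word (Python here only to illustrate the shape of the problem; real kernels do this in C, and these names are hypothetical, not Apple's actual structures):

```python
# Why a fixed 64-bit CPU mask caps out at 64 logical CPUs, and what a
# multi-word mask (in the spirit of Linux's cpumask_t) looks like.
# Illustrative sketch only -- not any real kernel's implementation.

WORD_BITS = 64

def set_cpu_single_word(mask: int, cpu: int) -> int:
    """A single 64-bit word can only address CPUs 0..63."""
    if cpu >= WORD_BITS:
        raise ValueError("cpu id exceeds single-word mask")
    return mask | (1 << cpu)

class MultiWordMask:
    """Variable-width mask: an array of 64-bit words. Every structure
    that embeds the mask grows with it -- that is the footprint and
    cache-behavior cost alluded to above."""
    def __init__(self, ncpus: int):
        self.words = [0] * ((ncpus + WORD_BITS - 1) // WORD_BITS)

    def set_cpu(self, cpu: int) -> None:
        self.words[cpu // WORD_BITS] |= 1 << (cpu % WORD_BITS)

    def is_set(self, cpu: int) -> bool:
        return bool((self.words[cpu // WORD_BITS] >> (cpu % WORD_BITS)) & 1)

m = MultiWordMask(128)
m.set_cpu(100)          # fine: the mask now spans two words
print(m.is_set(100))    # True
print(len(m.words))     # 2
```

The single-word version is fast precisely because it is one register-sized value; widening it means every copy, compare, and iteration over the mask touches more memory, which is why the change ripples through scheduler hot paths.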
It isn't a deep technical problem. It is a strategic-objective problem. Apple didn't have a technical problem coasting on the MP 2013 for six years; constraints of their own construction were the primary driver. There is no upside for iOS/iPad/etc. in more than 64 threads. There is little to no upside for the vast majority of the Mac lineup either. Forking the kernel for some small single-digit slice of Mac market share probably won't happen.
macOS got APFS in part because it was cheaper to have one file system shared across iPhones and Macs. Basically the same primary driver here.
Apple could fork the kernel for a single-digit-share product. Apple could charge current market prices for memory and NAND storage. Neither is likely to happen, any more than Apple going back to allowing the cloning option for Macs.
( A recent rumor puts the iMac 27" on the M1 Pro and Max also. If so, that is an even smaller base upon which to layer a forked kernel. )
www.anandtech.com
These came out later than my post, but they are quite illustrative of why Apple would be keen to get onto TSMC N3. The Max is in the 432mm2 zone. Let's say a tweak that allowed for multi-chip solutions came in at approximately 460mm2. Four times that would be 1,840mm2. That's doable as a multi-chip package using TSMC 2.5D/3D tech, but pretty close to the current 2021 limits and probably on the expensive side. If Apple got the die down to 340-360mm2, that would be 1,360-1,440mm2 total, which probably lowers overall SoC package costs. ( Not that Apple would reduce the price to end users, but it would give them higher margins on a relatively super-low-volume SoC. Apple could also shrink a Jade-2 into a monolithic die size they could live with, at a higher margin. )
Longer term, Apple probably will go with some exotic 3D packaging for their SoCs if they can go that path for the Mac solutions. The vast majority of their product lineup doesn't need it ( now, and even less so once they get to N3, N2, etc. ).