Exactly. Imagine that Apple made a separate GPU chiplet with 32 cores and then put a bunch of those together; wouldn't that be a scalable solution?
That would be scalable, but I thought the performance of their integrated CPU-GPU architecture relied on having both the CPU and GPU on the same die. So I imagined that, if they wanted to double the number of GPU cores in the Mac Pro, they would need to build the "Extreme" chip from four "M2 Max Pro" subunits instead of four M2 Maxes, where each "M2 Max Pro" would be an expanded version of the M2 Max with double the number of GPU cores.
I.e., if the M2 Max has X CPU cores and Y GPU cores, then the "M2 Max Pro" would have X CPU cores and 2Y GPU cores.
I estimate that would give them ~120 TFLOPs, as compared with ~80 TFLOPs for a 4090 and 90–100 TFLOPs for a 4090 Ti, i.e., roughly halfway between a single 4090 and dual 4090s for general GPU compute performance (we'll probably need to wait for the M3, which will likely be on 3 nm, to get hardware RT).
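For anyone who wants to sanity-check that figure, here's the rough back-of-the-envelope math as a quick Python sketch. The GPU core count, ALUs per core, and clock speed below are my own ballpark assumptions rather than confirmed specs, so treat the output as order-of-magnitude only (with these numbers the four-subunit, doubled-core config lands around 110 TFLOPs, in the same neighborhood as ~120).

```python
# Back-of-the-envelope FP32 throughput for the hypothetical "Extreme" built
# from four doubled-GPU "M2 Max Pro" subunits.
# All figures below are assumed ballpark values, not confirmed specs.

GPU_CORES_PER_M2_MAX = 38    # M2 Max top GPU configuration
ALUS_PER_GPU_CORE    = 128   # assumed FP32 ALUs per Apple GPU core
CLOCK_GHZ            = 1.4   # assumed GPU clock
FLOPS_PER_ALU_CYCLE  = 2     # one fused multiply-add counts as 2 FLOPs

def tflops(gpu_cores: int) -> float:
    """FP32 TFLOPs = cores * ALUs/core * FLOPs/cycle * clock (GHz) / 1000."""
    return gpu_cores * ALUS_PER_GPU_CORE * FLOPS_PER_ALU_CYCLE * CLOCK_GHZ / 1000

m2_max         = tflops(GPU_CORES_PER_M2_MAX)          # ~13.6 TFLOPs
m2_max_pro     = tflops(2 * GPU_CORES_PER_M2_MAX)      # doubled GPU cores, ~27 TFLOPs
extreme_4x_pro = tflops(4 * 2 * GPU_CORES_PER_M2_MAX)  # four subunits, ~109 TFLOPs

print(f"M2 Max:           {m2_max:5.1f} TFLOPs")
print(f"'M2 Max Pro':     {m2_max_pro:5.1f} TFLOPs")
print(f"4x 'M2 Max Pro':  {extreme_4x_pro:5.1f} TFLOPs")
```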
But creating this new design just for the Mac Pro seems resource-intensive, so I don't know if they'd do that.