Here's an alternate way to get increased GPU power: Instead of adding separate GPU-only dies to the SoC, they could use multiple dies with the current design (CPU and GPU on the same die), but with a much higher ratio of GPU cores : CPU cores.
I think the reasonable possibilities for the Mac Pro GPU are the following:
- a single SoC with a big GPU (e.g. four dies, around 40-50k FP32 ALUs). That’s the easiest option for Apple, but it won’t really be able to challenge any of the high-end multi-GPU systems
- a single SoC with a VERY BIG GPU (multiple GPU-only tiles, a lot of cores). Very expensive, very custom, very big, but still uses the same programming model as any other model
- multiple SoCs on separate compute boards, connected via some sort of PCIe-facilitated bus (maybe cache-coherent CXL), maybe with a shared pool of traditional RAM. This is something I’ve been thinking about for a while, as this approach would solve the issues with modularity and expandability. But it would require a new programming model that can efficiently use non-local compute clusters.
E.g., instead of the 2.4 GPU : 1 CPU ratio they have now, what about 10 GPU : 1 CPU? If they used enough dies to give them 40 CPU cores (i.e., the same number of CPU cores as a current 2 x Ultra), and increased the TFLOPS/GPU core by 20% over the M1's, that would give them ~200 TFLOPS, i.e., about twice what people are estimating for the future 4090 Ti. And if they also offered a 4 x Ultra with 80 CPU cores, that would give them ~ 4 x 4090 Ti. Those would be killer machines.
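For anyone who wants to play with the numbers, here is a rough back-of-the-envelope sketch of the estimate above. The function names are made up for illustration, and the per-core throughput is an assumption: I'm taking Apple's advertised ~21 TFLOPS (FP32) for the 64-core M1 Ultra GPU, which works out to roughly 0.33 TFLOPS per GPU core. With that input and a 20% uplift, 400 GPU cores come out at about 157.5 TFLOPS, so getting to ~200 would need closer to 0.5 TFLOPS per core. The result is very sensitive to that one figure, so plug in your own.

```python
# Back-of-the-envelope GPU throughput for a hypothetical high-GPU-ratio Apple SoC.
# Assumption (not from the post): ~21 TFLOPS FP32 for the 64-core M1 Ultra GPU.
M1_TFLOPS_PER_GPU_CORE = 21.0 / 64


def gpu_cores(cpu_cores: int, gpu_per_cpu: float) -> int:
    """Number of GPU cores implied by a CPU core count and a GPU:CPU ratio."""
    return round(cpu_cores * gpu_per_cpu)


def est_tflops(cpu_cores: int, gpu_per_cpu: float,
               tflops_per_core: float, uplift: float = 1.0) -> float:
    """Estimated total TFLOPS: GPU cores x per-core TFLOPS x per-core uplift."""
    return gpu_cores(cpu_cores, gpu_per_cpu) * tflops_per_core * uplift


# The two configurations from the post: 40 and 80 CPU cores at 10 GPU : 1 CPU,
# with a 20% per-core improvement over the M1 generation.
for label, cpus in [("2 x Ultra equivalent", 40), ("4 x Ultra equivalent", 80)]:
    total = est_tflops(cpus, 10, M1_TFLOPS_PER_GPU_CORE, uplift=1.2)
    print(f"{label}: {gpu_cores(cpus, 10)} GPU cores, ~{total:.1f} TFLOPS")
```

Changing the ratio argument (e.g. 2.4 for the current design) shows how much of the gain comes purely from shifting die area from CPU to GPU cores.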
Would this benefit them by maintaining close "local" CPU-GPU integration (local to each die), or hurt them by scattering the CPU cores over many different dies, rather than a few?
If they wanted to make it modular, they could offer 20 CPU cores/SoC, with the machine taking, say, up to four of these. And if they wanted to make it more customizable, they could offer SoCs with, say, two different ratios of GPU cores : CPU cores, catering to both CPU-heavy and GPU-heavy workloads.