All excellent points. I don't see many commentators noticing this. If Apple holds firmly to unified memory as an architectural mandate, that would seem to constrain memory and GPU scalability in the near term, or else incur a significant increase in SoC fabrication cost.
Maybe eventually, at 2 nanometers, the extra transistor budget and power efficiency would make it possible to put a high-core-count CPU and GPU on the same die, as in the M1 — but Apple can't wait that long to ship high-end Apple Silicon iMacs and Mac Pros.
Unified memory works really well and solves a lot of the inefficiency inherent in separate GPU VRAM. At current fabrication technology that works for an M1-class SoC. It might also work for the hypothetical M1X with double the CPU and GPU cores and maybe 32 GB of RAM. The problem is that double the M1's GPU cores, while fast, still isn't sufficient GPU horsepower for higher-end applications.
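To make the inefficiency point concrete, here's a back-of-envelope sketch of what separate VRAM costs you: every asset the GPU needs must first be copied across the expansion bus, while unified memory lets CPU and GPU touch the same pool with no copy at all. The bandwidth figures below are approximate peak numbers chosen for illustration, not measurements of any specific machine.

```python
# Rough cost of staging a buffer into discrete VRAM over various links,
# versus zero copies under unified memory. Peak bandwidths are approximate.

def copy_time_ms(size_gb: float, bandwidth_gb_s: float) -> float:
    """Milliseconds to move size_gb across a link at bandwidth_gb_s."""
    return size_gb / bandwidth_gb_s * 1000.0

links = {
    "PCIe 3.0 x16 (~16 GB/s)": 16.0,
    "PCIe 4.0 x16 (~32 GB/s)": 32.0,
    "Unified memory (no copy)": None,
}

asset_gb = 1.0  # e.g. a 1 GB texture/geometry upload per scene change
for name, bandwidth in links.items():
    if bandwidth is None:
        print(f"{name}: 0.0 ms (CPU and GPU share one memory pool)")
    else:
        print(f"{name}: {copy_time_ms(asset_gb, bandwidth):.1f} ms")
```

A 1 GB upload costs tens of milliseconds per trip over PCIe — more than a full frame at 60 fps — which is exactly the class of overhead (and the duplicated-copy RAM waste) that unified memory eliminates.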
If they did use a dGPU, how would unified memory be maintained? It can't run over a traditional expansion bus — the bandwidth, latency, and cache-coherence requirements are too strict. I see several possibilities.
- Use a "chiplet" dGPU (without VRAM) that shares the same substrate as the CPU. That is essentially a partial die with higher bandwidth to the CPU than a fully separate die:
https://www.eetimes.com/chiplets-a-short-history/#
- Use a separate dGPU die (without VRAM) integrated into the same SoC package, much like the M1's on-package DRAM. Bandwidth is lower than with a chiplet approach but higher than going off-package. Intel used multi-die packaging for its first quad-core "Kentsfield" CPUs (e.g. the Q6600), pairing two dual-core dies in one package before process improvements allowed four x86 cores on a single die.
- Use a truly discrete dGPU package soldered to the PCB and communicating over a new proprietary ultra-high-speed bus. No compatibility with existing standards would be needed, as it wouldn't be upgradeable. It's unclear whether the bandwidth and latency would be good enough for a "VRAM-less" dGPU. And if it were instead a conventional VRAM design, how could unified memory work? I don't see them doing that.