Something Apple stressed over and over again during WWDC is that ARM Macs will be based on their integrated system on a chip. Among other things, this means that the CPU and GPU share system memory (you could say this makes the GPU integrated, since having its own video RAM is the primary criterion for distinguishing iGPUs from dGPUs). This approach does have its benefits: there is no need to transfer data between CPU and GPU, the GPU can take advantage of the virtual memory system, power consumption is much lower, and it works well for Apple GPUs, which because of their architecture need much less memory bandwidth than current dGPUs. However, this won't scale to high-performance applications. Even if Apple uses LPDDR5 in their new Macs (RAM bandwidth approaching 50 GB/s), they won't be able to compete with modern GDDR6 solutions that deliver bandwidths of 200 GB/s and higher.
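To make the gap concrete, here is a minimal sketch of the peak-bandwidth arithmetic behind those numbers. The per-pin data rates and bus widths are representative figures for the respective memory types, not specs of any particular Apple part:

```python
# Peak memory bandwidth = per-pin data rate (Gbit/s) * bus width (bits) / 8.
# All figures below are illustrative, typical-configuration numbers.

def peak_bandwidth_gb_s(data_rate_gbps: float, bus_width_bits: int) -> float:
    """Peak bandwidth in GB/s for a memory interface."""
    return data_rate_gbps * bus_width_bits / 8

# LPDDR5 at 6.4 Gbps per pin on a 64-bit bus:
lpddr5 = peak_bandwidth_gb_s(6.4, 64)    # 51.2 GB/s -> "approaching 50 GB/s"

# GDDR6 at 14 Gbps per pin on a 256-bit bus, as on a typical desktop GPU:
gddr6 = peak_bandwidth_gb_s(14.0, 256)   # 448.0 GB/s

print(lpddr5, gddr6)
```

Even a wider LPDDR5 configuration (say, a 128-bit bus for ~100 GB/s) stays well short of what desktop GPUs get from GDDR6.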
This is where my speculation starts. What if Apple kept the unified memory approach and its many advantages, but used high-bandwidth memory instead? They already have a lot of experience with HBM2, and if I understand it correctly, its latency is comparable to that of regular RAM, so it can be used as CPU RAM (unlike typical video RAM, which trades latency for bandwidth). Combining an Apple SoC with 32 GB of HBM2 would allow bandwidths of over 400 GB/s, comparable to that of a fast desktop GPU, while also potentially allowing speed-ups on the CPU side.
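The "over 400 GB/s" figure falls out of the HBM2 spec: each stack exposes a 1024-bit interface, and multiple stacks sit on an interposer next to the SoC. A sketch, assuming the baseline 2.0 Gbps per-pin rate and a hypothetical two-stack configuration:

```python
# HBM2: each stack has a 1024-bit interface; an interposer connects the
# stacks to the SoC. 2.0 Gbps per pin is the baseline HBM2 rate; faster
# grades (e.g. 2.4 Gbps) exist. Stack count here is a hypothetical config.

def hbm2_bandwidth_gb_s(stacks: int, data_rate_gbps: float = 2.0) -> float:
    """Aggregate peak bandwidth for `stacks` HBM2 stacks, 1024 bits each."""
    return stacks * 1024 * data_rate_gbps / 8

# Two stacks at the baseline rate already clear the 400 GB/s mark:
print(hbm2_bandwidth_gb_s(2))   # 512.0 GB/s
```

More stacks or a higher per-pin rate scale this further, which is exactly why HBM2 shows up in compute accelerators.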
There are reasons why I think Apple could pull this off. First of all, this kind of system is very expensive (interposers are complex and cost a lot of money), which is probably why we don't see it much in everyday computing: companies prefer more conservative solutions that scale across different markets. But Apple doesn't have that constraint. They don't have to cater to different markets; their target is pretty much locked in. The 16" MBP already costs a lot of money, and they might as well funnel the savings from using their own chips into a more expensive memory subsystem. This would also be advantageous to Apple, since nobody else would be even close to offering anything remotely comparable. It would be a very power-efficient design, potentially capable of very high performance, and at the same time simpler in some ways than traditional PC designs (no need for different types of memory, no bus between CPU and GPU, and a radically simpler power delivery system). A single SoC with an 80 W TDP could potentially deliver desktop-class CPU and GPU performance.
Note: Unified memory architectures are used by gaming consoles, I assume in order to simplify the design and optimize memory transfers. But consoles use high-latency memory, so programmers have to take this into account.
What do you think?