Aye, it's been a while since HBM was used in a consumer product - what, 7 years and counting? I mean, Nvidia is about to release the DGX Station with HBM, but I really doubt that one could call that consumer with a straight face:


I mean, it is technically a "desktop workstation", but I suspect you're looking at several tens of thousands of dollars, and at that point one might as well count the PCIe versions of Hopper/Blackwell as "consumer" too.

HBM3e, and yup, it's still very expensive. HBM4 is coming up, and the new, even higher-throughput interconnect is said to be even more expensive - like, a lot more.

[…] But who knows, right? Maybe Apple will not only use HBM in their data centers but also release a not-really-consumer product with it into the wild, a la the DGX Station above, as a Mac Pro for $40,000 base (I'm making the number up).
If it contains GB300 (see photo at link: two Grace, four Blackwell Ultra, and more), then it will be closer to $140,000… So a $40,000 Mac Pro with comparable (if not quite on the same level) performance might seem like a real bargain!
 
  • Haha
Reactions: crazy dave
VEGA10 (Radeon RX VEGA 56 / 64) and VEGA20 (Radeon VII) also use HBM.
Oh yeah, I knew about those, but IIRC the HBM didn’t help much with graphics performance (though it did make them compute monsters - memories from the crypto craze).

Aye, it's been a while since HBM was used in a consumer product - what, 7 years and counting? I mean, Nvidia is about to release the DGX Station with HBM, but I really doubt that one could call that consumer with a straight face:


I mean, it is technically a "desktop workstation", but I suspect you're looking at several tens of thousands of dollars, and at that point one might as well count the PCIe versions of Hopper/Blackwell as "consumer" too.


HBM3e, and yup, it's still very expensive. HBM4 is coming up, and the new, even higher-throughput interconnect is said to be even more expensive - like, a lot more.

As for HBM versus LPDDR, there was a long discussion with lots of hard numbers in the link @tenthousandthings gave in the previous posts. Short version: it depends on the application and how much RAM you need. Basically, for the same amount of RAM and a much smaller physical package (because it's stacked), you get much higher bandwidth with HBM, and on a per-bandwidth basis it's just as energy efficient. That's how Nvidia gets multi-TB/s bandwidth on 144-288GB of HBM for the professional Hopper and Blackwell GPUs. However, the cost is quite high, and for many applications it's overkill. High-bandwidth LPDDR makes the most sense for all of Apple's (consumer-facing) products. If Apple were to design a data-center-specific die with HBM and wanted to dual-use it to help defray development costs, they'd probably have to at least redesign the memory controller for the consumer product. I don't know how easy that is or how much Apple would have to change.
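
To put rough numbers on the Nvidia side of that (a back-of-envelope sketch; the per-stack figures are approximate and the stack counts are just illustrative):

```python
# Rough sketch: how stacked HBM reaches multi-TB/s figures.
# Assumptions: a 1024-bit HBM3e stack at ~8 GT/s, 24-36 GB per stack.

def stack_bandwidth_gbs(width_bits: int = 1024, rate_gts: float = 8.0) -> float:
    """Peak bandwidth of one stack in GB/s: lanes x transfers/s / 8 bits per byte."""
    return width_bits * rate_gts / 8

for stacks, gb_per_stack in [(6, 24), (8, 36)]:
    tbs = stacks * stack_bandwidth_gbs() / 1000
    print(f"{stacks} stacks: {stacks * gb_per_stack} GB of RAM at ~{tbs:.1f} TB/s")
# 6 stacks: 144 GB of RAM at ~6.1 TB/s
# 8 stacks: 288 GB of RAM at ~8.2 TB/s
```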

But who knows, right? Maybe Apple will not only use HBM in their data centers but also release a not-really-consumer product with it into the wild, a la the DGX Station above, as a Mac Pro for $40,000 base (I'm making the number up).
Thanks for the explanation. I figured there was a data center use case for HBM, since I was aware of Nvidia’s hardware, but I assumed the conversation revolved around consumer products.

And that makes sense: if Apple’s serious about using their own hardware for data center stuff, then HBM would be useful there.

Still, with that explanation, I have doubts about any of their consumer products using HBM, given that Apple is very much “prosumer” oriented.
 
  • Like
Reactions: crazy dave
I would like to point out that HBM or not HBM is a red herring. What's important is having a memory solution that can support your task. It pretty much boils down to two things (simplified): performance and cost.

To recap: HBM is essentially "parallel DRAM" - you stack multiple RAM chips and give them a very wide interface. Typically we see 1024 data signaling lanes per module. While the signal rate is comparable to other modern DDR solutions, the sheer number of signaling lanes means very high bandwidth. In terms of theoretical peak performance, one stack is the same as running 16 traditional 64-bit DDR modules in parallel. The same reasoning applies to LPDDR5.
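
A quick sanity check of that 16x equivalence, using nominal per-pin rates (real-world numbers will differ a bit):

```python
# Peak theoretical bandwidth = bus width (bits) x transfer rate (GT/s) / 8.
def bandwidth_gbs(width_bits: int, rate_gts: float) -> float:
    return width_bits * rate_gts / 8

hbm3_stack   = bandwidth_gbs(1024, 6.4)  # one 1024-bit HBM3 stack: 819.2 GB/s
ddr5_channel = bandwidth_gbs(64, 6.4)    # one 64-bit DDR5-6400 channel: 51.2 GB/s

print(hbm3_stack / ddr5_channel)  # 16.0 -> one stack = sixteen 64-bit channels
```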

And this is where it gets interesting. Implementers avoid super-wide RAM in practice because having that many signaling lanes on the board is tricky. First, they occupy space, and there are only so many signal traces you can put on a mainboard. These traces also need to be fairly long, which brings problems with power consumption and signal quality. Overall, this tends to be a logistical nightmare. There are server mainboards offering up to 12-channel DDR5 (a 768-bit interface), and they are very large, power-hungry, and do not support the fastest RAM standards. From what I remember, the latest EPYC boards top out at 500-600GB/s - hardly impressive.
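
That ballpark is easy to verify (assuming DDR5-4800 to DDR5-6000, roughly what current server platforms run):

```python
# 12-channel DDR5 = 12 x 64 bits = a 768-bit interface.
for mts in (4800, 6000):
    print(f"DDR5-{mts}: {768 * mts / 8 / 1000:.0f} GB/s")
# DDR5-4800: 461 GB/s
# DDR5-6000: 576 GB/s
```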

HBM avoids these problems by using an interposer to connect all the data and signaling links. One can use advanced packaging methods like through-silicon vias here, which have a considerably smaller footprint and result in lower power consumption. The RAM and the processing unit also sit closer together. This ends up being considerably more efficient if you want to maximize bandwidth without blowing your power and space budget. The drawback is the price - this kind of complex packaging is expensive, and HBM modules are not as readily available. So there is a good reason why, in practice, HBM is limited to high-end data center applications.

There is yet another practical solution for high-bandwidth applications - GDDR. Here one focuses on transmitting as much data per pin as possible, pushing the signaling infrastructure to its limits and using advanced signal encoding. The result is fast, but also very power-hungry, which again makes it less suitable for scalable, reliable solutions.
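
To illustrate the per-pin angle (rates are approximate and depend on the exact part; the GDDR7 number is from the announced spec):

```python
# Approximate per-pin data rates in Gbps. GDDR pushes each pin much harder,
# while HBM wins on sheer lane count rather than per-pin speed.
per_pin_gbps = {
    "DDR5": 6.4,
    "HBM3": 6.4,
    "GDDR6X": 21.0,  # PAM4 signal encoding
    "GDDR7": 32.0,   # PAM3 signal encoding
}

# E.g. a 384-bit GDDR6X card: ~1 TB/s, but with very power-hungry memory I/O.
print(384 * per_pin_gbps["GDDR6X"] / 8)  # 1008.0 GB/s
```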

So that means HBM is the way to go, right? Well, not quite. Recall that the core problem is logistics - signaling logistics, to be precise. Traditional RAM fails here because traditional signal traces do not scale. HBM wins because it uses advanced packaging for the wiring. Which is precisely how Apple has tackled the issue: they use "traditional" (albeit heavily customized) RAM modules and an HBM-like wiring solution to connect them to the SoC. This is obviously more expensive than the usual way, but much cheaper than actual HBM, and it can be used at much larger scale because you are not limited by supply-constrained HBM. Power consumption is another big advantage - it's fairly unique to use a 512-bit RAM interface as your main memory while still delivering 10+ hours of battery life.
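
The same arithmetic applied to Apple's approach (assuming a 512-bit bus at LPDDR5X-8533 rates, in line with published Max-class figures):

```python
# A 512-bit on-package LPDDR5X interface at 8533 MT/s:
print(512 * 8533 / 8 / 1000)  # ~546 GB/s of "traditional" RAM bandwidth
```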

The point I am trying to make here is that HBM and Apple's RAM packaging leverage the same solution to the same fundamental problem. Hence "poor man's HBM". This does not mean that Apple does not need HBM or won't be using HBM in some of their products. It is obvious that they need more RAM bandwidth to support larger models, so if they want to do that, they will need to scale. Can their custom solution support it? Absolutely. Will it be economically feasible? Who knows. There are different things to explore here - I remember seeing a patent that mentioned placing RAM modules on both sides of the package, for example, which would double bandwidth and capacity.
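
If they did that at today's rates (pure speculation on my part, just extrapolating the patent idea):

```python
# Doubling a 512-bit LPDDR5X-8533 interface by populating both package sides:
print(2 * 512 * 8533 / 8 / 1000)  # ~1.09 TB/s - into single-stack-HBM territory
```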
 
Last edited:
I would like to point out that HBM or not HBM is a red herring. What's important is having a memory solution that can support your task. […]
I wonder if any lessons can be applied from the R1 memory? It’s another kind of “near memory”, like HBM or GDDR. Very low latency, and it utilizes many more pins than DRAM, but perhaps too low capacity to replace it?
 
I wonder if any lessons can be applied from the R1 memory? It’s another kind of “near memory”, like HBM or GDDR. Very low latency, and it utilizes many more pins than DRAM, but perhaps too low capacity to replace it?

IIRC there was a rumour that they’d be starting a transition to LLW RAM, like on the R1, for their other devices in 2026/2027 (can’t remember the exact year).
 
  • Like
Reactions: OptimusGrime
I wonder if any lessons can be applied from the R1 memory? It’s another kind of “near memory”, like HBM or GDDR. Very low latency, and it utilizes many more pins than DRAM, but perhaps too low capacity to replace it?

There are not many details about the R1; however, looking at the available information, it seems to follow the same idea as HBM or Apple's on-package LPDDR. The main difference is likely even tighter integration, for better performance and lower power consumption.
 
There are not many details about the R1; however, looking at the available information, it seems to follow the same idea as HBM or Apple's on-package LPDDR. The main difference is likely even tighter integration, for better performance and lower power consumption.
So would you say it would be a good fit for Apple’s ambitions?
 
So would you say it would be a good fit for Apple’s ambitions?

I don’t have a judgement on this. Not much is known about this tech, its costs, or its limitations. What I would mention is that chiplet-style RAM is hardly an economical solution for main memory. It could work as a large cache, though.
 
I don’t have a judgement on this. Not much is known about this tech, its costs, or its limitations. What I would mention is that chiplet-style RAM is hardly an economical solution for main memory. It could work as a large cache, though.
That makes sense. I’ve read that it is used as pseudo-SRAM on the Vision Pro.
 
Last edited:
According to CTEE, first AMD and then Apple will launch products using TSMC's SoIC this year.
According to the supply chain, AMD is the first company to introduce SoIC, and Apple will follow suit in the second half of this year by introducing SoIC advanced packaging for M5 chips.
https://www.ctee.com.tw/news/20250325700068-430501 [Traditional Chinese]
 
I wonder if TSMC helped information about the 2nm HVM get out to counter negative messaging about onshore (Taiwanese) yield and dampen misinformation about expected costs when the fabs make their way to the US (which should only add 10% to each wafer).

Yesterday's news didn't have the AMD scoop - but it did hint at OpenAI's interest in TSMC 2nm (...wondering if SoftBank is now involved).
 
The best description of the priorities of the Apple Silicon team is found in a February 2023 interview with Anand Shimpi. It is well worth reading that transcript if you are truly interested in understanding what Apple is doing.

"But really the thing that we see, that the iPhone and the iPad have enjoyed over the years, is this idea that every generation gets the latest of our IPs, the latest CPU IP, the latest GPU, media engine, neural engine, and so on and so forth, and so now the Mac gets to be on that cadence too."

Yet segmentation seems to be less of a perceptual reality and more of a conscious strategy for AAPL these days…
 
Speaking of codenames, am I right in thinking we still haven’t seen “Hidra”? The rumor was that it was for the Mac Pro. I believe it was generally thought to be the M4 Ultra, but that seems unlikely now, so what is it? M5 Pro/Max/Ultra?

I guess it makes sense that SoIC development could have a different timeline, with work on the chiplets beginning earlier, fundamentally changing how the A-series and the M-series relate to one another.
 
Speaking of codenames, am I right in thinking we still haven’t seen “Hidra”? The rumor was that it was for the Mac Pro. I believe it was generally thought to be the M4 Ultra, but that seems unlikely now, so what is it? M5 Pro/Max/Ultra?

I guess it makes sense that SoIC development could have a different timeline, with work on the chiplets beginning earlier, fundamentally changing how the A-series and the M-series relate to one another.
Yeah, my bet is that it’s the M5 Ultra.
 
Speaking of codenames, am I right in thinking we still haven’t seen “Hidra”? The rumor was that it was for the Mac Pro. I believe it was generally thought to be the M4 Ultra, but that seems unlikely now, so what is it? M5 Pro/Max/Ultra?

I guess it makes sense that SoIC development could have a different timeline, with work on the chiplets beginning earlier, fundamentally changing how the A-series and the M-series relate to one another.
Yeah, my bet is that it’s the M5 Ultra.

Nah, it's the M5 Extreme...! ;^p
 
Maybe it’s the chiplet rumor overall. One chiplet, one head. It might refer to the whole M5 series.
 
Huh! On the recent codenames leak, which includes mysterious future silicon codenamed “Sotra,” here’s something:

Hidra is a Norwegian island. So is Sotra

Hidra = M5 Ultra
Sotra = M7 Ultra

Komodo and Borneo are also islands. Also Baltra. So are Donan and Brava.

I know it doesn’t mean anything, that’s the whole point of a codename, but I wasn’t aware they were all islands…
 
Last edited:
Huh! On the recent codenames leak, which includes mysterious future silicon codenamed “Sotra,” here’s something:

Hidra is a Norwegian island. So is Sotra

Hidra = M5 Ultra
Sotra = M7 Ultra

Komodo and Borneo are also islands. So are Donan and Brava.

I know it doesn’t mean anything, that’s the whole point of a codename, but I wasn’t aware they were all islands…
Nice find. Can’t believe I/many of us went all in on the Hydra thing. We’ve played too much Heroes of Might and Magic in our days.

That pretty much throws an ’Extreme’ option out the window. It’s just new island-themed codename branding, possibly referenced because of the upcoming chiplets. Neither Brava nor Donan was really an island, though.

Edit: I see Donan and Brava were islands, too. In that case I don’t think it means anything at all.
 
Yes, agree, just a fun fact. I deleted a clause (preserved in the @Populus reply above) after “something” as soon as I realized they are all island names.
Yeah, even Hidra is an island. Maybe Hydra? Or that’s just in the MCU (Marvel).

I’m afraid we’re all eager to know more about the upcoming M5 family of SoCs because of how close it is - just a few months away from being revealed. Hopefully more details will leak soon…
 
But since we had Mac identifiers 17,1 and 17,2 leak, at least we know Hidra is an M5. And it would probably not be a MBP, which comes in 3+ identifiers. And it won’t be a Mac Studio or MBA, which just got upgraded.

That leaves two configs of the Mac Pro, or the Mac mini, or a regular iMac plus an iMac Pro comeback?
 