If Apple have a supply of M5 Ultra chips ready for the Mac Pro this year, why wouldn't they also offer it as an additional upgrade tier in the Mac Studio, at a price between the M3 Ultra Studio and the base price of the M5 Ultra Mac Pro? Say $1,000 less than the Mac Pro?
Segmentation. This Studio lineup will probably stay the same for quite a while, imo. Margins on the Mac Pro are definitely higher, and they will maintain that and possibly increase them.

If Apple don't also offer the M5 Ultra in the Mac Studio in 2025, it will likely result in customers waiting until Apple release a Studio with M5 before they buy, or in many considering a switch from last-in-line desktops with two-generation-old architecture to the state-of-the-art MacBook Pros (with Max chips) that Apple releases like clockwork in Q4 each year.
This is already the case with releases from M1 Max to the present. If you don't need more than 128GB of memory and you can afford to upgrade often, the 16" MBP makes the most sense, at least with the historical cadence. That calculus could change with a desktop-exclusive SoIC, which might be worth waiting for depending on your use case.

..

I don't know why some people are assuming these hypothetical many-core processors will ever ship outside Apple; they have no software that would leverage them. Apple is not publicly in the server market, and it would be shocking to see them enter it, especially anytime soon given how abysmal their OS software execution has been lately. We will get something cool configuration-wise if the wafer improvements are true, but it's a stretch to think they'd ship an ultra-high-core-count CPU. More P-cores and fewer E-cores are quite likely, though, which would be a big boost.

Not being stuck with multiples of redundant components like the media engines (the use case for more than eight 8K streams is probably close to zero, for example) will save cost and power, and free up room to put something else interesting there. That's where I see this headed, particularly with the neural and GPU cores.
 
[…] I don't know why some people are assuming these hypothetical many-core processors will ever ship outside Apple; they have no software that would leverage them. Apple is not publicly in the server market, and it would be shocking to see them enter it, especially anytime soon given how abysmal their OS software execution has been lately.
People tend to forget (or not know in the first place) that the PCC hardware (and any custom silicon built for it) does not and will not run macOS. There are two Apple security blog posts on this topic: June 10, 2024, and a follow-up on October 24, 2024. “We paired this hardware with a new operating system: a hardened subset of the foundations of iOS and macOS tailored to support Large Language Model (LLM) inference workloads while presenting an extremely narrow attack surface.”

We will get something cool configuration-wise if the wafer improvements are true, but it's a stretch to think they'd ship an ultra-high-core-count CPU. More P-cores and fewer E-cores are quite likely, though, which would be a big boost.

Not being stuck with multiples of redundant components like the media engines (the use case for more than eight 8K streams is probably close to zero, for example) will save cost and power, and free up room to put something else interesting there. That's where I see this headed, particularly with the neural and GPU cores.
That makes perfect sense and it will be good to see what Apple has done with this.
 
So I saw that Bloomberg’s resident clown is reduced to selling auto-generated summaries of these very Forums (but with forum members’ useful technical analysis removed).

“In (Bloomberg's Mark Gurman's) Power On newsletter, (Gurman) said that Apple is reluctant to develop an M4 Ultra chip from scratch due to production challenges, costs, and the relatively small sales volume of its desktop computers, like the Mac Studio.”

Er… no sh_t… thanks for the insight there…
 
Last edited:
I really think Apple should start using a whole new chip design, because an SoC is just inefficient to mass-produce given its die size. The Max series is already big and yet too expensive to manufacture, with low yield, but the Ultra series? It is much more difficult than the Max. Nobody else designs like that, and it will only hurt Apple in the long term, especially for high-end desktop performance and workstation specs, and that is the current situation.

The only way to solve this problem is to use a chiplet design. Technically, what Apple's patent describes is more complicated than a standard chiplet approach, though. Instead of making everything on one die, they can produce the CPU, GPU, NPU, and other blocks separately, which are much smaller and cheaper to manufacture, and then combine them into a single package.

This is the main reason why Apple is having problems with Ultra chips, the Mac Studio, and the Mac Pro in terms of specs and mass production. I believe Apple is planning to bring out a whole new MBP with OLED in 2026, and that might be a great opportunity to start using a new chip design as well.
 
  • Like
Reactions: DaniTheFox
I really think Apple should start using a whole new chip design, because an SoC is just inefficient to mass-produce given its die size. The Max series is already big and yet too expensive to manufacture, with low yield, but the Ultra series? It is much more difficult than the Max. Nobody else designs like that, and it will only hurt Apple in the long term, especially for high-end desktop performance and workstation specs, and that is the current situation.

The only way to solve this problem is to use a chiplet design. Technically, what Apple's patent describes is more complicated than a standard chiplet approach, though. Instead of making everything on one die, they can produce the CPU, GPU, NPU, and other blocks separately, which are much smaller and cheaper to manufacture, and then combine them into a single package.

This is the main reason why Apple is having problems with Ultra chips, the Mac Studio, and the Mac Pro in terms of specs and mass production. I believe Apple is planning to bring out a whole new MBP with OLED in 2026, and that might be a great opportunity to start using a new chip design as well.
Rumor has it they will start doing this in this cycle, the M5 generation, and then do a shrink to N2 with the M6.

I think there will be greatly diminishing returns after M6 and I really like to hold on to what I buy. Will try to hold on to M6 but we will see. The tech gut hasn’t been fed for so incredibly long already.
 
  • Like
Reactions: DaniTheFox
“We paired this hardware with a new operating system: a hardened subset of the foundations of iOS and macOS tailored to support Large Language Model (LLM) inference workloads while presenting an extremely narrow attack surface.”
I understand that it will not run the calendar application. However, I have a question: could you imagine them using the new M3 Ultra for the PCC? I think its 512GB of memory is specifically designed for such tasks.
 
Rumor has it they will start doing this in this cycle, the M5 generation, and then do a shrink to N2 with the M6.

I think there will be greatly diminishing returns after M6 and I really like to hold on to what I buy. Will try to hold on to M6 but we will see. The tech gut hasn’t been fed for so incredibly long already.
Since the M3 Ultra is roughly RTX 4080 to 5080 class, especially with Nvidia having messed up the RTX 50 series, it's a great chance to surpass them in terms of GPU performance. But Apple still needs to figure out how to make a Mac Pro with 4x the GPU, which would also benefit their AI development via their own chips. An M4 Ultra should get close to the RTX 5090, but there are no clues.
 
I understand that it will not run the calendar application. However, I have a question: could you imagine them using the new M3 Ultra for the PCC? I think its 512GB of memory is specifically designed for such tasks.
Yes. It would be surprising if they did not. We know they were testing the M3 Ultra by April 2024; I think it's possible the first ten months of production went straight into PCC hardware.
 
Since the M3 Ultra is roughly RTX 4080 to 5080 class, especially with Nvidia having messed up the RTX 50 series, it's a great chance to surpass them in terms of GPU performance. But Apple still needs to figure out how to make a Mac Pro with 4x the GPU, which would also benefit their AI development via their own chips. An M4 Ultra should get close to the RTX 5090, but there are no clues.
M4 Ultra probably isn’t happening. I believe Apple has already said as much. Instead, as @DrWojtek said, it’s the M5 Pro/Max that is rumored to use chiplets. The SoIC (system on integrated chips) technology Apple is said to be using results in something similar to a monolithic SoC, just built from chiplets, allowing more flexibility. Two Max can still be fused into an Ultra.

This could come as soon as late summer, with an announcement at WWDC 2025.
 
Last edited:
I really think Apple should start using a whole new chip design, because an SoC is just inefficient to mass-produce given its die size. The Max series is already big and yet too expensive to manufacture, with low yield, but the Ultra series? It is much more difficult than the Max. Nobody else designs like that, and it will only hurt Apple in the long term, especially for high-end desktop performance and workstation specs, and that is the current situation.

The only way to solve this problem is to use a chiplet design. Technically, what Apple's patent describes is more complicated than a standard chiplet approach, though. Instead of making everything on one die, they can produce the CPU, GPU, NPU, and other blocks separately, which are much smaller and cheaper to manufacture, and then combine them into a single package.

This is the main reason why Apple is having problems with Ultra chips, the Mac Studio, and the Mac Pro in terms of specs and mass production. I believe Apple is planning to bring out a whole new MBP with OLED in 2026, and that might be a great opportunity to start using a new chip design as well.

You claim that the Max is too big, too expensive, and low yield. What data do you base this on? What is public is that they've designed a Max chip for every M-series SoC generation so far, and it seems unlikely they'd have done that four times in a row if Max chips actually were too big and expensive to manufacture.

You claim that nobody designs like that, yet GPUs the size of Max chips (or larger) are quite common.

You claim that Apple should obviously just split everything out into its own chiplet, with very fine granularity, and seem to think that chiplets are 100% upside, but the truth is that they are not. Any time you want to move a function of a monolithic die onto a different die (AKA chiplet), you are going to have to pay some overhead (some combination of area, power, bandwidth, and latency) for that block to communicate with other blocks that are no longer on the same die. On top of this, assembling a bunch of chiplets into a complete device has its own manufacturing yield, and as a general rule you should expect that as package complexity goes up (because you increased the chiplet count), yield goes down.
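
To put rough numbers on that trade-off, here's a minimal sketch of a textbook Poisson die-yield model with a per-chiplet assembly penalty. Every figure in it (defect density, die areas, assembly yield) is an illustrative assumption, not a TSMC or Apple number:

```python
# Illustrative only: a textbook Poisson yield model plus a per-chiplet
# assembly penalty. Defect density, die areas, and assembly yield are
# assumed numbers, not TSMC or Apple data.
import math

def die_yield(area_mm2: float, defects_per_cm2: float) -> float:
    """Poisson model: Y = exp(-A * D0), with A converted to cm^2."""
    return math.exp(-(area_mm2 / 100.0) * defects_per_cm2)

def assembly_yield(n_chiplets: int, yield_per_attach: float) -> float:
    """Each chiplet attached is one more step that can fail; yields multiply."""
    return yield_per_attach ** n_chiplets

D0 = 0.1  # assumed defects per cm^2

monolithic = die_yield(500, D0)                         # one big ~500 mm^2 die
small_die  = die_yield(125, D0)                         # one ~125 mm^2 chiplet
package    = assembly_yield(4, yield_per_attach=0.99)   # 4 chiplets per package

print(f"monolithic 500 mm^2 die yield : {monolithic:.1%}")  # ~60.7%
print(f"single 125 mm^2 chiplet yield : {small_die:.1%}")   # ~88.2%
print(f"4-chiplet packaging yield     : {package:.1%}")     # ~96.1%
```

Smaller dies yield much better individually, but the packaging step hands some of that back, and that's before counting the area, power, and latency overhead of the die-to-die links mentioned above.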

Finally, Apple has a world-class team of engineers working on their chips. What makes you presume that you know better than they do what they should be doing? Maybe you should wait and see what the professionals decide to do. If it's not what you thought it would be, you could learn something.

For the record, I don't pretend to know where they'll go from here, but that's because I know enough about this topic to be aware that I personally don't have enough information to make a solid prediction. The only thing I'll offer is a very handwavy observation: Apple is a lot less cost sensitive than the wider PC industry. Even if dicing things up into a ton of small chiplets could theoretically save them money, if that compromises other things Apple cares about (power efficiency, for example), they probably won't do it.
 
Last edited by a moderator:
You claim that the Max is too big, too expensive, and low yield. What data do you base this on? What is public is that they've designed a Max chip for every M-series SoC generation so far, and it seems unlikely they'd have done that four times in a row if Max chips actually were too big and expensive to manufacture.

You claim that nobody designs like that, yet GPUs the size of Max chips (or larger) are quite common.

You claim that Apple should obviously just split everything out into its own chiplet, with very fine granularity, and seem to think that chiplets are 100% upside, but the truth is that they are not. Any time you want to move a function of a monolithic die onto a different die (AKA chiplet), you are going to have to pay some overhead (some combination of area, power, bandwidth, and latency) for that block to communicate with other blocks that are no longer on the same die. On top of this, assembling a bunch of chiplets into a complete device has its own manufacturing yield, and as a general rule you should expect that as package complexity goes up (because you increased the chiplet count), yield goes down.

Finally, Apple has a world-class team of engineers working on their chips. What makes you presume that you know better than they do what they should be doing? Maybe you should wait and see what the professionals decide to do. If it's not what you thought it would be, you could learn something.

For the record, I don't pretend to know where they'll go from here, but that's because I know enough about this topic to be aware that I personally don't have enough information to make a solid prediction. The only thing I'll offer is a very handwavy observation: Apple is a lot less cost sensitive than the wider PC industry. Even if dicing things up into a ton of small chiplets could theoretically save them money, if that compromises other things Apple cares about (power efficiency, for example), they probably won't do it.

If not, then how come Apple can't make high-end, workstation-grade chips, and hasn't for several years? While Apple has claimed that Ultra chips are too expensive and are only used in a limited number of Macs, namely the Mac Studio and Mac Pro, others such as Nvidia have no problem making high-end and workstation-grade GPUs, or even going beyond that.

Also, you are totally forgetting that the Ultra series is extremely inefficient and a waste of money due to its die size. It's NOT just connecting two Max chips, as TSMC has noted, and that only makes it much more expensive with lower yield. Introducing the M3 Ultra while we already have the M4 series and are waiting for the M5 series is further proof as well.

At this point, Apple is struggling to make high-end or workstation-grade chips that they can use for the Mac Studio, the Mac Pro, and even their own servers. It has been 5 years since the M series launched, and if they still can't figure out how to make chips for the Mac Pro, that only proves my point. Hell, Apple still hasn't been able to fully replace the Mac Pro 2019.

Also, I have no idea why you are being so negative, especially toward chiplets. Tell me if Intel, AMD, or Nvidia builds an SoC with the CPU, NPU, GPU, controllers, and everything else all together as a single chip. NONE of them do. An SoC is more suitable for mobile devices, not desktops and workstations, after all.

Btw, being world class has nothing to do with your claim, and even they make a lot of mistakes, since they make stupid decisions every single time.
 
  • Haha
Reactions: UpsideDownEclair
There are some workloads that the current top-spec Mac Studio is not suitable for, where an HPC workstation with 4TB of RAM or 4x beefy Nvidia cards is more suitable. There are, however, many ‘workstation’ workloads to which the Mac Studio is well suited (and, some would argue, well priced).

Apple's processor team has rarely pushed for maximum performance from their SoCs without regard to electrical power, and has instead pushed for computational efficiency (performance/watt), even when that has entailed using more transistors (and correspondingly larger SoCs).

When the Mac Pro is eventually updated we will no doubt see what Apple thinks is suitable for the workstation market.
It does not seem inconceivable that an M5 Ultra could address 2-4TB of memory, or have twice the performance of an M3 Ultra. It may meet the needs of >95% of the professional market, but there will still be specialist requirements where number crunching demands a dedicated accelerator (with far higher performance/watt) or a boatload of GPUs.

With Apple pulling their Hidra ‘workstation’ SoC team to work on (larger) data centre architectures and SoCs, they will be leveraging Google's Tensor know-how, Nvidia's LLM processing (collaboration), and Broadcom's data centre scaling magic (likely on-chip networking / intra-data-centre comms). If the work with Broadcom is intended to bear fruit for Apple on a TSMC node at the end of 2026 (on N2P or A16), there is some trickle-down design knowledge that could make its way into Apple's future high-end machines in 2027/2028 on the M7.

With A16 offering the same performance as N3E at half the wattage, I predict we will not see a Mac Pro with an M7; its duties will give way to a Studio-sized box, something performant enough for the vast majority of Apple's target power-user market.

With no discussions of TB6 in the works (and Intel's new CEO concentrating on keeping the company alive and above water), I think we will initially see the emergence of some TB5 expansion chassis (for huge NVMe RAID + PCIe cards), followed by some specialist PCIe card makers offering standalone TB5 versions of their products for the Apple market.
 
Last edited:
  • Like
Reactions: DaniTheFox
Part of what defines a professional workstation is maximum RAM. Lots of cores, too. 128GB isn't really enough for big compute code. The M3 Ultra with 512GB shows Apple is pretty serious about cashing in on the workstation market NOW, even though it's tiny compared to the laptop market. We can see this in Apple not waiting to do an M4 Ultra. More high-performance cores would be nice, but a high RAM ceiling is absolutely vital to avoid swap slowdowns. And the unified RAM gives it an advantage over the Threadripper PRO (that I have) in my lab, which has 4090s, although it doesn't have the 64 cores the lab machine has.
 
  • Like
Reactions: DaniTheFox
If not, then how come Apple can't make high-end, workstation-grade chips, and hasn't for several years?
Is this a reference to the delay in the release of M3 Ultra?

While Apple has claimed that Ultra chips are too expensive and are only used in a limited number of Macs, namely the Mac Studio and Mac Pro, others such as Nvidia have no problem making high-end and workstation-grade GPUs, or even going beyond that.
Apple has never claimed anything like this. The best description of the priorities of the Apple Silicon team is found in an interview of Anand Shimpi in February 2023. It is well worth reading that transcript if you are truly interested in understanding what Apple is doing.

[…] the Ultra series is extremely inefficient and a waste of money due to its die size. It's NOT just connecting two Max chips, as TSMC has noted, and that only makes it much more expensive with lower yield.
This is not a fact. M1-M2-M3 Ultra uses the same TSMC advanced packaging technology as Nvidia’s Blackwell: InFO-LSI. That is a fact.

Blackwell fuses two identical dies that are the largest possible size, at the “reticle limit” of 104 billion transistors each for a total of 208 billion. Ultra fuses two identical M3 Max dies that have 92 billion transistors each for a total of 184 billion. These are also facts.

That Blackwell GPU will start at around US $30,000. Apple’s M3 Ultra, built on a more advanced, more efficient node, with 512 GB of unified memory, costs $9,500. That is another fact.
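
Purely as arithmetic on the figures quoted in this post (a GPU's street price vs. a fully configured Mac Studio, so very much apples to oranges), a quick sketch of the dollars-per-billion-transistors comparison:

```python
# Back-of-the-envelope using the numbers quoted above; the prices are the
# post's figures (one is a bare GPU, one a whole 512GB Mac Studio), so this
# is only a rough illustration, not a like-for-like cost comparison.
configs = {
    "Nvidia Blackwell (2 x 104B transistors)": (2 * 104e9, 30_000),
    "Apple M3 Ultra   (2 x 92B transistors)":  (2 * 92e9,   9_500),
}

for name, (transistors, price_usd) in configs.items():
    per_billion = price_usd / (transistors / 1e9)
    print(f"{name}: {transistors / 1e9:.0f}B total, ~${per_billion:.0f} per billion")
```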

Introducing the M3 Ultra while we already have the M4 series and are waiting for the M5 series is further proof as well.
If this is “proof” of anything, it suggests the Mac Studio is no longer on the same product tier as the Mac Pro. It does not suggest that Apple does not have a plan for the Mac Pro. See my earlier comment in reply to you above.

At this point, Apple is struggling to make high-end or workstation-grade chips that they can use for the Mac Studio, the Mac Pro, and even their own servers. It has been 5 years since the M series launched, and if they still can't figure out how to make chips for the Mac Pro, that only proves my point. Hell, Apple still hasn't been able to fully replace the Mac Pro 2019. […] Tell me if Intel, AMD, or Nvidia builds an SoC with the CPU, NPU, GPU, controllers, and everything else all together as a single chip. NONE of them do. An SoC is more suitable for mobile devices, not desktops and workstations, after all.
M1-M2-M3 Ultra uses an advanced “chiplet” design. The two Max function as chiplets. It is true that Intel Core Ultra, AMD Ryzen G-series, and Nvidia GB10 in Project DIGITS all use chiplet designs. But they also all have integrated graphics, so the end result is comparable to Apple’s Ultra.
 
Last edited:
I wonder, could Apple use some significantly faster memory for their Hidra project? Like HBM or some other tech? Would that require some total rearchitecting, or is it mostly a cost/benefit issue? Clearly the memory bandwidth of the Ultra chips is the limiting factor at the moment for becoming actually relevant for running large LLMs in meaningful ways.
 
I wonder, could Apple use some significantly faster memory for their Hidra project? Like HBM or some other tech? Would that require some total rearchitecting, or is it mostly a cost/benefit issue? Clearly the memory bandwidth of the Ultra chips is the limiting factor at the moment for becoming actually relevant for running large LLMs in meaningful ways.
[Attached image: diagram of packaging approaches]

I think Apple Silicon is still on the bottom-left approach, from M3 onwards. They have gone on record saying better packaging is one thing they are looking forward to, which probably means the two on the right. Whether or not HBM can be involved depends.
 
If Apple adopts a denser (Wh/kg) battery, they could forgo advanced packaging for the lowest-tier M5 chip on N3P, running at a (speculative) 10% higher frequency, through the use of pyrolytic (or equivalent) sheet heat spreaders (or a heat pipe).

If Apple intends to grow NPU/GPU processing for the M5 Pro / M5 Max and is thermally constrained, and TSMC yields of N2 were expected to be in the 50-60% range this year (with design and fab costs still sky high), and Apple themselves would like to use higher-end M5s in their US data centres, it seems likely that Apple would opt for the data centre packaging option that maximises performance per watt.

Another reason Apple might have wanted to stick with N3(P) is that TSMC packaging (SoIC stacking) is not expected to be ready for N2 until 2027 (AnandTech).
 
  • Like
Reactions: tenthousandthings
I wonder, could Apple use some significantly faster memory for their Hidra project? Like HBM or some other tech? Would that require some total rearchitecting, or is it mostly a cost/benefit issue? Clearly the memory bandwidth of the Ultra chips is the limiting factor at the moment for becoming actually relevant for running large LLMs in meaningful ways.
There was a thread on memory types recently, here:


It does look like there are advances in memory on the horizon, but for now it seems like HBM is too expensive for use in a Mac Studio or Mac Pro, and it really only makes sense for massive, scaled deployments. So Apple might be using it for PCC, but it’s unlikely to appear in retail products.

Also, @leman said something to the effect that Apple’s use of memory amounts to “HBM lite” anyhow.
 
If Apple adopts a denser (Wh/kg) battery, they could forgo advanced packaging for the lowest-tier M5 chip on N3P, running at a (speculative) 10% higher frequency, through the use of pyrolytic (or equivalent) sheet heat spreaders (or a heat pipe).

If Apple intends to grow NPU/GPU processing for the M5 Pro / M5 Max and is thermally constrained, and TSMC yields of N2 were expected to be in the 50-60% range this year (with design and fab costs still sky high), and Apple themselves would like to use higher-end M5s in their US data centres, it seems likely that Apple would opt for the data centre packaging option that maximises performance per watt.

Another reason Apple might have wanted to stick with N3(P) is that TSMC packaging (SoIC stacking) is not expected to be ready for N2 until 2027 (AnandTech).
I rate the Jeff Pu research (the source of the A20-on-N3P rumor) as dubious, since his CoWoS claim seems pretty far-fetched. I could be wrong, but the only uses I’m aware of for that are in high-end server/workstation chips and the like.

I could see CoWoS used to make an “Extreme” configuration for the Mac Pro (2x Ultra) but it’s difficult to imagine how it would apply to a phone or tablet, or even a laptop.

But yes, I’m guessing either N2 or N2P for M7 in 2027.

Also, let’s not forget that TSMC 2nm introduces a new transistor, what they are calling “Nanosheet” — so it’s a big change. I could see Apple being cautious with it, and M6 staying on TSMC 3nm.
 
Last edited:
  • Like
Reactions: Antony Newman
If Apple adopts a denser (Wh/kg) battery, they could forgo advanced packaging for the lowest-tier M5 chip on N3P, running at a (speculative) 10% higher frequency, through the use of pyrolytic (or equivalent) sheet heat spreaders (or a heat pipe).

If Apple intends to grow NPU/GPU processing for the M5 Pro / M5 Max and is thermally constrained, and TSMC yields of N2 were expected to be in the 50-60% range this year (with design and fab costs still sky high), and Apple themselves would like to use higher-end M5s in their US data centres, it seems likely that Apple would opt for the data centre packaging option that maximises performance per watt.

Another reason Apple might have wanted to stick with N3(P) is that TSMC packaging (SoIC stacking) is not expected to be ready for N2 until 2027 (AnandTech).
It depends on the top vs. bottom die. The AnandTech article actually says 2026 for N2+N3 SoIC; 2027 is N2+A16 SoIC.

[Attached screenshot]


There was a thread on memory types recently, here:


It does look like there are advances in memory on the horizon, but for now it seems like HBM is too expensive for use in a Mac Studio or Mac Pro, and it really only makes sense for massive, scaled deployments. So Apple might be using it for PCC, but it’s unlikely to appear in retail products.

Also, @leman said something to the effect that Apple’s use of memory amounts to “HBM lite” anyhow.
I wonder, could Apple use some significantly faster memory for their Hidra project? Like HBM or some other tech? Would that require some total rearchitecting, or is it mostly a cost/benefit issue? Clearly the memory bandwidth of the Ultra chips is the limiting factor at the moment for becoming actually relevant for running large LLMs in meaningful ways.
Now that Apple has seemingly shown a willingness to go back and modify previous dies for new products, even though this might be a much more substantial difference, it's possible Apple could make one version with HBM for their PCC data centers and one for the public with LPDDR. HBM is just super expensive (and only getting more so), and basically no one has offered a consumer-facing product with it in years, especially since these days GDDR/LPDDR offer "good enough" high-bandwidth memory performance in this segment, with the choice between them depending on your application. So yeah, depending on Apple's design goals, Hidra may remain LPDDR-only; at the very least, if Hidra ever gets a consumer variant (likely in my book, to help defray costs), then it would (extremely likely) be LPDDR. But the odds that Apple might also use HBM, at least internally, just went up a notch.
 
It depends on the top vs. bottom die. The AnandTech article actually says 2026 for N2+N3 SoIC; 2027 is N2+A16 SoIC.




Now that Apple has seemingly shown a willingness to go back and modify previous dies for new products, even though this might be a much more substantial difference, it's possible Apple could make one version with HBM for their PCC data centers and one for the public with LPDDR. HBM is just super expensive (and only getting more so), and basically no one has offered a consumer-facing product with it in years, especially since these days GDDR/LPDDR offer "good enough" high-bandwidth memory performance in this segment, with the choice between them depending on your application. So yeah, depending on Apple's design goals, Hidra may remain LPDDR-only; at the very least, if Hidra ever gets a consumer variant (likely in my book, to help defray costs), then it would (extremely likely) be LPDDR. But the odds that Apple might also use HBM, at least internally, just went up a notch.
I was under the impression that Apple's “poor man's HBM” system, or however they architected their RAM connection, makes any theoretical gains from HBM minuscule?

Speaking of which, it's been over a decade since I first heard of HBM being used in any consumer product (the Radeon R9 Fury), and it's still expensive? Are we on HBM3 now, or is it still 2?
 
I was under the impression that Apple's “poor man's HBM” system, or however they architected their RAM connection, makes any theoretical gains from HBM minuscule?

Speaking of which, it's been over a decade since I first heard of HBM being used in any consumer product (the Radeon R9 Fury), and it's still expensive? Are we on HBM3 now, or is it still 2?
VEGA10 (Radeon RX Vega 56/64) and VEGA20 (Radeon VII) also use HBM.
 
  • Like
Reactions: MRMSFC
VEGA10 (Radeon RX Vega 56/64) and VEGA20 (Radeon VII) also use HBM.
Aye, it's been a while since HBM was used in a consumer product, what, 7 years and counting? I mean, Nvidia is about to release the DGX Station with HBM, but I really doubt that one could call that consumer with a straight face:


I mean, it is technically a "desktop workstation," but I suspect you're looking at several tens of thousands of dollars, and at that point one might as well count the PCIe versions of Hopper/Blackwell as "consumer".

I was under the impression that Apple's “poor man's HBM” system, or however they architected their RAM connection, makes any theoretical gains from HBM minuscule?

Speaking of which, it's been over a decade since I first heard of HBM being used in any consumer product (the Radeon R9 Fury), and it's still expensive? Are we on HBM3 now, or is it still 2?
HBM3e and yup it's still very expensive. HBM4 is coming up and the new even higher throughput interconnect is said to be even more expensive - like a lot more.

As for HBM versus LPDDR, there was a long discussion with lots of hard numbers in the link @tenthousandthings gave in the previous posts. Short version: it depends on the application and how much RAM you need. Basically for the same amount of RAM and a much smaller physical package (because it's stacked), you can have much higher bandwidth with HBM and on a per bandwidth basis be just as energy efficient. That's how Nvidia can get multi-TB/s bandwidth on like 144-288GB of HBM for professional Hopper and Blackwell GPUs. However, the cost is quite high and for many applications overkill. High bandwidth LPDDR makes the most sense for all Apple's (consumer-facing) products. If Apple were to design a data center specific die with HBM and wanted to dual use it to help defray development costs, they'd probably have to at least redesign the memory controller for the consumer product. I don't know how easy that is or how much Apple would have to change.

But who knows right? Maybe Apple will not only use HBM in their data center but release a not-really-consumer product with it into the wild a la the DGX station above as a Mac Pro for $40,000 base (I'm making the number up).
 
But who knows right? Maybe Apple will not only use HBM in their data center but release a not-really-consumer product with it into the wild a la the DGX station above as a Mac Pro for $40,000 base (I'm making the number up).
The way that Apple implemented their memory controllers would be like a distributed form of HBM?

What would HBM offer Apple vs. what they are already doing? Maybe a simpler memory controller? But that would mean having to support HBM even for the base Mx SoC; otherwise the design gets forked, with multiple memory controllers to maintain.

The way I see it, Apple could just increase the 128-bit memory controller channel to, say, 256 bits per memory controller and get better value for their design dollars compared to using HBM?
 
The way that Apple implemented their memory controllers would be like a distributed form of HBM?
Kind of.
What would HBM offer Apple vs. what they are already doing? Maybe a simpler memory controller? But that would mean having to support HBM even for the base Mx SoC; otherwise the design gets forked, with multiple memory controllers to maintain.

The way I see it, Apple could just increase the 128-bit memory controller channel to, say, 256 bits per memory controller and get better value for their design dollars compared to using HBM?
I'm far from an expert, but from what I know and wrote in the other thread, the smallest bandwidth bus unit you can get off a single 24GB HBM3e stack is 1024-bit, which is what Apple offers in the M3 Ultra, and the minimum RAM there is 96GB, distributed (I think) across 16 6GB LPDDR packages (that may not be the actual minimum amount of LPDDR you can pair with a 1024-bit LPDDR bus, but you get the idea). This means that, with HBM, you can have far less RAM in far less area delivering the same amount of bandwidth, and if you have the same amount of HBM RAM as LPDDR RAM, you can deliver far more bandwidth. You can technically offer as much bandwidth with LPDDR as with HBM, but it would require a much larger "shoreline" of die area, a much larger minimum amount of RAM, and a much larger package size.* But yes, you pay a premium for all that. That no one offers a consumer product with HBM anymore, and hasn't for a long time, suggests the premium is quite high (and the use case for most consumer systems is dubious given the price).
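
For anyone who wants to sanity-check the bus-width arithmetic here, a minimal sketch: peak bandwidth is roughly bus width (bits) x transfer rate / 8. The LPDDR5-6400 line matches Apple's published ~819 GB/s for the M3 Ultra's 1024-bit bus; the ~9.6 Gb/s-per-pin HBM3e rate is the top of the spec and varies by vendor and SKU:

```python
# Peak bandwidth ~= bus_width_bits * transfer_rate_GT/s / 8 (result in GB/s).
# LPDDR5-6400 on 1024 bits matches Apple's published ~819 GB/s for M3 Ultra;
# 9.6 Gb/s per pin is the upper end of HBM3e and varies by vendor/SKU.
def peak_gb_per_s(bus_width_bits: int, transfer_rate_gtps: float) -> float:
    return bus_width_bits * transfer_rate_gtps / 8

print(f"M3 Ultra, 1024-bit LPDDR5-6400      : {peak_gb_per_s(1024, 6.4):7.1f} GB/s")
print(f"one 1024-bit HBM3e stack (~9.6 GT/s): {peak_gb_per_s(1024, 9.6):7.1f} GB/s")
print(f"eight HBM3e stacks (8192-bit total) : {peak_gb_per_s(8192, 9.6):7.1f} GB/s")
```

Which is the granularity point above: a single 24GB stack already brings HBM-class bandwidth, whereas matching that bus width with LPDDR takes many more packages, a higher minimum capacity, and more die-edge "shoreline."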

As far as I can tell, HBM is as power efficient as LPDDR per unit of bandwidth, if not more so (though not necessarily per GB), but power-efficiency numbers are hard to come by.

*Edit: I should also state that Apple in the R1 played around with a new LPDDR RAM packaging that effectively doubled bandwidth while decreasing power. I've seen research from other companies exploring similar ideas. Probably more expensive than normal LPDDR RAM packaging, but not sure how much more or how it compares to HBM.
 
Last edited:
  • Like
Reactions: AshesXY