Is the M1 GPU sharing LPDDR4 RAM going to be enough?

BenRacicot · Jul 26, 2021

@leman have you seen the rumors of the Nvidia 4090 spec? Please explain to all of us how this use of HBM RAM could confirm that LPDDR4 is lackluster.

I personally believe that if the M1X is using LPDDR4 (potentially without Fabric) then it is now outdated.

we may want to wait for any M2 we’re any shared RAM is not a bottleneck. Please explain in detail.

leman · Jul 26, 2021

BenRacicot said:
@leman have you seen the rumors of the Nvidia 4090 spec? Please explain to all of us how this use of HBM RAM could confirm that LPDDR4 is lackluster.

Can you point me to a specific bit you’d like me to comment on? Not sure why you mention HBM or LPDDR4, the rumors I’ve seen talked about GDDR6X…

BenRacicot said:
I personally believe that if the M1X is using LPDDR4 (potentially without Fabric) then it is now outdated.

My personal bet is 256-bit LPDDR5. But why would LPDDR4 be outdated? And what’s “Fabric”?

BenRacicot said:
we may want to wait for any M2 we’re any shared RAM is not a bottleneck. Please explain in detail.

Whether shared RAM is a bottleneck or not depends on your application. If M1 is lackluster for your purpose, you’d probably want to wait for the prosumer Apple Silicon.

diamond.g · Jul 26, 2021

leman said:
Can you point me to a specific bit you’d like me to comment on? Not sure why you mention HBM or LPDDR4, the rumors I’ve seen talked about GDDR6X…

My personal bet is 256-bit LPDDR5. But why would LPDDR4 be outdated? And what’s “Fabric”?

Whether shared RAM is a bottleneck or not depends on your application. If M1 is lackluster for your purpose, you’d probably want to wait for the prosumer Apple Silicon.

Rumors state that hopper should be MCM instead of Lovelace (reported 4090). I would be curious about the memory controller, if each chip would be 384 and have full access to all of the frame buffer or not.

BenRacicot · Jul 27, 2021

@diamond.g yes! I think I read that Nvidia is likely waiting for the competition's specs but may leapfrog to Hopper.

@leman you're right. In the Tweaktown article they speak of Micron's GDDR6X, not specifically HBM. But the snippet from Micron discusses the prevalence of HBM2E.

(Regarding "Fabric" I mean TSMC's 3DFabric stacking tech)

Screen Shot 2021-07-27 at 8.04.44 AM.png

Who am I kidding? The M-series will probably see LPDDR4X for many more iterations.

diamond.g · Jul 27, 2021

BenRacicot said:
@diamond.g yes! I think I read that Nvidia is likely waiting for the competition's specs but may leapfrog to Hopper.

@leman you're right. In the Tweaktown article they speak of Micron's GDDR6X, not specifically HBM. But the snippet from Micron discusses the prevalence of HBM2E.

(Regarding "Fabric" I mean TSMC's 3DFabric stacking tech)

View attachment 1811412

Who am I kidding? The M-series will probably see LPDDR4X for many more iterations.

Well the Steam Deck is using LPDDR5 so I don’t see why Apple couldn’t as well.

leman · Jul 27, 2021

diamond.g said:
Well the Steam Deck is using LPDDR5 so I don’t see why Apple couldn’t as well.

One could also ask why they didn't use LPDDR5 with M1 already. I guess in the end it has to do with component availability. Apple's problem is that they have to ship a lot of these chips, and it is likely that there was simply not enough supply of LPDDR5... let's hope that can find some for the prosumer Macs

A 256-bit LPDDR5 would have bandwidth around 200GB/s, which is not too shabby.

BenRacicot · Aug 4, 2021

leman said:
One could also ask why they didn't use LPDDR5 with M1 already. I guess in the end it has to do with component availability. Apple's problem is that they have to ship a lot of these chips, and it is likely that there was simply not enough supply of LPDDR5... let's hope that can find some for the prosumer Macs A 256-bit LPDDR5 would have bandwidth around 200GB/s, which is not too shabby.

Oh, I thought that since it's "on-chip" that it would be part of the lith process when the cores are made. Are you saying they bring in the RAM separately and add it to the chip?

leman · Aug 4, 2021

BenRacicot said:
Oh, I thought that since it's "on-chip" that it would be part of the lith process when the cores are made. Are you saying they bring in the RAM separately and add it to the chip?

RAM is separate chips, they are just integrated in the SoC package. It's on-package, not on-chip.

AgentMcGeek · Aug 4, 2021

A picture is worth a thousand words.

BenRacicot · Aug 4, 2021

Wow, I can't believe I've never seen that before. Ok well that makes it a lot more realistic for them update its type.
Thanks for explaining.

BenRacicot · Sep 18, 2021

Semianalysis.com just wrote their thoughts on the A15 and believe it may contain LPDDR5 and that it’s cache has been doubled to 32M

It also goes on to say that LPDDR5 may not be present:

“CPU gains are 7.7% in single thread overall. In general increases are about the same as clock increases from 3GHz to 3.23GHz. The weak scaling on some of these indicates to me that LPDDR5 may not be present. Part of the motivation for this could be that the doubled LLC was sufficient and they didn’t need to spend the ~30% more on LPDDR5 vs LPDDR4x. Some tests may benefit from the LLC doubling to 32MB, others won’t scale perfectly with clocks.”

this may be hopeful news for the new M series.

leman · Sep 18, 2021

BenRacicot said:
Semianalysis.com just wrote their thoughts on the A15 and believe it may contain LPDDR5 and that it’s cache has been doubled to 32M

It also goes on to say that LPDDR5 may not be present:

“CPU gains are 7.7% in single thread overall. In general increases are about the same as clock increases from 3GHz to 3.23GHz. The weak scaling on some of these indicates to me that LPDDR5 may not be present. Part of the motivation for this could be that the doubled LLC was sufficient and they didn’t need to spend the ~30% more on LPDDR5 vs LPDDR4x. Some tests may benefit from the LLC doubling to 32MB, others won’t scale perfectly with clocks.”

this may be hopeful news for the new M series.

The iPhone probably doesn’t need LPDDR5, it is already fast enough. The new Mac prosumer chips could be a different matter though, they are much more likely to need significantly improved bandwidth (especially on the rumored 32-core GPU). Given the relative scarcity of LPDDR5, it would make perfect sense for Apple to continue using cheaper, more ubiquitous LPDDR4X for the phones while reserving the harder to get faster RAM for the Macs.

P.S. Some android flagships can afford shipping LPDDR5 because their volume is not nearly close to the iPhone…

AgentMcGeek · Sep 18, 2021

So, at this point, do we think M1X will be derived from A14 or A15?
I had assumed A14, but seems like the changes in A15 are quite minor, and now I'm thinking lots of these tweaks may end up in the MBP chip design.

sunny5 · Sep 18, 2021

LPDDR5 is not enough. The bandwidth is extremely poor compared to GDDR6 and HBM2e. I really don't think M1X will use LPDDR5 instead of DDR5.

leman · Sep 18, 2021

sunny5 said:
LPDDR5 is not enough. The bandwidth is extremely poor compared to GDDR6 and HBM2e.

Depends on how many memory channels they use. A 256-bit LPDDR5 interface can easily reach above 180GB/s, which, paired with Apple's huge caches and TBDR GPU should be enough to reach performance levels of GPUs with GDDR6.

sunny5 said:
I really don't think M1X will use LPDDR5 instead of DDR5.

What would that do? They offer the same performance. In fact, the latest standard LPDDR5X is faster than the standard DDR5 if I remember correctly.

sunny5 · Sep 18, 2021

leman said:
Depends on how many memory channels they use. A 256-bit LPDDR5 interface can easily reach above 180GB/s, which, paired with Apple's huge caches and TBDR GPU should be enough to reach performance levels of GPUs with GDDR6.

Speed and bandwidth are separate things. LPDDR5 is still far from being able to replace GDDR6. Not only that, 16 inch MBP used HBM2 graphic cards which is more faster than GDDR6.

theorist9 · Sep 18, 2021

Homy said:
Actually M1 uses LPDDR4X which is faster than LPDDR4: https://www.anandtech.com/show/16252/mac-mini-apple-m1-tested/3

5GB RAM usage limit is for iPad OS not Mac OS. M1 GPU is faster than Radeon RX 560X and sometimes even as fast as GF 1650 (link above).

TFLOPS is not everything but the rumored 128-core GPU would be crazy fast. It would be faster than any GPU on the market, including GF 3090!!

M1 8 GPU cores 2.6 TFLOPS
M? 16 GPU cores 5.2 TFLOPS
M? 32 GPU cores 10.4 TFLOPS
M? 64 GPU cores 20.8 TFLOPS
M? 128 GPU cores 41.6 TFLOPS

Radeon Pro 5700 6.2 TFLOPS
Radeon Pro 5700 XT 7.7 TFLOPS
Radeon Pro Vega II 14.06 TFLOPS
Radeon Pro Vega II Duo 2x14.06 TFLOPS
GF RTX 3060 14.2 TFLOPS
GF RTX 3060 Ti 16.2 TFLOPS
Radeon RX 6800 16.2 TFLOPS
GF RTX 3070 20.3 TFLOPS
Radeon RX 6800 XT 20.7 TFLOPS
Radeon RX 6900 XT 23 TFLOPS
GF RTX 3080 29.8 TFLOPS
GF RTX 3090 35.6 TFLOPS

I extrapolated some gaming benchmarks for M2 and it will be impressive (1260p is for iMac 24"):

- M1 GPU 8 cores: Borderlands 3 1080p Ultra 22 fps - medium 30 fps (1260p 19-26, 1440p 15-23)
- M2 GPU 16 cores 1440p 30-46 fps, 32 cores 1440p 60-92 fps

- M1 GPU 8 cores: Deus Ex: Mankind Divided 1080p Ultra 24 fps (1260p 20, 1440p 18)
- M2 GPU 16 cores 1440p 36 fps, 32 cores 72 fps

- M1 GPU 8 cores: Shadow of the Tomb Raider 1080p Medium 24 fps (1260p 20, 1440p 18)
- M2 GPU 16 cores 1440p 36 fps, 32 cores 72 fps

- M1 GPU 8 cores: Metro Exodus 1080p medium 25-45 fps (1260p 21-38, 1440p 19-35)
- M2 GPU 16 1440p 38-70 fps, 32 cores 76-140 fps

32-core M2 GPU doing 60 fps at 1440p Ultra in Borderlands 3 (via Rosetta 2) will be on par with Radeon 5700 XT, RTX 2070 Super, 2080 or 1080 Ti.

GPU performance often increases proportionally thanks to parallel computing. If everything else in the architecture is the same more cores means that you can render more stuff at the same time. I don't know about all games but many games, especially newer, can take advantage of that. It's not always the case in reality and 4x more cores in theory doesn't always mean 4x the performance, but we can always hope when we're guessing, especially when M1 GPU already has exceeded our expectations.

We know that M1 with 8-core GPU at 10W can perform as good as other GPUs with much higher TDP. So a M2 with 32-core GPU at 40W could perform as the 2070 Super at 200W. I used the benchmarks in the videos below where M1 gets 22 fps at 1080p ultra in BL3 built-in benchmark and about 30 in game play. M2 32-core GPU would manage around 60 at 1440p Ultra while 2070 Super manages 56-66 at the same settings. I don't even take into account that M2 may have faster CPU or higher clocked GPU and LPDDR5 or other new benefits. It will be very exiting to see what Apple can come up with.

IIUC, the 128-core GPU was rumored for the Mac Pro, not the consumer/prosumer Macs. So if you're doing comparisons with the 128 core AS GPU, you'll want to add NVIDIA's most powerful GPU for professional video work to your list: the RTX A6000 (39 TFLOPS).

And, of course, just as AS's GPU processing capability is scalable by adding more cores, the GPU-processing capability of NVIDIA-equipped systems is scalable by using multiple GPU's. E.g., this testing from Puget Systems shows approximately linear scaling with up to four RTX A6000's (⇒ ≈156 TFLOPS).

The AS Mac Pro likely won't be able to compete at that rarefied end of the workstation market, since it seems unlikely Apple would offer the option of outfitting it with 4 x 128 core AS GPUs. But they might offer, say, 2 x 128 as a modular upgrade option*, which would be enough for nearly all of the workstation market (the fraction that need more is likely tiny). [*And given AS's unified memory architecture, it's possible the moduar upgrade would only be offered as a CPU+GPU module, rather than additional GPU cores only.]

GPU Rendering - NVIDIA RTX A6000 Multi-GPU Scaling

With the launch of Nvidia's RTX A6000 video card, we look at how well these cards scale in multi-GPU configurations for rendering in Redshift, OctaneRender, and V-Ray.

www.pugetsystems.com

AgentMcGeek · Sep 18, 2021

sunny5 said:
LPDDR5 is not enough. The bandwidth is extremely poor compared to GDDR6 and HBM2e. I really don't think M1X will use LPDDR5 instead of DDR5.

Apple has been using LPDDR on all its MBP for quite some time now, way before the ASi transition. I see no reason they would roll back now. In fact, MBP were stuck for a while on LPDDR3 even though DDR4 was out. Just because the LP version wasn't out and supported yet.

ChrisA · Sep 18, 2021

leman said:
The reason why I am asking is because unified memory is pretty much the default assumption across the entire range of Apple Silicon Macs (Apple communication has been fairly clear on this so far). So every time someone suggests that they are going to use separate VRAM I have to ask why that person thinks that Apple would break their elegant design and what kind of benefit that would bring in their opinion.

Your problem is you are asking this on a general forum. There are very few engineers or computer scientests here.

Gaming enthusisets don't yet onderstand thew idea of nonhomogenius multi processing others then having two kinds of processors, CPU and GPU.

Back when the GPU was made by a different company as an add-on product there was not option BUT to include the VRAM on the GPU card. but now that all is in the hands of one comppany and there is no option on add-on parts, Apple cross-barr multipled RAM is ideal. They can get any desired RAM bandwidth ny adding more RAM moduals and making the crossbar wider.

This is not a new idea. But it is new to gammers because few of them remeber computrer architecure before Intel's first 386 chip. Look for example of the CDC mainframes of the 1960's. They would use 10 RAM pages and MANY funtional units inside the processor along with 10 or 20 perriferal processing units for I/O. All this on a RAM crossbar switch. This type of design was not just by CDC and Cray. It was the way high-end was done back in the day when each computer company build it's own CPUs from the ground up -- Pre-Intel.

But post-Intel add-on cards had to be self-contained and not depend too much of the machine they were beibg added into.

I see Apple Silicon as a return to the old days when each company (IBM, DEC, Data General, Perkin Elmer, CDC, Sperry-Univac) all has there own architectures. Don't laught. The new model is the company designes it and TSMC makes it. Even a hobbyest, like me can make a low-perforance CPU architecture by programming a "soft CPU" in an FPGA. And if I don't like it, I can reflash the FPGA All for under $100. The bar for entry to this is now very low so I expect many to do this.

The gammers on this forrum just want what they are used to, few understand how any of this works.

sunny5 · Sep 18, 2021

AgentMcGeek said:
Apple has been using LPDDR on all its MBP for quite some time now, way before the ASi transition. I see no reason they would roll back now. In fact, MBP were stuck for a while on LPDDR3 even though DDR4 was out. Just because the LP version wasn't out and supported yet.

WRONG. 16/15 inch MBP had been using SDRAM or DDR3,4 for a long time. Also, those LPDDR3,4 MBP do NOT support an external GPU at all like 16 inch MBP which had both GDDR5,6 and HBM2.

deconstruct60 · Sep 18, 2021

AgentMcGeek said:
So, at this point, do we think M1X will be derived from A14 or A15?
I had assumed A14, but seems like the changes in A15 are quite minor, and now I'm thinking lots of these tweaks may end up in the MBP chip design.

if 2-3 years ago Apple planned for them to use A14 cores then it will have those cores. if they even 2-3 years ago planned for the MBP 16 to come out in Fall 2021 then there is a window for them to have A15 cores. if the initial target was June 2021 then A14 core foundation is probably more on track.

The "uncore" portion of the M1X wasn't going to be the same as the A14 or A15 or even the M1. That is really what mattered most. Really boils down to how much Apple wanted to control the complexity and risk of the roll out over the first 12 months of the transition. Doing A15 cores would have been a more risky gamble if hitting a particular month in the year, but would have saved work because could have skipped doing any "big die" for for the first generation. And Apple may be on a track where the "big dies" only show up on every even number generations ( that the smaller dies get done on a shorter update cycle. Same as the Ann were on a shorter updates cycle as the AnnX version. The latter only when with process node changes. The even number M series big could take a page of only targeting the "plus" increments that come after that ( N5P , N4P , N3P , etc. )

the "M1X" is going to be a much bigger die than Apple has every shipped in volume. Production acjustment controls are going to be better ( since mastered on A14 volume and A14 is on ramp down with leading edge iPhone transition to A15. More wafer to hand out on "old" node. ). Just like sticking with LPDDR4X over LPDDR5 has risk mitigations due to maturity . They could do only odd ( and only take up the bleeding edge nodes after been around for a year, instead of "P" optimizations when they are new. )

deconstruct60 · Sep 18, 2021

sunny5 said:
LPDDR5 is not enough. The bandwidth is extremely poor compared to GDDR6 and HBM2e. I really don't think M1X will use LPDDR5 instead of DDR5.

Apple doesn't care most about "poor compared" the TDP and price that GDDR6 and HBM2e have. It is about performance per watt and price. Price being a player because it is system RAM that is at issue not "VRAM" and Apple is going to want to keep their current margins. LP is just obviously better power.

Apple will just go "wider" bus and throw stuff out like external PCI-e lanes and DIMMs.

AgentMcGeek · Sep 18, 2021

sunny5 said:
WRONG. 16/15 inch MBP had been using SDRAM or DDR3,4 for a long time. Also, those LPDDR3,4 MBP do NOT support an external GPU at all like 16 inch MBP which had both GDDR5,6 and HBM2.

You may be right. From memory 13” versions have been using LPDDR for a while now. I mistakenly assumed it would be the same with the 15/16”.
I’m not sure about the eGPU though. I fail to see how that’s related to LPDDR. Apple adds its own additional TB controller anyway on the four ports 13.

deconstruct60 · Sep 18, 2021

sunny5 said:
WRONG. 16/15 inch MBP had been using SDRAM or DDR3,4 for a long time. Also, those LPDDR3,4 MBP do NOT support an external GPU at all like 16 inch MBP which had both GDDR5,6 and HBM2.

There are no discrete GPUs in macOS on M-series What the old Intel MBP did isn't particularly relevant.

sunny5 · Sep 18, 2021

AgentMcGeek said:
You may be right. From memory 13” versions have been using LPDDR for a while now. I mistakenly assumed it would be the same with the 15/16”.
I’m not sure about the eGPU though. I fail to see how that’s related to LPDDR. Apple adds its own additional TB controller anyway on the four ports 13.

deconstruct60 said:
There are no discrete GPUs in macOS on M-series What the old Intel MBP did isn't particularly relevant.

It is relevant. You still need a memory for both GPU and CPU. Apple is using LPDDR for both of them through unified memory but for higher end GPU still requires much faster memory such as HBM2. What you are saying is use LPDDR5 for GPU. Are there any high end GPU using LPDDR5 as a VRAM?

Do you really think LPDDR5 can replace HBM2? I think not. Even Nvidia's Grace CPU with A100 uses both LPDDR5 and HBM2 and then link them with NVlink. So far, nobody ever used LPDDR5 to replace both GDDR6 and HBM2 for high end and workstation GPU.

LPDDR5 is far from replacing HBM2 memory and there is no way to beat its bandwidth. How would you overcome its bandwidth physically?

Is the M1 GPU sharing LPDDR4 RAM going to be enough?

macrumors member

macrumors Core

macrumors G5

macrumors member

macrumors G5

macrumors Core

macrumors member

macrumors Core

macrumors 6502

macrumors member

macrumors member

macrumors Core

macrumors 6502

Suspended

macrumors Core

Suspended

macrumors 601

macrumors 6502

macrumors G5

Suspended

macrumors G5

macrumors G5

macrumors 6502

macrumors G5

Suspended

Our Staff