Is the M1 GPU sharing LPDDR4 RAM going to be enough?

leman · Oct 18, 2021

BenRacicot said:
Oh, hmm. Well now let's get into it! Question, does 64GB of shared memory play into this somehow?

Not really, it’s just more RAM.

diamond.g said:
Looks like the whole M1 Max can pull ~90 watts? Was I seeing those slides correctly?

I thought the GPU was like 50W, but I don’t remember

killawat · Oct 18, 2021

What a time to be a MacBook user. If only Steve could have seen this himself.

Homy · Oct 18, 2021

A very interesting thing is the number of execution units. It's huge compared to M1.

M1: 16 billion transistors, 128 execution units, 2.6 TFLOP, 200 GB/s memory bandwidth
M1 Pro: 33.7 billion transistors, 2048 execution units, 5.2 TFLOP, 200 GB/s memory bandwidth
M1 Max: 57 billion transistors, 4096 execution units, 10.4 TFLOP, 400 GB/s memory bandwidth

Apple's graphs compare the M1 Pro and Max to the Lenovo Legion 5 82JW0012US with an RTX 3050 Ti M 4 GB and the Razer Blade 15 Advanced RZ09-0409CE53 with an RTX 3080 M 16 GB.

In their graphs M1 Max performs better than 3080 M!!

BenRacicot · Oct 18, 2021

Are you all seeing this mention of LPDDR5?

Found this but not sure if it's correct:
"LPDDR5 offers a max data rate of 6400Mbps, while the LPDDR5X offers 8533Mbps." - gizbot.com

If so this was also unexpected. What can it do for the GPU perf?

Screen Shot 2021-10-18 at 3.20.58 PM.png

Malus120 · Oct 18, 2021

BenRacicot said:
Are you all seeing this mention of LPDDR5?

Found this but not sure if it's correct:
"LPDDR5 offers a max data rate of 6400Mbps, while the LPDDR5X offers 8533Mbps." - gizbot.com

If so this was also unexpected. What can it do for the GPU perf?

Apple has already stated memory bandwidth is 400GB/s (which is absolutely insane) on the M1 Max. The numbers you're quoting are talking about the max bandwidth of the LPDDR5 spec, not the speed of any given implementation.

Macintosh IIcx · Oct 18, 2021

Interestingly, we can compare to other unified memory architectures out there, namely PlayStation 5 (448GB/s) and Xbox Series X (10GB @ 560 GB/s, 6GB @ 336 GB/s).
I believe that the M1 Max will offer much lower latency since the Xbox and PS5 use GDDR6 memory, but still interesting. Color me impressed with Apple as well....

deconstruct60 · Oct 18, 2021

BenRacicot said:
Are you all seeing this mention of LPDDR5?

Found this but not sure if it's correct:
"LPDDR5 offers a max data rate of 6400Mbps, while the LPDDR5X offers 8533Mbps." - gizbot.com

If so this was also unexpected. What can it do for the GPU perf?

View attachment 1870021

Note the 256-bit interface. A standard LPDDR5 channel is 16 bits wide, so that is 16 memory channels (2x the M1's 8).

The M1 Max has a 512-bit interface, which is 32 memory channels.

They don't need the ultimate maximum LPDDR5 speed; they are drawing data at much slower clock rates over a much wider path. That is the only way to keep up with the large increase in execution units asking for different bits of data.

P.S. That said I think Apple isn't being transparent here. There is some concoction to their number.
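
For anyone who wants to check how the width and the quoted bandwidth relate, here is a rough sketch. It assumes the "Mbps" figure in the gizbot quote is a per-pin rate and that Apple's 200/400 GB/s are rounded peak numbers; the function names are made up for this post.

```python
# Peak theoretical DRAM bandwidth = bus width x per-pin data rate.
LPDDR5_MAX_MBPS_PER_PIN = 6400  # LPDDR5 maximum quoted earlier in the thread

def peak_bandwidth_gbs(bus_width_bits, mbps_per_pin):
    """Peak bandwidth in GB/s (decimal) for a given bus width and per-pin rate."""
    total_mbps = bus_width_bits * mbps_per_pin  # megabits per second across the bus
    return total_mbps / 8 / 1000                # bits -> bytes, then MB/s -> GB/s

def required_mbps_per_pin(bus_width_bits, target_gbs):
    """Per-pin rate needed to reach a target bandwidth on a given bus width."""
    return target_gbs * 1000 * 8 / bus_width_bits

# Bus widths and quoted bandwidths as given in the thread.
for name, width, quoted in [("M1 Pro", 256, 200), ("M1 Max", 512, 400)]:
    peak = peak_bandwidth_gbs(width, LPDDR5_MAX_MBPS_PER_PIN)
    need = required_mbps_per_pin(width, quoted)
    print(f"{name}: {width}-bit peak {peak:.1f} GB/s, "
          f"{quoted} GB/s needs {need:.0f} Mbps per pin")
```

Under these assumptions the 256-bit and 512-bit interfaces come out at 204.8 and 409.6 GB/s at the full 6400 Mbps per pin, and hitting exactly 200 or 400 GB/s would take about 6250 Mbps per pin.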

leman · Oct 18, 2021

Homy said:
A very interesting thing is the number of execution units. It's huge compared to M1.

M1: 16 billion transistors, 128 execution units, 2.6 TFLOP, 200 GB/s memory bandwidth

M1 has 1024 execution units. An Apple GPU core has 128 units (4x32-wide ALUs)
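
A quick sanity check of those counts, as a sketch only. The unit totals and TFLOP figures are the ones quoted in this thread; the "2 FLOPs per unit per clock" factor (one fused multiply-add) is my assumption, so treat the implied clock as a back-of-the-envelope number, not a spec.

```python
ALUS_PER_GPU_CORE = 4 * 32  # 4 x 32-wide ALUs per GPU core, as stated above

# Execution-unit totals and TFLOP figures as quoted in the thread
# (the M1 entry uses the corrected total of 1024).
chips = {
    "M1":     {"units": 1024, "tflops": 2.6},
    "M1 Pro": {"units": 2048, "tflops": 5.2},
    "M1 Max": {"units": 4096, "tflops": 10.4},
}

def gpu_cores(units):
    """GPU core count implied by a total execution-unit count."""
    return units // ALUS_PER_GPU_CORE

def implied_clock_ghz(units, tflops, flops_per_unit_per_clock=2):
    # Assumption: each unit retires one fused multiply-add (2 FLOPs) per clock.
    return tflops * 1e12 / (units * flops_per_unit_per_clock) / 1e9

for name, chip in chips.items():
    cores = gpu_cores(chip["units"])
    clock = implied_clock_ghz(chip["units"], chip["tflops"])
    print(f"{name}: {cores} GPU cores, ~{clock:.2f} GHz implied")
```

The same implied clock falls out for all three chips, which is consistent with the TFLOP numbers scaling purely with unit count.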

Homy said:
In their graphs M1 Max performs better than 3080 M!!

For Pro workloads, yes.

BenRacicot said:
Are you all seeing this mention of LPDDR5?

Yes, this is some very fast LPDDR5, which was unexpected to say the least. The 512-bit interface on the big chip is a shock. What does it mean for GPU performance? Well, there is enough bandwidth there to challenge mid-range desktop GPUs. Actually more than that given the large caches of these chips.

Boil · Oct 18, 2021

leman said:
Yes, this is some very fast LPDDR5, which was unexpected to say the least. The 512-bit interface on the big chip is a shock. What does it mean for GPU performance? Well, there is enough bandwidth there to challenge mid-range desktop GPUs. Actually more than that given the large caches of these chips.

I was not sure Apple would go with ~~DDR5~~ LPDDR5, but I was hoping, because ~~DDR5~~ LPDDR5 would be the right move for such an advanced SoC (giving everything the bandwidth it needs)...?

Excited to see the (rumored) Jade 2C & Jade 4C products next year...! ;^p

leman · Oct 18, 2021

Boil said:
I was not sure Apple would go with DDR5, but I was hoping, because DDR5 would be the right move for such an advanced SoC (giving everything the bandwidth it needs)...?

It’s not DDR5, it’s LPDDR5. Different tech.

Boil · Oct 18, 2021

You know what I meant...! ;^p

deconstruct60 · Oct 18, 2021

leman said:
Yes, this is some very fast LPDDR5, which was unexpected to say the least. The 512-bit interface on the big chip is a shock. What does it mean for GPU performance? Well, there is enough bandwidth there to challenge mid-range desktop GPUs. Actually more than that given the large caches of these chips.

On the M1, Apple was already treating LPDDR4 like LPDDR5 in terms of bus width and the higher number of memory channels.

8 channels at 16 bits. ---> 128.

quadrupled 8 GPU cores into 32 cores

4 * 128 ---> 512

It shouldn't be that shocking. Straightforward linear increase of the "non Pro" version width that matches the GPU core count increase.

P.S. There are even more semi-custom LPDDR5 RAM packages here. The M1 Pro's two packages do the width work of four packages from the M1, and the M1 Max's packages are an even better space/volume reduction.
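
The channel arithmetic above, spelled out as a small sketch. The numbers are the ones from this post (8 channels of 16 bits on the M1, 8 GPU cores going to 32); the variable names are only for illustration.

```python
CHANNEL_WIDTH_BITS = 16  # width of one memory channel, per the post above

def bus_width_bits(channels):
    """Total memory interface width for a given channel count."""
    return channels * CHANNEL_WIDTH_BITS

m1_gpu_cores, m1_channels = 8, 8
max_gpu_cores = 32

scale = max_gpu_cores // m1_gpu_cores    # GPU core count quadruples
max_channels = m1_channels * scale       # channels scale by the same factor
m1_width = bus_width_bits(m1_channels)
max_width = bus_width_bits(max_channels)

# Interface width per GPU core stays constant under this linear scaling.
print(m1_width, max_width, m1_width // m1_gpu_cores, max_width // max_gpu_cores)
```

So the width per GPU core is the same on both chips, which is the "straightforward linear increase" in question.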

leman · Oct 18, 2021

deconstruct60 said:
On the M1, Apple was already treating LPDDR4 like LPDDR5 in terms of bus width and the higher number of memory channels.

8 channels at 16 bits. ---> 128.

quadrupled 8 GPU cores into 32 cores

4 * 128 ---> 512

It shouldn't be that shocking. Straightforward linear increase of the "non Pro" version width that matches the GPU core count increase.

What I mean is that I expected 256-bit LPDDR5 RAM (in fact, I mentioned 200GB/s multiple times over the last few months). But I certainly didn’t expect a 512-bit variant. That’s a very pleasant surprise.

Homy · Oct 18, 2021

leman said:
M1 has 1024 execution units. An Apple GPU core has 128 units (4x32-wide ALUs)

I was referring to this image from last year. So that was per GPU core apparently, which I didn't know but now they showed a total number, hence my confusion.

Skärmavbild 2021-10-18 kl. 21.21.00.png

deconstruct60 · Oct 18, 2021

leman said:
What I mean is that I expected 256-bit LPDDR5 RAM (in fact, I mentioned 200GB/s multiple times over the last few months). But I certainly didn’t expect a 512-bit variant. That’s a very pleasant surprise.

Well, it might not be as pleasant for those with limited budgets, given the additional RAM packages. 512-bit also means you are required to buy more RAM packages, and Apple is charging a pretty hefty sum for that. More money for Apple... again, it shouldn't be too surprising. Going from the M1 Pro to the 'binned' M1 Max is about $900.

leman · Oct 18, 2021

deconstruct60 said:
Well, it might not be as pleasant for those with limited budgets, given the additional RAM packages. 512-bit also means you are required to buy more RAM packages, and Apple is charging a pretty hefty sum for that. More money for Apple... again, it shouldn't be too surprising. Going from the M1 Pro to the 'binned' M1 Max is about $900.

It’s a hefty price but IMO justified (both from the component perspective and the market value perspective). It’s a beast of a workstation machine. A similarly priced x86 workstation just doesn’t compare. And 32+GB on a laptop GPU is simply unprecedented.

BenRacicot · Oct 21, 2021

leman said:
It’s a hefty price but IMO justified (both from the component perspective and the market value perspective). It’s a beast of a workstation machine. A similarly priced x86 workstation just doesn’t compare. And 32+GB on a laptop GPU is simply unprecedented.

I have a question about this.

When gaming with 32 GPU cores, we have 8 incredible CPU cores and a neural engine sitting around with access to the same cache data, don’t we?

I wonder if those other systems will be able to be leveraged by developers for simultaneous computing?

leman · Oct 21, 2021

BenRacicot said:
When gaming with 32 GPU cores, we have 8 incredible CPU cores and a neural engine sitting around with access to the same cache data, don’t we?

Yes we do!

BenRacicot said:
I wonder if those other systems will be able to be leveraged by developers for simultaneous computing?

That's the point

The heterogeneous computing model of Apple Silicon opens up new possibilities that were not feasible with the traditional "isolated" model. We can now write software that utilizes the CPU and the GPU simultaneously to build some very interesting stuff. Now, I don't know what this stuff is going to be in the end, but I am excited.

P.S. I am currently prototyping my 2D game engine on Apple Silicon where I render the dynamic game world directly from the planar graph. No triangles, no nothing. It looks very promising so far

BenRacicot · Oct 21, 2021

leman said:
Yes we do!

That's the point. The heterogeneous computing model of Apple Silicon opens up new possibilities that were not feasible with the traditional "isolated" model. We can now write software that utilizes the CPU and the GPU simultaneously to build some very interesting stuff. Now, I don't know what this stuff is going to be in the end, but I am excited.

P.S. I am currently prototyping my 2D game engine on Apple Silicon where I render the dynamic game world directly from the planar graph. No triangles, no nothing. It looks very promising so far

This is incredible. More than what I had hoped for. Your project sounds awesome.

ChrisA · Oct 21, 2021

Chozes said:
Chances are Apple will use separate VRAM. Much like any dedicated gpu.

Absolutely not. They seem very committed to the big matrix switch for RAM and all those processing units.

The entire reason for VRAM is because you have low bandwidth to the rest of the system. Apple does not have that problem, so they don't need that solution. Shared memory means "zero copy". Data gets to the GPU with no need to move it over a PCIe bus.
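
To put a rough number on the copy cost that shared memory avoids, here is a sketch. The PCIe 4.0 x16 link (16 GT/s per lane, 128b/130b encoding) and the 1 GB asset size are my assumptions for illustration; nothing in the thread specifies them, and real transfers will be slower than this theoretical peak.

```python
def pcie_bandwidth_gbs(gt_per_s=16.0, lanes=16, encoding=128 / 130):
    """Theoretical one-direction PCIe bandwidth in GB/s.

    Defaults assume a PCIe 4.0 x16 link (hypothetical discrete-GPU setup).
    """
    return gt_per_s * encoding * lanes / 8  # gigabits -> gigabytes

def copy_time_ms(size_gb, link_gbs):
    """Time to move size_gb across a link at its peak rate, in milliseconds."""
    return size_gb / link_gbs * 1000

link = pcie_bandwidth_gbs()
asset_gb = 1.0                 # hypothetical asset size
frame_ms = 1000 / 60           # frame budget at 60 fps

print(f"PCIe 4.0 x16 peak: {link:.1f} GB/s")
print(f"Copying {asset_gb} GB takes about {copy_time_ms(asset_gb, link):.1f} ms "
      f"({copy_time_ms(asset_gb, link) / frame_ms:.1f} frames at 60 fps)")
print(f"400 GB/s unified memory is about {400 / link:.1f}x the link rate")
```

With unified memory that copy step simply does not happen, since the CPU and GPU read the same allocation.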

quarkysg · Oct 21, 2021

ChrisA said:
Absolutely not. They seem very committed to the big matrix switch for RAM and all those processing units.

The entire reason for VRAM is because you have low bandwidth to the rest of the system. Apple does not have that problem, so they don't need that solution. Shared memory means "zero copy". Data gets to the GPU with no need to move it over a PCIe bus.

I agree. Apple going to a 512-bit data bus with 400GB/s memory bandwidth in a notebook is really unexpected, at least for me anyway. I expected them to go wide but not this wide.

Going to the traditional 'VRAM' route would be a regression for Apple IMHO.

leman · Oct 21, 2021

ChrisA said:
Absolutely not. They seem very committed to the big matrix switch for RAM and all those processing units.

The entire reason for VRAM is because you have low bandwidth to the rest of the system. Apple does not have that problem, so they don't need that solution. Shared memory means "zero copy". Data gets to the GPU with no need to move it over a PCIe bus.

quarkysg said:
I agree. Apple going to a 512-bit data bus with 400GB/s memory bandwidth in a notebook is really unexpected, at least for me anyway. I expected them to go wide but not this wide.

Going to the traditional 'VRAM' route would be a regression for Apple IMHO.

The release of these new Mac chips puts the chance of dedicated VRAM at 0%. As you guys point out, Apple demonstrated its commitment to the unified model very convincingly (400GB/s system memory bandwidth is something nobody expected in a laptop), and for them going back to the inferior VRAM model would make no sense at all.

chumps · Oct 21, 2021

leman said:
The release of these new Mac chips puts the chance of dedicated VRAM at 0%. As you guys point out, Apple demonstrated its commitment to the unified model very convincingly (400GB/s system memory bandwidth is something nobody expected in a laptop), and for them going back to the inferior VRAM model would make no sense at all.

For the Mac Pro, going by the rumors, a 40 core AS Mac Pro would mean stitching together 4 M1 Max's right? Via some kind of fabric or whatever. Can you speculate on what the memory bandwidth would look like?

leman · Oct 21, 2021

chumps said:
For the Mac Pro, going by the rumors, a 40 core AS Mac Pro would mean stitching together 4 M1 Max's right? Via some kind of fabric or whatever. Can you speculate on what the memory bandwidth would look like?

You know, one can speculate in many different directions here, all of which will kind of make sense. If we assume that Apple will continue with their bandwidth scaling, they would want around 1.5TB/s of RAM bandwidth for a quad M1 Max. That would be a 2048-bit RAM interface, which starts getting a bit problematic (16 LPDDR5 modules would take a lot of space; they are quite large going by Apple's published pictures). So if they go this way they will probably use something more compact, likely HBM3 (what a coincidence that Hynix announced their HBM3 like yesterday, right?).
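
The quad-chip numbers in that paragraph can be reproduced with a short sketch. It assumes plain linear scaling from one M1 Max (512-bit, 400GB/s) and 128 bits per LPDDR5 module, which is what "16 modules for 2048 bits" implies; a real multi-die design may not scale this cleanly.

```python
M1_MAX_WIDTH_BITS = 512      # from the thread
M1_MAX_BANDWIDTH_GBS = 400   # Apple's quoted figure
BITS_PER_MODULE = 128        # assumption: implied by 16 modules on a 2048-bit bus

def scaled_config(chips):
    """Interface width, bandwidth and module count for N M1 Max dies, scaled linearly."""
    width = M1_MAX_WIDTH_BITS * chips
    return {
        "width_bits": width,
        "bandwidth_gbs": M1_MAX_BANDWIDTH_GBS * chips,
        "modules": width // BITS_PER_MODULE,
    }

quad = scaled_config(4)
print(quad)
```

For four dies this gives a 2048-bit interface, 1600 GB/s, and 16 modules, in line with the "around 1.5TB/s" ballpark.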

And to speculate some more, they might ship a modular Mac Pro where large SoCs plus fast unified memory are mounted on compute boards via an MPX-like interface, and then you additionally have some slower shared DDR5 RAM (say, 8 channels) for user-expandability.

AgentMcGeek · Oct 22, 2021

chumps said:
For the Mac Pro, going by the rumors, a 40 core AS Mac Pro would mean stitching together 4 M1 Max's right? Via some kind of fabric or whatever. Can you speculate on what the memory bandwidth would look like?

If we assume it’s going to be 4x M1 Max chips and that it scales linearly, that’s 400GB/s x4, or 1,600GB/s.
