A very interesting thing is the number of execution units. It's huge compared to M1.

M1: 16 billion transistors, 128 execution units, 2.6 TFLOPS, 200 GB/s memory bandwidth
M1 Pro: 33.7 billion transistors, 2048 execution units, 5.2 TFLOPS, 200 GB/s memory bandwidth
M1 Max: 57 billion transistors, 4096 execution units, 10.4 TFLOPS, 400 GB/s memory bandwidth

Apple's graphs compare the M1 Pro and Max to a Lenovo Legion 5 82JW0012US with an RTX 3050 Ti M 4 GB and a Razer Blade 15 Advanced RZ09-0409CE53 with an RTX 3080 M 16 GB.

In their graphs M1 Max performs better than 3080 M!!
 
Are you all seeing this mention of LPDDR5?

Found this but not sure if it's correct:
"LPDDR5 offers a max data rate of 6400Mbps, while the LPDDR5X offers 8533Mbps." - gizbot.com

If so this was also unexpected. What can it do for the GPU perf?

[Attached screenshot: Screen Shot 2021-10-18 at 3.20.58 PM.png]
 
Are you all seeing this mention of LPDDR5?

Found this but not sure if it's correct:
"LPDDR5 offers a max data rate of 6400Mbps, while the LPDDR5X offers 8533Mbps." - gizbot.com

If so this was also unexpected. What can it do for the GPU perf?

Apple has already stated memory bandwidth is 400GB/s (which is absolutely insane) on the M1 Max. The numbers you're quoting are the maximum per-pin data rates of the LPDDR5 spec, not the speed of any given implementation.
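If you want to sanity-check how the per-pin spec rate relates to Apple's figure, here's a rough back-of-the-envelope sketch. It assumes LPDDR5-6400 (the 6400 Mbps figure quoted above) and the 256-bit / 512-bit interface widths discussed later in the thread; those are my assumptions, not official Apple numbers.

```swift
// Peak bandwidth ≈ per-pin data rate × bus width / 8 (bits → bytes).
// Assumes LPDDR5-6400 and the interface widths discussed in this thread.
let dataRateMbps = 6400.0
for (chip, busBits) in [("M1 Pro (256-bit)", 256.0), ("M1 Max (512-bit)", 512.0)] {
    let gbPerSec = dataRateMbps * busBits / 8 / 1000
    print("\(chip): ~\(Int(gbPerSec)) GB/s")
}
// M1 Pro (256-bit): ~204 GB/s
// M1 Max (512-bit): ~409 GB/s
```

6400 Mbit/s per pin across a 512-bit bus works out to roughly 410 GB/s, which lines up with the 400GB/s Apple quotes.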
 
Interestingly, we can compare to other unified memory architectures out there, namely PlayStation 5 (448GB/s) and Xbox Series X (10GB @ 560 GB/s, 6GB @ 336 GB/s).
I believe that the M1 Max will offer much lower latency since the Xbox and PS5 use GDDR6 memory, but still interesting. Color me impressed with Apple as well....
 
Are you all seeing this mention of LPDDR5?

Found this but not sure if it's correct:
"LPDDR5 offers a max data rate of 6400Mbps, while the LPDDR5X offers 8533Mbps." - gizbot.com

If so this was also unexpected. What can it do for the GPU perf?



NOTE the 256-bit interface. A standard LPDDR5 channel is 16 bits wide, so that's 16 memory channels (2x the M1's 8).

The M1 Max has a 512-bit interface, which is 32 memory channels.

They don't need the absolute fastest LPDDR5; they're pulling data at slower clock rates over a much wider path. That's the only way to keep up with the large increase in execution units all asking for different bits of data.
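To make the channel math explicit, here's a tiny sketch using the 16-bit LPDDR5 channel width mentioned above (the bus widths are the ones being discussed in this thread):

```swift
// A standard LPDDR5 channel is 16 bits wide, so channels = bus width / 16.
let channelWidthBits = 16
for (chip, busBits) in [("M1", 128), ("M1 Pro", 256), ("M1 Max", 512)] {
    print("\(chip): \(busBits)-bit bus = \(busBits / channelWidthBits) channels")
}
// M1: 128-bit bus = 8 channels
// M1 Pro: 256-bit bus = 16 channels
// M1 Max: 512-bit bus = 32 channels
```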


P.S. That said, I think Apple isn't being fully transparent here. There is some concoction to their numbers.
 
A very interesting thing is the number of execution units. It's huge compared to M1.

M1: 16 billion transistors, 128 execution units, 2.6 TFLOPS, 200 GB/s memory bandwidth

M1 has 1024 execution units. An Apple GPU core has 128 units (4x32-wide ALUs)
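To connect the core counts back to the TFLOPS numbers in the first post, here's a rough sketch. Counting 2 FLOPs per ALU per clock (FP32 FMA) is standard GPU accounting, but the ~1.27 GHz clock is just my guess, picked so the figures line up; Apple hasn't published the GPU clock.

```swift
// 128 ALUs per Apple GPU core (4 × 32-wide); totals scale with core count.
// The clock is an assumption chosen to reproduce the quoted TFLOPS figures.
let alusPerCore = 4 * 32
let assumedClockGHz = 1.27
for (chip, cores) in [("M1", 8), ("M1 Pro", 16), ("M1 Max", 32)] {
    let alus = cores * alusPerCore
    let tflops = Double(alus) * 2 * assumedClockGHz / 1000   // FMA = 2 FLOPs
    print("\(chip): \(alus) ALUs, ~\((tflops * 10).rounded() / 10) TFLOPS")
}
// M1: 1024 ALUs, ~2.6 TFLOPS
// M1 Pro: 2048 ALUs, ~5.2 TFLOPS
// M1 Max: 4096 ALUs, ~10.4 TFLOPS
```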


In their graphs M1 Max performs better than 3080 M!!

For Pro workloads, yes.

Are you all seeing this mention of LPDDR5?

Yes, this is some very fast LPDDR5, which was unexpected to say the least. The 512-bit interface on the big chip is a shock. What does it mean for GPU performance? Well, there is enough bandwidth there to challenge mid-range desktop GPUs. Actually more than that given the large caches of these chips.
 
Yes, this is some very fast LPDDR5, which was unexpected to say the least. The 512-bit interface on the big chip is a shock. What does it mean for GPU performance? Well, there is enough bandwidth there to challenge mid-range desktop GPUs. Actually more than that given the large caches of these chips.
I was not sure Apple would go with DDR5, but I was hoping, because DDR5 would be the right move for such an advanced SoC (giving everything the bandwidth it needs)...?

Excited to see the (rumored) Jade 2C & Jade 4C products next year...! ;^p
 
I was not sure Apple would go with DDR5, but I was hoping, because DDR5 would be the right move for such an advanced SoC (giving everything the bandwidth it needs)...?

It’s not DDR5, it’s LPDDR5. Different tech.
 
Yes, this is some very fast LPDDR5, which was unexpected to say the least. The 512-bit interface on the big chip is a shock. What does it mean for GPU performance? Well, there is enough bandwidth there to challenge mid-range desktop GPUs. Actually more than that given the large caches of these chips.

On the M1, Apple was already treating LPDDR4 like LPDDR5 in terms of bus width and number of memory channels.

8 channels × 16 bits ---> 128-bit.

They quadrupled the 8 GPU cores into 32 cores:

4 × 128 ---> 512-bit.

It shouldn't be that shocking. It's a straightforward linear increase over the "non-Pro" version's width that matches the GPU core count increase.

P.S. There are even more semi-custom LPDDR5 RAM packages here. The M1 Pro's two packages are doing the width work of four M1-style packages, and the M1 Max's packages are an even better space/volume reduction.
 
On the M1, Apple was already treating LPDDR4 like LPDDR5 in terms of bus width and number of memory channels.

8 channels × 16 bits ---> 128-bit.

They quadrupled the 8 GPU cores into 32 cores:

4 × 128 ---> 512-bit.

It shouldn't be that shocking. It's a straightforward linear increase over the "non-Pro" version's width that matches the GPU core count increase.

What I mean is that I expected 256-bit LPDDR5 RAM (in fact, I was mentioning 200GB/s multiple times in the last months). But I certainly didn't expect a 512-bit variant. That's a very pleasant surprise.
 
M1 has 1024 execution units. An Apple GPU core has 128 units (4x32-wide ALUs)
I was referring to this image from last year. So that figure was per GPU core, apparently, which I didn't know, but now they showed a total number, hence my confusion.

[Attached screenshot: Skärmavbild 2021-10-18 kl. 21.21.00.png]
 
What I mean is that I expected 256-bit LPDDR5 RAM (in fact, I was mentioning 200GB/s multiple times in the last months). But I certainly didn't expect a 512-bit variant. That's a very pleasant surprise.

Well, it might not be as pleasant for those with limited budgets, given the additional RAM packages. 512-bit also means being required to buy more RAM packages, and Apple is charging a pretty hefty sum for that. More money for Apple... again, it shouldn't be too surprising. Going from the M1 Pro to a 'binned' M1 Max is about $900.
 
Well, it might not be as pleasant for those with limited budgets, given the additional RAM packages. 512-bit also means being required to buy more RAM packages, and Apple is charging a pretty hefty sum for that. More money for Apple... again, it shouldn't be too surprising. Going from the M1 Pro to a 'binned' M1 Max is about $900.

It’s a hefty price but IMO justified (both from the component perspective and the market value perspective). It’s a beast of a workstation machine. A similarly priced x86 workstation just doesn’t compare. And 32+GB on a laptop GPU is simply unprecedented.
 
It’s a hefty price but IMO justified (both from the component perspective and the market value perspective). It’s a beast of a workstation machine. A similarly priced x86 workstation just doesn’t compare. And 32+GB on a laptop GPU is simply unprecedented.
I have a question about this.

When gaming with the 32 GPU cores, we have 8 incredible CPU cores and a Neural Engine sitting around with access to the same cache data, don't we?

I wonder if those other systems will be able to be leveraged by developers for simultaneous computing?
 
When gaming with the 32 GPU cores, we have 8 incredible CPU cores and a Neural Engine sitting around with access to the same cache data, don't we?

Yes we do!

I wonder if those other systems will be able to be leveraged by developers for simultaneous computing?

That's the point :) The heterogeneous computing model of Apple Silicon opens up new possibilities that were not feasible with the traditional "isolated" model. We can now write software that utilizes the CPU and the GPU simultaneously to build some very interesting stuff. Now, I don't know what this stuff is going to be in the end, but I am excited :)

P.S. I am currently prototyping my 2D game engine on Apple Silicon where I render the dynamic game world directly from the planar graph. No triangles, no nothing. It looks very promising so far :)
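For anyone curious what "CPU and GPU working on the same data" can look like in practice, here's a rough Metal sketch. The "updateParticles" kernel name and the buffer contents are made up for illustration (this isn't from the engine mentioned above); the point is just that both processors touch one shared allocation with no copies.

```swift
import Metal

// One unified-memory buffer, visible to both CPU and GPU (no staging copies).
let device = MTLCreateSystemDefaultDevice()!
let queue = device.makeCommandQueue()!
let count = 1_000_000
let buffer = device.makeBuffer(length: count * MemoryLayout<Float>.stride,
                               options: .storageModeShared)!

// CPU writes the data in place...
let ptr = buffer.contents().bindMemory(to: Float.self, capacity: count)
for i in 0..<count { ptr[i] = Float(i) }

// ...then the GPU crunches the same memory, while CPU cores stay free for
// game logic, audio, ML on the Neural Engine, and so on.
// ("updateParticles" is a hypothetical compute kernel in the app's library.)
let library = device.makeDefaultLibrary()!
let pipeline = try! device.makeComputePipelineState(
    function: library.makeFunction(name: "updateParticles")!)

let cmd = queue.makeCommandBuffer()!
let enc = cmd.makeComputeCommandEncoder()!
enc.setComputePipelineState(pipeline)
enc.setBuffer(buffer, offset: 0, index: 0)
enc.dispatchThreads(MTLSize(width: count, height: 1, depth: 1),
                    threadsPerThreadgroup: MTLSize(width: 256, height: 1, depth: 1))
enc.endEncoding()
cmd.commit()
cmd.waitUntilCompleted()

// The CPU can now read the GPU's results straight from `ptr`.
```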
 
Yes we do!



That's the point :) The heterogeneous computing model of Apple Silicon opens up new possibilities that were not feasible with the traditional "isolated" model. We can now write software that utilizes the CPU and the GPU simultaneously to build some very interesting stuff. Now, I don't know what this stuff is going to be in the end, but I am excited :)

P.S. I am currently prototyping my 2D game engine on Apple Silicon where I render the dynamic game world directly from the planar graph. No triangles, no nothing. It looks very promising so far :)
This is incredible. More than what I had hoped for. Your project sounds awesome.
 
Chances are Apple will use separate VRAM, much like any dedicated GPU.
Absolutely not. They seem very committed to the big matrix switch for RAM and all those processing units.

The entire reason for VRAM is because you have low bandwidth to the rest of the system. Apple does not have that problem, so they don't need that solution. Shared memory means "zero copy". Data gets to the GPU with no need to move it over a PCIe bus.
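To illustrate the "zero copy" point, here's a minimal sketch of handing the GPU memory the CPU already owns, without allocating or copying anything to a "VRAM" side. It assumes page-aligned memory and a page-multiple length (both required by Metal for this call); the 4 MB size is just an example.

```swift
import Metal
import Foundation

let device = MTLCreateSystemDefaultDevice()!
let length = 4 * 1024 * 1024                      // example size, page multiple

// Page-aligned CPU allocation the app might already be using for its data.
var raw: UnsafeMutableRawPointer? = nil
posix_memalign(&raw, Int(getpagesize()), length)

// Wrap it in a Metal buffer with zero copies: the GPU reads the same physical
// memory over the unified-memory fabric. No PCIe transfer, no staging buffer.
let buffer = device.makeBuffer(bytesNoCopy: raw!,
                               length: length,
                               options: .storageModeShared,
                               deallocator: { ptr, _ in free(ptr) })!
```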
 
Absolutely not. They seem very committed to the big matrix switch for RAM and all those processing units.

The entire reason for VRAM is because you have low bandwidth to the rest of the system. Apple does not have that problem, so they don't need that solution. Shared memory means "zero copy". Data gets to the GPU with no need to move it over a PCIe bus.
I agree. Apple going to a 512-bit data bus with 400GB/s memory bandwidth in a notebook is really unexpected, at least for me anyway. I expected them to go wide but not this wide.

Going to the traditional 'VRAM' route would be a regression for Apple IMHO.
 
Absolutely not. They seem very committed to the big matrix switch for RAM and all those processing units.

The entire reason for VRAM is because you have low bandwidth to the rest of the system. Apple does not have that problem, so they don't need that solution. Shared memory means "zero copy". Data gets to the GPU with no need to move it over a PCIe bus.

I agree. Apple going to a 512-bit data bus with 400GB/s memory bandwidth in a notebook is really unexpected, at least for me anyway. I expected them to go wide but not this wide.

Going to the traditional 'VRAM' route would be a regression for Apple IMHO.

The release of these new Mac chips puts the chance of dedicated VRAM at 0%. As you guys point out, Apple demonstrated its commitment to the unified model very convincingly (400GB/s system memory bandwidth is something nobody expected in a laptop), and going back to the inferior VRAM model would make no sense at all for them.
 
The release of these new Mac chips puts the chance of dedicated VRAM at 0%. As you guys point out, Apple demonstrated its commitment to the unified model very convincingly (400GB/s system memory bandwidth is something nobody expected in a laptop), and going back to the inferior VRAM model would make no sense at all for them.
For the Mac Pro, going by the rumors, a 40-core AS Mac Pro would mean stitching together four M1 Maxes, right? Via some kind of fabric or whatever. Can you speculate on what the memory bandwidth would look like?
 
For the Mac Pro, going by the rumors, a 40-core AS Mac Pro would mean stitching together four M1 Maxes, right? Via some kind of fabric or whatever. Can you speculate on what the memory bandwidth would look like?

You know, one can speculate in many different directions here, all of which will kind of make sense. If we assume that Apple follows through with their bandwidth scaling, they would want around 1.5TB/s of RAM bandwidth for a quad M1 Max. That would be a 2048-bit RAM interface, which starts getting a bit problematic (16 LPDDR5 modules would take a lot of space; they are quite large going by Apple's published pictures). So if they go this way they will probably use something more compact, likely HBM3 (what a coincidence that Hynix announced their HBM3 like yesterday, right?).

And to speculate some more, they might ship a modular Mac Pro where large SoCs + fast unified memory are mounted on compute boards via an MPX-like interface, and then you additionally have some slower shared DDR5 RAM (say, 8 channels) for user-expandability.
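To put rough numbers on that, here's the same back-of-the-envelope math as earlier, applied to these hypothetical wider interfaces (still assuming LPDDR5-6400 per-pin rates; pure speculation):

```swift
// Hypothetical Mac Pro interface widths, assuming LPDDR5-6400.
let pinRateMbps = 6400.0
for busBits in [1024.0, 2048.0] {
    let tbPerSec = pinRateMbps * busBits / 8 / 1_000_000
    print("\(Int(busBits))-bit: ~\((tbPerSec * 100).rounded() / 100) TB/s")
}
// 1024-bit: ~0.82 TB/s
// 2048-bit: ~1.64 TB/s  (hence the 2048-bit figure for a ~1.5TB/s target)
```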
 
For the Mac Pro, going by the rumors, a 40-core AS Mac Pro would mean stitching together four M1 Maxes, right? Via some kind of fabric or whatever. Can you speculate on what the memory bandwidth would look like?
If we assume it’s going to be 4x M1 Max chips and that it scales linearly, that’s 400GB/s x4, or 1,600GB/s.
 