
Argon_
Nov 18, 2020
The DDR5 standard, along with more memory channels, could raise APU graphics performance to levels that are currently impossible. An M3 chip with, say, LPDDR5-6400? The same standard would also benefit AMD's iGPU performance. One theory for why AMD is still shipping APUs with generations-old Vega cores is that DDR4 would bottleneck Navi cores and flatten the performance uplift. The same bandwidth limitations apply to ASi, even with unified memory.

Thoughts?
 
I would be very surprised if the new chips announced this fall are not using LPDDR5 or something comparable. DDR5 looks like a match made in heaven for Apple Silicon.
 
As for loads of memory channels, the M1 already has 8 channels, and it runs LPDDR4X at 4266 MT/s with a bandwidth of over 60GB/s. At JEDEC 2933 MT/s DDR4, the dual-channel setup of a Core i7-10700 (or any equivalent setup) gives you roughly 46GB/s. Apple has already shown itself to be aggressive with memory performance on its chips, though I don't know how much it can scale beyond this - I mean LPDDR5, sure, but going beyond 8 channels seems unlikely.
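These peak figures fall straight out of transfer rate times bus width; a quick back-of-envelope in Python (theoretical peaks, not measured throughput):

```python
# Peak memory bandwidth = transfer rate x bus width in bytes.
def bandwidth_gbps(mt_per_s: int, bus_bits: int) -> float:
    """Theoretical peak bandwidth in GB/s."""
    return mt_per_s * (bus_bits // 8) / 1000  # MB/s -> GB/s

# M1: 8 x 16-bit LPDDR4X channels = 128-bit bus at 4266 MT/s
print(bandwidth_gbps(4266, 128))  # 68.256 -> "over 60GB/s"

# Core i7-10700: dual-channel (2 x 64-bit) DDR4 at JEDEC 2933 MT/s
print(bandwidth_gbps(2933, 128))  # 46.928 -> "roughly 46GB/s"
```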
 
As for loads of memory channels, the M1 already has 8 channels, and it runs LPDDR4X at 4266 MT/s with a bandwidth of over 60GB/s. At JEDEC 2933 MT/s DDR4, the dual-channel setup of a Core i7-10700 (or any equivalent setup) gives you roughly 46GB/s. Apple has already shown itself to be aggressive with memory performance on its chips, though I don't know how much it can scale beyond this - I mean LPDDR5, sure, but going beyond 8 channels seems unlikely.

It has 8 channels, but they are much narrower than DDR4 channels. The total RAM width is still 128 bits, just like regular dual-channel DDR4. The multiple narrower channels are "better" because they are more flexible when you have multiple parallel memory requests (as is typical with modern multicore processors and especially GPUs). Note that DDR5 reduces the channel width to 32 bits for this very reason, so today's dual-channel DDR4 will become tomorrow's quad-channel DDR5 at the same bus width (I'm sure marketing will jump on it).

If Apple wants to scale their performance up, they need higher bandwidth, which means going beyond a 128-bit memory bus. A 256-bit DDR5/LPDDR5 interface will yield around 200GB/s with plenty of memory-level parallelism. Combine this with large caches and it's enough to challenge GPUs with 500GB/s or higher VRAM bandwidth.
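For concreteness, the 256-bit figure works out like this (assuming LPDDR5-6400; peak theoretical numbers):

```python
# 256-bit bus = 32 bytes per transfer, at 6400 MT/s.
bus_bytes = 256 // 8
peak_gbps = 6400 * bus_bytes / 1000
print(peak_gbps)  # 204.8 GB/s -> "around 200GB/s"
```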
 
If Apple wants to scale their performance up, they need higher bandwidth, which means going beyond a 128-bit memory bus. A 256-bit DDR5/LPDDR5 interface will yield around 200GB/s with plenty of memory-level parallelism. Combine this with large caches and it's enough to challenge GPUs with 500GB/s or higher VRAM bandwidth.
How'd you get to that 200GB/s figure? Seems high. I mean I agree with everything else you say, but if we double the M1 we get ≈120GB/s. I'm not convinced we get a 256-bit wide system but let's go with that.

My 2011 MacBook Pro with Sandy Bridge's memory controller and 1333MT/s DDR3 offered, IIRC, about 21GB/s. My 10700K with DDR4 2666MT/s is around 42-ish GB/s. Those are 9 years apart. If we look at the Xeons in the Mac Pro with 6 channels of DDR4, we're sitting around 140GB/s. I feel like going above 200GB/s seems like a sudden jump, but I haven't sat down and done the maths here, haha. But that's in the bandwidth territory of GDDR6 GPUs. The Radeon RX 5500 had around 224GB/s of memory bandwidth, I believe.
 
How'd you get to that 200GB/s figure? Seems high. I mean I agree with everything else you say, but if we double the M1 we get ≈120GB/s. I'm not convinced we get a 256-bit wide system but let's go with that.

My 2011 MacBook Pro with Sandy Bridge's memory controller and 1333MT/s DDR3 offered, IIRC, about 21GB/s. My 10700K with DDR4 2666MT/s is around 42-ish GB/s. Those are 9 years apart. If we look at the Xeons in the Mac Pro with 6 channels of DDR4, we're sitting around 140GB/s. I feel like going above 200GB/s seems like a sudden jump, but I haven't sat down and done the maths here, haha. But that's in the bandwidth territory of GDDR6 GPUs. The Radeon RX 5500 had around 224GB/s of memory bandwidth, I believe.
I just calculated 204.8 GB/s for eight-channel, 256-bit DDR5-6400, so leman's math checks out. Certainly enough to keep a fast GPU well fed.
 
How'd you get to that 200GB/s figure? Seems high. I mean I agree with everything else you say, but if we double the M1 we get ≈120GB/s. I'm not convinced we get a 256-bit wide system but let's go with that.

DDR5-6400 offers 51.2GB/s per 64-bit module. Hynix mentioned about a year ago that they plan to ship DDR5-8400 (67GB/s per module). The recently ratified LPDDR5X supports similar bandwidth. I think it's unlikely that 8400 or higher DDR5 will be in production by the end of the year, but LPDDR5-6400 should be realistic.
 
Overclocking DDR5-8400 by the same multiple as the current 4266 modules in the M1 (4266/3200 ≈ 1.333) yields 11198.25, which I'll round to 11200.

Running my earlier numbers with 11200 as the speed outputs 358.4GB/s!
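Spelling out that arithmetic (these are my speculative numbers, treating LPDDR4X-4266 as DDR4-3200 scaled up, not anything JEDEC defines):

```python
# "Overclock ratio" of the M1's LPDDR4X-4266 relative to DDR4-3200.
ratio = 4266 / 3200                   # ~1.333
speculative_mt = 8400 * ratio         # 11198.25, rounded to 11200
print(speculative_mt)

# On a 256-bit bus (32 bytes per transfer):
print(11200 * 32 / 1000)              # 358.4 GB/s
```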

Imagine, then, a 32-core GPU built into an M1X. If we assume linear scaling, the result is 82324 points in Geekbench 5, beating the Radeon Pro W5700X by a respectable margin - but this is theoretical, and the M1's RAM config might hobble it to far below that.

Now, two generations ahead, with a conservative ten percent increase over those generations, overclocked DDR5, and 64 cores, the result is 181112 points. For context, the Radeon Pro W6900X scores 168783.

358.4GB/s could feed such a theoretical GPU.
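The scaling I used, written out (all hypothetical: an implied ~20581-point Geekbench 5 baseline for the M1's 8-core GPU, perfectly linear core scaling, and a flat ten percent generational uplift):

```python
# Implied per-8-core-GPU Geekbench 5 baseline, back-solved from 82324 / 4.
m1_8core = 82324 / 4                    # 20581.0
m1x_32core = m1_8core * 4               # linear scaling to 32 cores
future_64core = m1_8core * 8 * 1.10     # 64 cores, +10% over two generations
print(m1x_32core, int(future_64core))   # 82324.0 181112
```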
 
Overclocking DDR5-8400 by the same multiple as the current 4266 modules in the M1 (4266/3200 ≈ 1.333) yields 11198.25, which I'll round to 11200.

DDR5-8400 is already "overclocked". Besides, Apple must deliver good power efficiency (they do it by sourcing custom-built RAM modules that consume less power than even typical mobile RAM), and they need a supplier that can provide a reasonable volume of these chips. The super-fast versions of DDR5 are still miles away, I'm afraid.
 
DDR5-8400 is already "overclocked". Besides, Apple must deliver good power efficiency (they do it by sourcing custom-built RAM modules that consume less power than even typical mobile RAM), and they need a supplier that can provide a reasonable volume of these chips. The super-fast versions of DDR5 are still miles away, I'm afraid.

I read that 8400 was the highest listed speed of DDR5 from the SK Hynix specs, much like 3200 is the highest listed speed of the DDR4 spec, yet many manufacturers sell RAM that can run faster.

[Attached image: SK_hynix_DDR5_Specifications.png]
 
I read that 8400 was the highest listed speed of DDR5 from the SK Hynix specs, much like 3200 is the highest listed speed of the DDR4 spec, yet many manufacturers sell RAM that can run faster.

If I am not mistaken, the official fastest DDR5 is DDR5-6400. I don't think Apple has ever used "overclocked" RAM. In the end, it's about energy efficiency and volume. DDR5-8400 is certainly great, but it's of little use to Apple if it is only produced in super low volume (like current "gaming" RAM) and uses extreme amounts of power.
 
A single stack of HBM3 would give a bandwidth of 665GB/s. Totally overkill unless we are talking about a bigger chip than what goes in the low-end products.
 
So now that we've seen Apple's claims for the M1 Pro and M1 Max, correct me if I'm wrong, but does that mean the Pro is running LPDDR5 in 4 channels and the M1 Max in 8 channels, since LPDDR5-6400 is 51.2 GB/s per 64 bits, and 4 x 51.2 ≈ 200 GB/s and 8 x 51.2 ≈ 400 GB/s?
 
So now that we've seen Apple's claims for the M1 Pro and M1 Max, correct me if I'm wrong, but does that mean the Pro is running LPDDR5 in 4 channels and the M1 Max in 8 channels, since LPDDR5-6400 is 51.2 GB/s per 64 bits, and 4 x 51.2 ≈ 200 GB/s and 8 x 51.2 ≈ 400 GB/s?
DDR5 and LPDDR5 are different technologies and are not even similar except in name.

Apple uses 16-bit wide LPDDR channels, so that would be 16 channels for the M1 Pro and 32 channels for the M1 Max, running at 6400 MT/s.
 
@Gnattu Thanks, I think I have a better handle on it now; I was trying to wrap my brain around this.

Alright, so we have 32 channels x 16 bits = 512-bit wide LPDDR5.
Each LPDDR5 pin can do 6.4 Gigabits per second (0.8 GB/s), so we end up with 512 x 0.8 GB/s to land at the theoretical 409.6 GB/s, which matches the 408 GB/s Anandtech listed for the M1 Max - and the Pro being 256 x 0.8 GB/s to land at 204.8 GB/s?

This seems to check out when converted back to MT/s, as 409.6 GB/s across 512 bits = 6400 MT/s.
(Formula used: 1 Megatransfer per second on a 64-bit bus = 0.064 Gigabits per second.)
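The per-pin math in one place, for anyone following along (peak theoretical figures):

```python
# LPDDR5-6400: 6400 MT/s at 1 bit per pin = 6.4 Gbit/s = 0.8 GB/s per pin.
gb_per_pin = 6400 / 8 / 1000            # 0.8
m1_max = 512 * gb_per_pin               # 32 channels x 16 bits = 512 pins
m1_pro = 256 * gb_per_pin               # 16 channels x 16 bits = 256 pins
print(m1_max, m1_pro)                   # 409.6 204.8
```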
 
@Gnattu That wasn't any sort of jab at Anandtech on my part. I figured I was getting close enough in the unit conversions to understand the basic idea behind the 408-ish and 204-ish GB/s figures; I just wanted to understand the memory breakdown a bit more for the M1 Pro and Max. No need to drop the Dr. title on me - there's a reason I'm using Anandtech as my point of reference.
 
@Gnattu That wasn't any sort of jab at Anandtech on my part. I figured I was getting close enough in the unit conversions to understand the basic idea behind the 408-ish and 204-ish GB/s figures; I just wanted to understand the memory breakdown a bit more for the M1 Pro and Max. No need to drop the Dr. title on me - there's a reason I'm using Anandtech as my point of reference.
Lol, you're all good. The difference looks to me like a truncation error from first calculating the M1 Pro's bandwidth and not wanting decimals. Journalists.
 
I would be very surprised if the new chips announced this fall are not using LPDDR5 or something comparable. DDR5 looks like a match made in heaven for Apple Silicon.
You nailed that one. I had doubts, since I think they could have gotten pretty far with LPDDR4X. Then they announced 400GB/s!

I can’t wait to see what clever software people come up with. I know I’m going to be stuck in profiling land for the foreseeable future. There are so many things to take advantage of with heterogeneous computing!

Might have to revisit statically typed functional languages. I wonder what Simon Peyton Jones thinks about all of this.

Edit: For that matter, I wonder what Chris Lattner thinks. He might be a good reason for Apple to acquire SiFive.

Edit: My mentioning Chris Lattner and SPJ wasn't hyperbole. Check out this recent paper. If this had been around a decade ago, SPJ would have had a field day; check out the Doctor (of type theory) in recent action. And MLIR would serve Apple well WRT heterogeneous computing. Exciting times ahead!
 
You nailed that one. I had doubts, since I think they could have gotten pretty far with LPDDR4X. Then they announced 400GB/s!

I can’t wait to see what clever software people come up with. I know I’m going to be stuck in profiling land for the foreseeable future. There are so many things to take advantage of with heterogeneous computing!

Might have to revisit statically typed functional languages. I wonder what Simon Peyton Jones thinks about all of this.

Edit: For that matter, I wonder what Chris Lattner thinks. He might be a good reason for Apple to acquire SiFive.

And this is only the beginning. The first generation of ASi. Think of what five years of iteration and another two die shrinks could bring.
 
Based on what I was reading, there's already LPDDR5X, which would mean up to 546 GB/s. That said, it looks like, from the Anandtech article, that the M1 Max really isn't pushing the bandwidth of the memory too hard as is, so there seems to be a clear vector forward with LPDDR5X that could happen in a year or two, as Samsung and Micron are already making it from what I read. It's one thing that GDDR6 has bandwidth above that, but the latency of LPDDR5 is lower.
 
Maybe Apple reserves LPDDR5X for the Mn Max Duo / Quadro SiPs...?
I doubt it; based on the rumors, they'll still be using the Jade-C die - we haven't seen the Jade 2C or Jade 4C yet.

This most certainly will come down the line, as Samsung and Micron just announced production starting this past summer. It's going to take a while to ramp up, especially with supply chains being what they are. Apple launching the M1 Pro/Max was already delayed as is. Based on the Anandtech review, there's a lot of bandwidth headroom still on the M1 Max, quite a bit in fact.
 
I doubt it; based on the rumors, they'll still be using the Jade-C die - we haven't seen the Jade 2C or Jade 4C yet.

This most certainly will come down the line, as Samsung and Micron just announced production starting this past summer. It's going to take a while to ramp up, especially with supply chains being what they are. Apple launching the M1 Pro/Max was already delayed as is. Based on the Anandtech review, there's a lot of bandwidth headroom still on the M1 Max, quite a bit in fact.

Jade 2C is two Jade-C on a package (SiP), Jade 4C is four Jade-C on a package (SiP); the RAM is not part of the die, so using LPDDR5X rather than LPDDR5 shouldn't be too much hassle...?
 