
altaic

Suspended
Jan 26, 2004
712
484
From the die shots shown for the M1 Pro & Max, it appears that the memory controllers are split between the 2 P/E CPU clusters. For the M1 Max, it looks like the 400GB/s of bandwidth is split between the two CPU clusters. So each cluster could potentially get 200GB/s of bandwidth independently, as long as both clusters' SLCs do not need to sync their caches.
By "memory controllers" do you mean the LPDDR5 PHYs? How can you tell how the how they're split from the die shots?
 

quarkysg

macrumors 65816
Oct 12, 2019
1,247
841
By "memory controllers" do you mean the LPDDR5 PHYs? How can you tell how the how they're split from the die shots?
I'm going off of what AnandTech showed here:


I'm just speculating about how it may be designed, since the memory controllers are placed right next to the SLC blocks. The first stop for memory fetches is the SLC.

Only Apple will know how they are actually wired. It makes sense to me to split it that way tho.
 

cmaier

Suspended
Jul 25, 2007
25,405
33,474
California
I'm going off of what AnandTech showed here:


I'm just speculating about how it may be designed, since the memory controllers are placed right next to the SLC blocks. The first stop for memory fetches is the SLC.

Only Apple will know how they are actually wired. It makes sense to me to split it that way tho.

The channels between the SLCs and the memory controllers appear to me to be plenty wide enough to allow either SLC block on a side to hit either memory controller on that side. Also appears to be room down the middle to allow left-to-right to be connected, but without seeing the metal I can’t tell what’s actually there.
 

quarkysg

macrumors 65816
Oct 12, 2019
1,247
841
The channels between the SLCs and the memory controllers appear to me to be plenty wide enough to allow either SLC block on a side to hit either memory controller on that side. Also appears to be room down the middle to allow left-to-right to be connected, but without seeing the metal I can’t tell what’s actually there.
I do wonder how Apple wired up the SLC with the memory controllers tho.

Does the SLC on the left only cache memory requests to the left memory banks, which are then fed to the L2/L1 of the IP cores?

Or do the SLCs on both sides sync their contents?

Wonder if this can be tested with software?
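One crude way to poke at it from software (a sketch, not a rigorous tool) would be a multithreaded streaming-read test: give each thread its own buffer much larger than the SLC and see how aggregate read bandwidth scales as more cores join in. Something like this plain C/pthreads sketch; the thread count and buffer size are just illustrative, and macOS won't let you pin threads to a particular cluster, so at best it only hints at the topology:

// bandwidth.c - rough aggregate read-bandwidth probe (sketch only)
// build: clang -O2 -o bandwidth bandwidth.c -lpthread
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

#define NTHREADS 8                        // e.g. one per P-core; purely illustrative
#define BUF_BYTES (512ULL * 1024 * 1024)  // 512 MiB per thread, far larger than the SLC
#define PASSES 4

static double now_sec(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec * 1e-9;
}

static void *reader(void *arg) {
    volatile uint64_t *buf = arg;
    uint64_t sum = 0;
    size_t n = BUF_BYTES / sizeof(uint64_t);
    for (int p = 0; p < PASSES; p++)
        for (size_t i = 0; i < n; i += 8)  // step 64 bytes so every cache line is pulled from DRAM
            sum += buf[i];
    return (void *)(uintptr_t)sum;         // keep the reads from being optimised away
}

int main(void) {
    pthread_t tid[NTHREADS];
    void *bufs[NTHREADS];
    for (int i = 0; i < NTHREADS; i++) {
        bufs[i] = malloc(BUF_BYTES);
        memset(bufs[i], i + 1, BUF_BYTES); // touch every page up front
    }
    double t0 = now_sec();
    for (int i = 0; i < NTHREADS; i++)
        pthread_create(&tid[i], NULL, reader, bufs[i]);
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(tid[i], NULL);
    double dt = now_sec() - t0;
    double gib = (double)NTHREADS * PASSES * BUF_BYTES / (1024.0 * 1024 * 1024);
    printf("%d threads: ~%.0f GiB read in %.2f s (~%.1f GiB/s aggregate)\n",
           NTHREADS, gib, dt, gib / dt);
    return 0;
}

If left really only talks to left, I'd expect the aggregate number to plateau differently than if every cluster can reach every controller, but it would only be circumstantial evidence either way.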
 

cmaier

Suspended
Jul 25, 2007
25,405
33,474
California
I do wonder how Apple wired up the SLC with the memory controllers tho.

Does the SLC on the left only cache memory requests to the left memory banks, which are then fed to the L2/L1 of the IP cores?

Or do the SLCs on both sides sync their contents?

Wonder if this can be tested with software?
It wouldn’t surprise me if left talks only to left and right talks only to right.
 

theorist9

macrumors 68040
May 28, 2015
3,881
3,060
Part commentary/part standup routine from The Macalope...
Alder Lake? More like Scalder Lake.

No? Yeah, okay. That’s fair.

This is, of course, the desktop version of the CPU. So, we can expect the laptop version, which Intel is planning to release next year, will either run significantly slower or will not technically be a “laptop” processor because putting a computer using it on your lap would be… inadvisable....

The problems here for Intel are going to be electricity, heat, and wearing shorts in the summer.

If some laptop maker decides to take a page from Apple and put the product name on the bottom of the device this could be a real branding opportunity.

Huh? Huh?!
Again, no?

Jeez, tough audience today.

 

throAU

macrumors G3
Feb 13, 2012
9,204
7,354
Perth, Western Australia
I got my 14” M1 Pro this morning.

It arrived with 80% charge
I plugged it into my MacBook Air charger rather than bother to unbox the new one.

Time machine restore. Time machine backup. OS update. App installs. A bunch of file sync.

Four hours later, on a charger reporting 24-watt output: full charge, despite doing all that with an external display attached and running the internal one at 80%.

Wow.

The 2020 MBA would have its fans screaming at me.

This has been totally silent and sipping 24 watts to do all that and charge itself at the same time.
 

Pressure

macrumors 603
May 30, 2006
5,182
1,545
Denmark
TLDR: It looks like, when they say the 400 GB/s bandwidth of the M1 Max's unified memory is unusually high for a laptop, they're comparing it to typical CPU RAM bandwidth, not typical GPU RAM bandwidth. More specifically, the Max's unified memory gives the CPU an unusually high RAM bandwidth, and gives the GPU a typical RAM bandwidth (compared to other laptops in its class).

****
I've read the tradeoff between DDR and GDDR RAM is that the former provides low latency (needed for CPUs) and the latter provides high bandwidth (needed for GPUs). So it seems Apple decided they couldn't accept the high latency of GDDR RAM for unified memory, and instead used DDR RAM and increased the bandwidth to the point it was comparable to what is available from mobile-workstation-class GDDR.

For instance, a comparably-equipped (64 GB RAM, 4 TB SSD, 120 Hz screen) 17" Dell 7760 workstation laptop, with an 8-core Xeon W-11955M and A4000 mobile, costs about the same (within 10%) as a 16" M1 Max ($4900 for the MBP and $5440 for the Dell, based on current pricing on Dell's website*). And the A4000 mobile offers 384 GB/s GPU RAM bandwidth (see link below).**

So it sounds like the Max's 400 GB/s isn't unprecedented when it comes to GPU bandwidth (compared to other laptops in its class), but might instead be unprecedented compared to their CPU bandwidth (?).


*At the time I configured it, a couple of days ago, they were offering an automatic discount of 35% off; they always offer heavy discounts, but I don't know if this particular discount is typical or not. Interestingly, I tried configuring it, left, and came back, and the 2nd time it offered me a higher discount--so maybe it tracks you and offers a bigger discount if you don't bite the first time :).

**You don't even need to go to a workstation-class laptop to get that GPU RAM bandwidth. For instance, an RTX3080 mobile also offers 384 GB/s memory bandwidth. Also, for comparison, the A4000 and RTX3080 desktop GPUs offer twice that: 768 GB/s.
You are comparing Immediate Mode Rendering with Tile-Based Deferred Rendering.

The latter, which Apple's GPU is designed around, uses vastly less memory bandwidth.

For example, the M1 Max's 32-core GPU maxes out at around 90GB/s of memory bandwidth under load, according to AnandTech.
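To put rough numbers on it (ignoring framebuffer compression and caches): just writing out a finished 4K frame once, at 3840 x 2160 x 4 bytes, is about 33 MB, i.e. roughly 4 GB/s at 120 fps. An immediate-mode GPU also reads and writes colour and depth in DRAM for every overdrawn fragment, which multiplies that several times over, whereas a TBDR resolves each tile in on-chip memory and ideally writes each pixel out once.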
 

quarkysg

macrumors 65816
Oct 12, 2019
1,247
841
For example, the M1 Max's 32-core GPU maxes out at around 90GB/s of memory bandwidth under load, according to AnandTech.
For rasterisation workloads, TBDR definitely has lower bandwidth requirements.

For compute workloads, I'm not sure if the GPU will be able to pull data in faster. I would think it will be able to go much higher than 90 GB/s. I would not be surprised if it can saturate the entire 400GB/s of bandwidth.
 

diamond.g

macrumors G4
Mar 20, 2007
11,438
2,665
OBX
For rasterisation workloads, TBDR definitely has lower bandwidth requirements.

For compute workloads, I'm not sure if the GPU will be able to pull data in faster. I would think it will be able to go much higher than 90 GB/s. I would not be surprised if it can saturate the entire 400GB/s of bandwidth.
Actually, that brings up a good question: Ethereum mining is memory-bandwidth sensitive (it doesn't actually care as much about compute power), yet the M1 Max with its 400GB/s of available bandwidth isn't able to match the 6700XT's hashrate (and the 6700XT has less bandwidth). I point out the 6700XT because it has half the WGPs of the 6900XT yet its ETH hashrate is like 80% of the 6900XT's. There has to be a reason why.
 

quarkysg

macrumors 65816
Oct 12, 2019
1,247
841
Actually, that brings up a good question: Ethereum mining is memory-bandwidth sensitive (it doesn't actually care as much about compute power), yet the M1 Max with its 400GB/s of available bandwidth isn't able to match the 6700XT's hashrate (and the 6700XT has less bandwidth). I point out the 6700XT because it has half the WGPs of the 6900XT yet its ETH hashrate is like 80% of the 6900XT's. There has to be a reason why.
I would think the mining software is not optimised for the M1.

The standard PC architecture is bottlenecked by the PCIe bus, so I'm thinking that plays a part in the 6700XT vs. 6900XT hash rates, i.e. diminishing returns with ever greater compute horsepower.

The way Apple is re-architecting the Macs is a break away from the modular approach, which allows them to remove some of the bottlenecks.
 

diamond.g

macrumors G4
Mar 20, 2007
11,438
2,665
OBX
I would think the mining software is not optimised for the M1.

The standard PC architecture is bottlenecked by the PCIe bus, so I'm thinking that plays a part in the 6700XT vs. 6900XT hash rates, i.e. diminishing returns with ever greater compute horsepower.

The way Apple is re-architecting the Macs is a break away from the modular approach, which allows them to remove some of the bottlenecks.
Ehhhh the 3090 has almost double the memory bandwidth of the 6900XT and gets about double the hashrate, so it scales fairly linearly as far as I can tell.
 

diamond.g

macrumors G4
Mar 20, 2007
11,438
2,665
OBX
How about the hashrate between the 3080 and 3090?
~100 MH/s from ~760 GB/s of bandwidth. The Ti should get closer to the 3090 since the memory bandwidth is the same, if you could bypass the LHR mode.
If it were not an LHR card. All original 30-series cards (except the 3060) are "full hashrate" cards. The newer Tis are all supposed to be LHR cards, though I have seen that you can sorta work around the limits (usually by dual mining).


For reference, the 3090 should get ~125 MH/s from ~936 GB/s of bandwidth.
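Back-of-the-envelope, assuming Ethash has to stream roughly 8 KiB of DAG data per hash (64 accesses of 128 bytes): 936 GB/s / 8 KiB ≈ 114 MH/s and 760 GB/s / 8 KiB ≈ 93 MH/s, which lines up pretty well with those observed rates. The same math would put a fully saturated 400 GB/s at only ~49 MH/s.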
 
Last edited:

JMacHack

Suspended
Mar 16, 2017
1,965
2,424
Then what advantage does DDR5 give over LPDDR5 that would cause a PC maker to choose the less efficient DDR5 for its desktops? Is it simply that DDR5 is less expensive and/or more available?
Well, it lowers manufacturing costs for OEMs and mobo manufacturers since they don’t have to buy the RAM modules and solder them to the board themselves, and they don’t need to make many different SKUs for different RAM combinations.

OEMs just buy the DIMMs separately, and DIYers buy as much as they please. It’s why cheaper laptops and desktops often have socketed components: they can make many SKUs from the parts bin. (This isn’t to say that non-socketed components are more “premium,” just the economics of building systems).

And of course, since it’s a desktop, power efficiency is less of a concern than raw performance. So LPDDR loses its advantage there.
 

JMacHack

Suspended
Mar 16, 2017
1,965
2,424
How about the hashrate between the 3080 and 3090?
Speaking of hash rates, this is a little off-topic but I want to throw in my obsession with Vega and GCN.
Vs

GCN was crazy good for compute despite being crap for graphics.

And, completely off-topic, but @cmaier what would I have to study to learn about designing circuits for processors? I find it interesting but I doubt I could hack it at the math.
 

leman

macrumors Core
Original poster
Oct 14, 2008
19,521
19,678
Then what advantage does DDR5 give over LPDDR5 that would cause a PC maker to choose the less efficient DDR5 for its desktops? Is it simply that DDR5 is less expensive and/or more available?

Desktop PC makers usually don't care about efficiency that much. SO-DIMM DDR is more ubiquitous, modular, and cheaper. Historically, it was also faster, but recent iterations of LPDDR have caught up in performance (and sometimes even overtaken desktop DDR).

What's also important is that LPDDR is a more sophisticated device that has additional features. This allows it to be much more power efficient, but also increases the latency by a good amount (seems to be around 20-30% for modern implementations).
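If anyone wants to see that latency gap for themselves, a classic pointer-chasing loop is enough: build a randomly permuted cycle through a buffer much bigger than the caches so every load depends on the previous one, then divide elapsed time by the number of hops. A rough plain-C sketch (the buffer size and hop count are arbitrary, and it measures full load-to-load latency including TLB effects, not just the DRAM part):

// latency.c - naive memory latency probe via dependent pointer chasing (sketch only)
// build: clang -O2 -o latency latency.c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N (64 * 1024 * 1024 / sizeof(void *))  // ~64 MiB of pointers, well past the caches
#define HOPS (20 * 1000 * 1000)

int main(void) {
    void **cells = malloc(N * sizeof(void *));
    size_t *order = malloc(N * sizeof(size_t));
    for (size_t i = 0; i < N; i++) order[i] = i;
    srand(1);                                   // fixed seed for repeatability
    for (size_t i = N - 1; i > 0; i--) {        // Fisher-Yates shuffle of the visit order
        size_t j = (size_t)rand() % (i + 1);
        size_t t = order[i]; order[i] = order[j]; order[j] = t;
    }
    for (size_t i = 0; i < N; i++)              // link each cell to the next one in shuffled order
        cells[order[i]] = &cells[order[(i + 1) % N]];

    struct timespec t0, t1;
    void **p = &cells[order[0]];
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (long i = 0; i < HOPS; i++)
        p = (void **)*p;                        // each load depends on the previous one
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
    printf("avg load-to-load latency: %.1f ns (p=%p)\n", ns / HOPS, (void *)p);
    return 0;
}

Run the same binary on a DDR machine and an LPDDR one and the difference should show up directly in that number.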
 

cmaier

Suspended
Jul 25, 2007
25,405
33,474
California
Speaking of hash rates, this is a little off-topic but I want to throw in my obsession with Vega and GCN.
Vs

GCN was crazy good for compute despite being crap for graphics.

And, completely off-topic, but @cmaier what would I have to study to learn about designing circuits for processors. I find it interesting but I doubt I could hack it at the math.

It depends on whether you really mean circuits or something else. Hennessy and Patterson is the “bible” for computer microarchitecture - it’s the textbook that I assume is still used, and was used at least as far back as the mid-1990s. They actually have a bunch of books, but “Computer Architecture: A Quantitative Approach” is the one everyone knows.

Also, Tanenbaum had a book, Structured Computer Organization, that I recall being pretty good.

If you want to get below the level of computer architecture and microarchitecture, the next level would be logic design - Boolean logic, figuring out how to convert an architecture into NAND/NOR gates and flip-flops, etc. I don’t know of a modern book on that topic - last time I picked up a text on that stuff was decades ago - now it’s all in my head :)

If you want to get below that level, and learn about designing the actual circuits at the transistor level, that’s a whole other can of worms, and my recommendation would depend on what kind of background you have. At least some electrical engineering (circuit analysis) or physics would be necessary to get started, I think. Then you have physical design, the actual polygons, and that requires less math, probably, but it wouldn’t do you much good to learn if you don’t understand the circuits.
 

diamond.g

macrumors G4
Mar 20, 2007
11,438
2,665
OBX
Speaking of hash rates, this is a little off-topic but I want to throw in my obsession with Vega and GCN.
Vs

GCN was crazy good for compute despite being crap for graphics.

And, completely off-topic, but @cmaier what would I have to study to learn about designing circuits for processors? I find it interesting but I doubt I could hack it at the math.
Yeah CDNA is based on GCN but with like none of the rasterization bits. I do wonder what the hashrates of the new MI200 series would be.
 

theorist9

macrumors 68040
May 28, 2015
3,881
3,060
Desktop PC makers usually don't care about efficiency that much. SO-DIMM DDR is more ubiquitous, modular, and cheaper. Historically, it was also faster, but recent iterations of LPDDR have caught up in performance (and sometimes even overtaken desktop DDR).

What's also important is that LPDDR is a more sophisticated device that has additional features. This allows it to be much more power efficient, but also increases the latency by a good amount (seems to be around 20-30% for modern implementations).
In a desktop AS implementation, would DDR's ubiquity, modularity*, reduced latency, and reduced cost provide enough benefit to outweigh LPDDR's efficiency and whatever other benefits it might provide?

*I list DDR's modularity** as a benefit for desktop (e.g., Mac Pro) buyers, but I know you disagree on this. [I also think it would be a marketing benefit to Apple, since I'll bet they'd get a lot of negativity from the pro community if they introduce a Mac Pro with non-upgradeable RAM—unless they provide a compelling architectural explanation for why soldered-in RAM is necessary on a desktop machine.]

**From what I've subsequently learned, LPDDR can also be modular—though that is not its typical implementation, and it appears more complicated to implement than with DDR: https://news.ycombinator.com/item?id=18408496
 
Last edited:

Kpjoslee

macrumors 6502
Sep 11, 2007
417
269
In a desktop AS implementation, would DDR's ubiquity, modularity*, reduced latency, and reduced cost provide enough benefit to outweigh LPDDR's efficiency and whatever other benefits it might provide?

*I list DDR's modularity** as a benefit for desktop (e.g., Mac Pro) buyers, but I know you disagree on this. [I also think it would be a marketing benefit to Apple, since I'll bet they'd get a lot of negativity from the pro community if they introduce a Mac Pro with non-upgradeable RAM—unless they provide a compelling architectural explanation for why soldered-in RAM is necessary on a desktop machine.]

**From what I've subsequently learned, LPDDR can also be modular—though that is not its typical implementation, and it appears more complicated to implement than with DDR: https://news.ycombinator.com/item?id=18408496

Modular LPDDR can be done, but I doubt Apple would even bother doing it when there is a more conventional and less costly option in just using either traditional DIMM or SO-DIMM DDR5, if they are ever planning on making an AS Mac Pro user-upgradable.
I would also assume DDR5 would be the only option if they are trying to match at least the maximum memory the current Mac Pro offers (1.5TB).
 
Last edited:

theorist9

macrumors 68040
May 28, 2015
3,881
3,060
I like this thread, since I'm getting so many of my questions answered!

Given this, I'd like to ask my three I/O questions:

1) Apple specifies a limit of two external displays on the M1 Pro. Owners of the M1 Pro have confirmed this is a hard limit (there are probably workarounds that allow more, but I'm talking about direct connection).

With 3 x TB4 and 1 x HDMI 2.0, and a powerful GPU, this maximum of two external displays seems like a surprising limitation. Heck, my mid-2014 MBP, with 2 x TB2 and 1 x HDMI 1.4, can (and does) drive three external displays. I.e., it drives displays from each of its video-capable ports.

What is it about the M1 Pro's architecture that explains this limitation? And likewise for the M1 Max, which is limited to three external displays (rather than being able to drive displays from all four of its video-capable ports, like the 16" Intel MBP could).

2) The TB4 standard calls for 40 Gb/s full duplex (bidirectionally). It also includes DisplayPort Alt Mode 2.0, which enables the interface to alternately support 80 Gb/s unidirectional transmission (see https://en.wikipedia.org/wiki/Thunderbolt_(interface)). However, from what poster Krevnik wrote below, it sounds like neither of these is available in practice, including on the M1 Pro/Max. Is there a consensus that this is the case?
And might we perhaps see one or both of these capabilities in the 2022 iMac Pro/Max's TB4 implementation?
First, Thunderbolt 3 and 4 as available on the market are a single 40Gbps full duplex connection. That’s it. While the cable was originally reported by AnandTech as being able to carry two channels, it’s not used in either TB3 or 4. It’s possible that this second PHY channel is what will allow TB5 to deliver 80Gbps, but for now, both TB3 and 4 are a single connection. So for TrippLite to use the discussion they did is misleading at best.

As for DisplayPort mode, things get a bit more complicated. DisplayPort 2.0 uses the Thunderbolt 3 PHY layer to reach the 80Gbps needed, which means it has to be using both channels in the cable. However, Intel’s current TB4 controllers are DP 1.4a compliant: https://www.intel.com/content/www/u...-thunderbolt-4-controller/specifications.html, meaning even though DP 2.0 can be carried over a Thunderbolt 3 active cable, the chips required to handle the alt mode aren’t (yet) here. But with Intel releasing GPUs with DP 2.0 support “Soon(tm)”, I suspect the controller chips can’t be far off either.

3) I've pasted, at the bottom, a nice explanation from usernames need to be uniq for why Apple limited the SD port to UHS-II (mainly, if they went with UHS-III, then UHS-II cards would downgrade to UHS-I; plus UHS-III cards don't exist). But what's the explanation for why Apple decided to limit its HDMI port to 2.0? Was it a bandwidth limitation, a lack of reliable HDMI 2.1 controller chips, or something else?

Sony's A1 has a dual CFe Type A and UHS-II SD card slot. An elegant solution, but only Sony makes a CFe Type A card, and they are expensive and limited in capacity compared to CFe Type B cards. "Most" A1 users are relying on SD cards only, due to cost, no need for the higher bit rates, and the fact that when it was released they were hard to find.
UHS-III cards don't exist. Even if they did, UHS-III is not backward compatible with UHS-II speeds, i.e. they fall back to UHS-I speed, so it would have been worse for current users to put in a UHS-III slot.
I don't expect UHS-III cards to be commercially available. Maybe SD Express in a couple of years, but CFe Type B will be well entrenched by then and available at a cheaper cost.
 
Last edited:

crazy dave

macrumors 65816
Sep 9, 2010
1,453
1,229
In a desktop AS implementation, would DDR's ubiquity, modularity*, reduced latency, and reduced cost provide enough benefit to outweigh LPDDR's efficiency and whatever other benefits it might provide?

*I list DDR's modularity** as a benefit for desktop (e.g., Mac Pro) buyers, but I know you disagree on this. [I also think it would be a marketing benefit to Apple, since I'll bet they'd get a lot of negativity from the pro community if they introduce a Mac Pro with non-upgradeable RAM—unless they provide a compelling architectural explanation for why soldered-in RAM is necessary on a desktop machine.]

**From what I've subsequently learned, LPDDR can also be modular—though that is not its typical implementation, and it appears more complicated to implement than with DDR: https://news.ycombinator.com/item?id=18408496

Modular LPDDR can be done, but I doubt Apple would even bother doing it when there is a more conventional and less costly option in just using either traditional DIMM or SO-DIMM DDR5, if they are ever planning on making an AS Mac Pro user-upgradable.
I would also assume DDR5 would be the only option if they are trying to match at least the maximum memory the current Mac Pro offers (1.5TB).


The modularity is also an issue though. In order to get the full bandwidth you need a lot of DDR slots and all the slots have to be filled. You have to rely on the user to do that right. This is even more critical if it’s feeding the GPU as well as the CPU.
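For a rough sense of scale (assuming DDR5-4800, which works out to about 38.4 GB/s per 64-bit channel): matching the M1 Max's 400 GB/s would take roughly 400 / 38.4 ≈ 10-11 fully populated channels, more than even typical workstation platforms (usually 8 channels) expose.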

The current rumors are that Apple will still release a new Ice Lake Mac Pro system (there are references to such a system) and later the first AS Mac Pro will actually be a new mini Mac Pro.
 

Kpjoslee

macrumors 6502
Sep 11, 2007
417
269
The modularity is also an issue though. In order to get the full bandwidth you need a lot of DDR slots and all the slots have to be filled. You have to rely on the user to do that right. This is even more critical if it’s feeding the GPU as well as the CPU.
I think just filling up all the slots is going to be easier than dealing with this lol.
[attached screenshot]


The current rumors are that Apple will still release a new Ice Lake Mac Pro system (there are references to such a system) and later the first AS Mac Pro will actually be a new mini Mac Pro.
I think Apple might have had an Ice Lake Mac Pro planned initially, but it may never see the light of day.
The AS "Mac Pro" could end up being an entirely new system, and they might as well drop the Mac Pro name. They could just settle on either a 256GB or 512GB (if they manage to double the density next year) maximum memory configuration on 4x Jade-C.
 