Intel Alder Lake vs. Apple M1

altaic · Nov 10, 2021

quarkysg said:
From the die shots shown for the M1 Pro & Max, it appears that the memory controllers are split between the 2 P/E CPU clusters. For M1 Max, it looks like the 400GB/s bandwidth is split between the two CPU clusters. So each cluster could potentially get 200GB/s of bandwidth independently, as long as the two clusters' SLCs don't need to sync their caches.

By "memory controllers" do you mean the LPDDR5 PHYs? How can you tell how they're split from the die shots?
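For reference, the 400GB/s figure falls out of simple arithmetic. The sketch below assumes the widely reported LPDDR5-6400 memory on a 512-bit bus for the M1 Max. Treating each half of the chip as owning a 256-bit slice is the hypothetical split speculated about above, not something Apple has confirmed, and the function name is illustrative.

```python
# Peak DRAM bandwidth = transfers per second x bytes moved per transfer.
# Assumption: M1 Max = LPDDR5-6400 on a 512-bit bus (as widely reported).
# The per-half 256-bit split is hypothetical.

def peak_bandwidth_gb_s(data_rate_mts: float, bus_width_bits: int) -> float:
    """Peak bandwidth in decimal GB/s (1 GB = 1e9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8
    return data_rate_mts * 1e6 * bytes_per_transfer / 1e9


whole_chip = peak_bandwidth_gb_s(6400, 512)  # marketed as "400GB/s"
one_half = peak_bandwidth_gb_s(6400, 256)    # hypothetical per-cluster share

print(f"whole chip: {whole_chip:.1f} GB/s")
print(f"one half:   {one_half:.1f} GB/s")
```

This prints 409.6 and 204.8 GB/s. These are theoretical peaks; sustained bandwidth seen by any single client on the chip will be lower.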

quarkysg · Nov 10, 2021

altaic said:
By "memory controllers" do you mean the LPDDR5 PHYs? How can you tell how they're split from the die shots?

I'm going off what AnandTech showed here:

https://images.anandtech.com/doci/17019/M1MAX.jpg

I'm just speculating how it may be designed, since the memory controllers are placed just next to the SLC blocks. The first stop of the memory fetches is the SLC.

Only Apple will know how they are actually wired. It makes sense to me to split it that way tho.

cmaier · Nov 10, 2021

quarkysg said:
I'm going off what AnandTech showed here:

https://images.anandtech.com/doci/17019/M1MAX.jpg

I'm just speculating how it may be designed, since the memory controllers are placed just next to the SLC blocks. The first stop of the memory fetches is the SLC.

Only Apple will know how they are actually wired. It makes sense to me to split it that way tho.

The channels between the SLCs and the memory controllers appear to me to be plenty wide enough to allow either SLC block on a side to hit either memory controller on that side. Also appears to be room down the middle to allow left-to-right to be connected, but without seeing the metal I can’t tell what’s actually there.

quarkysg · Nov 10, 2021

cmaier said:
The channels between the SLCs and the memory controllers appear to me to be plenty wide enough to allow either SLC block on a side to hit either memory controller on that side. Also appears to be room down the middle to allow left-to-right to be connected, but without seeing the metal I can’t tell what’s actually there.

I do wonder how Apple wired up the SLC with the memory controllers tho.

Does the SLC on the left only cache memory requests to the left memory banks and it's then fed to the L2/L1 of the IP cores?

Or do the SLCs on both sides sync their contents?

Wonder if this can be tested with software?
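One common way to probe questions like this from software is a pointer-chasing latency test: build a random cycle through a buffer, follow it, and watch how the time per hop changes as the buffer outgrows each cache level. The sketch below uses Sattolo's algorithm to build that cycle. It is written in Python only to show the shape of the method; interpreter overhead swamps real cache latencies, so an actual measurement would be done in C on a flat array. Working out which SLC half serves a given address would additionally need control over thread placement and physical addresses, which this sketch does not attempt. The function names are hypothetical.

```python
import random
import time


def make_chase_ring(n: int, seed: int = 0) -> list[int]:
    """Build one random cycle over n slots with Sattolo's algorithm.

    Following ring[i] repeatedly visits every slot exactly once before
    returning to the start, so the access order is hard to prefetch.
    """
    rng = random.Random(seed)
    ring = list(range(n))
    for i in range(n - 1, 0, -1):
        j = rng.randrange(i)  # j < i (never j == i) guarantees a single cycle
        ring[i], ring[j] = ring[j], ring[i]
    return ring


def ns_per_hop(ring: list[int], hops: int) -> float:
    """Average wall-clock nanoseconds per dependent load (Python-level)."""
    idx = 0
    start = time.perf_counter_ns()
    for _ in range(hops):
        idx = ring[idx]  # each load depends on the previous one
    return (time.perf_counter_ns() - start) / hops


# Sweep the working-set size; in a native version, jumps in ns/hop mark
# the points where the buffer spills out of L1, L2 and then the SLC.
for size in (1 << 10, 1 << 14, 1 << 18):
    ring = make_chase_ring(size)
    print(size, round(ns_per_hop(ring, 200_000), 1))
```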

cmaier · Nov 10, 2021

quarkysg said:
I do wonder how Apple wired up the SLC with the memory controllers tho.

Does the SLC on the left only cache memory requests to the left memory banks and it's then fed to the L2/L1 of the IP cores?

Or do the SLCs on both sides sync their contents?

Wonder if this can be tested with software?

It wouldn’t surprise me if left talks only to left and right talks only to right.

theorist9 · Nov 10, 2021

Part commentary/part standup routine from The Macalope...

Alder Lake? More like Scalder Lake.

No? Yeah, okay. That’s fair.

This is, of course, the desktop version of the CPU. So, we can expect the laptop version, which Intel is planning to release next year, will either run significantly slower or will not technically be a “laptop” processor because putting a computer using it on your lap would be… inadvisable....

The problems here for Intel are going to be electricity, heat, and wearing shorts in the summer.

If some laptop maker decides to take a page from Apple and put the product name on the bottom of the device this could be a real branding opportunity.

Huh? Huh?!
Again, no?

Jeez, tough audience today.

Don't believe those lying Alder Lake vs M1 Pro benchmarks

Intel's latest may beat the M1 Pro and Max, but it's for desktops.

www.macworld.com

throAU · Nov 10, 2021

I got my 14” M1 Pro this morning.

It arrived with 80% charge.
I plugged it into my MacBook Air charger rather than bother unboxing the new one.

Time Machine restore. Time Machine backup. OS update. App installs. A bunch of file syncing.

Four hours later, on a charger reporting 24 W output, it was fully charged, despite doing all that with an external display attached and the internal one running at 80%.

Wow.

The 2020 MBA would be screaming its fans at me.

This has been totally silent and sipping 24 watts to do all that and charge itself at the same time.

Pressure · Nov 11, 2021

theorist9 said:
TLDR: It looks like, when they say the 400 GB/s bandwidth of the M1 Max's unified memory is unusually high for a laptop, they're comparing it to typical CPU RAM bandwidth, not typical GPU RAM bandwidth. I.e., more specifically, the Max's unified memory is giving the CPU an unusually high RAM bandwidth, and giving the GPU a typical RAM bandwidth (compared to other laptops in its class).

****
I've read the tradeoff between DDR and GDDR RAM is that the former provides low latency (needed for CPUs) and the latter provides high bandwidth (needed for GPUs). So it seems Apple decided they couldn't accept the high latency of GDDR RAM for unified memory, and instead used DDR RAM and increased the bandwidth to the point it was comparable to what is available from mobile-workstation-class GDDR.

For instance, a comparably equipped (64 GB RAM, 4 TB SSD, 120 Hz screen) 17" Dell 7760 workstation laptop, with an 8-core Xeon W-11955M and A4000 mobile, costs about the same (within 10%) as a 16" M1 Max ($4900 for the MBP and $5440 for the Dell, based on current pricing on Dell's website*). And the A4000 mobile offers 384 GB/s GPU RAM bandwidth (see link below).**

So it sounds like the Max's 400 GB/s isn't unprecedented when it comes to GPU bandwidth (compared to other laptops in its class), but might instead be unprecedented compared to their CPU bandwidth (?).

https://www.nvidia.com/content/dam/en-zz/Solutions/design-visualization/documents/nvidia-rtx-line-card-for-mobile-workstation.pdf

*At the time I configured it, a couple of days ago, they were offering an automatic discount of 35% off; they always offer heavy discounts, but I don't know if this particular discount is typical or not. Interestingly, I tried configuring it, left, and came back, and the second time it offered me a higher discount, so maybe it tracks you and offers a bigger discount if you don't bite the first time.

**You don't even need to go to a workstation-class laptop to get that GPU RAM bandwidth. For instance, an RTX 3080 mobile also offers 384 GB/s memory bandwidth. Also, for comparison, the RTX 3080 desktop GPU offers roughly twice that (760 GB/s), while the desktop A4000 offers 448 GB/s.

You are comparing Immediate Mode Rendering with Tile-Based Deferred Rendering.

The latter, which Apple's GPU is designed around, uses vastly less memory bandwidth.

For example the M1 Max 32-core GPU maxes out at around 90GB/s of memory bandwidth under load according to AnandTech.
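A rough back-of-the-envelope model shows why the two approaches differ so much in DRAM traffic. The sketch assumes a cache-less immediate-mode GPU that reads and writes depth and writes color to DRAM for every shaded fragment, versus a tile-based GPU that keeps depth and color in on-chip tile memory and writes each final pixel once. Real GPUs on both sides use caches, compression and (for TBDR) binning traffic that this ignores, and the 4K, 60 fps, 3x-overdraw scene is a made-up example, so treat the ratio as illustrative only.

```python
def imr_bytes_per_frame(pixels: int, overdraw: float,
                        color_bytes: int = 4, depth_bytes: int = 4) -> float:
    # Every fragment: depth read + depth write + color write, all to DRAM.
    return pixels * overdraw * (2 * depth_bytes + color_bytes)


def tbdr_bytes_per_frame(pixels: int, color_bytes: int = 4) -> float:
    # Depth test and blending stay in on-chip tile memory; only the
    # resolved color for each pixel is written out, once.
    return pixels * color_bytes


PIXELS = 3840 * 2160   # hypothetical 4K render target
FPS = 60
OVERDRAW = 3           # hypothetical average fragments shaded per pixel

imr = imr_bytes_per_frame(PIXELS, OVERDRAW) * FPS / 1e9
tbdr = tbdr_bytes_per_frame(PIXELS) * FPS / 1e9
print(f"IMR  framebuffer traffic: {imr:.1f} GB/s")
print(f"TBDR framebuffer traffic: {tbdr:.1f} GB/s")
print(f"ratio: {imr / tbdr:.0f}x")
```

In this toy model the gap grows linearly with overdraw, because the tile-based GPU's framebuffer traffic does not depend on it at all. Texture and geometry traffic, which both designs pay, is left out.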

quarkysg · Nov 11, 2021

Pressure said:
For example the M1 Max 32-core GPU maxes out at around 90GB/s of memory bandwidth under load according to AnandTech.

For rasterisation workload, TBDR definitely has lower bandwidth requirements.

For compute workload, not sure if the GPU will be able to suck in more memory faster. I would think it will be able to go much higher than 90 GB/s. I would not be surprised that it can saturate the entire 400GB/s bandwidth.

diamond.g · Nov 11, 2021

quarkysg said:
For rasterisation workload, TBDR definitely has lower bandwidth requirements.

For compute workload, not sure if the GPU will be able to suck in more memory faster. I would think it will be able to go much higher than 90 GB/s. I would not be surprised that it can saturate the entire 400GB/s bandwidth.

Actually that brings up a good question. Ethereum is memory-bandwidth sensitive (it doesn't care as much about compute power), yet the M1 Max, with its 400 GB/s of available bandwidth, isn't able to match the 6700XT's hashrate, even though the 6700XT has less bandwidth. I point out the 6700XT because it has half the WGPs of the 6900XT, yet its ETH hashrate is about 80% of the 6900XT's. There has to be a reason why.

quarkysg · Nov 11, 2021

diamond.g said:
Actually that brings up a good question. Ethereum is memory-bandwidth sensitive (it doesn't care as much about compute power), yet the M1 Max, with its 400 GB/s of available bandwidth, isn't able to match the 6700XT's hashrate, even though the 6700XT has less bandwidth. I point out the 6700XT because it has half the WGPs of the 6900XT, yet its ETH hashrate is about 80% of the 6900XT's. There has to be a reason why.

I would think the mining software is not optimised for the M1.

The standard PC architecture is bottlenecked by the PCIe bus, so I'm thinking that will play a part between the 6700XT and 6900XT hash rate, i.e. getting diminishing returns with ever greater compute horsepower.

The way Apple is re-architecting the Macs is a break away from the modular approach, which allows them to remove some of the bottlenecks.

diamond.g · Nov 11, 2021

quarkysg said:
I would think the mining software is not optimised for the M1.

The standard PC architecture is bottlenecked by the PCIe bus, so I'm thinking that will play a part between the 6700XT and 6900XT hash rate, i.e. getting diminishing returns with ever greater compute horsepower.

The way Apple is re-architecting the Macs is a break away from the modular approach, which allows them to remove some of the bottlenecks.

Ehhhh the 3090 has almost double the memory bandwidth of the 6900XT and gets about double the hashrate, so it scales fairly linearly as far as I can tell.

quarkysg · Nov 11, 2021

diamond.g said:
Ehhhh the 3090 has almost double the memory bandwidth of the 6900XT and gets about double the hashrate, so it scales fairly linearly as far as I can tell.

How about the hashrate between the 3080 and 3090?

diamond.g · Nov 11, 2021

quarkysg said:
How about the hashrate between the 3080 and 3090?

~100 MH/s from ~760 GB/s of bandwidth. The Ti should get closer to the 3090 since its memory bandwidth is about the same, if it were not an LHR card.
All original 30-series cards (except the 3060) are "full hashrate" cards. The newer Ti's are all supposed to be LHR cards, though I have seen that you can sort of work around the limits (usually by dual mining).

For reference, the 3090 should get ~125 MH/s from ~936 GB/s of bandwidth.
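The near-linear scaling follows from how Ethash works: per the Ethash specification, each hash makes 64 random reads of 128 bytes from the DAG, about 8 KiB per hash, so DRAM bandwidth puts a ceiling on hashrate. The sketch below applies that ceiling to nominal bandwidth figures (384 GB/s for the 6700XT is its published spec; the other numbers are from this thread). It ignores caches, memory overclocks and mining-software efficiency; the 3080/3090 hashrates quoted above sit a bit above the stock-bandwidth ceiling, which this simple model does not explain.

```python
# Ethash constants from the spec: ACCESSES = 64 DAG reads per hash,
# each MIX_BYTES = 128 bytes wide.
ACCESSES = 64
MIX_BYTES = 128
BYTES_PER_HASH = ACCESSES * MIX_BYTES  # 8192 bytes


def hashrate_ceiling_mh_s(bandwidth_gb_s: float) -> float:
    """Upper bound on Ethash MH/s if every DAG read has to come from DRAM."""
    return bandwidth_gb_s * 1e9 / BYTES_PER_HASH / 1e6


cards = {
    "RX 6700 XT": 384,
    "M1 Max": 400,
    "RTX 3080": 760,
    "RTX 3090": 936,
}
for name, bw in cards.items():
    print(f"{name:10s} {bw:4d} GB/s -> <= {hashrate_ceiling_mh_s(bw):5.1f} MH/s")
```

On paper this puts the M1 Max's ceiling right next to the 6700XT's, so falling short of the 6700XT points at something other than raw DRAM bandwidth.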

JMacHack · Nov 11, 2021

theorist9 said:
Then what advantage does DDR5 give over LPDDR5 that would cause a PC maker to choose the less efficient DDR5 for its desktops? Is it simply that DDR5 is less expensive and/or more available?

Well it lowers manufacturing costs for OEMs and Mobo manufacturers since they don’t have to buy the RAM modules and solder them to the board themselves, and they don’t need to make many different SKUs for different RAM combinations.

OEMs just buy the DIMMs separately, and DIYers buy as much as they please. It’s why cheaper laptops and desktops often have socketed components, they can make many SKUs from the parts bin. (This isn’t to say that non-socketed components are more “premium,” just the economics of building systems).

And of course, since it’s a desktop, power efficiency is less of a concern than raw performance, so LPDDR loses its advantage there.

JMacHack · Nov 11, 2021

quarkysg said:
How about the hashrate between the 3080 and 3090?

Speaking of hash rates, this is a little off-topic but I want to throw in my obsession with Vega and GCN.

NVIDIA Geforce RTX 3090 24GB Mining Hashrate - Perfect Hashrate

NVIDIA RTX 3090 24GB mining hashrate for each algorithm : [ Power Consumption 285 Watts/Hour ]: DaggerHashimoto [ EtHash : (ETH) & (ETC) ] Ethereum Mining Hashrate : 120 MH/s Octopus Mining Hashrate : 85.3 MH/s Kawpow Mining Hashrate 54 MH/s BeamV3 Mining Hashrate : 57 Sol/s GrinCuckatoo32...

perfecthashrate.com

Vs

AMD Radeon VII 16GB Mining Hashrate - Perfect Hashrate

AMD RADEON VII 16GB mining hashrate for each algorithm : [ Power Consumption 290 Watts/Hour ] DaggerHashimoto [ EtHash : (ETH) & (ETC) ] Ethereum Mining Hashrate : 90.56 MH/s Keccak Mining Hashrate : 0.81 MH/s Decred (DCR) Mining Hashrate : 2.35 GH/s Lbry ( LBC ) Mining Hashrate : 0.28 GH/s...

perfecthashrate.com

GCN was crazy good for compute despite being crap for graphics.

And, completely off-topic, but @cmaier what would I have to study to learn about designing circuits for processors. I find it interesting but I doubt I could hack it at the math.

leman · Nov 11, 2021

theorist9 said:
Then what advantage does DDR5 give over LPDDR5 that would cause a PC maker to choose the less efficient DDR5 for its desktops? Is it simply that DDR5 is less expensive and/or more available?

Desktop PC makers usually don't care about efficiency that much. SO-DIMM DDR is more ubiquitous, modular and cheaper. Historically, it was also faster, but the recent iterations of LPDDR have caught up in performance (and sometimes even overtaken the desktop DDR).

What's also important is that LPDDR is a more sophisticated device that has additional features. This allows it to be much more power efficient, but also increases the latency by a good amount (seems to be around 20-30% for modern implementations).

cmaier · Nov 11, 2021

JMacHack said:
Speaking of hash rates, this is a little off-topic but I want to throw in my obsession with Vega and GCN.

NVIDIA Geforce RTX 3090 24GB Mining Hashrate - Perfect Hashrate

NVIDIA RTX 3090 24GB mining hashrate for each algorithm : [ Power Consumption 285 Watts/Hour ]: DaggerHashimoto [ EtHash : (ETH) & (ETC) ] Ethereum Mining Hashrate : 120 MH/s Octopus Mining Hashrate : 85.3 MH/s Kawpow Mining Hashrate 54 MH/s BeamV3 Mining Hashrate : 57 Sol/s GrinCuckatoo32...

perfecthashrate.com

Vs

AMD Radeon VII 16GB Mining Hashrate - Perfect Hashrate

AMD RADEON VII 16GB mining hashrate for each algorithm : [ Power Consumption 290 Watts/Hour ] DaggerHashimoto [ EtHash : (ETH) & (ETC) ] Ethereum Mining Hashrate : 90.56 MH/s Keccak Mining Hashrate : 0.81 MH/s Decred (DCR) Mining Hashrate : 2.35 GH/s Lbry ( LBC ) Mining Hashrate : 0.28 GH/s...

perfecthashrate.com

GCN was crazy good for compute despite being crap for graphics.

And, completely off-topic, but @cmaier what would I have to study to learn about designing circuits for processors. I find it interesting but I doubt I could hack it at the math.

It depends on whether you really mean circuits or something else. Hennessy and Patterson is the “bible” for computer microarchitecture - it’s the textbook that I assume is still used, and was used at least as far back as the mid 1990’s. They actually have a bunch of books, but “Computer Architecture: A Quantitative Approach” is the one everyone knows.

Also, Tanenbaum had a book, Structured Computer Organization, that I recall being pretty good.

If you want to get below the level of computer architecture and microarchitecture, the next level would be logic design - Boolean logic, figuring out how to convert an architecture into NAND/NOR gates and flip-flops, etc. I don’t know of a modern book on that topic - last time I picked up a text on that stuff was decades ago - now it’s all in my head.
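As a tiny taste of what that logic-design level looks like, the sketch below builds NOT, AND, OR and XOR purely out of NAND gates (NAND is functionally complete) and then wires them into a half adder, the one-bit building block of binary addition. It is a Python truth-table model, not a circuit, and the function names are illustrative.

```python
def nand(a: int, b: int) -> int:
    """The only primitive: output is 0 only when both inputs are 1."""
    return 0 if (a and b) else 1


def not_(a: int) -> int:
    return nand(a, a)


def and_(a: int, b: int) -> int:
    return not_(nand(a, b))


def or_(a: int, b: int) -> int:
    # De Morgan: a OR b == NOT(NOT a AND NOT b)
    return nand(not_(a), not_(b))


def xor(a: int, b: int) -> int:
    # Classic four-NAND XOR.
    n = nand(a, b)
    return nand(nand(a, n), nand(b, n))


def half_adder(a: int, b: int) -> tuple[int, int]:
    """Return (sum, carry) for two one-bit inputs."""
    return xor(a, b), and_(a, b)


for a in (0, 1):
    for b in (0, 1):
        s, c = half_adder(a, b)
        print(f"{a} + {b} -> sum={s} carry={c}")
```

Chaining two half adders plus an OR gives a full adder, and chaining full adders gives a ripple-carry adder; real designs then worry about speed, which is where the circuit-level work starts.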

If you want to get below that level, and learn about designing the actual circuits at the transistor level, that’s a whole other can of worms, and my recommendation would depend on what kind of background you have. At least some electrical engineering (circuit analysis) or physics would be necessary to get started, I think. Then you have physical design, the actual polygons, and that requires less math, probably, but it wouldn’t do you much good to learn if you don’t understand the circuits.

diamond.g · Nov 11, 2021

JMacHack said:
Speaking of hash rates, this is a little off-topic but I want to throw in my obsession with Vega and GCN.

NVIDIA Geforce RTX 3090 24GB Mining Hashrate - Perfect Hashrate

NVIDIA RTX 3090 24GB mining hashrate for each algorithm : [ Power Consumption 285 Watts/Hour ]: DaggerHashimoto [ EtHash : (ETH) & (ETC) ] Ethereum Mining Hashrate : 120 MH/s Octopus Mining Hashrate : 85.3 MH/s Kawpow Mining Hashrate 54 MH/s BeamV3 Mining Hashrate : 57 Sol/s GrinCuckatoo32...

perfecthashrate.com

Vs

AMD Radeon VII 16GB Mining Hashrate - Perfect Hashrate

AMD RADEON VII 16GB mining hashrate for each algorithm : [ Power Consumption 290 Watts/Hour ] DaggerHashimoto [ EtHash : (ETH) & (ETC) ] Ethereum Mining Hashrate : 90.56 MH/s Keccak Mining Hashrate : 0.81 MH/s Decred (DCR) Mining Hashrate : 2.35 GH/s Lbry ( LBC ) Mining Hashrate : 0.28 GH/s...

perfecthashrate.com

GCN was crazy good for compute despite being crap for graphics.

And, completely off-topic, but @cmaier what would I have to study to learn about designing circuits for processors. I find it interesting but I doubt I could hack it at the math.

Yeah CDNA is based on GCN but with like none of the rasterization bits. I do wonder what the hashrates of the new MI200 series would be.

theorist9 · Nov 11, 2021

leman said:
Desktop PC makers usually don't care about efficiency that much. SO-DIMM DDR is more ubiquitous, modular and cheaper. Historically, it was also faster, but the recent iterations of LPDDR have caught up in performance (and sometimes even overtaken the desktop DDR).

What's also important is that LPDDR is a more sophisticated device that has additional features. This allows it to be much more power efficient, but also increases the latency by a good amount (seems to be around 20-30% for modern implementations).

In a desktop AS implementation, would DDR's ubiquity, modularity*, reduced latency, and reduced cost provide enough benefit to outweigh LPDDR's efficiency and whatever other benefits it might provide?

*I list DDR's modularity** as a benefit for desktop (e.g., Mac Pro) buyers, but I know you disagree on this. [I also think it would be a marketing benefit to Apple, since I'll bet they'd get a lot of negativity from the pro community if they introduced a Mac Pro with non-upgradeable RAM, unless they provided a compelling architectural explanation for why soldered-in RAM is necessary on a desktop machine.]

**From what I've subsequently learned, LPDDR can also be modular, though that is not its typical implementation, and it appears more complicated to implement than with DDR: https://news.ycombinator.com/item?id=18408496

Kpjoslee · Nov 11, 2021

theorist9 said:
In a desktop AS implementation, would DDR's ubiquity, modularity*, reduced latency, and reduced cost provide enough benefit to outweigh LPDDR's efficiency and whatever other benefits it might provide?

*I list DDR's modularity** as a benefit for desktop (e.g., Mac Pro) buyers, but I know you disagree on this. [I also think it would be a marketing benefit to Apple, since I'll bet they'd get a lot of negativity from the pro community if they introduced a Mac Pro with non-upgradeable RAM, unless they provided a compelling architectural explanation for why soldered-in RAM is necessary on a desktop machine.]

**From what I've subsequently learned, LPDDR can also be modular, though that is not its typical implementation, and it appears more complicated to implement than with DDR: https://news.ycombinator.com/item?id=18408496

Modular LPDDR can be done, but I doubt Apple would bother when there is a more conventional and less costly option: traditional DIMM or SO-DIMM DDR5, if they ever plan on making an AS Mac Pro user-upgradable.
I would also assume DDR5 would be the only option if they are trying to match at least the maximum memory the current Mac Pro offers (1.5TB).

theorist9 · Nov 11, 2021

I like this thread, since I'm getting so many of my questions answered!

Given this, I'd like to ask my three I/O questions:

1) Apple specifies a limit of two external displays on the M1 Pro. Owners of the M1 Pro have confirmed this is a hard limit (there are probably workarounds that allow more, but I'm talking by direct connection).

With 3 x TB4 and 1 x HDMI 2.0, and a powerful GPU, this maximum of two external displays seems like a surprising limitation. Heck, my mid-2014 MBP, with 2 x TB2 and 1 x HDMI 1.4, can (and does) drive three external displays. I.e., it drives displays from each of its video-capable ports.

What is it about the M1 Pro's architecture that explains this limitation? And likewise for the M1 Max, which is limited to three external displays (rather than being able to drive displays from all four of its video-capable ports, like the 16" Intel MBP could).

2) The TB4 standard calls for 40 Gb/s full duplex (bidirectionally). It also includes DisplayPort Alt Mode 2.0, which enables the interface to alternatively support 80 Gb/s unidirectional transmission (see https://en.wikipedia.org/wiki/Thunderbolt_(interface)). However, from what poster Krevnik wrote below, it sounds like neither of these is available in practice, including on the M1 Pro/Max. Is there a consensus that this is the case?
And might we perhaps see one or both of these capabilities in the 2022 iMac Pro/Max's TB4 implementation?

Krevnik said:
First, Thunderbolt 3 and 4 as available on the market are a single 40Gbps full duplex connection. That’s it. While the cable was originally reported by AnandTech as being able to carry two channels, the second channel isn’t used in either TB3 or 4. It’s possible that this second PHY channel is what will allow TB5 to deliver 80Gbps, but for now, both TB3 and 4 are a single connection. So TrippLite's discussion of it is misleading at best.

As for DisplayPort mode, things get a bit more complicated. For DisplayPort 2.0, it uses the Thunderbolt 3 PHY layer to reach the 80Gbps needed. Which means it has to be using both channels in the cable. However, Intel’s current TB4 controllers are DP1.4a compliant: https://www.intel.com/content/www/u...-thunderbolt-4-controller/specifications.html, meaning even though DP 2.0 can be carried over a Thunderbolt 3 active cable, the chips required to handle the alt mode aren’t (yet) here. But with Intel releasing GPUs with DP 2.0 support “Soon(tm)”, I suspect the controller chips can’t be far off either.

3) I've pasted, at the bottom, a nice explanation from usernames need to be uniq for why Apple limited the SD port to UHS-II (mainly, if they went with UHS-III, then UHS-II cards would downgrade to UHS-I; plus UHS-III cards don't exist). But what's the explanation for why Apple decided to limit its HDMI port to 2.0? Was it a bandwidth limitation, a lack of reliable HDMI 2.1 controller chips, or something else?

usernames need to be uniq said:
Sony's A1 has a dual CFe Type A and UHS-II SD card slot. An elegant solution, but only Sony makes a CFe Type A card, and they are expensive and limited in capacity compared to CFe Type B cards. "Most" A1 users rely only on SD cards due to cost and no need for the higher bit rates, and when it was released, they were hard to find.
UHS-III cards don't exist. Even if they did, UHS-III is not backward compatible with UHS-II speeds, i.e. UHS-II cards fall back to UHS-I speed, so it would have been worse for current users to put in a UHS-III slot.
I don't expect UHS-iii cards to be commercially available. Maybe SD Express in a couple of years but CFe Type B will be well entrenched by then and available at a cheaper cost.
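On questions 2 and 3, a quick way to see what a given port can carry is to compare a video mode's uncompressed data rate with the link's usable payload rate. The sketch below uses the standard CTA-861 timing for 3840x2160 at 60 Hz (594 MHz pixel clock), HDMI 2.0's 18 Gbps raw rate (14.4 Gbps after 8b/10b coding), and DisplayPort 1.4 HBR3 (4 lanes x 8.1 Gbps, also 8b/10b). It ignores DSC compression, chroma subsampling and audio, and it says nothing about why Apple chose these ports; it only shows the arithmetic.

```python
# Usable payload after 8b/10b line coding (8 data bits per 10 line bits).
LINKS_GBPS = {
    "HDMI 2.0": 18.0 * 8 / 10,        # 14.4 Gbps
    "DP 1.4 HBR3": 4 * 8.1 * 8 / 10,  # 25.92 Gbps
}


def video_gbps(pixel_clock_mhz: float, bits_per_pixel: int) -> float:
    """Uncompressed RGB/4:4:4 rate; blanking is included via the pixel clock."""
    return pixel_clock_mhz * bits_per_pixel / 1000


MODES = {
    "4K60 8-bit": video_gbps(594, 24),   # 14.256 Gbps
    "4K60 10-bit": video_gbps(594, 30),  # 17.82 Gbps
}

for mode, need in MODES.items():
    for link, have in LINKS_GBPS.items():
        verdict = "fits" if need <= have else "does not fit"
        print(f"{mode} ({need:.2f} Gbps) on {link} ({have:.2f} Gbps): {verdict}")
```

With these inputs, 4K60 at 8 bits just fits in HDMI 2.0, while 4K60 at 10 bits RGB does not, but fits comfortably in DP 1.4 HBR3.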

crazy dave · Nov 11, 2021

theorist9 said:
In a desktop AS implementation, would DDR's ubiquity, modularity*, reduced latency, and reduced cost provide enough benefit to outweigh LPDDR's efficiency and whatever other benefits it might provide?

*I list DDR's modularity** as a benefit for desktop (e.g., Mac Pro) buyers, but I know you disagree on this. [I also think it would be a marketing benefit to Apple, since I'll bet they'd get a lot of negativity from the pro community if they introduced a Mac Pro with non-upgradeable RAM, unless they provided a compelling architectural explanation for why soldered-in RAM is necessary on a desktop machine.]

**From what I've subsequently learned, LPDDR can also be modular, though that is not its typical implementation, and it appears more complicated to implement than with DDR: https://news.ycombinator.com/item?id=18408496

Kpjoslee said:
Modular LPDDR can be done, but I doubt Apple would bother when there is a more conventional and less costly option: traditional DIMM or SO-DIMM DDR5, if they ever plan on making an AS Mac Pro user-upgradable.
I would also assume DDR5 would be the only option if they are trying to match at least the maximum memory the current Mac Pro offers (1.5TB).

The modularity is also an issue though. In order to get the full bandwidth you need a lot of DDR slots and all the slots have to be filled. You have to rely on the user to do that right. This is even more critical if it’s feeding the GPU as well as the CPU.
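To put numbers on the slot-population point: with socketed DDR, aggregate bandwidth scales with the number of populated channels. The sketch assumes a hypothetical 8-channel DDR5-4800 workstation board with one DIMM per channel, each channel 64 data bits wide, and perfect interleaving across whatever is populated; real controllers and mismatched DIMM configurations will do worse. The names and board layout are illustrative.

```python
def ddr_bandwidth_gb_s(channels_populated: int, data_rate_mts: int,
                       data_bits_per_channel: int = 64) -> float:
    """Peak GB/s, assuming traffic is spread evenly over populated channels."""
    bytes_per_transfer = data_bits_per_channel / 8
    return channels_populated * data_rate_mts * 1e6 * bytes_per_transfer / 1e9


TOTAL_CHANNELS = 8  # hypothetical board
RATE = 4800         # DDR5-4800, hypothetical

for populated in (2, 4, 8):
    bw = ddr_bandwidth_gb_s(populated, RATE)
    print(f"{populated}/{TOTAL_CHANNELS} channels: {bw:6.1f} GB/s")
```

Under these assumptions a fully populated board peaks at 307.2 GB/s, and a user who fills only half the channels gets half that, which matters most when the same memory also has to feed a GPU.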

The current rumors are that Apple will still release a new Ice Lake Mac Pro system (there are references to such a system) and later the first AS Mac Pro will actually be a new mini Mac Pro.

Kpjoslee · Nov 11, 2021

crazy dave said:
The modularity is also an issue though. In order to get the full bandwidth you need a lot of DDR slots and all the slots have to be filled. You have to rely on the user to do that right. This is even more critical if it’s feeding the GPU as well as the CPU.

I think just filling up all the slots is going to be easier than dealing with this lol.

crazy dave said:
The current rumors are that Apple will still release a new Ice Lake Mac Pro system (there are references to such a system) and later the first AS Mac Pro will actually be a new mini Mac Pro.

I think Apple might have had an Ice Lake Mac Pro planned initially, but it may never see the light of day.
An AS "Mac Pro" could end up being an entirely new system and might as well drop the Mac Pro name. They can just settle on either a 256GB or 512GB (if they manage to double the density next year) maximum memory configuration on 4x Jade-C.
