Any RAM beyond the UMA on the SoC / SiP / MCM would be secondary RAM, faster than swapping to the SSD, henceforth to be known as AppleCache...
The idea is that the RAM would be split in two: 1) unified memory, either LPDDR5 or HBM, a smaller pool with much higher bandwidth that feeds the SoC directly, and 2) standard DRAM, a larger pool that could reach that 1.5-2 TB limit and would act as a cache for the huge data set. Technically not *necessary*, but it's an approach Apple *could* adopt.
This is where I was going, though whether the on-package or off-package memory is "the cache" is probably a matter of whether you view it like a processor cache or a disk cache.
Either way, this was essentially my point. The on-package RAM has a number of key benefits. Power, certainly, but more important in a desktop is that this memory is lower latency than going out to a big bus of DIMMs, and it is equally accessible to all the on-chip computation units, so there's no need to push GPU or Neural Engine data across a PCIe bus or whatever and then pull the results back.
If you try to pool external RAM into the unified pool, then you'll have address-dependent access profiles (effectively NUMA), which really complicates how the system handles memory.
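To make the complication concrete, here's a minimal sketch in Swift with made-up numbers (the 256 GB / 2 TB split and the latency figures are illustrative assumptions, not Apple specs): in one flat pool, the cost of an ordinary load depends on where the physical address happens to land, and the allocator and scheduler would have to reason about that for every allocation.

```swift
import Foundation

// Illustrative two-tier split; capacities and latencies are assumptions.
let onPackageBytes: UInt64 = 256 << 30   // fast LPDDR5/HBM on package
let totalBytes: UInt64 = 2 << 40         // 2 TB flat unified pool

// In a single flat address space, a plain load's cost depends on which
// tier the physical address lands in, which is effectively NUMA.
func loadLatencyNs(_ physAddr: UInt64) -> Double {
    precondition(physAddr < totalBytes)
    return physAddr < onPackageBytes ? 100.0 : 350.0
}

print(loadLatencyNs(1 << 20))   // lands on package: fast
print(loadLatencyNs(1 << 40))   // lands off package: several times slower
```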
It seems more sensible to use off-package RAM as a distinct pool in the virtual memory stack. There are essentially two ways to look at it: you can think of the package RAM as an L4 cache (or whatever), or you can think of the off-package RAM as essentially a RAM disk for swap. In the end, the difference is academic. I'd imagine that the system would page less-used memory off package, and page needed memory into the package.
This seems simpler than @crazy dave 's idea of massive bandwidth to the off-package RAM, which would be necessary to keep the workload from slowing down depending on address if it were all treated as one massive unified pool. The cores would execute against the on-package RAM at all times, and it would retain unified access for all cores. The cores wouldn't have direct access to off-package RAM; data would be paged into the package before it was accessed.
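As a toy illustration of that model, here's a demand-paging sketch in Swift with LRU eviction, where the on-package pool is the only memory the cores ever touch (the four-page capacity and the LRU policy are assumptions for illustration, not how the real pager would work):

```swift
import Foundation

// Toy model of the "on-package RAM as cache" view: cores only ever see
// resident (on-package) pages; a miss pages data in from the off-package
// pool, evicting the least-recently-used page to make room.
struct PackagePager {
    let capacity: Int            // on-package pages available
    var resident: [Int] = []     // page numbers in LRU order (front = oldest)

    // Returns true on a hit; on a miss, simulates paging in from off-package.
    mutating func access(page: Int) -> Bool {
        if let i = resident.firstIndex(of: page) {
            resident.remove(at: i)
            resident.append(page)     // refresh to most-recently-used
            return true
        }
        if resident.count == capacity {
            resident.removeFirst()    // evict LRU page to off-package RAM
        }
        resident.append(page)         // page in the requested page
        return false
    }
}

var pager = PackagePager(capacity: 4)
for p in [0, 1, 2, 3, 0, 4, 1] {
    print("page \(p):", pager.access(page: p) ? "hit (on-package)" : "miss (paged in)")
}
```

Under this view the off-package DRAM sits between the on-package pool and SSD swap, i.e. the "faster than swapping to the SSD" tier from the top of the thread.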
You might see a difference if you set all the cores to calculating a parallel running sum over the entire address space, but I'd suspect that 256 GB is a sufficiently large local workspace not to be a major bottleneck for most workloads.
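For what it's worth, that worst case is easy to sketch: a parallel running sum streams every element exactly once, so no tier gets any reuse and a large local workspace can't help (the 1M-element array below stands in for a multi-TB data set; the chunking is a made-up example, not a benchmark):

```swift
import Foundation

// Worst case for any caching tier: a parallel running sum that streams
// the entire data set exactly once, so nothing is ever reused.
let data = (0..<1_000_000).map { _ in Double.random(in: 0..<1) }
let chunks = 8
let chunkSize = data.count / chunks
var partials = [Double](repeating: 0, count: chunks)

// Each worker reads its own chunk once; distinct output slots avoid races.
partials.withUnsafeMutableBufferPointer { buf in
    DispatchQueue.concurrentPerform(iterations: chunks) { i in
        let range = (i * chunkSize)..<((i + 1) * chunkSize)
        buf[i] = data[range].reduce(0, +)
    }
}
print("sum =", partials.reduce(0, +))
```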