
sunny5

macrumors 68000
Jun 11, 2021
1,838
1,706
Is it really even necessary to build out that much unified memory? If you have 1.6TB/s access to 256GB of RAM in package, why not page it in from a marginally slower external bus to as many DDR5 DIMMS as you care to load?

The number of workloads that will truly require random access from anywhere to anywhere at minimal latency has to be tiny. As it stands, the purpose of system RAM is to keep the caches full without needing the high bandwidth/low latency you have on chip. Why wouldn't you follow a similar architecture to keep the on-package RAM fresh, without the external memory needing the moderate bandwidth/moderate latency you have to the on-package RAM?
There ARE some people using 1~2TB of RAM or more. It's not for us to decide that they shouldn't use it. Apple already offered 1.5TB of RAM in the Mac Pro, so it's not even a question of WHY.
 

TopToffee

macrumors 65816
Jul 9, 2008
1,070
992
A 4x Max Mac Pro would have some clear potential hardware advantages and disadvantages vs. a PC workstation.

The PC hardware advantages are in the upper-end configurations, and could be addressed if Apple offered add-on RAM and GPU modules. In deciding on this, Apple will certainly look at what percentage of its current Mac Pro sales use the highest RAM and GPU configurations.

Hardware advantages, Mac Pro
  • Extraordinary efficiency
  • Quiet operation
  • Task-specific hardware acceleration, which makes those specific operations run unusually fast
  • High single-core speeds, especially during multi-core operation, when compared to high core-count Intel Xeon and AMD Threadripper chips (the latter need to have reduced clock speeds to avoid overheating, particularly when all cores are running; that's much less of an issue with AS). This would give much faster operation for multi-core apps that can only utilize a limited number of cores.
  • Unified memory gives the GPU access to unusually large amounts of RAM

Hardware advantages, PC workstation
  • Much higher maximum RAM (unless Apple offers add-on RAM modules). A 4X Max will have 256 GB; Ice Lake can have up to 2 TB. Not sure about Threadripper, but it looks like its max is 1 TB.
  • Much higher maximum GPU performance (unless Apple offers add-on GPU modules). A 4X Max should have performance about comparable to a single A6000 desktop chip. Current PC workstations can be configured with up to three of these.

There are also clear software advantages and disadvantages to each, which aren't addressed here.
Out of curiosity, what are the use cases for 1-2TB of RAM?
 

Analog Kid

macrumors G3
Mar 4, 2003
9,360
12,603
There ARE some people using 1~2TB of RAM or more. It's not for us to decide that they shouldn't use it. Apple already offered 1.5TB of RAM in the Mac Pro, so it's not even a question of WHY.
I think you missed my point. The question wasn’t whether it was necessary to have 2TB of RAM. The question was whether it was necessary to have 2TB of unified memory. Read the rest of my comment, it might make more sense:

Is it really even necessary to build out that much unified memory? If you have 1.6TB/s access to 256GB of RAM in package, why not page it in from a marginally slower external bus to as many DDR5 DIMMS as you care to load?

The number of workloads that will truly require random access from anywhere to anywhere at minimal latency has to be tiny. As it stands, the purpose of system RAM is to keep the caches full without needing the high bandwidth/low latency you have on chip. Why wouldn't you follow a similar architecture to keep the on-package RAM fresh, without the external memory needing the moderate bandwidth/moderate latency you have to the on-package RAM?
 
Last edited:

Analog Kid

macrumors G3
Mar 4, 2003
9,360
12,603
This is such an awful YouTube channel that everyone seems to be linking to lately.
I have to agree. I actually watched this one and now wish I had those 15 minutes back. The strained, breathless delivery, the 100% certainty about things that turn out to be opinion, the arbitrary "10% for overhead" hand-waving nonsense, the "Twitter agrees" as a supporting argument.

There wasn't any fresh insight at all; mostly just regurgitating stuff that others, particularly Gurman, have already said.

I feel like there’s a lot more insight and expertise in these forums.
 

singhs.apps

macrumors 6502a
Oct 27, 2016
660
400
Referring to it as "Quadro" is just stupid and confusing too, since that is the name of NVIDIA's workstation GPUs.

The video is just a long winded bunch of speculation.
Absolutely. I have seen some of his videos, and the channel seems like an over-eager amateur fanboy trying to appear sophisticated by passing on stuff he has read online here and there and turning it into a video of a 'leak'.

I laughed when he speculated about a single-slot discrete GPU. Assuming a full-fat 128-core GPU, the power draw should breach 200W at a minimum, perhaps even going as high as 280W (assuming the M1 Max GPU ranges between 50-70W under sustained full load). No way will they be able to fit a single-slot GPU in there with such power requirements.
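A rough back-of-envelope sketch of that scaling (the 50-70W per-GPU figure and the linear scaling with core count are just the assumptions stated above, not published specs):

```python
# Back-of-envelope only: scale an assumed M1 Max GPU power range (50-70 W
# sustained, per the post above) linearly up to a hypothetical 128-core GPU.
m1_max_gpu_watts = (50, 70)   # assumed sustained full-load range for the 32-core GPU
scale = 128 / 32              # hypothetical 128-core part has 4x the GPU cores

low, high = (w * scale for w in m1_max_gpu_watts)
print(f"Estimated 128-core GPU draw: {low:.0f}-{high:.0f} W")  # -> 200-280 W
```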
 

mtneer

macrumors 68040
Sep 15, 2012
3,183
2,715
Since Alder Lake is only 40-50% faster than an M1 Max according to Geekbench, a 4x M1 Max Pro would beat it.

It won’t beat an AMD EPYC in maximum performance. But it will win in single core performance tasks.

Not sure if performance scales so easily by daisy chaining processors. The chip to chip communication overhead will be a drag.
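A toy model of why that overhead matters; the communication fractions here are made up purely for illustration, not measurements of any real interconnect:

```python
# Toy Amdahl-style model: assume some fraction of the work is serialized by
# chip-to-chip communication and doesn't speed up with more dies.
def effective_speedup(dies: int, comm_fraction: float) -> float:
    """comm_fraction is an illustrative, made-up parameter."""
    return 1 / (comm_fraction + (1 - comm_fraction) / dies)

for frac in (0.0, 0.05, 0.10):
    print(f"4 dies, {frac:.0%} overhead -> {effective_speedup(4, frac):.2f}x")
# 0% -> 4.00x, 5% -> 3.48x, 10% -> 3.08x
```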
 
  • Like
Reactions: rhysmorgan

sunny5

macrumors 68000
Jun 11, 2021
1,838
1,706
I think you missed my point. The question wasn’t whether it was necessary to have 2TB of RAM. The question was whether it was necessary to have 2TB of unified memory. Read the rest of my comment, it might make more sense:
I think you are the one who missed my point. Unified memory is not magic, and it still requires a lot of memory for heavy tasks. What you are saying is somewhat similar to this: 16GB of unified memory is equal to 32GB of RAM. What if the task size is huge and requires 32GB of memory? Also, it's better to have more than less. Since the Mac Pro comes in a rack version, you also need to consider way more than 2TB of unified memory if the task size is huge.

Don't forget that unified memory also needs to be used for the GPU. You cannot defy physics.
 
  • Like
Reactions: T'hain Esh Kelch

Jorbanead

macrumors 65816
Aug 31, 2018
1,209
1,438
I think you are the one who missed my point. Unified memory is not magic, and it still requires a lot of memory for heavy tasks. What you are saying is somewhat similar to this: 16GB of unified memory is equal to 32GB of RAM. What if the task size is huge and requires 32GB of memory? Also, it's better to have more than less. Since the Mac Pro comes in a rack version, you also need to consider way more than 2TB of unified memory if the task size is huge.

Don't forget that unified memory also needs to be used for the GPU. You cannot defy physics.
You still missed their point. That's not at all what they were saying. They're not arguing about whether people need 2TB of memory. They're arguing about whether all 2TB needs to be unified memory.

This is something I've thought about too, as a 4x M1 Max chip would result in 128 GB of RAM based on what we know. So my theory is that Apple would also use off-package RAM (as is done with Intel and AMD chips) to achieve 2TB of RAM capacity.
 
Last edited:

Boil

macrumors 68040
Oct 23, 2018
3,478
3,173
Stargate Command
Not sure if performance scales so easily by daisy chaining processors. The chip to chip communication overhead will be a drag.

I would calculate 10% for overhead... ;^p

I think you are the one who missed my point. Unified memory is not magic, and it still requires a lot of memory for heavy tasks. What you are saying is somewhat similar to this: 16GB of unified memory is equal to 32GB of RAM. What if the task size is huge and requires 32GB of memory? Also, it's better to have more than less. Since the Mac Pro comes in a rack version, you also need to consider way more than 2TB of unified memory if the task size is huge.

Don't forget that unified memory also needs to be used for the GPU. You cannot defy physics.

What does a rack version of the Mac Pro have to do with more than 1.5TB of RAM, which is all either version of the 2019 Mac Pro ever supported...?!?

You still missed their point. That's not at all what they were saying. They're not arguing about whether people need 2TB of memory. They're arguing about whether all 2TB needs to be unified memory.

This is something I've thought about too, as a 4x M1 Max chip would result in 128 GB of RAM based on what we know. So my theory is that Apple would also use off-package RAM (as is done with Intel and AMD chips) to achieve 2TB of RAM capacity.

Any RAM beyond the UMA on the SoC / SiP / MCM would be secondary RAM, faster than swapping to the SSD, henceforth to be known as AppleCache...
 

crazy dave

macrumors 65816
Sep 9, 2010
1,453
1,229
Is it really even necessary to build out that much unified memory? If you have 1.6TB/s access to 256GB of RAM in package, why not page it in from a marginally slower external bus to as many DDR5 DIMMS as you care to load?

The number of workloads that will truly require random access from anywhere to anywhere at minimal latency has to be tiny. As it stands, the purpose of system RAM is to keep the caches full without needing the high bandwidth/low latency you have on chip. Why wouldn't you follow a similar architecture to keep the on-package RAM fresh, without the external memory needing the moderate bandwidth/moderate latency you have to the on-package RAM?

You still missed their point. That's not at all what they were saying. They're not arguing about whether people need 2TB of memory. They're arguing about whether all 2TB needs to be unified memory.

This is something I've thought about too, as a 4x M1 Max chip would result in 128 GB of RAM based on what we know. So my theory is that Apple would also use off-package RAM (as is done with Intel and AMD chips) to achieve 2TB of RAM capacity.

Any RAM beyond the UMA on the SoC / SiP / MCM would be secondary RAM, faster than swapping to the SSD, henceforth to be known as AppleCache...

While I don't know what Apple will do, the memory doesn't *need* to be integrated or on-package to be part of a unified memory system. Integrating LPDDR5 on-package lowers the power cost of the memory, but doesn't intrinsically change anything else with respect to unified memory. Standard DDR5 modules could service a unified memory architecture. It's just that there would have to be a very large number of slots, and they'd all have to be filled, to make sure the bandwidth is high enough, especially to run a GPU off of it. So there are downsides, and again I don't know what Apple will do, but DDR5 doesn't *have* to be used as a separate pool in the memory hierarchy or forgone altogether.
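For a sense of scale on that "very large number of slots" point, here's a rough channel count needed to match on-package bandwidth with plain DDR5 DIMMs (the DDR5-6400 figure is an assumption; 400 GB/s is Apple's quoted M1 Max bandwidth, and 1.6 TB/s for a 4x part is the figure used earlier in the thread):

```python
# Rough channel math: how many DDR5 channels would it take to feed a
# unified pool at M1 Max-like bandwidth? (DDR5-6400 assumed here.)
ddr5_channel_gbs = 6400e6 * 8 / 1e9       # one 64-bit DDR5-6400 channel: ~51.2 GB/s
m1_max_gbs = 400                          # Apple's quoted M1 Max memory bandwidth
quad_max_gbs = 4 * m1_max_gbs             # hypothetical 4x part: ~1.6 TB/s

print(f"Channels to match one M1 Max: {m1_max_gbs / ddr5_channel_gbs:.1f}")
print(f"Channels to match a 4x Max:   {quad_max_gbs / ddr5_channel_gbs:.1f}")
# roughly 8 and 31 channels respectively
```

Which is why filling (and routing) that many channels is the downside mentioned above.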
 

sunny5

macrumors 68000
Jun 11, 2021
1,838
1,706
You still missed their point. That’s not at all what they were saying. They’re not arguing if people need 2TB of memory. They’re arguing if all 2TB needs to be unified memory.

This is something I’ve thought about too as a 4x M1 Max chip would result in 128 GB of ram based off of what we know. So my theory is Apple would also use off-package ram (as is done with Intel and AMD chips) to achieve 2TB of ram capacity.
I don't understand. Why not? Also, what if the task itself requires a lot of space then?
 

sunny5

macrumors 68000
Jun 11, 2021
1,838
1,706
What does a rack version of the Mac Pro have to do with more than 1.5TB of RAM, which is all either version of the 2019 Mc Pro ever supported...?!?
It can be used as a rendering farm, a server, and more, with tons of Mac Pros.
 

crazy dave

macrumors 65816
Sep 9, 2010
1,453
1,229
I don't understand. Why not? Also, what if the task itself requires a lot of space then?

The idea is that the RAM would be split in two: 1) unified memory either lpDDR5 or HBM which has a smaller pool but larger bandwidth and feeds the SOC directly and 2) standard DRAM with a larger pool that could go to that 1.5-2 TB limit and would act like a cache for the huge data set. Technically not *necessary* but is an approach Apple *could* adopt.
 

Analog Kid

macrumors G3
Mar 4, 2003
9,360
12,603
Any RAM beyond the UMA on the SoC / SiP / MCM would be secondary RAM, faster than swapping to the SSD, henceforth to be known as AppleCache...
The idea is that the RAM would be split in two: 1) unified memory either lpDDR5 or HBM which has a smaller pool but larger bandwidth and feeds the SOC directly and 2) standard DRAM with a larger pool that could go to that 1.5-2 TB limit and would act like a cache for the huge data set. Technically not *necessary* but is an approach Apple *could* adopt.

This is where I was going, though whether the on package or off package memory is the cache is probably a matter of whether you view it like a processor cache or a disk cache.

Either way, this was essentially my point. The on package RAM has a number of key benefits-- power certainly, but more important in a desktop is the fact that this memory is lower latency than going out to a big bus of DIMMs, and it is equally accessible to all the on-chip computation units so there's no need to push GPU or Neural data across a PCIe bus or whatever, and then pull the results back.

If you try to pool external RAM into the Unified pool, then you'll have address dependent access profiles which really complicates how the system will handle memory.

It seems more sensible to use off-package RAM as a distinct pool in the virtual memory stack. There are essentially two ways to look at it: you can think of package RAM as an L4 cache (or whatever) or you can think of the off package RAM essentially as a RAM-disk for swap. In the end, the difference is academic-- I'd imagine that the package would page less used memory off package, and page needed memory into the package.

This seems simpler than @crazy dave's idea of massive bandwidth to the off-package RAM, which would be necessary to keep the workload from slowing down depending on address if it were all treated as a massive unified pool. The cores would execute against the on-package RAM at all times, and it would retain unified access to all cores. The cores wouldn't have direct access to off-package RAM; it would be paged into the package before being accessed.

You might see a difference if you set all cores to calculating a parallel running sum of the entire address space, but I'd suspect that 256GB is a sufficiently large local workspace to not be a major bottleneck for most workloads.
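A minimal sketch of that paging arrangement, purely to illustrate the idea (the pool size, 16 KiB page size, and LRU policy here are all made up; nothing below describes anything Apple has announced):

```python
from collections import OrderedDict

# Sketch of the two-pool idea above: the cores only ever touch the on-package
# pool; colder pages get spilled out to off-package DIMMs, and are paged back
# in on access. Capacities and the 16 KiB page size are illustrative only.
class TwoTierMemory:
    def __init__(self, on_package_pages: int):
        self.capacity = on_package_pages
        self.on_package = OrderedDict()    # page_id -> data, kept in LRU order
        self.off_package = {}              # larger, slower pool

    def access(self, page_id):
        if page_id in self.on_package:
            self.on_package.move_to_end(page_id)           # mark recently used
        else:
            data = self.off_package.pop(page_id, bytes(16384))
            if len(self.on_package) >= self.capacity:      # evict the coldest page
                cold_id, cold_data = self.on_package.popitem(last=False)
                self.off_package[cold_id] = cold_data
            self.on_package[page_id] = data
        return self.on_package[page_id]

mem = TwoTierMemory(on_package_pages=4)
for page in [0, 1, 2, 3, 0, 4, 5]:   # touching pages 4 and 5 spills 1 and 2
    mem.access(page)
print(sorted(mem.off_package))        # -> [1, 2]
```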
 
Last edited:

crazy dave

macrumors 65816
Sep 9, 2010
1,453
1,229
The on package RAM has a number of key benefits-- power certainly, but more important in a desktop is the fact that this memory is lower latency than going out to a big bus of DIMMs, and it is equally accessible to all the on-chip computation units so there's no need to push GPU or Neural data across a PCIe bus or whatever, and then pull the results back.

Just to push back at one aspect of this: on-package memory doesn’t necessarily have lower latency than normal DRAM, just much lower power for that latency.
 
  • Like
Reactions: Argoduck

Boil

macrumors 68040
Oct 23, 2018
3,478
3,173
Stargate Command
This is where I was going, though whether the on package or off package memory is the cache is probably a matter of whether you view it like a processor cache or a disk cache.

Either way, this was essentially my point. The on package RAM has a number of key benefits-- power certainly, but more important in a desktop is the fact that this memory is lower latency than going out to a big bus of DIMMs, and it is equally accessible to all the on-chip computation units so there's no need to push GPU or Neural data across a PCIe bus or whatever, and then pull the results back.

So, Unified Memory Architecture...? ;^p

If you try to pool external RAM into the Unified pool, then you'll have address dependent access profiles which really complicates how the system will handle memory.

It seems more sensible to use off-package RAM as a distinct pool in the virtual memory stack. There are essentially two ways to look at it: you can think of package RAM as an L4 cache (or whatever) or you can think of the off package RAM essentially as a RAM-disk for swap. In the end, the difference is academic-- I'd imagine that the package would page less used memory off package, and page needed memory into the package.

^^^ This, to avoid any issues with overworked SSDs down the road...?
 
  • Like
Reactions: Argoduck

theorist9

macrumors 68040
May 28, 2015
3,881
3,060
I wonder if ~1 TB of unified memory, which would make an unusual amount of RAM accessible to the GPU, would allow coders to accomplish things not commonly possible with a workstation. That's even more than the 640 GB of GPU RAM in an entire $200k NVIDIA DGX A100 supercomputer module, which contains 8x NVIDIA A100 80 GB GPUs.
 
  • Like
Reactions: Argoduck

Analog Kid

macrumors G3
Mar 4, 2003
9,360
12,603
Just to push back at one aspect of this: on-package memory doesn’t necessarily have lower latency than normal DRAM, just much lower power for that latency.
Is that true? I assumed that once the bus gets longer than the clock wavelength, and you add the buffers and drivers at both ends, there's going to be an associated propagation delay.
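For the raw wire delay, a ballpark comparison (trace lengths, signal speed, and the overall DRAM latency below are rough assumptions, and this ignores the buffers and drivers mentioned above, which add on top of the flight time):

```python
# Ballpark only: extra signal flight time to a DIMM slot vs. on-package DRAM,
# compared against a typical end-to-end DRAM access latency.
signal_speed_m_per_ns = 0.15    # ~half the speed of light in a PCB trace (assumed)
trace_on_package_m = 0.01       # ~1 cm to on-package DRAM (assumed)
trace_to_dimm_m = 0.15          # ~15 cm out to a DIMM slot (assumed)
dram_access_ns = 100            # order-of-magnitude end-to-end DRAM latency

extra_flight_ns = (trace_to_dimm_m - trace_on_package_m) / signal_speed_m_per_ns
print(f"Extra one-way flight time: ~{extra_flight_ns:.1f} ns "
      f"vs ~{dram_access_ns} ns total access latency")
# -> roughly 0.9 ns of extra wire delay against ~100 ns of DRAM latency
```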
 

throAU

macrumors G3
Feb 13, 2012
9,204
7,354
Perth, Western Australia
The PC hardware advantages are in the upper-end configurations, and could be addressed if Apple offered add-on RAM and GPU modules. In deciding on this, Apple will certainly look at what percentage of its current Mac Pro sales use the highest RAM and GPU configurations.

Not so much.

Part of Apple's performance advantage is that the RAM is so close to the SoC. Slots would negate that: the longer data path would make it hard to push the RAM as hard.

There's been talk of moving processing into the RAM chips for years; Apple is halfway to doing this by putting the RAM on the same package as the SoC.

Apple aren't in the business of custom, min/max-tweaked configurations; they just work out a few options that make sense and you pick the one closest to your use case.

Is it "optimal"? No, but it does mean that Apple can get economies of scale in manufacturing, which can help control their costs and can help less-informed consumers with decision making. There's no need to do a massive amount of analysis: you simply work out your constraint (RAM, CPU, or GPU) and buy a configuration that includes enough of THAT. It also gives the end user a more balanced machine, and enables Apple to know what baselines are out there come software upgrade time.
 