
BenTested · macrumors newbie · Original poster · Jan 18, 2021
[Attached screenshot: Screen Shot 2021-01-18 at 4.45.18 PM.png]
 
If you go small or cube size you only need one handle in the middle.
 
Not right now, but once M1 Macs support eGPUs they can support GPU cards, which will likely happen with the M# Pro release.

It is very unlikely that Apple Silicon Macs will ever support eGPUs or any third-party GPUs at all. How do you imagine that happening? Who will write the respective drivers? And why would Apple want to allow third-party GPUs with a programming model that is so much different from their own?

Apple GPUs bring simplicity, stability and predictability. You can write software knowing that your set of assumptions will be correct for any device, from the iPhone to the Mac Pro, and you can explicitly make use of the low-latency CPU/GPU communication in your pro-level apps. And of course, using only their own GPUs allows Apple to further optimize Metal for the hardware, giving developers a console-like level of control over the GPU. Third-party GPUs break this system, for very little practical gain.
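To make the low-latency CPU/GPU communication point a bit more concrete, here is a tiny Swift sketch of the standard Metal pattern it enables (my own illustration, using only the documented .storageModeShared option, nothing Apple Silicon-specific beyond that): with unified memory the CPU and GPU touch the very same allocation, so there is no staging copy at all.

```swift
import Metal

// Minimal sketch: a .storageModeShared buffer is visible to both CPU and GPU,
// so data written by one side is read by the other without an explicit blit.
guard let device = MTLCreateSystemDefaultDevice(),
      let queue = device.makeCommandQueue() else { fatalError("No Metal device") }

let count = 1024
guard let buffer = device.makeBuffer(length: count * MemoryLayout<Float>.stride,
                                     options: .storageModeShared) else {
    fatalError("Buffer allocation failed")
}

// CPU fills the buffer in place...
let values = buffer.contents().bindMemory(to: Float.self, capacity: count)
for i in 0..<count { values[i] = Float(i) }

// ...a compute kernel (pipeline setup omitted for brevity) could now read and
// write the same memory; once the command buffer completes, the CPU sees the
// results directly through `values`, with zero copies either way.
let commands = queue.makeCommandBuffer()
commands?.commit()
commands?.waitUntilCompleted()
```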
 
It is very unlikely that Apple Silicon Macs will ever support eGPUs or any third-party GPUs at all. How do you imagine that happening? Who will write the respective drivers? And why would Apple want to allow third-party GPUs with a programming model that is so much different from their own?

Apple GPUs bring simplicity, stability and predictability. You can write software knowing that your set of assumptions will be correct for any device, from the iPhone to the Mac Pro, and you can explicitly make use of the low-latency CPU/GPU communication in your pro-level apps. And of course, using only their own GPUs allows Apple to further optimize Metal for the hardware, giving developers a console-like level of control over the GPU. Third-party GPUs break this system, for very little practical gain.
I thought Dr. Su said that AMD was still working with Apple. If not on GPUs and drivers, what then? The quote, "The M1 is more about how much processing and innovation there is in the market. This is an opportunity to innovate more, both in hardware and software, and it goes beyond the ISA. From our standpoint, there is still innovation in the PC space – we have lots of choices and people can use the same processors in a lot of different environments. We expect to see more specialization as we go forward over the next couple of years, and it enables more differentiation. But Apple continues to work with us as their graphics partner, and we work with them."

Edit: Source AMD CEO: Interview on 2021...
 
Who will write the respective drivers?
Uhmm... guess who is writing GPU drivers for Intel Macs... it's Apple! So they either re-introduce AMD support for ARM at some point (which I consider unlikely) or they go fully custom with their own dedicated GPUs (more likely). I guess with the first big Apple Silicon Mac Pro we will see PCIe graphics cards with Apple's own dedicated GPU. Those would of course be usable inside an eGPU case, provided it has suitable power delivery (MPX). We might also see updated Blackmagic eGPUs with Apple's silicon inside.
 
Uhmm... guess who is writing GPU drivers for Intel Macs... it's Apple! So they either re-introduce AMD support for ARM at some point (which I consider unlikely) or they go fully custom with their own dedicated GPUs (more likely). I guess with the first big Apple Silicon Mac Pro we will see PCIe graphics cards with Apple's own dedicated GPU. Those would of course be usable inside an eGPU case, provided it has suitable power delivery (MPX). We might also see updated Blackmagic eGPUs with Apple's silicon inside.

An eGPU or third-party graphics card wouldn't be able to use the unified memory architecture, as the bandwidth of Thunderbolt or an internal PCIe slot would be too slow.

For Apple Silicon to use a third-party graphics card, UMA would have to be disabled for the Apple GPU in the Mx SoC.

It's a better option to scale up Apple's CPU and GPU cores and maintain UMA instead of using traditionally bottlenecked graphics over PCIe/Thunderbolt.
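Some very rough peak numbers I'm assuming here (back-of-the-envelope figures, not measurements) show the kind of gap I mean:

```swift
// Approximate peak figures, just to show the order-of-magnitude gap.
let thunderbolt3 = 40.0 / 8.0            // 40 Gb/s link        -> ~5 GB/s
let pcie4x16     = 16.0 * 2.0            // ~2 GB/s per lane    -> ~32 GB/s
let m1Uma        = 128.0 / 8.0 * 4.266   // 128-bit LPDDR4X-4266 -> ~68 GB/s
print(thunderbolt3, pcie4x16, m1Uma)
```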
 
Of course a dedicated GPU won't participate in UMA. But why should UMA be disabled when using a dedicated GPU? Doesn't make sense.

Apple will use a similar set-up to what they did on Intel Macs with dedicated GPUs. They will use both an integrated GPU and a dedicated GPU. The iGPU will be part of UMA, the dGPU will not. The main difference probably will be that the iGPU will be fully usable when the dGPU is active, unlike Intel's Quick Sync setup, where the iGPU is only used for video acceleration. This way they can assign tasks that benefit from large bandwidth to the UMA-attached iGPU and tasks that benefit more from raw processing power to the dGPU.

I also seriously doubt that future Apple Silicon Mac Pros will solely rely on the UMA. We might still see some on-package RAM but I bet there will be classic DIMM slots as well. Imagine how large the SoC package would be with 128GB and upwards on it.
 
Of course a dedicated GPU won't participate in UMA. But why should UMA be disabled when using a dedicated GPU? Doesn't make sense.

I said only for graphics. For an early example, look at the Silicon Graphics Wintel workstations that had UMA. With SGI's graphics chip they had full UMA; with a third-party Quadro card, no UMA for graphics.

That meant using third-party graphics gave no benefit over a much cheaper PC. They realized how bad that was for business and quickly went back to MIPS workstations and SGI graphics solutions only (but by then it was too late for them).

To maintain good profit margins and avoid comparisons with PCs, Apple should go fully proprietary.

A Cube-sized model with a 12-core CPU and 12-core GPU.

A tower model starting with a 16-core CPU and 32-core GPU. The Apple GPU could be on an upgradable daughterboard with UMA support via a proprietary high-bandwidth interconnect.

BTW, a 72-core Apple GPU on the 3nm process would be a smaller chip than Nvidia's 3080 and draw about 100 W less power with similar performance.
 
It is very unlikely that Apple Silicon Macs will ever support eGPUs or any third-party GPUs at all.

Sounds plausible, but how about Apple-designed GPUs? I could imagine an extension card containing, let's say, 32 extra GPU cores, or 64, that wouldn't fit onto the M# chip.

Maybe not only GPU but a variety of options? 64 extra neural cores? Or a mix of 16 CPU, 16 GPU, 16 neural cores?
 
I thought Dr. Su said that AMD was still working with Apple. If not on GPUs and drivers, what then? The quote, "The M1 is more about how much processing and innovation there is in the market. This is an opportunity to innovate more, both in hardware and software, and it goes beyond the ISA. From our standpoint, there is still innovation in the PC space – we have lots of choices and people can use the same processors in a lot of different environments. We expect to see more specialization as we go forward over the next couple of years, and it enables more differentiation. But Apple continues to work with us as their graphics partner, and we work with them."

Edit: Source AMD CEO: Interview on 2021...

My understanding of this would be that Apple is still currently shipping AMD graphics in some of their Intel machines, and going by Apple's traditional timeline, I wouldn't be surprised if some Intel Macs get new AMD GPUs this year before moving on to Apple Silicon next year. Note how vague her language is; it's not clear whether she is talking about current products or future products.

Anyhow, of course all I say is mere speculation on my part, but it's based on Apple's developer documentation and the information they released during WWDC. Specifically, they have repeatedly mentioned the following in regard to the Apple Silicon transition:

- Apple Silicon Macs will use UMA (unified memory architecture)
- Apple Silicon Macs will use Apple GPUs
- Apple Silicon Macs will use TBDR GPUs

And of course, there are always interpretations. One might say that they were talking about M1 Macs only and that future Apple Silicon Macs won't have these properties. I do not favor this interpretation, as Apple's goal seems to be the simplification and streamlining of the programming model. I just don't see them retaining heterogeneous hardware and capabilities; it's against their design goal.

I guess with the first big Apple Silicon Mac Pro we will see PCIe graphics cards with Apple's own dedicated GPU. Those would of course be usable inside an eGPU case, provided it has suitable power delivery (MPX). We might also see updated Blackmagic eGPUs with Apple's silicon inside.

I suppose this is something we disagree about. You see, I don't ever see Apple making a "traditional" dedicated GPU. They have been advertising UMA as one of the main selling points of Apple Silicon Macs, and a dGPU just throws that out of the window. The Apple Silicon architecture, with its many heterogeneous processors, needs low-latency inter-processor communication to fully work its magic. And given that UMA is particularly valuable for professional workflows, I would expect high-end Macs to fully embrace it. Finally, UMA is what truly makes Apple Silicon special. Not so much on the M1 (after all, it's not much different from what Intel and AMD have been doing for years), but on a high-end Mac? Just imagine a Mac Pro with a 1024-bit DDR5 interface: an aggregate bandwidth of 800 GB/s available to the CPU/GPU/ML accelerators, low RAM latency, zero-copy data sharing... there is nobody on the market who would be able to compete with that, except maybe AMD, who have the tech to pull it off.

To reiterate, I believe that Apple is targeting a streamlined programming model and a common set of hardware guarantees. That's what makes Apple Silicon uniquely attractive from the developer's perspective: you don't need to think much about different hardware. You can run, deploy, and test the same code on the iPhone and the Mac Pro, knowing that the only major difference is performance. Having some machines with UMA and some without completely destroys this utopia. I just don't see Apple being lazy with this stuff; they really have a shot here at making a qualitative change to high-performance computing. It's what technical folks at Apple have been dreaming of for decades, and they are not going to throw this chance away.

The main difference probably will be that the iGPU will be fully usable when the dGPU is active, unlike Intel's Quick Sync setup, where the iGPU is only used for video acceleration.

Just a quick note, but you can use the iGPU and dGPU simultaneously on Intel Macs. Metal exposes each graphics device separately and you can target it directly. I have done it myself.
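For anyone curious, this is roughly how it looks (standard Metal device enumeration; the low-power check for picking out the iGPU is just my heuristic):

```swift
import Metal

// On a dual-GPU Intel Mac, MTLCopyAllDevices() returns every Metal device.
// The integrated GPU reports isLowPower == true; the discrete one does not.
let devices = MTLCopyAllDevices()
let integrated = devices.first { $0.isLowPower }
let discrete   = devices.first { !$0.isLowPower && !$0.isRemovable } // skip eGPUs

// Each device gets its own command queue, so work can be submitted to both
// GPUs concurrently and split between them by task.
let igpuQueue = integrated?.makeCommandQueue()
let dgpuQueue = discrete?.makeCommandQueue()
```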

This way they can assign tasks that benefit from large bandwidth to the UMA-attached iGPU and tasks that benefit more from raw processing power to the dGPU.

Or they can just have one big GPU on their chip interconnect and benefit from both high performance and low latency.

I also seriously doubt that future Apple Silicon Mac Pros will solely rely on the UMA. We might still see some on-package RAM but I bet there will be classic DIMM slots as well. Imagine how large the SoC package would be with 128GB and upwards on it.

I am curious to see how they will solve it. There are a few options I envision. For example, there is no technical reason why the memory has to be on-package; they could also use traditional DRAM for the Mac Pro. The current Pro has 12 RAM slots: give each of those an independent memory channel and you've got yourself a 768-bit memory bus. With DDR5 this gives around 600 GB/s of bandwidth without sacrificing latency. Not quite at the level of a 3090 Ti, but fast enough for pro-level stuff.
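Spelling out the arithmetic (assuming one independent 64-bit channel per slot and DDR5-6400; those are my assumptions, not anything announced):

```swift
// Back-of-the-envelope math behind the ~600 GB/s figure.
let slots = 12
let busWidthBits = slots * 64                        // 768-bit aggregate bus
let ddr5TransfersPerSec = 6.4e9                      // DDR5-6400
let bandwidthGBps = Double(busWidthBits) / 8 * ddr5TransfersPerSec / 1e9
print(busWidthBits, bandwidthGBps)                   // 768, ~614 GB/s
```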

Overall, I would expect to see some sort of NUMA-based implementation on the Mac Pro, where you can combine different extension modules, each with their own CPU/GPU/RAM. Metal already introduced the basics of such an API some time ago: devices connected by a fast link are assigned to peer groups, within which resources can be shared and transferred efficiently.
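To be clear about which Metal API I mean (it exists today for multi-GPU Intel Macs; whether Apple extends it to a modular Apple Silicon Mac Pro is pure speculation on my part):

```swift
import Metal

// Devices linked by a fast interconnect (e.g. Infinity Fabric Link on the
// 2019 Mac Pro) report a common peerGroupID; a NUMA-style scheduler could
// keep work and data within one such group.
for device in MTLCopyAllDevices() where device.peerGroupID != 0 {
    print(device.name,
          "group:", device.peerGroupID,
          "peer \(device.peerIndex + 1) of \(device.peerCount)")
}
```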
 
It looks like you are considering UMA as something very special, groundbreaking and new. It's not! UMA is what is known as shared memory in the PC world. The only difference in Apple's implementation is that there are more components accessing the same memory and the lack of a fixed memory assignment to those components. The performance gains come from the on-package memory, which dramatically reduces latency and in turn brings real-life data throughput much closer to the theoretical values (LP)DDR RAM is capable of. As soon as Apple pulls that memory off package, the performance benefit is mostly gone. That's why I am 100% sure we will see a mix of on-package UMA plus DIMM slots for expansion in Apple Silicon Mac Pros. It will obviously need some sophisticated memory management to get the most out of that.

Just a quick note, but you can use the iGPU and dGPU simultaneously on Intel Macs. Metal exposes each graphics device separately and you can target it directly. I have done it myself.

It depends on how the iGPU is implemented. On a MacBook, where both the iGPU and dGPU are usable for graphics output, it is possible. On a Mac where the iGPU is only used for Quick Sync (iMac), it is not possible without tinkering, because Metal acceleration is not enabled for iGPUs that are configured with a connectorless ig-platform-id. You can work around that using the WhateverGreen kext, though.
 
It looks like you are considering UMA as something very special, groundbreaking and new. It's not! UMA is what is known as shared memory in the PC world.

I am well aware of that. That's also why I wrote that the M1's memory implementation is not any different from what Intel and AMD have been doing for years. But shared memory is practically non-existent in the high-performance segment (save for some specialized HPC clusters), because high-performance shared memory is expensive and prevents modular mixing and matching of components. By going fully custom, Apple is in a position to bring UMA and its many benefits to the high end.

The performance gains come from the on-package memory, which dramatically reduces latency and in turn brings real-life data throughput much closer to the theoretical values (LP)DDR RAM is capable of.

I am not sure that the on-package LPDDR4X in the M1 actually offers that much of a performance benefit. It's still dual-channel LPDDR4X (with four 16-bit subchannels per chip), and its bandwidth as well as latency are not any different from a Tiger Lake system (AnandTech did a detailed review of both systems). My guess is that on-package RAM is more of a power- and cost-saving optimisation at this point. What is more important for performance is that a single Apple CPU core can saturate the full available bandwidth on its own, while on Tiger Lake you need to use multiple cores simultaneously.

But this is where Apple's UMA implementation only starts getting interesting. If they continue on their path of providing a separate memory channel for each RAM chip, they will get linear scaling, essentially building a "poor man's HBM" without sacrificing latency. The implications for HPC are substantial. Computer system memory is traditionally split into "low latency, low bandwidth" on the CPU side and "high latency, high bandwidth" on the GPU side, with extremely slow data transfer between the two. Apple has the chance to streamline this into "low latency, high bandwidth" across the board, with zero-cost data transfer. This would allow entirely new applications and new levels of performance. And frankly, it is essential to their heterogeneous computing paradigm.

On a Mac where the iGPU is only used for Quick Sync (iMac), it is not possible without tinkering, because Metal acceleration is not enabled for iGPUs that are configured with a connectorless ig-platform-id.

Ah, that's what you mean. Well, I didn't even consider that case since the iGPU on Mac desktops is essentially disabled.
 
I am not sure that the on-package LPDDR4X in the M1 actually offers that much of a performance benefit. It's still dual-channel LPDDR4X (with four 16-bit subchannels per chip), and its bandwidth as well as latency are not any different from a Tiger Lake system (AnandTech did a detailed review of both systems). My guess is that on-package RAM is more of a power- and cost-saving optimisation at this point.
This is true.
I would be surprised if Apple went with a very wide DIMM-based memory system for the higher-end desktops, even though it would be technically possible. I would find it preferable (and more in line with Apple's stated goals) to compromise a bit on maximum RAM (since the systems would have high-bandwidth SSDs connected over PCIe 4 or better) and prioritise high RAM bandwidth instead. The HBM route, basically.
The less exciting, more constrained, but a bit cheaper alternative would be to use a wide path to LPDDR5, which would actually make a lot of sense for anything but the highest performance tier. After all, a 256-bit interface to LPDDR5 would increase bandwidth by a factor of three over the M1, which would go a long way. It would still be limited compared to dedicated GPUs, however, but I'm not sure how invested Apple is in competing with high-end GPUs directly.
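The factor of three checks out roughly if I assume LPDDR5-6400 (the numbers below are my assumptions):

```swift
// Quick sanity check of the "factor of three" claim.
let m1Bandwidth   = 128.0 / 8 * 4.266   // 128-bit LPDDR4X-4266 -> ~68 GB/s
let wideBandwidth = 256.0 / 8 * 6.4     // 256-bit LPDDR5-6400  -> ~205 GB/s
print(wideBandwidth / m1Bandwidth)      // ~3.0
```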

Not having to pay Apple prices for RAM would be nice. But if we have to do that, I sure would appreciate it if we at least got our money's worth in terms of speed.
 
I would be surprised if Apple went with a very wide DIMM-based memory system for the higher-end desktops, even though it would be technically possible. I would find it preferable (and more in line with Apple's stated goals) to compromise a bit on maximum RAM (since the systems would have high-bandwidth SSDs connected over PCIe 4 or better) and prioritise high RAM bandwidth instead. The HBM route, basically.

HBM is a very wide-bus DDR memory; it's the width of the bus that allows for high bandwidth (HBM uses a 1024- or even 2048-bit bus). If you wire 16 DDR5 chips in "parallel", giving each one its own memory controller, you get a 1024-bit bus, for example, with a very healthy bandwidth. If I understand it correctly, the difficulty is actually implementing such a wide interface on a logic board, since the physical space is limited. Apple already kind of uses its own version of the interposer technology (the package). So yeah, I am also skeptical whether slotted DIMMs are going to be realistic. But some sort of on-package DDR stacking with many chips should be doable, even if the package itself ends up being humongous.

Please keep in mind that I am a total noob when it comes to memory, so I might be getting all of this terribly wrong... but that's how I understand it. Apple needs high-bandwidth memory to feed their processors, and the only way to achieve that without sacrificing energy efficiency, as far as I understand, is wide interfaces.

The less exciting, more constrained, but a bit cheaper alternative would be to use a wide path to LPDDR5, which would actually make a lot of sense for anything but the highest performance tier. After all, a 256-bit interface to LPDDR5 would increase bandwidth by a factor of three over the M1, which would go a long way.

Yeah, that's what I would expect them to do on laptops. I would bet that M1X will use four LPDDR4X (or even LPDDR5) RAM modules instead of two, with a 256-bit bus.

It would still be limited compared to dedicated GPUs, however, but I'm not sure how invested Apple is in competing with high-end GPUs directly.

Apple GPUs need less bandwidth for graphics, but they still need bandwidth for compute work. If Apple is serious about replacing all the Vega Duos in the MP with their own solution, they will need really fast memory.

Not having to pay Apple prices for RAM would be nice. But if we have to do that, I sure would appreciate it if we at least got our money's worth in terms of speed.

I agree. Actually, if memory serves me right, I predicted a couple of years ago that Apple would eventually use HBM (or something similar) as their main RAM. Using custom hardware finally opens up that path for them, and their CPUs seem to be designed to take full advantage of high memory bandwidth, unlike the competition.
 
The initial hypothesis of the thread was that the small Mac Pro might use an ITX form factor. That would fit the size indicated (less than half of the vanilla Pro, whatever that would be), but I wonder if the concept would be the same as the typical ITX Windows build.

There have also been rumours about a "Mac Mini Pro", and I wonder if that machine and the "Mac Pro Mini" might not be one and the same. If so, we might be talking about a machine that's more a beefy Mini than a cut-down Pro. Basically, something without PCIe-slot expandability, with the larger-than-Mini form factor allowing good, quiet cooling. Think an Xbox Series X, but cut down to a cube and cooling below 100 W.

It will be interesting to see, regardless. But like you, judging from what was said at WWDC, I doubt that it would allow plugging in dGPUs. In which case, an ITX form factor is a bit pointless.
 