
johngwheeler

macrumors 6502a
Dec 30, 2010
639
211
I come from a land down-under...
This may sound super funny to the people with knowledge, but if they increase the size of the chip they should get more power, right? Like make it 3 times bigger than an iPhone A chip and get 3x the performance of the smaller chip?

There are diminishing returns to increasing chip size, and you still need to deal with the heat they generate (made worse by high transistor density with smaller fabrication sizes).

With more area you can:

1) Add more CPU cores - useful for some apps but not all
2) Add more GPU cores - same as above - useful, but not for every task.
3) Add more on-chip memory / cache - has a beneficial effect for most apps.
4) Add more custom features (e.g. Machine Learning, image processing etc. ) - useful for specific apps.

Performance increases will depend on the application, and probably won't be linear.
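As a rough illustration of why the scaling isn't linear, here is a small Swift sketch of Amdahl's law. The 10% serial fraction is purely an assumed example, not a measured figure for any Apple chip.

```swift
import Foundation

// Amdahl's law: speedup from N cores when a fraction of the work stays serial.
// The 0.10 serial fraction below is an illustrative assumption only.
func amdahlSpeedup(cores: Double, serialFraction: Double) -> Double {
    1.0 / (serialFraction + (1.0 - serialFraction) / cores)
}

for cores in [1.0, 2.0, 4.0, 8.0, 16.0] {
    let speedup = amdahlSpeedup(cores: cores, serialFraction: 0.10)
    print("\(Int(cores)) cores -> \(String(format: "%.2f", speedup))x speedup")
}
// Tripling the core count never triples throughput once any serial work exists:
// with a 10% serial fraction, 4 cores give ~3.1x and 16 cores only ~6.4x.
```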
 

deconstruct60

macrumors G5
Mar 10, 2009
12,493
4,053
I still haven't seen just how many cores are needed in Apple Silicon to approximate the same GPU experience of the Radeon Pro 5700 XT in the high end iMac?

Is Apple even trying to cover the mid-to-high-end discrete desktop GPU space? It really doesn't make much sense. There are two major paths to covering that area. One is to very straightforwardly add an x16 PCIe 4.0 controller to the SoC. You don't have to fabricate any massive die whatsoever; that PCIe controller is a small fraction of that size.
The second path is creating a GPU die that is larger than your whole SoC, so roughly twice as much work as designing your SoC was.

So what is the payoff? Half of the Mac lineup doesn't have any dGPU at all, so that is lots of work for zero Macs augmented there. The MBP 16" and iMac 21-24" use substantially smaller dGPUs. If Apple managed to squeeze all of the 3rd-party GPU options out of the Mac laptops, that would be a huge volume reduction in component parts acquisition. How much of an impact it would have in the iMac space depends on what the ratio of the 21-24" vs the 27" models is. Even if the smaller iMac is 30% of iMac sales, that would still be a sizable chunk of money shifted to Apple from the 3rd-party GPU component spend. (The biggest revenue (and unit volume) hit here is Intel, which is losing out on all of the iGPUs that make up the large bulk of all Mac sales.)

What Apple needs is something in the range of an AMD 555X up to the Vega 20 (the small iMac space; that would cover the MBP 16" also). The Pro Vega 20 has 20 CUs (1280 shader cores). The 555X has 12 CUs (768 shader cores).

I suspect Apple's "big" GPU is more in the 555X zone. First, if we stick doggedly to what Apple has shown, the WWDC 2020 slides often mention the benefits of their "Unified Memory". All the current Metal drivers for Apple GPUs are written with unified memory thoroughly implicit in the code. To do a discrete GPU with its own non-unified VRAM, there would have to be a substantively different Metal family for the Apple GPU. That is missing from the chart just as much as AMD/Intel/Nvidia were missing on the Apple Silicon side. It isn't permanent, but it also likely isn't short term either. (WWDC 2021 may bring changes on that front, just as it probably does for 3rd-party GPUs on the Apple Silicon side.) Apple probably wants developers to optimize for Apple's new iGPU solution to the maximum extent possible before adding any distractions. (It also makes running iPhone apps in 'emulation' mode much easier.)

The GPUs for the top end of the iMac 27" build-to-order options, the iMac Pro, and the Mac Pro are probably collectively relatively low volume relative to the rest of the Mac lineup sold per quarter (and per year). Almost certainly the iMac Pro and Mac Pro are. There is about zero need for Apple to chase that small space if there are viable 3rd-party options available to work with. It also gets farther and farther away from the "Unified Memory" model that is the primary basis for their A-series efforts.

I can see Apple doing a roughly 555X-sized chiplet that still uses the main RAM as the primary store, but perhaps with a "largish" HBM cache (only) to hold framebuffers and larger textures. That would take some of the bandwidth pressure off the main RAM controllers but still not have to change the Metal family model much.


Their full-size card says 2560 cores. I know the iMac is using the mobile edition; still, it makes one question the simplicity of the graphics-related cores in the iPad Pro's ARM chip compared to the high-end iMac's processor and GPU. I am not being negative here, just questioning what it takes to be equivalent?

It isn't just a matter of "cut and pasting" more cores. At the 2000-core level it is going to be extremely hard to still have "Unified Memory". The GPU will probably at that point need its own primary memory store, which means full GDDR6 or HBM memory controllers, a flexible interconnect to the CPU SoC package, etc.

The Apple GPU cores are simpler because they just don't have some features. FP64 (not even at a reduced ratio ... it's just not there). No memory controller. In the iPad SoC, no way to drive 4-6 monitors. Etc.

But are Apple’s ARM chips actually powerful enough now to replace the likes of Intel and AMD? That’s still an open question — because at Apple’s 2020 Worldwide Developers Conference (WWDC), the company shied away from giving us any definitive answers.

In the laptop space? That has pretty much been demonstrated. Up through the MBP 16" space it is more a matter of Apple just executing a straightforward evolution. That pretty much covers the Mac Mini also (although the Mini has more I/O (more ports) demands and some more latitude on thermals).

The top-end Mac Pro would probably be a large departure from what Apple has done to date. It wouldn't be surprising at all, though, if it too had some sort of iGPU built into the die (or at least some chiplet that presented very similarly to an iGPU at the driver level). That would be the lowest-common-denominator GPU. There would be MPX slots for those who needed a "big" GPU.
 
  • Like
Reactions: 2Stepfan

deconstruct60

macrumors G5
Mar 10, 2009
12,493
4,053
There is a rumor of a dGPU code-named "Lifuka". Whether this is a separate package or an optional GPU chipset that is part of the Apple Silicon SoC remains to be seen. I would expect any future ASi Mac Pro to have separate PCIe boards with some kind of dGPU or accelerator like the Afterburner.

Lifuka is an island that is part of Tonga. The rumors on the Internet labeled the A14/A14X-class SoCs as Tonga. It is not clear at all whether Lifuka is a dGPU or just another variant of the smaller SoC (A14X) with a much larger GPU attached. Add some more memory channels/controllers and much bigger caches and it could all still be an Apple iGPU.

By "iMac" also they could be just talking about the zone that the iMac 21-24" plays in. Not the top end of the 27" GPU options.
 

deconstruct60

macrumors G5
Mar 10, 2009
12,493
4,053
It's most likely an IP that can be packaged in multiple silicon formats. For higher-end applications (such as the Mac Pro) I expect them to use a NUMA hierarchy with multiple CPU+GPU boards and fast interconnect. Really curious to see how they solve it anyway.

macOS generally doesn't deal that well with NUMA. There is some modest, legacy support there but it isn't optimized in a relatively significant way (compared to Linux, Windows Server, or Solaris). That is one of the primary reasons why Apple goes on and on about how great their Unified Memory SoC solution is: it is basically the opposite of NUMA. That is where Apple talks the most about going (at least over the immediate term).

How will the Mac Pro talk to GPU boards? PCIe 4.0 (or perhaps 5.0 if it takes the maximal amount of time to arrive). Putting aside the Mac Pro 2013, the Mac Pro is primarily defined around having PCIe slots. There would have been no sense in going back to "overkill" in that aspect (8 slots, up from 4) if Apple Silicon was going to turn right around and go in the opposite direction.

The Mac Pro probably will have an iGPU (or at least an embedded GPU on package) in the SoC. The other new thing Apple is keen on is running iOS / iPadOS apps on the new Macs. That's going to be substantively easier if there is an Apple GPU that looks like an iPad/iOS GPU when the apps run (since that is what the programmers have written the code for).


Given how much more power-efficient Apple cores are at lower frequencies, outperforming the Xeons is only a question of core packaging and interconnect (neither of which is trivial, of course). But assuming Apple can solve the practical issues —

Lots more cores with no corresponding robust I/O infrastructure to go along with them is only going to beat a Xeon on largely "cache only" benchmarks.

and they have enough talent and money to do that —

The money isn't infinite. Spending a large chunk of the Scrooge McDuck money pit extremely likely isn't going to happen. Mac Pros don't make that much money for Apple. As for talent, they can only spread their talent so thin. There is a finite number of chip designers they have, and several more SoCs to do that are more strategic.


An Apple workstation CPU with 32 cores — assuming Apple can make one — would be far beyond Intel's ability.

Errrr. For a desktop, which is plugged into the wall, it won't be "far beyond". Apple has skipped around lots of issues that will kick in with Amdahl's law when they move to larger problems and far more robust I/O demands.
AMD's and Intel's 32+ core CPUs are tuned to different workloads than Apple has chased to date. Apple will probably do something that is competitive for the most prominent and common workloads of the classic Mac Pro, but that isn't really a huge overlap with what AMD/Intel are doing.
 

leman

macrumors Core
Oct 14, 2008
19,530
19,709
2) Add more GPU cores - same as above - useful, but not for every task.

Increasing the number of GPU cores is the most straightforward way of increasing GPU performance, especially when we are talking about GPUs as small as those in the A series. As GPUs are massively parallel processors, increasing the number of individual processors always works (assuming the memory subsystem can cope with it).


macOS generally doesn't deal that well with NUMA.

How would you even know? Apple never used NUMA architecture in their computers. At any rate, these are things that can be tweaked in.


That is one of the primary reasons why Apple goes on and on about how great their Unified Memory SoC solution is: it is basically the opposite of NUMA. That is where Apple talks the most about going (at least over the immediate term).

Frankly, I don't see any way for them to support unified memory on a Mac Pro class computer without adopting some variant of NUMA. Making a SoC powerful enough is not feasible (at least not on higher performance levels). Maintaining the modular design while also keeping unified memory is not feasible. This is why I believe we will see modules that contain CPU + GPU clusters, with shared on-board memory. These devices will be organized into "local groups" of some sort and you will have APIs to discover such groups and schedule work on them. Transferring data between groups is going to be slower.

Apple already has elements of such an API with their Metal Peer Groups for multi-GPU systems.
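As a concrete reference point, macOS already lets you enumerate GPUs and discover their Metal peer groups. A minimal sketch below; the properties are the real MTLDevice API (macOS 10.15+), while the grouping and printing logic is just an illustration.

```swift
import Metal

// Enumerate all visible GPUs and bucket them by Metal peer group.
// A peerGroupID of 0 means the device is not part of any peer group.
let devices = MTLCopyAllDevices()
let peerGroups = Dictionary(grouping: devices.filter { $0.peerGroupID != 0 },
                            by: { $0.peerGroupID })

for (groupID, members) in peerGroups {
    print("Peer group \(groupID):")
    for gpu in members {
        print("  \(gpu.name) (peer \(gpu.peerIndex + 1) of \(gpu.peerCount))")
    }
}
```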


How will the Mac Pro talk to GPU boards?

Why do you think Apple developed MPX modules in the first place? They are free to implement whatever interface they see fit. Also check out how they implement Infinity Link connections — a future Apple Silicon Mac Pro could use something similar to provide fast data transfer between modules.


Lots more cores with no corresponding robust I/O infrastructure to go along with them is only going to beat a Xeon on largely "cache only" benchmarks.

Of course they would need more robust I/O. Why do you assume that they won't have it? This is not something that was important for the mobile SoC, but it is essential with the Macs. We already pretty much know that Apple is working on a custom I/O controller that provides better memory isolation to connected devices (it's in the video linked by the OP), and we know that Apple Silicon Macs will support high-performance external devices such as SCSI controllers.

I suspect Apple's "big" GPU is more in the 555X zone. First, if we stick doggedly to what Apple has shown, the WWDC 2020 slides often mention the benefits of their "Unified Memory". All the current Metal drivers for Apple GPUs are written with unified memory thoroughly implicit in the code. To do a discrete GPU with its own non-unified VRAM, there would have to be a substantively different Metal family for the Apple GPU.

[...]

It isn't just a matter of "cut and pasting" more cores. At the 2000-core level it is going to be extremely hard to still have "Unified Memory". The GPU will probably at that point need its own primary memory store, which means full GDDR6 or HBM memory controllers, a flexible interconnect to the CPU SoC package, etc.

You don't need dedicated GPU memory to have a high-performance GPU. As Apple's graphics guy Gokhan Avkarogullari says, "bandwidth is the function of ALU power". You need a certain amount of bandwidth to increase your GPU performance, and there are multiple ways of getting there. You can use faster system RAM, more caches, memory compression, etc. LPDDR5 already offers over 100 GB/s of bandwidth — this is competitive with GDDR5. For higher-end applications, Apple will need to use something faster or utilize more channels. Since they control the platform, there are many paths they can take here.

And let's not forget that Apple GPUs need significantly less bandwidth to provide the same level of performance.
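To put rough numbers on that, here is a back-of-the-envelope bandwidth calculation in Swift. The channel count and transfer rate are illustrative assumptions (LPDDR5-6400 on a 128-bit bus), not a claim about any particular Apple part.

```swift
// Back-of-the-envelope memory bandwidth: channels * bus width * transfer rate.
// The figures below are illustrative assumptions, not any actual Apple spec.
func bandwidthGBps(channels: Int, busWidthBits: Int, megaTransfersPerSec: Double) -> Double {
    Double(channels) * Double(busWidthBits) / 8.0 * megaTransfersPerSec * 1e6 / 1e9
}

// e.g. LPDDR5-6400 with 8 x 16-bit channels (a 128-bit bus in total):
let lpddr5 = bandwidthGBps(channels: 8, busWidthBits: 16, megaTransfersPerSec: 6400)
print("~\(Int(lpddr5)) GB/s")   // ~102 GB/s, in the same ballpark as GDDR5 cards
```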


The Apple GPU cores are simpler because they just don't have some features. FP64 (not even at a reduced ratio ... it's just not there). No memory controller. In the iPad SoC, no way to drive 4-6 monitors. Etc.

Why pay for features that don't make any sense? No current GPU has good support for FP64 anyway, not that it's needed. If you need more precision — roll your own extended-precision data structure. It's still going to be faster than native FP64 on most hardware and you can optimize it for your needs. Memory controllers — why have a separate one on the GPU if you could have a multi-channel one at the SoC level? Things like multi-monitor support can be easily added depending on the platform needs.
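As a concrete example of "rolling your own" extended precision, here is a minimal double-float sketch in Swift using Knuth's classic two-sum. Real GPU code would do this in a shader; the type and function names here are just illustrative.

```swift
// A "double-float": two Float values whose unevaluated sum carries roughly
// twice the precision of a single Float. A classic error-free transformation.
struct DoubleFloat {
    var hi: Float
    var lo: Float
}

// Knuth's two-sum: returns a + b as a rounded head plus the exact rounding error.
func twoSum(_ a: Float, _ b: Float) -> DoubleFloat {
    let s = a + b
    let bPrime = s - a
    let err = (a - (s - bPrime)) + (b - bPrime)
    return DoubleFloat(hi: s, lo: err)
}

let x = twoSum(1.0, 1.0e-8)   // 1e-8 is lost in a plain Float add...
print(x.hi, x.lo)             // ...but survives in the low-order term
```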

The money isn't infinite. Spending a large chunk of the Scrooge McDuck money pit extremely likely isn't going to happen. Mac Pros don't make that much money for Apple. As for talent, they can only spread their talent so thin. There is a finite number of chip designers they have, and several more SoCs to do that are more strategic.

This is all very true, but the tasks they have to solve are overlapping. A scalable interconnect technology for example could be applied to all levels of Apple products — from the iPhone to the Mac Pro. Current A series chips are already quite complex when you consider all the cache hierarchies and the interplay of different components. Similarly, they only need one scalable GPU — with the ability to make it "bigger" for higher end applications. My point: they don't have to design a completely different system. They can reuse their IP across the board, scaling number of processing clusters and adding or removing hardware components as needed.
 
Last edited:
  • Like
Reactions: 2Stepfan

EntropyQ3

macrumors 6502a
Mar 20, 2009
718
824
There is no question that Apple can provide better graphics performance than the 5700 XT in their systems, because the upcoming consoles will do exactly that with SoCs built on a less dense process, at a $400-500 price point, with a fast 1 TB SSD included.

The question is if Apple wants to.

I’d argue that they should look at the cost of their Intel CPUs and AMD GPUs and provide something better. Delivering the same thing as (or less than) they already provide, only later when the x86 market has moved on, isn't going to have customers applauding, even if the new machines run iPhone apps natively. Apple needs to impress, and they only have one shot at making a first impression with computer buyers. That impression had better not be "meh".
 
  • Like
Reactions: leman

Joelist

macrumors 6502
Jan 28, 2014
463
373
Illinois
Remember that at the CPU level they already outperform both Intel and AMD on a per-core basis. Hence in the CPU world they can (if they wish) just ramp up the core count, scaling up the cache appropriately. Currently the A14 has 2 big and 4 little cores in the CPU blocks of the SoC, so maybe in their Mac family they have 8 big and 6-8 little cores instead.

GPU is a different animal, and to be truthful we don't know as much about the GPU architecture on the A14 as we do the CPU piece. Remember, however, that the iPad Pro, which is rocking the A12Z, is already faster and more capable on graphics tasks than the majority of laptops out there. So since their Apple Silicon starting point is the laptop line, maybe they will upscale this solution too.
 

ChrisA

macrumors G5
Jan 5, 2006
12,923
2,183
Redondo Beach, California
I'm curious to see the future of Mac Pros if this is the case? Apple introduced MPX slots with the latest Mac Pro. I wonder how that will transition to Apple Silicon? Also, I wonder if Apple will create something like Nvidia's NVLink? I don't see it as that's probably too niche, but who knows?

Maybe there is no future? How much money does Apple make from Mac Pro sales? I'm serious. Look what they did with Aperture. They abandoned the entire professional photography industry and left it to Adobe. Apple had an innovative product, then Adobe picked up on the idea; Apple did not invest in keeping up with Adobe's Lightroom and then abandoned the market.

Apple might do the same with the video editing market which is really the only use for the Mac Pro. They could drop Final Cut Pro and leave the market to Adobe, again.

Doing this would save Apple the large cost of trying to develop an ARM-based Mac that outperforms a 12-core Xeon. Would they ever recover that investment? Maybe better to drop it.

If Aperture is anything to go by, Apple will promote the MP and FCPX aggressively until the day before they announce they will drop it.

So it just might be that there is never an ARM-based Mac Pro.
 

theorist9

macrumors 68040
May 28, 2015
3,883
3,067
macOS can’t deal with more than 64 cores.
Interesting. Do you have a reference for that? And is that a limit on the number of cores per CPU, or the total number of cores?

As an interesting historical note, MacOS Server did support at least 2200 cores:


This was the third-fastest supercomputer in the world at the time it was introduced, and cost far less than other supercomputers ($5.2M) because of the use of off-the-shelf parts.

I myself used a Mac-based cluster in my research, though it was a smaller one, consisting of 128 Xserve G5's.
 

guzhogi

macrumors 68040
Aug 31, 2003
3,774
1,892
Wherever my feet take me…
Maybe there is no future? How much money does Apple make from Mac Pro sales? [...] So it just might be that there is never an ARM-based Mac Pro.

That's what I think, too. I have to wonder if/when Apple will port Xcode to iPad? If they do, I wonder what will happen to their Mac products? If any company has the resources to make multiple product lines, it's Apple.
 

Unregistered 4U

macrumors G4
Jul 22, 2002
10,617
8,639
dedicated GPUs will meet the same fate that serial controller cards, dedicated storage controllers, and network controllers met many years ago.
AND sound cards :)
Doing this would save Apple the large cost of trying to develop an ARM based Mac that outperforms a 12-core Xeon.
Outperforming a 12-core Xeon for running macOS and macOS applications is likely not a huge feat. Even if you're looking at it clock for clock, any Intel processor still has several decoders that take time to do their work. Having no such complex decoder in an Apple Silicon processor (plus the fact that the chip will be custom-built to execute macOS and macOS app code) means running FCPX or Logic Pro X on the first professional Apple Silicon system should meet or exceed the fastest Mac Pro when it ships.

Cross-platform apps not specifically coded for Apple Silicon? It’s likely those will still be faster on some Intel solution for a little while.
 
  • Like
Reactions: Jorbanead

Jorbanead

macrumors 65816
Aug 31, 2018
1,209
1,438
it just might be that there is never an ARM-based Mac Pro.

Why would Apple invest so much R&D into the new Mac Pro design, including MPX modules, and create a pro apps team within the last 2-3 years? Why would they stress at WWDC the power of FCPX and Logic Pro X and other pro apps on Apple silicon? FCPX is not the only reason for the MP.

There was a period where it felt like Apple gave up on the pro community. I felt it deeply and bought a PC because of it. But the last two years or so have really changed my perspective, and it feels like they have a newfound commitment to the pro community. Apple already had their chance to abandon the pro market a few years ago, but the last few years don't show that anymore.
 

thingstoponder

macrumors 6502a
Oct 23, 2014
916
1,100
That has been official since June... I mean, did people only now start looking at the documentation that has been available for months?
It’s not official. Clickbait YouTubers just saw that video and ran with it.

There’s no way dGPUs won’t work with Apple Silicon Macs and there’s no way at the very least the Mac Pro won’t use dGPUs.

This may sound super funny to the people with knowledge, but if they increase the size of the chip they should get more power, right? Like make it 3 times bigger than an iPhone A chip and get 3x the performance of the smaller chip?

More or less, yes, although it depends on what metric you're measuring. Making a GPU bigger basically adds linear performance gains, because GPU tasks are massively parallel. Adding more CPU cores will also scale pretty linearly in multi-core performance for benchmarks and tasks like video encoding. General use cases that utilize one core will not see linear performance gains. In terms of single-core speed you will not see much more performance other than clock-speed increases, which might add about 50 percent, who knows, but not 3x. The actual core size and design will not change, as the iPhone cores are already “desktop” class and very large as is. You run into diminishing returns making CPU cores bigger arbitrarily; they start to get slower at a certain point.
 
Last edited:

leman

macrumors Core
Oct 14, 2008
19,530
19,709
There’s no way dGPUs won’t work with Apple Silicon Macs and there’s no way at the very least the Mac Pro won’t use dGPUs.

It doesn't seem like Big Sur contains drivers for non-Apple GPUs. And Apple's rhetoric so far strongly suggests that they are not planning to use GPUs that have their own dedicated memory. Shared memory architecture will make a lot of sense for the Mac Pro; it's just unclear how they are going to deliver it. No, I think that the days of dGPUs — at least in the traditional sense of the word — on Apple platforms are numbered. GPUs will be just one of many asymmetrical processors that share the same memory hierarchy.
 
  • Like
Reactions: 2Stepfan

leman

macrumors Core
Oct 14, 2008
19,530
19,709
Maybe there is no future? How much money does Apple make from Mac Pro sales?

The Mac Pro is not something to make money off. The Mac Pro is a symbol. It doesn't matter how many people actually use a Mac Pro — it is there to reinforce the "Apple is for Pros" brand. It means prestige. Abandon that and you are feeding the "Macs are just gadgets for hipsters with too much disposable income" narrative. It's all a psychological thing. I think Apple realizes this very much, otherwise they wouldn't go through the trouble of building the new pro hardware. At any rate, they have more than enough money to both subsidize Mac Pro development and use approaches that other companies might consider not economically viable, and they have scalable CPU, GPU and cache technology (their memory-level parallelism in particular is off the charts) to build powerful chips. The last bit of the puzzle they need is a fast interconnect technology, and given that Apple has been aggressively hiring interconnect engineers for a while, it's almost certain that an Apple Silicon Mac Pro is in the pipeline.
 
  • Like
Reactions: Jorbanead

thingstoponder

macrumors 6502a
Oct 23, 2014
916
1,100
It doesn't seem like Big Sur contains drivers for non-Apple GPUs. [...]

Well Big Sur supports Intel so it definitely supports dGPUs.

You can go into an Apple Store today and buy an eGPU. In the near future you will also be able to buy an Apple Silicon Mac that Apple has said will support Thunderbolt. You would have a situation where you could buy a dGPU that would work on every Mac except the Apple Silicon one, which the average consumer wouldn’t even know is different from an Intel one. It would be a disaster.

I just don’t think a shared memory architecture for the built in GPU precludes an option for external GPUs for accelerating certain tasks. We have drivers in macOS betas for future AMD graphics. The Mac Pro was just redesigned to support many GPUs.

I just don’t see it, but we’ll see. It’s interesting times for technology.
 

johngwheeler

macrumors 6502a
Dec 30, 2010
639
211
I come from a land down-under...
It doesn't seem like Big Sur contains drivers for non-Apple GPUs. [...]

Big Sur contains drivers for non-Apple GPUs because it will run on existing Intel-based Macs with AMD GPUs.

Unless you are referring to an ARM-only release of MacOS? i.e. from the DTK?

Will MacOS itself be a fat-binary with both Intel and ARM binaries?

As regards the future of dGPUs on Apple Silicon....it's hard to tell. Apple Silicon will almost certainly support PCIe (for storage if not GPUs), so there is nothing inherently stopping the use of an Apple PCIe dGPU.

The recent game-console releases have demonstrated that quite powerful GPUs can be integrated into SoCs, so this may be the direction that Apple takes. I'm not sure if these are the same silicon die, or as separate "chiplets" on the SoC package.
 

fokmik

Suspended
Oct 28, 2016
4,909
4,688
USA
Again, for those who say no dGPU... what do they mean? No AMD or Nvidia dGPU? Because it's clear that Apple is making a custom GPU (you could call it a dGPU these days) just for their own Macs, especially the bigger Macs like the 16" MacBook Pro, the Mac Pro, and probably the big iMac too.
So, yes, we will have everything under the A series, including an iGPU, but for those bigger Macs that need a lot more raw graphics power, Apple is developing their own custom-made GPU that will not be integrated into their A-series chip.
 

leman

macrumors Core
Oct 14, 2008
19,530
19,709
Well Big Sur supports Intel so it definitely supports dGPUs.
Big Sur contains drivers for non-Apple GPUs because it will run on existing Intel-based Macs with AMD GPUs.

Apologies for being unclear. I meant Big Sur for Apple Silicon. At least the version shipped with the DTK doesn't seem to contain any AMD drivers.


You can go into an Apple Store today and buy an eGPU. In the near future you will also be able to buy an Apple Silicon Mac that Apple has said will support Thunderbolt. You would have a situation where you could buy a dGPU that would work on every Mac except the Apple Silicon one, which the average consumer wouldn’t even know is different from an Intel one. It would be a disaster.

I agree with you. It would be a weird experience. At the same time, non-Apple GPUs are a big challenge for Apple Silicon Macs (see below).

I just don’t think a shared memory architecture for the built in GPU precludes an option for external GPUs for accelerating certain tasks. We have drivers in macOS betas for future AMD graphics. The Mac Pro was just redesigned to support many GPUs.

All very true, but there is a slight problem. One of the main benefits of Apple Silicon Macs is a unified GPU capability and programming model. You can program a GPU pipeline in a specific way and expect it to run with known performance characteristics on any Apple device. Non-Apple GPUs make this more tricky, as they complicate memory management and don't support many of the TBDR-specific features of the Apple family.

Of course, as a developer you need to make sure that your software runs with any GPU — at least for the foreseeable future, where both Intel-based and ARM-based Macs exist. Still, allowing non-Apple dGPUs on Apple Silicon Macs might make software development quite awkward, at least for graphics.

Using a large eGPU as a pure headless number-cruncher? Yes, that I can definitely see.

I'm not sure if these are the same silicon die, or as separate "chiplets" on the SoC package.

The specific layout doesn't matter much; what matters is the topology. The CPU and GPU don't have to be located on the same physical chip. The question is where they are located with respect to the memory hierarchy. Which brings me to a reply to @fokmik below.

Again, for those who say no dGPU... what do they mean? No AMD or Nvidia dGPU? Because it's clear that Apple is making a custom GPU (you could call it a dGPU these days) just for their own Macs, especially the bigger Macs like the 16" MacBook Pro, the Mac Pro, and probably the big iMac too.

The issue is that everyone has a different idea of what "dGPU" means. I have the impression that many forum users use "dGPU" as an alias for "fast GPU" and "iGPU" as "slow GPU".

But these things have a very clear definition in the GPU industry. A dedicated GPU is a GPU that has its own (usually high-bandwidth-optimized) memory and communicates with the rest of the system via a (slow) dedicated bus. It is a logically separate device that needs to copy data from the main system in order to work on it. An integrated GPU is a GPU that does not have its own memory and has to share memory bandwidth with the main system. It doesn't matter at all where these components are located; they can be on one chip, on separate chips, or spread around on multiple chips. Integrated GPUs are usually integrated into the same chip where the last-level cache and the memory controller are located, since that makes the most sense performance-wise.

Apple has been very loud with their message about Apple Silicon Macs featuring unified memory, meaning the CPU and GPU share the same physical memory. So the GPU, no matter how it is implemented physically, won't be a dGPU in the traditional sense. Personally, I expect chiplet designs with a complex cache hierarchy, not unlike where AMD is going with their professional platform. I also expect Apple to aggressively use high-bandwidth memory in their designs — they are one of the few companies that can afford it.
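Incidentally, Metal already exposes these distinctions directly. Here is a small sketch using real MTLDevice properties (macOS 10.15+ for hasUnifiedMemory); the classification strings are just informal labels chosen for illustration.

```swift
import Metal

// Classify every visible GPU using the hints Metal itself exposes.
// The string labels are informal; the properties are the actual MTLDevice API.
for gpu in MTLCopyAllDevices() {
    let kind: String
    if gpu.isRemovable {
        kind = "eGPU"
    } else if gpu.hasUnifiedMemory {
        kind = "integrated (shares system memory)"
    } else {
        kind = "discrete (own VRAM)"
    }
    print("\(gpu.name): \(kind), \(gpu.recommendedMaxWorkingSetSize / 1_048_576) MB working set")
}
```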
 
  • Like
Reactions: 2Stepfan

fokmik

Suspended
Oct 28, 2016
4,909
4,688
USA
Agreed, leman. But even if the GPU shares the memory, Apple will still have so-called "iGPU" Macs, and a so-called separate GPU outside their SoC, the "Apple custom GPU".
Maybe that framing isn't to everyone's taste/language; I tried to use today's naming so that everyone will understand what we will get.
It will be very interesting to see the wattage and TDP on those custom-made GPUs.
I already saw what the A12Z does and consumes under macOS Big Sur on my (for now) dev kit Mac mini... so an iGPU.
 

thingstoponder

macrumors 6502a
Oct 23, 2014
916
1,100
[...] I already saw what the A12Z does and consumes under macOS Big Sur on my (for now) dev kit Mac mini... so an iGPU.
Does the A12Z consume any more power on the DTK than in an iPad Pro?

Apologies for being unclear. I meant Big Sur for Apple Silicon. At least the version shipped with the DTK doesn't seem to contain any AMD drivers.

[...]

Personally, I expect chiplet designs with a complex cache hierarchy, not unlike where AMD is going with their professional platform. I also expect Apple to aggressively use high-bandwidth memory in their designs — they are one of the few companies that can afford it.

Thanks for the reply.

Whatever they do, it's sure to be interesting. I've been wondering for months now. They'll certainly have a lot of budget left over from those thousands-of-dollars Xeon chips to cook up some exotic solution for the Mac Pro. Things like HBM are seen as too expensive for consumer GPUs, but they can do whatever they want price-wise: they're not in the low-margin OEM PC market and aren't buying off-the-shelf parts for their processors. AMD and Nvidia are both rumored to use chiplets on next-gen GPUs, so we'll see.
 

leman

macrumors Core
Oct 14, 2008
19,530
19,709
Agreed, leman. But even if the GPU shares the memory, Apple will still have so-called "iGPU" Macs, and a so-called separate GPU outside their SoC, the "Apple custom GPU".

No, no, I get what you are saying. I just think we should stop using that terminology. iGPU, dGPU, it doesn't matter; what matters is performance and features. People tend to get stuck in these labels instead of looking at the technology in detail.

I am quite sure that what Apple will do with Apple Silicon is offer different performance levels. Just like with the A12 vs the A12X. Same IP, but more cores, more cache, faster RAM... stuff like that.

Things like HBM are seen as too expensive for consumer GPUs, but they can do whatever they want price-wise: they're not in the low-margin OEM PC market and aren't buying off-the-shelf parts for their processors.

Exactly my point. Someone like Intel or AMD, who have to sell their CPUs to OEMs, needs to cater to the lowest common denominator. But Apple now controls their entire hardware, top to bottom, and they can afford extravagance. Not to mention that being fancy will likely help their brand. Paying $3k for a laptop with soldered-on RAM leaves a bitter taste in the mouth of an average customer. But what if that RAM is five times faster than any competitor's?
 

deconstruct60

macrumors G5
Mar 10, 2009
12,493
4,053
....

The issue is that everyone has a different idea of what "dGPU" means. I have the impression that many forum users use "dGPU" as an alias for "fast GPU" and "iGPU" as "slow GPU".
....

"slow GPU" would be sGPU. How does an 'i' a short from of the word 'slow'? Same for 'f' and 'd' which are different.

This is largely lots of nonsense. dGPU and iGPU are abbreviations for 'discrete GPU' and 'integrated GPU'. That some people have mapped dubious extra connotations onto those two implementation approaches does NOT mean the meanings are fuzzy. You are trying to throw the baby out with the bath water. The only thing necessary is to throw out the poopy-bath-water connotations, not the words themselves. They are more than abundantly clear if people stop taking typing shortcuts and pulling "definitions" out of thin air.


Integrated means just that: integrated. If the GPU is on the CPU die then it is integrated. If the CPU and GPU share the exact same pool of system RAM resources then, again, it is integrated (not separate or segregated). Discrete means separated from: not on the die and not sharing the same resources. Historically it also usually means the GPU can be replaced. Apple has a strong tendency to use dGPUs as embedded processors (meaning soldered and non-replaceable on the logic board with the CPU), but those embedded solutions still have a separate primary memory store (VRAM).

Apple is quite clear about what they mean by Unified memory also.

"... All iOS and tvOS devices have a unified memory model in which the CPU and the GPU share system memory. However, CPU and GPU access to that memory depends on the chosen storage mode for your resources. The MTLStorageModeShared mode defines system memory accessible to both the CPU and the GPU, whereas the MTLStorageModePrivate mode defines system memory accessible only to the GPU. ...
"

Sharing a single system memory pool is a double-edged sword. It has upsides in perhaps cutting down on the number of data-copying actions a developer needs to do. The other upside is that it is "cheap" both in space costs (only one set of RAM packages) and in materials costs (typically buying less RAM system-wide). It has downsides in that the CPU (and other processors) have to share bandwidth with the GPU. Pragmatically that puts a limit on parallelism and concurrency (too many consumers of memory bandwidth and not enough producers will run into Amdahl's Law effects). If you make a copy then it is not necessarily a cache coherence problem (and you can get rid of that overhead also).

The load-versus-compute-time ratio determines which edge of the sword you mostly end up on. If the load time is short and the compute time is high, then making a copy will probably pay off quite well on embarrassingly parallel workloads. If the load time is high (or both the CPU and GPU have to claim lots of cache coherence 'locks'/exchanges) and the compute time is relatively short, then not.
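To make the quoted storage modes concrete, here is a minimal Metal sketch in Swift. The buffer sizes are arbitrary, and this is just an illustration of the two options, not a recommendation for either edge of the sword.

```swift
import Metal

// A sketch of the two storage modes from the quoted documentation.
// Sizes here are arbitrary illustration values.
guard let device = MTLCreateSystemDefaultDevice() else { fatalError("no Metal device") }

// Shared: one allocation visible to both CPU and GPU (the unified-memory path).
let shared = device.makeBuffer(length: 4 * 1024 * 1024, options: .storageModeShared)

// Private: GPU-only storage; the CPU must blit data in/out instead of touching it.
let privateBuffer = device.makeBuffer(length: 4 * 1024 * 1024, options: .storageModePrivate)

// The CPU can write straight into the shared buffer...
shared?.contents().storeBytes(of: Float(1.0), as: Float.self)
// ...but there is no valid CPU pointer for the private one; contents() must not be used there.
```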


The folks who have succumbed to the "sweeping generalization" cognitive bias (e.g., "I once used an Intel iGPU at some point so all iGPUs have to be slow") are highly likely to do the exact same thing again if you change the name. Switch to uGPU (unified memory GPU) and nuGPU (non-unified memory GPU) and as soon as they are exposed to gaps between the two, the same thing will re-occur. Changing names doesn't address the root-cause issue, so it is extremely likely not going to go away. You might get a temporary reprieve while the sweeping inferences take a while to re-form, but they are still going to be sweeping generalizations.
 

deconstruct60

macrumors G5
Mar 10, 2009
12,493
4,053
Interesting. Do you have a reference for that? And is that a limit on the number of cores per CPU, or the total number of cores?

The total number of cores in the system the macOS instance is running on. macOS uses a 64-bit word to keep track of the processors. There is one place in the bitmap per core, for a maximum of 64. To get past 64 cores you would have to change the kernel scheduler and the process context data structures. It is a very substantive refactoring of the kernel. Linux had to do it. Windows Server had to do it. Those are primarily server-oriented operating systems, so this limit got hit over a decade ago (or more). macOS is not a server OS. Even when Apple had a "macOS Server" product it was the same exact kernel running the same scheduler. The macOS core kernel is almost fully shared across iOS, tvOS, iPadOS, and watchOS. None of those are going to get anywhere close to cracking the 64-core barrier for decades (if ever, e.g. tvOS). There isn't even a macOS server OS distribution anymore (there is a thing called Server, but it is simply some apps on the same OS. Which really is pragmatically about the same as before, just with a price tag not as widely gapped over the regular version and not much "artificial" market segmentation in branding).
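A toy Swift illustration of the claimed constraint follows. This is not the actual XNU scheduler code; it just shows why a single 64-bit word tops out at 64 CPUs.

```swift
// Toy model: one bit per logical CPU in a single 64-bit word.
// Not XNU source; just an illustration of why such a bitmap caps out at 64.
struct CPUSet {
    private var bits: UInt64 = 0

    mutating func add(cpu: Int) {
        precondition(cpu >= 0 && cpu < 64, "a 64-bit mask cannot represent CPU \(cpu)")
        bits |= (1 as UInt64) << cpu
    }

    var count: Int { bits.nonzeroBitCount }
}

var online = CPUSet()
(0..<64).forEach { online.add(cpu: $0) }   // fine: exactly 64 cores
// online.add(cpu: 64)                     // traps: the word has no 65th bit
```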

The non-Mac Pro, non-iMac Pro Mac systems aren't anywhere close to pushing a 32-core limit either (let alone 64). The iMac just got to 10 cores in 2020. Even if it increased by 10 cores each decade going forward, it would be another 20 years before it got close to 30. We are really talking about forking the OS scheduler and data structures for 2-4% of the Mac market. The Mac market is 6% of the overall classic-form-factor PC market. 4% of 6% is vanishingly small for a major software fork. Very highly likely not going to happen. The Mac Pro is going to leave insatiable core-count lovers behind. It already has on x86-64. That gap will just get wider on Apple Silicon. As long as Mac SoCs are riding the bow wave of iPad Pro SoCs, which are riding the bow wave of iPhone SoCs, there is little upside to chasing the maximum possible core counts.


As an interesting historical note, MacOS Server did support at least 2200 cores:


It did not. Simply read the first paragraph there. Namely:

..."It was originally composed of 1,100 Apple Power Mac G5 computers[2] with dual 2.0 GHz processors .... "

G5 computers. Plural. No macOS instance spanned beyond any of those individual computers. Those are dual (2) processor computers. That is 62 cores away from getting to 64, let alone to 100 or 1000 cores.

That was a virtual supercomputer run on a cluster of computers linked with InfiniBand. The applications that ran on those macOS instances all used something like MPI to make calls into a virtual shared address space between the computers. macOS isn't directly involved in that at all. It is a separate third-party library that isn't part of the OS.

To make the point even more clear, that cluster didn't even run macOS on each node.


".. The system was partitioned into 832 nodes running OS X and 192 nodes running Yellow Dog Linux (YHPC 1.1). ..."
[ essentially the same thing as the web archive linked as reference 3 on the Wikipedia.org page. ]

How could macOS be running on all the cores when some nodes aren't even running macOS? [There were some things that Linux could do better. One was running higher-end InfiniBand interfaces. There were also some software porting issues.] With the compute batch job system layered on top, they could invoke a TOP500 workload on all of the nodes to qualify as a whole system, but macOS didn't solely deliver that during a substantive span of System X's working lifecycle.

This was the third-fastest supercomputer in the world at the time it was introduced, and cost far less than other supercomputers ($5.2M) because of the use of off-the-shelf parts.

Before it got Xserve nodes and ECC RAM, that was far more bluster than usable. By the time you ran each compute job twice to account for errors, the pragmatic throughput time wasn't quite really 3rd fastest. Ditto for workloads that produced and consumed large data sets (the InfiniBand had not-so-super bisection bandwidth).

Apple walked away from the cluster computer space about a decade ago. They only hold a narrow niche in the specialty virtualization space, due to the macOS licensing restriction that it run on top of Apple-labeled/built hardware. Apple Silicon has a decent chance of just making that incrementally smaller (depending upon just how big the hassles and roadblocks they throw in front of VMware ESXi and some of the other hypervisors are).
 

leman

macrumors Core
Oct 14, 2008
19,530
19,709
"slow GPU" would be sGPU. How does an 'i' a short from of the word 'slow'? Same for 'f' and 'd' which are different.

This is largely lots of nonsense. dGPU and iGPU are abbreviations for 'discrete GPU' and 'integrated GPU'. That some people have mapped dubious extra connotations to those two implementation approaches does NOT mean the meanings are fuzzy. You are trying to throw the baby out with the bath water. The only thing necessarily is to throw out the poopy bath water connotations. Not the words themselves. They are more than abundantly clear if stop taking typing shortcuts and putting "definitions" out of thin air.


Integrated means just that integrated. If the GPU is in the CPU die then it is integrate. If CPU and GPU shared the exact same pool of system RAM resources then again integrated (not separate or segregated). Discrete means seperated from. So not on the die and not sharing the same resources. Historically it highly likely also means that it can be replaced also. Apple has a high tendency to use dGPUs are embedded processors ( meaning soldered and non replaceable on the logic board with the CPU). But the embedded solutions still have a seperate primary memory store resources (VRAM).

Apple is quite clear about what they mean by Unified memory also.

"... All iOS and tvOS devices have a unified memory model in which the CPU and the GPU share system memory. However, CPU and GPU access to that memory depends on the chosen storage mode for your resources. The MTLStorageModeShared mode defines system memory accessible to both the CPU and the GPU, whereas the MTLStorageModePrivate mode defines system memory accessible only to the GPU. ...
"

Sharing a single system memory pool is a dual edge sword. It has upsides in perhaps cutting down on the number of data copying actions a developer needs to do. The other upside is that is "cheap" both in space costs ( only one set of RAM packages ) and materials costs (typically buying less RAM system wide.). It has downsides in that the CPU (and other processors) have to share bandwidth for the GPU. Pragmatically that puts a limit on parallism and concurrency ( too many consumers of memory bandwdith and not enough producers then will run into Amdahl's Law effects. ). If make a copy then it is not necessarily a cache coherence problem (and can get rid of that overhead also) .

the load versus compute time ratio has input as to which side of the dual edges mostly end up on. If the load time is short and compute time high then making a copy probably is going to be pay quite well on embarrassingly parallel workloads. If the load time is high ( or both the CPU and GPU have to claim lots of cache coherence 'locks'/exchanges ) and the compute time is relatively shorter then not.


The folks who have succumbed to the "sweeping generalization" cognitive bias (e.g., "I once used a Intel iGPU at some point so all iGPUs have to be slow") are highly likely to do the exact same thing again if you change the name. Switch to uGPU ( unified memory GPU) and nuGPU ( non unified memory GPU) as so as they are exposed to gaps between the two then the same thing will re-occur. Changing names doesn't address the root cause issue so it is extremely likely not going to go away. You might get a temporary reprieve while the number of sweeping inferences take a while to reform, but they are still going to be sweeping generalizations.
[automerge]1601236038[/automerge]

That's why I am saying that we should get rid of these labels altogether and instead focus on the performance and implementation characteristics of the processors themselves.
 