
leman

macrumors Core
Original poster
Oct 14, 2008
19,516
19,664
To get laptop-competitive performance from the A14 cores, Apple will want to clock them up as high as they can. Many people don't realize that Apple already pushes the limits quite a bit with the iPhone. I think they will be clocking them to the point where running all 8 Firestorm cores and the (presumably) 8 cores of the iGPU at the same time is just right out. That's why the Icestorm cores are there: they take over for four of the Firestorm cores when the iGPU is on. So I expect them to include a dGPU in pretty much all models down to the 14" in order to prevent the Firestorm cores from getting thermally constrained. Since the heat is now one inch that way, it's a lot less likely to turn into a silicon puddle.

That is also something I was thinking about. But the interesting thing is that Apple was stressing over and over again during WWDC that their new Macs will use SoCs with unified memory between the CPU and GPU. They literally repeated a number of times: "you don't need to copy the data between the CPU and GPU since they can access the same memory".

That said, your point is very valid and I am also wondering how they are going to solve the heat issues with a large chip. It is possible that they might break the CPU and GPU into physically separate chips while retaining them on the same package with shared cache and memory controllers. Then again, if one can transfer heat fast enough, cooling an 80-watt+ SoC should be realistically possible. I mean, there are 200+ watt chips out there and they work...
 

Pressure

macrumors 603
May 30, 2006
5,178
1,544
Denmark
To get laptop-competitive performance from the A14 cores, Apple will want to clock them up as high as they can.

A13 is already notebook competitive with its 6-core arrangement unless we are talking desktop replacement. It obviously suffers in GPU performance compared to the A12X/Z.
 

Krevnik

macrumors 601
Sep 8, 2003
4,101
1,312
There is nothing stopping an SoC from performing like a high-end graphics card.

Except die space. An SoC has to leave space for the CPU cores and other components; AMD and NVIDIA can use the space the CPU eats for more CUs. You can still get better performance out of one than Intel manages, like the Tegra does, but it’s not magic. Thermal and space limits still apply.

Going with a bigger die drops the yield, so at some point you will just start to break components off the die to avoid yields getting too low. At which point you are starting to look very similar to either AMD’s chiplet design, or a more standard CPU/dGPU setup.

I’d rather that Apple focus on making good SoCs that can replace the Intel kit first, and make the iGPU perf on devices like the 13” MBP and Mac mini not suck, then figure out if they want to replace AMD entirely.
 

sublunar

macrumors 68020
Jun 23, 2007
2,311
1,680
Apple didn't announce that they will deprecate support for eGPUs in macOS. So there is no reason to believe that Macs will no longer support AMD GPUs, which always have their own dedicated VRAM.

Apple has to support Intel Macs for the next several years, so eGPU support is not going away in macOS. They might not have to support it with ARM, although that doesn’t stop Apple from planning Thunderbolt 3 and eGPU support for ARM either.
 

Pressure

macrumors 603
May 30, 2006
5,178
1,544
Denmark
Except die space. An SoC has to leave space for the CPU cores and other components; AMD and NVIDIA can use the space the CPU eats for more CUs. You can still get better performance out of one than Intel manages, like the Tegra does, but it’s not magic. Thermal and space limits still apply.

Going with a bigger die drops the yield, so at some point you will just start to break components off the die to avoid yields getting too low. At which point you are starting to look very similar to either AMD’s chiplet design, or a more standard CPU/dGPU setup.

I’d rather that Apple focus on making good SoCs that can replace the Intel kit first, and make the iGPU perf on devices like the 13” MBP and Mac mini not suck, then figure out if they want to replace AMD entirely.

Die space is just a matter of cost and wafer yields. The reticle limit for TSMC 7nm node is ~830mm².

A12X/Z: 122mm²
A13: 98mm²
Ryzen 4800H: 156mm²
Radeon 5700 XT (Navi): 251mm²
Radeon VII (Vega 20): 331mm²
NVIDIA A100 (Ampere): 826mm²
 

Joelist

macrumors 6502
Jan 28, 2014
463
373
Illinois
This will be interesting, as to my knowledge no one has ever before custom-designed an SoC for a full-size computer. This means they have both a bigger thermal budget and more physical space and power to play with. So, for example, they could easily create their own version of SLI and put two of their A12Z-class GPUs in every laptop and still be good on heat and power.
 

Boil

macrumors 68040
Oct 23, 2018
3,477
3,172
Stargate Command
Except die space. An SoC has to leave space for the CPU cores and other components; AMD and NVIDIA can use the space the CPU eats for more CUs. You can still get better performance out of one than Intel manages, like the Tegra does, but it’s not magic. Thermal and space limits still apply.

Going with a bigger die drops the yield, so at some point you will just start to break components off the die to avoid yields getting too low. At which point you are starting to look very similar to either AMD’s chiplet design, or a more standard CPU/dGPU setup.

I would think Apple would do increasingly larger SoCs depending on the resources needed. Mac Pro line-up would have Threadripper-sized SoCs. Whether monolithic or chiplets, who knows (well, Apple has an idea)?

Apple has to support Intel Macs for the next several years, so eGPU support is not going away in macOS. They might not have to support it with ARM, although that doesn’t stop Apple from planning Thunderbolt 3 and eGPU support for ARM either.

I am going to say Apple does what they do & goes to USB4 & 10Gb Ethernet for I/O.

Die space is just a matter of cost and wafer yields. The reticle limit for TSMC 7nm node is ~830mm².

A12X/Z: 122mm²
A13: 98mm²
Ryzen 4800H: 156mm²
Radeon 5700 XT (Navi): 251mm²
Radeon VII (Vega 20): 331mm²
NVIDIA A100 (Ampere): 826mm²

Apple should be moving to the 5nm node at TSMC for their Apple Silicon?

This will be interesting, as to my knowledge no one has ever before custom-designed an SoC for a full-size computer. This means they have both a bigger thermal budget and more physical space and power to play with. So, for example, they could easily create their own version of SLI and put two of their A12Z-class GPUs in every laptop and still be good on heat and power.

I know some looking at this have already seen other versions of my fever dream Pro-level Apple silicon SoCs, but here we go again! I think Apple might do something like this:

32 P cores / 4 E cores / 48 GPU cores / 32GB HBM2e UMA
48 P cores / 6 E cores / 64 GPU cores / 48GB HBM2e UMA
64 P cores / 8 E cores / 80 GPU cores / 64GB HBM2e UMA

Originally I had been specifying DDR5 RAM on the logic board, but if Apple is really "All UMA", will we even get that option? I dunno how things work, I just speculate from a poorly informed bird's-eye view? ;^p

As for Apple GPUs, I see it more as Apple Silicon-derived compute GPGPUs, like the Threadripper-sized SoCs above, but all GPU cores & HBM2e & whatever else is needed to run them. With such cards for the iMac Pro & the (hopefully forthcoming) Mac Pro Cube, there would be an MXM-type card with up to 80 GPU cores & up to 64GB HBM2e. For MPX modules, just double that, because they would use two of the 'SoCs'.

And if the above idea for 'Apple GPUs' is what happens, that would be hilarious, since it would basically be how Apple looked at Pro GPUs with the trashcan Mac Pro!
 

leman

macrumors Core
Original poster
Oct 14, 2008
19,516
19,664
This will be interesting, as to my knowledge no one has ever before custom-designed an SoC for a full-size computer. This means they have both a bigger thermal budget and more physical space and power to play with. So, for example, they could easily create their own version of SLI and put two of their A12Z-class GPUs in every laptop and still be good on heat and power.

Right? The PC market has so far emphasized modularity, since you have many components to mix and match. SoC-like approaches were reserved for the low end. But there are a lot of advantages to using a system on a chip. I am really looking forward to this autumn.
 

Yebubbleman

macrumors 603
May 20, 2010
6,024
2,616
Los Angeles, CA
Can you elaborate? Unified memory means unified memory. What other interpretation do you have?

The Intel chips used on Macs are not Systems on a Chip (SoC). They do not use unified memory. The IGPs on an Intel processor are using system memory (memory whose primary purpose is for the system and not the graphics processor).

It is my understanding that the Apple Silicon SoCs use memory that, by virtue of the entire system being on the same chip, can be employed just as well by the CPU as by the GPU. Also, the drawbacks that we've seen on low-end Macs during the Intel era (most Mac minis, some 17"/20"/21.5" iMacs, 13" MacBook Pros, all MacBooks, and all MacBook Airs) are not drawbacks that we'll see with this new system architecture. There are WWDC videos that explain this way better than I can.

What I am not sure of is whether or not we'll still see a separate GPU in something like a desktop class system (high-end iMac or Mac Pro) for a performance boost. I'd think that Apple would try to avoid putting something like that in an ARM-based 16" MacBook Pro type laptop given that it would negate much of the thermal/energy savings of the switch to ARM. But I can't say for sure. eGPUs also seem like a wild card here, given the big deal that was made about them just last year.
 

leman

macrumors Core
Original poster
Oct 14, 2008
19,516
19,664
The Intel chips used on Macs are not Systems on a Chip (SoC). They do not use unified memory. The IGPs on an Intel processor are using system memory (memory whose primary purpose is for the system and not the graphics processor).

I suppose it depends on what exactly one means by system on a chip. The Intel CPU and iGPU reside on the same chip and access the system RAM through the same memory controller and LLC. This ticks all definitions of unified memory architecture in my book. A non-unified memory architecture is basically any dGPU, since the CPU and the GPU use different physical RAM.

Anyway, you don't have to believe me, you can have it from Intel itself:


Sections 3 and 4.5.2 are most relevant.

In fact, Intel has been describing their CPUs as a system on a chip with unified memory since at least Haswell.

Also, the drawbacks that we've seen on low-end Macs during the Intel era (most Mac minis, some 17"/20"/21.5" iMacs, 13" MacBook Pros, all MacBooks, and all MacBook Airs) are not drawbacks that we'll see with this new system architecture.

The primary drawback of these Macs was that the old Intel GPUs were dead slow. Current Intel iGPUs are not too bad. Apple iGPUs are better.

What I am not sure of is whether or not we'll still see a separate GPU in something like a desktop class system (high-end iMac or Mac Pro) for a performance boost.

A separate GPU would mean abandoning the unified architecture. Well, depending on what "separate" means. You can certainly have the CPU and GPU on two different dies but still connected by a fast bus to a common memory controller and common RAM (this would probably be a bad idea though, since it would slow everything down). You see, this is the reason why I think that the notions of "integrated" and "discrete" GPU are misnomers. These words by themselves can mean anything, and most people just use them to mean "slow" vs. "fast". What we should be talking about instead is: what is the memory architecture of the system? How are the components interconnected? Where are they physically located (same chip, different chips)? What are the performance characteristics? Etc.
 
Last edited:

Waragainstsleep

macrumors 6502a
Oct 15, 2003
612
221
UK
So, for example, they could easily create their own version of SLI and put two of their A12Z-class GPUs in every laptop and still be good on heat and power.

From what I understand, the CPU and GPU share the same memory, so unlike with dGPUs, data doesn't need to be copied from system memory into video memory.
I have no idea how complex that sharing is. Is it just a matter of dynamically reallocating which processor has 'authority' over any given memory address? Doesn't sound like it would be too complicated to add another GPU in that case. Would this be any different from just adding more GPU cores? I have no idea.


Apple should be moving to the 5nm node at TSMC for their Apple Silicon?

Yes I think so.

In fact, Intel has been describing their CPUs as a system on a chip with unified memory since at least Haswell.

I think once you move the northbridge logic onto the CPU die you have an SoC. Anything beyond that is just architecture-specific detail.

A separate GPU would mean abandoning the unified architecture.

It might mean giving up some aspects/benefits of it. Might.
Current Macs can switch quite dynamically between iGPU and dGPU as their workload requires. I see no reason Apple Silicon couldn't do the same as long as it supports PCIe.
Whether it offloads everything to the dGPU is another matter. I don't see why Apple couldn't share the load between multiple GPUs, just like the current Mac Pro does with its dual GPUs and Afterburner cards.
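For what it's worth, here is a minimal Swift/Metal sketch of how an app can already enumerate and split work across multiple GPUs on macOS today. The selection policy at the end is purely illustrative (my assumption, not how macOS itself schedules anything):

```swift
import Metal

// List every GPU macOS exposes: integrated, discrete, and eGPU alike.
// Nothing in this API is Intel-specific, so in principle an Apple Silicon
// Mac with a dGPU or eGPU could be enumerated the same way.
let gpus = MTLCopyAllDevices()
for gpu in gpus {
    // isLowPower marks the integrated GPU; isRemovable marks an eGPU.
    print(gpu.name, "low power:", gpu.isLowPower, "removable:", gpu.isRemovable)
}

// Illustrative policy: heavy render/compute work goes to a discrete or
// external GPU if one exists, lightweight work to the integrated one.
let heavy = gpus.first { !$0.isLowPower } ?? MTLCreateSystemDefaultDevice()!
let light = gpus.first { $0.isLowPower } ?? heavy

// Each device gets its own command queue; an app shares load across GPUs
// simply by encoding different passes on different queues.
let heavyQueue = heavy.makeCommandQueue()!
let lightQueue = light.makeCommandQueue()!
```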
 

Yebubbleman

macrumors 603
May 20, 2010
6,024
2,616
Los Angeles, CA
I suppose it depends on what exactly one means by system on a chip. The Intel CPU and iGPU reside on the same chip and access the system RAM through the same memory controller and LLC. This ticks all definitions of unified memory architecture in my book. A non-unified memory architecture is basically any dGPU, since the CPU and the GPU use different physical RAM.

Anyway, you don't have to believe me, you can have it from Intel itself:


Sections 3 and 4.5.2 are most relevant.

In fact, Intel has been describing their CPUs as a system on a chip with unified memory since at least Haswell.

It seems we're having a semantics debate here...

There are plenty of system components and controllers present on an Apple SoC that are simply not present on an Intel processor.

Even if Intel's marketing team wants to call their Haswell and newer chips SoCs, they are not by any common definition.

Anyway, "unified memory architecture" is a broad term as you're using it here. It's like saying that a Hyundai Elantra and a Tesla Model X are both cars.

Your original post is looking at the Apple Silicon GPU's use of shared memory as though its implementation of UMA is the same as that of an Intel integrated graphics processor. It's not. It's probably more similar to that of an AMD APU than an Intel CPU, but even that's not a like-for-like comparison to be making here.



The primary drawback of these Macs was that the old Intel GPUs were dead slow. Current Intel iGPUs are not too bad. Apple iGPUs are better.

That's a bit of an oversimplification. Yes, Intel's IGPs sucked, on the whole. You'd be fooling yourself if you said that AMD's were much better. They weren't. Yes, you're dealing with a weaker GPU (one that has to share the die with the CPU). The NVIDIA IGPs that were used in Macs from 2008-2011 drastically improved things over the Intel GMA X3100 of that era, but they still paled in comparison to any discrete GPU. It's not that Apple's IGPs are better than Intel's. It's that Apple's SoC system architecture allows IGPs to share memory with the system and not sacrifice performance as a result.

Any Intel Mac with a discrete GPU will always have better graphics performance than any Intel Mac with any integrated graphics processor whether it's made by Intel, AMD, or NVIDIA. Period.

Apple's architecture, while yes employing shared RAM between the GPU and CPU, does this differently. Don't ask me how (I'm not even sure Apple has fully revealed this yet; but if they have, it'll be in a WWDC 2020 video). But it does. This is how their GPU can be compared favorably to the AMD GPU in the Xbox One S.





A separate GPU would mean abandoning the unified architecture. Depending on what "separate" means. You can certainly have CPU and GPU on two different dies, but still connected by a fast bus to a common memory controller and common RAM (this would probably be a bad idea though since it will slow everything down). You see, this is the reason why I think that the notions of "integrated" and "discrete" GPU are misnomers. These words by themselves can mean anything and most people are just using them as "slow" vs "fast". What we should be talking instead is: what is the memory architecture of the system? how are components interconnected? where are they physically located (same chip, different chips)? what are the performance characteristics? etc.

You don't abandon the UMA by adding a discrete GPU, especially if that discrete GPU can have however many tasks offloaded to it from the rest of the SoC. It's no different than having a Mac that sends its complex rendering jobs to a dedicated render farm.

And yes, the "Integrated Graphics" of the Intel era is NOT the "Integrated Graphics" of the Apple Silicon era, but that's also why I originally said that you're looking at it (the whether or not Apple uses HMB2 vs. GDDR6 for VRAM) the wrong way. Comparing Apple's graphics architecture with any graphics architecture that we've seen on any Intel or PowerPC Mac up to this point is the epitome of comparing Apples to Oranges. The secret sauce in Apple's GPUs appears to be tile-based differed rendering. The following videos should give you as good of an idea as anyone can have of this until the new Macs see the light of day:

 

diamond.g

macrumors G4
Mar 20, 2007
11,435
2,658
OBX
It seems we're having a semantics debate here...

There are plenty of system components and controllers present on an Apple SoC that are simply not present on an Intel processor.

Even if Intel's marketing team wants to call their Haswell and newer chips SoCs, they are not by any common definition.

Anyway, "unified memory architecture" is a broad term as you're using it here. It's like saying that a Hyundai Elantra and a Tesla Model X are both cars.

Your original post is looking at the Apple Silicon GPU's use of shared memory as though its implementation of UMA is the same as that of an Intel integrated graphics processor. It's not. It's probably more similar to that of an AMD APU than an Intel CPU, but even that's not a like-for-like comparison to be making here.





That's a bit of an oversimplification. Yes, Intel's IGPs sucked, on the whole. You'd be fooling yourself if you said that AMD's were much better. They weren't. Yes, you're dealing with a weaker GPU (one that has to share the die with the CPU). The NVIDIA IGPs that were used in Macs from 2008-2011 drastically improved things over the Intel GMA X3100 of that era, but they still paled in comparison to any discrete GPU. It's not that Apple's IGPs are better than Intel's. It's that Apple's SoC system architecture allows IGPs to share memory with the system and not sacrifice performance as a result.

Any Intel Mac with a discrete GPU will always have better graphics performance than any Intel Mac with any integrated graphics processor whether it's made by Intel, AMD, or NVIDIA. Period.

Apple's architecture, while yes employing shared RAM between the GPU and CPU, does this differently. Don't ask me how (I'm not even sure Apple has fully revealed this yet; but if they have, it'll be in a WWDC 2020 video). But it does. This is how their GPU can be compared favorably to the AMD GPU in the Xbox One S.







You don't abandon the UMA by adding a discrete GPU, especially if that discrete GPU can have however many tasks offloaded to it from the rest of the SoC. It's no different than having a Mac that sends its complex rendering jobs to a dedicated render farm.

And yes, the "Integrated Graphics" of the Intel era is NOT the "Integrated Graphics" of the Apple Silicon era, but that's also why I originally said that you're looking at it (the whether or not Apple uses HMB2 vs. GDDR6 for VRAM) the wrong way. Comparing Apple's graphics architecture with any graphics architecture that we've seen on any Intel or PowerPC Mac up to this point is the epitome of comparing Apples to Oranges. The secret sauce in Apple's GPUs appears to be tile-based differed rendering. The following videos should give you as good of an idea as anyone can have of this until the new Macs see the light of day:

Bold by me; that is the part I am curious about. What metric is being used here? TFLOPS? Actual game output? Geometry output? Memory bandwidth? Apple says Xbox One S-class graphics, but what does that mean?

As long as data can be streamed from storage really quickly, the amount of video RAM may not be as relevant anymore (note Sony and MS have new ways to stream data for instant (no) loading of assets in games). Is it safe to presume Apple has the same feature?
 

leman

macrumors Core
Original poster
Oct 14, 2008
19,516
19,664
There are plenty of system components and controllers present on an Apple SoC that are simply not present on an Intel processor.

I agree. But then again, Intel CPUs do fit the commonly used definition of an SoC. They do combine CPU, GPU, various caches, AI processing units, I/O circuitry, memory controllers... I do not think the difference is that dramatic.

Anyway, "unified memory architecture" is a broad term as you're using it here. It's like saying that a Hyundai Elantra and a Tesla Model X are both cars.


Again, I agree. This is why it's important to make it clear how the term is used. After studying technical documentation from both Apple and Intel, I believe that they use the term UMA in exactly the same way: to describe a system that has only one physical memory interface, which is accessed jointly by the CPU and GPU through the shared memory controller and cache subsystem. I didn't read the AMD documentation, but I would suppose their APUs work similarly.

An example of a different kind of "UMA" is Nvidia CUDA. What Nvidia calls "unified memory" actually involves transmitting data over the PCIe bus between CPU and GPU storage. These copies happen transparently to the user, but they still happen. No such copy occurs in "true UMA" as implemented by Apple/Intel etc. — there is no need to copy data because both components access the same primary RAM directly.
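To put that difference in code: here's a rough Swift/Metal sketch, with made-up buffer names, contrasting a genuinely shared allocation with the explicit blit that a discrete GPU's private memory requires. The storage modes are real Metal API; the scenario itself is just illustrative:

```swift
import Metal

let device = MTLCreateSystemDefaultDevice()!

// "True UMA": one allocation, visible to both CPU and GPU.
var input: [Float] = [1, 2, 3, 4]
let shared = device.makeBuffer(bytes: &input,
                               length: input.count * MemoryLayout<Float>.stride,
                               options: .storageModeShared)!

// A CPU-side write needs no upload step; GPU work sees it as soon as
// command-buffer ordering says so.
shared.contents().assumingMemoryBound(to: Float.self)[0] = 42

// With a dGPU you would typically allocate .storageModePrivate VRAM
// and schedule an explicit copy over the bus:
let queue = device.makeCommandQueue()!
let privateCopy = device.makeBuffer(length: shared.length,
                                    options: .storageModePrivate)!
let cmd = queue.makeCommandBuffer()!
let blit = cmd.makeBlitCommandEncoder()!
blit.copy(from: shared, sourceOffset: 0,
          to: privateCopy, destinationOffset: 0, size: shared.length)
blit.endEncoding()
cmd.commit()
// CUDA's "unified memory" automates copies like this one behind the
// scenes; Apple-style UMA makes them unnecessary in the first place.
```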



That's a bit of an oversimplification. Yes, Intel's IGPs sucked, on the whole. You'd be fooling yourself if you said that AMD's were much better. They weren't. Yes, you're dealing with a weaker GPU (one that has to share the die with the CPU). The NVIDIA IGPs that were used in Macs from 2008-2011 drastically improved things over the Intel GMA X3100 of that era, but they still paled in comparison to any discrete GPU. It's not that Apple's IGPs are better than Intel's. It's that Apple's SoC system architecture allows IGPs to share memory with the system and not sacrifice performance as a result.

Apple's architecture, while yes employing shared RAM between the GPU and CPU, does this differently. Don't ask me how (I'm not even sure Apple has fully revealed this yet; but if they have, it'll be in a WWDC 2020 video). But it does. This is how their GPU can be compared favorably to the AMD GPU in the Xbox One S.

As I already stated, I do not believe that Apple's way of sharing RAM is fundamentally different from Intel's or any other "integrated" solution. It's, as you say, a combination of factors:

- Apple GPUs are tile-based deferred renderers, which makes them less reliant on bandwidth. Their memory access is more predictable (meaning it is more likely to hit in the cache), and they need less data bandwidth overall (because of the perfect HSR — they only shade what is visible, while a forward-rendering GPU might shade the same pixel multiple times)

- Apple GPUs contain a crapload of ultra-fast on-GPU cache, which Apple calls tile memory. Because of the GPU's TBDR architecture, a lot of expensive memory operations will fit entirely within this cache. Furthermore, Apple exposes a fine level of control over this cache to the developer with Metal. Using techniques like tile compute shaders, load and store actions, and imageblocks, one can implement complex rendering techniques without ever needing a trip to RAM (see the sketch after this list). This is, by the way, where I believe Apple GPUs can dominate any forward renderer in games — because they can implement the same effects by literally doing 10x less work.

- Apple SoCs contain a crapload of SoC-level cache. It is estimated that the A13 has 16MB of shared last-level cache; Intel Ice Lake, in contrast, has 8MB.
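Here is the sketch promised above: a minimal Swift/Metal example of keeping an intermediate render target entirely in tile memory. The texture and attachment index are made up for illustration; .memoryless and the load/store actions are the real API:

```swift
import Metal

let device = MTLCreateSystemDefaultDevice()!

// A G-buffer attachment that only needs to live for the duration of one
// render pass, e.g. in a deferred-shading pipeline.
let desc = MTLTextureDescriptor.texture2DDescriptor(pixelFormat: .rgba16Float,
                                                    width: 1920, height: 1080,
                                                    mipmapped: false)
desc.usage = .renderTarget
desc.storageMode = .memoryless  // on Apple TBDR GPUs: tile memory only, no RAM backing

let gBufferAlbedo = device.makeTexture(descriptor: desc)!

let pass = MTLRenderPassDescriptor()
pass.colorAttachments[1].texture = gBufferAlbedo
pass.colorAttachments[1].loadAction = .clear      // nothing is loaded from RAM
pass.colorAttachments[1].storeAction = .dontCare  // nothing is written back to RAM

// A later fragment function in the same pass can read this attachment
// directly (a shader parameter tagged [[color(1)]]), so the whole
// lighting pass happens without a single trip to main memory.
```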




Any Intel Mac with a discrete GPU will always have better graphics performance than any Intel Mac with any integrated graphics processor whether it's made by Intel, AMD, or NVIDIA. Period.

Sure, because why would one use a dGPU if it is slower than the iGPU? Anyway, AMD iGPUs are faster than entry-level discrete GPUs such as the MX150, and the upcoming Tiger Lake iGPU is as well.

The following videos should give you as good of an idea as anyone can have of this until the new Macs see the light of day:


Yeah, I've watched all these and more, read the Metal docs and worked with Metal for a while. Why do you think I'm being so vocal about all this stuff? It's not just from reading forums ;)
 

entropi

macrumors 6502a
May 20, 2008
608
401
Well, I will not miss the dGPUs - less heat and less energy drain can only be good...
(Logic Pro has so far worked fine without them anyway)
 

awesomedeluxe

macrumors 6502
Jun 29, 2009
262
105
That is also something I was thinking about. But the interesting thing is that Apple was stressing over and over again during WWDC that their new Macs will use SoCs with unified memory between the CPU and GPU. They literally repeated a number of times: "you don't need to copy the data between the CPU and GPU since they can access the same memory".

That said, your point is very valid and I am also wondering how they are going to solve the heat issues with a large chip. It is possible that they might break the CPU and GPU into physically separate chips while retaining them on the same package with shared cache and memory controllers. Then again, if one can transfer heat fast enough, cooling an 80-watt+ SoC should be realistically possible. I mean, there are 200+ watt chips out there and they work...
I think our heads are in about the same place. I don't think Apple would separate their APU into two chips like Kaby G does, but even if they did, a tightly-knit package with a CPU, GPU, and however many stacks of HBM2 is going to be warm. Might be worth digging into the details. I'll try to make the case for a non-unified approach with some very rough numbers for a ballpark estimate of what is feasible. I'll be making a lot of assumptions in the process and would appreciate your thoughts. For my example I'll use a theoretical 14" device (MBP13 chassis or smaller), since I think it's well-suited to either approach.

We know the current 13" MBP uses a 28W SoC and has for some time. I'll use this as rough guidance for SoC thermal capacity as well as system battery life. Guessing at how much power our A14 cores use: we know that on the A13 the 2 high-perf Lightning cores and 4 GPU cores together use a maximum of 6.2W active power (internally, not including the screen). I'll roughly assume the high-perf cores and GPU cores each use about 1W max, which I think is a conservative guess (I suspect the GPU cores use a little less, but I don't want to venture out too far).

A14 cores will be on TSMC's 5nm process, which gives us "15% higher clock speeds or 30% better performance/watt." I'm going to assume the clocks on the new Firestorm cores are boosted as high as they can go to maximize single-core performance, and are still 1W. But since GPU cores work better in parallel, I'm assuming they put everything into power savings to maximize core count, so I think each GPU core will be 0.7W. Lastly, we're putting HBM2E memory on the same package. Each stack of HBM2E is 5W. Now let's build.

Scenario 1: Unified Approach - CPU, GPU, and Memory are Close to Minimize Latency

8W - 8x Firestorm Cores @ 3GHz
11.2W - 16x Apple GPU Cores
10W - 2x 16GB HBM2E stacks
---
29.2W SoC


Both Samsung and SK Hynix can make the HBM2E. Putting two stacks this close to our APU is constraining, but we get a total of 32GB of high-speed memory, and this really is a gorgeous SoC I would be proud to own. Based on limited benchmarks comparing the 7-core A12X GPU with the 64-EU Ice Lake GPU, I think this would trounce Intel's upcoming 96-EU Tiger Lake part and then some.

Scenario 2: Separate Approach - CPU and GPU Apart to Minimize Thermal Constraints

I think Apple will want to reuse the same SoC for everything from the iPad 11 to the MBP 16 to save money. If parts with bad cores can be salvaged they'll go in iPads.

8W / 4W - 8x/4x Firestorm Cores @ 3GHz
0W / 5.6W - 0x/8x Apple GPU Cores
---
8W / 9.6W SoC

As mentioned earlier, I think this design avoids using all 8 Firestorm Cores and the iGPU at the same time. I'm guessing 8W when it's just using Firestorm cores (and the off-package dGPU) and 9.6W when it's using 4 Firestorm Cores, 4 Icestorm Cores, and the 8 core iGPU.

There's also 32GB LPDDR5 soldered onto the logic board somewhere, and while it's fast, it's not nearly as impressive as 32GB of on-package HBM2E. I think this trade-off is worth it for the GPU we can fit now.

16.8W - 24x Apple GPU Cores
5W - 1x 8GB HBM2E stack

---
21.8W GPU
+ 8W CPU

---
29.8W

This GPU is probably salvaged from a higher-core part designed for the MBP 16 that had some manufacturing defects. Again, there is 32GB of LPDDR5 unaccounted for somewhere, so at max power this drains the battery faster than the unified design. It's more scalable though, so I think overall battery life will be the same.

I think Scenario 2 is a good trade for both Apple and the consumer. Apple has to manufacture two parts, but they are both easier and cheaper to make than an SoC with two stacks of 8-Hi HBM2E and can be reused in lots of devices. The consumer gets 8 more GPU cores and 8GB more total memory, and the Firestorm cores can run faster for longer since the system is less thermally constrained. Some of the memory is slower and data has to be copied back and forth a lot, but on balance I think a separate GPU makes the most sense.

What do you think? In addition to having all your GPU cores on the same die, putting two stacks of HBM2E on the package so they can be used by the CPU and GPU would generate a lot of heat. I think this is why the "Fusion" concept still hasn't taken off. It feels like we are really close, but HBM2E is just not quite efficient enough for me to want 32GB of it on package.
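(If anyone wants to play with these numbers, here's the arithmetic above as a tiny Swift sketch. The per-unit wattages are this post's assumptions, not published figures.)

```swift
// Back-of-envelope power budgets. Assumed figures from this post:
// ~1W per Firestorm core @ 3GHz, ~0.7W per GPU core, ~5W per HBM2E stack.
struct PowerBudget {
    var firestormCores: Int
    var gpuCores: Int
    var hbm2eStacks: Int
    var watts: Double {
        Double(firestormCores) * 1.0
            + Double(gpuCores) * 0.7
            + Double(hbm2eStacks) * 5.0
    }
}

let scenario1 = PowerBudget(firestormCores: 8, gpuCores: 16, hbm2eStacks: 2)
let scenario2GPU = PowerBudget(firestormCores: 0, gpuCores: 24, hbm2eStacks: 1)

print(scenario1.watts)          // 29.2 -> the unified SoC
print(scenario2GPU.watts + 8.0) // 29.8 -> the 21.8W GPU part plus the 8W CPU
```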
 
Last edited:

Boil

macrumors 68040
Oct 23, 2018
3,477
3,172
Stargate Command
I think our heads are in about the same place. I don't think Apple would separate their APU into two chips like Kaby G does, but even if they did, a tightly-knit package with a CPU, GPU, and however many stacks of HBM2 is going to be warm. Might be worth digging into the details. I'll try to make the case for a non-unified approach with some very rough numbers for a ballpark estimate of what is feasible. I'll be making a lot of assumptions in the process and would appreciate your thoughts. For my example I'll use a theoretical 14" device (MBP13 chassis or smaller), since I think it's well-suited to either approach.

We know the current 13" MBP uses a 28W SoC and has for some time. I'll use this as rough guidance for SoC thermal capacity as well as system battery life. Guessing at how much power our A14 cores use: we know that on the A13 the 2 high-perf Lightning cores and 4 GPU cores together use a maximum of 6.2W active power (internally, not including the screen). I'll roughly assume the high-perf cores and GPU cores each use about 1W max, which I think is a conservative guess (I suspect the GPU cores use a little less, but I don't want to venture out too far).

A14 cores will be on TSMC's 5nm process, which gives us "15% higher clock speeds or 30% better performance/watt." I'm going to assume the clocks on the new Firestorm cores are boosted as high as they can go to maximize single-core performance, and are still 1W. But since GPU cores work better in parallel, I'm assuming they put everything into power savings to maximize core count, so I think each GPU core will be 0.7W. Lastly, we're putting HBM2E memory on the same package. Each stack of HBM2E is 5W. Now let's build.

Scenario 1: Unified Approach - CPU, GPU, and Memory are Close to Minimize Latency

8W - 8x Firestorm Cores @ 3GHz
11.2W - 16x Apple GPU Cores
10W - 2x 16GB HBM2E stacks
---
29.2W SoC


Both Samsung and SK Hynix can make the HBM2E. Putting two stacks this close to our APU is constraining, but we get a total of 32GB of high-speed memory, and this really is a gorgeous SoC I would be proud to own. Based on limited benchmarks comparing the 7-core A12X GPU with the 64-EU Ice Lake GPU, I think this would trounce Intel's upcoming 96-EU Tiger Lake part and then some.

Scenario 2: Separate Approach - CPU and GPU Apart to Minimize Thermal Constraints

I think Apple will want to reuse the same SoC for everything from the iPad 11 to the MBP 16 to save money. If parts with bad cores can be salvaged they'll go in iPads.

8W / 4W - 8x/4x Firestorm Cores @ 3GHz
0W / 5.6W - 0x/8x Apple GPU Cores
---
8W / 9.6W SoC

As mentioned earlier, I think this design avoids using all 8 Firestorm Cores and the iGPU at the same time. I'm guessing 8W when it's just using Firestorm cores (and the off-package dGPU) and 9.6W when it's using 4 Firestorm Cores, 4 Icestorm Cores, and the 8 core iGPU.

There's also 32GB LPDDR5 soldered onto the logic board somewhere, and while it's fast, it's not nearly as impressive as 32GB of on-package HBM2E. I think this trade-off is worth it for the GPU we can fit now.

16.8W - 24x Apple GPU Cores
5W - 1x 8GB HBM2E stack

---
21.8W GPU
+ 8W CPU

---
29.8W

This GPU is probably salvaged from a higher-core part designed for the MBP 16 that had some manufacturing defects. Again, there is 32GB of LPDDR5 unaccounted for somewhere, so at max power this drains the battery faster than the unified design. It's more scalable though, so I think overall battery life will be the same.

I think Scenario 2 is a good trade for both Apple and the consumer. Apple has to manufacture two parts, but they are both easier and cheaper to make than an SoC with two stacks of 8-Hi HBM2E and can be reused in lots of devices. The consumer gets 8 more GPU cores and 8GB more total memory, and the Firestorm cores can run faster for longer since the system is less thermally constrained. Some of the memory is slower and data has to be copied back and forth a lot, but on balance I think a separate GPU makes the most sense.

What do you think? In addition to having all your GPU cores on the same die, putting two stacks of HBM2E on the package so they can be used by the CPU and GPU would generate a lot of heat. I think this is why the "Fusion" concept still hasn't taken off. It feels like we are really close, but HBM2E is just not quite efficient enough for me to want 32GB of it on package.

Scenario 1 seems perfect for the Mac Pro line-up! Just much higher core counts & larger amounts of HBM2e:

32 P cores / 4 E cores / 48 GPU cores / 32GB HBM2e UMA / 100W
48 P cores / 6 E cores / 64 GPU cores / 48GB HBM2e UMA / 130W
64 P cores / 8 E cores / 80 GPU cores / 64GB HBM2e UMA / 170W

Threadripper-size SoCs? Something along the lines of the heat sink & fans in the current Mac Pro would cool any of the above APUzillas! The "Big Chungus" Mac Pro chassis could allow two APUzillas?

These same APUs could be modified for use as GPU (GPGPU) expansion, with the same GPU core counts as above; MXM-style cards for the iMac Pro & Mac Pro Cube (yes, the Cube comes back), MPX modules for the "Big Chungus" Mac Pro, even dual-APU versions (so 160 GPU cores & 128GB HBM2e per MPX module).

48 GPU cores / 32GB HBM2e / 70W
64 GPU cores / 48GB HBM2e / 85W
80 GPU cores / 64GB HBM2e / 110W

Dual 48s / 150W
Dual 64s / 180W
Dual 80s / 230W

Wow, just typing all that out shows how much lower a power draw Apple Silicon could allow! A fully loaded Mac Pro could draw about 1000W, so it could easily run off a Platinum-rated 1200W PSU!

Or a sweet loaded-up DCC workstation Mac Pro Cube that draws 300W under full load!
 

Erehy Dobon

Suspended
Feb 16, 2018
2,161
2,017
No service
I suppose it depends on what exactly one means by system on a chip. The Intel CPU and iGPU reside on the same chip and access the system RAM through the same memory controller and LLC. This ticks all definitions of unified memory architecture in my book. A non-unified memory architecture is basically any dGPU, since the CPU and the GPU use different physical RAM.
This is the way SGI defined UMA in 1996, specifically with their O2 workstation product line.

Both the CPU and the GPU chipset accessed the same memory (up to the system maximum of 1GB RAM). This allowed for very large textures, including streaming video as a texture, which was quite unusual at the time.
 

awesomedeluxe

macrumors 6502
Jun 29, 2009
262
105
Scenario 1 seems perfect for the Mac Pro line-up! Just much higher core counts & larger amounts of HBM2e:

32 P cores / 4 E cores / 48 GPU cores / 32GB HBM2e UMA / 100W
48 P cores / 6 E cores / 64 GPU cores / 48GB HBM2e UMA / 130W
64 P cores / 8 E cores / 80 GPU cores / 64GB HBM2e UMA / 170W

Threadripper-size SoCs? Something along the lines of the heat sink & fans in the current Mac Pro would cool any of the above APUzillas! The "Big Chungus" Mac Pro chassis could allow two APUzillas?

These same APUs could be modified for use as GPU (GPGPU) expansion, with the same GPU core counts as above; MXM-style cards for the iMac Pro & Mac Pro Cube (yes, the Cube comes back), MPX modules for the "Big Chungus" Mac Pro, even dual-APU versions (so 160 GPU cores & 128GB HBM2e per MPX module).

48 GPU cores / 32GB HBM2e / 70W
64 GPU cores / 48GB HBM2e / 85W
80 GPU cores / 64GB HBM2e / 110W

Dual 48s / 150W
Dual 64s / 180W
Dual 80s / 230W

Wow, just typing all that out shows how much lower a power draw Apple Silicon could allow! A fully loaded Mac Pro could draw about 1000W, so it could easily run off a Platinum-rated 1200W PSU!

Or a sweet loaded-up DCC workstation Mac Pro Cube that draws 300W under full load!
Yeah, it works better with desktop-class cooling. But my guess would be a little different than what you've laid out here.

The single-core performance of a Firestorm core at 3GHz isn't acceptable for a desktop. CPU clock speed and power consumption have a sharply superlinear relationship, and our power efficiency is going to completely vanish on the way to 3.5GHz. Look at the A12:

[Attached image: a12-fvcurve.png (A12 frequency/voltage curve)]


Apple already went farther along that power curve with the A13. We get a 15% boost from TSMC, and that's how we get to 3GHz. We can get to 3.5GHz by just eating it and quadrupling power consumption, because this is a desktop, it has fans and stuff, and we probably should.
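For reference, the first-order model behind that kind of guess (my framing, not anything Apple publishes) is the standard dynamic-power equation:

```latex
P_{\text{dyn}} = C \cdot V^2 \cdot f
```

Frequency only enters linearly, but the voltage needed to sustain it enters squared, and past the knee of the curve above each extra 100MHz demands a disproportionate voltage bump. That is why going from 3.0GHz to 3.5GHz can plausibly cost a multiple of the power rather than the naive ~17%.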

That said, I would stop at 16 CPU cores. At higher clock speeds those extra cores are a liability not worth their weight. You can always get some bang for your buck increasing the GPU core count, but boosting the CPU core count is a game of diminishing returns.

Let's assume the GPU cores are now clocked up relative to what TSMC's 5nm process can easily allow and each consume 1W. And in taking the Firestorm cores from 3GHz to 3.5GHz, they now consume an egregious 4W each (this might be too generous, but I don't feel like math right now).

64W - 16 Firestorm Cores
64W - 64 Apple GPU Cores
20W - 4x 16GB HBM2 (64GB)
---
148W

The APU might still melt. The CPU cores are hot and very close together, and surrounded by GPU cores and HBM stacks that are also hot and very close together. But it will maybe be OK. Its biggest problem is the slew of 5GHz x86 processors that will run circles around it. Unified high-speed memory will help with that a little.
 

leman

macrumors Core
Original poster
Oct 14, 2008
19,516
19,664
I think our heads are in about the same place. I don't think Apple would separate their APU into two chips like Kaby G does, but even if they did, a tightly-knit package with a CPU, GPU, and however many stacks of HBM2 is going to be warm. Might be worth digging into the details. I'll try to make the case for a non-unified approach with some very rough numbers for a ballpark estimate of what is feasible. I'll be making a lot of assumptions in the process and would appreciate your thoughts. For my example I'll use a theoretical 14" device (MBP13 chassis or smaller), since I think it's well-suited to either approach.

We know the current 13" MBP uses a 28W SoC and has for some time. I'll use this as rough guidance for SoC thermal capacity as well as system battery life. Guessing at how much power our A14 cores use: we know that on the A13 the 2 high-perf Lightning cores and 4 GPU cores together use a maximum of 6.2W active power (internally, not including the screen). I'll roughly assume the high-perf cores and GPU cores each use about 1W max, which I think is a conservative guess (I suspect the GPU cores use a little less, but I don't want to venture out too far).

A14 cores will be on TSMC's 5nm process, which gives us "15% higher clock speeds or 30% better performance/watt." I'm going to assume the clocks on the new Firestorm cores are boosted as high as they can go to maximize single-core performance, and are still 1W. But since GPU cores work better in parallel, I'm assuming they put everything into power savings to maximize core count, so I think each GPU core will be 0.7W. Lastly, we're putting HBM2E memory on the same package. Each stack of HBM2E is 5W. Now let's build.

Scenario 1: Unified Approach - CPU, GPU, and Memory are Close to Minimize Latency

8W - 8x Firestorm Cores @ 3GHz
11.2W - 16x Apple GPU Cores
10W - 2x 16GB HBM2E stacks
---
29.2W SoC


Both Samsung and SK Hynix can make the HBM2E. Putting two stacks this close to our APU is constraining, but we get a total of 32GB of high-speed memory, and this really is a gorgeous SoC I would be proud to own. Based on limited benchmarks comparing the 7-core A12X GPU with the 64-EU Ice Lake GPU, I think this would trounce Intel's upcoming 96-EU Tiger Lake part and then some.

Scenario 2: Separate Approach - CPU and GPU Apart to Minimize Thermal Constraints

I think Apple will want to reuse the same SoC for everything from the iPad 11 to the MBP 16 to save money. If parts with bad cores can be salvaged they'll go in iPads.

8W / 4W - 8x/4x Firestorm Cores @ 3GHz
0W / 5.6W - 0x/8x Apple GPU Cores
---
8W / 9.6W SoC

As mentioned earlier, I think this design avoids using all 8 Firestorm Cores and the iGPU at the same time. I'm guessing 8W when it's just using Firestorm cores (and the off-package dGPU) and 9.6W when it's using 4 Firestorm Cores, 4 Icestorm Cores, and the 8 core iGPU.

There's also 32GB LPDDR5 soldered onto the logic board somewhere, and while it's fast, it's not nearly as impressive as 32GB of on-package HBM2E. I think this trade-off is worth it for the GPU we can fit now.

16.8W - 24x Apple GPU Cores
5W - 1x 8GB HBM2E stack

---
21.8W GPU
+ 8W CPU

---
29.8W

This GPU is probably salvaged from a higher-core part designed for the MBP 16 that had some manufacturing defects. Again, there is 32GB of LPDDR5 unaccounted for somewhere, so at max power this drains the battery faster than the unified design. It's more scalable though, so I think overall battery life will be the same.

I think Scenario 2 is a good trade for both Apple and the consumer. Apple has to manufacture two parts, but they are both easier and cheaper to make than an SoC with two stacks of 8-Hi HBM2E and can be reused in lots of devices. The consumer gets 8 more GPU cores and 8GB more total memory, and the Firestorm cores can run faster for longer since the system is less thermally constrained. Some of the memory is slower and data has to be copied back and forth a lot, but on balance I think a separate GPU makes the most sense.

What do you think? In addition to having all your GPU cores on the same die, putting two stacks of HBM2E on the package so they can be used by the CPU and GPU would generate a lot of heat. I think this is why the "Fusion" concept still hasn't taken off. It feels like we are really close, but HBM2E is just not quite efficient enough for me to want 32GB of it on package.

It's always a pleasure to engage in some speculation with a fellow computing enthusiast :)

I agree with you that Apple is likely to continue using the 30W TDP (sustained load) target for their 13" laptop — they already have the cooling system, so why fix what's not broken.

I am skeptical that we will see on-package RAM this time; as you say, it would leave less thermal budget for the processing units, and given the huge caches on Apple SoCs, I am not sure they would benefit that much from putting the RAM closer. Also, I don't think we will see HBM2 in the 13" model — it's still too expensive, and quad-channel LPDDR5 will have more than enough bandwidth to allow competitive performance (especially when we consider that the GPU is already quite bandwidth-efficient).

I think your power consumption figures are realistic, even if it is a bit difficult to speculate about these things given that Apple does not divulge much information about the specs of their chips. Going by a conservative estimate, and assuming no major architectural changes from the A13, I would guess that the 30W envelope allows maintaining above 2.5GHz on 8 P-cores under sustained operation, or 3.5GHz+ on one or two P-cores for burst (or single-threaded) operation, which would make this chip more than a match for any x86-64 CPU in the same power bracket. As for the GPU, I too think that we should expect 16 cores with overall performance comparable to the GTX 1660. I don't dare speculate how the power usage will be split between the CPU and the GPU, but I would guess it will be some sort of dynamic on-demand story.

Finally, I do not think there will be any issues with cooling an SoC at these energy dissipation levels. There are large CPUs and GPUs that consist of dozens of power-hungry cores, with a power dissipation of over 200 watts, and they can be thermally managed without much issue. I don't think that a 30W or an 80W (for the 16") SoC will be a challenge in this regard. What I am wondering about though is the more high-end hardware (think Mac Pro) — I have difficulty seeing Apple building an SoC that provides the CPU performance of a multi-core Xeon and the GPU performance of a large workstation card.
Its biggest problem is the slew of 5GHz x86 processors that will run circles around it.

An A13 Lightning core running at 3.5GHz would outperform a Skylake core running at 5GHz by a healthy margin (after all, the iPhone 11 comes really close to the single-threaded performance of a desktop 9900K according to SPEC!). Not to mention that Intel's frequencies are much more modest when running multi-core workloads.
 

Waragainstsleep

macrumors 6502a
Oct 15, 2003
612
221
UK
I think Apple will want to reuse the same SoC for everything from the iPad 11 to the MBP 16 to save money. If parts with bad cores can be salvaged they'll go in iPads.

I think maybe later they might put lower-binned Mac SoCs into iPads, but for now I expect the Mac to have at least one of its own, if not several. Apple is interested in making compelling products, and trying to shoehorn all their requirements into one piece of silicon isn't going to achieve that. Besides, they aren't afraid to make lots of different chips already. As well as the A series, there are M and T series chips based on tech inherited from earlier A series. Maybe it ends up being that iPads get lower-clocked previous-gen CPUs, but failing to take advantage of the better cooling available in Macs for the sake of a few bucks would be daft. Especially since they are saving a buttload of cash they no longer have to hand to Intel.
 