All excellent points. I don't see many commentators noticing this. If Apple holds firmly to unified memory as an architectural mandate, that would seem to constrain memory and GPU scalability in the near term, or else incur a significant increase in SoC fabrication cost.
Maybe eventually, at 2 nanometers, the extra transistor budget and power efficiency would make it possible to put a high-core-count CPU and GPU on the same die, as in the M1 — but Apple can't wait that long to ship high-end Apple Silicon iMacs and Mac Pros.
Unified memory works really well and solves a lot of the inefficiency inherent in separate GPU VRAM. At current fabrication technology that works for an M1-class SoC. It might also work for the hypothetical M1X with double the CPU and GPU cores and maybe 32 GB of RAM. The problem is that double the M1's GPU cores, while fast, still isn't sufficient GPU horsepower for higher-end applications.
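To make the inefficiency point concrete, here's a back-of-envelope sketch of what separate VRAM costs you: every asset the GPU needs must first be copied across the expansion bus, while unified memory lets CPU and GPU touch the same pool with no copy at all. The bandwidth figures below are approximate peak numbers chosen for illustration, not measurements of any specific machine.

```python
# Rough cost of staging a buffer into discrete VRAM over various links,
# versus zero copies under unified memory. Peak bandwidths are approximate.

def copy_time_ms(size_gb: float, bandwidth_gb_s: float) -> float:
    """Milliseconds to move size_gb across a link at bandwidth_gb_s."""
    return size_gb / bandwidth_gb_s * 1000.0

links = {
    "PCIe 3.0 x16 (~16 GB/s)": 16.0,
    "PCIe 4.0 x16 (~32 GB/s)": 32.0,
    "Unified memory (no copy)": None,
}

asset_gb = 1.0  # e.g. a 1 GB texture/geometry upload per scene change
for name, bandwidth in links.items():
    if bandwidth is None:
        print(f"{name}: 0.0 ms (CPU and GPU share one memory pool)")
    else:
        print(f"{name}: {copy_time_ms(asset_gb, bandwidth):.1f} ms")
```

A 1 GB upload costs tens of milliseconds per trip over PCIe — more than a full frame at 60 fps — which is exactly the class of overhead (and the duplicated-copy RAM waste) that unified memory eliminates.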
If they did use a dGPU, how would unified memory be maintained? It can't run over a traditional expansion bus — the bandwidth, latency, and cache-coherence requirements are too strict. I see several possibilities.
- Use a "chiplet" dGPU (without VRAM) that shares the same substrate as the CPU. That is essentially a partial die with higher bandwidth to the CPU than a fully separate die:
https://www.eetimes.com/chiplets-a-short-history/#
- Use a separate dGPU die (without VRAM) integrated into the same SoC package, much like the M1's on-package DRAM. Bandwidth is lower than with a chiplet approach but higher than going off-package. Intel used multi-die packaging for its first quad-core "Kentsfield" CPUs (e.g. the Q6600), pairing two dual-core dies in one package before process improvements allowed four x86 cores on a single die.
- Use a truly discrete dGPU package soldered to the PCB and communicating over a new proprietary ultra-high-speed bus. No compatibility with existing standards would be needed, as it wouldn't be upgradeable. It's unclear whether the bandwidth and latency would be good enough for a "VRAM-less" dGPU. And if it were instead a conventional VRAM design, how could unified memory work? I don't see them doing that.