Will M3 Ultra be fused M3 Max, or?

ksj1 · Nov 27, 2023

Given the separate dies for the different M3's so far, what are the chances that the M3 Ultra is an entirely separate die as well?

Chuckeee · Nov 27, 2023

My guess 12.5% (1 out of 8)

bobcomer · Nov 27, 2023

Chuckeee said:
My guess 12.5% (1 out of 8)

That's a lot higher than I would have guessed.

casperes1996 · Nov 27, 2023

A chip as big as an Ultra isn’t really viable to make as a monolith. So it’s either fused M3 Maxes or some other chiplet-like strategy, but most likely the fused M3 Max. They already have UltraFusion and would want ROI on that.

theorist9 · Nov 28, 2023

TLDR: They probably could make a monolithic M3 Ultra, but are extremely unlikely to do so.

DETAILS:
Ryan Smith of Anandtech estimates the M3 Max is ≈<400 mm^2, so a single-die (i.e., monolithic) M3 Ultra would be ≈< 800 mm^2. [ https://www.anandtech.com/show/2111...-family-m3-m3-pro-and-m3-max-make-their-marks ]

The reticle limit is the maximum chip size that can be etched. According to Anton Shilov of Anandtech, "The theoretical EUV reticle limit is 858 mm^2 (26 mm by 33 mm)". That would be enough for a monolithic M3 Ultra. [ https://www.anandtech.com/show/1887...ze-super-carrier-interposer-for-extreme-sips# ]

Indeed, we know dies >800 mm^2 size can be etched, since NVIDIA's (very expensive) GH100 GPU has a die size of 814 mm^2. [ https://developer.nvidia.com/blog/nvidia-hopper-architecture-in-depth/ ]
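The arithmetic above can be checked with a short sketch. It uses only the figures quoted in this post (a 26 mm by 33 mm reticle, roughly 400 mm^2 for the M3 Max, 814 mm^2 for GH100). Treating a monolithic Ultra as exactly two Max dies is a simplifying assumption, and the function name is made up for illustration.

```python
# Reticle limit quoted above: 26 mm by 33 mm.
RETICLE_W_MM = 26
RETICLE_H_MM = 33
RETICLE_LIMIT_MM2 = RETICLE_W_MM * RETICLE_H_MM


def reticle_fraction(die_area_mm2):
    """Return the share of the reticle limit a die of this area would use."""
    return die_area_mm2 / RETICLE_LIMIT_MM2


M3_MAX_MM2 = 400                       # upper-bound estimate quoted above
MONOLITHIC_ULTRA_MM2 = 2 * M3_MAX_MM2  # assumption: simply two Max dies' worth
GH100_MM2 = 814                        # NVIDIA's figure quoted above

for name, area in [
    ("M3 Max (estimate)", M3_MAX_MM2),
    ("Monolithic M3 Ultra (assumed 2x Max)", MONOLITHIC_ULTRA_MM2),
    ("GH100", GH100_MM2),
]:
    print(f"{name}: {area} mm^2, {reticle_fraction(area):.1%} of the reticle limit")
```

Under these assumptions a doubled Max still fits under the limit, with GH100 already sitting closer to it.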

Even so, there are likely reasons Apple doesn't want to leverage this limit. E.g., it may be much more cost-effective to use already-designed Max chips, and link them together, than to design a separate chip just for the relatively low-volume Ultra Studio and Mac Pro. Plus, in addition to the added design costs, I suspect pushing the reticle limit could lead to yield issues and thus a very expensive chip.
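To put rough numbers on the yield concern, here is a minimal sketch using the simple Poisson yield model, Y = exp(-A * D0). The defect density below is a hypothetical placeholder, not a TSMC figure, and the model ignores binning of partially defective dies, packaging losses from fusing two dies, and wafer-edge effects. The die areas are the rough estimates from above.

```python
import math


def poisson_yield(die_area_mm2, defects_per_cm2):
    """Fraction of defect-free dies under the simple Poisson yield model."""
    area_cm2 = die_area_mm2 / 100.0  # 100 mm^2 per cm^2
    return math.exp(-area_cm2 * defects_per_cm2)


D0 = 0.1          # hypothetical defect density (defects per cm^2)
MAX_MM2 = 400     # rough M3 Max estimate from above
MONO_MM2 = 800    # assumed monolithic Ultra: two Max dies' worth of area

max_yield = poisson_yield(MAX_MM2, D0)
mono_yield = poisson_yield(MONO_MM2, D0)

# Wafer area consumed per working Ultra, counting the discarded bad dies.
area_per_good_dual = 2 * MAX_MM2 / max_yield
area_per_good_mono = MONO_MM2 / mono_yield

print(f"Max die yield:        {max_yield:.1%}")
print(f"Monolithic die yield: {mono_yield:.1%}")
print(f"Silicon per good Ultra, 2x Max:     {area_per_good_dual:.0f} mm^2")
print(f"Silicon per good Ultra, monolithic: {area_per_good_mono:.0f} mm^2")
```

In this model the doubled die's yield is the square of the single die's yield, so the monolithic option always burns more silicon per working chip, whatever defect density is plugged in.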

My prediction is thus that the M3 Ultra will be 2x Max dies, like the M1 Ultra and M2 Ultra.

theorist9 · Nov 28, 2023

casperes1996 said:
A chip as big as an Ultra isn’t really viable to make as a monolith. So it’s either fused M3 Maxes or some other chiplet-like strategy, but most likely the fused M3 Max. They already have UltraFusion and would want ROI on that.

Interestingly, it actually is viable (see my last post; I also thought it was too big to etch until I looked into it a bit further), but they won't do it b/c it's cost-prohibitive.

quarkysg · Nov 28, 2023

M3 Ultra could also come in the form of two dies: one with CPU and NPU cores, video codecs, memory controllers, etc., connected via UltraFusion to another die with GPU cores and memory controllers.

theorist9 · Nov 28, 2023

quarkysg said:
M3 Ultra could also come in the form of two dies: one with CPU and NPU cores, video codecs, memory controllers, etc., connected via UltraFusion to another die with GPU cores and memory controllers.

They could, but that's an expensive redesign just for the Ultra, which is their lowest-volume chip. It's possible Apple may do this at some point, but I think they would do so only if they could use this design approach across the Mac product line.

Pressure · Nov 28, 2023

theorist9 said:
TLDR: They probably could make a monolithic M3 Ultra, but are extremely unlikely to do so.

DETAILS:
Ryan Smith of Anandtech estimates the M3 Max is ≈<400 mm^2, so a single-die (i.e., monolithic) M3 Ultra would be ≈< 800 mm^2. [ https://www.anandtech.com/show/2111...-family-m3-m3-pro-and-m3-max-make-their-marks ]

The reticle limit is the maximum chip size that can be etched. According to Anton Shilov of Anandtech, "The theoretical EUV reticle limit is 858 mm^2 (26 mm by 33 mm)". That would be enough for a monolithic M3 Ultra. [ https://www.anandtech.com/show/1887...ze-super-carrier-interposer-for-extreme-sips# ]

Indeed, we know dies >800 mm^2 size can be etched, since NVIDIA's (very expensive) GH100 GPU has a die size of 814 mm^2. [ https://developer.nvidia.com/blog/nvidia-hopper-architecture-in-depth/ ]

Even so, there are likely reasons Apple doesn't want to leverage this limit. E.g., it may be much more cost-effective to use already-designed Max chips, and link them together, than to design a separate chip just for the relatively low-volume Ultra Studio and Mac Pro.

My prediction is thus that the M3 Ultra will be 2x Max dies, like the M1 Ultra and M2 Ultra.

A monolithic M3 Ultra wouldn't be straight up twice the transistor count though. A lot of the blocks can be space optimised or entirely removed, as there will be duplicates. You also remove the entire UltraFusion bridge from both dies, along with everything that enables them to communicate across the dies. Seeing as the CPU and GPU clusters are now in close proximity on the same die, you can probably reduce some local caches or the system level caches. Chop off two memory channels (512-bit to 384-bit) and use LPDDR5X to offset the memory bandwidth loss. That would also reduce the number of RAM chips needed from 8 to 6 while still being able to provide up to 288GB of unified memory (6 x 48GB).
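A quick sketch of the trade described above. The bus widths and the 6 x 48GB capacity come from this post; the data rates (LPDDR5-6400 for the current parts, LPDDR5X-8533 for the replacement) are assumptions on my part, and the function name is just for illustration.

```python
def peak_bandwidth_gb_s(bus_width_bits, data_rate_mt_s):
    """Theoretical peak bandwidth in GB/s: bytes per transfer times transfers per second."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * data_rate_mt_s / 1000  # MB/s -> GB/s


LPDDR5_MT_S = 6400    # assumed data rate of the current memory
LPDDR5X_MT_S = 8533   # assumed data rate of the LPDDR5X replacement

wide_lpddr5 = peak_bandwidth_gb_s(512, LPDDR5_MT_S)
narrow_lpddr5x = peak_bandwidth_gb_s(384, LPDDR5X_MT_S)

packages = 6
gb_per_package = 48
max_unified_memory_gb = packages * gb_per_package

print(f"512-bit LPDDR5:  {wide_lpddr5:.1f} GB/s")
print(f"384-bit LPDDR5X: {narrow_lpddr5x:.1f} GB/s")
print(f"Max unified memory: {max_unified_memory_gb} GB")
```

With those assumed data rates, the narrower bus on faster memory lands at almost exactly the same peak bandwidth, which is the "offset" being suggested.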

Obviously there are reasons not to. The advantages of medium-sized dies fused together are yields and much better thermal control, as the surface area is twice the size and hot spots are spread over the combined area. Not to mention strength in numbers, with the separate die shipping in a much larger quantity of products compared to the desktop chips.

It would be nice to have a new high-end SoC for desktops at around 600-650mm² offering even more compute.

I made this visualisation of the reticle limit and the 432mm² M1 Max die, which uses only 50.35% of that limit, to show there is plenty of room to increase die sizes.

leman · Nov 28, 2023

There is no evidence for two separate M3 Max dies; the variants are likely binned/partially disabled configurations of one die. And there is also no reason to assume that the M3 Ultra won't use the same dual-chip packaging as the Ultra chips before it.

Elusi · Nov 28, 2023

Yeah. The premise of the question is inaccurate. There is only one M3 Max die. They just sell a variant with large parts disabled, likely due to yield.

saintmac · Nov 28, 2023

Elusi said:
Yeah. The premise of the question is inaccurate. There is only one M3 Max die. They just sell a variant with large parts disabled, likely due to yield.

I don't think OP implies that there are separate M3 Max dies.
I think he points out that M3 Pro and M3 Max chips are now completely separate chips, whereas M1 Pro / M2 Pro were rather highly binned versions of M1 Max / M2 Max.

So now that Apple has shown with the M3 family (for the worse in the case of M3 Pro) that they are ready to introduce more chip designs, the question is relevant.

Pressure · Nov 28, 2023

saintmac said:
I don't think OP implies that there are separate M3 Max dies.
I think he points out that M3 Pro and M3 Max chips are now completely separate chips, whereas M1 Pro / M2 Pro were rather highly binned versions of M1 Max / M2 Max.

So now that Apple has shown with the M3 family (for the worse in the case of M3 Pro) that they are ready to introduce more chip designs, the question is relevant.

No they weren't.

They just shared a similar layout of the blocks. M1, M1 Pro and M1 Max were distinct, separate dies. The same goes for the M2 family.

saintmac · Nov 28, 2023

Pressure said:
No they weren't.

They just shared a similar layout of the blocks. M1, M1 Pro and M1 Max were distinct, separate dies. The same goes for the M2 family.

Maybe not "highly binned" but definitely a cut out version: https://architosh.com/2021/10/apples-new-m1-pro-is-chop-version-of-m1-max-die/

M1 vs M1 Pro / M1 Max, on the other hand, have very different designs.
Same for M2 vs M2 Pro / M2 Max.
In the M3 family the 3 designs are completely different.

leman · Nov 28, 2023

saintmac said:
Maybe not "highly binned" but definitely a cut out version: https://architosh.com/2021/10/apples-new-m1-pro-is-chop-version-of-m1-max-die/

M1 vs M1 Pro / M1 Max, on the other hand, have very different designs.
Same for M2 vs M2 Pro / M2 Max.
In the M3 family the 3 designs are completely different.

That might be so, but that's a very different concern from the actual chip production. Even though M1 Pro and Max share the same design, they are still two physically distinct chips with all that this entails. Just because Apple decided to separate the designs for M3 Pro and M3 Max does not mean that they will make a humongous monolithic chip.

All of the decisions until now were driven by financial optimisation. I don't really see Apple suddenly doing a U-turn here and building an outrageously expensive chip that will be plagued by low yields. Using two Max dies is more economical, and they already have the working technology for that.

saintmac · Nov 28, 2023

leman said:
That might be so, but that's a very different concern from the actual chip production. Even though M1 Pro and Max share the same design, they are still two physically distinct chips with all that this entails. Just because Apple decided to separate the designs for M3 Pro and M3 Max does not mean that they will make a humongous monolithic chip.

All of the decisions until now were driven by financial optimisation. I don't really see Apple suddenly doing a U-turn here and building an outrageously expensive chip that will be plagued by low yields. Using two Max dies is more economical, and they already have the working technology for that.

Well, I agree with you on the rationale. But I was also pretty sure that the M3 Pro would be a cut-down version of the M3 Max, because that also seemed to make sense from a financial optimisation point of view.

So Apple may surprise us again.

iPadified · Nov 28, 2023

Pressure said:
It would be nice to have a new high-end SoC for desktops at around 600-650mm² offering even more compute.

It would be nice, especially if two were used in an Ultra configuration. But for whom? ROI will be difficult. We are not talking laptop sales here, but Mac Studio and Mac Pro sales, minus the Mac Studio buyers for whom the M3 Max is sufficient. That small community may not be enough to cover the development cost. Apple has not shown any tendency to enter a contest with AMD/NVIDIA/Intel to get attention via expensive marketing stunts. It would be fun if they did, but I think "fun" left Apple a long time ago.

deconstruct60 · Nov 28, 2023

ksj1 said:
Given the separate dies for the different M3's so far, what are the chances that the M3 Ultra is an entirely separate die as well?

The M1 and M2 series had entirely separate dies for the 'plain', Mn Pro, and Mn Max. There is nothing particularly different now.

The previous Pro and Max shared a higher percentage of floorplan layout overhead costs, but they are different dies.

What may be true this time is that there are two different Maxes: one with and one without UltraFusion (sharing an even higher percentage of floorplan layout overhead costs).

P.S. As wafer costs go up, there is a substantial amount of money being thrown away on wasted silicon, namely the UltraFusion area on every Max die that ships on its own. If the number of monolithic Max deployments is 2x-4x that of the Ultra, that is a substantial amount of silicon that is completely and utterly useless to the folks buying those systems. In aggregate that is wasted wafers in the 100s or 1,000s, depending upon how high the monolithic Max die volume goes.

flybass · Nov 28, 2023

It would be interesting if Apple made an ultra with all performance cores. Does it make sense to have so many efficiency cores on a desktop?

This would also help with product differentiation to make the studio more relevant.

deconstruct60 · Nov 28, 2023

quarkysg said:
M3 Ultra could also come in the form of two dies: one with CPU and NPU cores, video codecs, memory controllers, etc., connected via UltraFusion to another die with GPU cores and memory controllers.

Decoupling the CPU/NPU cores from the GPU cores doesn't buy a whole lot if Apple is trying to get a high degree of reuse. It would make more sense to slice off the external (to package) I/O from the 'compute' context. Almost nobody needs 12 or 16 Thunderbolt ports, but a substantive number will want either more CPU or GPU cores.

Folks keep trying to 'invent' a backdoor reason for a dGPU. The way Apple has optimized things, it doesn't make sense. Once you get to 'Max-like' scale, the GPU portion is the bulk of the die. Only a subset of the memory controllers would be on the relatively much smaller 'CPU/NPU/I/O' slice. That is just going to make memory access all the more asymmetric (not less). The more asymmetric it is, the more it will stress the bandwidth and latency of the UltraFusion connection.

If doing a functional decomposition into chiplets, it only makes economic sense if you can scale up the number of uses of the chiplet components. If Apple is sticking with just two dies for the whole system, then there is really no functional decomposition to do at all. It is really just a monolithic chip with a 'side car' tacked on solely to get to a 'two chip' setup.

The more that analog and SRAM/cache scaling detaches from logic (compute circuit) scaling, the more pressure there will be to move the memory controller (and tightly coupled large system cache) off onto I/O chiplet(s) as well.
I suspect Apple will delay that for as long as they can get away with, though. But the 'common' I/O like PCIe, USB, Thunderbolt, eDP ... that stuff 'needs' future bleeding-edge fab processes like another hole in the head. And desktops need more of that than laptops with limited side-edge space do. There is the major disconnect with the Mac laptop lineup. That's where the 'chiplet adjustment' should be.

deconstruct60 · Nov 28, 2023

flybass said:
It would be interesting if Apple made an ultra with all performance cores. Does it make sense to have so many efficiency cores on a desktop?

How much more are you going to pay? It is not like exchanging the same amount of die area for that substitution. An even more expensive package for an even smaller set of users is pretty likely a pricing death spiral.

In addition, things will not work for a GUI operating system if it does not have any GPU cores to drive the GUI. All 'P cores' is not targeting macOS in any effective way at all.

E cores can be thrown at stuff like software RAID, network storage overhead, and other tasks on systems with even more 'background' overhead. (Open Activity Monitor and look at the number of processes on a typical macOS instance. There is a lot more than the one app that might be running on the screen.) The M3 Max 12-4 ratio (3:1) is actually higher than the M2 Max 8-4 ratio (2:1). So Apple probably didn't really spend more die area budget on the E cores here at all.
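The ratio comparison spelled out as a small sketch, using the core counts given above. The helper names are made up, and note that a share of the core count is not a share of die area, since E cores are much smaller than P cores.

```python
from fractions import Fraction


def p_to_e_ratio(p_cores, e_cores):
    """Performance-to-efficiency core ratio as an exact fraction."""
    return Fraction(p_cores, e_cores)


def e_core_share(p_cores, e_cores):
    """Efficiency cores as a share of the total CPU core count."""
    return Fraction(e_cores, p_cores + e_cores)


# (P cores, E cores) for the top configurations mentioned above.
configs = {"M2 Max": (8, 4), "M3 Max": (12, 4)}

for name, (p, e) in configs.items():
    ratio = p_to_e_ratio(p, e)
    share = e_core_share(p, e)
    print(f"{name}: P:E = {ratio.numerator}:{ratio.denominator}, "
          f"E cores are {float(share):.0%} of CPU cores")
```

The E-core count is unchanged between the two generations, so their share of the CPU cores shrinks as the P-core count grows.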

There is substantive upside in putting 'scut work' stuff into the local L2 cache of the E cores that helps keep P core L2 caches 'cleaner'. Don't really want P cores on 'scut work' tasks.

Tossing out the E cores isn't going to save any substantive space (relative to the other major core cluster types). It probably wouldn't even get you another display controller. So to go to all P cores, you would have to eat into something else's die area budget. The OS (and a substantial number of apps) already has a substantial amount of task delegation built in, so it is not like it runs into major problems engaging E cores when it should not.

flybass said:
This would also help with product differentiation to make the studio more relevant.

The M3 Max is better than the M2 Max. And the Ultra is extremely likely going to get the same multiple over the M2 version. That is more relevant in the market. It doesn't get Apple a 'Threadripper killer' SoC, but the Mac Studio doesn't need a Threadripper killer SoC.

leman · Nov 28, 2023

flybass said:
It would be interesting if Apple made an ultra with all performance cores. Does it make sense to have so many efficiency cores on a desktop?

This would also help with product differentiation to make the studio more relevant.

An Ultra with all performance cores would likely be the same speed or slower. E-cores are fairly small. Not to mention that it wouldn’t work with Apple’s cluster design.

Unregistered 4U · Nov 28, 2023

theorist9 said:
They could, but that's an expensive redesign just for the Ultra, which is their lowest-volume chip. It's possible Apple may do this at some point, but I think they would do so only if they could use this design approach across the Mac product line.

Especially when, as a feature set, ALL the Ultra has to be is “the fastest Mac”. Doubling up is a cost effective way of getting to “the fastest Mac”.

theorist9 · Nov 28, 2023

Pressure said:
A monolithic M3 Ultra wouldn't be straight up twice the transistor count though. A lot of the blocks can be space optimised or entirely removed, as there will be duplicates. You also remove the entire UltraFusion bridge from both dies, along with everything that enables them to communicate across the dies. Seeing as the CPU and GPU clusters are now in close proximity on the same die, you can probably reduce some local caches or the system level caches. Chop off two memory channels (512-bit to 384-bit) and use LPDDR5X to offset the memory bandwidth loss. That would also reduce the number of RAM chips needed from 8 to 6 while still being able to provide up to 288GB of unified memory (6 x 48GB).

Obviously there are reasons not to. The advantages of medium-sized dies fused together are yields and much better thermal control, as the surface area is twice the size and hot spots are spread over the combined area. Not to mention strength in numbers, with the separate die shipping in a much larger quantity of products compared to the desktop chips.

It would be nice to have a new high-end SoC for desktops at around 600-650mm² offering even more compute.

I made this visualisation of the reticle limit and the 432mm² M1 Max die, which uses only 50.35% of that limit, to show there is plenty of room to increase die sizes.

That's a good point that the transistors used specifically to interface with the bridge wouldn't be needed. Though I don't know how much space that would save.

But I disagree with your proposed reduction in memory channels. As the Ultra has twice the GPU cores of the Max, it needs twice the bandwidth. Further, a key limitation of the Ultra is that it restricts Apple's top workstations to 192 GB RAM, which is not enough for some tasks. The switch to LPDDR5X should be used to increase both bandwidth and max RAM (and that's what I expect Apple to do), not as an opportunity to reduce channels.

deconstruct60 · Nov 28, 2023

Pressure said:
A monolithic M3 Ultra wouldn't be straight up twice the transistor count though. A lot of the blocks can be space optimised or entirely removed as there will be duplicates. Also you remove the entire UltraFusion bridge from both dies and all things that enable them to communicate across the dies.

Not sure there is a real distinction between the UltraFusion bridge and the inter-die communication. That is pretty much what UltraFusion does. It is a bit too narrow to label UltraFusion as just the static connector lanes/pads without the SERDES constructs and 'internal network' interface that go with them.

The 'redundant' blocks are the Secure Enclave, SSD controller, and a few other bits like that (and perhaps the eDP connector for an embedded display ... more vestigial than 'redundant'). Most of the I/O is being used in the Ultra also (system port counts go up on Ultra systems, I/O goes up, etc.).

Pressure said:
Seeing as the CPU and GPU clusters are now in close proximity on the same die you can probably reduce some local caches or the system level caches.

Not really at all. Those larger local caches go a long way in keeping the inter-cluster communication at high bandwidth and low latency, even on the same die. Toss those caches and you toss top-end performance also.
The caches are primarily there to avoid going all the way out to DDR RAM ... not to avoid going to another die in an Ultra configuration. Remove the 'other die' and you are still dealing with the same DDR RAM latencies and bandwidth constraints.

Pressure said:
Chop off two memory channels (512-bit to 384-bit) and use LPDDR5X to offset the memory bandwidth loss. That would also reduce the number of RAM chips needed from 8 to 6 while still being able to provide up to 288GB of unified memory (6 x 48GB).

Those memory channels are not 'redundant'. If you chuck those from the die, then you are chucking performance from the die also. It is already the case that folks hurl 'but it is not a 4090 or 7900 XTX' at the Apple offerings. Throwing more memory bandwidth out the window isn't going to help much. (Yes, Apple made some trade-offs to make the M3 Pro do more with 'less', but it isn't whipping the respective dGPUs it competes with either.)

Pressure said:
Obviously there are reasons not to. The advantages of medium-sized dies fused together are yields and much better thermal control, as the surface area is twice the size and hot spots are spread over the combined area. Not to mention strength in numbers, with the separate die shipping in a much larger quantity of products compared to the desktop chips.

At the prices Apple charges, the yield difference probably doesn't matter much. What probably matters more is just getting the project funded in the first place. Apple is looking for margins. They are extremely likely not looking to make chips with run rates in the very low 100s of thousands (or less).

The MBP 14"/16" Maxes help pay for the Max. That's why UltraFusion is a bit of a 'side car' tacked onto a monolithic chip. All those folks who didn't use UltraFusion at all helped pay for UltraFusion development for the first two iterations.

Using the same die in multiple products is a primary method in Apple's silicon efforts. Economies of scale matter. The major problem with the Ultra is 'scale'. Forked off completely by itself, there likely is no 'scale'. Apple would likely need some 'desktop Max' volume to go along with the 'Ultra' volume to have enough 'scale' to get even a mild fork done.

Pressure said:
It would be nice to have a new high-end SoC for desktops at around 600-650mm² offering even more compute.

I made this visualisation of the reticle limit and the 432mm² M1 Max die, which uses only 50.35% of that limit, to show there is plenty of room to increase die sizes.

A maximum of 4 thunderbolt ports probably isn't going to work for a MacPro / Ultra Studio.
