My guess: 12.5% (1 out of 8)
That's a lot higher than I would have guessed.
A chip as big as an Ultra isn't really viable to make as a monolith. So it's either fused M3 Maxes or some other chiplet-like strategy, most likely the fused M3 Maxes. They already have UltraFusion and would want ROI on that.
Interestingly, it actually is viable (see my last post; I also thought it was too big to etch until I looked into it a bit further), but they won't do it because it's cost-prohibitive.
M3 Ultra could also come in the form of two dies: one with CPU and NPU cores, video codecs, memory controller, etc., connected via UltraFusion to another die with GPU cores and memory controllers.
They could, but that's an expensive redesign just for the Ultra, which is their lowest-volume chip. It's possible Apple may do this at some point, but I think they would do so only if they could use this design approach across the Mac product line.
A monolithic M3 Ultra wouldn't be straight-up twice the transistor count, though. A lot of the blocks can be space-optimised or removed entirely, as there would be duplicates. You also remove the entire UltraFusion bridge from both dies, along with everything that enables them to communicate across dies. Seeing as the CPU and GPU clusters would now be in close proximity on the same die, you could probably reduce some local caches or the system-level caches. Chop off two memory channels (512-bit to 384-bit) and use LPDDR5X to offset the memory bandwidth loss. That would also reduce the number of RAM chips needed from 8 to 6 while still being able to provide up to 288GB of unified memory (6 x 48GB).
TLDR: They probably could make a monolithic M3 Ultra, but are extremely unlikely to do so.
DETAILS:
Ryan Smith of Anandtech estimates the M3 Max is ≈<400 mm^2, so a single-die (i.e., monolithic) M3 Ultra would be ≈< 800 mm^2. [ https://www.anandtech.com/show/2111...-family-m3-m3-pro-and-m3-max-make-their-marks ]
The reticle limit is the maximum chip size that can be etched. According to Anton Shilov of Anandtech, "The theoretical EUV reticle limit is 858 mm^2 (26 mm by 33 mm)". That would be enough for a monolithic M3 Ultra. [ https://www.anandtech.com/show/1887...ze-super-carrier-interposer-for-extreme-sips# ]
Indeed, we know dies >800 mm^2 size can be etched, since NVIDIA's (very expensive) GH100 GPU has a die size of 814 mm^2. [ https://developer.nvidia.com/blog/nvidia-hopper-architecture-in-depth/ ]
Even so, there are likely reasons Apple doesn't want to leverage this limit. E.g., it may be much more cost-effective to use already-designed Max chips, and link them together, than to design a separate chip just for the relatively low-volume Ultra Studio and Mac Pro.
My prediction is thus that the M3 Ultra will be 2x Max dies, like the M1 Ultra and M2 Ultra.
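For anyone who wants to check the area arithmetic in the DETAILS above, here is a minimal Python sketch. It uses only the figures quoted in the post (26 mm x 33 mm reticle, ≈<400 mm^2 for the M3 Max, 814 mm^2 for GH100). The function names are mine, and the check compares areas only; it says nothing about whether a real floorplan would fit the rectangle.

```python
# Figures quoted in the post above; the M3 Max area is an upper-bound estimate.
RETICLE_W_MM = 26.0
RETICLE_H_MM = 33.0
M3_MAX_AREA_MM2 = 400.0   # "≈<400 mm^2" estimate, treated here as 400
GH100_AREA_MM2 = 814.0    # NVIDIA GH100 die size


def reticle_limit_mm2(width_mm: float = RETICLE_W_MM,
                      height_mm: float = RETICLE_H_MM) -> float:
    """Maximum exposable area for a single die, in mm^2."""
    return width_mm * height_mm


def fits_in_reticle(area_mm2: float) -> bool:
    """Area-only check; a real die must also fit the rectangle's shape."""
    return area_mm2 <= reticle_limit_mm2()


# Naive upper bound for a monolithic Ultra: simply two Max dies' worth of area.
monolithic_ultra_mm2 = 2 * M3_MAX_AREA_MM2

print("reticle limit (mm^2):", reticle_limit_mm2())
print("2x M3 Max fits:", fits_in_reticle(monolithic_ultra_mm2))
print("GH100 fits:", fits_in_reticle(GH100_AREA_MM2))
```

Doubling the Max area is a worst case, since a monolithic design would drop duplicated blocks.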
Yeah. The premise of the question is inaccurate. There is only one M3 Max die. They just sell a variant with large parts disabled, likely due to yield.
I don't think OP implies that there are separate M3 Max dies.
I think he's pointing out that the M3 Pro and M3 Max are now completely separate chips, whereas the M1 Pro / M2 Pro were rather highly binned versions of the M1 Max / M2 Max.
So now that Apple has shown with the M3 family (for the worse, in the case of the M3 Pro) that they are ready to introduce more chip designs, the question is relevant.
No they weren't.
They just shared a similar layout of the blocks. M1, M1 Pro and M1 Max were distinct, separate dies. The same goes for the M2 family.
Maybe not "highly binned" but definitely a cut out version: https://architosh.com/2021/10/apples-new-m1-pro-is-chop-version-of-m1-max-die/
M1 vs M1 Pro / M1 Max, on the other hand, have very different designs.
Same for M2 vs M2 Pro / M2 Max.
In the M3 family, the three designs are completely different.
That might be so, but that's a very different concern from the actual chip production. Even though M1 Pro and Max share the same design, they are still two physically distinct chips with all that this entails. Just because Apple decided to separate the designs for M3 Pro and M3 Max does not mean that they will make a humongous monolithic chip.
All of the decisions until now were driven by financial optimisation. I don't really see Apple suddenly doing a U-turn here and building an outrageously expensive chip that will be plagued by low yields. Using two Max dies is more economical, and they already have the working technology for that.
Well, I agree with you on the rationale. But I was also pretty sure that the M3 Pro would be a cut version of the M3 Max, because that also seemed to make sense from a financial optimisation point of view.
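To put rough numbers on the low-yield argument above, here is a small sketch using the textbook Poisson yield model, Y = exp(-A * D0). Nothing in it comes from Apple or TSMC: the defect density D0 is a made-up illustrative value, the 400 and 800 mm^2 areas are the estimates from earlier in the thread, and the function names are hypothetical. It also ignores binning (selling partly defective dies as cut-down variants), wafer edge loss, and packaging yield, and it assumes dies are tested before being paired.

```python
import math


def poisson_yield(area_mm2: float, defects_per_cm2: float) -> float:
    """Fraction of dies with zero defects under a simple Poisson model."""
    area_cm2 = area_mm2 / 100.0
    return math.exp(-area_cm2 * defects_per_cm2)


def silicon_per_good_product(die_area_mm2: float,
                             dies_per_product: int,
                             defects_per_cm2: float) -> float:
    """Average wafer area (mm^2) consumed per working product.

    Assumes dies are tested individually, so only good dies get paired.
    """
    good_fraction = poisson_yield(die_area_mm2, defects_per_cm2)
    return dies_per_product * die_area_mm2 / good_fraction


D0 = 0.1  # ASSUMED defects per cm^2, for illustration only

two_max_dies = silicon_per_good_product(400.0, 2, D0)
monolithic = silicon_per_good_product(800.0, 1, D0)

print(f"2x 400 mm^2 dies: {two_max_dies:.0f} mm^2 of wafer per good Ultra")
print(f"1x 800 mm^2 die:  {monolithic:.0f} mm^2 of wafer per good Ultra")
```

Under this model the monolithic die always needs more silicon per good part, because yield falls exponentially with area. The size of the gap depends entirely on the assumed D0.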
It would be nice with a new high-end SoC for desktop chips at around 600-650mm² offering even more compute.
It would be nice, especially if two were used in an Ultra configuration. The problem is, for whom? ROI will be difficult. We are not talking about laptop sales here, but Mac Studio and Mac Pro sales, minus the Mac Studio buyers for whom the M3 Max is sufficient. That small community may not be enough to cover the development cost. Apple has not shown any tendency to enter a contest with AMD/NVIDIA/Intel to get attention via expensive marketing stunts. It would be fun if they did, but I think "fun" left Apple a long time ago.
Given the separate dies for the different M3's so far, what are the chances that the M3 Ultra is an entirely separate die as well?
M3 Ultra could also come in the form of two dies: one with CPU and NPU cores, video codecs, memory controller, etc., connected via UltraFusion to another die with GPU cores and memory controllers.
It would be interesting if Apple made an Ultra with all performance cores. Does it make sense to have so many efficiency cores on a desktop?
This would also help with product differentiation to make the Studio more relevant.
They could, but that's an expensive redesign just for the Ultra, which is their lowest-volume chip. It's possible Apple may do this at some point, but I think they would do so only if they could use this design approach across the Mac product line.
Especially when, as a feature set, ALL the Ultra has to be is “the fastest Mac”. Doubling up is a cost-effective way of getting to “the fastest Mac”.
A monolithic M3 Ultra wouldn't be straight-up twice the transistor count, though. A lot of the blocks can be space-optimised or removed entirely, as there would be duplicates. You also remove the entire UltraFusion bridge from both dies, along with everything that enables them to communicate across dies.
Seeing as the CPU and GPU clusters would now be in close proximity on the same die, you could probably reduce some local caches or the system-level caches.
Chop off two memory channels (512-bit to 384-bit) and use LPDDR5X to offset the memory bandwidth loss. That would also reduce the number of RAM chips needed from 8 to 6 while still being able to provide up to 288GB of unified memory (6 x 48GB).
Obviously there are reasons not to. The advantages of medium-sized dies fused together are yields and much better thermal control, as the surface area is twice the size and hot spots are spread over the combined area. Not to mention strength in numbers, with the separate die shipping in a much larger quantity of products than the desktop chips.
It would be nice with a new high-end SoC for desktop chips at around 600-650mm² offering even more compute.
I made this visualisation of the reticle limit, with the 432mm² M1 Max die using only 50.35% of that limit, to show there is plenty of room to increase die sizes.
View attachment 2318183
That's a good point that the transistors used specifically to interface with the bridge wouldn't be needed. Though I don't know how much space that would save.
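The memory and reticle numbers in the post above are easy to sanity-check with a quick sketch. Assumptions on my part, not from the post: LPDDR5 at 6400 MT/s and LPDDR5X at 8533 MT/s (common speed grades, not confirmed for any specific Apple part), and the helper name is hypothetical. Bus widths, chip counts, and die areas are taken from the post as written.

```python
def peak_bandwidth_gb_s(bus_width_bits: int, data_rate_mt_s: float) -> float:
    """Theoretical peak bandwidth in GB/s (decimal) for a given bus and data rate."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * data_rate_mt_s / 1000.0  # MB/s -> GB/s


LPDDR5_MT_S = 6400.0    # ASSUMED speed grade
LPDDR5X_MT_S = 8533.0   # ASSUMED speed grade

wide_bus = peak_bandwidth_gb_s(512, LPDDR5_MT_S)      # current-style 512-bit bus
narrow_bus = peak_bandwidth_gb_s(384, LPDDR5X_MT_S)   # proposed 384-bit bus

capacity_gb = 6 * 48                 # six packages of 48GB each
reticle_share = 432 / (26 * 33)      # M1 Max die area over the reticle limit

print(f"512-bit LPDDR5:  {wide_bus:.1f} GB/s")
print(f"384-bit LPDDR5X: {narrow_bus:.1f} GB/s")
print(f"capacity: {capacity_gb} GB")
print(f"M1 Max share of reticle: {reticle_share:.2%}")
```

With those assumed speed grades, the narrower bus lands at essentially the same peak bandwidth, which is the trade-off the post describes.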