M3 Chip Generation - Discussion Megathread

Jamie I · Nov 9, 2023

Explore GPU advancements in M3 and A17 Pro - Tech Talks - Videos - Apple Developer

Learn how Dynamic Caching, the next-generation shader core, hardware-accelerated ray tracing, and hardware-accelerated mesh shading of...

developer.apple.com

diamond.g · Nov 9, 2023

Jamie I said:
Explore GPU advancements in M3 and A17 Pro - Tech Talks - Videos - Apple Developer

Learn how Dynamic Caching, the next-generation shader core, hardware-accelerated ray tracing, and hardware-accelerated mesh shading of...

developer.apple.com

The RT stage seems very similar to what Nvidia is doing. @leman

They explicitly call out use of über shaders, lol.

Confused-User · Nov 9, 2023

AmazingTechGeek said:
A reminder that you are “expected” to be respectful on the forum regardless of what point you're making.

Thank you.

Why are you resurrecting something that's been dead for an age (in internet time)?

But since you are, what exactly do you think is wrong with this? I am using the EXACT SAME WORDS as the person I replied to. This is a rhetorical device designed to show someone that their position of privilege is actually no more privileged than other positions. Notably, unlike the original post, it's not passive-aggressive. And there was a pretty broad consensus in support of this.

Homy · Nov 9, 2023

Apple M3 Pro Takes The Throne of The Fastest CPU In PassMark Single-Thread Benchmark, 1% Faster Than Intel Core i9-14900K

Latest benchmarks of Apple's M3 CPUs are in and the M3 Pro specifically has secured the top position in PassMark's single-thread benchmark.

wccftech.com

thenewperson · Nov 9, 2023

I'm really curious why they didn't raise the frequency of the Max (at least in the 16", like last time). I figured they'd started segmenting products that way like the other players, but maybe it was just a test?

Pressure · Nov 9, 2023

Homy said:
Apple M3 Pro Takes The Throne of The Fastest CPU In PassMark Single-Thread Benchmark, 1% Faster Than Intel Core i9-14900K

Latest benchmarks of Apple's M3 CPUs are in and the M3 Pro specifically has secured the top position in PassMark's single-thread benchmark.

wccftech.com

And no mention of power draw anywhere in the article. Looking down in the comments section, yikes 🙈

leman · Nov 9, 2023

thenewperson said:
I'm really curious why they didn't raise the frequency of the Max (at least in the 16", like last time). I figured they'd started segmenting products that way like the other players, but maybe it was just a test?

Maybe they felt that the M2 Max needed to be faster to offer good value compared to other offerings on the market, while with the M3 they are confident that the performance is sufficient. And it is, given that even the base M3 cuts a good figure next to the fastest desktop CPU currently available...

leman · Nov 9, 2023

BTW, since we now have a bunch of folks with M3 machines, would anyone be so kind as to run my power/frequency tester tool on the M3? The link is here: https://github.com/mr-mobster/AppleSiliconPowerTest

Thanks in advance!

caribbeanblue · Nov 10, 2023

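https://developer.apple.com/videos/play/tech-talks/111375 Apple released an explainer video for GPU advancements with the A17 Pro and M3. Haven't watched the whole video yet but they explain the new architecture as the new "Apple family 9 GPU".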

Pressure · Nov 10, 2023

caribbeanblue said:
https://developer.apple.com/videos/play/tech-talks/111375 Apple released an explainer video for GPU advancements with the A17 Pro and M3. Haven't watched the whole video yet but they explain the new architecture as the new "Apple family 9 GPU".

I believe he thinks the CPU architecture is different between the A17 Pro and the M3 series.

scottrichardson · Nov 10, 2023

Jamie I said:
Explore GPU advancements in M3 and A17 Pro - Tech Talks - Videos - Apple Developer

Learn how Dynamic Caching, the next-generation shader core, hardware-accelerated ray tracing, and hardware-accelerated mesh shading of...

developer.apple.com

Just looking at the Baldur's Gate 3 comparison in the Apple video... that's a fairly substantial performance uplift in that game from the top M2 Max to the top M3 Max. Seems like the current benchmarks don't utilise the new tech in the shader cores very well, unlike Baldur's Gate, which does appear to do so? We're looking at roughly 50 fps vs 78 fps, which is a solid improvement.

Pressure · Nov 10, 2023

scottrichardson said:
Just looking at the Baldur's Gate 3 comparison in the Apple video... that's a fairly substantial performance uplift in that game from the top M2 Max to the top M3 Max. Seems like the current benchmarks don't utilise the new tech in the shader cores very well, unlike Baldur's Gate, which does appear to do so? We're looking at roughly 50 fps vs 78 fps, which is a solid improvement.

Larian Studios' CEO has even announced optimisation breakthroughs for Baldur's Gate 3.

altaic · Nov 10, 2023

Pressure said:
Larian Studios' CEO has even announced optimisation breakthroughs for Baldur's Gate 3.

I hope they have more planned than that. There’s a lot of low hanging fruit now.

diamond.g · Nov 10, 2023

altaic said:
I hope they have more planned than that. There’s a lot of low hanging fruit now.

Unless they rehire the Mac porting team I’m not sure we will see these improvements in macOS (any time soon).

Xiao_Xi · Nov 10, 2023

altaic said:
I hope they have more planned than that. There’s a lot of low hanging fruit now.

Can you give some examples?

MRMSFC · Nov 10, 2023

Pressure said:
And no mention of power draw anywhere in the article. Looking down in the comments section, yikes 🙈

Wccftech’s comment section is something else, I assume there’s no moderation. I’ve seen some pretty heinous posts in there, including plenty of racism.

jdb8167 · Nov 10, 2023

MRMSFC said:
Wccftech’s comment section is something else, I assume there’s no moderation. I’ve seen some pretty heinous posts in there, including plenty of racism.

The spelling alone is atrocious. Better not mention the non-existent level of technical knowledge displayed all while claiming that Mac users aren't computer literate. Pretty sad.

T'hain Esh Kelch · Nov 10, 2023

Wccftech's comment section is the result of Twitter's blue-checkmark users having a child with 4chan and Truth Social. It is sick.

name99 · Nov 10, 2023

caribbeanblue said:
https://developer.apple.com/videos/play/tech-talks/111375 Apple released an explainer video for GPU advancements with the A17 Pro and M3. Haven't watched the whole video yet but they explain the new architecture as the new "Apple family 9 GPU".

"Apple family 9" is nothing new. We knew that like the day the A17 was announced.
You can see these terms in the Metal Feature Sets PDF (which gets updated at random times, but generally a few days after each new architecture is revealed).

https://developer.apple.com/metal/Metal-Feature-Set-Tables.pdf

bcortens · Nov 10, 2023

name99 said:
"Apple family 9" is nothing new. We knew that like the day the A17 was announced.
You can see these terms in the Metal Feature Sets PDF (which gets updated at random times, but generally a few days after each new architecture is revealed).

https://developer.apple.com/metal/Metal-Feature-Set-Tables.pdf

The deep dive on how it works is new today...

mr_roboto · Nov 10, 2023

name99 said:
Why do you need a "clean" way to perform the chop?
The goal of the chop is to save some area. A chop does that even if it doesn't occur along a clean line.

But if the line isn't at least relatively clean, you probably won't be saving enough die area to make the chop worth anything.

The way you're talking about this hints that you haven't even looked at annotated die shots of the M3 Max, much less compared them to M2/M1 Max (where we can look for evidence of how they do layout to support a chop).

At the bottom of the M3 Max (the place where we have to try to imagine a chop existing), there are two rows of six GPU cores each. The only option would be to chop one entire row of six. The four remaining GPU cores that need to go can be explained as defect management, but when I do pixel-based area analysis on the annotated image I'm using, I get these areas:

Full die: 1414w * 1457h = 2.06M px
1 GPU row chopped: 1414w * 1357h = 1.92M px

That works out to about a 7% area reduction. Is that even going to substantially increase the number of die you can fit on one wafer? Seems unlikely.
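The arithmetic behind that figure is easy to check. A minimal Python sketch using only the pixel measurements quoted above (image pixels, not millimetres, so only the ratio means anything); the variable and function names are illustrative:

```python
# Pixel dimensions quoted in the post, measured on an annotated die shot
# (image pixels, not millimetres; only the ratio is meaningful).
FULL_W, FULL_H = 1414, 1457   # full M3 Max die
CHOPPED_H = 1357              # height after removing one row of six GPU cores


def area_reduction(width: int, full_height: int, chopped_height: int) -> float:
    """Fraction of die area saved by cutting a full-width strip off one edge."""
    full_area = width * full_height
    chopped_area = width * chopped_height
    return 1 - chopped_area / full_area


full_px = FULL_W * FULL_H
chopped_px = FULL_W * CHOPPED_H
saving = area_reduction(FULL_W, FULL_H, CHOPPED_H)

print(f"full: {full_px / 1e6:.2f}M px, chopped: {chopped_px / 1e6:.2f}M px")
print(f"area saved: {saving:.1%}")
```

Because the chop removes a full-width strip, the width cancels and the saving is simply 100/1457 of the height, about 6.9%, consistent with the "about 7%" above.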

The next obvious thing: M1/M2 Max have a little bit of kerf separating the upper group of memory controllers on each side from the lower group. This kerf exists because the lower group gets chopped away in M1/M2 Pro. If M3 Max layout supported a chop, you'd expect a kerf 3/4 of the way down on each of the two blocks of memory controllers, but it isn't there.

I could go on and on about many other things which strongly argue against a chop. The one I'll end with is that M3 Max has 40 GPU cores and no spares are visible in the layout. If Apple was running different masks and wafers for full and chopped M3 Max, they'd have to scrap a large number of full M3 Max die due to defective GPU cores. I don't think that's even slightly plausible. They need a way to harvest some of the scrap, and the 30c GPU Max variants are how they do it. Nothing else makes sense.

Confused-User · Nov 11, 2023

mr_roboto said:
I could go on and on about many other things which strongly argue against a chop. The one I'll end with is that M3 Max has 40 GPU cores and no spares are visible in the layout. If Apple was running different masks and wafers for full and chopped M3 Max, they'd have to scrap a large number of full M3 Max die due to defective GPU cores. I don't think that's even slightly plausible. They need a way to harvest some of the scrap, and the 30c GPU Max variants are how they do it. Nothing else makes sense.

I have no opinion on the rest of your post, because I don't know if 7% area saved would outweigh the costs of making two dies. But this specific argument makes no sense. There is no reason Apple couldn't sell both chopped M3 Maxes, and full M3 Maxes with some cores and a memory bus disabled, as 30-core M3 Maxes.

Chancha · Nov 11, 2023

Someone posted these from Japanese twitter. All three 14" boards stripped.

https://twitter.com/x/status/1722364734887096653

https://twitter.com/x/status/1722369631191740881

Way too little can be deduced from these, but this is the first SoC package shot of the M3 Max. The poster didn't say whether the Pro and Max shown are binned. The only thing that can be confirmed is the loss of 4 NAND solder spots on the M3 Pro board, compared with the 8 on the Max (the M1 Pro and M2 Pro boards had all 8).

altaic · Nov 11, 2023

Confused-User said:
I have no opinion on the rest of your post, because I don't know if 7% area saved would outweigh the costs of making two dies. But this specific argument makes no sense. There is no reason Apple couldn't sell both chopped M3 Maxes, and full M3 Maxes with some cores and a memory bus disabled, as 30-core M3 Maxes.

I did my own surgery on the Max die, and I came up with ~18% area reduction. That said, they’d have to have another (albeit much smaller) mask set. Ultimately, the only way I’d think this sort of production run would be good is if Apple is binning chips for yet-to-be-announced devices.

mr_roboto · Nov 11, 2023

Confused-User said:
I have no opinion on the rest of your post, because I don't know if 7% area saved would outweigh the costs of making two dies.

It almost certainly wouldn't. One thing I only obliquely referenced in that post is that die have to be packed or "tiled" onto a wafer, and this has consequences.

Like everyone else, TSMC uses 300mm dia. wafers. This yields about 70,000mm^2 of usable area.

So how many M3 Max can you fit on one wafer? That's a hard question to answer because I haven't seen anyone reveal the actual measured size. I'm going to assume it's 500mm^2 because it should be a bit larger than past Max chips, and the M1 Max was supposedly about 430mm^2. Also, 500 makes the numbers for the first round of calculations nicer.

Speaking of, we'll ignore that the wafer is circular while the die is a rectangle. So we just divide 70000 by 500 and get a neat round number of 140 M3 Max per wafer. Let's further assume that the M3 Max is square. (It actually almost is, according to pixel counting, though the hidden Ultra Fusion interconnect will likely push it slightly more rectangular.) We have to lay the 140 die out in a grid, and since sqrt(140) is not a nice integer, let's bump the assumed wafer size slightly. That puts our baseline as 144 die in a 12*12 grid on a 72,000mm^2 square wafer. (Spherical cow, square wafer, same kinda thing...)

With that 144-die wafer as our baseline, how many extra die do we get once we chop 7% of the height?

Well... the answer is actually a big fat zero. A 500mm^2 die that's perfectly square is 22.36mm x 22.36mm, and the chop gives us 22.36 x 20.8 - we're saving 1.56mm of height. The total height saved is 12 rows * 1.56mm = 18.72mm. But we'd need to free up 20.8mm to add another row, so chopping 1.56mm per die gained us nothing.
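The "spherical cow" grid calculation can be reproduced in a few lines. A minimal Python sketch under the same simplifications as the post: a 500 mm^2 square die (an assumed size, not a measurement), a square wafer sized for exactly a 12 x 12 grid, no edge loss and no saw kerf. The helper `rows_that_fit` is a hypothetical name used only for illustration:

```python
import math

WAFER_DIAMETER_MM = 300
DIE_AREA_MM2 = 500     # assumed M3 Max die area (the post's guess, not a measurement)
CHOP_FRACTION = 0.07   # ~7% of the die height removed, from the pixel estimate
GRID = 12              # 12 x 12 grid of square die on a square "wafer"


def rows_that_fit(span_mm: float, die_mm: float) -> int:
    """Whole die that fit along one edge; the epsilon keeps exact fits exact."""
    return math.floor(span_mm / die_mm + 1e-9)


wafer_area = math.pi * (WAFER_DIAMETER_MM / 2) ** 2   # ~70,686 mm^2 for a 300 mm wafer
naive_count = 70_000 / DIE_AREA_MM2                    # 140 die, ignoring geometry

side = math.sqrt(DIE_AREA_MM2)                 # ~22.36 mm square die
wafer_side = GRID * side                       # ~268.3 mm, i.e. 72,000 mm^2
chopped_h = side * (1 - CHOP_FRACTION)         # ~20.80 mm after the chop

cols = rows_that_fit(wafer_side, side)         # width is unchanged by the chop
rows_before = rows_that_fit(wafer_side, side)
rows_after = rows_that_fit(wafer_side, chopped_h)

before = cols * rows_before
after = cols * rows_after
freed_mm = GRID * (side - chopped_h)           # total height freed across all rows

print(f"wafer area ~{wafer_area:,.0f} mm^2, naive count {naive_count:.0f}")
print(f"die per wafer: {before} before chop, {after} after chop")
print(f"height freed {freed_mm:.1f} mm, but a new row needs {chopped_h:.1f} mm")
```

With unrounded inputs the freed height comes out at about 18.8 mm rather than 18.72 mm, which doesn't change the conclusion: it is still short of the ~20.8 mm a new row would need, so the count stays at 144.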

Now obviously in the real world you're packing rectangles (not squares) onto a circle (not a square). So you lose a lot of area at the edges to the whole rectangle-packed-on-circle thing. There's also saw kerf laid out between the rows and columns because at some point you gotta singulate the wafer (cut it apart with a diamond saw), and so forth. Many complications I papered over. However, the basic geometric principle remains true: when packing objects that are relatively large into a given area, slightly reducing the size of those objects may not improve the number which can be packed at all.

(As die size goes down, the gains you get by reducing die size do start to more closely approximate what you'd naively expect just by dividing wafer area by die area.)
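That size dependence can be illustrated by sweeping the die area in the same idealised model. A minimal Python sketch, assuming square die, the same 72,000 mm^2 square wafer, a 7% height chop and no edge or kerf losses; `die_per_square_wafer` is a hypothetical helper, not any real tool:

```python
import math

WAFER_AREA_MM2 = 72_000   # same idealised square wafer as above
CHOP_FRACTION = 0.07      # shrink each die's height by 7%


def die_per_square_wafer(die_area_mm2: float, chop: float = 0.0) -> int:
    """Grid-pack square die (optionally shortened by `chop`) onto a square wafer."""
    wafer_side = math.sqrt(WAFER_AREA_MM2)
    side = math.sqrt(die_area_mm2)
    cols = math.floor(wafer_side / side + 1e-9)
    rows = math.floor(wafer_side / (side * (1 - chop)) + 1e-9)
    return cols * rows


naive_gain = 1 / (1 - CHOP_FRACTION) - 1   # ~7.5% more die if area alone mattered

for area in (500, 100, 25):
    before = die_per_square_wafer(area)
    after = die_per_square_wafer(area, CHOP_FRACTION)
    gain = after / before - 1
    print(f"{area:>4} mm^2: {before:>5} -> {after:>5} die ({gain:+.1%}, naive {naive_gain:+.1%})")
```

At 500 mm^2 the chop gains nothing, while at 25 mm^2 the gain lands essentially on the naive 7.5%. Rounding to whole rows and columns can also push a mid-sized case slightly above the naive figure, which is the same quantisation effect seen from the other side.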

Confused-User said:
But this specific argument makes no sense. There is no reason Apple couldn't sell both chopped M3 Maxes, and full M3 Maxes with some cores and a memory bus disabled, as 30-core M3 Maxes.

I think you're being "technically correct" here. I'm making economic / practical arguments here, not talking about whether it's physically possible to do as you say.

The yield of perfect 40/40 core Maxes is bound to be fairly low as it's such an enormous die. They're going to have plenty of supply of binned down 30c GPU Maxes just by running the wafers required to supply them with 40c GPU Maxes, so why spend all that extra money developing and manufacturing a second mask set? Especially if the chop can only be a very small one, meaning it doesn't even yield them many (or any) extra die per wafer?
