M3 Chip Generation - Discussion Megathread

Macintosh IIcx · Sep 16, 2023

AgentMcGeek said:
Another thing we need to keep in mind is that the 3nm process node that Apple is (exclusively) using is not good.

You state that as a fact and I don’t think we have evidence to support that claim. There were problems with yield early on during the development phase, but they have reportedly been largely fixed. Is a new process perfect from the start? Most likely not, but the notion that 3nm is somehow flawed is simply incorrect, IMHO.

We will see how it fares with the big M3 Max chip, which will probably be the best chip to judge TSMC’s 3nm node on.

AgentMcGeek · Sep 16, 2023

T'hain Esh Kelch said:
N3B *is* N3.

Boil said:
N3B is the first 3nm process; Intel backed out of their deal with TSMC and Apple took up the slack...

I wouldn't say it was "not good", more like "less than optimal"...

N3E is the follow-up to N3B; N3S & N3P after that; and N3X is the final 3nm variant, designed for higher power draw & clock speeds...

Yes apologies, I meant to say N3E not B!

name99 · Sep 16, 2023

BTW it appears that the first round of A17 GB6 results were dramatically lower than they "should" have been.
Look at

iPhone16,1 vs iPhone 14 Pro Max - Geekbench

or

Geekbench Search - Geekbench

I'm not sure what to make of this. Do some reviewers already have models but are still under embargo? Were the first reviewers running GB6 the moment they opened the phones (so that it was running while the phone was still doing initial setup and backup-restore stuff?)
Hell, maybe even this "iOS17.0" that they all claim to be running is not the same? Maybe they ship with an iOS17.0xxx and as soon as you go through the usual updates you get a 17.0yyy that, among other things, flips some chicken bits? I've no idea how leading edge reviewer phones are!

Certainly the newer (~20% improvement) results don't look obviously faked. If there is any pattern it's that stuff I would expect to depend more on NEON/FP gets more of a boost, so 5th NEON unit?
The obvious next config (IMHO!) would be
10-wide decode
[connected by queue, not just direct coupling as in earlier designs]
8-wide rename
5x NEON,
8x INT (including 3 branch units)
probably still 4x LD/ST but MAYBE switch from the current 2x LD, 1x ST, 1x ambidextrous to 2x LD, 2x ambidextrous?

The point of 10-wide decode feeding 8-wide rename is that
(a) decode is easy, rename is hard. Might as well do extra decode and store it in a queue as a buffer against times when there is less work to decode because of earlier branches.
(b) some decode results in no work for rename (eg NOPs and fusions) so a 10-wide decode results in something like an average of 8-wide clients for rename anyway
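
To put rough numbers on (a) and (b), here is a toy model in Python. Everything in it is an illustrative assumption rather than anything measured on Apple hardware: the 20% share of decoded instructions that need no rename slot, the "one empty decode cycle in five" stall pattern, and the unbounded queue. The function names are made up for this sketch.

```python
# Toy model only. The 20% "free" fraction and the stall pattern are
# illustrative assumptions, not measured Apple numbers.

def average_rename_demand(decode_width: int, free_fraction: float) -> float:
    """Average rename slots needed per cycle when decode runs at full width.

    free_fraction is the share of decoded instructions that need no rename
    slot (NOPs, or instructions fused into a neighbour).
    """
    return decode_width * (1.0 - free_fraction)


def renamed_total(decoded_per_cycle, rename_width: int, use_queue: bool) -> int:
    """Count instructions renamed over a sequence of decode cycles.

    With use_queue=False, rename can only take what decode delivers in the
    same cycle, capped at rename_width (direct coupling). With use_queue=True
    the surplus waits in a queue (unbounded here, for simplicity) and is
    renamed in later cycles. For simplicity, every decoded instruction in
    this function is assumed to need one rename slot.
    """
    queue = 0
    total = 0
    for decoded in decoded_per_cycle:
        available = queue + decoded if use_queue else decoded
        issued = min(rename_width, available)
        total += issued
        queue = available - issued if use_queue else 0
    return total


# Point (b): 10-wide decode with an assumed 20% of slots needing no rename.
print(average_rename_demand(decode_width=10, free_fraction=0.2))  # 8.0

# Point (a): hypothetical pattern with one empty decode cycle in five.
pattern = [10, 10, 10, 10, 0] * 4
print(renamed_total(pattern, rename_width=8, use_queue=True))   # 160
print(renamed_total(pattern, rename_width=8, use_queue=False))  # 128
```

In this sketch the queue lets rename stay busy through the empty decode cycles, which is the buffering effect described in (a). A real front end has a bounded queue and far messier stall behaviour.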

Already a recent design at HotChips (I can't remember, maybe the newest ARM? maybe the Veyron) has this similar wider decode than rename, so it's not like it's my crazy idea; it's the logical next step given aggressive fusion.

There's probably also interesting other work in other parts of the CPU (for example there's a weird "pattern" cache patent which makes no sense to me except as a fancy way of describing a zero-content cache, but a zero-content cache is in fact a nice addition!) And as I've said there are multiple patents suggesting big changes in virtualization (probably? not relevant to phone -- though "work phone vs home office" profile?) but part of these are large pages which will boost even some phone workloads. Of course with any patent, who knows if it's implemented this year or next year?

name99 · Sep 16, 2023

BTW also some variation in the Metal benchmarks. Only two of those, so again maybe we will see a performance boost over the next few days with (? less background work as the test is run? updated OS and Metal code?)

iPhone16,1 vs iPhone 14 Pro - Geekbench

Superficially you might think 20% avg boost is basically 16% from one more core and the rest from maybe GHz, but look at how much variation there is across the sub-benchmarks!
Looks like in fact an awful lot has changed, so substantially new design as Apple claimed.

Retskrad · Sep 16, 2023

Why does Apple make the best performance/watt CPU in the world but when it comes to GPUs they can’t even compete with Qualcomm? Qualcomm’s equivalent chip to the A16 was already 20% faster and the A17 Pro is closing the gap with Qualcomm’s previous generation GPU.

jeanlain · Sep 16, 2023

Retskrad said:
Why does Apple make the best performance/watt CPU in the world but when it comes to GPUs they can’t even compete with Qualcomm? Qualcomm’s equivalent chip to the A16 was already 20% faster and the A17 Pro is closing the gap with Qualcomm’s previous generation GPU.

According to https://www.notebookcheck.net/Qualcomm-Adreno-740-GPU-Benchmarks-and-Specs.669947.0.html, the Samsung S23 SoC consumes about 12W during GFXBench to achieve 142 fps, while the A16 consumes a bit less than 6W to achieve 133 fps.
The Sony Xperia 1 V consumes about 6.5 W and achieves 82 fps.
So it looks like the A16 is quite a bit more power efficient.
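
For anyone who wants the ratio spelled out, a quick Python sketch using only the figures quoted above. The A16 draw is described as "a bit less than 6W", so 6.0 W is used here as an upper bound (an assumption), which makes its fps-per-watt figure a lower bound. The helper name is made up for this sketch.

```python
# Figures as quoted in this post (GFXBench fps, approximate watts).
# The A16 power is entered as 6.0 W, an upper bound on "a bit less than 6W".
readings = {
    "Samsung S23 (Adreno 740)": (142, 12.0),
    "Apple A16": (133, 6.0),
    "Sony Xperia 1 V": (82, 6.5),
}


def fps_per_watt(fps: float, watts: float) -> float:
    """Frames per second delivered per watt of power drawn."""
    return fps / watts


for device, (fps, watts) in readings.items():
    print(f"{device}: {fps_per_watt(fps, watts):.1f} fps/W")
# Samsung S23 (Adreno 740): 11.8 fps/W
# Apple A16: 22.2 fps/W
# Sony Xperia 1 V: 12.6 fps/W
```

On these numbers the A16 lands at a bit under twice the fps per watt of either Adreno 740 phone.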

name99 · Sep 16, 2023

Retskrad said:
Why does Apple make the best performance/watt CPU in the world but when it comes to GPUs they can’t even compete with Qualcomm? Qualcomm’s equivalent chip to the A16 was already 20% faster and the A17 Pro is closing the gap with Qualcomm’s previous generation GPU.

Making a faster GPU is "trivial"; either use more area or run it faster (use more energy).
Of course that's not exactly true, but it's true to first order, whereas making a single-threaded CPU fast is not nearly as easy.

Additionally, the impression I get (which may be wholly false!) is that Apple are trying to add a larger amount of compute (functionality and performance) to their GPU than their mobile competitors are. In terms of "OS-like" functionality, eg arbitrary pre-emption, MMU functionality, "virtualization" (ie simultaneous, but segregated, execution of tasks from different processes) etc, even nVidia only attained this functionality about five years ago, and they are still tweaking it; and Apple more or less matches them (except the big-iron functionality like Multi-Instancing).

But it is true that, as far as I can tell, as regards smarts in the GPU design Apple are more like contemporaries with nVidia than far ahead as they are compared to anyone else in CPU-land. There are many Apple GPU good ideas not yet used by nVidia – but the reverse is just as true. I don't know enough about Adreno to have an opinion, but my *guess* is that they get ahead by area and/or power, not by smarts to the extent that nVidia and Apple do; and perhaps by doing more in fixed hardware and less in generic, ie GPU Compute, hardware.

jeanlain has already discussed the power issue.
If we look at the generic compute issue, the best (very unsatisfactory!) that we can do is note that the Adreno 740 gets 7853 on GB6 Compute (OpenCL), while the M1-7-core gets 18355, ie more than 2x for 7 cores (vs 4 on the A14, let alone the substantially improved A16), and that on OpenCL not even on Metal.

In other words, QC use more energy to do one thing (graphics); Apple strive harder to save energy, and try to do more different things (also compute).

dgdosen · Sep 16, 2023

name99 said:
BTW it appears that the first round of A17 GB6 results were dramatically lower than they "should" have been.
Look at

iPhone16,1 vs iPhone 14 Pro Max - Geekbench

or

Geekbench Search - Geekbench

I'm not sure what to make of this. Do some reviewers already have models but are still under embargo? Were the first reviewers running GB6 the moment they opened the phones (so that it was running while the phone was still doing initial setup and backup-restore stuff?)
Hell, maybe even this "iOS17.0" that they all claim to be running is not the same? Maybe they ship with an iOS17.0xxx and as soon as you go through the usual updates you get a 17.0yyy that, among other things, flips some chicken bits? I've no idea how leading edge reviewer phones are!

Certainly the newer (~20% improvement) results don't look obviously faked. If there is any pattern it's that stuff I would expect to depend more on NEON/FP gets more of a boost, so 5th NEON unit?
The obvious next config (IMHO!) would be
10-wide decode
[connected by queue, not just direct coupling as in earlier designs]
8-wide rename
5x NEON,
8x INT (including 3 branch units)
probably still 4x LD/ST but MAYBE switch from the current 2x LD, 1x ST, 1x ambidextrous to 2x LD, 2x ambidextrous?

The point of 10-wide decode feeding 8-wide rename is that
(a) decode is easy, rename is hard. Might as well do extra decode and store it in a queue as a buffer against times when there is less work to decode because of earlier branches.
(b) some decode results in no work for rename (eg NOPs and fusions) so a 10-wide decode results in something like an average of 8-wide clients for rename anyway

Already a recent design at HotChips (I can't remember, maybe the newest ARM? maybe the Veyron) has this similar wider decode than rename, so it's not like it's my crazy idea; it's the logical next step given aggressive fusion.

There's probably also interesting other work in other parts of the CPU (for example there's a weird "pattern" cache patent which makes no sense to me except as a fancy way of describing a zero-content cache, but a zero-content cache is in fact a nice addition!) And as I've said there are multiple patents suggesting big changes in virtualization (probably? not relevant to phone -- though "work phone vs home office" profile?) but part of these are large pages which will boost even some phone workloads. Of course with any patent, who knows if it's implemented this year or next year?

If these newer scores are real - we've got a current gen phone SoC beating a -2nd (-3rd?) gen computer SoC handily in SC, and pretty close in MC.

Bodes well for the M3 and whatever they can cram it into. Hopefully a reborn 12" Macbook Pro or the like.

sunny5 · Sep 16, 2023

Apple’s A17 Pro Might Just Be An Improved A16 Bionic, With Clock Speed Bumps & Optimizations, According To Codenames Leak

A leak from Weibo showing the codenames of different SoC cores reveals that the A17 Pro might be an improved version of the A16 Bionic

wccftech.com

It seems A17 Pro might be another A16 with more optimization. Leaked performance already seems disappointing despite a new process node, architecture, and higher clock speed.

NT1440 · Sep 16, 2023

sunny5 said:
Apple’s A17 Pro Might Just Be An Improved A16 Bionic, With Clock Speed Bumps & Optimizations, According To Codenames Leak

A leak from Weibo showing the codenames of different SoC cores reveals that the A17 Pro might be an improved version of the A16 Bionic

wccftech.com

It seems A17 Pro might be another A16 with more optimization. Leaked performance already seems disappointing despite a new process node, architecture, and higher clock speed.

It’s still a completely revamped GPU, along with a new microarchitecture for the performance and low-power CPU cores.

This is the new floor, I expect big improvements over the next two cycles.

sunny5 · Sep 16, 2023

NT1440 said:
It’s still a completely revamped GPU, along with a new microarchitecture for the performance and low-power CPU cores.

This is the new floor, I expect big improvements over the next two cycles.

If Apple is so sure about the CPU/GPU improvements, they should've made a performance graph but this time, they never did.

jeanlain · Sep 16, 2023

sunny5 said:
If Apple is so sure about the CPU/GPU improvements, they should've made a performance graph but this time, they never did.

When was the last time they did?

sunny5 · Sep 16, 2023

jeanlain said:
When was the last time they did?

[Attached image: Apple A16 vs A13 vs competition CPU performance chart]

A16 which is last year.

jeanlain · Sep 16, 2023

sunny5 said:
A16 which is last year.

Was that on the iPhone page? It seems strange to compare to the A13.

They may not have made a comparison chart, but they give numbers (10% faster CPU, etc.). That's basically the same.

JordanNZ · Sep 16, 2023

sunny5 said:
Apple’s A17 Pro Might Just Be An Improved A16 Bionic, With Clock Speed Bumps & Optimizations, According To Codenames Leak

A leak from Weibo showing the codenames of different SoC cores reveals that the A17 Pro might be an improved version of the A16 Bionic

wccftech.com

It seems A17 Pro might be another A16 with more optimization. Leaked performance already seems disappointing despite a new process node, architecture, and higher clock speed.

"Disappointing"? In what world would these https://browser.geekbench.com/search?k=v6_cpu&q=iphone16&utf8=✓ be "Disappointing"?

We already know it's not just an A16 core clocked up. Apple said so themselves in the announcement, and the Geekbench subtests show that.

leman · Sep 17, 2023

Retskrad said:
Why does Apple make the best performance/watt CPU in the world but when it comes to GPUs they can’t even compete with Qualcomm? Qualcomm’s equivalent chip to the A16 was already 20% faster and the A17 Pro is closing the gap with Qualcomm’s previous generation GPU.

I was wondering about this same thing some time ago, but I was not able to find any reliable info about Qualcomm Adreno. For example, Wikipedia quotes some really high FP16/FP32 figures for these GPUs, which are not at all consistent with reality. What's also very interesting is that Adreno 740 is a total bust in GB compute, barely outperforming Apple's A12.

Here are my entirely uneducated guesses based on the little info I could find:

- Qualcomm appears to be running very wide SIMD (Vulkan subgroup size of 64/128) at low frequencies, which would allow high compute per area and per watt, but won't work that well on more complex code (due to divergence)
- Qualcomm manuals warn that floating point conformance will incur a high performance overhead, and that one can use fast "native" floating point operations if one wants max performance. It's very safe to assume they use "native" operations for graphics, where precise results are not that important, compromising accuracy for performance. That would also explain why the compute results are so poor. This idea is also indirectly supported by low image quality results for newer Adreno GPUs on GFXBench (but I'd take them with a grain of salt).
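
To illustrate why divergence hurts more as the subgroup gets wider, here is a deliberately crude Python model. It assumes every lane takes a given branch independently with the same probability, which is pessimistic since real shader lanes are usually strongly correlated, and it assumes a diverged subgroup simply executes both sides of the branch. The function names and the 0.99 probability are made up for illustration and are not taken from any Qualcomm or Apple documentation.

```python
# Crude divergence model: lanes decide independently (a pessimistic
# assumption), and a subgroup that disagrees runs both sides of the branch.

def prob_uniform(width: int, p_taken: float) -> float:
    """Probability that all lanes in a subgroup take the same side."""
    return p_taken ** width + (1.0 - p_taken) ** width


def expected_passes(width: int, p_taken: float) -> float:
    """Expected number of branch sides executed: 1 if uniform, else 2."""
    return 2.0 - prob_uniform(width, p_taken)


# Hypothetical branch that 99% of lanes take.
for width in (32, 64, 128):
    print(
        f"width {width:3d}: "
        f"uniform with p={prob_uniform(width, 0.99):.2f}, "
        f"expected passes={expected_passes(width, 0.99):.2f}"
    )
```

Even with a branch that almost every lane takes, the chance that a whole subgroup agrees drops quickly as the width grows, so wider subgroups spend more time executing both paths under this model.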

To sum it up, Qualcomm is likely using a GPU that's designed to be as fast as possible for (simpler needs of) mobile graphics, but sacrifices accuracy and ability to run more general-purpose tasks. Apple on the other hand wants to develop a general purpose GPU that performs well across a large variety of complex tasks.

leman · Sep 17, 2023

Also, please ignore sunny5. It's his habit to wake up once per two weeks and pour out a bucket of random nonsense on our heads. I am not even sure he stays around to enjoy the following confusion. Some people just want to know that the world is burning, they don't even need to watch.

sack_peak · Sep 17, 2023

A question to anyone reading this.

Does the manufacturer's or user's benchmark influence your purchase more than say a non-chip feature like better speakers, screens, etc?

In my case I buy per app spec, based on a schedule or if it is cheaper to replace outright.

My application has not changed since 2015 for my 2012 iMac 27" so I've managed to delay replacement as late as Valentine's Day 2023 but because Apple has yet to offer a larger than iMac 24" M3 I am still stuck here as I am inflexible with any other configuration.

thenewperson · Sep 17, 2023

sack_peak said:
Does the manufacturer's or user's benchmark influence your purchase more than say a non-chip feature like better speakers, screens, etc?

Nah. I get iPhones because I think iOS is the best mobile platform currently (and I think in the future based on what I keep seeing on the Android side lol). Hardware in general doesn't really influence my decisions as much as software does.

leman · Sep 17, 2023

sack_peak said:
A question to anyone reading this.

Does the manufacturer's or user's benchmark influence your purchase more than say a non-chip feature like better speakers, screens, etc?

In my case I buy per app spec, based on a schedule or if it is cheaper to replace outright.

My application has not changed since 2015 for my 2012 iMac 27" so I've managed to delay replacement as late as Valentine's Day 2023 but because Apple has yet to offer a larger than iMac 24" M3 I am still stuck here as I am inflexible with any other configuration.

Everybody has different needs. I don't really care about the performance of my phone (I ordered a new 15 Pro because my phone is old and broken, and I like the new features like brighter screen, dynamic island, and USB-C port), but performance is most important thing to me in my laptop, as it directly impacts my work.

jeanlain · Sep 17, 2023

JordanNZ said:
We already know it's not just an A16 core clocked up.

Yes, but what about the performance cores? Are they different from the A16's beside clock speeds?

leman · Sep 17, 2023

jeanlain said:
Yes, but what about the performance cores? Are they different from the A16's beside clock speeds?

Apple claims they are. And there are plenty of GB results that show improved IPC on A17.

Gudi · Sep 17, 2023

sunny5 said:
If Apple is so sure about the CPU/GPU improvements, they should've made a performance graph but this time, they never did.

Apple are so unsure about GPU improvements that they showed a side-by-side comparison of the same game running with and without hardware-accelerated ray tracing − on a phone!📱

Gudi · Sep 17, 2023

thenewperson said:
Hardware in general doesn't really influence my decisions as much as software does.

That’s a fallacy. Neither runs without the other. The only way the software can be superior is because of superior hardware. Without ARM chips iPhones would be thick as a brick and wouldn’t run without a fan.

And without Apple’s own chip design team there would be no 64-bit, no dual-core, no efficiency/performance cores, no Secure Enclave, no image signal processor, no Neural Engine, no sensor co-processor, no 120 Hz ProMotion etc. etc. etc.

souko · Sep 17, 2023

sack_peak said:
A question to anyone reading this.

Does the manufacturer's or user's benchmark influence your purchase more than say a non-chip feature like better speakers, screens, etc?

In my case I buy per app spec, based on a schedule or if it is cheaper to replace outright.

My application has not changed since 2015 for my 2012 iMac 27" so I've managed to delay replacement as late as Valentine's Day 2023 but because Apple has yet to offer a larger than iMac 24" M3 I am still stuck here as I am inflexible with any other configuration.

For me, hardware improvements influence when I will upgrade, i.e. when it makes sense for my usage. A slightly better screen is not enough. But OLED and 120Hz, together with camera improvements, were enough.
My next upgrade will be for a better camera, a bigger screen and some other features that affect my usage. (I use 6.1” and want 6.7” in my next phone).

On the other hand, the next upgrade for my Mac will probably be to an M3, as I need more than 16GB RAM and will make use of a better CPU and GPU.
