M4+ Chip Generation - Speculation Megathread [MERGED]

falainber · May 9, 2024

name99 said:
So you consider addition of SME "not indicative of a new [CPU] architecture"?
OK then.

No. Does anyone?

mr_roboto · May 9, 2024

falainber said:
M4 is not a full-fledged new generation product. While Apple is free to use whatever naming scheme they want, the reality is that M4 does not use a new architecture. It uses a slightly modified (to boost yield and reduce cost) tech process. All semi companies but Apple simply refused to use N3B because it was a dud. Apple was in a pickle because they had to stick with the annual iPhone release schedule. But they knew the process was bad, so, while still working on the N3B-based chips, they probably started re-spinning the layout for N3E. The small time gap between M3 and M4 is not an indication of some sort of acceleration of the development/release cycle.

falainber said:
Clock speed depends on many factors, not just architecture. Apple explains what's new in M4 here and new architecture is not listed. Unless Apple is so modest as to not mention the new CPU architecture (while boasting about the new display engine), it's not a new architecture.

The sentence you highlighted in your link ("M4 builds on the GPU architecture of M3...") concerns the GPU, not the CPU. Just above it is a sentence where they promise a big CPU performance boost.

Also, Apple marketing is not exactly where I go for reliable technical information about Apple's CPU microarchitecture. It's not exactly the kind of thing they put a lot of effort into. We'll know a lot more once Apple updates their CPU Optimization Guide - a developer oriented doc - with all the details on M4.

I think it's funny that you claim M4 isn't a full new generation product, yet at the same time you acknowledge clock speed depends on many factors. One of those factors is... microarchitecture! Clock speed isn't exclusively a product of process node.

Also funny is that we do have reasonably good evidence that CPU microarchitecture did change. If M4's CPU was just M3 ported to N3E plus the end result of AMX morphing into SME, you would expect all non-ML single core test results to land at about 4.40 GHz / 4.05 GHz = 1.086 = 108.6% in this comparison:

iPad16,6 vs Mac15,12 - Geekbench

browser.geekbench.com

What we actually see: some are clustered around 108%, others very much aren't, and some of the ones which aren't are definitely not tests you'd expect SME to be used in. (For example, Navigation and HTML5 Browser.) There's also some which fail to reach the 108% bar. This is exactly the kind of spread you expect to see with a CPU uarch update - some things fared better than others.
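
A minimal sketch of the check described above, assuming you have copied the per-test score ratios out of the Geekbench comparison yourself. Only the two clock figures come from the post; the function names, the 2% tolerance, and the sample ratios are hypothetical placeholders, not real results.

```python
# Compare per-test score ratios against what pure clock scaling would predict.
M4_CLOCK_GHZ = 4.40  # figure quoted in the post
M3_CLOCK_GHZ = 4.05  # figure quoted in the post


def clock_ratio(new_ghz, old_ghz):
    """Speedup expected if nothing changed except the clock."""
    return new_ghz / old_ghz


def classify(ratios, expected, tolerance=0.02):
    """Bucket each subtest by how its ratio compares with clock scaling alone.

    `tolerance` is an arbitrary illustrative band, not a statistical threshold.
    """
    buckets = {"below": [], "near": [], "above": []}
    for name, ratio in ratios.items():
        if ratio < expected - tolerance:
            buckets["below"].append(name)
        elif ratio > expected + tolerance:
            buckets["above"].append(name)
        else:
            buckets["near"].append(name)
    return buckets


expected = clock_ratio(M4_CLOCK_GHZ, M3_CLOCK_GHZ)

# Placeholder ratios (NOT real Geekbench data); substitute the M4/M3 ratio
# of each single-core subtest from the comparison page.
sample = {"test_a": 1.09, "test_b": 1.25, "test_c": 1.03}
result = classify(sample, expected)

print(f"expected from clocks alone: {expected:.3f}")
print(result)
```

A spread across all three buckets is what the post argues you would see with a uarch update; a pure port would put nearly everything in "near".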

In general when I see your posts I know exactly what to expect: all Apple's grapes are sour, nothing they do is any good. You've got one note to play, and you play it a lot!

falainber · May 9, 2024

mr_roboto said:
The sentence you highlighted in your link ("M4 builds on the GPU architecture of M3...") concerns the GPU, not the CPU. Just above it is a sentence where they promise a big CPU performance boost.

Also, Apple marketing is not exactly where I go for reliable technical information about Apple's CPU microarchitecture. It's not exactly the kind of thing they put a lot of effort into. We'll know a lot more once Apple updates their CPU Optimization Guide - a developer oriented doc - with all the details on M4.

I think it's funny that you claim M4 isn't a full new generation product, yet at the same time you acknowledge clock speed depends on many factors. One of those factors is... microarchitecture! Clock speed isn't exclusively a product of process node.

Also funny is that we do have reasonably good evidence that CPU microarchitecture did change. If M4's CPU was just M3 ported to N3E plus the end result of AMX morphing into SME, you would expect all non-ML single core test results to land at about 4.40 GHz / 4.05 GHz = 1.086 = 108.6% in this comparison:

iPad16,6 vs Mac15,12 - Geekbench

browser.geekbench.com

What we actually see: some are clustered around 108%, others very much aren't, and some of the ones which aren't are definitely not tests you'd expect SME to be used in. (For example, Navigation and HTML5 Browser.) There's also some which fail to reach the 108% bar. This is exactly the kind of spread you expect to see with a CPU uarch update - some things fared better than others.

In general when I see your posts I know exactly what to expect: all Apple's grapes are sour, nothing they do is any good. You've got one note to play, and you play it a lot!

I did not highlight this sentence; Google search did. I just copied the link. But I was talking about the entire article, and in it Apple did not claim a new CPU architecture. It's funny that Apple did not claim it but Apple fans do. CPU architectures are not developed in 6 months. A uarch update may improve performance, but it does not constitute a new architecture.

Edit: as far as the clocks are concerned, just check, say, the Intel CPU lineup. They have dozens of processor models with different clocks and the same architecture. The original claim that a slight clock increase indicates a new architecture is technically naive.

senttoschool · May 9, 2024

mr_roboto said:
I think it's funny that you claim M4 isn't a full new generation product, yet at the same time you acknowledge clock speed depends on many factors. One of those factors is... microarchitecture! Clock speed isn't exclusively a product of process node.

I think what the internet is missing is that, ultimately, perf/GHz does not matter. It's always perf/watt that is important, and Apple just increased perf/watt in a massive way (assuming no drastic increase in power).

I assume that Apple did tweak the design to increase clock speeds without using more power. Otherwise, N3E might be a godsend node and we should all be buying TSMC stock.

name99 · May 9, 2024

falainber said:
No. Does anyone?

I don't know how to respond to this. This indicates a level of cluelessness that's truly scary.
Do you have any idea what SME is? Do you know what it entails in terms of changing the CPU?
Would you also not consider the addition of, say, AVX to be a change in architecture?
Let's put it differently, what WOULD satisfy you as being "a change in architecture"?

falainber · May 9, 2024

name99 said:
I don't know how to respond to this. This indicates a level of cluelessness that's truly scary.
Do you have any idea what SME is? Do you know what it entails in terms of changing the CPU?
Would you also not consider the addition of, say, AVX to be a change in architecture?
Let's put it differently, what WOULD satisfy you as being "a change in architecture"?

SME is an extension to architecture. It is fairly common for CPUs from the same architecture generation to support or omit some extensions. For example, Intel Xeons will have AVX-512 and their desktop siblings won't, but they will still share the same core architecture generation.

leman · May 9, 2024

falainber said:
SME is an extension to architecture. It is fairly common for CPUs from the same architecture generation to support or omit some extensions. For example, Intel Xeons will have AVX-512 and their desktop siblings won't, but they will still share the same core architecture generation.

So you are suggesting that M3 CPU supported all these things, but for some reason Apple just decided to pretend like they don’t exist, in addition to artificially reducing IPC? That’s a rather cumbersome hypothesis.

At any rate, I think you might be mixing up the terms “architecture” as in platform (e.g. x86, ARM) and architecture as in CPU design (also called microarchitecture). M4 supports new CPU instructions and has a different performance behavior across the board from M3, which already qualifies as a new architecture. If you don’t consider it a new architecture, then you also should be prepared to argue that Zen4 or Alder Lake are not new architectures.

Xiao_Xi · May 10, 2024

leman said:
If you don’t consider it a new architecture, then you also should be prepared to argue that Zen4 or Alder Lake are not new architectures.

Many consider a new microarchitecture to be a redesign, not an upgrade. Thus, many consider Zen 4 an upgrade of Zen 3, but Zen 5 a new microarchitecture.

name99 said:
Do you know what it entails in terms of changing the CPU?

Has Apple changed the decoder to adopt SME?

[Attached image: Apple M4 "new CPU" slide (Apple-M4-chip-new-CPU-240507_big.jpg)]

smalm · May 10, 2024

DaniTheFox said:
I see a similar concept. They have building blocks for different parts of an SoC, in several states of functionality. When they have to freeze the design, it could be that a more advanced block would have become available just a little later. Bad luck. But you have to freeze at some point. You can't wait forever.

And then comes the time where you have to shoot the engineer to get the project out of the door...

quarkysg · May 10, 2024

Xiao_Xi said:
Many consider a new microarchitecture to be a redesign, not an upgrade. Thus, many consider Zen 4 an upgrade of Zen 3, but Zen 5 a new microarchitecture.

Has Apple changed the decoder to adopt SME?

May I know how you came to the conclusion that the AMD slide shows a new design, while the Apple slide does not?

Both say about the same thing to me ... like wider decode?

So AMD gets a pass when they say so, but Apple doesn't?

Xiao_Xi · May 10, 2024

quarkysg said:
May I know how you came to the conclusion that the AMD slide shows a new design, while the Apple slide does not?

Honestly, I don't know if M4 can be considered a new microarchitecture. What I have tried to convey, perhaps poorly, is that AMD considers Zen 5 a new microarchitecture, but not Zen 4. I am under the impression that Zen 1, Zen 5 and maybe Zen 3 can be considered new microarchitectures, while Zen 2, Zen 3+ and Zen 4 are upgrades of the previous version.

quarkysg · May 10, 2024

Xiao_Xi said:
Honestly, I don't know if M4 can be considered a new microarchitecture. What I have tried to convey, perhaps poorly, is that AMD considers Zen 5 a new microarchitecture, but not Zen 4. I am under the impression that Zen 1, Zen 5 and maybe Zen 3 can be considered new microarchitectures, while Zen 2, Zen 3+ and Zen 4 are upgrades of the previous version.

Well, if you ask me, it doesn't really matter if it is a new architecture or not. At the end of the day, it is how much performance the CPU architect and designer can squeeze out of their designs that counts.

Xiao_Xi · May 10, 2024

[Off-topic] I don't know if this will benefit microarchitecture geeks like us, but Chips and Cheese has become a non-profit organization.

https://chipsandcheese.com/2024/05/09/chips-and-cheese-state-of-the-union/

altaic · May 10, 2024

Xiao_Xi said:
Honestly, I don't know if M4 can be considered a new microarchitecture. What I have tried to convey, perhaps poorly, is that AMD considers Zen 5 a new microarchitecture, but not Zen 4. I am under the impression that Zen 1, Zen 5 and maybe Zen 3 can be considered new microarchitectures, while Zen 2, Zen 3+ and Zen 4 are upgrades of the previous version.

quarkysg said:
Well, if you ask me, it doesn't really matter if it is a new architecture or not. At the end of the day, it is how much performance the CPU architect and designer can squeeze out of their designs that counts.

So it’s agreed that no one knows what “new” means, and no more bike shedding 👏

Anyway, back to the topic, if Apple’s claims about the M4 operating at 1/2 the power of M3 are true, and if the geekbench scores are legit (where the M4 is much better than the M3 Pro), that’s ****ing amazing. That means that the fabled “double Ultra” is feasible in the Studio. Plus, with LPDDR5X, they could go up to 10700 at similar power, which would be a marked increase in memory throughput. Better start bitching about how you’re being obsoleted before it’s cool!
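
For a rough sense of what a 10700 MT/s data rate would mean, here is a minimal sketch of the usual peak-bandwidth arithmetic (transfers per second times bytes per transfer). The 128-bit bus width and the 7500 MT/s comparison point are assumptions chosen for illustration, not figures from this thread, and the function name is hypothetical.

```python
def peak_bandwidth_gbs(transfer_rate_mts, bus_width_bits):
    """Theoretical peak bandwidth in GB/s for a given data rate and bus width."""
    bytes_per_transfer = bus_width_bits / 8
    return transfer_rate_mts * 1e6 * bytes_per_transfer / 1e9


# Assumption: a 128-bit memory interface. Wider buses scale the result linearly.
ASSUMED_BUS_WIDTH_BITS = 128

for rate_mts in (7500, 10700):  # 7500 is an illustrative baseline only
    gbs = peak_bandwidth_gbs(rate_mts, ASSUMED_BUS_WIDTH_BITS)
    print(f"{rate_mts} MT/s x {ASSUMED_BUS_WIDTH_BITS}-bit = {gbs:.1f} GB/s")
```

These are theoretical peaks; sustained throughput in practice is lower.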

Dulcimer · May 10, 2024

altaic said:
So it’s agreed that no one knows what “new” means, and no more bike shedding 👏

Anyway, back to the topic, if Apple’s claims about the M4 operating at 1/2 the power of M3 are true, and if the geekbench scores are legit (where the M4 is much better than the M3 Pro), that’s ****ing amazing. That means that the fabled “double Ultra” is feasible in the Studio. Plus, with LPDDR5X, they could go up to 10700 at similar power, which would be a marked increase in memory throughput. Better start bitching about how you’re being obsoleted before it’s cool!

The half-power comparison was to M2, not M3. And let’s be real, that claim is likely specific to certain workloads taking advantage of new arch features.

altaic · May 10, 2024

Dulcimer said:
The half-power comparison was to M2, not M3. And let’s be real, that claim is likely specific to certain workloads taking advantage of new arch features.

Half power compared to M2 is more impressive. Not sure what you’re getting at 🙃

MrGunny94 · May 10, 2024

Honestly, I think we need to wait for this to reach the Mac to be sure about battery life. I do think the chip is amazing and more capable in the M4 iteration.

I really loved the fact that they decided to push forward with more E Cores and keep improving them, that's where I wanted them to take the base and Pro chips.

thenewperson · May 10, 2024

MrGunny94 said:
Honestly, I think we need to wait for this to reach the Mac to be sure about battery life. I do think the chip is amazing and more capable in the M4 iteration.

I really loved the fact that they decided to push forward with more E Cores and keep improving them, that's where I wanted them to take the base and Pro chips.

They did move the Pro to 6E already so it’s good to see the base chip get this too. What I’m curious about is if the A18 gets this upgrade as well.

leman · May 10, 2024

name99 said:
Main QUESTION is whether NEON is now SVE, and if so whether it's 128b SVE or 256b SVE.

Since GB6 supports SVE, I think we would have noticed if SIMD had been extended to 256-bit. As you say, the big question is whether M4 supports the regular (non-streaming-mode) SVE at all. Streaming mode solves the HPC problem in a way that AVX512 failed to solve, but Apple might choose not to implement non-streaming SVE at all, deeming Neon to be enough as the base low-latency SIMD ISA. Would be great to get masks though.

name99 said:
I'm not so sure.

If you look at SME (especially the latest SME2.1 stuff, eg
https://reviews.llvm.org/D137571 )
so much of it, in hindsight, seems motivated by AMX functionality. For example LUTI2 and LUTI4 seem to match AMX lookup table stuff [since AMX1, apparently to support quantized weights] along with the strided 2 and 4 vector loads that were added to M3 AMX.

That is true, and the LUT instructions were what I had in mind when I wrote my post. Do you know if these new instructions cover all the functionality that was described as part of AMX? Also, if my memory serves me right, there were some specialized addressing modes in AMX that I don't remember seeing in SVE. Then again, I have a lot of difficulty navigating ARM's documentation, so I probably missed a lot of things.

P.S. I just had another look and I don't see an equivalent for the generating genlut instruction (https://github.com/corsix/amx/blob/main/genlut.md)

MRMSFC · May 10, 2024

name99 said:
I don't know how to respond to this. This indicates a level of cluelessness that's truly scary.
Do you have any idea what SME is? Do you know what it entails in terms of changing the CPU?
Would you also not consider the addition of, say, AVX to be a change in architecture?
Let's put it differently, what WOULD satisfy you as being "a change in architecture"?

Is this going to be a Ship of Theseus argument?

I don’t know the criteria for a “new” architecture.

leman · May 10, 2024

MRMSFC said:
Is this going to be a Ship of Theseus argument?

I don’t know the criteria for a “new” architecture.

I think this is precisely the point. Characterizing something as a "new architecture" is entirely subjective. Frankly, my feeling is that some say M4 is not a new architecture just because it has been released so soon after M3, which is hardly a good argument. In terms of features and performance, M3->M4 is roughly comparable to Rocket Lake->Alder Lake, so there's that.

Personally, I do not believe asking whether the architecture is "new" is a productive line of inquiry, because it leads pretty much nowhere. It's much more interesting to ask "what is new" and "what has changed".

mr_roboto · May 10, 2024

altaic said:
Half power compared to M2 is more impressive. Not sure what you’re getting at 🙃

I believe Apple's claim was half power at the same performance as M2. This is likely a comparison between an M2 core running at its highest performance state and an M4 core running at a reduced clock speed chosen to approximate the performance of the full-speed M2.

I expect M4 P cores in their highest performance state to still use about the same amount of power as Apple's P cores always do - somewhere around 5 to 6 watts. It seems to be the target they aim for.
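
To illustrate why an iso-performance comparison like the one described above can look so favorable, here is a rough sketch. Everything in it is an assumption: performance is modeled as work-per-clock times clock, dynamic power is modeled as scaling with the cube of frequency (voltage tracking frequency), and the clocks and the 10% per-clock gain are hypothetical inputs, not measurements. It shows the newer core's power relative to its own peak, so it does not reproduce Apple's M2 comparison.

```python
def iso_performance_clock(ref_clock_ghz, per_clock_gain):
    """Clock at which the newer core matches the older core's peak performance,
    assuming performance = work-per-clock x clock."""
    return ref_clock_ghz / per_clock_gain


def relative_dynamic_power(clock_ghz, peak_clock_ghz, exponent=3.0):
    """Power at clock_ghz relative to the same core at peak_clock_ghz,
    under the rough assumption P ~ f^exponent."""
    return (clock_ghz / peak_clock_ghz) ** exponent


# All three inputs are hypothetical placeholders.
OLD_PEAK_GHZ = 3.5     # older core's top clock
NEW_PEAK_GHZ = 4.4     # newer core's top clock
PER_CLOCK_GAIN = 1.10  # newer core does 10% more work per clock

matched = iso_performance_clock(OLD_PEAK_GHZ, PER_CLOCK_GAIN)
fraction = relative_dynamic_power(matched, NEW_PEAK_GHZ)

print(f"clock needed to match the older core: {matched:.2f} GHz")
print(f"power relative to the newer core's own peak: {fraction:.0%}")
```

The point is only qualitative: backing a core off from its top clock saves power much faster than it costs performance, which is consistent with peak power per P core staying roughly flat across generations.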

altaic · May 10, 2024

mr_roboto said:
I believe Apple's claim was half power at the same performance as M2. This is likely a comparison between an M2 core running at its highest performance state and an M4 core running at a reduced clock speed chosen to approximate the performance of the full-speed M2.

I expect M4 P cores in their highest performance state to still use about the same amount of power as Apple's P cores always do - somewhere around 5 to 6 watts. It seems to be the target they aim for.

I think you’re right about that. Still an impressive improvement, though.

MrGunny94 · May 10, 2024

thenewperson said:
They did move the Pro to 6E already so it’s good to see the base chip get this too. What I’m curious about is if the A18 gets this upgrade as well.

Most likely everything from the M Pro down will get it, because those chips go into portable, on-the-go devices.

I'm curious what approach they will take with the M4 Pro. I do hope they go this route, and based on this chip alone I think it's safe to say they will (same goes for M3 Pro).

name99 · May 10, 2024

MrGunny94 said:
Most likely everything from the M Pro down will get it, because those chips go into portable, on-the-go devices.

I'm curious what approach they will take with the M4 Pro. I do hope they go this route, and based on this chip alone I think it's safe to say they will (same goes for M3 Pro).

Apple are engaged in on-going work (that moves a little more each year) to split the OS up into more and more pieces that can run independently on separate cores. Obviously this is a goal that every OS vendor strives for in the age of multi-core; Apple's nothing special in this respect, just the techniques they will use will be optimal for the structure of Darwin.
There have been academic OSs in the past (like Barrelfish, from ETH Zurich and Microsoft Research) that have pushed this idea, but moving a large commercial OS in this direction is obviously harder!

I've mentioned before that part of how Apple run faster is to run experiments in parallel. IMHO the M3 6E cluster was such an experiment – put it in a chip where it can't cause any harm, and see just how well it can get used (both by the OS and by lightweight threads in apps). Presumably the experiment was a big success, enough so that we see it as the new norm (and perhaps also justifying moving to 6E cores for M4 Max?)

Open questions then include
- does 6E make sense for an iPhone? I guess we'll see soon! Maybe it does?

- does going up to 8 E-cores now make sense? (There are two issues here. First, with 8 E cores present, is there enough work for them? Second, is it still feasible to have them all share a single set of L2 resources such as the L2 itself, the L2 TLB and page walkers, and AMX/SME? If those resources start to be overloaded, maybe better to dial it back to 4E+4E for the M4 Pro and slowly, over the next few years, work our way up to 6E+6E in four years or so?)

- does a dedicated OS-only E cluster make sense? The idea here is that we devote an E cluster (maybe only two E-cores, maybe no AMX/SME needed, and small L2) to running the most security critical elements of the OS and NOTHING ELSE. The idea is that if we have these cores isolated to this extent malicious apps won't be able to [or at least will have to work even harder to find some scheme] either modify the OS or eavesdrop on what it's doing. This will also allow us to make the other cores more aggressive in terms of things like variable timing and speculation without having to worry about this endless stream of micro-architectural security issues (Spectre, GoFetch and the rest of them). If you want to do crypto or anything involving passwords, call into the OS which will shunt the work to a security core, and given this fact, who CARES that an app can, with immense effort, sometimes read a few bytes from the memory range of some other app?
