But you can't force it to run on the NPU, right?
Edit: I think it's more useful to compare the M4 to the base M3, like here
You can't force a layer to run on HW that doesn't support it, eg an FP32 layer cannot run on the ANE (which supports FP16 and INT8). Also remember that a MODEL does not run on some hardware; each LAYER of the net runs on some hardware.
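To make that concrete, here's a minimal coremltools sketch (my illustration, nothing to do with GB6's actual pipeline): the precision you convert at is what decides whether a layer is even eligible for the ANE.

```python
# Hedged sketch: FP16 conversion yields ANE-eligible layers; FP32 layers
# can only fall back to GPU/CPU. The toy net is a placeholder, not a GB6 model.
import coremltools as ct
import torch
import torch.nn as nn

net = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU()).eval()
x = torch.randn(1, 3, 224, 224)
traced = torch.jit.trace(net, x)

ane_eligible = ct.convert(
    traced,
    inputs=[ct.TensorType(shape=x.shape)],
    compute_precision=ct.precision.FLOAT16,  # ANE handles FP16 (and INT8)
    compute_units=ct.ComputeUnit.ALL,        # compiler may place layers on CPU/GPU/ANE
)

gpu_cpu_only = ct.convert(
    traced,
    inputs=[ct.TensorType(shape=x.shape)],
    compute_precision=ct.precision.FLOAT32,  # FP32 layers cannot be placed on the ANE
    compute_units=ct.ComputeUnit.ALL,
)
```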
Poorly ported neural nets (ie most of them) run on a mix of ANE, GPU, and CPU because the various layers have not been tweaked (often in very simple ways) so that they can run entirely on the ANE. If you read the various articles on the Apple Machine Learning website (eg the one about porting Transformers to the ANE) you will see this discussed. Apple's final result ran all layers on the ANE except some minor lookup-table stuff. There's a similar article about image segmentation, again showing that the desired endpoint is 100% of layers executing on the ANE.
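For a flavor of what those "very simple tweaks" look like, here's a hedged PyTorch sketch in the spirit of Apple's Transformers-on-ANE write-up (my paraphrase of the idea, not Apple's code): express a linear projection as a 1x1 convolution over a channels-first (B, C, 1, S) tensor, so the layer maps onto the ANE's convolution hardware.

```python
# Illustrative only: same math as nn.Linear, but an op/layout choice
# the ANE can execute. Shapes and names are mine, not Apple's.
import torch
import torch.nn as nn

class LinearAsConv2d(nn.Module):
    """y = Wx + b, expressed as a 1x1 convolution over (B, C, 1, S)."""
    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        self.proj = nn.Conv2d(in_features, out_features, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, features, 1, seq_len), channels-first, instead of the
        # usual (batch, seq_len, features) that a plain nn.Linear expects.
        return self.proj(x)

x = torch.randn(1, 512, 1, 128)        # (B, C, 1, S)
y = LinearAsConv2d(512, 1024)(x)       # -> (1, 1024, 1, 128)
```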
Finally, the three GB6-ML cases are nested supersets: either every layer runs on the CPU; or each layer runs on the CPU or GPU; or each layer runs on whichever of CPU, GPU, or ANE the compiler considers optimal.
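In Core ML terms (my mapping of the buckets, not anything Geekbench documents), those three cases are just the nested compute-unit settings you pass when loading a model:

```python
# Hedged sketch of the three nested device choices via coremltools.
# "SomeNet.mlpackage" is a placeholder path, not a real GB6 asset.
import coremltools as ct

COMPUTE_UNITS = {
    "CPU": ct.ComputeUnit.CPU_ONLY,     # every layer on CPU
    "GPU": ct.ComputeUnit.CPU_AND_GPU,  # each layer on CPU or GPU
    "ANE": ct.ComputeUnit.ALL,          # each layer on CPU, GPU, or ANE
}

models = {
    name: ct.models.MLModel("SomeNet.mlpackage", compute_units=cu)
    for name, cu in COMPUTE_UNITS.items()
}
```

The point being that the "ANE" setting never guarantees ANE execution; it only permits it, layer by layer.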
So overall it's very messy. But if you look at the patterns of numbers in GB6-ML, you see that FP16 and INT8 work asked to run on the ANE mostly does run on the ANE, whereas FP32 mostly runs on the GPU.
And note how the FP16 and INT8 numbers are so similar in GB6-ML NPU results. Like I said, each FMAC unit can execute either FP16 or INT8 so throughput is the same for either.
The benchmark that leaps out at you is Text Classification, but we don't have enough data here for a good analysis.
Let's compare with the A17
(And, editorial, GB6 ML browser is ABSOLUTELY ***** MADDENING. Like it's deliberately designed to ensure you make mistakes in comparing GPU with CPU with ANE...)
Most of the FP32 stuff is 2x. No surprise, that's all on the GPU.
Most of the FP16 and INT8 stuff is about 1.3x. Probably the ANE itself is running faster, helped by faster DRAM (and maybe a larger SLC).
But again Text Classification on M4 runs like a bat out of hell.
How to interpret this?
Well (and good luck getting the details right starting from scratch in GB6-ML browser!) look at
https://browser.geekbench.com/ml/v0/inference/373246 (A17 CPU)
https://browser.geekbench.com/ml/v0/inference/373248 (A17 GPU)
https://browser.geekbench.com/ml/v0/inference/373283 (A17 ANE)
and look particularly at Text Classification.
Clearly the ML compiler is making a serious error here. (It's the only error I saw, but I may have missed something.)
Text Classification runs a LOT faster on the CPU [presumably AMX] than on the ANE or GPU, but the ML compiler seems to want to run most or all of its layers on something other than the CPU. (The similarity of speeds suggests it's actually running on the GPU in both the GPU and ANE cases.)
So that gives us a feeling for the speed of Text Classification on CPU[AMX] vs GPU (and "ANE").
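To spell out the smell test I'm applying, here's a tiny self-contained sketch; the thresholds are arbitrary and the sample numbers are purely illustrative placeholders, NOT real GB6 scores:

```python
# Hypothetical helper codifying the routing "smell test" above.
# Scores are higher-is-better, GB6-ML style; the sample values below
# are made up for illustration and are NOT real A17/M4 results.
def likely_routing(cpu: float, gpu: float, ane: float, tol: float = 0.15) -> str:
    """Guess where a workload's layers actually ran, given per-backend scores."""
    if abs(gpu - ane) / max(gpu, ane) < tol and cpu > 1.5 * max(gpu, ane):
        # "ANE" run no faster than the GPU run, and the CPU run beats both:
        # the compiler is probably placing the layers on the GPU in both cases
        # and routing them away from the faster CPU/AMX path.
        return "GPU in both GPU and ANE modes; faster CPU path ignored (misrouted?)"
    if ane > 1.2 * gpu:
        return "genuinely running on the ANE"
    return "unclear"

print(likely_routing(cpu=3000.0, gpu=1000.0, ane=1050.0))  # placeholder numbers
```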
Now go back to looking at the M4 numbers for Text Classification. They're a lot better than the A17 (or M3) GPU (and "ANE") numbers, but a lot worse than the A17 CPU[AMX] numbers...
This suggests three things:
- the ML Compiler bug that's incorrectly routing Text Classification away from the CPU still exists
- Text Classification is being routed to the ANE (the M4's performance is more than 2x the A17's, and since their GPUs are essentially similar, or so we're told, the GPU alone can't explain that gap).
- something big changed in the M4 ANE relative to not just the M3 ANE but also relative to the A17 ANE.
What could this be?
I've stated before that the known ANE consists of two parts: 16 Neural Cores that perform convolutions, and a Planar Engine that performs pooling, with the two sharing a "Smart L2" pool of SRAM.
But there are at least two patents (eg
https://patents.google.com/patent/US11614937B1 ) that describe a vector DSP and are from people associated with the ANE. The patents are somewhat vague as to where this "accelerator" might be placed, but the obvious placement is as a third unit of the existing ANE.
So my current hypothesis is that
- the vector DSP was part of the A17 (but perhaps hidden behind chicken bits)
- by the M4 it was considered bug-free and so available for use by the ML Compiler, and that's what we're seeing here
There are a number of questions here. For example, if the vDSP was behind chicken bits on the A17, maybe it does in fact work correctly there? Maybe it just requires the latest version of the coreML runtime to route to it, and that latest version hasn't yet been released, so we see it in this iOS 18 code but not in any iOS 17 benchmarks?
And maybe the vDSP can improve many more layers? Quite likely all we're seeing is a very first quick mod to coreML to test maybe one type of layer, but we might see many more layers (especially language-relevant layers) moved to run on the vDSP, meaning a speed boost for both the M4 iPad and the A17 iPhone in September?