Just curious, where did you see INT8 being used for M4?
People are making up these claims about the ANE. Simple as that. Almost no one knows how the ANE actually works, and people just assume (a dumb assumption...) that it works like an NVIDIA chip. Apple already HAS a GPU, so why would they create new hardware that behaves like a second GPU?
The ANE we are familiar with uses a clever FMAC design that supports either FP16 or INT8, so performance is the same for FP16 and INT8. If you look at the GB6 ML benchmarks run on the NPU, you will see that this is the case. The FP32 benchmarks (which run on the GPU even if you ask them to run on the NPU, because the NPU does not support FP32) run at one speed; the FP16 and INT8 benchmarks run at a different speed that is mostly the same for both (modulo small differences from additional memory traffic and suchlike).
Apple PROBABLY switched to "OPS" rather than FLOPS for a simple reason: the US is a crazy litigious society that can always find some idiot willing to file a lawsuit claiming they did not get the FP performance they expected from the ANE, because they think FLOPS means FP32 or FP64. There's also the fact that Apple wants to count operations like ReLU lookups, which are a big part of a neural network but which no one would call a FLOP. "OPS" is vague enough that no one can build a lawsuit around it.
The original ANE had 8 cores [not counting the very first ("Lattice") version, which was only good for FaceID and nothing else], optimized for convolution. Subsequent generations added the Planar Engine (for pooling and element-wise operations) along with a lot of surrounding infrastructure (ways to use less memory, ways to synchronize the cores and the Planar Engine). There are hints that the next step would be to add a vector DSP. It's possible this was done with the A17 and M4 ANEs, and that's how the OPS count jumps from 18T to 38T.
But in terms of "primary computation" the existing design (i.e., up through M3) performs essentially the same number of INT8 or FP16 ops per cycle: 256 FMACs per core, 16 cores. The design is very clever (nothing like a GPU), but with interesting limitations on how data can be routed around.
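The arithmetic behind those OPS figures is easy to check. Here's a back-of-envelope sketch (the clock speeds are my guesses, not published specs; only the 16-core x 256-FMAC figure comes from the discussion above):

```python
# 16 cores x 256 FMACs per core; each FMAC counts as 2 ops (multiply + add).
cores = 16
fmacs_per_core = 256
ops_per_fmac = 2

ops_per_cycle = cores * fmacs_per_core * ops_per_fmac  # 8192 ops/cycle

def tops(clock_ghz):
    """Peak trillions of ops per second at a given clock (hypothetical)."""
    return ops_per_cycle * clock_ghz * 1e9 / 1e12

# A ~2.2 GHz clock lands right around the 18 TOPS quoted for the M3-era ANE:
print(round(tops(2.2), 1))   # ~18.0

# Hitting 38 TOPS on the same datapath would imply an implausible clock,
# which is why the jump suggests added hardware (e.g. a vector DSP)
# rather than just a frequency bump:
implied_ghz = 38e12 / (ops_per_cycle * 1e9)
print(round(implied_ghz, 2))  # ~4.64 GHz
```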
There is an "exploratory" patent showing how the current multiplier could be split to allow two INT4 multiplies, BUT for that to be useful, some modifications to the data routing would be required, and we see no evidence of that. Honestly, I think it's a much lower priority than other things.
Apple can get most of the value of INT4 from quantized weights, with much more flexibility (weights can be quantized to 4 bits, but also to 5 or 6 bits). If you follow both Apple's announcements and the ports of LLMs and art generators to the Mac, you will see that aggressive use is made of this weight quantization.
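To make the weight-quantization point concrete, here's a minimal sketch of symmetric uniform quantization to an arbitrary bit width. This is illustrative only; Apple's actual schemes (e.g. in Core ML) may use per-channel scales, palettization, and so on:

```python
def quantize(weights, bits):
    """Map floats to signed n-bit ints plus a single scale factor."""
    qmax = 2 ** (bits - 1) - 1              # 7 for 4-bit, 15 for 5-bit, 31 for 6-bit
    scale = max(abs(w) for w in weights) / qmax
    q = [round(w / scale) for w in weights]  # integers in [-qmax, qmax]
    return q, scale

def dequantize(q, scale):
    return [v * scale for v in q]

w = [0.31, -0.72, 0.05, 0.99, -0.44]
for bits in (4, 5, 6):                       # the flexibility mentioned above
    q, s = quantize(w, bits)
    err = max(abs(a - b) for a, b in zip(w, dequantize(q, s)))
    print(bits, q, round(err, 3))            # error shrinks as bits grow
```

The weights stay tiny integers in memory (cutting bandwidth, which matters on the ANE), while the arithmetic can still run at FP16/INT8 precision after dequantization.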