If they release an M5 MacBook Air next spring, 6 months after the release of the M4 MBPs, that will all but wipe out sales of the base M4 and M4 Pro MBPs, and those who already bought them will be rightfully pissed.
If it were me I’d throw my M4 MBP against the glass windows of an Apple Store screaming NOOOO at the top of my lungs. Like that gif with some climate activist screaming when they’re about to chop down some tree.
I hope you’ve got “accident” coverage for that thing, because if it follows the M2 MBA, it will be at WWDC, 8 months after the M4 MBP, and 3 months after they launch M4 Ultra and/or whatever else!

It’s still the simplest scenario, Gurman’s certainty about M4 MBAs notwithstanding. There is recent precedent: M2 was on N5P, M5 will be on N3P, and the M2 launch was wildly successful.

If I understand you correctly, you think all of the other hardware differences between the MBP and the MBA don’t matter: larger and better displays, more ports, and so on. They are not insignificant.
 
That's the most likely scenario, 4-6 E cores. I still think we're headed for segregated E cores for OS-only use at some point, and that would be a good machine in which to introduce that feature, but I make no predictions about the upcoming models. I also think there's a low chance that they might decide to try for good coverage of embarrassingly parallel workloads, with many many E cores, but I would be very surprised if they tackled that before they get to mix+match chiplets.

The other way to look at this is that Apple are probably trying to move more and more processing to E-cores, not only for the obvious reason of lower energy, but also for more parallelism.
Consider for example this patent:

The patent envisions controls (in other words fragments of a window) that execute their associated code in a separate process from the main process.
The obvious justification for this is security, but once you have the framework in place, you can imagine using it to split many different types of UI control off into a separate process, and thus onto an E- rather than a P-core.

Why would you do this?
Suppose the control involves a lot of compute, but is not sped up by a P-core (eg a Maps control is probably gated by the network; a photo browsing control is probably gated by IO). The obvious way of writing an app would have this code executing on the P-core because the entire app (and especially the UI) executes on the P-core. But now you have a fairly simple way, without much developer hassle, to delegate some of that UI to run on an E-core when appropriate.

Nice for saving a little energy. It also adds to the continually growing list of reasons to have more E-cores (and to the ongoing effort to move more OS functionality into separate processes, generally running on E-cores).
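For anyone wanting something concrete: the patent is about out-of-process controls, which developers can't use today, but the closest existing mechanism is a QoS hint, since .utility/.background work is preferentially scheduled on E-cores. A minimal sketch of the Maps-control example, where loadNearbyPlaces() and the URL are hypothetical stand-ins of mine, not anything from the patent:

```swift
import Foundation

// loadNearbyPlaces() is a hypothetical network-bound helper, standing in for
// the Maps-control example above: gated by the network, not by CPU speed.
func loadNearbyPlaces() async throws -> [String] {
    let (data, _) = try await URLSession.shared.data(
        from: URL(string: "https://example.com/places.json")!)
    return try JSONDecoder().decode([String].self, from: data)
}

func refreshPlacesControl() {
    // Low priority: the work is network-bound anyway, so there's no benefit
    // to burning a P-core on it; the scheduler will favor E-cores.
    Task(priority: .utility) {
        let places = try await loadNearbyPlaces()
        await MainActor.run {
            // Only the final UI update needs to hop back to the main thread.
            print("update control with \(places.count) places")
        }
    }
}
```

The patent's out-of-process approach would go further than this, because it moves the whole control (not just a background task you remembered to annotate) off the main process.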
 
The other way to look at this is that Apple are probably trying to move more and more processing to E-cores, not only for the obvious reason of lower energy, but also for more parallelism.
Consider for example this patent:
That's a cool find and quite interesting. I don't think it changes the game significantly though, in that I doubt you have more than one E core's worth of UI processing ever going on at once. So... what else can use a bunch of E cores? I can't think of much, considering all the existing special purpose silicon - media engines, SSD controller, SE, etc. But I am no doubt missing something. Am I missing enough that there's a good argument for, say, 12 E cores?

I was in any case thinking about something more ambitious - closer to what AMD did with Bergamo and later with Zen 5c cores - though probably not quite that many cores. :) They could easily do that, but I don't think their TAM is big enough to make it worth doing right now. At least, I don't think they think so. Hard to say what would turn up if they actually built it - though unless RAM prices become less insane, I doubt anyone would ever find out.
 
My knowledge is possibly outdated. How would you run MPSGraph on the NPU? I am only aware of the Metal device type.
Yes, MPSGraph is lowered to an MLIR-like format and compiled for a device selected by the system (if you use the proper types and ops, it can run on the ANE). CoreML also uses MPSGraph, or at least parts of it, as a backend from macOS 13+ (the .mlpackage format). They also have a private ANE framework, which is most likely used as a kind of Metal/CUDA for NPU computations.
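To make the "device selected by the system" part concrete: with the public APIs you don't target the ANE directly, you request compute units via CoreML and let the system decide. A minimal sketch, with a hypothetical model path of my choosing; whether inference actually lands on the ANE still depends on the ops/dtypes in the model, and unsupported layers fall back to CPU/GPU:

```swift
import Foundation
import CoreML

// Request CPU + Neural Engine (use .all to also allow the GPU).
let config = MLModelConfiguration()
config.computeUnits = .cpuAndNeuralEngine

do {
    // Hypothetical path to an already-compiled model bundle.
    let modelURL = URL(fileURLWithPath: "/path/to/MyModel.mlmodelc")
    let model = try MLModel(contentsOf: modelURL, configuration: config)
    print(model.modelDescription)
} catch {
    print("Failed to load model: \(error)")
}
```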
 
That's a cool find and quite interesting. I don't think it changes the game significantly though, in that I doubt you have more than one E core's worth of UI processing ever going on at once. So... what else can use a bunch of E cores? I can't think of much, considering all the existing special purpose silicon - media engines, SSD controller, SE, etc. But I am no doubt missing something. Am I missing enough that there's a good argument for, say, 12 E cores?

I was in any case thinking about something more ambitious - closer to what AMD did with Bergamo and later with Zen 5c cores - though probably not quite that many cores. :) They could easily do that, but I don't think their TAM is big enough to make it worth doing right now. At least, I don't think they think so. Hard to say what would turn up if they actually built it - though unless RAM prices become less insane, I doubt anyone would ever find out.
I think for the cases I described (eg a Maps control, or scrolling through photos) you could have a fair amount of code delegated to an E-core. Of course most controls are light-weight, but some are not.

Your overall point I agree with; moving ever more code to additional cores (E or P) in a way that's not too onerous for the developer is a long term project where we will likely only see minor improvements each year.
OTOH if you buy your M4 today, with "only 4 E-cores used most of the time", and by macOS 21 that's moved up to 6 E-cores "most of the time", everyone's better off.

I'm not sure what the market is for things like Sierra Forest outside a server running per-client VMs.
Which may be a market of interest to Apple in some sense (eg Private Cloud Compute), but even if they create a dedicated chip (a big if, but not outside the bounds of possibility), there may be no incentive to sell it to the public. (The same way that, as far as I know, I can't buy a TPU5 from Google, or a Nitro card from Amazon.)
 
Interesting to see just how much "fast mode" E-cores are improving...
They used to be about .3 to .5 of a P-core depending on exactly what you're doing.
Now they're more like .5 to .7 of a P-core.

Or to put it another way, while P-cores are ~1.5x as fast from M1 to M4, E-cores are more like ~2x as fast.
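(Rough sanity check on the ~2x, using midpoints of those ranges - my arithmetic, not Howard's numbers: if an M1 E-core was ~0.4 of an M1 P-core, an M4 E-core is ~0.6 of an M4 P-core, and the M4 P-core is ~1.5x the M1 P-core, then the M4 E-core lands at ~0.6 × 1.5 ≈ 0.9 of an M1 P-core, or about 0.9 / 0.4 ≈ 2.25x the M1 E-core.)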

OTOH
It's worth pointing out that Howard's tests (and these numbers), while useful, represent a very simplistic sort of benchmark that's basically testing "guaranteed not to exceed" performance, simple loops that provide almost no stress to the instruction fetch side of the machine, and very predictable data access patterns.

MOST of the extra functionality in a P-core is there to deal with real code and how it does not conform to these: the jumping around of, eg, gcc/llvm or Safari in both their instruction and data access patterns. It's not like the design of the P-cores is dumb and a waste of area relative to E-cores!
But it is nice to see how they've been able to add so much functionality to E-cores without hurting the energy goal.
 
It's worth pointing out that Howard's tests (and these numbers), while useful, represent a very simplistic sort of benchmark that's basically testing "guaranteed not to exceed" performance, simple loops that provide almost no stress to the instruction fetch side of the machine, and very predictable data access patterns.
Yes. I haven't looked at his code but from his description it would live entirely in (L1) cache and should have virtually no mispredicts. (EDIT: I was just talking about his int and fp loops. I'm just reading the new article now and he's got some more stuff now that I hadn't read about previously. Though his descriptions of them mostly look similar - a small amount of code in a tight loop. Using the AMX does change things somewhat.)
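For illustration, this is the general shape of loop being described - a tiny body that stays in L1 with a perfectly predictable branch, so it measures peak throughput rather than real-world behavior. My own toy construction, not Howard's actual code:

```swift
// Tight, dependent integer loop: no memory traffic beyond the loop itself,
// essentially zero branch mispredicts, everything resident in L1.
func tightIntLoop(iterations: Int) -> Int {
    var acc = 1
    for i in 1...iterations {
        acc = acc &* 3 &+ i   // overflow-wrapping ops keep the chain simple
    }
    return acc
}

let result = tightIntLoop(iterations: 100_000_000)
print(result)   // print the result so the optimizer can't discard the loop
```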

Still interesting - if E cores really have improved that much while maintaining their efficiency, that's significant.
 
There is also the possibility that the first M5 will be a binned/dumbed-down M5 (like the first M4 used in the iPad Pro), and that this "fewer-core", "underclocked" M5 debuts in the MBA.
That’s a good observation in general, about possible motives for the early launches. Think about it—presumably there are always problems early on as volume production scales up, so there is more binned product early on.

The M2 MacBook Air had an 8/8 configuration, so an early launch was a good way to empty out some of those early bins.

The 9/10 M4 iPad Pro, with just one dead performance core, is a way to extract additional revenue from silicon that would otherwise have ended up in the 8/8 M4 iMac (as the lineup currently stands). It’s reasonable to assume we’ll see this approach again, because it’s reasonable to assume that performance cores are harder to perfect, so you have enough of these “almost perfect” SoCs early on that you don’t want to dumb them down if you don’t have to…
 
It’s reasonable to assume we’ll see this approach again, because it’s reasonable to assume that performance cores are harder to perfect [...]
Sort of, though as a side effect. Generally, it's all about die area. P cores are larger than E cores, and so are more likely to have a defect. It's not about one being "harder" than the other.
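To put a number on "larger area, more likely to have a defect": the standard back-of-the-envelope is a Poisson yield model, where the chance a block is defect-free falls exponentially with its area. The figures below are made-up assumptions for illustration, not real M4 or TSMC numbers:

```swift
import Foundation

// Poisson yield model: P(block is defect-free) = exp(-D0 * A),
// with D0 = defect density (defects/mm^2) and A = block area (mm^2).
// All numbers here are illustrative assumptions only.
func defectFreeProbability(areaMM2: Double, defectDensity: Double) -> Double {
    exp(-defectDensity * areaMM2)
}

let d0 = 0.07          // assumed defects per mm^2 for a maturing node
let pCoreArea = 3.0    // assumed P-core area in mm^2
let eCoreArea = 1.0    // assumed E-core area in mm^2

let pYield = defectFreeProbability(areaMM2: pCoreArea, defectDensity: d0)
let eYield = defectFreeProbability(areaMM2: eCoreArea, defectDensity: d0)
print(String(format: "P-core defect-free: %.1f%%, E-core: %.1f%%",
             pYield * 100, eYield * 100))
// The bigger block catches proportionally more defects, which is the
// "side effect" described above - nothing about P-cores being "harder".
```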
 
That’s a good observation in general, about possible motives for the early launches. Think about it—presumably there are always problems early on as volume production scales up, so there is more binned product early on.

The M2 MacBook Air had an 8/8 configuration, so an early launch was a good way to empty out some of those early bins.

The 9/10 M4 iPad Pro, with just one dead performance core, is a way to extract additional revenue from silicon that would otherwise have ended up in the 8/8 M4 iMac (as the lineup currently stands). It’s reasonable to assume we’ll see this approach again, because it’s reasonable to assume that performance cores are harder to perfect, so you have enough of these “almost perfect” SoCs early on that you don’t want to dumb them down if you don’t have to…
This is interesting, but I think it is more likely that the next MBA will use a binned M4, similar to the iPad Pro or the iMac one.
 
By the way, do you think I should wait until next spring to see if the 2025 MBA comes with an M5, then wait further for the M5 Mac mini? I mean, if I’ve waited until now, I guess I can wait 10 more months…

The M4 is already a great upgrade, especially if we take into account that the Mac mini packs the 10/10 SoC, but if the M5 comes with a significantly better GPU or better Neural Engine… maybe it is worth the wait?

I mean, I’m going to configure it with 1TB/32GB, so if I’m going to make that expense, maybe it would be better to secure a more capable machine.

However, honestly, with the M5 likely built on the N3P process, a third-gen 3nm process, maybe it won’t be as big of a jump as the M4 has been… so maybe it makes sense to buy now and stop eternally waiting. Because then the M6 would actually be a big jump, if it finally comes on the new 2nm process.
 
By the way, do you think I should wait until next spring to see if the 2025 MBA comes with an M5, then wait further for the M5 Mac mini? I mean, if I’ve waited until now, I guess I can wait 10 more months…

The M4 is already a great upgrade, especially if we take into account that the Mac mini packs the 10/10 SoC, but if the M5 comes with a significantly better GPU or better Neural Engine… maybe it is worth the wait?

I mean, I’m going to configure it with 1TB/32GB, so if I’m going to make that expense, maybe it would be better to secure a more capable machine.

However, honestly, with the M5 likely built on the N3P process, a third-gen 3nm process, maybe it won’t be as big of a jump as the M4 has been… so maybe it makes sense to buy now and stop eternally waiting. Because then the M6 would actually be a big jump, if it finally comes on the new 2nm process.
It’s possible that the M5 skips the Mac mini - the same way the mini skipped the M3 generation. Probably worth pulling the trigger on the M4 while it’s brand new.
 
It’s possible that the M5 skips the Mac mini - the same way the mini skipped the M3 generation. Probably worth pulling the trigger on the M4 while it’s brand new.
Yep, I had contemplated that possibility. Maybe Apple will skip a generation for the desktop lineup, just like they did with the M3 for the Mini and the Studio, but honestly, who knows… also, I think next spring’s MBA is coming with an M4 rather than an M5, so…

Nah, I think now is a good moment to jump into the Apple Silicon era with this Mac mini.
 
 
The people wondering whether and how "AI" might influence "the real OS", and what relevant APIs might look like should view the very recent
which answers these questions in the context of Google.

Many examples are given, but the sort of thing that's of immediate relevance is a memory allocator which uses input variables (in particular a hash of the call stack) to decide in which of several heaps of varying lifetime a new object should be allocated. This not only works well to very well, substantially reducing the footprint of some code, but is also apparently shipping on Pixel 6.
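For anyone who wants the shape of that idea in code, here's a toy sketch of my own (loosely paraphrasing the description above, not Google's actual implementation): hash the call stack, look up a predicted lifetime class, and place the allocation in a matching heap so short-lived and long-lived objects don't interleave.

```swift
enum LifetimeClass { case shortLived, longLived }

// A trivial arena: allocations that can all be released together.
final class Arena {
    private(set) var allocations: [UnsafeMutableRawPointer] = []
    func alloc(_ bytes: Int) -> UnsafeMutableRawPointer {
        let p = UnsafeMutableRawPointer.allocate(byteCount: bytes, alignment: 16)
        allocations.append(p)
        return p
    }
    func releaseAll() {
        allocations.forEach { $0.deallocate() }
        allocations.removeAll()
    }
}

final class LifetimeAwareAllocator {
    private let shortArena = Arena()
    private let longArena = Arena()
    // Table that a profile (or learned model) would fill in:
    // call-stack hash -> predicted lifetime of objects allocated there.
    var predictions: [Int: LifetimeClass] = [:]

    func allocate(bytes: Int, callStackHash: Int) -> UnsafeMutableRawPointer {
        switch predictions[callStackHash] ?? .shortLived {
        case .shortLived: return shortArena.alloc(bytes)
        case .longLived:  return longArena.alloc(bytes)
        }
    }

    // Everything in the short-lived heap can be dropped together, shrinking
    // the resident footprint - the win described above.
    func collectShortLived() { shortArena.releaseAll() }
}
```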
 
Let me follow up the above with two more references.

The first is

You don't have to buy into the argument being made for why AI matters economically. What matters is that plenty of people DO buy into the argument. Which means, as I see it, there are two takeaways relevant to recent discussions:

1- if energy usage is going to grow as rapidly as expected, those with an edge in inferences/joule will have a substantial advantage. This would appear to work in Apple's favor, both in terms of (we expect) being able to offload more inference locally [which may still mean higher energy usage, but Apple isn't paying for it] AND in terms of Apple probably being able to provide the highest inferences/joule, even at scale.
The latter is not certain, but seems likely given Apple's obsessive (past and present) concern with reducing energy anywhere and everywhere. One could imagine that new architectures designed from the ground up for inference might be more efficient, but I've not yet seen an indication of such.

Which suggests that things like Apple clusters, and Apple-sold compute services, have perhaps a more promising future (in terms of cheaper TCO) than it seems right now. Remember, our concern is, say, half a decade out; not just today's LLMs but the (possible? ridiculous?) future in which LLMs are no longer just a cute trick but the equivalent of the spreadsheet or the compiler, the tool that defines the work (and output, and compensation) of various professionals...

2- the talk includes a slide 13 minutes in, which I have not seen elsewhere, giving the amount of energy used in the US by the largest data center companies. The interesting item I see there is that Apple comes in at 2GW - substantially behind Google and MS, but 2/3 of Amazon, or the same size as Meta/Facebook (and twice the size of X/Twitter).

People have frequently scoffed that Apple's native data center footprint is insignificant (or, more sensibly, have wondered what it is). This gives us elements of an answer - it's as large as Meta's, and not too different from Amazon's.
Which in turn suggests that if it makes business sense for those companies to develop various chips (eg Meta's inference server chip, or Graviton+Trainium+Nitro), it makes as much sense for Apple to do so -- REGARDLESS of whether these "server" chips are sold externally... Apple may be slightly smaller, but their server chip development is probably also cheaper given the infrastructure they can reuse. And Apple's footprint may grow rapidly, not just once Private Cloud Compute takes off, but also if/as they see value in moving other Apple services off AWS storage or Trainium training or whatever else they currently outsource.

My second recommendation link is

Again, you don't have to buy into my love of Mathematica; that's not the point. The point is that Mathematica is a vast system, ridiculously powerful but also, as a consequence, difficult [or at least slow, in terms of constant lookups] to use as soon as you move out of the area in which you usually work. This provides an extremely powerful and useful tool for improving that situation. I've not used things like Copilot for, say, C++, but this feels to me like not just what I'd hope for from Copilot but a whole lot more, in terms of handling optimization, refactoring, providing quick solutions, and so much more.

Now imagine something similar for other tools that are too complex for one person to fully understand - Photoshop, or Blender, or even Linux system management, or (and these may well exist as prototype internal tools) "assistants" for working on the Apple code base, or the MS code base -- tools that make use of company-wide conventions, can easily point you to possibly already-written versions of the function you want, and can at least provide a first pass at possible performance, security, or obsolescence issues, etc. Presumably most of the training that went into the Wolfram Assistant (internal documentation, stackoverflow posts, code repositories, etc) is available in more or less similar form inside Apple or MS.

It's with this in mind that the first part of my comment, I think, might make more sense. Look, sure, it's possible that we have, in 2024, gone as far as this particular set of ideas will take us; that Wolfram Assistant's successes (and failures), like ChatGPT 4o-whateverItIsTheseDays, are as good as it gets for this level of interactive chat; and that nVidia's or Google's chip layout experiments are also as good as it gets. But it seems foolish to assume that given the past two years.
Meaning that even IF you don't see anything that excites you in the current crop of LLM assistants, all that *probably* means is that someone hasn't yet created one for your particular interests.
But 2025 might be the year such an assistant is released for Windows sysadmins... Or for Linux kernel coders... Or for Star Wars fan fiction writers... Or...

Wolfram basically have everything in place to do this "first". Well, sure, maybe Copilot is first, but Wolfram is an "independent developer" in a way that possibly suggests to people who are not Microsoft or Apple or Google some combination of "hey I could do that" and "OMG, if we don't do this but our competitors do".
The other tricky thing is that Wolfram has a history of charging for its products, so no-one is surprised (or ranting all over the internet) that this is an add-on cost. The same idea is *possible* for organizations that work with open source (for example, Blender could be free but charge a monthly fee for an assistant; likewise for Ubuntu). Even Google could offer a Google Premium [cf X/Twitter Premium] that gives you ad-free search, a much more powerful AI element to the search, and various other things - some amount of image generation or video generation? Summaries of web sites based on knowledge of what interests you?

Would these all then back down in the face of the usual mindless screams and rants from the masses? Hmm. We have a once-in-a-generation chance to restructure after the known issues of the free (ie ad-supported) web...
 
Let me follow up the above with two more references.



My second recommendation link is

Again, you don't have to buy into my love of Mathematica; that's not the point. The point is that Mathematica is a vast system, ridiculously powerful but also, as a consequence, difficult [or at least slow, in terms of constant lookups] to use as soon as you move out of the area in which you usually work. This provides an extremely powerful and useful tool for improving that situation. I've not used things like Copilot for, say, C++, but this feels to me like not just what I'd hope for from Copilot but a whole lot more, in terms of handling optimization, refactoring, providing quick solutions, and so much more.



Wolfram basically have everything in place to do this "first". Well, sure, maybe Copilot is first, but Wolfram is an "independent developer" in a way that possibly suggests to people who are not Microsoft or Apple or Google some combination of "hey I could do that" and "OMG, if we don't do this but our competitors do".
The other tricky thing is that Wolfram has a history of charging for its products, so no-one is surprised (or ranting all over the internet) that this is an add-on cost.
Although I’m a bit neutral-to-negative on LLMs in general, especially where summarization is concerned, I agree with you about Wolfram having a potentially unique opportunity here, and the discipline to do things in a “more correct” way.

On that topic, do you have any resources regarding Wolfram notebooks and more specifically what people are using their latest technology for day-to-day? I follow you on the Mathematica stuff but I have zero experience with his languages otherwise and am deeply curious about what some practical applications are for their desktop apps or other tools that people apparently pay for.

I think this is a huge blind spot for me professionally and for my knowledge of what’s going on in the broader industry so I’d sincerely love some outside resources to look at if you know of any (and I did spend an hour on their site earlier after reading your post). I understand the temptation to say “if you don’t know it’s not for you” but I’m the type of learner that really likes to understand the tools and grok the scope of their usage because that can lead to implementation insights that I would never have considered and it seems like you have a pretty good handle on this.

edit: feel free to PM me if you consider this too off-topic, but it might be of use to others too. There’s a dearth of information in this area in my (limited) experience which is the reason I’m seizing this opportunity.
 