
MrGunny94

macrumors 65816
Dec 3, 2016
1,148
675
Malaga, Spain
Apple are engaged in ongoing work (that moves a little further each year) to split the OS into more and more pieces that can run independently on separate cores. Obviously this is a goal every OS vendor strives for in the multi-core age; Apple's nothing special in this respect, it's just that the techniques they use will be optimized for the structure of Darwin.
There have been academic OSs in the past (like Barrelfish, from ETH Zurich and Microsoft Research) that have pushed this idea, but moving a large commercial OS in this direction is obviously harder!


I've mentioned before that part of how Apple run faster is to run experiments in parallel. IMHO the M3 6E cluster was such an experiment – put it in a chip where it can't cause any harm, and see just how well it can get used (both by the OS and by lightweight threads in apps). Presumably the experiment was a big success, enough so that we see it as the new norm (and perhaps also justifying moving to 6E cores for M4 Max?)


Open questions then include:
- does 6E make sense for an iPhone? I guess we'll see soon! Maybe it does?

- does going up to 8 E-cores now make sense? (There are two issues here. First, with 8 E-cores, is there enough work for them? Second, is it still feasible to have them all share a single set of L2-level resources: the L2 itself, the L2 TLB and page walkers, and AMX/SME? If those resources start to be overloaded, maybe it's better to dial it back to 4E+4E for the M4 Pro and slowly work our way back to 6E+6E over the next four years or so?)

- does a dedicated OS-only E cluster make sense? The idea here is that we devote an E cluster (maybe only two E-cores, maybe no AMX/SME, and a small L2) to running the most security-critical elements of the OS and NOTHING ELSE. With these cores isolated to this extent, malicious apps won't be able to (or at least will have to work even harder to find some scheme to) either modify the OS or eavesdrop on what it's doing. This would also let us make the other cores more aggressive in terms of things like variable timing and speculation without having to worry about the endless stream of micro-architectural security issues (Spectre, GoFetch and the rest of them). If you want to do crypto or anything involving passwords, call into the OS, which will shunt the work to a security core; given that, who CARES that an app can, with immense effort, sometimes read a few bytes from the memory range of some other app?
I think we'll be seeing more and more of the approach you described in your last point.

Especially with ARM coming to Windows, and Intel now even planning some crazy hybrid core strategy with ARM in the future, it's possible that an OS-only E cluster will become more and more common, with the P cores used just for heavy-duty applications.

Can't wait to see the Mac lineup with M4, though.
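To make the "call into the OS, which shunts the work to a security core" idea concrete, here is a toy Python model. It only sketches the routing boundary, not real core isolation: Python threads aren't pinned to physical cores, and the pool names, pool sizes and the `os_call_verify_password` helper are all hypothetical. A two-worker pool stands in for the dedicated OS-only E cluster, and a larger pool stands in for the cores apps run on.

```python
import hashlib
import hmac
import threading
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins: a tiny pool for a dedicated "OS-only" security
# cluster and a larger pool for the general-purpose cores apps run on.
# Python threads are not pinned to physical cores; this only models routing.
SECURITY_POOL = ThreadPoolExecutor(max_workers=2, thread_name_prefix="secure-core")
GENERAL_POOL = ThreadPoolExecutor(max_workers=8, thread_name_prefix="general-core")


def _verify_password(password: bytes, salt: bytes, expected: bytes) -> bool:
    """Sensitive work: derive a key with PBKDF2 and compare in constant time."""
    derived = hashlib.pbkdf2_hmac("sha256", password, salt, 100_000)
    return hmac.compare_digest(derived, expected)


def os_call_verify_password(password: bytes, salt: bytes, expected: bytes) -> bool:
    """Hypothetical 'system call': the app never runs the check itself;
    the OS shunts it to the security pool and returns only a yes/no answer."""
    return SECURITY_POOL.submit(_verify_password, password, salt, expected).result()


def run_app_work(fn, *args):
    """Ordinary app work is scheduled on the general pool."""
    return GENERAL_POOL.submit(fn, *args).result()


def current_worker() -> str:
    """Report which pool's thread is running the caller (for demonstration)."""
    return threading.current_thread().name


if __name__ == "__main__":
    salt = b"example-salt"
    stored = hashlib.pbkdf2_hmac("sha256", b"hunter2", salt, 100_000)
    print(os_call_verify_password(b"hunter2", salt, stored))  # True
    print(os_call_verify_password(b"wrong", salt, stored))    # False
    print(SECURITY_POOL.submit(current_worker).result())      # e.g. secure-core_0
    print(run_app_work(current_worker))                       # e.g. general-core_0
```

The design point is that the app never touches the derived key; it only gets a yes/no answer back from the "secure" side.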
 

Xiao_Xi

macrumors 68000
Oct 27, 2021
1,627
1,101
What set of SME instructions could Apple have implemented? SME? SME2?
Can iPadOS access that information or can only macOS?
 

leman

macrumors Core
Oct 14, 2008
19,521
19,674
That’s a pretty good question. IIRC, SME is part of ARMv9.2, right? And SME2 comes with ARMv9.4.

These are still optional extensions. Apple can implement v8 with an SME subset on top. For instance, I don’t see them adopting SVE in the near future.
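On the question of which SME level a given chip reports, here's a hedged Python sketch for checking. It assumes recent macOS exposes Arm features as `hw.optional.arm.FEAT_SME` / `hw.optional.arm.FEAT_SME2` sysctls (confirm with `sysctl hw.optional.arm` on your own Mac) and that an arm64 Linux kernel lists `sme` / `sme2` in the `Features` line of `/proc/cpuinfo`. It only covers macOS and Linux.

```python
import platform
import subprocess
from typing import Dict, Optional, Set

# Assumed sysctl names: recent macOS releases appear to expose Arm features as
# hw.optional.arm.FEAT_*; confirm with `sysctl hw.optional.arm` on your Mac.
MACOS_SME_KEYS = ("hw.optional.arm.FEAT_SME", "hw.optional.arm.FEAT_SME2")


def macos_sysctl_flag(name: str) -> Optional[bool]:
    """True/False for a sysctl flag, or None if running sysctl fails."""
    try:
        result = subprocess.run(
            ["sysctl", "-n", name], capture_output=True, text=True, check=True
        )
    except (OSError, subprocess.CalledProcessError):
        return None
    return result.stdout.strip() == "1"


def linux_features(cpuinfo_text: str) -> Set[str]:
    """Collect hwcap names from the 'Features' lines of arm64 /proc/cpuinfo."""
    features: Set[str] = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Features":
            features.update(value.split())
    return features


def sme_support() -> Dict[str, Optional[bool]]:
    """Best-effort SME/SME2 report for the current machine."""
    if platform.system() == "Darwin":
        return {key.rsplit(".", 1)[-1]: macos_sysctl_flag(key) for key in MACOS_SME_KEYS}
    try:
        with open("/proc/cpuinfo") as f:
            features = linux_features(f.read())
    except OSError:
        return {"FEAT_SME": None, "FEAT_SME2": None}
    return {"FEAT_SME": "sme" in features, "FEAT_SME2": "sme2" in features}


if __name__ == "__main__":
    print(sme_support())
```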
 

Dulcimer

macrumors 6502a
Nov 20, 2012
967
1,148
What do we think Apple will do with core counts for M4 Max? Trend for M4 (and M3 Pro) was adding more E-cores. M3 Max saw 4 more P-cores and 2 more GPU cores (highest configuration).

So it’s clear they want more distinction between Pro and Max lines. Do we have reason to believe they’ll continue this trend?
 

senttoschool

macrumors 68030
Nov 2, 2017
2,626
5,482
I'm not expecting more GPU cores because N3E has reduced density vs N3B. I do expect a few more E cores though because it's a cheap way (in terms of area) to improve MT performance. I don't expect more P cores.
 

streetfunk

macrumors member
Feb 9, 2023
82
41
I would expect:
M4 Max: 6 E-cores / 12 P-cores
M4 miniPro: 6 E-cores / 6 P-cores
The question is whether there will be a miniPro with 8 P-cores.

I would expect them to continue to have a clear distinction between miniPro and Max.
 

Xiao_Xi

macrumors 68000
Oct 27, 2021
1,627
1,101
Apple can implement v8 with SME subset on top. For instance, I don’t see them adopting SVE in the near future.
Is that possible?
SME builds on the Scalable Vector Extensions (SVE and SVE2), adding new capabilities to efficiently process matrices. Key features include:

  • Matrix tile storage
  • Load, store, insert, and extract tile vectors, including on-the-fly transposition
  • Outer product of SVE vectors
  • Streaming SVE mode
A new operating mode is added, Streaming SVE Mode. When in Streaming SVE Mode, the new SME storage and instructions are available, as well as a significant subset of the existing SVE2 instructions. When not in Streaming SVE mode, behavior is unchanged from SVE2. Applications can switch between operating modes depending on what is needed.
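The "outer product of SVE vectors" into "matrix tile storage" combination is the core trick: a matrix multiply can be built as a running sum of outer products, one per column of A and row of B, accumulated into a tile. The pure-Python sketch below models only that data flow (roughly what an SME outer-product-and-accumulate instruction such as FMOPA does to a ZA tile in a single step). It is an illustration, not SME code; real tiles have a fixed size set by the streaming vector length.

```python
from typing import List

Matrix = List[List[float]]


def outer_product_accumulate(tile: Matrix, col: List[float], row: List[float]) -> None:
    """tile += outer(col, row), updating every element of the tile in place."""
    for i, a in enumerate(col):
        for j, b in enumerate(row):
            tile[i][j] += a * b


def matmul_by_outer_products(A: Matrix, B: Matrix) -> Matrix:
    """C = A @ B computed as a sum of k outer products accumulated into one tile."""
    n, k, m = len(A), len(B), len(B[0])
    tile = [[0.0] * m for _ in range(n)]  # plays the role of an accumulator tile
    for p in range(k):
        col = [A[i][p] for i in range(n)]  # column p of A
        row = B[p]                         # row p of B
        outer_product_accumulate(tile, col, row)
    return tile


def matmul_naive(A: Matrix, B: Matrix) -> Matrix:
    """Reference dot-product formulation for comparison."""
    return [
        [sum(A[i][p] * B[p][j] for p in range(len(B))) for j in range(len(B[0]))]
        for i in range(len(A))
    ]


if __name__ == "__main__":
    A = [[1, 2], [3, 4]]
    B = [[5, 6], [7, 8]]
    print(matmul_by_outer_products(A, B))  # [[19.0, 22.0], [43.0, 50.0]]
```

Each outer product touches the whole tile at once, which is why this formulation maps well onto wide matrix hardware compared with computing one dot product at a time.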
 

Pressure

macrumors 603
May 30, 2006
5,179
1,544
Denmark
It's a shame the "tech" influencers currently doing their "in-depth reviews" only have the ability to run Geekbench 6 when we could also get some Geekbench ML results at least.
 

leman

macrumors Core
Oct 14, 2008
19,521
19,674

leman

macrumors Core
Oct 14, 2008
19,521
19,674

crazy dave

macrumors 65816
Sep 9, 2010
1,453
1,225
Some interesting discoveries from Anandtech forums:

1. Geekbench6 does use AMX x86 instruction sets: https://browser.geekbench.com/v6/cpu/compare/5330257?baseline=4656765

2. Intel CPUs that have AMX do look to be working in GB6: https://browser.geekbench.com/v6/cpu/compare/5330257?baseline=4656765. See the Object Remover scores.

Wait, so matrix operations on an iPad are almost as fast as on the latest and greatest Intel server CPU with AVX512 and AMX? Wow…
I'm a little tired so this is probably obvious, but how does this show it? Is it that the Intel Core i7-12700T has the AMX but the Intel Xeon w5-3435X doesn't? @senttoschool can you link to the Anandtech forum page where they found this?

I just naturally assumed given the scores that GB didn't use AMX, but ... wow ... if it does ...

Primate Labs needs to make it more clear which extensions it uses for every GB test.

EDIT: They actually do for some of them, maybe all of them; how the hell did I miss that:

https://www.geekbench.com/doc/geekbench6-benchmark-internals.pdf

Photo Library and Object Detection use AMX; they don't mention which extensions they use for Object Remover, so maybe they don't use them? So it's possible Photo Library's small bump is also SME-related? It just didn't have as pronounced an effect?
 

thenewperson

macrumors 6502a
Mar 27, 2011
992
912

According to someone on the AnandTech forums, Intel reserves AMX for its server CPUs, so that’s where you’ll notice the difference. Link: https://forums.anandtech.com/posts/41209202/
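If you want to check whether a particular x86 box reports AMX at all (useful when comparing GB6 results across Intel parts), here's a minimal sketch. It assumes Linux, where the kernel lists `amx_tile`, `amx_bf16` and `amx_int8` in the `flags` line of `/proc/cpuinfo`; the helper names are just illustrative.

```python
from typing import Dict, Set

# Flag names as the Linux kernel reports them in /proc/cpuinfo on x86.
AMX_FLAGS = ("amx_tile", "amx_bf16", "amx_int8")


def x86_flags(cpuinfo_text: str) -> Set[str]:
    """Collect feature flags from the 'flags' lines of x86 /proc/cpuinfo."""
    flags: Set[str] = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":  # skips e.g. 'vmx flags'
            flags.update(value.split())
    return flags


def amx_support(cpuinfo_text: str) -> Dict[str, bool]:
    """Map each AMX flag name to whether the CPU reports it."""
    flags = x86_flags(cpuinfo_text)
    return {name: name in flags for name in AMX_FLAGS}


def read_cpuinfo(path: str = "/proc/cpuinfo") -> str:
    """Return the file contents, or an empty string if unavailable."""
    try:
        with open(path) as f:
            return f.read()
    except OSError:
        return ""


if __name__ == "__main__":
    print(amx_support(read_cpuinfo()))
```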
 

senttoschool

macrumors 68030
Nov 2, 2017
2,626
5,482
Yea my first link was meant to point to https://www.geekbench.com/doc/geekbench6-benchmark-internals.pdf

That's fixed in my post now.
 

crazy dave

macrumors 65816
Sep 9, 2010
1,453
1,225
Do we have any GPU benchmark already?
Yeah, it's about a 13-15% improvement, probably a mix of clocks and bandwidth. We'll know when they get out into the wild and we see what the GPU clocks are.
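As a rough way to think about "a mix of clocks/bandwidth": divide out the clock ratio and see what's left over. This sketch assumes performance scales linearly with clock, which is a simplification, and the inputs are placeholders, not measured M3/M4 figures.

```python
def percent_gain(old: float, new: float) -> float:
    """Relative improvement of `new` over `old`, in percent."""
    return (new / old - 1.0) * 100.0


def residual_gain(total_gain_pct: float, clock_gain_pct: float) -> float:
    """Gain left after dividing out the clock ratio, assuming perf is proportional to clock."""
    return ((1 + total_gain_pct / 100) / (1 + clock_gain_pct / 100) - 1) * 100


if __name__ == "__main__":
    # Placeholder numbers, not real results.
    total = percent_gain(1000, 1140)
    print(round(total, 2))                      # 14.0
    print(round(residual_gain(total, 5.0), 2))  # 8.57
```

Note the gains compose multiplicatively: a 14% total with a 5% clock bump leaves about 8.57% for everything else, not 9%.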



I've been using that link for *days* and just skipped over the instructions, or at least, if I noticed them, their import didn't register. Rather embarrassing.
 

mr_roboto

macrumors 6502a
Sep 30, 2020
856
1,866
Benched from a passively cooled device!
I'd caution against reading too much into that, thanks to the design of Geekbench. Quoting from the same GB document @crazy dave linked:

Geekbench inserts a pause (or gap) between each workload to minimize the effect thermal issues have on workload performance. Without this gap, workloads that appear later in the benchmark would have lower scores than workloads that appear earlier in the benchmark. The default gap is 2 seconds for both single-core and multi-core workloads.

It doesn't say so here but each workload runs for even less time than the 2s gap. GB CPU is deliberately designed to avoid any thermal clockspeed limits. John Poole has given a rationale for this which goes beyond the paragraph above and makes sense given what he wanted GB to be, but it's something you do need to be aware of when interpreting results.
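Here's a hypothetical harness (not Geekbench's actual code) showing the structure that quote describes: each workload runs as a short burst, with an idle gap between workloads. Because each burst is brief and followed by idle time, the chip may never reach the sustained thermal limits a fanless device would hit under continuous load. The function name and defaults are illustrative; only the 2 s gap comes from the quoted document.

```python
import time
from typing import Callable, Dict


def run_suite(workloads: Dict[str, Callable[[], object]], gap_s: float = 2.0) -> Dict[str, float]:
    """Time each workload once, sleeping `gap_s` seconds between workloads.

    Hypothetical harness: the 2 s default mirrors the gap Geekbench documents,
    but everything else here is illustrative.
    """
    timings: Dict[str, float] = {}
    names = list(workloads)
    for idx, name in enumerate(names):
        start = time.perf_counter()
        workloads[name]()
        timings[name] = time.perf_counter() - start
        if idx < len(names) - 1:
            time.sleep(gap_s)  # idle gap lets the chip shed heat before the next burst
    return timings


if __name__ == "__main__":
    suite = {
        "sum": lambda: sum(range(1_000_000)),
        "sort": lambda: sorted(range(1_000_000, 0, -1)),
    }
    print(run_suite(suite, gap_s=0.5))
```

Setting `gap_s=0` and looping the suite for several minutes would be the closer analogue of a sustained-load test, which is where fanless and fan-cooled devices with the same SoC would diverge.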

Here are two examples to demonstrate this in action: the 2020 M1 MBA (fanless) vs the 2020 13" M1 MBP (fan), and the 3rd-gen iPad Pro 11 (fanless M1) vs the same 13" M1 MBP. I chose these because all three have the exact same SoC.


If higher tier versions of M4 like Pro/Max/Ultra are in the works, we'll probably see slightly higher clock speeds, but if Apple follows their prior patterns we won't be seeing enormous increases in GB CPU from that - it's usually a fairly small bump.
 

crazy dave

macrumors 65816
Sep 9, 2010
1,453
1,225
Just as a quick comment: that document seems to be outdated, since it does not mention SVE and SME instruction set support for ARM in the latest versions of GB6.
As I wrote elsewhere, though, if GB6 supports AMX and AVX-512 for x86, SME support for ARM can hardly be considered "cheating". Hell, even AVX2 is almost ... ehk ... never mind ...
 