I think we'll be seeing more and more of the approach you mentioned in your last point. Apple are engaged in ongoing work (that moves a little further each year) to split the OS up into more and more pieces that can run independently on separate cores. Obviously this is a goal that every OS vendor strives for in the age of multi-core; Apple's nothing special in this respect, it's just that the techniques they use will be tailored to the structure of Darwin.
There have been research OSs in the past (like Barrelfish, from ETH Zurich and Microsoft Research) that have pushed this idea, but moving a large commercial OS in this direction is obviously harder!
I've mentioned before that part of how Apple run faster is to run experiments in parallel. IMHO the M3 6E cluster was such an experiment – put it in a chip where it can't cause any harm, and see just how well it can get used (both by the OS and by lightweight threads in apps). Presumably the experiment was a big success, enough so that we see it as the new norm (and perhaps also justifying moving to 6E cores for M4 Max?)
Open questions then include:
- does 6E make sense for an iPhone? I guess we'll see soon! Maybe it does?
- does going up to 8 E-cores now make sense? (There are two issues here. First, is there enough work for 8 E-cores? Second, is it still feasible to have them all share a single set of L2-level resources: the L2 itself, the L2 TLB and page walkers, and AMX/SME? If those resources start to be overloaded, maybe it's better to dial it back to 4E+4E for the M4 Pro and work our way to 6E+6E over the next four years or so?)
- does a dedicated OS-only E cluster make sense? The idea here is that we devote an E cluster (maybe only two E-cores, maybe no AMX/SME needed, and a small L2) to running the most security-critical elements of the OS and NOTHING ELSE. With these cores isolated to this extent, malicious apps won't be able to [or at least will have to work even harder to find some scheme to] either modify the OS or eavesdrop on what it's doing. This would also allow us to make the other cores more aggressive in terms of things like variable timing and speculation, without having to worry about the endless stream of micro-architectural security issues (Spectre, GoFetch and the rest of them). If you want to do crypto or anything involving passwords, call into the OS, which will shunt the work to a security core. Given that, who CARES that an app can, with immense effort, sometimes read a few bytes from the memory range of some other app?
Especially with ARM coming to Windows, and Intel now even planning some crazy hybrid core strategy with ARM in the future, it's possible that an OS-only E cluster will become more and more common, with the P cores used just for heavy-duty applications.
Can't wait to see the Mac lineup with M4 though.