One thing I had also thought this meant is that Apple's E cores can do the same things as Apple's P cores, just more efficiently. So, with a light workload, a customer wouldn't notice the difference if it was all E cores.
Technically, I don't think that is true. It isn't well documented, but it appears a P-core complex has a 'hidden' AMX (Apple matrix) co-processor in it. However, Apple doesn't expose that co-processor to 'normal' applications (or even to most of its own software). You can't get to it except through Apple's Accelerate framework code, which also has a 'slower' way of doing the same problems. If something is run the 'slow' way enough, it will probably get promoted to a P core anyway.
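For what it's worth, the supported route looks like an ordinary Accelerate call; whether it lands on AMX is entirely Apple's internal decision and invisible to the caller. A minimal sketch in C (assuming macOS with the Accelerate framework; the values are just illustrative):

```c
#include <Accelerate/Accelerate.h>  /* build with: -framework Accelerate */
#include <stdio.h>

int main(void) {
    /* 2x2 single-precision matrix multiply, C = A * B, through the
     * framework's BLAS interface. If Apple routes this to the AMX
     * units, the caller never sees it; there is no direct AMX API. */
    float A[4] = {1, 2, 3, 4};
    float B[4] = {5, 6, 7, 8};
    float C[4] = {0};
    cblas_sgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                2, 2, 2,          /* M, N, K */
                1.0f, A, 2,       /* alpha, A, lda */
                B, 2,             /* B, ldb */
                0.0f, C, 2);      /* beta, C, ldc */
    printf("%.0f %.0f %.0f %.0f\n", C[0], C[1], C[2], C[3]);  /* 19 22 43 50 */
    return 0;
}
```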
Apple's core complexes are different from each other, but the hand-off of loads between them is cleaner because the differences are carefully managed.
Intel's differences are comparatively much messier. There is some evidence that Intel "glued" their P and E cores together late, after they had been feature-designed separately. So, when queried in certain ways, they present as two distinct x86_64 ISAs. Technically, that is true. The P cores natively (at least for Gen 12) have documented AVX-512 instructions (they are fully accessible in the Xeon SP (Sapphire Rapids) implementations of that baseline design). For Gen 12 (Alder Lake), they were turned off somewhat late in the roll-out process. Somewhere inside Intel there was an internal battle over whether some Gen 12 configurations (without E cores) would have it turned on and others turned off (or even whether to push some relatively large hackery into OS kernels to do an awkward "stop and restart" failover to the P cores). There were some hidden hooks in the UEFI/BIOS to turn AVX-512 on/off. Intel has now just completely fused it off in the current stepping iterations of the Gen 12 processor packages.
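To see why mixed ISA support on one package is so awkward, consider the usual pattern apps use: check the CPU's features once at startup and pick a code path for the life of the process. A minimal sketch using a GCC/Clang builtin (on a hybrid part with AVX-512 enabled only on P cores, the answer would depend on which core the check happened to run on, which is exactly the "stop and restart" headache described above):

```c
#include <stdio.h>

/* Typical one-shot startup dispatch. On a homogeneous CPU this is
 * safe; on a part where only P cores have AVX-512, a thread could
 * pass this check on a P core and later fault on an E core. */
int main(void) {
    if (__builtin_cpu_supports("avx512f")) {
        puts("taking the AVX-512 path");
    } else {
        puts("taking the AVX2/scalar fallback path");
    }
    return 0;
}
```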
However, for Intel, they reach E cores mainly by removing parts of what makes P cores… well, P. So, an Intel chip with all E cores wouldn't be able to run legacy software that hasn't been rewritten to understand what these new, less capable cores do. Instead of "route this command to any available core," it would be "route this command to an available P core only."
Not really. There are differences, but not this one (when the P cores are properly set up). The E cores were just a substantively different approach to the design. It does leave out AVX-512. But what Intel largely did was take Skylake (Gen 7, Xeon W-2100, Xeon W-6100) and make it much smaller. They brought AVX2 and some AI/ML instructions for enhanced SIMD, but left out AVX-512 to save lots of space. It is designed for lower Turbo clocks and is really focused on die-space efficiency.
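To make the "AVX2 but not AVX-512" point concrete, here is a minimal sketch (function name is just illustrative; compile with -mavx2) that both Golden Cove and Gracemont can execute, because it stays within 256-bit YMM registers. A 512-bit ZMM version of the same reduction would need silicon Gracemont simply doesn't have:

```c
#include <immintrin.h>
#include <stdio.h>

/* Horizontal sum of 8 floats using AVX2/AVX/SSE3 operations only,
 * so it runs on both the P cores and the E cores. */
static float sum8(const float *v) {
    __m256 acc = _mm256_loadu_ps(v);             /* load 8 floats */
    __m128 lo  = _mm256_castps256_ps128(acc);    /* lower 4 lanes */
    __m128 hi  = _mm256_extractf128_ps(acc, 1);  /* upper 4 lanes */
    __m128 s   = _mm_add_ps(lo, hi);             /* 8 -> 4 */
    s = _mm_hadd_ps(s, s);                       /* 4 -> 2 */
    s = _mm_hadd_ps(s, s);                       /* 2 -> 1 */
    return _mm_cvtss_f32(s);
}

int main(void) {
    float v[8] = {1, 2, 3, 4, 5, 6, 7, 8};
    printf("%f\n", sum8(v));   /* 36.0 */
    return 0;
}
```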
It isn't trying to be the 'old school' Atom processor. At the Architecture Day introduction in 2021, Intel showed lots of comparisons to Skylake.
"...
When comparing 1C1T of Gracemont against 1C1T of Skylake, Intel’s numbers suggest:
- +40% performance at iso-power (using a middling frequency)
- 40% less power* at iso-performance (peak Skylake performance)
*'<40%' is now understood to mean 'below 40% power'
When comparing 4C4T of Gracemont against 2C4T of Skylake, Intel’s numbers suggest:
- +80% performance peak vs peak
- 80% less power at iso-performance (peak Skylake performance)
We pushed the two Intel slides together to show how they presented this data.
...."
www.anandtech.com
So they are chasing performance, but just not willing to sacrifice lots of area to do it. That's why it was aimed at Intel 4 (the old '7nm'): some of the area wins were going to come just from a smaller fab process. But the baseline design was on a flexible deployment baseline, so it could 'fall back' to Intel 7 (Enhanced SuperFin '10nm').
They took SMT out because it has a die-space overhead (in addition to some security issues if it isn't implemented carefully). With a smaller core, if they want a higher thread count, they can just throw more cores into the SoC. (This core was always supposed to also go into 20+ core special-market server chips, something similar to the Xeon D class.) Note in the numbers above that with just one of these cores active, there is a bigger power drop relative to Skylake than with four cores active. Once you get up into double-digit Gracemont/E-core counts, it isn't a huge power saver. It is better (enough to be significantly helpful), but not huge.
The P cores (Golden Cove) are coupled as much to the Xeon SP line-up constraints as to the mid-to-high-range desktop (and high-range laptop) ones: bigger area (at higher prices) for more features and higher Turbo ranges, with "as much as you need" wall-socket power. It is not as clear this was supposed to be on Intel 4 when they initially scoped it out. (It would have helped to control the die-area consumption problem.)
Reportedly, Gen 13 (Raptor Lake) has some modified Redwood Cove (**) cores. It wouldn't be surprising if the stuff that is completely fused off in the Gen 12 P cores is just removed. Why didn't Intel do that in the first round? Probably because they didn't have the time or resources, since they were running around chasing a bunch of crazy forest fires across the product line, along with internal strife. (e.g., desktop Rocket Lake (Gen 11) was likely a wasted-effort misadventure over the long term.)
And this is why you only get the best performance from the new chips using the latest Windows… they’ve moved part of the scheduler to the OS to help Intel, but old software can still break if routed to an E core.
Intel didn't move the scheduler to the OS. The scheduler is, and always was, in the OS. What they have done is provide the OS scheduler with more concrete, quantifiable data (the Thread Director hardware feedback) so it can do its job better with less "guessing".
Again, not too different from what Apple does. Apple has a few cases where the OS scheduler can push some of the juggling overhead onto the processor, but it is still mostly a matter of bubbling up good, informative metrics to the OS scheduler to get better allocations of resources.
Userland software should provide hints and/or suggestions about where threads should go. The OS scheduler should be doing the actual scheduling work, though, because doing the job right requires the aggregate picture. Apps only have a picture of what they themselves are doing.
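As a concrete illustration of that "hints, not scheduling" split: on macOS, a thread can tag itself with a quality-of-service class and leave core placement to the kernel. A minimal sketch in C (macOS-specific API; whether a utility-class thread actually lands on an E core is the scheduler's call, not a guarantee):

```c
#include <pthread.h>
#include <pthread/qos.h>
#include <stdio.h>

/* Worker that hints it is low-urgency work; the kernel may then
 * prefer E cores, but the placement decision stays in the OS. */
static void *background_work(void *arg) {
    (void)arg;
    /* Hint only: "this thread is utility-class work". */
    pthread_set_qos_class_self_np(QOS_CLASS_UTILITY, 0);
    puts("doing low-urgency work; core choice is the scheduler's call");
    return NULL;
}

int main(void) {
    pthread_t t;
    pthread_create(&t, NULL, background_work, NULL);
    pthread_join(t, NULL);
    return 0;
}
```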
As pointed out in another response, there were/are problems with DRM mechanisms and 'too brittle for their own good' apps that freak out if all the CPU cores don't present 100% identical info. That code is crufty anyway. That kind of code shouldn't hold back processor-package evolution.
Edit: (**) Intel code name bingo tripped me up.
Raptor Lake has Raptor Cove cores. Redwood Cove is for Gen 14 and a bigger step change. The "Lake" and "Cove" name matching indicates that the core adjustment is specific to that incremental change. Architecture-documentation-wise, Raptor Cove and Golden Cove are the same. How much it is a marketing name change (trying to indicate more progress than there is) and how much an implementation difference (better design libraries, security fixes, bug fixes, etc.), we'll see.