....
Apple however went the opposite direction. They didn't make their chips efficient by making them simple; they made their chips efficient by making them complex. Their CPUs can process more instructions per clock than any x86 chip, and they do it while consuming very little power. The result is a mobile phone CPU core running at sub-3GHz while consuming 5 watts of power that can trade blows (in synthetic benchmarks) with a desktop x86 core running at 4.5+GHz consuming who knows how much.
Errrr... More instructions per clock than the baseline ARM design, sure, but more than the full set of x86 chips? Got any backup for that?
Apple's chips are in part just bigger than many of the other ones in their class.
The A13 is:
".. The new die is 98.48mm² which is 18.3% larger than the A12 of last year. ..."
There are a few here that have cellular modems in them that are smaller (die area, 4th column from left in the table below):
"...
Chip               | Process       | Die (mm)     | Area (mm²) | Transistors    | Density (MTr/mm²)
Snapdragon 8cx     | 7nm TSMC      | 8.3 x 13.5   | 112*       | >5.3b, <10.6b  | >56.4, <94.6
Snapdragon 845/850 | 10LPP Samsung |              | 94         | 5.3b           | 56.4
Snapdragon 835     | 10LPE Samsung |              | 72.3       | 3.0b           | 41.5
Kirin 980          | 7nm TSMC      |              | 74.13      | 6.9b           | 93.1
8-core Ryzen       | 14nm GloFo    | 22.06 x 9.66 | 192        | 4.8b           | 25.0
Skylake 4+2        | 14nm Intel    | 13.31 x 9.19 | 122        | 1.75b          | 14.3
..."
Spotted: Qualcomm Snapdragon 8cx Wafer on 7nm (www.anandtech.com)
There are two ways to increase effective IPC. One way is to go wider. Another way is to crank up the cache hit rate so the core issues fewer "no-ops". If the clock is running fast but all the core is doing is issuing "no-ops" 10-40% of the time, then effective 'wall clock' throughput isn't necessarily going to represent how 'wide' the core's functional units are.
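That trade-off can be sketched as back-of-envelope arithmetic. The widths and stall fractions below are made-up illustrative numbers, not measurements of any real core:

```python
# Effective IPC is issue width discounted by the fraction of cycles the core
# spends stalled ("issuing no-ops"). All numbers here are hypothetical.

def effective_ipc(issue_width: float, stall_fraction: float) -> float:
    """Instructions retired per clock when the core stalls some fraction of cycles."""
    return issue_width * (1.0 - stall_fraction)

# A hypothetical wide core that stalls on memory 40% of the time...
wide_but_starved = effective_ipc(issue_width=7, stall_fraction=0.40)
# ...versus a hypothetical narrower core whose caches keep it fed.
narrow_but_fed = effective_ipc(issue_width=4, stall_fraction=0.10)

print(f"wide but starved: {wide_but_starved:.1f} IPC")  # ~4.2
print(f"narrow but fed:   {narrow_but_fed:.1f} IPC")    # ~3.6
```

The point: a "wide" core only shows its width on workloads whose working set the cache hierarchy can actually service.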
Somewhat questionable whether Apple's system memory (cache) hierarchy and layering will do as well on 40GB or 400GB working-set footprints as it does on 4GB working-set footprints. iOS wouldn't even let a benchmark allocate a single large chunk:
"...On iOS, 429.mcf was a problem case as the kernel memory allocator generally refuses to allocate the single large 1.8GB chunk that the program requires (even on the new 4GB iPhones). I’ve modified the benchmark to use only half the amount of arcs, thus roughly reducing the memory footprint to ~1GB. The reduction in runtime has been measured on several platforms and I’ve applied a similar scaling factor to the iOS score – which I estimate to being +-5% accurate. The remaining workloads were manually verified and validated for correct execution. ..."
The iPhone XS & XS Max Review: Unveiling the Silicon Secrets (www.anandtech.com)
Apple has made progress here since 2018, but not one (or two) orders of magnitude of footprint progress. Small-working-set, hot "drag racing" benchmarks are nice, but it isn't that hard to tune for a subset of those to make yourself look good.
What if you have more room to spare than just 5 watts, however? What if you can give the CPU 30 watts? 50 watts? 150 watts? You see what I am getting at? Apple claims very confidently that their CPUs scale up. Personally, I doubt that we will see an Apple CPU running at 4GHz anytime soon. But they most likely can run at 3.5GHz, which would give them a healthy performance boost over anything in the x86 world while still consuming very little power in relative terms.
Scaling up on cores is gimmicky if the workloads aren't really scaling up all that much. 10 cores at a 3GB working set will probably do a bit worse than 15 cores at a 3GB working set only if the cache striding manages not to overwhelm the memory channel(s).
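The caveat can be put as a simple saturation model. The per-core bandwidth demand and channel bandwidth below are hypothetical numbers chosen for illustration:

```python
# Once the cores' combined memory demand saturates the channels, adding
# cores stops helping. All figures here are made up for illustration.

def effective_speedup(cores: int, gbps_per_core: float, channel_gbps: float) -> float:
    """Speedup vs one core, capped by aggregate memory bandwidth."""
    return min(cores, channel_gbps / gbps_per_core)

# Say each core streams ~5 GB/s against a 60 GB/s memory system:
for n in (10, 15, 20):
    print(n, "cores ->", effective_speedup(n, 5.0, 60.0), "x")
# 10 cores scale cleanly; at 15 and 20 the channels, not the cores,
# are the limit, so the extra cores buy nothing.
```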
So let's have a look at a practical example. The Intel Xeon W-3275M, which is the largest Mac Pro CPU currently, packs 28 cores running at a nominal 2.5GHz into a 205-watt TDP package. Well, Apple could pack 40 A13 cores running at 2.5GHz into the same package. And an A13@2.5GHz core is considerably faster than a Xeon@2.5GHz core.
Packing the cores isn't the issue. At some point the cores can't all sit close to the singular system cache. At some point you need a ring or a mesh, and the latencies go up and/or get more irregular. And then what happens to Apple's design? Apple has so far simply avoided these issues: they have grown core count as fab nodes have gone down. What happens when they can't wait for a fab node shrink to double or quadruple core counts, and the caches, GPU, and memory controllers all get more spread out?
I don't think it will completely stump Apple, but I also think they have tip-toed around some issues by not chasing core counts and by pushing more optimization tasks onto software developers (e.g. smaller, limited-RAM working sets; fewer (than high-double-digit or triple-digit) threads; hand-optimizing low-level GPU code and cache loading to minimize bandwidth; etc.).
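For what it's worth, the quoted "40 A13 cores in a Xeon envelope" claim reduces to per-core power arithmetic. The 205W TDP and core counts come from the post above; splitting TDP evenly across cores is my simplification (a real TDP budget also covers uncore, I/O, and cache):

```python
# Per-core power budget if you naively divide package TDP by core count.
# Figures from the quoted post; the even split is a simplification.

XEON_TDP_W = 205.0
XEON_CORES = 28   # Xeon W-3275M
A13_CORES = 40    # hypothetical A13-core package from the quoted claim

xeon_w_per_core = XEON_TDP_W / XEON_CORES     # ~7.3 W per core
a13_budget_per_core = XEON_TDP_W / A13_CORES  # ~5.1 W per core

print(f"Xeon W-3275M:                  {xeon_w_per_core:.1f} W/core")
print(f"40 cores in the same envelope: {a13_budget_per_core:.1f} W/core")
```

So the claim is at least arithmetically plausible given the ~5W phone-core figure quoted earlier, but it says nothing about the interconnect, cache, and memory-controller questions raised above.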
Of course, making large chips like that is not trivial at all, and Apple probably still has a lot of work to do before they can deliver it. But this is where their efficiency can be turned into some serious power.
Doesn't talk to dGPUs. Doesn't do Thunderbolt speeds, or even 10GbE I/O. RAM capacities need to go one or two levels higher: Apple's systems are barely over the 32-bit, 4GB limit, and there are a couple more digits to go to reach the current Mac Pro's capacity range. The current A-series has an MMU, but is it seriously tasked with anything substantial? Apple has thrown some advanced security tasks at it, but capacity?