
crazy dave

macrumors 65816
Sep 9, 2010
1,453
1,229
Andrei's review is very clear that P/W on A15 is significantly better. There was one earlier review that claimed the opposite, but frankly, I think I will trust an established expert who has been doing in-depth reviews of Apple chips for years (and has the methodology).

I understand - in fact that’s exactly what I’m saying! I should’ve been clearer, but clearly something went wrong with that *other guy’s* (not Andrei’s) calculations. I’m trying to dig in and figure out what, but so far the only test in common is Aztec, and even there I can’t get them to line up sensibly, so probably no luck.
 
  • Like
Reactions: leman

crazy dave

macrumors 65816
Sep 9, 2010
1,453
1,229
One area where Andrei did ding Apple’s design was its thermals (not efficiency, but its ability to dissipate the heat it produces effectively) - in particular its PCB layout. However this led to an interesting post by a user in the comments which I thought I’d link to:


Basically they may have made a trade off as to where the heat goes vs total heat dissipation potential. However, Andrei points out that there are other thermal issues as well and Apple has had trouble in the past with the ratio between sustained and peak performance even if each is still much better than the competition.
 

cmaier

Suspended
Original poster
Jul 25, 2007
25,405
33,474
California
One area where Andrei did ding Apple’s design was its thermals (not efficiency, but its ability to dissipate the heat it produces effectively) - in particular its PCB layout. However this led to an interesting post by a user in the comments which I thought I’d link to:


Basically they may have made a trade off as to where the heat goes vs total heat dissipation potential. However, Andrei points out that there are other thermal issues as well and Apple has had trouble in the past with the ratio between sustained and peak performance even if each is still much better than the competition.

I think it just doesn’t matter. It’s a phone. They hit their speed goal, regardless of thermal throttling. Changing the package to maximize sustained performance may have had lots of other trade offs with little real world benefit.

And, of course, they may use completely different packaging in other devices where sustained performance matters more.
 
  • Like
Reactions: jdb8167

leman

macrumors Core
Oct 14, 2008
19,522
19,679
I think it just doesn’t matter. It’s a phone. They hit their speed goal, regardless of thermal throttling. Changing the package to maximize sustained performance may have had lots of other trade offs with little real world benefit.

And, of course, they may use completely different packaging in other devices where sustained performance matters more.

I agree. Who cares how much the mobile GPU throttles if it still performs miles better than the closest competition? We only discuss this because Apple chose to keep the GPU unrestricted. If they had capped the power consumption at 3-4 watts, everybody would only be mentioning how fast the GPU is compared to Android phones...

What's important for us is that even the smallest laptop has much more thermal headroom. So the fact that Apple GPU can scale should actually make us happy. Looking at these numbers, the 10-core M2 GPU promises to be at least 30-40% faster than M1 which would be tremendous.
 
  • Like
Reactions: jdb8167

cmaier

Suspended
Original poster
Jul 25, 2007
25,405
33,474
California
I agree. Who cares how much the mobile GPU throttles if it still performs miles better than the closest competition? We only discuss this because Apple chose to keep the GPU unrestricted. If they had capped the power consumption at 3-4 watts, everybody would only be mentioning how fast the GPU is compared to Android phones...

What's important for us is that even the smallest laptop has much more thermal headroom. So the fact that Apple GPU can scale should actually make us happy. Looking at these numbers, the 10-core M2 GPU promises to be at least 30-40% faster than M1 which would be tremendous.

Yep. Now deliver those damned machines, Apple. My 2016 MBP is limping along with half the keys not working right.
 

crazy dave

macrumors 65816
Sep 9, 2010
1,453
1,229
I think it just doesn’t matter. It’s a phone. They hit their speed goal, regardless of thermal throttling. Changing the package to maximize sustained performance may have had lots of other trade offs with little real world benefit.

And, of course, they may use completely different packaging in other devices where sustained performance matters more.

True and the idea of pumping heat to the display rather than the back of the phone seems like the right call to me as we generally only touch the display but hold the back and are more likely to notice an extra warm phone if the back is hot.
 

leman

macrumors Core
Oct 14, 2008
19,522
19,679
By the way, @cmaier, when I look at all these die shots I am always surprised how little space the core components actually occupy. What are all the unannotated areas for?

APL1W07_TMMU71_BPoly_floorplan_TechInsights.jpg
 

cmaier

Suspended
Original poster
Jul 25, 2007
25,405
33,474
California
By the way, @cmaier, when I look at all these die shots I am always surprised how little space the core components actually occupy. What are all the unannotated areas for?

View attachment 1856598

This one should make you more comfortable, then - the first chip I helped design professionally :)

As for the unmarked areas on the floor plan - no idea. Impossible to tell without seeing a lot more information. There are some regular-looking structures which suggest memory structures or crossbars or the like, but I couldn’t guess.

1633375031370.jpeg
 
  • Like
Reactions: Tagbert and leman

leman

macrumors Core
Oct 14, 2008
19,522
19,679
This one should make you more comfortable, then - the first chip I helped design professionally :)

As for the unmarked areas on the floor plan - no idea. Impossible to tell without seeing a lot more information. There are some regular-looking structures which suggest memory structures or crossbars or the like, but I couldn’t guess.

Nice! I've also been looking at some annotated dies of Tiger Lake and I was surprised how much space display controllers and other video-related stuff occupy...
 

cmaier

Suspended
Original poster
Jul 25, 2007
25,405
33,474
California
Nice! I've also been looking at some annotated dies of Tiger Lake and I was surprised how much space display controllers and other video-related stuff occupy...

I don’t know what Intel does, but often times some of those non-timing critical blocks are made using Synopsys, which wastes a lot of space. Hand-crafting is always much more space-efficient.
 
  • Love
Reactions: Macintosh IIcx

Erasmus

macrumors 68030
Jun 22, 2006
2,756
300
Australia
I wonder if Apple will, at some point, go to three tiers of cores. Would be interesting to profile some workloads and figure out if there’s even more efficiency to be gained somehow in the future.
Instead of introducing an 'M-Core', or some such mid-tier physical hardware solution, would a dynamic undervolt/underclock on the P-cores have a similar result?
 

cmaier

Suspended
Original poster
Jul 25, 2007
25,405
33,474
California
Instead of introducing an 'M-Core', or some such mid-tier physical hardware solution, would a dynamic undervolt/underclock on the P-cores have a similar result?
Not really, at least I surmise not. There is far more to be gained by differentiating microarchitecture. For example, a completely in-order core with no hardware multiplier, no reservation stations, etc. can have far fewer pipe stages and a lot denser hardware, which means the wires are shorter, the branch prediction miss penalty is much lower, etc., which brings much higher power efficiency for instruction streams that don’t need to worry about speed.
 

mr_roboto

macrumors 6502a
Sep 30, 2020
856
1,866
yep, the performance improvement was undersold, and the efficiency improvement is fantastic.

I wonder if Apple will, at some point, go to three tiers of cores. Would be interesting to profile some workloads and figure out if there’s even more efficiency to be gained somehow in the future.
The data in the "CPU ST Performance" page of that Anandtech article is fascinating. The S888's A78 mid core and A55 little core aren't too different (on average) if going by joules of energy consumed to run each SPECcpu 2017 benchmark. The A55 is much slower, of course, but it's a much lower power core too, and it seems that the area under the curve is about the same regardless of whether you're slow and cool on the A55 or medium fast and warm on the A78.

That suggests that if you have sufficiently fast A78 DVFS state transitions, you shouldn't really need anything but A78. The main reason you want the A55 is if you can't turn the A78 on and off fast enough to avoid burning too many wasted joules doing nothing.
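A rough sketch of that "area under the curve" reasoning, in Python. All numbers here are made up for illustration and are not taken from the Anandtech data; `task_energy_joules` and the `transition_energy_j` overhead term are hypothetical names for the sake of the example.

```python
# Hypothetical numbers for illustration only; none are measured values.

def task_energy_joules(active_power_w, runtime_s, transition_energy_j=0.0):
    """Energy for one task: active power times runtime, plus wake/sleep overhead."""
    return active_power_w * runtime_s + transition_energy_j

# A slow, low-power core and a faster, warmer core doing the same work.
little = task_energy_joules(active_power_w=0.5, runtime_s=8.0)
mid = task_energy_joules(active_power_w=2.0, runtime_s=2.0)

# The same mid core, but with slow DVFS transitions that burn energy
# around the task while no useful work is being done.
mid_slow_dvfs = task_energy_joules(2.0, 2.0, transition_energy_j=1.0)

print(little, mid, mid_slow_dvfs)
```

With these assumed values the two cores spend the same energy on the task itself, and the mid core only loses once the transition overhead is added, which is the case where a little core would start to pay off.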

That could be the reason Apple hasn't tried three tiers yet. It would make sense that they could have better/faster DVFS (and better whole-SoC DVFS integration) than a SoC like S888, where Qualcomm integrates Arm Holdings' core designs.

Also, Apple's A15 efficiency core has performance close to the A78, but power close to the A55, which is a killer combo.
 

crazy dave

macrumors 65816
Sep 9, 2010
1,453
1,229
The main reason you want the A55 is if you can't turn the A78 on and off fast enough to avoid burning too many wasted joules doing nothing.

The main reason most Android manufacturers use the A5x cores is die area. They are tiny and therefore cheap to include. So for those cores it is about Performance per Watt per Area. Apple doesn’t stress that in their designs.
 

Joelist

macrumors 6502
Jan 28, 2014
463
373
Illinois
Really comparing Apple's designs to ARM Holdings Cortex reference or to Snapdragons is a bit like Apples and Oranges. Apple not only has completely different microarchitectures but their entire SOC is designed along different lines, being far more specialized in the blocks included. Apple because it controls the entire hardware and software stack can and does create specially designed blocks in its SOC to deliver the specific experiences it is aiming at. In other words Apple Silicon is much less generic than the others.
 

cmaier

Suspended
Original poster
Jul 25, 2007
25,405
33,474
California
Really comparing Apple's designs to ARM Holdings Cortex reference or to Snapdragons is a bit like Apples and Oranges. Apple not only has completely different microarchitectures but their entire SOC is designed along different lines, being far more specialized in the blocks included. Apple because it controls the entire hardware and software stack can and does create specially designed blocks in its SOC to deliver the specific experiences it is aiming at. In other words Apple Silicon is much less generic than the others.

Tell me more. I want to learn.
 

crazy dave

macrumors 65816
Sep 9, 2010
1,453
1,229
Not really, at least I surmise not. There is far more to be gained by differentiating microarchitecture. For example, a completely in-order core with no hardware multiplier, no reservation stations, etc. can have far fewer pipe stages and a lot denser hardware, which means the wires are shorter, the branch prediction miss penalty is much lower, etc., which brings much higher power efficiency for instruction streams that don’t need to worry about speed.

In your three tier hypothetical, would you see Apple adding a smaller, more efficient E-core? (making the current E-core the new M-core) Or adding a new M-core in-between their current E- and P-cores? or a bigger, more hungry P-core? (making the current P-core the new M-core) The last one seems unlikely for a phone SOC (though I suppose if they went something like the X1 where there’s only one in the SOC, maybe not too bad …), so I’m guessing one of the two first ones? Are the potential gains worth it from a complexity/scheduling POV? I mean Android does it and Apple held off doing big-little until Apple finally did it, but is adding a third tier worth it?
 

EntropyQ3

macrumors 6502a
Mar 20, 2009
718
824
So the main message (according to Andrei's data, which I trust) is that Apple managed to improve efficiency significantly, despite increasing clock speeds on a very similar process technology to that of the A14.
What is a bit lacking is the analysis of exactly how that was achieved.
His hypothesis was that this is mostly achieved by reducing main memory bus traffic by doubling the size of the SLC.
But this doesn't sit too well with me and if true, would make me question the benchmarking suite used for the efficiency measurements (i.e. it would have a memory footprint that fell specifically in the expanded region) - which on the other hand doesn't really seem to be the case.
So I'm a bit stumped.

While reducing main memory traffic reduces power draw, increasing the amount of on-chip cache (and cache traffic) represents an increase in power draw, and any gain is only in the delta between the two, which certainly hasn't seemed to be on anywhere near this scale before or even necessarily a net gain. And it isn't as if a doubling of SLC at these sizes does a world of difference in hit rate either, typically, so the actual difference producing the delta can't be all that much to begin with.
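To make the "gain is only in the delta" point concrete, here is a minimal sketch in Python. The energy-per-access figures and hit rates are invented, in arbitrary units, and do not describe the A14 or A15; `memory_energy` is a hypothetical helper and the model (every access probes the SLC, misses additionally go to DRAM) is a simplifying assumption.

```python
# Arbitrary units and invented values; this is not a model of any real SoC.

def memory_energy(accesses, slc_hit_rate, slc_energy, dram_energy):
    """Total energy if every access probes the SLC and misses also go to DRAM."""
    misses = accesses * (1.0 - slc_hit_rate)
    return accesses * slc_energy + misses * dram_energy

accesses = 1000.0

# Baseline SLC: cheaper per access, lower hit rate.
small_slc = memory_energy(accesses, slc_hit_rate=0.5, slc_energy=1.0, dram_energy=16.0)

# Doubled SLC: modestly better hit rate, but each cache access costs more.
big_slc = memory_energy(accesses, slc_hit_rate=0.5625, slc_energy=1.25, dram_energy=16.0)

saving = (small_slc - big_slc) / small_slc
print(small_slc, big_slc, saving)
```

Under these assumptions the larger cache still comes out ahead, but only by a single-digit percentage, because the DRAM traffic saved is partly offset by the more expensive cache accesses.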

So to me it feels as if there has to be more to it. Again, I trust the basic data, it's the best around, and it is corroborated by system level battery life tests. But I have no decent hypothesis as to how the remarkable efficiency gains were achieved, much less a good idea how to probe such a hypothesis for validity.
 

leman

macrumors Core
Oct 14, 2008
19,522
19,679
I mean Android does it and Apple held off doing big-little until Apple finally did it, but is adding a third tier worth it?

I always had the impression that the only reason for the three core hierarchy in modern Android chips is so that they don't look like total losers in benchmarks compared to the iPhones. There are a lot of benchmark shenanigans going on. Like CPUs/GPUs that run on full power when running popular benchmarks but throttle down in real life. And didn't OnePlus actually disable the big core when running anything but benchmarks?
 
  • Like
Reactions: Basic75

leman

macrumors Core
Oct 14, 2008
19,522
19,679
So to me it feels as if there has to be more to it. Again, I trust the basic data, it's the best around, and it is corroborated by system level battery life tests. But I have no decent hypothesis as to how the remarkable efficiency gains were achieved, much less a good idea how to probe such a hypothesis for validity.

@cmaier mentioned above that A15 could have a better optimized layout, and of course, it is likely manufactured on an improved process. I think it's fairly clear that A15 is an "optimization step" of A14 in many ways.
 
  • Like
Reactions: crazy dave

crazy dave

macrumors 65816
Sep 9, 2010
1,453
1,229
I always had the impression that the only reason for the three core hierarchy in modern Android chips is so that they don't look like total losers in benchmarks compared to the iPhones. There are a lot of benchmark shenanigans going on. Like CPUs/GPUs that run on full power when running popular benchmarks but throttle down in real life. And didn't OnePlus actually disable the big core when running anything but benchmarks?

TBH 🤷‍♂️. I mean companies have - generally - stopped outright cheating because too many reviewers got wise and it’s too easy to get caught. Actually I think for one of them, maybe this was OnePlus, it was the opposite: it blacklisted a bunch of apps to the mid-cores, including ones popularly used for benchmarks, in order to save battery life. Which is kind of … I mean it’s not technically cheating if it does it while being benched … but wow the X1 in the 888 uses too much power …

I get the impression that the A55’s are so cheap to add die-wise it’s almost a “why not add them?” But I do agree with @mr_roboto that there’s not much point to them beyond the die savings compared to downclocked A7x cores.

Then again we used to have these discussions over whether big-little was worth it given how far ahead Apple was in core design. Eventually Apple decided it was as it allowed them to push those big cores even further and maintain efficiency. So Android’s cores were behind but the structure was right. I’m not sold that’s the case here admittedly so that’s why I asked @cmaier what he envisioned in his hypothetical and where the payoff would be.
 

Erasmus

macrumors 68030
Jun 22, 2006
2,756
300
Australia
In your three tier hypothetical, would you see Apple adding a smaller, more efficient E-core? (making the current E-core the new M-core) Or adding a new M-core in-between their current E- and P-cores? or a bigger, more hungry P-core? (making the current P-core the new M-core) The last one seems unlikely for a phone SOC (though I suppose if they went something like the X1 where there’s only one in the SOC, maybe not too bad …), so I’m guessing one of the two first ones? Are the potential gains worth it from a complexity/scheduling POV? I mean Android does it and Apple held off doing big-little until Apple finally did it, but is adding a third tier worth it?
Pretty sure he's talking about M-cores, which seem like they could be useful. However, as a non-electrical engineer, M-cores seem a bit conceptually boring to me. :p

What about U-cores, that are ultra-low power? For doing things like tasks during sleep (dealing with WiFi, pre-loading eMails, siphoning RAM to SSD before switch to hibernate, reporting 'Find My' locations while shut down, etc.)

Although perhaps at that point, it's better for Apple to start throwing dedicated modules at those issues.
 

leman

macrumors Core
Oct 14, 2008
19,522
19,679
Pretty sure he's talking about M-cores, which seem like they could be useful. However, as a non-electrical engineer, M-cores seem a bit conceptually boring to me. :p

What about U-cores, that are ultra-low power? For doing things like tasks during sleep (dealing with WiFi, pre-loading eMails, siphoning RAM to SSD before switch to hibernate, reporting 'Find My' locations while shut down, etc.)

Although perhaps at that point, it's better for Apple to start throwing dedicated modules at those issues.

I think it is important to make a distinction between power and energy. For the tasks you describe you don’t really need a core that draws the least amount of power, you need a core that can do the job with the least amount of energy spent so that you can be done quickly and send the circuitry into deep sleep. Don’t forget that the CPU is just one part of the puzzle, you also have RAM, caches, data interconnects - all of which use power. So the more time you spend with those powered down the better.

And looking at Andrei’s review, the efficiency of A15 E-cores is simply astonishing. Yes, they draw more power than the A55, but being faster allows them to finish the work using 2-4x less energy than the A55 while also taking less time.
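A small Python sketch of the power vs energy distinction, including the rest of the system staying awake while the core works. The wattages and runtimes are hypothetical placeholders, not figures from Andrei's review, and `total_energy_joules` is just an illustrative helper.

```python
# Hypothetical values chosen for easy arithmetic; not measurements.

def total_energy_joules(core_power_w, uncore_power_w, runtime_s):
    """Energy until the task is done and everything can go back to deep sleep.

    uncore_power_w stands in for RAM, caches and interconnect, which stay
    powered for as long as the core is working.
    """
    return (core_power_w + uncore_power_w) * runtime_s

# A very low power core that takes a long time to finish.
slow_core = total_energy_joules(core_power_w=0.25, uncore_power_w=0.5, runtime_s=8.0)

# A core drawing four times the power that finishes four times sooner.
fast_core = total_energy_joules(core_power_w=1.0, uncore_power_w=0.5, runtime_s=2.0)

print(slow_core, fast_core)
```

In this toy case the core-only energy is identical for both (2 J), yet the faster core uses half the total energy because the rest of the system is awake for a quarter of the time.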

By the way, all this makes the rumors of only two E-cores in the prosumer silicon very credible. I mean, two E-cores would offer performance comparable to that of a 2018 MacBook Air! That’s more than enough for any background work you might need to do, with less than a watt of active power draw.
 

crazy dave

macrumors 65816
Sep 9, 2010
1,453
1,229
Pretty sure he's talking about M-cores, which seem like they could be useful. However, as a non-electrical engineer, M-cores seem a bit conceptually boring to me. :p

What about U-cores, that are ultra-low power? For doing things like tasks during sleep (dealing with WiFi, pre-loading eMails, siphoning RAM to SSD before switch to hibernate, reporting 'Find My' locations while shut down, etc.)

Although perhaps at that point, it's better for Apple to start throwing dedicated modules at those issues.

The issues as I see them are as follows: a dedicated M-core would indeed allow the P-core and E-core to be even more specialized microarchitecturally for speed and efficiency. This is true. BUT, worth it? Hard to see how. The P-core uarch is already good enough to be a world-beating desktop-class chip and could, in theory, be clocked higher, draw more power, gain performance, and still be more efficient than anything from AMD and Intel. Going further on a phone would be expensive for power and area and ultimately maybe not even better. Focusing on keeping the P-core both efficient and powerful is what makes the current design so good.

On the other end of the spectrum, the U-core you describe would likely be something in-order, which, yes it draws fewer watts and takes up next to no die space, but is ... well ... crap. It's even crap ultimately on battery life because the performance is so poor relative to something like Apple's current E-cores. So yes, dedicated modules. ;) Thus it isn't clear to me what further specializing the E and P-cores gets you. (and I get the notification that @leman already basically typed all of this faster than I did :))

In contrast, going at least big-little makes sense to me as bifurcating the uarch design once does give benefits ... but adding a third level ... hmmm ... I don't see it, but I've never been a CPU designer. So you know ... me not seeing it is maybe not that meaningful!

EDIT: And as aforementioned, adding another layer adds complexity to scheduling in terms of which threads go where.
 