
deconstruct60

macrumors G5
Mar 10, 2009
12,493
4,053
By the way, @cmaier, when I look at all these die shots I am always surprised how little space the core components actually occupy. What are all the unannotated areas for?

[attachment: die shot]

Do you folks just skip over the content of the anandtech articles and jump right to the benchmarking charts?


[screenshot: die-shot annotations from the AnandTech article]


Five things are listed on the right hand side, and all anyone wants to gush about is one of them. (The system cache is far easier to eyeball since it is large and regular.)

A hardware ProRes 422 de/encoder. Are they really going to be able to get 100% reuse out of the H.265 de/encoder?
If you take a narrow subset of the Afterburner FPGA and turn it into fixed-function, application-specific circuits with their own intermediate data workspace storage, that would consume space.


A better camera processor ... probably doesn't come for "free". The image "processor" either time-slices through 3-4 cameras or has different pipelines for each one (to meet "real time" processing constraints).


The application-specific circuits aren't necessarily the same across different implementors.
 

deconstruct60

macrumors G5
Mar 10, 2009
12,493
4,053
And looking at Andrei's review, the efficiency of A15 E-cores is simply astonishing. Yes, they draw more power than the A55, but being faster allows them to finish the work using 2-4x less energy than the A55 while also taking less time.

However, it is somewhat indicative of why the Apple Watch processor is "stuck" still using A13 E-core derivatives (still the S6, which is a clock-bumped S5).




By the way, all this makes the rumors of only two E-cores in the prosumer silicon very credible. I mean, two E-cores would offer performance comparable to that of a 2018 MacBook Air! That's more than enough for any background work you might need to do, with less than a watt of active power draw.

Where does that come from? There is an 11980HK on the graph in the article (6.67). There is no Core i5-8210Y directly on the graph, but if the Amber Lake Y processor were just 40% of the speed of the 11980HK, that would be a 2.7 score... which would be higher than the A15 E-core score (close, but higher).

[chart: SPECint energy efficiency, from the AnandTech article]



It is a rough ballpark (I can't find a SPEC2017 score), but the 11980HK Geekbench 5 score is 1114 and the 8210Y is 660, which would be 59%. Even if you round that down to 50%, that would be an estimated score of around 3.3. That would put the 8210Y in the A78's range, not the A77's.

The 8210Y does drop off more substantively if you go multicore, more so into the 45-47% range (and hence closer: an estimated score around 3.00-3.13). That is more "hand grenade" than "horseshoes" close.
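If anyone wants to follow the arithmetic, here it is as a quick Python sketch. (The GB5 figures and the 6.67 chart score are just the numbers quoted above; this is rough cross-benchmark scaling, not a real SPEC run.)

# Ballpark the i5-8210Y's SPECint2017 score by scaling the 11980HK's
# chart score with the Geekbench 5 ratio between the two chips.
gb5_11980hk = 1114       # GB5 figure quoted above
gb5_8210y = 660
spec_11980hk = 6.67      # SPECint2017 estimate from the AnandTech chart

ratio = gb5_8210y / gb5_11980hk                               # ~0.59
print(f"GB5 ratio: {ratio:.0%}")
print(f"Estimate at that ratio: {spec_11980hk * ratio:.2f}")  # ~3.95
print(f"Rounded down to 50%: {spec_11980hk * 0.50:.2f}")      # ~3.33
print(f"Multicore-ish 45-47%: {spec_11980hk * 0.45:.2f}-{spec_11980hk * 0.47:.2f}")  # ~3.00-3.13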


However, even with a 2.5 score, just having 2 E-cores and perhaps one half-time P-core would be enough to run a MBP in a coffee shop, writing emails and casually listening to podcasts or streaming some video (like a 2018 MBA could).
Apple didn't need an E-core that did "twice as much work" for the MBP 16". There are lots of power savings in just dumping the dGPU and VRAM. Even if the new MBP 14 and MBP 16 come in at the same battery life as the current models, it would probably still be an overall system performance win. [ Apple's super-low-bar battery test of just watching video... that doesn't need 4 E-cores to work. The vast bulk of the work is being done by the fixed-function video decoder; 2 E-cores are plenty to keep that going. If the user is doing next to nothing (just gawking at the streaming video), you don't particularly need a P-core active much at all. 2 E-cores won't hurt that battery metric at all.
The bigger factor in measured run time is when more of the P-cores are lit up with a heavy workload. ]


Two E-cores, more so because they are probably going to run out of transistor budget (GPU cores being the bigger hogs, with the system level cache behind them in hogging more space). The 14" MBP probably takes the "chop" die, and its GPU doesn't bleed as much power; but the MBP 16" is about getting performance, not trying to exactly match the MBA run time.
 

leman

macrumors Core
Oct 14, 2008
19,521
19,679
Where does that come from? There is an 11980HK on the graph in the article (6.67). There is no Core i5-8210Y directly on the graph, but if the Amber Lake Y processor were just 40% of the speed of the 11980HK, that would be a 2.7 score... which would be higher than the A15 E-core score (close, but higher).

It comes from me misinterpreting some data and not reading things carefully :)

Anyway, from the SPEC results the E-cores offer around 30% of the P-core performance, so their Geekbench 5 single-core score should be in the ballpark of 550-580 (and a 2-cluster total around 1000). That would put it around 20-30% slower than the 8210Y.
 

deconstruct60

macrumors G5
Mar 10, 2009
12,493
4,053
If you could get such improvements from doubling the SLC, everyone would do it, particularly since everyone other than Apple would have to invest far fewer transistors. Andrei points out that Qualcomm's 888 only has 3MB of SLC and the Exynos 2100 seems to have 6MB (so he is chewing on these issues); it would be a total no-brainer. But this isn't the type of improvement you typically see from a doubled cache size.

Do the 888 and 2100 really have that kind of copious extra transistor budget? Those SoCs have embedded 5G radio baseband units in them; Apple's has none (Apple has some other stuff instead: a bigger camera pipeline, a bigger NPU, AMX). There is also a cap on just how "custom" you can take the A78/A77/X1 cores and the bandwidth of the on-die interconnect. Even Arm's X1 only maxes out at 8MB of L3. In part, that is to save space on die designs that are also trying to squeeze in radios while keeping the SoC die price down (area optimization is a priority; that is why you only get one X1 and three tiers of core sizes). The X2 can scale up to 16MB of L3... but are any of these phone SoCs going to use more than one X2 core? The laptop SoCs have a better chance of doubling up in their implementations if given substantively larger die size allocations.


Additionally, Andrei also points out that the A15 sandwiched up against the 5G modem is a thermal coupling. Well, putting the 5G modem on the same die is an even bigger thermal coupling. Personally, I'll be surprised if Apple puts their modem on the die. A multi-chip module in the same package? No surprise. The same die? Yes, that would surprise me. In part, that's because Apple "super sizes" things like cache and uncore fixed-function, application-specific logic. Multiple chips can help spread the thermal load out a bit.

There is an upside and a downside to having a discrete modem chip. Apple has to handle the power overhead of talking to a discrete component. However, they also get the upside of being substantively less transistor-budget constrained, because the modem is off-chip (they can throw "more" at the CPU, GPU, and system caches). [ Also, development cycles are less coupled, so project management/coordination is easier. ]



P.S. the "new" X2 L3 foundation.

 
Last edited:

cmaier

Suspended
Original poster
Jul 25, 2007
25,405
33,474
California
Do the 888 and 2100 really have that kind of copious extra transistor budget? Those SoCs have embedded 5G radio baseband units in them; Apple's has none (Apple has some other stuff instead: a bigger camera pipeline, a bigger NPU, AMX). There is also a cap on just how "custom" you can take the A78/A77/X1 cores and the bandwidth of the on-die interconnect. Even Arm's X1 only maxes out at 8MB of L3. In part, that is to save space on die designs that are also trying to squeeze in radios while keeping the SoC die price down (area optimization is a priority; that is why you only get one X1 and three tiers of core sizes). The X2 can scale up to 16MB of L3... but are any of these phone SoCs going to use more than one X2 core? The laptop SoCs have a better chance of doubling up in their implementations if given substantively larger die size allocations.


Additionally, Andrei also points out that the A15 sandwiched up against the 5G modem is a thermal coupling. Well, putting the 5G modem on the same die is an even bigger thermal coupling. Personally, I'll be surprised if Apple puts their modem on the die. A multi-chip module in the same package? No surprise. The same die? Yes, that would surprise me. In part, that's because Apple "super sizes" things like cache and uncore fixed-function, application-specific logic. Multiple chips can help spread the thermal load out a bit.

There is an upside and a downside to having a discrete modem chip. Apple has to handle the power overhead of talking to a discrete component. However, they also get the upside of being substantively less transistor-budget constrained, because the modem is off-chip (they can throw "more" at the CPU, GPU, and system caches). [ Also, development cycles are less coupled, so project management/coordination is easier. ]



P.S. the "new" X2 L3 foundation.


There is little benefit to putting the radio on the same chip as the CPUs. First, other than cost, there is no reason to do so. The communications between the CPU and the modem are not very high bandwidth and don’t require very low latency. Second, when you tune a fab process for digital logic like CPUs, it is seldom optimal for analog devices like radios. So you’d ideally use a different fab process for each.
 

JMacHack

Suspended
Mar 16, 2017
1,965
2,424
The vast majority of conditional branches are loops:

x = 0
A: x = x + 1

if x < 100 go to A


So most of the time, if the branch is backwards, it is a taken branch (because the loop counter hasn’t hit its final value). That fun fact actually gets you pretty far.
Okay I’m lost, what the heck is a backwards branch?
 

smulji

macrumors 68030
Feb 21, 2011
2,997
2,889
It doesn’t cost a billion to tape out a 7nm chip. Not sure what is supposed to be included in that billion dollar figure, but, just, nope.

An inefficient team designing one of these, and taking a year to do it, might have, say, 300 people on it. If each one gets $600,000 in salary and benefits and overhead costs (they don’t), that’s only $180M. In reality, a variation of an existing SoC takes a lot fewer than 300 people a lot less than a year. (And the cost of each employee, on average, is a lot less than $600 grand)
$600,000 a year? Damn!
 

smulji

macrumors 68030
Feb 21, 2011
2,997
2,889
No need. Apple already has a “developed” game library, and that’s not going away in a couple years. If the new SoCs are as good as the expectations, AAA titles will follow. I’d hope that Apple would help game studios along, but who knows.
Problem is that Metal is the only GPU API supported on Apple Silicon. So far, developer support for it is nowhere near DirectX, CUDA, or Vulkan.
 

leman

macrumors Core
Oct 14, 2008
19,521
19,679
Problem is that Metal is the only GPU API supported on Apple Silicon. So far, developer support for it is nowhere near DirectX, CUDA, or Vulkan.

I don’t see this as a practical hurdle. Most games don’t use these GPU APIs directly anyway, and middleware engines nowadays have good Metal support. Games using Vulkan can leverage MoltenVK, and the performance is very good most of the time. The important pro apps already use Metal.

As (and if) the situation develops and interest in these things increases, there will undoubtedly be more attention to Metal and Apple GPUs, and devs will start using Apple‘s unique features more often. There is a lot to gain, both for the dev and for the customer.
 

mr_roboto

macrumors 6502a
Sep 30, 2020
856
1,866
In Srouji’s role?
Let's be clear what that role actually is: it's almost certainly something like "a manager whose direct reports are managers whose direct reports are project managers". I don't know how deep Apple's management structure is, but an SVP who reports to Tim Cook shouldn't be too involved in anything but the highest level technical decisions.
 

crazy dave

macrumors 65816
Sep 9, 2010
1,453
1,229
I don’t know who Golden Reviewer is, but apparently he gets efficiency numbers closer to those of the original Chinese benchmarker - in that the A15 is less efficient than the A14 - and claims this is similar to other testers.


Andrei responds here:


Discussion continues afterwards.

This is the first time I can think of that there’s a major disagreement over efficiency estimates, which is interesting considering how hard it is to do correctly. Personally I think Andrei is probably right… but it would still be good to know what is going on.
 

nquinn

macrumors 6502a
Jun 25, 2020
829
621
[ Apple's super-low-bar battery test of just watching video... that doesn't need 4 E-cores to work. The vast bulk of the work is being done by the fixed-function video decoder; 2 E-cores are plenty to keep that going. If the user is doing next to nothing (just gawking at the streaming video), you don't particularly need a P-core active much at all. 2 E-cores won't hurt that battery metric at all.
The bigger factor in measured run time is when more of the P-cores are lit up with a heavy workload. ]
Except for AV1 video which they decided not to support with hardware decoding =/
 

deconstruct60

macrumors G5
Mar 10, 2009
12,493
4,053
... The vast bulk of the work is being done by the fixed function video decoder. ...
Except for AV1 video which they decided not to support with hardware decoding =/

ProRes 422 en/decoding focuses on content creation more so than content consumption (ProRes is a horrible content delivery format). Apple "upselling" more folks into buying an iPhone 13 Pro (or Pro Max) to create content delivers more money to Apple's profit coffers.

I don't think Apple TV+ uses AV1. Netflix and YouTube, who are the primary folks "pushing" AV1, are on the "frenemies" list at this point since Apple is competing with them. Apple will probably "drag their feet" on that codec for another iteration or two. (There was no process shrink on this update, so the A15 is substantively bigger; in part, Apple is paying "extra" for the bigger die to get ProRes 422 out on this iteration. When there is a process shrink and Apple can add AV1 while keeping the die size the same (or smaller), that becomes more of a "free" buy-in for Apple. I would expect them to do it when it is "almost free", which may be N4 or N3. N3 would be "more free/lower cost", so they could wait that long.)

But yes ... it's an open, royalty-free codec; what is taking Apple so long to get on the bandwagon? (Yet again turned into an opportunity for Apple to make more money itself.)
 

Gnattu

macrumors 65816
Sep 18, 2020
1,107
1,672
But yes ... it's an open, royalty-free codec; what is taking Apple so long to get on the bandwagon? (Yet again turned into an opportunity for Apple to make more money itself.)
Netflix and YouTube, who are the primary folks "pushing" AV1, are on the "frenemies" list at this point since Apple is competing with them.

The most interesting part is that Apple is a founding member of AOMedia, the organization that made AV1 happen. Apple is standing behind AV1, not fighting against it.
 

leman

macrumors Core
Oct 14, 2008
19,521
19,679
The most interesting part is that Apple is a founding member of AOMedia, the organization that made AV1 happen. Apple is standing behind AV1, not fighting against it.

Do they though? It really seems like they are pushing their HEVC instead. I mean, they even adopted HEIF as their default image format. I don't think that the codec wars are anywhere close to an end. At any rate, this seems to be a purely political issue...
 

Gnattu

macrumors 65816
Sep 18, 2020
1,107
1,672
Do they though? It really seems like they are pushing their HEVC instead. I mean, they even adopted HEIF as their default image format. I don't think that the codec wars are anywhere close to an end. At any rate, this seems to be a purely political issue...
They joined AOMedia the same year they announced HEVC support. This is important because Apple owns a few MPEG patents; by joining AOMedia, AV1 can use those patents for free, which can be critical for its success.

An alternate codec is a political issue, but that does not prevent Apple from implementing the hardware decoder and giving some developers the ability to use an alternate codec. Google, for example, was given an entitlement to use VP9, and the hardware support was present well before iOS could use VP9: iOS support for VP9 started with iOS 14, but hardware released before iOS 14 can still get hardware VP9 support. Developers have to be given a special entitlement to use codecs other than HEVC.
 

crazy dave

macrumors 65816
Sep 9, 2010
1,453
1,229
I don’t know who Golden Reviewer is, but apparently he gets efficiency numbers closer to those of the original Chinese benchmarker - in that the A15 is less efficient than the A14 - and claims this is similar to other testers.


Andrei responds here:


Discussion continues afterwards.

This is the first time I can think of that there’s a major disagreement over efficiency estimates, which is interesting considering how hard it is to do correctly. Personally I think Andrei is probably right… but it would still be good to know what is going on.

Andrei goes over why some in the community are getting the power measurements wrong, as well as the limitations of his own methodology:


A very interesting read! He basically agrees that certain power profiles can be a problem for him, where it is obvious the iPhone is drawing from the battery as well as USB, *but* the larger issue for others is that they are not measuring power with fine enough sampling, so they are missing huge swings in power usage between their samples and getting incorrect averages as a result.
 

Krevnik

macrumors 601
Sep 8, 2003
4,101
1,312
Do they though? It really seems like they are pushing their HEVC instead. I mean, they even adopted HEIF as their default image format. I don't think that the codec wars are anywhere close to an end. At any rate, this seems to be a purely political issue...

Apple does have a (not fully public) VP9 hardware decode block in the A14/M1 that Google has been given access to.

AV1 is relatively new to the scene, trying to replace VP9, and its main benefit over HEVC/H.265 is the lack of royalties plus some smaller efficiency gains (nothing like the H.263 -> H.264 or the H.264 -> H.265 jumps). On one hand, I suspect Apple is interested in not having to pay as much in royalties in the long term so long as they also have a seat at the table, but they also are not in a huge rush to move over. They have a lot of devices out in the world that support H.264 or H.265, and it’d be a long process to migrate things without leaning on software decode to achieve it.
 

Andropov

macrumors 6502a
May 3, 2012
746
990
Spain
the larger issue for others is that they are not measuring power with fine enough sampling, so they are missing huge swings in power usage between their samples and getting incorrect averages as a result.
I'm not that convinced by what he's saying. He cites the Nyquist theorem as an explanation of why using sampling frequencies well below the Nyquist threshold would result in a wrong average, but that's not true. Frequencies below the Nyquist threshold prevent you from accurately recreating the waveform. But the average power? That's not really a problem, at least not to an extent that would explain the huge differences between the two reviewers. And either way, if it really were a sampling frequency issue, whether it ends up measuring higher or lower than the real power consumption would be a coin toss. If the other reviewer is consistently getting higher power usage figures... something else is going on.

That said, I think Andrei's figures must be the correct ones. I just have no idea why/how the other reviewer is getting theirs wrong.

EDIT: Well, it looks like the other reviewer is not just sampling at sub-Nyquist frequencies, he's measuring every few seconds, so... but it is still weird that he consistently gets higher results.
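For what it's worth, here is a toy simulation of that point (purely illustrative: the burst pattern, power levels, and sampling period are made up, not taken from either reviewer's setup). Sparse sampling of a bursty power trace gives noisy per-run averages, but the estimates are not systematically high or low:

import random

# Toy power trace: a 2 W baseline with 8 W bursts ~20% of the time,
# generated at 1 kHz resolution for 60 seconds.
random.seed(0)
trace = [2.0 + (8.0 if random.random() < 0.2 else 0.0) for _ in range(60_000)]
true_avg = sum(trace) / len(trace)

# Sample every 3 s (far below Nyquist for the bursts) with a random
# starting phase, repeated many times to see how the estimates behave.
period = 3_000
estimates = []
for _ in range(1_000):
    start = random.randrange(period)
    samples = trace[start::period]
    estimates.append(sum(samples) / len(samples))

print(f"true average:             {true_avg:.2f} W")
print(f"mean of sparse estimates: {sum(estimates) / len(estimates):.2f} W")
print(f"range of single runs:     {min(estimates):.2f} to {max(estimates):.2f} W")

A single coarse run can land noticeably high or low, but across runs there is no bias in one direction, which is why a reviewer who is consistently higher probably has something else going on.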
 
Last edited:

crazy dave

macrumors 65816
Sep 9, 2010
1,453
1,229
I'm not that convinced by what he's saying. He cites the Nyquist theorem as an explanation of why using sampling frequencies well below the Nyquist threshold would result in a wrong average, but that's not true. Frequencies below the Nyquist threshold prevent you from accurately recreating the waveform. But the average power? That's not really a problem, at least not to an extent that would explain the huge differences between the two reviewers. And either way, if it really were a sampling frequency issue, whether it ends up measuring higher or lower than the real power consumption would be a coin toss. If the other reviewer is consistently getting higher power usage figures... something else is going on.

That said, I think Andrei's figures must be the correct ones. I just have no idea why/how the other reviewer is getting theirs wrong.

EDIT: Well, it looks like the other reviewer is not just sampling at sub-Nyquist frequencies, he's measuring every few seconds, so... but it is still weird that he consistently gets higher results.

Agreed, it should result in noisier data but not necessarily biased data unless there is something else going on too.
 