Power curve and efficiency of Apple N3

leman · Oct 2, 2023

Confused-User said:
Especially in the wake of Apple's weekend announcement, the chance that there is some bug that's lifting the expected limit of performance and heat is vanishingly small.

In my experience after a week with my new iPhone, there seems to be some fairly random background power draw issue. E.g., a few days per week the battery would drop by over 20% overnight. I installed the 17.1 beta yesterday and tonight the battery dropped by 6%. Could be a fluke either way, but it is possible that there is/was a software bug resulting in subpar battery life and high power usage for some people.

But 17.1 didn't seem to change anything about the peak frequency or power consumption of A17 Pro. I'm getting the same numbers.

theorist9 · Oct 2, 2023

leman said:
Thanks for trying to replicate the results, @theorist9!

Peer review.

leman said:
The data I have shown before only shows the P-core (and I filter out any samples where P-core usage is very low to avoid sampling artefacts). Are you perchance aggregating the cycles/time across P- and E-cores? Because that would obviously result in a lower estimated frequency.

I showed two plots in my post, one where I plotted the P-core data only, and another where I plotted both. I'm not sure what you mean by aggregating cycles/times across P- and E-cores, but when I plotted both I simply included the {frequency, watt} data pairs for the E-cores.

But since you explained your plot was P-core data only, let's focus on that. Considering just the P-core data, there appear to be a lot of frequencies missing from your two A17 data files.

If I take all the P-core data and sort by frequency, I get a gap between 1.344 GHz and 3.686 GHz, which you can see in the screenshot below. By contrast, your plot clearly shows you do have data in that frequency range; specifically, your plot seems to show a lot of data from ≈2.7 GHz to 3.3 GHz.

So am I missing something, or is that data in fact not present in your two A17 files? If the latter, could you please update them with the missing data so I could replicate your plot?

Again, I'm referring to these two specific files; these are the only ones I could find that contain A17 single-core data.

However, your R code doesn't seem to be using them. Instead, it references "a17_data". So I'm guessing what I need is that file.

Xiao_Xi · Oct 2, 2023

leman said:
A17 Pro is clocked exactly so that at peak frequency the power consumption is very close to 5 watts.

Are you sure about that? How do you explain this?

leman · Oct 2, 2023

theorist9 said:
So am I missing something, or is that data in fact not present in your two A17 files? If the latter, could you please update them with the missing data so I could replicate your plot?

All the data I used is in the git repository. There are four files. Are you also using multi-core results? They have lower frequencies, so maybe that's the band you are missing?

leman · Oct 2, 2023

Xiao_Xi said:
Are you sure about that? How do you explain this?

They clearly don't get the peak power usage on that particular subtest. For peak power see my data.

theorist9 · Oct 2, 2023

leman said:
All the data I used is in the git repository. There are four files. Are you also using multi-core results? They have lower frequencies, so maybe that's the band you are missing?

Ah, that must be it—I was using the single-core files only, since your plot was power vs. frequency for a single thread. But in looking at the MC results, I can see they are showing individual per-core power consumption and frequency, and thus their data can be used as well!

It's bedtime, but I'll be sure to add that data and report back (hopefully tomorrow)... I thought it would be fun to try to find the simplest possible model that reasonably fits your data.

leman · Oct 2, 2023

theorist9 said:
Ah, that must be it—I was using the single core files only, since your plot was power vs. frequency for a single thread. But in looking at the MC results, I can see they are showing individual per-core power consumption and frequency, and thus their data can be used as well!

Yes, I could have made this a bit clearer. Sampling is always done per thread; it's just that when running a multi-core workload you have more threads per iteration. I tried to collect the data at the most basic level possible, which is why I am directly reporting the CPU counters instead of aggregated values.
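To illustrate why aggregating across core types would lower the estimate: the effective clock of a sample is just cycles divided by elapsed time, so summing cycles and time over P- and E-core samples gives a time-weighted blend of the two clocks. A minimal Python sketch; the sample layout (`core`, `cycles`, `seconds` fields) and the 3.7 GHz / 2.1 GHz clocks are illustrative assumptions, not values from the actual data files.

```python
# Hypothetical per-thread samples (illustrative values, not real measurements):
# core type, cycle count read from the CPU counters, and active time in seconds.
samples = [
    {"core": "P", "cycles": 3.7e9 * 0.010, "seconds": 0.010},
    {"core": "P", "cycles": 3.7e9 * 0.012, "seconds": 0.012},
    {"core": "E", "cycles": 2.1e9 * 0.010, "seconds": 0.010},
]


def frequency_ghz(cycles, seconds):
    """Effective clock in GHz: cycles divided by elapsed active time."""
    return cycles / seconds / 1e9


# Per-sample estimates keep the P-core and E-core clocks separate.
per_sample = [(s["core"], frequency_ghz(s["cycles"], s["seconds"])) for s in samples]

# Pooling cycles and time across both core types yields a time-weighted
# blend of the two clocks, which sits below the true P-core frequency.
pooled = frequency_ghz(
    sum(s["cycles"] for s in samples),
    sum(s["seconds"] for s in samples),
)
```

Here each P-core sample comes out at 3.7 GHz, while the pooled figure is 3.2 GHz.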

theorist9 said:
It's bedtime, but I'll be sure to add that data and report back....I thought it would be fun to try to find the simplest possible model that reasonably fits your data.

Looking forward to your results!

theorist9 · Oct 2, 2023

Decided to stay up just a few more minutes...
EDIT: It looks like I now have your full dataset (694 points)!

leman · Oct 2, 2023

theorist9 said:
Decided to stay up just a few more minutes...
EDIT: It looks like I now have your full dataset (694 points)!

I started replying to the first version of your post, but then I saw you had found them.

Yes, there are 700 entries for A17 Pro, six of which have less than 10% P-core time and are therefore discarded on the plots.
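The 10% filter can be sketched like this in Python. The field names `p_core_time` and `total_time` are hypothetical; the actual column names in the data files are not shown in this thread.

```python
MIN_P_CORE_SHARE = 0.10  # samples below 10% P-core time are discarded


def keep_p_core_samples(samples, min_share=MIN_P_CORE_SHARE):
    """Keep samples whose thread spent at least min_share of its time on a P-core.

    Very low P-core time leads to sampling artefacts in the per-sample
    frequency and power estimates, so those samples are dropped before plotting.
    """
    return [s for s in samples if s["p_core_time"] / s["total_time"] >= min_share]


# Illustrative records only (not from the dataset).
example = [
    {"p_core_time": 0.95, "total_time": 1.0},
    {"p_core_time": 0.40, "total_time": 1.0},
    {"p_core_time": 0.05, "total_time": 1.0},
]
kept = keep_p_core_samples(example)
```

Applied to 700 records with six below the threshold, a filter like this leaves the 694 points used in the plots.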

Xiao_Xi · Oct 2, 2023

leman said:
They clearly don't get the peak power usage on that particular subtest. For peak power see my data.

What would make one benchmark take A17 to 5W and another to 4W? Has any other benchmark pushed A17 to 5W like yours?

By the way, Geekerwan used this benchmark to make those graphs.

541.leela_r (www.spec.org)

Confused-User · Oct 2, 2023

leman said:
In my experience after a week with my new iPhone, there seems to be some fairly random background power draw issue. E.g., a few days per week the battery would drop by over 20% overnight. I installed the 17.1 beta yesterday and tonight the battery dropped by 6%. Could be a fluke either way, but it is possible that there is/was a software bug resulting in subpar battery life and high power usage for some people.

This agrees with the point I was making: That there are bugs/issues that are causing excess power draw, but there's no bug that's causing it to run faster than Apple intended.

But 17.1 didn't seem to change anything about the peak frequency or power consumption of A17 Pro. I'm getting the same numbers.

Great! If the fixes are already in, then they didn't affect performance (supporting my point).

leman · Oct 2, 2023

Xiao_Xi said:
What would make one benchmark take A17 to 5W and another to 4W?

Apple Silicon features dynamic power gating and can turn off parts of the CPU that are not needed. I am running my tests in debug mode to ensure that the code touches enough elements so that the power consumption is maximised. When I compile my tests in release mode, the power consumption drops to 3 watts.

Xiao_Xi said:
Has any other benchmark pushed A17 to 5W like yours?

I'm sure some of the subtests of SPEC2017 would, but Geekerwan did not publish the data.

quarkysg · Oct 2, 2023

leman said:
Apple Silicon features dynamic power gating and can turn off parts of the CPU that are not needed. I am running my tests in debug mode to ensure that the code touches enough elements so that the power consumption is maximised. When I compile my tests in release mode, the power consumption drops to 3 watts.

I'm sure some of the subtests of SPEC2017 would, but Geekerwan did not publish the data.

Looks like Geekerwan rushed out the “review” to be one of the first few. Probably should not quote them too much. Doesn’t look like they really understand how to correctly produce accurate analysis when benchmarking CPUs. Interesting graphs from them nonetheless.

leman · Oct 2, 2023

quarkysg said:
Looks like Geekerwan rushed out the “review” to be one of the first few. Probably should not quote them too much. Doesn’t look like they really understand how to correctly produce accurate analysis when benchmarking CPUs. Interesting graphs from them nonetheless.

I don’t think they lack understanding or skills; it’s just that they are more interested in publishing a popular video to make money than in doing an in-depth architecture analysis. There are multiple issues I have with their video. They don’t disclose the methodology, they don’t publish the data (and the tool they use produces fine-grained samples of performance counters), and they seem to do some weird shenanigans with their cooler that result in higher-than-usual power consumption.

Xiao_Xi · Oct 2, 2023

quarkysg said:
Doesn’t look like they really understand how to correctly produce accurate analysis when benchmarking CPUs.

leman said:
There are multiple issues I have with their video. They don’t disclose the methodology, they don’t publish the data (and the tool they use produces fine-grained samples of performance counters), and they seem to do some weird shenanigans with their cooler that result in higher-than-usual power consumption.

Did you have similar reservations with Anandtech reviews?

leman · Oct 2, 2023

Xiao_Xi said:
Did you have similar reservations with Anandtech reviews?

You mean the ones done by Andrei? He was always fairly open about his methodology, and I was always able to get more info out of him if I had questions, so no, I can't say that I have.

Analog Kid · Oct 2, 2023

quarkysg said:
Doesn’t look like they really understand how to correctly produce accurate analysis when benchmarking CPUs. Interesting graphs from them nonetheless.

This is the problem... People get excited by interesting graphs even if the data they're built on is meaningless or wildly misinterpreted. I'm fed up with the "Hey guys, I did this weird thing and got weird results!" model that YouTube relies on. A good visual is far more insightful than a table of data, but a good visual can give the illusion of careful analysis where none exists.

I'm much more interested in the extended discussion and openness happening here. Kudos to @leman for remaining involved in the discussion and supporting @theorist9 in reviewing the data.

Analog Kid · Oct 2, 2023

leman said:
I am running my tests in debug mode to ensure that the code touches enough elements so that the power consumption is maximised. When I compile my tests in release mode, the power consumption drops to 3 watts.

Worth highlighting as one of many examples of how a software bug can cause excessive power consumption. Here we have the same functionality consuming what looks to be 60% more power because of a difference in compiler settings.

Not technically a bug in this case, as the setting was intentional, but the point stands.

leman · Oct 2, 2023

Analog Kid said:
Worth highlighting as one of many examples of how a software bug can cause excessive power consumption. Here we have the same functionality consuming what looks to be 60% more power because of a difference in compiler settings.

Not technically a bug in this case, as the setting was intentional, but the point stands.

Yep, and these things can make a big impact on a CPU like Apple's with fine-grained power gating (not that there are more CPUs like that on the market). The debug version of the code has more branches and stack load/stores, I suppose this is what makes the difference.

name99 · Oct 2, 2023

Xiao_Xi said:
What would you call the Geekbench points/W rate?

The @leman graph shows that A17 is more energy efficient than A16. But it is misleading to say that A17 scores 400 GB6 points more than A16 and is still more efficient, because it is not more efficient at Geekbench.

It doesn't matter how Cortex-X4 performs, only whether or not it can be considered a "commercial CPU architecture". If it is, then Cortex-X4 is wider than A17. Otherwise, A17 is the widest "commercial CPU architecture".

There IS no "geekbench points/watt" number! What number would you use for the denominator? The wattage drawn by the benchmark may well vary *substantially* during the course of the benchmark. Compare an FP/SIMD-heavy benchmark with one that is primarily limited by DRAM latency...
Do you have an accurate graph of this wattage over time, for iPhone or anything else?

Geekbench points/joule does make sense; it's essentially a proxy for the (inverse of the) energy-delay product, as I have pointed out on multiple occasions. But you are choosing to talk about GB6/W, NOT GB6/J!
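To make the distinction concrete: energy is the integral of power over the whole run, and if one assumes (a simplification) that a benchmark score is inversely proportional to run time for a fixed workload, then points per joule equals a constant divided by the energy-delay product. A small Python sketch; the two-phase power trace, the constant `K`, and the resulting score are made-up illustrations, not measurements.

```python
import numpy as np


def energy_joules(t_s, p_w):
    """Integrate a sampled power trace (watts vs. seconds) with the trapezoid rule."""
    t = np.asarray(t_s, dtype=float)
    p = np.asarray(p_w, dtype=float)
    return float(np.sum((p[1:] + p[:-1]) / 2.0 * np.diff(t)))


# Hypothetical two-phase run: 10 s of a compute-heavy phase at 5 W,
# then 10 s of a memory-latency-bound phase at 2 W.
t = [0.0, 10.0, 10.0, 20.0]
p = [5.0, 5.0, 2.0, 2.0]

energy = energy_joules(t, p)   # total joules over the run
runtime = t[-1] - t[0]
avg_power = energy / runtime   # a single "watts" figure hides both phases

# Simplifying assumption: score = K / runtime for a fixed workload (K is hypothetical).
K = 20_000.0
score = K / runtime
points_per_joule = score / energy
edp = energy * runtime         # energy-delay product, J*s
```

The average works out to 3.5 W, a number that matches neither phase, while `points_per_joule` equals `K / edp` exactly under the stated assumption.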

Your constant confusion about these issues is why no-one thinks you are being honest. Either you know nothing about the relevant physics (in which case why do you feel it's important to have opinions about something you don't understand?) or you just don't care about the engineering/physics issues, even though they have been explained on multiple occasions.

name99 · Oct 2, 2023

[deleted] the issue has been explained by use of ST vs MT benchmarks

name99 · Oct 2, 2023

Xiao_Xi said:
What would make one benchmark take A17 to 5W and another to 4W? Has any other benchmark pushed A17 to 5W like yours?

By the way, Geekerwan used this benchmark to make those graphs.

541.leela_r (www.spec.org)

Oh FFS.
If one benchmark consists of multithreaded AMX, and the other consists of single-threaded pointer chasing, exactly what do YOU expect in terms of power draw???

Xiao_Xi · Oct 2, 2023

name99 said:
Your constant confusion about these issues is why no-one thinks you are being honest.

I'm learning, so I make mistakes from time to time. But more knowledgeable people like you usually point out my mistakes, so we can all learn.

theorist9 · Oct 2, 2023

leman said:
Sorry, typo, it was supposed to be "5 watts". What I meant is that the A17 Pro is clocked exactly so that at peak frequency the power consumption is very close to 5 watts. This appears to be a number chosen by a human rather than a random result. But maybe I'm interpreting too much into all of this.

The daughter of a member of the design team is a competitive equestrian, and her birth date is 6/7/05, so he convinced the rest of the team that the target should be 0.006705 hp, which just coincidentally happens to equal 5 W.

[Yes, I made that up!]

theorist9 · Oct 2, 2023

My modeling of @leman's frequency vs. power data for the A17 Pro Performance Core (single thread):

The basic theoretical models for frequency vs. power follow a power law (real world is obviously more complicated), so I started by looking for that. The easiest way to check if your data follows a power law (something of the form f(x) = a x^b) is to plot it on a log-log plot, and see if your data follows a straight line (log-log plots linearize power laws). If it does, the slope will be equal to the value of the exponent.
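The log-log check can be sketched in a few lines of Python with NumPy. Since log p = log a + b log f, a straight-line fit in log-log space returns the exponent as the slope. Note that this minimizes squared errors in log space, which is a quick diagnostic rather than the Mathematica NonlinearModelFit procedure used for the fits in this post; the self-check uses synthetic points generated from a known power law, not the measured data.

```python
import numpy as np


def fit_power_law(f, p):
    """Fit p = a * f**b by a straight-line fit in log-log space.

    log p = log a + b * log f, so the slope is the exponent b and
    exp(intercept) is the prefactor a.
    """
    slope, intercept = np.polyfit(np.log(f), np.log(p), 1)
    return float(np.exp(intercept)), float(slope)


# Synthetic check (generated, not measured): points from p = 0.2 * f**2.4.
f = np.linspace(2.7, 3.4, 20)
p = 0.2 * f**2.4
a, b = fit_power_law(f, p)
```

If the data bends away from a single line, as described here, fitting separate segments gives one exponent per regime.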

When I did that, I didn't see a single straight line, but rather what appear to be three different scaling regimes. Fitting each of these to its own simple power law gave these results:

Low Frequency (1.09 GHz to 1.34 GHz, 344 data points, green): p(f) = 0.4 * f^1.2
Middle Frequency (2.73 GHz to 3.38 GHz, 295 data points, blue): p(f) = 0.2 * f^2.4
High Frequency (3.45 GHz to 3.78 GHz, 55 data points, red): p(f) = 0.07 * f^3.2
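The three regimes can be written as one piecewise Python function. This is a sketch using the rounded parameters listed above (so, per Note 1, only approximate); frequencies in the gaps between the fitted ranges, e.g. 1.34 to 2.73 GHz, have no data and therefore no model.

```python
# Rounded parameters from the three fits:
# regime -> (f_min GHz, f_max GHz, prefactor a, exponent b), with p = a * f**b watts.
REGIMES = {
    "low": (1.09, 1.34, 0.4, 1.2),
    "middle": (2.73, 3.38, 0.2, 2.4),
    "high": (3.45, 3.78, 0.07, 3.2),
}


def regime_power(f_ghz):
    """Return (regime name, predicted watts) for a frequency inside a fitted range."""
    for name, (f_min, f_max, a, b) in REGIMES.items():
        if f_min <= f_ghz <= f_max:
            return name, a * f_ghz**b
    # The data has gaps (e.g. 1.34-2.73 GHz), so no regime covers them.
    raise ValueError(f"{f_ghz} GHz is outside the fitted frequency ranges")
```

At 3.78 GHz the high-frequency fit gives just under 5 W, in line with the roughly 5 W peak discussed earlier in the thread.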

Note 1: I've rounded all the parameter values in this post for readability, but because of the sensitivity of these equations to those values (especially the equations shown later), you won't be able to recover these plots from these equations; you'll need a lot more digits. If anyone wants these, LMK, and I'll add them to this post.

Note 2: All equations were fitted using Mathematica's NonlinearModelFit function, with a Weighted Least Squares (WLS) minimization, and a (probably excessive) internal precision of 100 digits (to avoid rounding errors). For more details, see "Note 2, extended", at bottom.

If we expand the graph, we find the high-frequency curve extrapolates to 13 watts at 5 GHz:
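A quick Python check of the extrapolation also shows the sensitivity mentioned in Note 1: with the rounded parameters (0.07, 3.2) the high-frequency law gives roughly 12 W at 5 GHz rather than 13 W, and moving the exponent from 3.15 to 3.25 shifts the prediction from about 11 W to about 13 W. The exponent values tried here are arbitrary choices for illustration.

```python
def high_regime_power(f_ghz, a=0.07, b=3.2):
    """High-frequency power law p = a * f**b (watts, f in GHz), rounded parameters."""
    return a * f_ghz**b


# Extrapolating well beyond the fitted 3.45-3.78 GHz range.
at_5ghz = high_regime_power(5.0)

# Sensitivity to the rounded exponent: small changes in b move the
# 5 GHz prediction by about a watt each way.
sensitivity = {b: high_regime_power(5.0, b=b) for b in (3.15, 3.20, 3.25)}
```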

Now you might argue, reasonably, that when you go from 4 GHz to 5 GHz, yet another scaling regime will come into effect, with an even higher slope, leading to a power consumption >13 watts at 5 GHz. And that's essentially what leman got when he fit the whole curve, which is effectively a prediction of how the scaling exponent will continue to increase as the frequency increases (yielding a predicted power consumption of 15 watts at 5 GHz).

So given that we have no knowledge of what the next scaling exponent will be, a polynomial fit, like what leman did, seems the best we can do at this point.

Having said that, just for fun, we can play with the math to see if we can get a good overall fit with fewer parameters than what leman used as a starting point. IIUC, he fit the data to a polynomial of the form:
p(f) = a + b f + c f^2 + d f^3 + e f^4. I.e., his model uses five parameters.
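A five-parameter quartic like this can be fitted with NumPy's `polyfit`; passing `w=1/p` scales each residual by 1/p, so the fit minimizes squared relative errors in the spirit of Note 2. This is a sketch of the general technique, not leman's R code (which isn't shown here), and the self-check uses a made-up exact quartic.

```python
import numpy as np


def fit_quartic(f, p, relative=True):
    """Fit p = a + b f + c f^2 + d f^3 + e f^4 and return [a, b, c, d, e].

    np.polyfit multiplies each residual by w before squaring, so w = 1/p
    minimizes the sum of squared relative errors; w = None gives ordinary
    least squares.
    """
    f = np.asarray(f, dtype=float)
    p = np.asarray(p, dtype=float)
    w = 1.0 / p if relative else None
    coeffs_high_first = np.polyfit(f, p, 4, w=w)
    return coeffs_high_first[::-1]  # reorder to a, b, c, d, e


# Self-check on an exact (made-up) quartic: the five coefficients come back.
f = np.linspace(1.0, 4.0, 25)
p = 1 + 2 * f + 3 * f**2 + 4 * f**3 + 5 * f**4
coeffs = fit_quartic(f, p)
```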

With a polynomial, the simplest equation I managed to find that gave a good fit had three parameters:
p(f) = 0.2 + 0.2 f^2 + 0.0006 f^6. Like leman's equation, this predicts 15 watts at 5 GHz. It's plotted immediately below.
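Because a + b f^2 + c f^6 is linear in its three parameters, it can be fitted with plain linear least squares on a custom basis, with no nonlinear solver needed. A Python sketch (unweighted for brevity); the self-check regenerates points from the rounded model, so it only demonstrates the mechanics, not the actual fit.

```python
import numpy as np


def six_power_model(f, a=0.2, b=0.2, c=0.0006):
    """p(f) = a + b f^2 + c f^6 with the rounded parameters from the post."""
    return a + b * f**2 + c * f**6


def fit_six_power(f, p):
    """Least-squares fit of (a, b, c) using the basis columns [1, f^2, f^6]."""
    f = np.asarray(f, dtype=float)
    basis = np.column_stack([np.ones_like(f), f**2, f**6])
    params, *_ = np.linalg.lstsq(basis, np.asarray(p, dtype=float), rcond=None)
    return params


prediction_5ghz = six_power_model(5.0)  # 14.575 W, i.e. the "15 watts" above

f = np.linspace(1.09, 3.78, 30)
a, b, c = fit_six_power(f, six_power_model(f))
```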

Note that, unlike my first model, this one isn't directly physical (I don't think there is any f^6 power scaling going on); it's simply the math that gives the best polynomial fit with the fewest parameters (that I could find). I also don't think leman's quartic model is directly physical either—while it nicely tracks the values, I don't see any evidence of f^4 power scaling in this data. [Yes, given the trend, it may not be surprising to see that in the next scaling regime, since thus far we've gone from 1.2⟶2.4⟶3.2; but it's not present in this data.]

I.e., I think the best way to understand these two polynomial equations (my sixth-power and leman's quartic) is that they use a single higher-order polynomial to model what's actually going on, which is a successively-increasing set of lower-order power-law behaviors. This also applies to the exponential I show at the end.

Note: The following are linear (i.e., not log-log) plots, which is why you can see the curve.

p(f) = 0.2 + 0.2 f^2 + 0.0006 f^6

If one is willing to accept a modest reduction in quality of fit, one can reduce the number of parameters even further, to two, using an exponential. The exponential grows a bit faster than the polynomial, giving a predicted power of 16 W at 5 GHz:

p(f) = 0.2 e^(0.9 f)
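For the exponential, a straight line through (f, log p) gives k as the slope and log a as the intercept. This is a swapped-in quick method, not the NonlinearModelFit WLS used in the post, though log residuals approximate relative residuals when errors are small. The sketch also shows why Note 1 matters most here: with the rounded values 0.2 and 0.9 the 5 GHz prediction is about 18 W rather than 16 W, because k sits inside the exponent.

```python
import numpy as np


def exp_model(f_ghz, a=0.2, k=0.9):
    """p(f) = a * exp(k f), rounded parameters from the post."""
    return a * np.exp(k * f_ghz)


def fit_exponential(f, p):
    """Fit p = a * exp(k f) via a straight line through (f, log p)."""
    k, log_a = np.polyfit(np.asarray(f, dtype=float), np.log(p), 1)
    return float(np.exp(log_a)), float(k)


# Rounded parameters: about 18 W at 5 GHz (0.2 * e^4.5).
rounded_5ghz = float(exp_model(5.0))

# Self-check on synthetic points generated from the rounded model.
f = np.linspace(1.09, 3.78, 30)
a, k = fit_exponential(f, exp_model(f))
```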

Note 2, extended: All equations were fitted using Mathematica's NonlinearModelFit function, with Weighted Least Squares (WLS) minimization. Specifically, instead of minimizing the sum of the squares of the residuals (OLS = ordinary least squares), I minimized Sum[(residual/value)^2]. I.e., I minimized the squares of the relative errors rather than the squares of the absolute errors. The latter is only appropriate when the error is expected to be independent of the size of the data (as is found in homoskedastic data). However, I've found that, more typically, the error increases in proportion to the size of the data. If I were being paid to do this I would have done a formal test of the distribution of the residuals. But since I'm not, I just did both and determined which gave the better-looking fit (or if they were comparable, I stuck with WLS for consistency). With this approach, I ended up using WLS for everything (not that it made much of a difference in these cases -- the visual differences between the two are subtle).
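The relative-error criterion described in Note 2 is weighted least squares with weights 1/p²: dividing each row of the problem by the observed value turns absolute residuals into relative ones. A NumPy sketch for a model that is linear in its parameters (the a + b f^2 + c f^6 form); it stands in for, rather than reproduces, the Mathematica NonlinearModelFit call, and the data are synthetic with a made-up alternating ±3% error that grows with the value (heteroskedastic).

```python
import numpy as np


def lstsq_fit(basis, p, relative):
    """Solve for parameters of p ~ basis @ params.

    relative=False: minimize sum(residual**2)        (OLS)
    relative=True:  minimize sum((residual / p)**2)  (WLS, weights 1/p**2)
    """
    p = np.asarray(p, dtype=float)
    scale = 1.0 / p if relative else np.ones_like(p)
    params, *_ = np.linalg.lstsq(basis * scale[:, None], p * scale, rcond=None)
    return params


def relative_sse(basis, p, params):
    """Sum of squared relative residuals for a parameter vector."""
    return float(np.sum(((p - basis @ params) / p) ** 2))


f = np.linspace(1.09, 3.78, 40)
basis = np.column_stack([np.ones_like(f), f**2, f**6])
truth = 0.2 + 0.2 * f**2 + 0.0006 * f**6
# Synthetic heteroskedastic data: the error is proportional to the value.
p = truth * (1 + 0.03 * np.where(np.arange(f.size) % 2 == 0, 1.0, -1.0))

ols = lstsq_fit(basis, p, relative=False)
wls = lstsq_fit(basis, p, relative=True)
```

By construction, the WLS solution has the smaller sum of squared relative errors and the OLS solution the smaller sum of squared absolute errors; comparing the two residual patterns is an informal version of the "did both and compared" approach above.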
