
Analog Kid

macrumors G3
Mar 4, 2003
9,360
12,603
If you set RC = 1/(2f), then you get that power is linear instead of cubic in f, and that applies at all frequencies:

View attachment 2037812
What is typical (or is there no typical) real-world CPU power-frequency scaling behavior? Do CPUs typically have at least two frequency regimes, one up to which the power scaling is favorable and one above which it is not?
At the analysis point, a specific value of f you want to analyze around, RC=1/(2f), but you're looking at a range of f values above and below that. So if you're interested in what the behavior is around 3GHz, you'd set RC to 1/(6*10^9) and then evaluate:

P=(fCV^2) / (1 - e^(-6*10^9 / (2f)))^2

This oh so very loosely describes the power consumption of a transistor designed to operate at 3GHz but clocked above and below that value.
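If you'd rather poke at it numerically than graphically, here's a minimal Python sketch of that expression. The C and V values are just placeholders (a spitballed 10 fF and an arbitrary volt), so only the shape of P(f) means anything:

import numpy as np

# Model from above: a transistor sized for a 3 GHz design point, RC fixed at 1/(2*3 GHz).
# C and V are placeholder assumptions -- only the shape of P(f) matters.
C = 10e-15                 # gate/load capacitance, F (rough guess)
V = 1.0                    # swing voltage, arbitrary units
RC = 1 / 6e9               # = 1/(2 * 3 GHz)

def power(f):
    return f * C * V**2 / (1 - np.exp(-1 / (2 * f * RC)))**2

for f_ghz in (1.5, 3.0, 4.5, 6.0):
    print(f"{f_ghz:>4} GHz: {power(f_ghz * 1e9):.3e}")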

I've been looking for an opportunity to play with GeoGebra, but it's not as flexible as I'd hoped. It forces 1:1 axes, and doesn't like numbers more or less than 10^±10, so this isn't the greatest view, but I removed the CV^2 term (C is on the order of 10^-15) and added a 0.1 scaling factor to make it visible on the 1:1 axes.

[GeoGebra plot: the model function (magenta) with manually fitted quadratic (black) and cubic (blue) curves]

I tried manually fitting a simple quadratic (black) and cubic (blue) to the function (magenta), but the curvature doesn't fit terribly well. It falls somewhere between quadratic and cubic at higher clocks and both fits underestimate the power at lower clocks. I did try f^2.5, and the results aren't much different. Maybe Mathematica can help you massage it to a better fit with more terms, but the underlying function isn't a simple power function so it will always be an imperfect fit.
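Here's the same fitting exercise as a rough Python sketch, except that instead of fitting by hand it picks the least-squares coefficient for each fixed exponent; the CV^2 factor is dropped since it only scales the curve:

import numpy as np

RC = 1 / 6e9
P = lambda f: f / (1 - np.exp(-1 / (2 * f * RC)))**2   # CV^2 factor dropped

f = np.linspace(1.5e9, 6e9, 400)
y = P(f)

# Best least-squares coefficient a for a pure power law a*f^k, with k held fixed
for k in (2.0, 2.5, 3.0):
    a = np.sum(y * f**k) / np.sum(f**(2 * k))
    worst = np.max(np.abs(a * f**k - y) / y)
    print(f"f^{k}: worst relative error {worst:.0%} over 1.5-6 GHz")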

But it all depends on the goal. In this case, I think the goal is just to have a mental framework to reason within, so I'd say "power increases with between the square and the cube of frequency" and leave it at that. We've simplified the problem so much by this point that anything more detailed is really just false precision.
 
Last edited:
  • Like
Reactions: Argoduck

theorist9

macrumors 68040
May 28, 2015
3,880
3,060
At the analysis point, a specific value of f you want to analyze around, RC=1/(2f), but you're looking at a range of f values above and below that. So if you're interested in what the behavior is around 3GHz, you'd set RC to 1/(6*10^9) and then evaluate:

P=(fCV^2) / (1 - e^(-6*10^9 / (2f)))^2

This oh so very loosely describes the power consumption of a transistor designed to operate at 3GHz but clocked above and below that value.

I've been looking for an opportunity to play with GeoGebra, but it's not as flexible as I'd hoped. It forces 1:1 axes, and doesn't like numbers more or less than 10^±10, so this isn't the greatest view, but I removed the CV^2 term (C is on the order of 10^-15) and added a 0.1 scaling factor to make it visible on the 1:1 axes.

View attachment 2037891
I tried manually fitting a simple quadratic (black) and cubic (blue) to the function (magenta), but the curvature doesn't fit terribly well. It falls somewhere between quadratic and cubic at higher clocks and both fits underestimate the power at lower clocks. I did try f^2.5, and the results aren't much different. Maybe Mathematica can help you massage it to a better fit with more terms, but the underlying function isn't a simple power function so it will always be an imperfect fit.

But it all depends on the goal. In this case, I think the goal is just to have a mental framework to reason within, so I'd say "power increases with between the square and the cube of frequency" and leave it at that. We've simplified the problem so much by this point that anything more detailed is really just false precision.
It sounds like you're saying it's more physically meaningful to fix RC at a constant value determined by RC=1/(2f) at the design frequency, rather than letting RC vary as 1/(2f) with the clock. If so, and you use f=3 GHz, you get this, which is the expression you plotted:

[Image: P = (f·Cdyn·Vgate^2) / (1 - e^(-3*10^9/f))^2, i.e. the expression with RC held at 1/(6*10^9)]


If I set Cdyn and Vgate to arbitrary constants to get the scaling behavior (as above, I used Cdyn = Vgate = 1), then I get approximately quadratic scaling between 1.5 GHz and 6 GHz. The plot below compares the above equation (blue) to a power law with P ~ f^2.15 (dashed red):

[Plot: the model (blue) vs. a P ~ f^2.15 power law (dashed red) from 1.5 to 6 GHz]
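For anyone without Mathematica handy, the same exponent can be backed out numerically with a straight-line fit in log-log space; a small Python sketch under the same Cdyn = Vgate = 1 assumption:

import numpy as np

P = lambda f: f / (1 - np.exp(-3e9 / f))**2   # RC fixed at 1/(2*3 GHz), Cdyn = Vgate = 1

f = np.linspace(1.5e9, 6e9, 400)
slope, _ = np.polyfit(np.log(f), np.log(P(f)), 1)
print(f"effective exponent over 1.5-6 GHz: {slope:.2f}")

It prints an exponent close to 2.15, consistent with the overlay above.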

But, again, do you know what kind of power vs. frequency scaling we see in real processors?

I've been looking for an opportunity to play with GeoGebra, but it's not as flexible as I'd hoped. It forces 1:1 axes, and doesn't like numbers more or less than 10^±10, so this isn't the greatest view, but I removed the CV^2 term (C is on the order of 10^-15) and added a 0.1 scaling factor to make it visible on the 1:1 axes.
You should pick up Mathematica ;). That's what I've been using here. It also gives very precise control over numerical accuracy. Plus it's fun.
 
Last edited:
  • Like
Reactions: Argoduck

altaic

macrumors 6502a
Jan 26, 2004
711
484
At the analysis point, a specific value of f you want to analyze around, RC=1/(2f), but you're looking at a range of f values above and below that. So if you're interested in what the behavior is around 3GHz, you'd set RC to 1/(6*10^9) and then evaluate:

P=(fCV^2) / (1 - e^(-6*10^9 / (2f)))^2

This oh so very loosely describes the power consumption of a transistor designed to operate at 3GHz but clocked above and below that value.

I've been looking for an opportunity to play with GeoGebra, but it's not as flexible as I'd hoped. It forces 1:1 axes, and doesn't like numbers more or less than 10^±10, so this isn't the greatest view, but I removed the CV^2 term (C is on the order of 10^-15) and added a 0.1 scaling factor to make it visible on the 1:1 axes.

View attachment 2037891
I tried manually fitting a simple quadratic (black) and cubic (blue) to the function (magenta), but the curvature doesn't fit terribly well. It falls somewhere between quadratic and cubic at higher clocks and both fits underestimate the power at lower clocks. I did try f^2.5, and the results aren't much different. Maybe Mathematica can help you massage it to a better fit with more terms, but the underlying function isn't a simple power function so it will always be an imperfect fit.

But it all depends on the goal. In this case, I think the goal is just to have a mental framework to reason within, so I'd say "power increases with between the square and the cube of frequency" and leave it at that. We've simplified the problem so much by this point that anything more detailed is really just false precision.
If the capacitance is on the order of 10^-15 F, the resistance would be in the 10s of kOhms for RC to be on the order of 10^-10 s. Is the gate resistance really that high?
 

Analog Kid

macrumors G3
Mar 4, 2003
9,360
12,603
It sounds like you're saying it's more physically meaningful to fix RC at a constant value determined by RC=1/(2f) at the design frequency, rather than letting RC vary as 1/(2f) with the clock. If so, and you use f=3 GHz, you get this, which is the expression you plotted:
Yep, that's what I meant. Assume the physical design is constant if you want to see how over and underclocking will impact power. Also, we don't know the physical process parameters (I'd imagine TSMC holds them very close to their collective chests), but the M-series is an existence proof of a 3GHz design that we can assume was well optimized, so it's a way of inferring the parameters from the evidence.

If I set Cdyn and Vgate to arbitrary constants to get the scaling behavior (as above, I used Cdyn = Vgate = 1), then I get approximately quadratic scaling between 1.5 GHz and 6 GHz. The plot below compares the above equation (blue) to a power law with P ~ f^2.15 (dashed red):
Yeah, that lines up with "between the square and the cube" if you're looking to avoid the extra math and don't need to extrapolate too far from where the fit was done. It looks like maybe 15-30% error on the low frequency end, a bit better on the high side? I'd guess another octave in either direction will look a bit worse. I can't imagine you'd ever raise the clock 4x from the design point though, unless you're looking for YouTube views and some liquid nitrogen under the sink...

But, again, do you know what kind of power vs. frequency scaling we see in real processors?
It's hard to answer that with anything but "it depends" but this is a reasonable model for a digital circuit under modest clock rate changes. I haven't done a lot of experimentation myself with overclocking PC processors like this. I did some digging through overclocking threads to see what data I could find and, anecdotally, it looks like power increases a bit more than the square of the voltage, but then it gets murky what's CPU power versus other stuff on the board.

Remember, this all started by trying to understand a rule of thumb-- and yes that rule of thumb for a single transistor is a reasonable place to start from given how little we know about the process or the logic. The engineering team will be estimating power consumption with physical simulations that will include physics information about the process parameters from TSMC and logic information about toggle rates of the various transistors inside the design when operating.

P=fCV^2 is probably as close to right as we'll get without access to those simulations though. I think you might be looking for a more exact formula than you'll be able to find.

Oh, and since this is in a thread about differences between M1 and M2, the process parameters changed between N5 and N5P, so changing clock rates isn't the only variable in play.

You should pick up Mathematica ;). That's what I've been using here. It also gives very precise control over numerical accuracy. Plus it's fun.
Yeah, I've seen people put it to really good use, and I've tried going to Mathematica several times, but I always get frustrated with it. It's not really a CAS, it's a programming language all its own and as nostalgic as the LISP-like symbolic processing is, I've always felt like I spent more time learning the system than solving problems. I keep bouncing between systems, but currently tend to favor Jupyter/Python and sympy. It doesn't have the sexy 3D plotting that Mathematica likes to show, but it's turned out I don't need that so much...

I tried Maple for a while too, and had the same experience.

If the capacitance is on the order of 10^-15 F, the resistance would be in the 10s of kOhms for RC to be on the order of 10^-10 s. Is the gate resistance really that high?

It's not the resistance of the gate, but the resistance to the gate that matters.

I'm really not familiar with the details of these bleeding edge processes, but as an order of magnitude, this doesn't seem crazy. I'm just spitballing the gate capacitance, as a start:

RC = 1/6e9 s
C = 10e-15 F
R = RC/C = (1/6e9)/(10e-15) ≈ 17kΩ

R=r*L/A

Resistance is resistivity times length over the cross-sectional area of the wire. Copper at 100°C, the apparent operating temp of an M1, has a resistivity of about 2.2e-8 Ωm. It's hard to find estimates of interconnect width at 5nm, but it's something on the order of 10nm, so a cross-sectional area of something like 1e-16 m².

L= RA/r = 17e3*1e-16/2.2e-8 = 75µm

So a 75µm wire into a 10fF capacitor has a time constant of 1/6e9. The M2 die is something like 12,000µm across, so the length doesn't seem crazy given that very few routes will be dead straight. This also ignores the source resistance of the driving stage and the resistance at the silicon interface.
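The same arithmetic in a tiny Python snippet, in case anyone wants to swap in their own guesses; every input is the same rough order-of-magnitude assumption as above:

# All inputs are rough order-of-magnitude assumptions
RC  = 1 / 6e9       # target time constant, s
C   = 10e-15        # spitballed gate/load capacitance, F
rho = 2.2e-8        # copper resistivity near 100 C, ohm*m
w   = 10e-9         # assumed interconnect width/height, m

R = RC / C              # resistance needed for that time constant
L = R * (w * w) / rho   # wire length giving that resistance

print(f"R = {R / 1e3:.0f} kOhm, L = {L * 1e6:.0f} um")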

Every transistor in the chip is different but the design has to accommodate the worst case timing path under worst case conditions.
 
Last edited:
  • Like
Reactions: Argoduck and altaic

theorist9

macrumors 68040
May 28, 2015
3,880
3,060
P=fCV^2 is probably as close to right as we'll get without access to those simulations though. I think you might be looking for a more exact formula than you'll be able to find.
Not really; just trying to understand what the scaling behavior of that model was in terms of frequency, and how closely real processors (at least roughly) obey it—all of which you addressed.
It's hard to answer that with anything but "it depends" but this is a reasonable model for modest clock rate changes.
All the M1 devices—from the Air to the Studio—have the same 3.2 GHz clock speed. Some have suggested that Apple wasn't able to offer any sort of Intel-style turbo boost, even in boxes that could handle the thermals (e.g., allowing the CPU in the Studio to go to 4.0 GHz with one core, 3.8 GHz with up to two cores, 3.5 GHz with up to four cores, and so on; the specific numbers aren't important here), because the processor was designed to operate efficiently only up to a certain clock speed, and above that there are significant efficiency losses—the implication being that the M1 has two markedly different power-frequency scaling regimes in the GHz range (e.g., that it goes from, say, f^2.3 to f^3.6 when you exceed 3.2 GHz). Does this seem plausible?

Certainly, since most day-to-day tasks are single-threaded, and since it's mostly ST performance that determines how responsive and snappy a machine feels, higher ST performance would be very nice.

Yeah, I've seen people put it to really good use, and I've tried going to Mathematica several times, but I always get frustrated with it. It's not really a CAS, it's a programming language all its own and as nostalgic as the LISP-like symbolic processing is, I've always felt like I spent more time learning the system than solving problems. I keep bouncing between systems, but currently tend to favor Jupyter/Python and sympy. It doesn't have the sexy 3D plotting that Mathematica likes to show, but it's turned out I don't need that so much...

I tried Maple for a while too, and had the same experience.
There's a certain big-picture logic to the program that one needs to understand as a starting point, and that can be laid out in about an hour. Yet none of the documentation or online help provides this, leaving most new users feeling confused (like I was when I learned it). When I TA'd a group of stat mech grad students I offered them a 90 minute Mathematica training session in which I provided exactly that, and at the end they all had enough basic fluency to start using it for their next homework set.

For numerical work I'd imagine Python is as good as Mathematica and probably faster, but I suspect Mathematica is still the best, overall, for symbolic math. I've read that SymPy is more a substitute for MatLab (which, according to the CW, is stronger than Mathematica for numerics but weaker for symbolics) than Mathematica. [I haven't used either SymPy or MatLab, so this is just hearsay on my part ;).]
 
Last edited:
  • Like
Reactions: Argoduck

Analog Kid

macrumors G3
Mar 4, 2003
9,360
12,603
All the M1 devices—from the Air to the Studio—have the same 3.2 GHz clock speed. Some have suggested that Apple wasn't able to offer any sort of Intel-style turbo boost, even in boxes that could handle the thermals (e.g., allowing the CPU in the Studio to go to 4.0 GHz with one core, 3.8 GHz with up to two cores, 3.5 GHz with up to four cores, and so on; the specific numbers aren't important here), because the processor was designed to operate efficiently only up to a certain clock speed, and above that there are significant efficiency losses—the implication being that the M1 has two markedly different power-frequency scaling regimes in the GHz range (e.g., that it goes from, say, f^2.3 to f^3.6 when you exceed 3.2 GHz). Does this seem plausible?

Certainly, since most day-to-day tasks are single-threaded, and since it's mostly ST performance that determines how responsive and snappy a machine feels, higher ST performance would be very nice.
Yeah, I've heard the same thing about the M series not being designed to scale up the clock. I can't remember where I'd heard that first but I remember thinking it was credible. The more so because we're not seeing Apple make any efforts to upclock the M1.

I don't think it's a power efficiency limitation. As you say, the desktops can handle the thermals. The chips could consume much more power and generate much more heat without raising the internal temperature, and it's the internal temperature that's really the concern.

If the M1 really was designed with a maximum clock, then there's probably something else limiting it. Could be any number of internal design decisions. It could also be a macro-architecture limitation-- maybe there's a limit imposed by interfaces between internal blocks.

We know the frequency doesn't need to be fixed, because the clock appears to scale down when it thermal throttles. It just doesn't seem to want to scale up beyond the nominal design spec.

I suppose it would make some amount of sense if the efficiency they're trying to protect isn't the efficiency in the overclock domain (as I said, they have that big copper heat sink in the Studio to handle that), but rather if leaving room to overclock makes the chip less efficient at its optimal frequency. That would mean the mobile variants would suffer for the benefit of the desktop, which is a tradeoff they probably wouldn't see as worthwhile (sacrificing iPad battery life to allow faster desktops). I'm not sure I can think of a technical reason that would be true though...

There's a certain big-picture logic to the program that one needs to understand as a starting point, and that can be laid out in about an hour. Yet none of the documentation or online help provides this, leaving most new users feeling confused (like I was when I learned it). When I TA'd a group of stat mech grad students I offered them a 90 minute Mathematica training session in which I provided exactly that, and at the end they all had enough basic fluency to start using it for their next homework set.

For numerical work I'd imagine Python is as good as Mathematica and probably faster, but I suspect Mathematica is still the best, overall, for symbolic math. I've read that SymPy is more a substitute for MatLab (which, according to the CW, is stronger than Mathematica for numerics but weaker for symbolics) than Mathematica. [I haven't used either SymPy or MatLab, so this is just hearsay on my part ;).]
Yeah, that's probably true. Wolfram the man certainly has a particularly strong world view, so it's no surprise his products would... 😄

I got spoiled early by using MathCAD. It set my expectations for the mix of symbolics, numerics, and graphics I wanted, it was intuitive to use, and it gave me typeset equations. It's been a long time since I've been able to use it though (no more Windows), so I may be romanticizing it...

Matlab is another one that felt like it was forcing me into thinking a certain way. It reaaalllly wants you to think of everything in life as a matrix, and gods help the soul who just wants a for loop. And the Fortran roots still show through in subtle ways that make it easy to mistranscribe code into C based languages.

Numpy is the Matlab-like library (numerical Python). Sympy is for symbolics. It's almost certainly not a match for Mathematica or Maple for symbolic manipulation, but it covers enough. Combined with the Jupyter notebook interface, it makes it reasonably easy to start from equations then convert to Python functions for numerical analysis. And with Python being trendy these days there's a plethora of easily accessible libraries to build on, without having to go through the cost and delays of buying toolboxes.
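As a small, concrete example of that equations-then-numerics flow, here's roughly how the power expression from earlier in the thread could be set up in sympy and then handed off to numpy. Just a sketch; the symbol names are mine:

import numpy as np
import sympy as sp

f, C, V, RC = sp.symbols('f C V RC', positive=True)
P = f * C * V**2 / (1 - sp.exp(-1 / (2 * f * RC)))**2

# Symbolic step: the local scaling exponent d(ln P)/d(ln f)
exponent = sp.simplify(f * sp.diff(sp.log(P), f))
print(exponent)

# Numeric step: turn the same expression into a fast NumPy function for plotting
P_num = sp.lambdify((f, C, V, RC), P, 'numpy')
print(P_num(np.array([1.5e9, 3e9, 6e9]), 10e-15, 1.0, 1 / 6e9))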

I don't recommend it for everyone. Matlab is deeply seated in the engineering world, and that's not going to change quickly. Mathematica is a great academic and theory tool. It would probably be better for what I'm trying to do in that it can almost certainly handle larger and more complex sets of equations than sympy and can probably also handle the transition to numerics just as seamlessly. But it's just that challenge of getting over the learning curve when I have something else I'm trying to accomplish that keeps forcing me to abandon it.
 
Last edited:
  • Like
Reactions: altaic and Xiao_Xi

Xiao_Xi

macrumors 68000
Original poster
Oct 27, 2021
1,627
1,101
What about Julia? It seems that libraries can be accelerated on Apple's GPUs via Metal.jl, just as they can on Nvidia's GPUs via CUDA.jl.

 

Analog Kid

macrumors G3
Mar 4, 2003
9,360
12,603
What about Julia? It seems that libraries can be accelerated through Apple's GPU via Metal.jl as they do with Nvidia's GPUs via CUDA.jl.

It's on my list to look at, for sure. Symbolic support is still pretty immature, though, I think.
 