
Citizen45

macrumors member
Apr 9, 2022
49
48
I think it needs to be said explicitly for all the CB23 pushers: Cinebench uses hand-optimized x86 SIMD code. Instead of actually rewriting that for Arm, Cinebench's Arm port relies on a library which autotranslates every x86 SIMD instruction to an equivalent sequence of NEON SIMD instructions.

This is a very quick and dirty way to stand up a port with okay performance. It is extremely far from being a true native port that is well optimized for Arm. If the situation were reversed, you'd be screaming to high heavens that x86 CPUs were being treated unfairly in the comparison - and you'd be right!
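
To illustrate the mechanism, here's a toy sketch in Python of what a translation layer in the spirit of sse2neon does (the NEON sequences shown are illustrative, not the library's actual output):

# Toy 1:N mapping of x86 SSE intrinsics to NEON instruction sequences.
x86_to_neon = {
    "_mm_add_ps": ["vaddq_f32"],          # direct 1:1 equivalent -- cheap
    "_mm_movemask_ps": ["vshrq_n_u32",    # NEON has no movemask, so one
                        "vsraq_n_u32",    # x86 instruction expands into a
                        "vmovq_n_u32"],   # multi-instruction sequence
}

# Cost of translation: NEON instructions emitted per x86 instruction.
for intrinsic, seq in x86_to_neon.items():
    print(f"{intrinsic}: {len(seq)} NEON instruction(s)")

Where the 1:N expansions pile up in hot SIMD loops, you lose performance relative to code written for NEON from the start.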

Stop using CB23 for cross-platform comparisons between x86 and Apple Silicon. It's simply pointless. Unless you like trolling, I guess.

So who on the Cinebench team do we gotta talk to in order to get them to properly optimize Cinebench for Apple Silicon?

It’s in their best interest to take the time to do this, because then Cinebench could be used as the gold standard CPU benchmark for any system without any asterisks.
 

senttoschool

macrumors 68030
Nov 2, 2017
2,626
5,482
It's crazy that the fight for the gaming crown has AMD and Intel pushing their CPUs past their sweet spot.
That's because the vast majority of people buying these CPUs are building gaming PCs.

Even people who buy these CPUs for productivity will game on the side.
 

Gerdi

macrumors 6502
Apr 25, 2020
449
301
Sorry, not following--those release notes go back to GB 5.0 and don't mention NEON, and the 5.1.0 release notes don't mention it either:


In addition, a Google search of the entire primatelabs.com website for NEON doesn't turn up anything (at least that I could see):

site:primatelabs.com neon

[Emoticon was not intentional!]

...other than this from 2013:
[Attached screenshot from 2013]

Could you please provide a link?

Also, how much acceleration does NEON provide compared to AVX512?

They probably have not mentioned it because NEON is the only way to get crypto acceleration on ARM CPUs.
Not sure what the question is about NEON vs. AVX512? Are you asking about crypto acceleration in particular?
 

theorist9

macrumors 68040
May 28, 2015
3,880
3,059
They probably have not mentioned it because NEON is the only way to get crypto acceleration on ARM CPUs.
Not sure what the question is about NEON vs. AVX512? Are you asking about crypto acceleration in particular?
I was thinking specifically of crypto. If you look at the ratio of the crypto subscore to the sum of the integer and FP subscores, you'll see it's: x86/AVX512 > x86/no AVX512 > AS/NEON, suggesting NEON falls short compared to AVX512 for accelerating this task:

crypto/(integer+FP)

i9-12900K (no AVX512): 1.23
Ryzen 5950X (no AVX512): 1.25

i9-11900K (AVX512): 1.55
Ryzen 7950X (AVX512): 1.74

M1 (NEON): 0.78
M2 (NEON): 0.78


Given this, if it's the case that most consumer software doesn't benefit from AVX512 acceleration, it seems that if you wanted a single GB-derived "figure of merit" for cross-platform comparisons that matches consumer usage, you might want to either exclude the crypto subscore (and use only the sum of the FP & integer subscores) or put very little weight on crypto.

I just did a bit of digging, and GB does the latter. So now I'll have to go back to the Tom's Hardware article and figure out why they think the crypto score had such an effect....
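
If anyone wants to recompute these ratios from raw results, it's a one-liner; the subscores here are hypothetical placeholders, so substitute real values from the result screenshots below:

# Ratio of the GB5 crypto subscore to the sum of the integer and FP subscores.
def crypto_ratio(crypto, integer, fp):
    return crypto / (integer + fp)

# Hypothetical example subscores -- read the real ones off a GB5 result page:
print(round(crypto_ratio(2800, 1750, 1850), 2))  # -> 0.78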


[Attached screenshot: Geekbench's subscore weighting]

Sources:
[Attached Geekbench 5 result screenshots, including for the M1, M2, i9-11900K, and i9-12900K]
 

Last edited:

leman

macrumors Core
Oct 14, 2008
19,521
19,673
It’s in their best interest to take the time to do this, because then Cinebench could be used as the gold standard CPU benchmark for any system without any asterisks.

No it won’t. It’s a very specific workload with very specific demands. Using CB to quantify performance is like judging the speed of a car by how much time it takes to haul a ton of potatoes from A to B. That’s why modern benchmarks use a mix of different workloads with different characteristics.
 

Gerdi

macrumors 6502
Apr 25, 2020
449
301
I was thinking specifically of crypto. If you look at the ratio of the crypto subscore to the sum of the integer and FP subscores, you'll see it's: x86/AVX512 > x86/no AVX512 > AS/NEON, suggesting NEON falls short compared to AVX512 for accelerating this task.

You have to understand the crypto instructions. Crypto algorithms are largely serial and have very limited instruction-level parallelism. Once that instruction-level parallelism is fully utilized, the only way to speed things up is to increase the clock frequency. And this is what you see here.
This also means the conclusion that NEON falls short with respect to crypto is just wrong. That would only be the case if the discrepancy in scores could not be explained by the difference in clock frequency.
 
Last edited:

theorist9

macrumors 68040
May 28, 2015
3,880
3,059
You have to understand the crypto instructions. Crypto algorithms are largely serial and have very limited instruction-level parallelism. Once that instruction-level parallelism is fully utilized, the only way to speed things up is to increase the clock frequency. And this is what you see here.
This also means the conclusion that NEON falls short with respect to crypto is just wrong. That would only be the case if the discrepancy in scores could not be explained by the difference in clock frequency.
You've presented your claims in a way I have a hard time following--it feels like you've omitted a key element.

First, note that I wasn't referring to the "discrepancy in scores", but rather the discrepancy in score ratios (crypto/(int+FP)), i.e., in the relative crypto performance. And clock speed is going to affect all three of those components. For instance, if clock speed matters for relative crypto performance, why do the M1 (3.2 GHz) and M2 (3.5 GHz) have the same ratios?

Second, even if clock speed did matter for relative crypto performance, the score ratio of the i9-11900K (AVX512) is 2x that of the M2 (NEON), yet the i9's max clock speed is only 1.5x faster. Thus we can't explain the discrepancy in scores purely on the basis of clock frequency.
 
Last edited:

kvic

macrumors 6502a
Sep 10, 2015
516
460
Worth re-posting here from this thread..

Question

Are you comparing total package power* of the M1 Ultra to just the power draw** of the 7950X processor?

Answer

I think what you suggested is more than a fair comparison to Apple silicon.

Apple silicon is monolithic and superbly efficient. When the GPU/NPU/etc. are not in use, they consume close to zero power. With the Ryzen 7950X's chiplet design, by contrast, the I/O die eats into the power budget in a non-trivial way.

For example, cap both the M1 Ultra and the 7950X at a 65W power limit (not TDP, but actual power). The M1 Ultra could devote almost the full 65W budget to its 20 CPU cores. At the same 65W budget, the I/O die in the 7950X will consume ~15W, leaving around 50W for the 16 Zen 4 cores. (See the sketch after the footnotes below.)

Zen 4 is very efficient once off the ramp of its peak power limit (~230W). I won't be surprised if the 7950X is on par with, if not better than, the M1 Ultra at a similar power budget running Geekbench 5.

Footnotes

* "total package power" as in "CPU package power" reported on Apple silicon by Apple's Powermetric tool.

** "just the power draw" as in "CPU package power" reported from Zen 4's on-die SMU.
 

Gerdi

macrumors 6502
Apr 25, 2020
449
301
You've presented your claims in a way I have a hard time following--it feels like you've omitted a key element.

First, note that I wasn't referring to the "discrepancy in scores", but rather the discrepancy in score ratios (crypto/(int+FP)), i.e., in the relative crypto performance. And clock speed is going to affect all three of those components. For instance, if clock speed matters for relative crypto performance, why do the M1 (3.2 GHz) and M2 (3.5 GHz) have the same ratios?

Second, even if clock speed did matter for relative crypto performance, the score ratio of the i9-11900K (AVX512) is 2x that of the M2 (NEON), yet the i9's max clock speed is only 1.5x faster. Thus we can't explain the discrepancy in scores purely on the basis of clock frequency.

I do not believe that I omitted anything. Performance is essentially the product of an instruction-level parallelism metric (say, IPC) and frequency. For INT+FP, the IPC of Apple cores is much higher than that of x64, hence a relatively high score despite the low frequency. For crypto, IPC is limited, and hence probably very similar between Apple Silicon and x64, so the performance difference for crypto just reflects the frequency difference.
In summary, IPC is higher for INT+FP on Apple Silicon but very similar for crypto compared to x64. From this you can derive the ratios in question.
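
To illustrate with a toy model (the IPC values are made up purely for illustration; they are not measurements):

# Toy model: subscore ~ IPC x clock (GHz).
def score(ipc, ghz):
    return ipc * ghz

# Hypothetical IPCs: Apple well ahead on INT+FP, parity on crypto.
apple = {"intfp_ipc": 8.0, "crypto_ipc": 4.0, "ghz": 3.5}
x64 = {"intfp_ipc": 5.0, "crypto_ipc": 4.0, "ghz": 5.3}

for name, c in (("Apple", apple), ("x64", x64)):
    ratio = score(c["crypto_ipc"], c["ghz"]) / score(c["intfp_ipc"], c["ghz"])
    print(name, round(ratio, 2))  # Apple 0.5, x64 0.8

# Note that frequency cancels out of each chip's own crypto/(INT+FP)
# ratio, so under this model the ratio gap reflects only the IPC gap.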
 
  • Like
Reactions: ArkSingularity

Gerdi

macrumors 6502
Apr 25, 2020
449
301
Zen 4 is very efficient once off the ramp of its peak power limit (~230W). I won't be surprised if the 7950X is on par with, if not better than, the M1 Ultra at a similar power budget running Geekbench 5.

You do not have to be surprised, because we have numbers for this and the M1 Ultra is significantly ahead. In the bigger picture, Zen 4 might be efficient for an x64 core, but it is miles behind more modern architectures like ARM or RISC-V.
 

kvic

macrumors 6502a
Sep 10, 2015
516
460
You do not have to be surprised, because we have numbers for this and the M1 Ultra is significantly ahead. In the bigger picture, Zen 4 might be efficient for an x64 core, but it is miles behind more modern architectures like ARM or RISC-V.

Show us your GB5 numbers for the M1 Ultra and 7950X at a similar power budget.
 

theorist9

macrumors 68040
May 28, 2015
3,880
3,059
For crypto, IPC is limited, and hence probably very similar between Apple Silicon and x64, so the performance difference for crypto just reflects the frequency difference.
OK, let's use the above as a starting point:

M2 (NEON): ST Crypto = 3001; Max clock speed = 3.5 GHz
i9-11900K (AVX512): ST Crypto = 5201; Max clock speed = 5.3 GHz

If NEON were as effective as Intel's previous-gen AVX512 implementation for crypto, you would expect the i9's performance to be only 5.3/3.5 - 1 ≈ 51% higher than the M2's (based on clock speed alone). But in fact it's 5201/3001 - 1 ≈ 73% higher, indicating that Intel's previous-gen AVX512 implementation provides more benefit than NEON for this task.

Or if we consider the 7950X:
7950X (AVX512): ST Crypto = 7140; Max clock speed = 5.7 GHz

If NEON were as effective as AMD's current AVX512 implementation for crypto, you would expect the 7950X's performance to be only 5.7/3.5 - 1 ≈ 63% higher than the M2's (based on clock speed alone). But in fact it's 7140/3001 - 1 ≈ 138% higher, indicating that AMD's current-gen AVX512 implementation provides more benefit than NEON for this task.

In summary, based on simple math, you can see the differences in crypto performance cannot be explained by clock speed alone.
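
Here's the same arithmetic as a quick script, using the quoted ST crypto scores and max clocks:

# Expected speedup over the M2 if clock frequency were the only factor,
# vs. the speedup actually observed in the GB5 ST crypto subscore.
m2_score, m2_clock = 3001, 3.5
chips = {
    "i9-11900K (AVX512)": (5201, 5.3),
    "7950X (AVX512)": (7140, 5.7),
}
for name, (score, clock) in chips.items():
    expected = clock / m2_clock - 1
    observed = score / m2_score - 1
    print(f"{name}: expected {expected:.0%}, observed {observed:.0%}")
# i9-11900K (AVX512): expected 51%, observed 73%
# 7950X (AVX512): expected 63%, observed 138%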
 
Last edited:

theorist9

macrumors 68040
May 28, 2015
3,880
3,059
Andrew Cunningham of Ars Technica provided some good data on the actual power consumption of the new Zen 4 processors. With a Handbrake CPU encode, they found the package power consumption in "65W Eco Mode" was a consistent 90W for the 7600X and 7950X, as well as for the Zen 3 5600X:

[Attached chart: package power during the Handbrake encode, from the Ars Technica review]

 
Last edited:

kvic

macrumors 6502a
Sep 10, 2015
516
460
Let's try to avoid confusing ourselves, and the audience. Avoid marketing terms such as "ECO mode" and the like; quote the CPU package power or power budget in an absolute sense instead.

For example,
A bit more 7950x 65W and 125W ECO mode data including temperatures. Starting to see a trend ...
what this review actually used is a 65W and a 125W power budget, which are very close to "CPU package power" of 65W and 125W respectively. It's not "ECO mode" or TDP nonsense.
 

Gerdi

macrumors 6502
Apr 25, 2020
449
301
OK, let's use the above as a starting point:

M2 (NEON): ST Crypto = 3001; Max clock speed = 3.5 GHz
i9-11900K (AVX512): ST Crypto = 5201; Max clock speed = 5.3 GHz

If NEON were as effective as Intel's previous-gen AVX512 implementation for crypto, you would expect the i9's performance to be only 5.3/3.5 - 1 ≈ 51% higher than the M2's (based on clock speed alone). But in fact it's 5201/3001 - 1 ≈ 73% higher, indicating that Intel's previous-gen AVX512 implementation provides more benefit than NEON for this task.

Or if we consider the 7950X:
7950X (AVX512): ST Crypto = 7140; Max clock speed = 5.7 GHz

If NEON were as effective as AMD's current AVX512 implementation for crypto, you would expect the 7950X's performance to be only 5.7/3.5 - 1 ≈ 63% higher than the M2's (based on clock speed alone). But in fact it's 7140/3001 - 1 ≈ 138% higher, indicating that AMD's current-gen AVX512 implementation provides more benefit than NEON for this task.

In summary, based on simple math, you can see the differences in crypto performance cannot be explained by clock speed alone.

You are right, it is not explainable that way. To be honest, I was thinking more of CRC-32 and SHA-256, not AES, and that's what they are using here for crypto. AES does indeed feature much higher parallelism than CRC and SHA, so my statement no longer holds.
 
  • Like
Reactions: theorist9

mi7chy

macrumors G4
Oct 24, 2014
10,620
11,294
Let's try to avoid confusing ourselves, and the audience. Avoid marketing terms such as "ECO mode" and the like; quote the CPU package power or power budget in an absolute sense instead.

For example,

what this review actually used is a 65W and a 125W power budget, which are very close to "CPU package power" of 65W and 125W respectively. It's not "ECO mode" or TDP nonsense.

That's not accurate either. Power limits are set by inputting values for PPT, TDC, and EDC. Someone mentioned it taking hours to input a single temperature limit value, so good luck. ECO mode is an AMD term for PPT/TDC/EDC presets that are below balls-to-the-wall stock performance.
 

kvic

macrumors 6502a
Sep 10, 2015
516
460
That's not accurate either. Power limits are set by inputting values for PPT, TDC and EDC. Someone mentioned taking hours to input a single temperature limit value so good luck. ECO mode is an AMD term for PPT/TDC/EDC presets that are below balls to the wall stock performance.

I'm not sure what you're arguing against in my clarification. I simply borrowed a sentence from your post as an illustration and, at the same time, highlighted that the review you linked to is actually using absolute power, not TDP. Hence, in that article, 65W is 65W and 125W is 125W; no "if"s, "but"s, or other inflation is required to adjust the power figures.

In general, for AMD & Intel processors, once a power limit is set in UEFI, the CPU package power tracks that limit very closely.
 

EntropyQ3

macrumors 6502a
Mar 20, 2009
718
824
I disagree. It's nice to have a single coarse-grained value you can use to scan through numerous processors, so long as you maintain an awareness that it is coarse-grained. It's the lazy thinking that doesn't maintain that awareness that I think we want to avoid.

Besides, the only way to be serious about benchmarking is to analyze your own workflow, app-by-app, and develop your own benchmarks that correspond directly to your workflow, and who actually does that? I do*, but I know I'm in the minority, and I'm not going to designate those who don't as lazy.
The fundamental problem is that computer ”performance” simply isn’t a one dimensional property.

Even from a very coarse grained perspective perceived performance may hinge on core performance, number of cores, presence (and usage…) of dedicated accelerators, cache sizes, main memory bandwidth, RAM size, SSD speed….

And even when you limit yourself to a single one of those, typically core performance, there’s still a ton of caveats, which are to some extent illuminated by, for instance, the variance in the subtest results in GeekBench when comparing different processors.

So if Processor A scores 10% higher than Processor B in GeekBench, what does it tell you about which computer will be faster?
Nothing. It’s just fodder for marketing and fruitless bickering on forums.

Now sometimes, in good company, that bickering can actually turn up some interesting tidbits! But those are invariably in the domain of what isn’t shown by the number itself.

As you point out yourself, the key is understanding the requirements of the actual usage, which requires knowledge. A benchmark score doesn’t help with that.
 

pshufd

macrumors G4
Oct 24, 2013
10,146
14,572
New Hampshire
The fundamental problem is that computer ”performance” simply isn’t a one dimensional property.

Even from a very coarse grained perspective perceived performance may hinge on core performance, number of cores, presence (and usage…) of dedicated accelerators, cache sizes, main memory bandwidth, RAM size, SSD speed….

And even when you limit yourself to a single one of those, typically core performance, there’s still a ton of caveats, which are to some extent illuminated by, for instance, the variance in the subtest results in GeekBench when comparing different processors.

So if Processor A scores 10% higher than Processor B in GeekBench, what does it tell you about which computer will be faster?
Nothing. It’s just fodder for marketing and fruitless bickering on forums.

Now sometimes, in good company, that bickering can actually turn up some interesting tidbits! But those are invariably in the domain of what isn’t shown by the number itself.

As you point out yourself, the key is understanding the requirements of the actual usage, which requires knowledge. A benchmark score doesn’t help with that.

On Reddit, and sometimes MacRumors, someone comes in and says that they have a 2010-2017 iMac, mini, MacBook Air, or MacBook Pro and wonders what they should upgrade to with Apple Silicon, and whether a Mac Studio or 2021 MacBook Pro would be enough. I usually look up the Geekbench scores for their current model and then tell them the scores of the Apple Silicon Mac they are looking at.

If the Geekbench 5 score is 2 to 3 times that of the machine they are using, then the new one will likely be far more than they need unless there are other circumstances. Giving them the Geekbench 5 comparison generally assures them that the new machine will handle their old workload, unless they have some particular software that doesn't run well on Apple Silicon or they need a ton of RAM or external displays.
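
The rule of thumb amounts to nothing more than this (the scores are placeholders; look up real ones on the Geekbench Browser):

# Rough upgrade heuristic: compare GB5 scores of old and candidate Macs.
old_score = 650    # placeholder score for the current Mac
new_score = 1750   # placeholder score for the candidate Mac

ratio = new_score / old_score
if ratio >= 2:
    print(f"~{ratio:.1f}x the old score -- likely far more than needed")
else:
    print(f"~{ratio:.1f}x -- a closer call; compare the subscores too")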
 

theorist9

macrumors 68040
May 28, 2015
3,880
3,059
The fundamental problem is that computer ”performance” simply isn’t a one dimensional property.

Even from a very coarse grained perspective perceived performance may hinge on core performance, number of cores, presence (and usage…) of dedicated accelerators, cache sizes, main memory bandwidth, RAM size, SSD speed….

And even when you limit yourself to a single one of those, typically core performance, there’s still a ton of caveats, which are to some extent illuminated by, for instance, the variance in the subtest results in GeekBench when comparing different processors.

So if Processor A scores 10% higher than Processor B in GeekBench, what does it tell you about which computer will be faster?
Nothing. It’s just fodder for marketing and fruitless bickering on forums.

Now sometimes, in good company, that bickering can actually turn up some interesting tidbits! But those are invariably in the domain of what isn’t shown by the number itself.

As you point out yourself, the key is understanding the requirements of the actual usage, which requires knowledge. A benchmark score doesn’t help with that.
The US Federal Government requires that all cars list their EPA City, Highway, and Combined MPG (miles per gallon) figures. It uses these to help consumers gauge operating costs, to promote the purchase of more efficient vehicles, and to set federal efficiency requirements that all vehicle makers must meet.

Of course, none of these will tell an individual consumer what their actual fuel mileage will be, since everyone has different driving habits. Thus your position is that these should be tossed out the window. My argument, by contrast, is that these should be made available to consumers, with the caveat that "YMMV", which is what is actually done.

I.e., you're saying that, because they're not perfect, they should be discarded. I'm saying that the perfect is the enemy of the good.
 
  • Like
Reactions: pshufd

falainber

macrumors 68040
Mar 16, 2016
3,539
4,136
Wild West
You do not have to be surprised, because we have numbers for this and the M1 Ultra is significantly ahead. In the bigger picture, Zen 4 might be efficient for an x64 core, but it is miles behind more modern architectures like ARM or RISC-V.
There is nothing modern about ARM. It started in 1983.
 

EntropyQ3

macrumors 6502a
Mar 20, 2009
718
824
I.e., you're saying that, because they're not perfect, they should be discarded. I'm saying that the perfect is the enemy of the good.
Not really. Testing is well and good, and necessary to keep manufacturers at least borderline honest! SPEC in its early days ( and still to some extent) improved the industry significantly in that respect. It was a wild, wild West. 😉

But that benchmark was really meant as a tool for computer professionals. I have quite a bit of respect for John Poole, but I still feel that the aggregate score was a mistake. It’s not a flaw of the benchmark per se, but given how it is used it’s unfortunate. I won’t budge from that opinion, though it may be regarded as elitist, because I would really prefer that people were encouraged to understand their computing needs as opposed to comparing a number.

Put me on the side of education vs. tribalism. I hope we can see eye to eye on that.
 