
Citizen45

macrumors member
Apr 9, 2022
49
48
I think it needs to be said explicitly for all the CB23 pushers: Cinebench uses hand-optimized x86 SIMD code. Instead of actually rewriting that for Arm, Cinebench's Arm port relies on a library which autotranslates every x86 SIMD instruction to an equivalent sequence of NEON SIMD instructions.

This is a very quick and dirty way to stand up a port with okay performance. It is extremely far from being a true native port that is well optimized for Arm. If the situation were reversed, you'd be screaming to high heavens that x86 CPUs were being treated unfairly in the comparison - and you'd be right!
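
To illustrate the mechanism, here's a toy sketch in Python of what a translation layer in the spirit of sse2neon does (the NEON sequences shown are illustrative, not the library's actual output):

# Toy 1:N mapping of x86 SSE intrinsics to NEON instruction sequences.
x86_to_neon = {
    "_mm_add_ps": ["vaddq_f32"],          # direct 1:1 equivalent -- cheap
    "_mm_movemask_ps": ["vshrq_n_u32",    # NEON has no movemask, so one
                        "vsraq_n_u32",    # x86 instruction expands into a
                        "vmovq_n_u32"],   # multi-instruction sequence
}

# Cost of translation: NEON instructions emitted per x86 instruction.
for intrinsic, seq in x86_to_neon.items():
    print(f"{intrinsic}: {len(seq)} NEON instruction(s)")

Where the 1:N expansions pile up in hot SIMD loops, you lose performance relative to code written for NEON from the start.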

Stop using CB23 for cross-platform comparisons between x86 and Apple Silicon. It's simply pointless. Unless you like trolling, I guess.

So who on the Cinebench team do we gotta talk to in order to get them to properly optimize Cinebench for Apple Silicon?

It’s in their best interest to take the time to do this, because then Cinebench could be used as the gold standard CPU benchmark for any system without any asterisks.
 

senttoschool

macrumors 68030
Nov 2, 2017
2,626
5,482
It's crazy that the fight for the gaming crown has AMD and Intel pushing their CPUs past their sweet spot.
That's because the vast majority of people buying these CPUs are building gaming PCs.

Even people who buy these CPUs for productivity will game on the side.
 

Gerdi

macrumors 6502
Apr 25, 2020
449
301
Sorry, not following--those release notes go back to GB 5.0 and don't mention NEON, and the 5.1.0 release notes don't mention it either:


In addition, a Google search of the entire primatelabs.com website for NEON doesn't turn up anything (at least that I could see):

site:primatelabs.com neon

[Emoticon was not intentional!]

...other than this from 2013:
[Attached screenshot from 2013]

Could you please provide a link?

Also, how much acceleration does NEON provide compared to AVX512?

They probably have not mentioned it because NEON is the only way to get crypto acceleration on ARM CPUs.
Not sure what the question is about NEON vs. AVX512? Are you asking about crypto acceleration in particular?
 

theorist9

macrumors 68040
May 28, 2015
3,880
3,059
They probably have not mentioned it because NEON is the only way to get crypto acceleration on ARM CPUs.
Not sure what the question is about NEON vs. AVX512? Are you asking about crypto acceleration in particular?
I was thinking specifically of crypto. If you look at the ratio of the crypto subscore to the sum of the integer and FP subscores, you'll see it's: x86/AVX512 > x86/no AVX512 > AS/NEON, suggesting NEON falls short compared to AVX512 for accelerating this task:

crypto/(integer+FP)

i9-12900K (no AVX512): 1.23
Ryzen 5950X (no AVX512): 1.25

i9-11900K (AVX512): 1.55
Ryzen 7950X (AVX512): 1.74

M1 (NEON): 0.78
M2 (NEON): 0.78


Given this, if it's the case that most consumer software doesn't benefit from AVX512 acceleration, it seems that if you wanted a single GB-derived "figure of merit" for cross-platform comparisons that matches consumer usage, you might want to either exclude the crypto subscore (and use only the sum of the FP & integer subscores) or put very little weight on crypto.

I just did a bit of digging, and GB does the latter. So now I'll have to go back to the Tom's Hardware article and figure out why they think the crypto score had such an effect....
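
If anyone wants to recompute these ratios from raw results, it's a one-liner; the subscores here are hypothetical placeholders, so substitute real values from the result screenshots below:

# Ratio of the GB5 crypto subscore to the sum of the integer and FP subscores.
def crypto_ratio(crypto, integer, fp):
    return crypto / (integer + fp)

# Hypothetical example subscores -- read the real ones off a GB5 result page:
print(round(crypto_ratio(2800, 1750, 1850), 2))  # -> 0.78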


[Attached screenshot: Geekbench's subscore weighting]

Sources:
[Attached Geekbench 5 result screenshots, including for the M1, M2, i9-11900K, and i9-12900K]
 

Last edited:

leman

macrumors Core
Oct 14, 2008
19,521
19,673
It’s in their best interest to take the time to do this, because then Cinebench could be used as the gold standard CPU benchmark for any system without any asterisks.

No it won’t. It’s a very specific workload with very specific demands. Using CB to quantify performance is like judging the speed of a car by how much time it takes to haul a ton of potatoes from A to B. That’s why modern benchmarks use a mix of different workloads with different characteristics.
 

Gerdi

macrumors 6502
Apr 25, 2020
449
301
I was thinking specifically of crypto. If you look at the ratio of the crypto subscore to the sum of the integer and FP subscores, you'll see it's: x86/AVX512 > x86/no AVX512 > AS/NEON, suggesting NEON falls short compared to AVX512 for accelerating this task.

You have to understand the crypto instructions. Crypto algorithms are largely serial and have very limited instruction-level parallelism. Once that instruction-level parallelism is fully utilized, the only way to speed things up is to increase the clock frequency. And this is what you see here.
This also means the conclusion that NEON falls short with respect to crypto is just wrong. That would only be the case if the discrepancy in scores could not be explained by the difference in clock frequency.
 
Last edited:

theorist9

macrumors 68040
May 28, 2015
3,880
3,059
You have to understand the crypto instructions. Crypto algorithms are largely serial and have very limited instruction-level parallelism. Once that instruction-level parallelism is fully utilized, the only way to speed things up is to increase the clock frequency. And this is what you see here.
This also means the conclusion that NEON falls short with respect to crypto is just wrong. That would only be the case if the discrepancy in scores could not be explained by the difference in clock frequency.
You've presented your claims in a way I have a hard time following--it feels like you've omitted a key element.

First, note that I wasn't referring to the "discrepancy in scores", but rather the discrepancy in score ratios (crypto/(int+FP)), i.e., in the relative crypto performance. And clock speed is going to affect all three of those components. For instance, if clock speed matters for relative crypto performance, why do the M1 (3.2 GHz) and M2 (3.5 GHz) have the same ratios?

Second, even if clock speed did matter for relative crypto performance, the score ratio of the i9-11900K (AVX512) is 2x that of the M2 (NEON), yet the i9's max clock speed is only 1.5x faster. Thus we can't explain the discrepancy in scores purely on the basis of clock frequency.
 
Last edited:

kvic

macrumors 6502a
Sep 10, 2015
516
460
Worth re-posting here from this thread..

Question

Are you comparing total package power* of the M1 Ultra to just the power draw** of the 7950X processor?

Answer

I think what you suggested is more than a fair comparison to Apple silicon.

Apple silicon is monolithic and superbly efficient. When the GPU/NPU/etc. are not in use, they consume close to zero power. With the Ryzen 7950X's chiplet design, by contrast, the I/O die eats into the power budget in a non-trivial way.

For example, cap both the M1 Ultra and the 7950X at a 65W power limit (not TDP, but actual power). The M1 Ultra could devote almost the full 65W budget to its 20 CPU cores. At the same 65W budget, the I/O die in the 7950X will consume ~15W, leaving around 50W for the 16 Zen 4 cores. (See the sketch after the footnotes below.)

Zen 4 is very efficient once off the ramp of its peak power limit (~230W). I won't be surprised if the 7950X is on par with, if not better than, the M1 Ultra at a similar power budget running Geekbench 5.

Footnotes

* "total package power" as in "CPU package power" reported on Apple silicon by Apple's Powermetric tool.

** "just the power draw" as in "CPU package power" reported from Zen 4's on-die SMU.
 

Gerdi

macrumors 6502
Apr 25, 2020
449
301
You've presented your claims in a way I have a hard time following--it feels like you've omitted a key element.

First, note that I wasn't referring to the "discrepancy in scores", but rather the discrepancy in score ratios (crypto/(int+FP)), i.e., in the relative crypto performance. And clock speed is going to affect all three of those components. For instance, if clock speed matters for relative crypto performance, why do the M1 (3.2 GHz) and M2 (3.5 GHz) have the same ratios?

Second, even if clock speed did matter for relative crypto performance, the score ratio of the i9-11900K (AVX512) is 2x that of the M2 (NEON), yet the i9's max clock speed is only 1.5x faster. Thus we can't explain the discrepancy in scores purely on the basis of clock frequency.

I do not believe that I omitted anything. Performance is essentially the product of an instruction-level parallelism metric (say, IPC) and frequency. For INT+FP, the IPC of Apple cores is much higher than that of x64, hence a relatively high score despite the low frequency. For crypto, IPC is limited, and hence probably very similar between Apple Silicon and x64, so the performance difference for crypto just reflects the frequency difference.
In summary, IPC is higher for INT+FP on Apple Silicon but very similar for crypto compared to x64. From this you can derive the ratios in question.
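
To illustrate with a toy model (the IPC values are made up purely for illustration; they are not measurements):

# Toy model: subscore ~ IPC x clock (GHz).
def score(ipc, ghz):
    return ipc * ghz

# Hypothetical IPCs: Apple well ahead on INT+FP, parity on crypto.
apple = {"intfp_ipc": 8.0, "crypto_ipc": 4.0, "ghz": 3.5}
x64 = {"intfp_ipc": 5.0, "crypto_ipc": 4.0, "ghz": 5.3}

for name, c in (("Apple", apple), ("x64", x64)):
    ratio = score(c["crypto_ipc"], c["ghz"]) / score(c["intfp_ipc"], c["ghz"])
    print(name, round(ratio, 2))  # Apple 0.5, x64 0.8

# Note that frequency cancels out of each chip's own crypto/(INT+FP)
# ratio, so under this model the ratio gap reflects only the IPC gap.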
 
  • Like
Reactions: ArkSingularity

Gerdi

macrumors 6502
Apr 25, 2020
449
301
Zen 4 is very efficient once off the ramp of its peak power limit (~230W). I won't be surprised if the 7950X is on par with, if not better than, the M1 Ultra at a similar power budget running Geekbench 5.

You do not have to be surprised, because we have numbers for this and the M1 Ultra is significantly ahead. In the bigger picture, Zen 4 might be efficient for an x64 core, but it is miles behind more modern architectures like ARM or RISC-V.
 

kvic

macrumors 6502a
Sep 10, 2015
516
460
You do not have to be surprised, because we have numbers for this and the M1 Ultra is significantly ahead. In the bigger picture, Zen 4 might be efficient for an x64 core, but it is miles behind more modern architectures like ARM or RISC-V.

Show us your GB5 numbers for the M1 Ultra and 7950X at a similar power budget.
 

theorist9

macrumors 68040
May 28, 2015
3,880
3,059
For crypto, IPC is limited, and hence probably very similar between Apple Silicon and x64, so the performance difference for crypto just reflects the frequency difference.
OK, let's use the above as a starting point:

M2 (NEON): ST Crypto = 3001; Max clock speed = 3.5 GHz
i9-11900K (AVX512): ST Crypto = 5201; Max clock speed = 5.3 GHz

If NEON were as effective as Intel's previous-gen AVX512 implementation for crypto, you would expect the i9's performance to be only 5.3/3.5 - 1 ≈ 51% higher than the M2's (based on clock speed alone). But in fact it's 5201/3001 - 1 ≈ 73% higher, indicating that Intel's previous-gen AVX512 implementation provides more benefit than NEON for this task.

Or if we consider the 7950X:
7950X (AVX512): ST Crypto = 7140; Max clock speed = 5.7 GHz

If NEON were as effective as AMD's current AVX512 implementation for crypto, you would expect the 7950X's performance to be only 5.7/3.5 - 1 ≈ 63% higher than the M2's (based on clock speed alone). But in fact it's 7140/3001 - 1 ≈ 138% higher, indicating that AMD's current-gen AVX512 implementation provides more benefit than NEON for this task.

In summary, based on simple math, you can see the differences in crypto performance cannot be explained by clock speed alone.
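
Here's the same arithmetic as a quick script, using the quoted ST crypto scores and max clocks:

# Expected speedup over the M2 if clock frequency were the only factor,
# vs. the speedup actually observed in the GB5 ST crypto subscore.
m2_score, m2_clock = 3001, 3.5
chips = {
    "i9-11900K (AVX512)": (5201, 5.3),
    "7950X (AVX512)": (7140, 5.7),
}
for name, (score, clock) in chips.items():
    expected = clock / m2_clock - 1
    observed = score / m2_score - 1
    print(f"{name}: expected {expected:.0%}, observed {observed:.0%}")
# i9-11900K (AVX512): expected 51%, observed 73%
# 7950X (AVX512): expected 63%, observed 138%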
 
Last edited:

theorist9

macrumors 68040
May 28, 2015
3,880
3,059
Andrew Cunningham of Ars Technica provided some good data on the actual power consumption of the new Zen 4 processors. With a Handbrake CPU encode, they found the package power consumption in "65W Eco Mode" was a consistent 90W for the 7600X and 7950X, as well as for the Zen 3 5600X:

[Attached chart: package power during the Handbrake encode, from the Ars Technica review]

 
Last edited:

kvic

macrumors 6502a
Sep 10, 2015
516
460
Let's try to avoid confusing ourselves, and the audience. Avoid marketing terms such as "ECO mode" and the like; quote the CPU package power or power budget in an absolute sense instead.

For example,
A bit more 7950x 65W and 125W ECO mode data including temperatures. Starting to see a trend ...
what this review actually used is a 65W and a 125W power budget, which are very close to "CPU package power" of 65W and 125W respectively. It's not "ECO mode" or TDP nonsense.
 

Gerdi

macrumors 6502
Apr 25, 2020
449
301
OK, let's use the above as a starting point:

M2 (NEON): ST Crypto = 3001; Max clock speed = 3.5 GHz
i9-11900K (AVX512): ST Crypto = 5201; Max clock speed = 5.3 GHz

If NEON were as effective as Intel's previous-gen AVX512 implementation for crypto, you would expect the i9's performance to be only 5.3/3.5 - 1 ≈ 51% higher than the M2's (based on clock speed alone). But in fact it's 5201/3001 - 1 ≈ 73% higher, indicating that Intel's previous-gen AVX512 implementation provides more benefit than NEON for this task.

Or if we consider the 7950X:
7950X (AVX512): ST Crypto = 7140; Max clock speed = 5.7 GHz

If NEON were as effective as AMD's current AVX512 implementation for crypto, you would expect the 7950X's performance to be only 5.7/3.5 - 1 ≈ 63% higher than the M2's (based on clock speed alone). But in fact it's 7140/3001 - 1 ≈ 138% higher, indicating that AMD's current-gen AVX512 implementation provides more benefit than NEON for this task.

In summary, based on simple math, you can see the differences in crypto performance cannot be explained by clock speed alone.

You are right, it is not explainable that way. To be honest, I was thinking more of CRC-32 and SHA-256, not AES, and that's what they are using here for crypto. AES does indeed feature much higher parallelism than CRC and SHA, so my statement no longer holds.
 
  • Like
Reactions: theorist9

mi7chy

macrumors G4
Oct 24, 2014
10,620
11,294
Let's try to avoid confusing ourselves, and the audience. Avoid marketing terms such as "ECO mode" and the like; quote the CPU package power or power budget in an absolute sense instead.

For example,

what this review actually used is a 65W and a 125W power budget, which are very close to "CPU package power" of 65W and 125W respectively. It's not "ECO mode" or TDP nonsense.

That's not accurate either. Power limits are set by inputting values for PPT, TDC, and EDC. Someone mentioned it taking hours to input a single temperature limit value, so good luck. ECO mode is an AMD term for PPT/TDC/EDC presets that are below balls-to-the-wall stock performance.
 

kvic

macrumors 6502a
Sep 10, 2015
516
460
That's not accurate either. Power limits are set by inputting values for PPT, TDC and EDC. Someone mentioned taking hours to input a single temperature limit value so good luck. ECO mode is an AMD term for PPT/TDC/EDC presets that are below balls to the wall stock performance.

I'm not sure what you're arguing against in my clarification. I simply borrowed a sentence from your post as an illustration and, at the same time, highlighted that the review you linked to is actually using absolute power, not TDP. Hence, in that article, 65W is 65W and 125W is 125W; no "if"s, "but"s, or other inflation is required to adjust the power figures.

In general, for AMD & Intel processors, once a power limit is set in UEFI, the CPU package power tracks that limit very closely.
 

EntropyQ3

macrumors 6502a
Mar 20, 2009
718
824
I disagree. It's nice to have a single coarse-grained value you can use to scan through numerous processors, so long as you maintain an awareness that it is coarse-grained. It's the lazy thinking that doesn't maintain that awareness that I think we want to avoid.

Besides, the only way to be serious about benchmarking is to analyze your own workflow, app-by-app, and develop your own benchmarks that correspond directly to your workflow, and who actually does that? I do*, but I know I'm in the minority, and I'm not going to designate those who don't as lazy.
The fundamental problem is that computer ”performance” simply isn’t a one dimensional property.

Even from a very coarse grained perspective perceived performance may hinge on core performance, number of cores, presence (and usage…) of dedicated accelerators, cache sizes, main memory bandwidth, RAM size, SSD speed….

And even when you limit yourself to a single one of those, typically core performance, there’s still a ton of caveats, which are to some extent illuminated by, for instance, the variance in the subtest results in GeekBench when comparing different processors.

So if Processor A scores 10% higher than Processor B in GeekBench, what does it tell you about which computer will be faster?
Nothing. It’s just fodder for marketing and fruitless bickering on forums.

Now sometimes, in good company, that bickering can actually turn up some interesting tidbits! But those are invariably in the domain of what isn’t shown by the number itself.

As you point out yourself, the key is understanding the requirements of the actual usage, which requires knowledge. A benchmark score doesn’t help with that.
 

pshufd

macrumors G4
Oct 24, 2013
10,146
14,572
New Hampshire
The fundamental problem is that computer ”performance” simply isn’t a one dimensional property.

Even from a very coarse grained perspective perceived performance may hinge on core performance, number of cores, presence (and usage…) of dedicated accelerators, cache sizes, main memory bandwidth, RAM size, SSD speed….

And even when you limit yourself to a single one of those, typically core performance, there’s still a ton of caveats, which are to some extent illuminated by, for instance, the variance in the subtest results in GeekBench when comparing different processors.

So if Processor A scores 10% higher than Processor B in GeekBench, what does it tell you about which computer will be faster?
Nothing. It’s just fodder for marketing and fruitless bickering on forums.

Now sometimes, in good company, that bickering can actually turn up some interesting tidbits! But those are invariably in the domain of what isn’t shown by the number itself.

As you point out yourself, the key is understanding the requirements of the actual usage, which requires knowledge. A benchmark score doesn’t help with that.

On Reddit, and sometimes MacRumors, someone comes in and says that they have a 2010-2017 iMac, mini, MacBook Air, or MacBook Pro and wonders what they should upgrade to with Apple Silicon, and whether a Mac Studio or 2021 MacBook Pro would be enough. I usually look up the Geekbench scores for their current model and then tell them the scores of the Apple Silicon Mac they are looking at.

If the Geekbench 5 score is 2 to 3 times that of the machine they are using, then the new one will likely be far more than they need unless there are other circumstances. Giving them the Geekbench 5 comparison generally assures them that the new machine will handle their old workload, unless they have some particular software that doesn't run well on Apple Silicon or they need a ton of RAM or external displays.
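
The rule of thumb amounts to nothing more than this (the scores are placeholders; look up real ones on the Geekbench Browser):

# Rough upgrade heuristic: compare GB5 scores of old and candidate Macs.
old_score = 650    # placeholder score for the current Mac
new_score = 1750   # placeholder score for the candidate Mac

ratio = new_score / old_score
if ratio >= 2:
    print(f"~{ratio:.1f}x the old score -- likely far more than needed")
else:
    print(f"~{ratio:.1f}x -- a closer call; compare the subscores too")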
 

theorist9

macrumors 68040
May 28, 2015
3,880
3,059
The fundamental problem is that computer ”performance” simply isn’t a one dimensional property.

Even from a very coarse grained perspective perceived performance may hinge on core performance, number of cores, presence (and usage…) of dedicated accelerators, cache sizes, main memory bandwidth, RAM size, SSD speed….

And even when you limit yourself to a single one of those, typically core performance, there’s still a ton of caveats, which are to some extent illuminated by, for instance, the variance in the subtest results in GeekBench when comparing different processors.

So if Processor A scores 10% higher than Processor B in GeekBench, what does it tell you about which computer will be faster?
Nothing. It’s just fodder for marketing and fruitless bickering on forums.

Now sometimes, in good company, that bickering can actually turn up some interesting tidbits! But those are invariably in the domain of what isn’t shown by the number itself.

As you point out yourself, the key is understanding the requirements of the actual usage, which requires knowledge. A benchmark score doesn’t help with that.
The US Federal Government requires that all cars list their EPA City, Highway, and Combined MPG (miles per gallon) figures. It uses these to help consumers gauge operating costs, to promote the purchase of more efficient vehicles, and to set federal efficiency requirements that all vehicle makers must meet.

Of course, none of these will tell an individual consumer what their actual fuel mileage will be, since everyone has different driving habits. Thus your position is that these should be tossed out the window. My argument, by contrast, is that these should be made available to consumers, with the caveat that "YMMV", which is what is actually done.

I.e., you're saying that, because they're not perfect, they should be discarded. I'm saying that the perfect is the enemy of the good.
 
  • Like
Reactions: pshufd

falainber

macrumors 68040
Mar 16, 2016
3,539
4,136
Wild West
You do not have to be surprised, because we have numbers for this and the M1 Ultra is significantly ahead. In the bigger picture, Zen 4 might be efficient for an x64 core, but it is miles behind more modern architectures like ARM or RISC-V.
There is nothing modern about ARM. It started in 1983.
 

EntropyQ3

macrumors 6502a
Mar 20, 2009
718
824
I.e., you're saying that, because they're not perfect, they should be discarded. I'm saying that the perfect is the enemy of the good.
Not really. Testing is well and good, and necessary to keep manufacturers at least borderline honest! SPEC in its early days ( and still to some extent) improved the industry significantly in that respect. It was a wild, wild West. 😉

But that benchmark was really meant as a tool for computer professionals. I have quite a bit of respect for John Poole, but I still feel that the aggregate score was a mistake. It’s not a flaw of the benchmark per se, but given how it is used it’s unfortunate. I won’t budge from that opinion, though it may be regarded as elitist, because I would really prefer that people were encouraged to understand their computing needs as opposed to comparing a number.

Put me on the side of education vs. tribalism. I hope we can see eye to eye on that.
 