
crazy dave

macrumors 65816
Sep 9, 2010
1,453
1,229
By the way, using this as a projection: this R23 score represents a ~45% reduction in performance against the full i9 12900K. If we apply the same reduction to the SPEC scores, we are in the ballpark of 44 (int) and 45 (fp), which would be 20% and 45% slower than M1 Pro, respectively.

I think the final binned mobile i9 at 45W will be more or less as fast as M1 Pro in SPECint (maybe 5-10% slower) and around 30-40% slower in SPECfp. But it will have a slight edge in short burst workloads, and of course, Cinebench.

The SPEC memory pressure subtests should be somewhat less sensitive to watts than the compute subtests. So if the proper compute scaling is indeed 45%, the overall SPEC int/fp scores should be reduced by less than that. Having said that, I have my doubts that the proper compute MT scaling down to a mere 35W is only 45% for an i9 ADL. Based on other numbers I’m seeing for the desktop i9 ADL, that doesn’t make sense to me.
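
To make the arithmetic in this exchange explicit, here is a rough sketch in Python. It only uses the figures quoted above (a 45% reduction, projected scores of 44 and 45, and "20% and 45% slower than M1 Pro") and backs out the scores they imply. The memory_fraction and memory_reduction values in the last function are purely hypothetical placeholders, and the linear blend is a simplification: SPEC aggregates subtests with a geometric mean, so treat this as an illustration of the direction of the effect, not a prediction.

```python
def apply_reduction(score, reduction):
    """Scale a score down by a fractional reduction (0.45 means 45% lower)."""
    return score * (1.0 - reduction)


def implied_original(reduced_score, reduction):
    """Invert apply_reduction: the score that `reduced_score` was derived from."""
    return reduced_score / (1.0 - reduction)


REDUCTION = 0.45                       # R23-based reduction quoted in the post
projected_int, projected_fp = 44.0, 45.0

# Desktop i9 scores implied by the projection.
baseline_int = implied_original(projected_int, REDUCTION)
baseline_fp = implied_original(projected_fp, REDUCTION)

# M1 Pro scores implied by "20% and 45% slower than M1 Pro".
m1_pro_int = implied_original(projected_int, 0.20)
m1_pro_fp = implied_original(projected_fp, 0.45)


def blended_reduction(compute_reduction, memory_fraction, memory_reduction):
    """Linear blend of two reductions.

    memory_fraction and memory_reduction are hypothetical inputs: the share
    of the suite that is memory-bound and how much that share slows down.
    """
    return ((1.0 - memory_fraction) * compute_reduction
            + memory_fraction * memory_reduction)


if __name__ == "__main__":
    print(f"implied desktop i9: int {baseline_int:.1f}, fp {baseline_fp:.1f}")
    print(f"implied M1 Pro:     int {m1_pro_int:.1f}, fp {m1_pro_fp:.1f}")
    # Hypothetical: 30% of the suite is memory-bound and only slows by 20%.
    print(f"blended reduction:  {blended_reduction(REDUCTION, 0.3, 0.2):.3f}")
```

Whatever placeholder values are chosen, as long as the memory-bound share slows down by less than the compute share, the blended reduction comes out below 45%, which is the point being made about the overall SPEC scores.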

Edit: Ah I see @Kpjoslee already made the above point about SPEC memory pressure tests
 

crazy dave

macrumors 65816
Sep 9, 2010
1,453
1,229
Cinebench is popular and it’s a decent tool to test throttling under high load. But it was never a particularly good benchmark, especially given its limited scope. It was only last year, when people looked into CPU utilization, that it became apparent how “not good” it really is.

On x86, utilization doesn’t seem to be a problem for CB23. Anandtech updated their article and posted new MT results including power usage in different scenarios. One thing Andrei noted: ADL i9 only uses 200+W in specific workloads like CB23. Otherwise it’s mostly in the 160W range when going full tilt. Part of the difference for some of these workloads? AVX2.


Based on this: my hypothesis is that it’s employing AVX2 vector instructions in x86 and isn’t using the Neon vector instructions in arm64. So the firestorm cores aren’t being pushed like they should be.
 
  • Like
Reactions: Stratus Fear

leman

macrumors Core
Original poster
Oct 14, 2008
19,521
19,678
On x86, utilization doesn’t seem to be a problem for CB23. Anandtech updated their article and posted new MT results including power usage in different scenarios. One thing Andrei noted: ADL i9 only uses 200+W in specific workloads like CB23. Otherwise it’s mostly in the 160W range when going full tilt. Part of the difference for some of these workloads? AVX2.

It was posted on Anandtech forums that Zen3 also shows surprisingly low power consumption during R23 runs… and of course, Cinebench scales extremely well with SMT. This leads me to believe that it relies on long dependent chains of computation.
Based on this: my hypothesis is that it’s employing AVX2 vector instructions in x86 and isn’t using the Neon vector instructions in arm64. So the firestorm cores aren’t being pushed like they should be.

If Cinebench is not using Neon, it would be an extremely unfair benchmark for M1. I am sure it does use SIMD on ARM, maybe not very efficiently, but it has to. There is no way M1 can match AVX2-enabled workload otherwise.

P.S. I just checked, R23 uses Intel Embree which supports Neon. But it’s likely not fully optimized.

P.P.S. Looked some more and it seems Embree is using the SSE2Neon library under the hood. Basically, it’s coded using Intel SIMD, with an auxiliary library that implements Intel SIMD intrinsics on top of Neon. So there is definitely going to be some performance loss on M1. Frankly, this benchmark should not be used for M1 comparisons anymore.
 
Last edited:

crazy dave

macrumors 65816
Sep 9, 2010
1,453
1,229
It was posted on Anandtech forums that Zen3 also shows surprisingly low power consumption during R23 runs… and of course, Cinebench scales extremely well with SMT. This leads me to believe that it relies on long dependent chains of computation.


If Cinebench is not using Neon, it would be an extremely unfair benchmark for M1. I am sure it does use SIMD on ARM, maybe not very efficiently, but it has to. There is no way M1 can match AVX2-enabled workload otherwise.

Hmmm interesting. I’d not seen that about Zen 3 … maybe only optimized for Intel then? Because on Intel it can go flat out. It’s one of the highest power workloads.

However that Zen 3 comment stands in stark contrast to the same reviewer @Kpjoslee references who said that i9 ADL’s efficiency at 35W matches the M1 Max. Now it looks like his 5950x might be overclocked because its score is higher than the new ADL i9 whereas on Anandtech it is lower. But in his test it’s pulling 240+W. Also his 5800 is definitely pulling near its peak.

It’s a pity Andrei’s Zen3 mobile test machine died before his M1 Max article. We might’ve gotten an answer otherwise, as Anandtech didn’t explicitly measure wattage for CB23.
 
Last edited:

crazy dave

macrumors 65816
Sep 9, 2010
1,453
1,229
P.S. I just checked, R23 uses Intel Embree which supports Neon. But it’s likely not fully optimized.

P.P.S. Looked some more and it seems Embree is using the SSE2Neon library under the hood. Basically, it’s coded using Intel SIMD, with an auxiliary library that implements Intel SIMD intrinsics on top of Neon. So there is definitely going to be some performance loss on M1. Frankly, this benchmark should not be used for M1 comparisons anymore.

Got it. You know when I saw those power figures for Intel CB23 and Andrei’s comment, I figured that it had to be something like that, that the vector instructions were either missing or very suboptimal. Makes sense now. There may be other things wrong with it of course. But that’s probably a big part of it.

How does Intel embree perform on Zen3? Do you know?
 

leman

macrumors Core
Original poster
Oct 14, 2008
19,521
19,678
Hmmm interesting. I’d not seen that about Zen 3 … maybe only optimized for Intel then? Because on Intel it can go flat out. It’s one of the highest power workloads.

There were folks who claimed that single-threaded CB only pulls 9 watts on mobile AMD. Regardless, I don’t see much contradiction here. Of course power draw will increase dramatically as you bump the clocks and use SMT.

How does Intel embree perform on Zen3? Do you know?

Well, Cinebench seems to perform amazingly well on Zen3, so there shouldn’t be a problem here. There appear to be major gains from AVX2 and SMT. The M1 path with emulated intrinsics likely uses the SSE path, which has a lower branching factor if I understood it correctly, so the x86 and ARM paths are probably not even running the same workload.
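
To illustrate why a different branching factor would mean a different workload, here is a small Python sketch. It assumes, hypothetically, a 4-wide tree on the SSE path and an 8-wide tree on the AVX path (matching four and eight float lanes per register); the post only says the SSE path has "a lower branching factor", so the exact widths and the primitive count are illustrative, not taken from Cinebench or Embree.

```python
def tree_depth(num_leaves, branching_factor):
    """Levels needed for a full tree with this branching factor to hold num_leaves."""
    if num_leaves < 1 or branching_factor < 2:
        raise ValueError("need at least one leaf and a branching factor of 2+")
    depth = 0
    capacity = 1
    while capacity < num_leaves:
        capacity *= branching_factor   # each extra level multiplies leaf capacity
        depth += 1
    return depth


if __name__ == "__main__":
    primitives = 1_000_000             # hypothetical scene size
    for width in (4, 8):               # assumed SSE-style and AVX-style tree widths
        print(f"branching factor {width}: depth {tree_depth(primitives, width)}")
```

A narrower tree is deeper, so each ray visits more nodes and does less work per node. Under these assumptions the two code paths traverse differently shaped data structures, which is why the per-platform scores would not be measuring quite the same thing.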
 

crazy dave

macrumors 65816
Sep 9, 2010
1,453
1,229
There were folks who claimed that single-threaded CB only pulls 9 watts on mobile AMD. Regardless, I don’t see much contradiction here. Of course power draw will increase dramatically as you bump the clocks and use SMT.

Yeah, given that it’s clearly OC’d, the 5950x didn’t set off any bells. But his 5800 also looked like it was running at peak wattage and looked like a normal 5800, but maybe not. SMT generally doesn’t increase power if the core is already fully loaded. Of course if it isn’t (which is the claim), then yeah, SMT would load it fully and increase power as the 2nd thread increases the core resources used …

[Edit: yeah this might explain why the Zen 3 cores are so far behind in CB23 ST wrt ADL i9 but catch up in MT - though that could also be because of how good AMD’s multicore design still is relative to Intel. This would mean CB23 is optimized for Intel first, maybe somewhat for AMD, and not so much for the M1 - not great for a “multi-platform” benchmark]

Anyway regardless of the Zen 3 issue, I think you’ve gotten to the bottom of the M1 issue pretty well.

Well, Cinebench seems to perform amazingly well on Zen3, so there shouldn’t be a problem here. There appear to be major gains from AVX2 and SMT. The M1 path with emulated intrinsics likely uses the SSE path, which has a lower branching factor if I understood it correctly, so the x86 and ARM paths are probably not even running the same workload.

Huh that’s almost even worse. Source code is available? Where are you seeing this?
 
Last edited:
  • Like
Reactions: Stratus Fear

diamond.g

macrumors G4
Mar 20, 2007
11,438
2,665
OBX
Intel is pulling compiler shens again?!

[Image: 40f.png]
 

leman

macrumors Core
Original poster
Oct 14, 2008
19,521
19,678
Huh that’s almost even worse. Source code is available? Where are you seeing this?

I'm just thinking out loud! Not accusing anyone of anything and not claiming any foul play :D It's just that some embree papers I've seen suggest that branching factors are different on SSE and AVX implementations but I have never used embree and I have no idea whether this is still the case.

But regardless of all this, yes, there is good reason to think that Cinebench is not a good test for M1. It relies on an SSE-to-Neon translation layer instead of using ARM SIMD directly, it uses the older SSE codepath, and it probably is not written with Apple's four 128-bit SIMD units in mind (x86 CPUs usually have two SIMD units). The last point in particular could explain why we see underutilization on Apple's hardware.
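
A back-of-the-envelope version of the SIMD unit argument, sketched in Python. The unit counts come from the post (four 128-bit units on Apple, two on a typical x86 core); treating the two x86 units as 256-bit AVX2 units, and assuming each unit can start one full-width operation per cycle, are simplifying assumptions that ignore FMA, latency and port restrictions.

```python
def float32_lanes_per_cycle(units, register_bits):
    """Peak float32 lanes processed per cycle if every unit is kept busy."""
    return units * (register_bits // 32)


def unit_utilization(independent_ops, units):
    """Fraction of SIMD units kept busy given the independent ops available per cycle."""
    return min(independent_ops, units) / units


apple_neon = float32_lanes_per_cycle(units=4, register_bits=128)
x86_avx2 = float32_lanes_per_cycle(units=2, register_bits=256)   # assumed 2 x 256-bit
x86_sse = float32_lanes_per_cycle(units=2, register_bits=128)

if __name__ == "__main__":
    print("Apple, 4 x 128-bit:", apple_neon, "lanes/cycle")
    print("x86 AVX2, 2 x 256-bit:", x86_avx2, "lanes/cycle")
    print("x86 SSE, 2 x 128-bit:", x86_sse, "lanes/cycle")
    # Code that only exposes two independent vector ops per cycle
    # leaves half of a four-unit design idle.
    print("utilization on 4 units:", unit_utilization(independent_ops=2, units=4))
```

Under these assumptions the peak width per cycle is the same for four 128-bit units and two 256-bit units, but reaching it on the four-unit design needs twice as many independent vector operations in flight. Code tuned for two units would then show exactly the kind of underutilization described above.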
 
Last edited:
  • Like
Reactions: psychicist

JMacHack

Suspended
Mar 16, 2017
1,965
2,424
This conversation got balls deep into bench discussion, I’m gonna wait until people have alder lake laptops to make any opinions. I think this is the best course of action.
 

ADGrant

macrumors 68000
Mar 26, 2018
1,689
1,059
This conversation got balls deep into bench discussion, I’m gonna wait until people have alder lake laptops to make any opinions. I think this is the best course of action.
Yes, particularly as no mobile Alder Lake CPUs have even been announced yet.
 

zarathu

macrumors 6502a
May 14, 2003
652
362
How will the conversation go when Apple puts out a MacPro with the M1 mid next year?
 

pearvsapple

macrumors 6502
Feb 1, 2012
421
180
12900k set to 30w to match M1 Max and it was still able to beat it despite using an inferior process node. Alder Lake is clearly superior, and Meteor Lake will completely destroy everything in the market. Chipzilla is fully woke.

 
  • Haha
Reactions: Romain_H

leman

macrumors Core
Original poster
Oct 14, 2008
19,521
19,678
12900k set to 30w to match M1 Max and it was still able to beat it despite using an inferior process node. Alder Lake is clearly superior, and Meteor Lake will completely destroy everything in the market. Chipzilla is fully woke.

Does that make you happy? ;) Sure, according to this video, 12900k at 35W CPU power (45W package power) outperforms M1 Pro at 34W package power (which includes GPU and RAM!!!) in a software suite that is known to not run well on M1 architecture. In the meantime the M1 Pro/Max performs as well as the unlocked i9-12900k on subsets of the industry-standard SPEC benchmark suite.
 

crazy dave

macrumors 65816
Sep 9, 2010
1,453
1,229
It relies on an SSE-to-Neon translation layer instead of using ARM SIMD directly, it uses the older SSE codepath, and it probably is not written with Apple's four 128-bit SIMD units in mind (x86 CPUs usually have two SIMD units).

How do you know this? I’m having trouble finding info.
 

leman

macrumors Core
Original poster
Oct 14, 2008
19,521
19,678
How do you know this? I’m having trouble finding info.


There doesn't seem to be a direct Neon implementation. Embree relies on https://github.com/embree/embree/blob/master/common/simd/arm/emulation.h to implement x86 SIMD intrinsics. And the sse2neon header it uses, https://github.com/embree/embree/blob/master/common/simd/arm/sse2neon.h, does not support AVX, so Embree running on M1 will be limited to the SSE codepath.
 

crazy dave

macrumors 65816
Sep 9, 2010
1,453
1,229

There doesn't seem to be a direct Neon implementation. Embree relies on https://github.com/embree/embree/blob/master/common/simd/arm/emulation.h to implement x86 SIMD intrinsics. And the sse2neon header it uses, https://github.com/embree/embree/blob/master/common/simd/arm/sse2neon.h, does not support AVX, so Embree running on M1 will be limited to the SSE codepath.

I was looking for CB23 source code! Thanks. :)
 

Rigby

macrumors 603
Aug 5, 2008
6,257
10,215
San Jose, CA
P.P.S. Looked some more and it seems Embree is using the SSE2Neon library under the hood. Basically, it’s coded using Intel SIMD, with an auxiliary library that implements Intel SIMD intrinsics on top of Neon. So there is definitely going to be some performance loss on M1. Frankly, this benchmark should not be used for M1 comparisons anymore.
Maybe. But it could also be an intrinsic weakness of the M1 SIMD instructions compared to AVX2. For example, Neon is only 128-bit, while AVX2 is 256.
 