M1 single core not as fast as it seems in benchmarks vs x86

cocoua · Nov 28, 2021

I’ve just seen this, and it seems legit.

Exclusive: Why Apple M1 Single "Core" Comparisons Are Fundamentally Flawed (With Benchmarks)

I have something pretty exciting for our readers today; something that almost everyone appears to have missed in the clamor for Apple M1 benchmark comparisons. What if I told you that pretty much all of the single-core benchmark comparisons between the Apple M1 and modern x86 processors you see...

wccftech.com

It says using both threads on one core in x86 would beat M1 in single core.

What do you think?

cmaier · Nov 28, 2021

cocoua said:
I’ve just seen this, and it seems legit.

Exclusive: Why Apple M1 Single "Core" Comparisons Are Fundamentally Flawed (With Benchmarks)

I have something pretty exciting for our readers today; something that almost everyone appears to have missed in the clamor for Apple M1 benchmark comparisons. What if I told you that pretty much all of the single-core benchmark comparisons between the Apple M1 and modern x86 processors you see...

wccftech.com

It says using both threads on one core in x86 would beat M1 in single core.

What do you think?

The point of benchmarking single tasking is to benchmark single tasking, not the total bandwidth of a “core.”

The benchmarks are single-task vs. multitask, not “single core” vs “multicore.”

If you are going to allow 2 threads to run in your benchmark for Intel, then you need to allow 2 threads to run in your benchmark for M1 (and M1 will still destroy Intel).

ADGrant · Nov 28, 2021

Agreed, the article referred to includes this flawed assumption: "(assuming the intent is to see which core is the fastest)?"

The intent is to see how fast a given processor can run a single core workload since many desktop workloads are single threaded. Even for applications which are multi-threaded, most of the work could still occur on a single thread since most GUI toolkits require that all UI events are processed on a single thread.

leman · Nov 28, 2021

cocoua said:
I’ve just seen this, and it seems legit.

Exclusive: Why Apple M1 Single "Core" Comparisons Are Fundamentally Flawed (With Benchmarks)

I have something pretty exciting for our readers today; something that almost everyone appears to have missed in the clamor for Apple M1 benchmark comparisons. What if I told you that pretty much all of the single-core benchmark comparisons between the Apple M1 and modern x86 processors you see...

wccftech.com

It says using both threads on one core in x86 would beat M1 in single core.

What do you think?

I read that bit when it was published a year ago and it’s still one of the dumbest things I’ve ever seen. Especially since it pretends to use some sort of “higher knowledge” to argue its nonsensical point.

Granted, it might make sense to say that the performance of an SMT-capable core is higher than its peak single-thread throughput. After all, an SMT-capable core can (sometimes) run multiple threads with slightly higher efficiency (e.g. an SMT2 core might complete two simultaneous threads of work faster than it would complete the same two threads in sequence). But here is the thing… such a statement is void of any practical meaning, since the performance of a modern CPU does not remain constant during operation.

In the real world, you either have multiple threads to run at the same time or you don’t. If you don’t, why do you even care that your CPU has SMT? If you do, great, but then you have multiple threads to run and you should run them the same way on all hardware when doing comparisons. At the end of the day, M1 might slightly outperform most x86 CPUs in single-threaded workloads. But an n-core M1 will significantly outperform an n-core x86 SMT CPU when running n*2 threads. Even though the x86 CPU has the theoretical advantage of running two threads per core at once with higher efficiency, it will need to throttle down its performance when doing throughput-oriented work. On the other hand, M1 is so efficient that it can operate close to its peak on this workload.
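To make the shape of this argument concrete, here is a toy model in Python. Every number in it is a made-up placeholder, not a measurement of any real CPU, and the function `sustained_throughput` and its parameters are hypothetical names chosen for illustration. It only shows how an SMT gain and a throttling penalty can pull in opposite directions.

```python
def sustained_throughput(cores: int, peak_rate: float,
                         smt_gain: float, sustained_fraction: float) -> float:
    """Toy model: total work per second with every hardware thread busy.

    peak_rate          -- per-core rate when one thread runs at peak clocks
    smt_gain           -- throughput multiplier from 2 threads per core (1.0 = no SMT)
    sustained_fraction -- fraction of peak clocks held under all-core load
    """
    return cores * peak_rate * smt_gain * sustained_fraction


# Placeholder inputs chosen only to illustrate the argument, NOT measured values.
smt_chip = sustained_throughput(cores=4, peak_rate=1.0,
                                smt_gain=1.25, sustained_fraction=0.70)
no_smt_chip = sustained_throughput(cores=4, peak_rate=1.0,
                                   smt_gain=1.0, sustained_fraction=0.95)

print(f"SMT chip:    {smt_chip:.2f}")
print(f"no-SMT chip: {no_smt_chip:.2f}")
```

With these placeholders the SMT gain is outweighed by throttling. Pick different inputs and the result flips, so the outcome depends entirely on the real sustained clocks and the real SMT gain for a given workload.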

Bottom line: things get complicated when you consider different workloads and dynamic frequency shifting.

dgdosen · Nov 28, 2021

ADGrant said:
Agreed, the article referred to includes this flawed assumption: "(assuming the intent is to see which core is the fastest)?"

The intent is to see how fast a given processor can run a single core workload since many desktop workloads are single threaded. Even for applications which are multi-threaded, most of the work could still occur on a single thread since most GUI toolkits require that all UI events are processed on a single thread.

Talk about chaff vs wheat.

throAU · Nov 28, 2021

Intel only developed SMT because they couldn't fully utilise a core with a single thread due to stalls, etc., so claiming it's unfair to x86 to measure single threaded performance is a bit... crap.

If you want to measure single thread, run a single thread and see how fast it goes. That's it. That's all that matters to the end user - the internal problems with Intel CPUs are irrelevant to the end user.
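A minimal sketch of that kind of measurement in Python, for illustration only. `workload` is a hypothetical stand-in for real single-threaded code, and the iteration count and repeat count are arbitrary assumptions. It times one thread doing one task and nothing else.

```python
import time


def workload(n: int) -> int:
    """Hypothetical CPU-bound task: sum of squares in a plain loop."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def time_single_thread(n: int, repeats: int = 3) -> float:
    """Return the best wall-clock time in seconds over several runs on one thread."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        workload(n)
        elapsed = time.perf_counter() - start
        best = min(best, elapsed)
    return best


print(f"best of 3: {time_single_thread(1_000_000):.3f} s")
```

Taking the best of several runs is one common way to reduce noise from other processes; real benchmark suites use more careful methodology than this.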

falainber · Nov 28, 2021

throAU said:
Intel only developed SMT because they couldn't fully utilise a core with a single thread due to stalls, etc., so claiming it's unfair to x86 to measure single threaded performance is a bit... crap.

If you want to measure single thread, run a single thread and see how fast it goes. That's it. That's all that matters to the end user - the internal problems with Intel CPUs are irrelevant to the end user.

That's an oversimplification. Single-threaded benchmarks are important because a lot of apps are single-threaded. But per core performance is meaningful too. A simple way to demonstrate this is the case of one-core processors: the total CPU performance obviously benefits from MT. Obviously, single core processors are rare these days. But for multi-core chips, per core performance may be viewed as a proxy for a performance per [silicon] area metric.

Gnattu · Nov 28, 2021

falainber said:
per core performance may be viewed as a proxy for performance per [silicon] area metric.

Why? They are not even the same size. If we have to compare area efficiency, then "efficiency cores" almost always win, but they don't deliver "per core performance" comparable to "performance cores".

throAU · Nov 28, 2021

falainber said:
That's an oversimplification. Single-threaded benchmarks are important because a lot of apps are single-threaded. But per core performance is meaningful too.

Sure, and the M1 beats intel (for the most part, pre alder lake; definitely in performance per watt) in both.

What matters is how fast the software runs; trying to do stuff like disable hyper threading to make it "fair" (because intel have an inherent core utilisation problem) or whatever is irrelevant. What matters is how much I need to spend, and how big/hot the machine is to get the level of performance. Couldn't care if it is 1 core or 100 cores if it runs my workload faster using less power.

edit:
I just find it hilarious how intel (and their fanboys) have been pushing single thread performance as a massively important metric vs. AMD with their huge numbers of extra cores, yet now it's "not fair" because of hyper threading, which the M1 derivatives do not do.

Is single threaded code performance important or not?

Also... if they're going to whine about not fully saturating an Intel CPU and somehow claim the M1 is fully saturated, then they need to remember it can do a bunch of ML or transcoding entirely off the CPU; whereas the intel processor is going to need to use CPU or GPU cores for that. i.e., the M1 is not "fully saturated" either. Not by a long shot.

Furthermore... it's not like a Hyperthreaded core from intel has a full 2 thread capacity per core. Resources are shared. Some things hyper thread well. Some DO NOT. To claim that the other "half" of an x86 core is "unused" is total and utter bollocks. Because the only thing hyperthreading does is try and help use the parts of a single core that are not currently able to be used because the CPU has had a pipeline stall. There are not two cores in one. This is why you do NOT get 100% linear 2x scaling with hyper threading. Often, nowhere near it.
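The scaling point can be written down as simple arithmetic. This is a sketch with illustrative numbers only (not measurements), and `smt_throughput_gain` is a hypothetical helper name: if each of two threads sharing a core runs at some fraction of the speed it would reach alone, the combined gain is twice that fraction.

```python
def smt_throughput_gain(solo_rate: float, shared_rate: float) -> float:
    """Combined throughput of two threads sharing a core, relative to one thread alone.

    solo_rate   -- work per second of a thread that has the core to itself
    shared_rate -- work per second of EACH of two threads sharing the core
    """
    return (2 * shared_rate) / solo_rate


# Illustrative assumption: each thread drops to 62.5% of its solo speed.
gain = smt_throughput_gain(solo_rate=100.0, shared_rate=62.5)
print(gain)  # 1.25, i.e. 25% more total work, far from 2x
```

A gain of 2.0 would require both threads to run at full solo speed, which shared execution resources generally prevent.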

The whole point of ST performance metrics is to give an idea of how fast a machine will be using a single thread. This situation will never use hyper threading, and thus will never load two virtual HT cores on an intel machine. So trying to fudge numbers/benchmarks to better load 2 HT virtual cores on x86 as some sort of contrived "single core" benchmark performance victory is complete bollocks.

There's already a multi-thread benchmark option: turn on multiple threads and let the scheduler make use of everything in the machine.

Smells like extreme desperation to try and spin something that is meant to measure one thing, to measure something else entirely and then claim victory. Wonder how much intel paid.

But for multi-core chips, per core performance may be viewed as a proxy for performance per [silicon] area metric.

This is entirely irrelevant to the end user. What matters is what I can buy off the shelf, how much it costs and how well it runs my workload. How many cores it has, how many it uses, what processing node it is manufactured with, who makes it or whether or not it does hyper threading are all entirely irrelevant.

Pressure · Nov 28, 2021

Single core is very important, especially in something as "mundane" as web browsing.

Just run something like Kraken and Octane and you can immediately see there is a performance difference between current x86 solutions and Apple Silicon.

Run Mozilla Kraken, Google Octane v2, JetStream and Speedometer to get a sense of the performance.

This is something Apple have clearly prioritised in their design and browsing also just feels smooth on iOS / Apple Silicon devices.

senttoschool · Nov 28, 2021

Ah geez... not this stupid article again.

throAU · Nov 28, 2021

Pressure said:
Single core is very important, especially in something as "mundane" as web browsing.

I'd agree but I think we need to be clear what is important is single THREAD.

Single core, when you're trying to use two threads (which the single-threaded software case can't use) is irrelevant. Well, unless you have only a single core processor. Which few do in 2021.

Multi-threaded software can use multiple cores, so trying to use two threads on one intel core because of hyper threading to compare 1 non-ht apple core is disingenuous and entirely irrelevant to the situation in which less than the full number of cores (or at least more than one) can be utilised.

Any real world software that can use both threads on an intel core can use two non-ht cores. And will run faster doing that.

cnnyy20p · Nov 28, 2021

I read a similar article a long time ago and there was a lot of misunderstanding.

To summarize: the “core” means nothing in a single core benchmark. I would rather say “single thread” benchmark, which is actually the correct description. A single thread benchmark tests how the CPU handles single thread tasks. How the CPU utilizes its resources, or how it uses other parts to accelerate the single thread task, doesn’t matter. All these benchmarks care about is how fast the whole CPU system can run single thread tasks.

It’s the benchmarks’ fault for using the phrase “single core test” when referring to a single thread test. After that controversial article I saw a lot of media articles change their wording to “single thread performance” instead of “single core performance” to avoid misleading readers.

It was also Apple’s fault for saying M1 had “the fastest single core”. M1 didn’t have “the fastest single core”, but it had “the fastest single thread performance” at that time.

Also, x86 cores have long been weirdly complex for single thread tasks because of their hyperthreading feature, which only benefits multi-thread workloads.

Edit: spelling

Pressure · Nov 28, 2021

throAU said:
I'd agree but I think we need to be clear what is important is single THREAD.

Single core, when you're trying to use two threads (which the single-threaded software case can't use) is irrelevant. Well, unless you have only a single core processor. Which few do in 2021.

Multi-threaded software can use multiple cores, so trying to use two threads on one intel core because of hyper threading to compare 1 non-ht apple core is disingenuous and entirely irrelevant to the situation in which less than the full number of cores (or at least more than one) can be utilised.

Any real world software that can use both threads on an intel core can use two non-ht cores. And will run faster doing that.

I agree, I should have written single thread.

leman · Nov 29, 2021

throAU said:
Intel only developed SMT because they couldn't fully utilise a core with a single thread due to stalls, etc.

For the sake of fairness, I also think it is not correct to refer to SMT as some sort of "inferior" technology that is only there because x86 CPUs are slow or inefficient etc. (yes, I know that you didn't claim it, but some do and I wanted to comment on it). SMT is a valid technology for boosting multi-threaded throughput and it has been successful in a variety of products. Would Apple benefit from it? Questionable. They can already reach very high core resource utilization, so SMT sounds like adding a lot of complexity (and compromising the security model!) with no advantages.

throAU said:
Any real world software that can use both threads on an intel core can use two non-ht cores. And will run faster doing that.

This is a very important point. When an x86 SMT core runs two threads, both of them run significantly slower than each thread would in isolation. That's why x86 CPUs will first spawn active threads on separate cores and will only do SMT if all cores are already occupied.
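A minimal sketch of that placement policy, for illustration. In practice this decision is typically made by the operating system's scheduler, and real schedulers weigh many more factors; the function `place_threads` and its slot numbering are hypothetical.

```python
def place_threads(num_threads: int, num_cores: int,
                  smt_ways: int = 2) -> list[tuple[int, int]]:
    """Assign threads to (core, hardware_thread) slots.

    Every core gets one thread before any core is asked to run a second
    thread via SMT, mirroring the "separate cores first" policy.
    """
    capacity = num_cores * smt_ways
    if num_threads > capacity:
        raise ValueError("more threads than hardware thread slots")
    placements = []
    for t in range(num_threads):
        core = t % num_cores   # spread across cores first
        slot = t // num_cores  # second SMT slot only once all cores are busy
        placements.append((core, slot))
    return placements


print(place_threads(3, 4))  # [(0, 0), (1, 0), (2, 0)]
print(place_threads(6, 4))  # [(0, 0), (1, 0), (2, 0), (3, 0), (0, 1), (1, 1)]
```

With three threads on four cores nothing shares a core; only the fifth and sixth threads land on a second hardware thread.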

cnnyy20p said:
It was also Apple’s fault for saying M1 had “the fastest single core”. M1 didn’t have “the fastest single core”, but it had “the fastest single thread performance” at that time.

I think this is overly pedantic. Sure, one should technically talk about single threaded performance, but if we want to be accurate we should also stop talking about CPU utilization etc. It is very clear what is meant when one mentions single-core performance, and it's definitely valid to talk about it this way since that's the basic hardware model (each hardware thread is exposed as a logical CPU core to the system). I don't see how Apple's claims were inaccurate. Their core was definitely the fastest low-power core when doing what CPU cores usually do.

throAU · Nov 29, 2021

leman said:
For the sake of fairness, I also think it is not correct to refer to SMT as some sort of "inferior" technology that is only there because x86 CPUs are slow or inefficient etc. (yes, I know that you didn't claim it, but some do and I wanted to comment on it). SMT is a valid technology for boosting multi-threaded throughput and it has been successful in a variety of products. Would Apple benefit from it? Questionable. They can already reach very high core resource utilization, so SMT sounds like adding a lot of complexity (and compromising the security model!) with no advantages.

Yeah to be clear, another way of saying they couldn't fully use the core is to say the core has more execution resources than a single thread can reliably utilise.

Potato/potato.

It's a method of speeding up multi-threaded execution and as valid as any other method at doing what it does.

But to claim that we need to use some sort of contrived 2 threaded benchmark and restrict it to one HT core to be "fair" or level the playing field is complete bollocks.

What matters in reality is single threaded performance and multi-threaded performance. Things that are single threaded can't make use of HT. Bad luck intel. But you do win back 20-30% or whatever on multi core with the same number of cores, thanks to better utilisation during pipeline stalls. Which are super bad on intel due to the long pipelines required for high clock speeds.

All these things are design trade-offs for one reason or another but they're inherent to the platform. Apple probably don't need HT due to the simpler, less warty ISA, lower clocks and better/more memory bandwidth. But as a result they can't run their CPU clock so high and give up 30-40 years of software compatibility.


leman · Nov 29, 2021

throAU said:
But to claim that we need to use some sort of contrived 2 threaded benchmark and restrict it to one HT core to be "fair" or level the playing field is complete bollocks.

Amen to that. And if someone here still doesn't see the ridiculousness, let me rephrase the claim in that article: "technically, my Ford is faster than a Ferrari since it has a bigger luggage compartment, so I need fewer trips when transporting my roadkill"

throAU · Nov 29, 2021

leman said:
Amen to that. And if someone here still doesn't see the ridiculousness, let me rephrase the claim in that article: "technically, my Ford is faster than a Ferrari since it has a bigger luggage compartment, so I need fewer trips when transporting my roadkill"

Pretty accurate

Single threaded speed is like how fast you can get a single person from A to B

This HT claim is like saying your 4 door family sedan is faster than a superbike because whilst slower it can carry 4 people... in 2-3x the time.

It's still slower at getting one person from A to B. Doesn't matter that it can carry four...

leman · Nov 29, 2021

throAU said:
Pretty accurate

Single threaded speed is like how fast you can get a single person from A to B

This HT claim is like saying your 4 door family sedan is faster than a superbike because whilst slower it can carry 4 people... in 2-3x the time.

It's still slower at getting one person from A to B. Doesn't matter that it can carry four...

And the funny thing is that a quad-core M1 will still get eight people faster from A to B than a quad-core x86 CPU with SMT.

yitwail · Nov 29, 2021

throAU said:
I just find it hilarious how intel (and their fanboys) have been pushing single thread performance as a massively important metric vs. AMD with their huge numbers of extra cores, yet now it's "not fair" because of hyper threading, which the M1 derivatives do not do.

And in any benchmark where Intel or AMD surpass Apple Silicon, it could be claimed that it's unfair because Intel/AMD consume far greater power than the Apple cores.

leman · Nov 29, 2021

yitwail said:
And in any benchmark where Intel or AMD surpass Apple Silicon, it could be claimed that it's unfair because Intel/AMD consume far greater power than the Apple cores.

Well, this depends on what you are looking at and what you are comparing. For example, it is popular to compare the vanilla M1 (4P+4E) to AMD Cezanne (8P). The latter wins on many throughput-oriented tests. On one hand this comparison does make sense, since both CPUs come in products in a comparable price range. On the other hand, these SKUs target different product categories at different power consumption levels, and if one takes low-power Cezanne SKUs, suddenly it doesn't perform nearly as well.

casperes1996 · Nov 29, 2021

throAU said:
Furthermore... it's not like a Hyperthreaded core from intel has a full 2 thread capacity per core. Resources are shared. Some things hyper thread well. Some DO NOT. To claim that the other "half" of an x86 core is "unused" is total and utter bollocks. Because the only thing hyperthreading does is try and help use the parts of a single core that are not currently able to be used because the CPU has had a pipeline stall. There are not two cores in one. This is why you do NOT get 100% linear 2x scaling with hyper threading. Often, nowhere near it.

I'm getting flashbacks to AMD's FX series of CPUs marketed as up to 8 cores when really it was just 4 cores where each core had two ALUs, haha

leman said:
For the sake of fairness, I also think it is not correct to refer to SMT as some sort of "inferior" technology that is only there because x86 CPUs are slow or inefficient etc. (yes, I know that you didn't claim it, but some do and I wanted to comment on it). SMT is a valid technology for boosting multi-threaded throughput and it has been successful in a variety of products. Would Apple benefit from it? Questionable. They can already reach very high core resource utilization, so SMT sounds like adding a lot of complexity (and compromising the security model!) with no advantages.

Indeed. I've always found IBM's 8-way SMT especially interesting. That is, IBM's POWER 8 chips came with SMT2, 4 and 8 - Always thought it was a wicked amount of assumed under-utilisation of the core for single threaded tasks if they thought it potentially beneficial to have 8 threads on one core. But it is also a very throughput oriented system

Do we have a clear picture of how good utilisation of Firestorm cores tends to be with 1 thread and how much could potentially be left on the table for something like SMT? I don't imagine it would be much but do we have any concrete evidence for it?

throAU · Nov 29, 2021

casperes1996 said:
Do we have a clear picture of how good utilisation of Firestorm cores tends to be with 1 thread and how much could potentially be left on the table for something like SMT? I don't imagine it would be much but do we have any concrete evidence for it?

Not sure if anyone has done analysis or at least published it, but would wager Apple have a very good idea of how well their cores are utilised with the code they run from the iPhone.

I'd also suspect that the huge caches and large memory bandwidth on the A and M series processors for their size are aimed at keeping the CPU as busy as possible and are a result of analysing the code their products are executing.

UBS28 · Nov 29, 2021

The article is right though. If synthetic benchmarks test single thread rather than single-core performance, it is not a fair comparison as the M1 doesn’t support HT. Basically 1 thread = 1 core on M1. While 1 core = 2 threads on Intel.

casperes1996 · Nov 29, 2021

UBS28 said:
The article is right though. If synthetic benchmarks test single thread rather than single-core performance, it is not a fair comparison as the M1 doesn’t support HT. Basically 1 thread = 1 core on M1. While 1 core = 2 threads on Intel.

The goal of single threaded benchmarks is to test the performance of single threaded code running on a given system. Not to test the total performance of a core. If your code can be parallelized then multi threaded benchmarks tend to be more relevant anyway
