M1 single core not so fast as it seems in benchmarks vs X86

Bug-Creator · Nov 29, 2021

falainber said:
A simple case to demonstrate this is the case of one core processors.

O.k. fair enough.

Now just find me a single core dual thread processor.....

..... and tell me how it is relevant today.

leman · Nov 29, 2021

casperes1996 said:
Do we have a clear picture of how good utilisation of Firestorm cores tends to be with 1 thread and how much could potentially be left on the table for something like SMT? I don't imagine it would be much but do we have any concrete evidence for it?

Maybe @name99 has some data?

UBS28 said:
The article is right though. If synthetic benchmarks test single thread rather than single-core performance, it is not a fair comparison as the M1 doesn’t support HT. Basically 1 thread = 1 core on M1. While 1 core = 2 threads on Intel.

Single-core performance is single-threaded performance. Anything else and you are entering nonsense land. By that logic IBM makes the fastest CPU cores, but when you actually try running something on them you'd get the performance of a wet noodle.

You want to know how fast a CPU can run stuff, not obfuscate test results by mixing in arbitrary hardware details. A single M1 core will be faster than a single Tiger Lake core no matter how many threads you run on it (can be one, two or one thousand).

Gnattu · Nov 29, 2021

The logical core stuff reminds me of the time of the A10 SoC, the first Apple SoC with a "big-little" design. One efficiency core and one performance core are grouped together and exposed to the OS as a single core. This is something like a "reverse SMT" that exposes two CPU cores as one logical core. A big limitation of such a design is that only one core type, either performance or efficiency, can be active in the pair but not both, so the benchmark MT score is no faster than a dual-core chip's. I cannot say those benchmarks are unfair for failing to activate more cores with MT workloads, since there is no way to do that.

MauiPa · Nov 29, 2021

I’ll say this in simple language that even you can understand (shout-out to old 60s documentaries; yes, they actually said that). The only reason for hyper-threading at all is to help fill wait states from inefficiencies in the x86 instruction set. Why hold the processor at idle when it is waiting for complex instructions to be finished, when you could have the processor run another thread to fill in the waits? You could also overcome this waiting problem with more efficient instructions that wait less: the Arm or RISC approach. Who cares if it takes 1 instruction or 20 instructions to complete a task, you only care which approach finishes the task quicker, AKA more efficiently.

Hyper-threading is a reasonable approach, but so is reducing the complexities of instruction sets. One requires multiple threads to work, the other is just inherently more efficient.

Finally, let’s not forget that in single core Intel is using that look ahead scheme (which has proven to be a security vulnerability) to also increase efficiency. You can only get so much.

It will be interesting to see if x86 chips hold on, or everyone migrates to a simpler, more efficient model. Lots of ARM development out there. Of course Qualcomm will probably require you to pay license fees on toasters if you want to use their SoCs, but I would expect their offerings to be substantial nonetheless.

leman · Nov 29, 2021

Gnattu said:
The logical core stuff reminds me of the time of the A10 SoC, the first Apple SoC with a "big-little" design. One efficiency core and one performance core are grouped together and exposed to the OS as a single core. This is something like a "reverse SMT" that exposes two CPU cores as one logical core. A big limitation of such a design is that only one core type, either performance or efficiency, can be active in the pair but not both, so the benchmark MT score is no faster than a dual-core chip's. I cannot say those benchmarks are unfair for failing to activate more cores with MT workloads, since there is no way to do that.

Exactly. You have a test workload, you run it, you measure the result — that's what you get. Reframing it in the context "but this CPU can theoretically run X threads with improved efficiency so running one thread is not representative" is at best opportunism and at worst blatant manipulation. Want to talk about performance running multiple threads? Run multiple threads and measure the results! Coming up with some sort of hypothetical "core" performance instead (whatever that might be) is not useful in the least.

leman · Nov 29, 2021

MauiPa said:
I’ll say this in simple language that even you can understand (shout-out to old 60s documentaries; yes, they actually said that). The only reason for hyper-threading at all is to help fill wait states from inefficiencies in the x86 instruction set. Why hold the processor at idle when it is waiting for complex instructions to be finished, when you could have the processor run another thread to fill in the waits? You could also overcome this waiting problem with more efficient instructions that wait less: the Arm or RISC approach. Who cares if it takes 1 instruction or 20 instructions to complete a task, you only care which approach finishes the task quicker, AKA more efficiently.

It's hardly this simple. Power ISA is pretty much RISC — and yet Power10 has 8-way SMT! SMT is a design option, plain and simple.

crazy dave · Nov 29, 2021

Oh this article again … everyone else has already covered the salient technical points, but I’ll just add that the AnandTech writers Ian and Andrei tried to correct this guy on Twitter and he just … didn’t get it.

Personally it was really eye opening how many “tech journalists” repeated it and gave it credence all based on not understanding simple terminology.

cmaier · Nov 29, 2021

leman said:
It's hardly this simple. Power ISA is pretty much RISC — and yet Power10 has 8-way SMT! SMT is a design option, plain and simple.

Alpha had SMT before Intel, and that was clearly RISC.

eicca · Nov 29, 2021

I have concluded benchmarks don’t really mean much. My 2020 work MacBook Air has benchmarks nearly double my old 2011 MacBook Pro, but the Air is far and away the slowest computer I use (and it only has one third-party app on it, which is Firefox). No idea why. But it ain’t benchmarks.

Another example: my Mac Pro has an even lower single core benchmark than my 2011 MacBook Pro, but single thread tasks are still somehow worlds faster.

The only real way to judge a computer is actual usage cases.

EDIT: Failed to specify, my 2020 MBA is the i5 model.

leman · Nov 29, 2021

eicca said:
I have concluded benchmarks don’t really mean much.

Of course benchmarks matter. But one needs to understand how to interpret them and whether they apply to a specific use case. If you are a regular home/office user, the only benchmark that is relevant to you is how quickly the system responds to your actions, and that's not really measurable in the first place.

jonblatho · Nov 29, 2021

eicca said:
I have concluded benchmarks don’t really mean much. My 2020 work MacBook Air has benchmarks nearly double my old 2011 MacBook Pro, but the Air is far and away the slowest computer I use (and it only has one third-party app on it, which is Firefox). No idea why. But it ain’t benchmarks.

Another example: my Mac Pro has an even lower single core benchmark than my 2011 MacBook Pro, but single thread tasks are still somehow worlds faster.

The only real way to judge a computer is actual usage cases.

Assuming that this is an M1 MacBook Air and you experience that slowness in Firefox…not that Firefox is known for stellar performance/efficiency, but are you sure you’re using the Apple silicon version? Browsers can tend to struggle in Rosetta 2 translation.

futbalguy · Nov 29, 2021

eicca said:
I have concluded benchmarks don’t really mean much. My 2020 work MacBook Air has benchmarks nearly double my old 2011 MacBook Pro, but the Air is far and away the slowest computer I use (and it only has one third-party app on it, which is Firefox). No idea why. But it ain’t benchmarks.

Another example: my Mac Pro has an even lower single core benchmark than my 2011 MacBook Pro, but single thread tasks are still somehow worlds faster.

The only real way to judge a computer is actual usage cases.

Your M1 MacBook Air should crush the 2011 MacBook Pro. Check that you are running an M1 native app. Another possibility is the MacBook Air may have less memory and could be using swap space on the ssd which is much slower. The only other thing I can think of is the GPU on the MacBook Pro is better than MacBook Air, but 2011 is so old I don’t think it should be the case.

dgdosen · Nov 29, 2021

futbalguy said:
Your M1 MacBook Air should crush the 2011 MacBook Pro. Check that you are running an M1 native app. Another possibility is the MacBook Air may have less memory and could be using swap space on the ssd which is much slower. The only other thing I can think of is the GPU on the MacBook Pro is better than MacBook Air, but 2011 is so old I don’t think it should be the case.

Unless it's an early 2020 Intel version... I think those are particularly thermally constrained.

eicca · Nov 29, 2021

jonblatho said:
Assuming that this is an M1 MacBook Air and you experience that slowness in Firefox…not that Firefox is known for stellar performance/efficiency, but are you sure you’re using the Apple silicon version? Browsers can tend to struggle in Rosetta 2 translation.

futbalguy said:
Your M1 MacBook Air should crush the 2011 MacBook Pro. Check that you are running an M1 native app. Another possibility is the MacBook Air may have less memory and could be using swap space on the ssd which is much slower. The only other thing I can think of is the GPU on the MacBook Pro is better than MacBook Air, but 2011 is so old I don’t think it should be the case.

I failed to specify that my 2020 MBA is the i5 model. It still benchmarks double my 2011, but man, that Air is just molasses. I'm trying to talk our IT guy into upgrading me to an M1.

eicca · Nov 29, 2021

dgdosen said:
Unless it's an early 2020 Intel version... I think those are particularly thermally constrained.

You know, I wonder if that's the thing. I almost never hear the fans on my MBA but it's catastrophically slow. Whereas my 2011 MBP spins up the fans pretty quick but still stays much faster.

tonyz123456 · Nov 29, 2021

I randomly came across this thread while browsing but am curious: what mainstream apps or games today are still single threaded where this matters? I don't think I've had a single core CPU computer in a very, very long time, maybe 10+ years, so aren't single threaded benchmarks pointless, since most software worth paying for has supported multi-core for years?

Whether it's optimized for the M1 chip or not is a separate topic.

Spindel · Nov 29, 2021

MauiPa said:
I’ll say this in simple language that even you can understand (shout-out to old 60s documentaries; yes, they actually said that). The only reason for hyper-threading at all is to help fill wait states from inefficiencies in the x86 instruction set. Why hold the processor at idle when it is waiting for complex instructions to be finished, when you could have the processor run another thread to fill in the waits? You could also overcome this waiting problem with more efficient instructions that wait less: the Arm or RISC approach. Who cares if it takes 1 instruction or 20 instructions to complete a task, you only care which approach finishes the task quicker, AKA more efficiently.

While I agree with you to a large degree, it’s not only CISC CPUs that have inefficiencies.

For example, the Power architecture has up to 8 threads per core because, even though the instruction set is reduced, it has one hell of a lot of, for example, multiplication units that cannot be kept filled all the time. Thus it has HT/SMT.

jonblatho · Nov 29, 2021

eicca said:
I failed to specify that my 2020 MBA is the i5 model. It still benchmarks double my 2011, but man, that Air is just molasses. I'm trying to talk our IT guy into upgrading me to an M1.

Ah yes, I forgot about the early 2020 Intel refresh. Yeah, there are some pretty serious thermal constraints on that so it’s probably just the 2015–2020 pattern of Apple asking too much of the CPUs and corresponding cooling they put into their machines, especially notebooks.

name99 · Nov 29, 2021

leman said:
Maybe @name99 has some data?

Single-core performance is single-threaded performance. Anything else and you are entering nonsense land. By that logic IBM makes the fastest CPU cores, but when you actually try running something on them you'd get the performance of a wet noodle.

You want to know how fast a CPU can run stuff, not obfuscate test results by mixing in arbitrary hardware details. A single M1 core will be faster than a single Tiger Lake core no matter how many threads you run on it (can be one, two or one thousand).

What's the question?

I'm not interested in tribal idiocy.
You want to compare the performance of a SINGLE-THREADED M1 against a SINGLE-THREADED x86, well, look at the GB5 or AnandTech SPEC numbers.
You want to compare the multi-threaded performance of a particular SoC (M1 Pro 6 core or whatever) against a particular x86 SoC (Tiger Lake i7-1185G7) or whatever, again GB5 and AnandTech SPEC numbers give the results (spoiler alert -- more cores gives more throughput! -- and cost more! -- and use more energy!)

But when you want to start playing games where you say "I will insist that my unit of computation is whatever makes my team look best" that's where I lose patience. Why is the appropriate unit of comparison the "x86 hyperthreaded core" and not, for example, "the M1 P-cluster"?
If you're going to play that sort of game, more sensible targets are:
- performance per dollar or
- performance per watt.

I write, and explain, for people who want to understand. Not for people who are ONLY interested in dick-measuring.
I'd urge you to do the same. It's vastly more interesting figuring out how an M1 L1D cache works compared to a recent Intel cache, than wasting time trying to explain things to people who have zero interest in understanding.

cmaier · Nov 29, 2021

tonyz123456 said:
I randomly came across this thread while browsing but am curious: what mainstream apps or games today are still single threaded where this matters? I don't think I've had a single core CPU computer in a very, very long time, maybe 10+ years, so aren't single threaded benchmarks pointless, since most software worth paying for has supported multi-core for years?

Whether it's optimized for the M1 chip or not is a separate topic.

No. Many problems are simply not parallelizable. Even in multi-threaded apps, single thread performance matters.
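
One common way to put numbers on this is Amdahl's law, which bounds the speedup of a workload when only part of it can run in parallel. The sketch below is only an illustration: the function name `amdahl_speedup` and the 90% parallel fraction are assumptions made up for the example, not measurements of any real app.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Upper bound on speedup when only part of the work can run in parallel."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    # The serial part always runs at single-thread speed;
    # only the parallel part is divided across the cores.
    return 1.0 / (serial_fraction + parallel_fraction / cores)


if __name__ == "__main__":
    # Hypothetical workload: 90% of the runtime parallelises perfectly.
    for cores in (1, 2, 8, 64, 1024):
        print(f"{cores:>5} cores -> at most {amdahl_speedup(0.9, cores):.2f}x")
```

With 90% of the work parallelisable, the speedup stays below 10x however many cores are added, so the remaining serial 10% runs at single-thread speed and ends up dominating.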

casperes1996 · Nov 29, 2021

tonyz123456 said:
I randomly came across this thread while browsing but am curious: what mainstream apps or games today are still single threaded where this matters? I don't think I've had a single core CPU computer in a very, very long time, maybe 10+ years, so aren't single threaded benchmarks pointless, since most software worth paying for has supported multi-core for years?

Whether it's optimized for the M1 chip or not is a separate topic.

Yes and no. Let's take an extreme example to illustrate the point. Same principles apply in more realistic examples but we'll make it extreme to really illustrate it.

Let's say you want to play a YouTube video. Normally a lot of this will be offloaded to dedicated hardware and real world video codecs don't really parallelise exactly this way but let's say as an example that we can make 1 thread for every frame that needs to be decoded in the video. It's a short video so let's say there are 600. At 30FPS that's 20 seconds.
We have a pretty awesome beast of a multi-core machine with 2 million cores, 600 of which go to work on this task at the same time. Awesome, the whole video should be decoded in no time. But actually, each core is super slow and takes about 12 minutes to decode one frame. Because you had that many cores, all frames will be ready after 12 minutes, so the whole video is decoded in 12 minutes. But it also takes 12 minutes to get just one frame ready, so you cannot start watching any sooner. But you have super good multithreaded benchmarks with your 2 million cores!
The essence of the problem is that even if things are logically parallelised, it still matters how fast each individual task can be finished. Furthermore, some tasks are not possible to perform in parallel, so the program may be multi-threaded where possible, but it isn't possible everywhere. Some operations are logically dependent on the results of prior operations. For example, say you run a program that automates a task by checking for new emails and then grouping all your new emails in two piles depending on who they were from. The process of grouping emails logically depends on fetching new emails, so you have a sequence. We must first, as a single threaded task, fetch new emails. Once we have the emails, however, we can check which group to throw them in in parallel, inspecting each email as a unique task or whatever subdivision makes sense.
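
The email example above can be sketched roughly as follows. This is a minimal illustration, not a real mail client: `fetch_new_emails`, `pick_pile`, `group_emails`, and the sample messages are all hypothetical names and data invented for the sketch. The fetch has to finish first; only then can each email be inspected independently.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_new_emails():
    """Serial step: nothing can be grouped until this returns (made-up data)."""
    return [
        {"sender": "boss@example.com", "subject": "Q4 plan"},
        {"sender": "news@example.org", "subject": "Newsletter"},
        {"sender": "boss@example.com", "subject": "Lunch?"},
    ]


def pick_pile(email, vip_senders):
    """Independent per-email task: needs only this one email."""
    return "vip" if email["sender"] in vip_senders else "other"


def group_emails(vip_senders):
    emails = fetch_new_emails()  # must complete before any grouping starts
    with ThreadPoolExecutor(max_workers=4) as pool:
        # Each email is classified as its own task; map() keeps input order.
        pile_names = list(pool.map(lambda e: pick_pile(e, vip_senders), emails))
    piles = {"vip": [], "other": []}
    for email, pile in zip(emails, pile_names):
        piles[pile].append(email["subject"])
    return piles


if __name__ == "__main__":
    print(group_emails({"boss@example.com"}))
```

In a standard CPython build, threads will not make CPU-bound classification any faster because of the global interpreter lock; the pool is used here only to show the structure, one serial step followed by independent tasks.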

Did that clear it up?

leman · Nov 29, 2021

name99 said:
What's the question?

The question was this:

casperes1996 said:
Do we have a clear picture of how good utilisation of Firestorm cores tends to be with 1 thread and how much could potentially be left on the table for something like SMT? I don't imagine it would be much but do we have any concrete evidence for it?

As to the rest, I completely agree with you. The main reason why I even bother replying to this kind of nonsense is to try to stop a flow of misinformation. Even if it's a futile effort.

Anyway, I can't wait to get my 16" M1 and finally do some proper GPU programming

casperes1996 · Nov 29, 2021

leman said:
As to the rest, I completely agree with you. The main reason why I even bother replying to this kind of nonsense is to try to stop a flow of misinformation. Even if it's a futile effort.

I assume this relates to the thread in general and not my comment?

leman said:
Anyway, I can't wait to get my 16" M1 and finally do some proper GPU programming

What kind of GPU programming do you do? I know a little Discord community that mostly focuses around Metal, though it's predominantly graphics, not so much GPGPU, but I can recommend the 2etime Discord. It's mostly intended as a learning environment for newcomers to Metal, but there are also more advanced users on there, including someone from Apple's Xcode GPU Debugger team.

MysticCow · Nov 29, 2021

throAU said:
Is single threaded code performance important or not?

It's only important if it advances your agenda...

Analog Kid · Nov 29, 2021

falainber said:
But for multi-core chips, per core performance may be viewed as a proxy for performance per [silicon] area metric.

Just spitballing here, but wouldn’t a better proxy be taking overall performance and dividing by silicon area?

In general, I’m a fan of measuring something rather than measuring not-that-thing and calling it the thing.
