
jdb8167

macrumors 601
Original poster
Nov 17, 2008
4,859
4,599
Here is a pretty good, semi-technical video on what makes the M1 fast. It is pretty balanced in that it examines what various benchmarks may actually show. It also makes an interesting point about single thread performance in Geekbench and Cinebench. Because many Intel CPUs use a two-thread design per CPU, the single thread tests may be disadvantaging those CPUs. But it also makes the point that the Intel Hyper-threading is a bit of a hack to work around a design that leaves starved execution units. All in all, I thought it was a pretty good explainer.

Apple's M1 isn't witchcraft, it's good chip design | Engadget
 
Reactions: Tenkaykev

thenewperson

macrumors 6502a
Mar 27, 2011
992
912
It also makes an interesting point about single thread performance in Geekbench and Cinebench. Because many Intel CPUs use a two-thread design per CPU, the single thread tests may be disadvantaging those CPUs.
Hardly. A single-threaded test works on a single thread. That those other CPUs have 2 threads per core isn't really of anyone's concern and isn't a disadvantage to anyone. The real disadvantage would come if anyone took this seriously, and it would fall on the M1 (and other chips like it).
 
Reactions: jdb8167

leman

macrumors Core
Oct 14, 2008
19,521
19,678
Because many Intel CPUs use a two-thread design per CPU, the single thread tests may be disadvantaging those CPUs. But it also makes the point that the Intel Hyper-threading is a bit of a hack to work around a design that leaves starved execution units.

This is an argument that has been thrown around a lot recently when comparing Intel and the M1, and I always thought it was incredibly awkward. What one is basically saying is that since a single Intel SMT core is capable of running two threads at less than 100% overhead, its "real" single-core performance is somehow better (?). It's like saying that I am better at dancing than I actually am because I can also whistle at the same time. Single-threaded performance is single-threaded performance, and Intel SMT won't help me run my critical single-threaded code any better. The main reason SMT exists at all is that x86 designs are unable to achieve good execution unit utilization (as you mention), be it because of x86 ISA limitations or just historic reasons of "traditional" CPU design. I can imagine that Apple could also squeeze some more performance out of SMT (since they have a much wider backend), but at the same time their performance is so good that they probably don't need it, and it's not like SMT is a free lunch anyway.

The funniest part about this argument is that the folks making it don't bother to subject the M1 to the same treatment, that is, to test the overhead of running two threads on the same CPU core (admittedly, that's probably not really feasible on macOS anyway). And SMT doesn't really help Intel even the odds, since they have to severely underclock the CPU under sustained load.
 
Reactions: jdb8167

jdb8167

macrumors 601
Original poster
Nov 17, 2008
4,859
4,599
That those other CPUs have 2 threads per core isn't really of anyone's concern and isn't a disadvantage to anyone.

The funniest part about this argument is that the folks making it don't bother to subject the M1 to the same treatment, that is, to test the overhead of running two threads on the same CPU core (admittedly, that's probably not really feasible on macOS anyway). And SMT doesn't really help Intel even the odds, since they have to severely underclock the CPU under sustained load.
I found the argument interesting because I hadn’t heard it before and I’m still working through the implications. As you say, a two thread test isn’t a single thread test but it would still be a single core test on Intel. Testing a unit of the CPU’s hardware resources vs. a more abstract (software) designation of a thread might be considered legitimate. But it also isn’t really something that an OS supports, which makes it something of a hack.

If an Intel OS had a CPU affinity setting that allowed such a test, I’m not sure what it would show. I suspect that in most cases it would only be a very modest performance enhancement at best. I know tests frequently show that Intel Hyper-Threading actually degrades performance depending on compiler optimization.
 

Gnattu

macrumors 65816
Sep 18, 2020
1,107
1,671
If an Intel OS had a CPU affinity setting that allowed such a test
This cannot be done by the OS because SMT is implemented below the OS, and from the OS's point of view the two SMT threads are simply two equal CPUs. The OS cannot magically split work across multiple cores, because a single serial instruction stream won't generate a second thread unless instructed to. That makes comparing two SMT threads' performance with one non-SMT core's performance pointless: these are different kinds of tasks.

A single-threaded test is a single-threaded test: a benchmark of how fast a CPU executes a linear series of instructions. People may argue that some CPU architectures will never use all of their pipelines in "single-thread mode", which keeps throughput below the full potential of the core, but this only means that the core is not designed to do single-threaded work well enough.
 
Last edited:

jdb8167

macrumors 601
Original poster
Nov 17, 2008
4,859
4,599
This cannot be done by the OS because SMT is implemented below the OS, and from the OS's point of view the two SMT threads are simply two equal CPUs. The OS cannot magically split work across multiple cores, because a single serial instruction stream won't generate a second thread unless instructed to. That makes comparing two SMT threads' performance with one non-SMT core's performance pointless: these are different kinds of tasks.

A single-threaded test is a single-threaded test: a benchmark of how fast a CPU executes a linear series of instructions. People may argue that some CPU architectures will never use all of their pipelines in "single-thread mode", which keeps throughput below the full potential of the core, but this only means that the core is not designed to do single-threaded work well enough.
I was thinking of a two thread test where some sort of CPU affinity flag ties the two threads to a single CPU core. I realize this is almost certainly impossible since it would have no real world use. Again, I found the question interesting but not necessarily a compelling argument. And it doesn't seem like there is a way to test a single CPU core vs core in a fair way.
 

leman

macrumors Core
Oct 14, 2008
19,521
19,678
I found the argument interesting because I hadn’t heard it before and I’m still working through the implications. As you say, a two thread test isn’t a single thread test but it would still be a single core test on Intel. Testing a unit of the CPU’s hardware resources vs. a more abstract (software) designation of a thread might be considered legitimate.

Isn't that a logical fallacy though? A thread is not just some software abstraction, it is a cornerstone of the CPU execution model. Computation has revolved around the notion of executing a serial stream of operations with ordered dependencies (threads) since time immemorial. Now, that's obviously not the only possible computation model (contrast it with things like nondeterministic automata), but in the end it's the only one that matters, because that's how the hardware works.

At the end of the day, a CPU's ability to execute a single stream of operations fast is a very important property. And that's what single-threaded tests measure. If your single CPU core has a hardware ability to interleave two such streams simultaneously with less than 100% overhead - great! But it's only useful for multi-threaded code, not for single-threaded code. Arguing that SMT results in higher amortized ST performance is a substitution of notions. In the end, it's like arguing that a car has a great luggage compartment because if you and your spouse each drove one at the same time, you'd be able to transport a lot of suitcases.
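To put rough numbers on "less than 100% overhead", here is a minimal sketch of the arithmetic. The function names and the timings in the usage note are made up purely for illustration; they are not measurements of any real CPU.

```python
def smt_throughput_gain(t_single, t_pair):
    """Throughput of two co-scheduled threads relative to running them back to back.

    t_single: time for one thread running alone on the core.
    t_pair:   time for two such threads to both finish while sharing the core.
    A result above 1.0 means sharing the core beats running the jobs one after another.
    """
    if t_single <= 0 or t_pair <= 0:
        raise ValueError("timings must be positive")
    return (2 * t_single) / t_pair


def per_thread_slowdown(t_single, t_pair):
    """How much longer each individual thread takes when it has to share the core."""
    if t_single <= 0 or t_pair <= 0:
        raise ValueError("timings must be positive")
    return t_pair / t_single
```

With hypothetical timings of 10 s alone and 16 s for the pair, `smt_throughput_gain(10.0, 16.0)` gives 1.25 while `per_thread_slowdown(10.0, 16.0)` gives 1.6: the core gets more total work done, yet each individual thread finishes later than it would alone, which is exactly why this says nothing about single-threaded performance.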

And finally, one can always find a way to abuse or emphasize a given hardware implementation. An expert with good micro-architectural knowledge could write a toy program that fully utilizes the M1's execution units while bogging an Intel CPU down, and vice versa.

If an Intel OS had a CPU affinity setting that allowed such a test, I’m not sure what it would show. I suspect that in most cases it would only be a very modest performance enhancement at best. I know tests frequently show that Intel Hyper-Threading actually degrades performance depending on compiler optimization.

These APIs exist (not on macOS though), and the SMT behavior has been studied in detail. Modern SMT works very well and the cases where it would slow things down are mostly in the past. A great read: https://www.anandtech.com/show/1626...of-multithreading-on-zen-3-and-amd-ryzen-5000
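As a rough illustration of what such a test setup could look like, here is a minimal Python sketch that assumes Linux. It reads the SMT sibling list from sysfs and uses `os.sched_setaffinity` to confine the caller to the logical CPUs of one physical core. The helper names (`parse_cpu_list`, `smt_siblings`, `pin_to_one_core`) are hypothetical, and this is only the pinning step, not a benchmark.

```python
import os
from pathlib import Path


def parse_cpu_list(text):
    """Parse a Linux CPU list such as "0,4" or "0-1,8-9" into a sorted list of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def smt_siblings(cpu=0):
    """Return the logical CPUs that share a physical core with `cpu` (Linux sysfs only)."""
    path = Path(f"/sys/devices/system/cpu/cpu{cpu}/topology/thread_siblings_list")
    return parse_cpu_list(path.read_text())


def pin_to_one_core(cpu=0):
    """Restrict the caller to the logical CPUs of a single physical core.

    Workers started afterwards inherit the affinity mask, so two of them
    end up competing for the same core's execution units.
    """
    if not hasattr(os, "sched_setaffinity"):
        raise OSError("os.sched_setaffinity is not available on this platform")
    siblings = smt_siblings(cpu)
    os.sched_setaffinity(0, siblings)  # pid 0 means "the caller"
    return siblings
```

On an SMT machine `smt_siblings(0)` would typically return two logical CPUs, and on a non-SMT core just one. For a real measurement the two workers should be separate processes or native threads started after `pin_to_one_core()`, since two pure-Python threads would mostly be serialized by the interpreter lock rather than by the hardware.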
 

Toutou

macrumors 65816
Jan 6, 2015
1,082
1,575
Prague, Czech Republic
The thing is, there's no "single-core performance", there's only "single-thread performance".
We programmers don't give a damn about what our threads run on, whether it's two per "core", one per "core", six per "ultra snake hammersword exxxxecution unit" or 723 per "half-unicorn".

We only care how fast the single thread is executed.

edit:
Just to clarify, yes I'm saying that the whole part where he discusses cores and hyperthreading is nonsense and the claim that HT-enabled chips are somehow disadvantaged by single-threaded benchmarks is a load of bull from a person who only kind of understands CPUs, but not really.
 
Last edited:
Reactions: leman

NotTooLate

macrumors 6502
Jun 9, 2020
444
891
What a weird comparison to make. Decide how many threads you want to run in your test and the job you are trying to complete, and run it. You pick one thread? Let’s compare results. You pick 2? Let’s compare results. You allow max utilization of the cores? Let’s compare results. The reason to measure single-thread performance is to have some kind of prediction of how single-threaded applications will run; those are still popular workloads. When you go multithreaded, we should test that as well. You want to use 2 and utilize SMT? Be my guest, but my machine will use its resources the best it can. SMT might shine when you have more threads than cores, so you can see the M1 falling behind x86 CPUs with larger core counts in applications that can leverage those cores.

TLDR - always run your use case and measure results; don’t get hung up on the internal implementation of the design
 