Good balanced video on Apple's M1 chip design

jdb8167 · Feb 8, 2021

Here is a pretty good, semi-technical video on what makes the M1 fast. It is pretty balanced in that it examines what various benchmarks may actually show. It also makes an interesting point about single thread performance in Geekbench and Cinebench. Because many Intel CPUs use a two-thread design per CPU, the single thread tests may be disadvantaging those CPUs. But it also makes the point that the Intel Hyper-threading is a bit of a hack to work around a design that leaves starved execution units. All in all, I thought it was a pretty good explainer.

Apple's M1 isn't witchcraft, it's good chip design | Engadget

thenewperson · Feb 8, 2021

jdb8167 said:
It also makes an interesting point about single thread performance in Geekbench and Cinebench. Because many Intel CPUs use a two-thread design per CPU, the single thread tests may be disadvantaging those CPUs.

Hardly. A single-threaded test works on a single thread. That those other CPUs have 2 threads per core isn't really of anyone's concern and isn't a disadvantage to anyone. The only real disadvantage would arise if anyone took this argument seriously, and it would be a disadvantage to the M1 (and other chips like it).

leman · Feb 9, 2021

jdb8167 said:
Because many Intel CPUs use a two-thread design per CPU, the single thread tests may be disadvantaging those CPUs. But it also makes the point that the Intel Hyper-threading is a bit of a hack to work around a design that leaves starved execution units.

This argument has been thrown around a lot recently when comparing Intel and the M1, and I always thought it was incredibly awkward. What it basically says is that since a single Intel SMT core is capable of running two threads at less than 100% overhead, its "real" single-core performance is somehow better (?). It's like saying that I am better at dancing than I actually am because I can also whistle at the same time. Single-threaded performance is single-threaded performance, and Intel SMT won't help me run my critical single-threaded code better. The main reason SMT exists in the first place is that x86 designs are unable to achieve good execution unit utilization (as you mention), be it because of x86 ISA limitations or just historic reasons of "traditional" CPU design. I can imagine that Apple could also squeeze some more performance out of SMT (since they have a much wider backend), but at the same time their performance is so good that they probably don't need it, and it's not like SMT is a free lunch anyway.

The funniest part about this argument is that the folks making it don't bother to subject the M1 to the same treatment, that is, to test the overhead of running two threads on the same CPU core (admittedly, it's probably not really feasible with macOS anyway). And SMT doesn't really help Intel even the odds, since they have to severely underclock the CPU under sustained load.

jdb8167 · Feb 9, 2021

thenewperson said:
That those other CPUs have 2 threads per core isn't really of anyone's concern and isn't a disadvantage to anyone.

leman said:
The funniest part about this argument is that the folks making it don't bother to subject the M1 to the same treatment, that is, to test the overhead of running two threads on the same CPU core (admittedly, it's probably not really feasible with macOS anyway). And SMT doesn't really help Intel even the odds, since they have to severely underclock the CPU under sustained load.

I found the argument interesting because I hadn’t heard it before and I’m still working through the implications. As you say, a two thread test isn’t a single thread test but it would still be a single core test on Intel. Testing a unit of the CPU’s hardware resources vs. a more abstract (software) designation of a thread might be considered legitimate. But it also isn’t really something that an OS supports, since it is something of a hack.

If an Intel OS had a CPU affinity setting that allowed such a test, I’m not sure what it would show. I suspect that in most cases it would only be a very modest performance enhancement at best. I know tests frequently show that Intel Hyper-Threading actually degrades performance depending on compiler optimization.

Gnattu · Feb 9, 2021

jdb8167 said:
If an Intel OS had a CPU affinity setting that allowed such a test

The OS cannot do this because SMT is implemented below the OS, and from the OS's point of view the two SMT threads are simply two equal CPUs. The OS cannot magically split work across multiple cores, because a single serial instruction stream won't spawn a second thread unless instructed to. That makes comparing the performance of two SMT threads with that of one non-SMT core pointless: these are different kinds of tasks.

A single-threaded test is a single-threaded test: it benchmarks how fast a CPU executes a linear series of instructions. People may argue that some CPU architectures never use all of their pipelines in "single-thread mode", which holds throughput below the full potential of the core, but this only means that the core is not designed to perform single-threaded work well enough.

jdb8167 · Feb 9, 2021

Gnattu said:
The OS cannot do this because SMT is implemented below the OS, and from the OS's point of view the two SMT threads are simply two equal CPUs. The OS cannot magically split work across multiple cores, because a single serial instruction stream won't spawn a second thread unless instructed to. That makes comparing the performance of two SMT threads with that of one non-SMT core pointless: these are different kinds of tasks.

A single-threaded test is a single-threaded test: it benchmarks how fast a CPU executes a linear series of instructions. People may argue that some CPU architectures never use all of their pipelines in "single-thread mode", which holds throughput below the full potential of the core, but this only means that the core is not designed to perform single-threaded work well enough.

I was thinking of a two thread test where some sort of CPU affinity flag ties the two threads to a single CPU core. I realize this is almost certainly impossible since it would have no real world use. Again, I found the question interesting but not necessarily a compelling argument. And it doesn't seem like there is a way to test one CPU core against another in a fair way.

Gnattu · Feb 9, 2021

jdb8167 said:
CPU affinity flag ties the two threads to a single CPU core

In fact you can. We do have such technology; it is used mostly in virtualization.
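
For illustration only, here is a minimal sketch of what such pinning could look like. It assumes Linux, which nobody in the thread specified: it reads the kernel's `thread_siblings_list` file in sysfs and calls Python's `os.sched_setaffinity`. The helper names (`parse_cpu_list`, `smt_siblings`, `pin_to_one_core`) are made up for this example.

```python
import os


def parse_cpu_list(text):
    """Parse a Linux CPU list such as "0,4" or "0-1" into a set of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus


def smt_siblings(cpu=0):
    """Return the logical CPUs that share a physical core with `cpu`."""
    path = f"/sys/devices/system/cpu/cpu{cpu}/topology/thread_siblings_list"
    with open(path) as f:
        return parse_cpu_list(f.read())


def pin_to_one_core(cpu=0):
    """Restrict the caller to the SMT siblings of `cpu` (Linux only)."""
    siblings = smt_siblings(cpu)
    os.sched_setaffinity(0, siblings)  # pid 0 refers to the caller
    return siblings
```

Worker processes started after `pin_to_one_core()` inherit the affinity mask, so two of them would compete for the same physical core. On a machine without SMT the sibling list has a single entry and the two workers would simply time-slice.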

leman · Feb 9, 2021

jdb8167 said:
I found the argument interesting because I hadn’t heard it before and I’m still working through the implications. As you say, a two thread test isn’t a single thread test but it would still be a single core test on Intel. Testing a unit of the CPU’s hardware resources vs. a more abstract (software) designation of a thread might be considered legitimate.

Isn't that a logical fallacy though? A thread is not just some software abstraction, it is a cornerstone of the CPU execution model. Computation has revolved around the notion of executing a serial stream of operations with ordered dependencies (threads) since time immemorial. Now, that's obviously not the only possible computation model (contrast it with things like nondeterministic automata), but in the end it's the only one that matters, because that's how the hardware works.

At the end of the day, a CPU's ability to execute a single stream of operations fast is a very important property, and that's what single-threaded tests measure. If your single CPU core has the hardware ability to interleave two such streams simultaneously with less than 100% overhead, great! But that is only useful for multi-threaded code, not for single-threaded code. Arguing that SMT results in higher amortized ST performance swaps one notion for another. In the end, it's like arguing that a car has a great luggage compartment because if you and your spouse both drove a car at the same time, you'd be able to transport a lot of suitcases.
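
To put rough numbers on the "less than 100% overhead" point, here is a small arithmetic sketch. The timings are hypothetical and chosen only for illustration, not measurements of any real CPU, and `smt_summary` is a made-up helper name.

```python
def smt_summary(t_single, t_pair):
    """Compare one thread alone with two threads sharing one SMT core.

    t_single: seconds for one thread to finish a job with the core to itself.
    t_pair:   seconds until two threads, each doing that same job, have both
              finished while sharing the core.
    """
    overhead = (t_pair - t_single) / t_single  # 1.0 would mean no SMT benefit
    core_throughput = 2 * t_single / t_pair    # total work rate vs. one thread
    per_thread_speed = t_single / t_pair       # speed of each individual thread
    return overhead, core_throughput, per_thread_speed


# Hypothetical timings: 10 s alone, 16 s for two threads sharing the core.
overhead, core_throughput, per_thread_speed = smt_summary(10.0, 16.0)
print(f"overhead {overhead:.0%}, core throughput x{core_throughput:.2f}, "
      f"per-thread speed x{per_thread_speed:.3f}")
```

With these made-up numbers the core as a whole gets 25% more work done, yet each individual thread runs at 62.5% of its solo speed. That is the distinction being drawn here: SMT raises throughput, not the speed of any single thread.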

And finally, one can always find a way to abuse or emphasize a given hardware implementation. An expert with good micro-architectural knowledge could write a toy program that fully utilizes the M1's execution units while bogging an Intel CPU down, and vice versa.

jdb8167 said:
If an Intel OS had a CPU affinity setting that allowed such a test, I’m not sure what it would show. I suspect that in most cases it would only be a very modest performance enhancement at best. I know tests frequently show that Intel Hyper-Threading actually degrades performance depending on compiler optimization.

These APIs exist (not on macOS though), and the SMT behavior has been studied in detail. Modern SMT works very well and the cases where it would slow things down are mostly in the past. A great read: https://www.anandtech.com/show/1626...of-multithreading-on-zen-3-and-amd-ryzen-5000

Toutou · Feb 10, 2021

The thing is, there's no "single-core performance", there's only "single-thread performance".
We programmers don't give a damn about what our threads run on, whether it's two per "core", one per "core", six per "ultra snake hammersword exxxxecution unit" or 723 per "half-unicorn".

We only care how fast the single thread is executed.

edit:
Just to clarify, yes I'm saying that the whole part where he discusses cores and hyperthreading is nonsense and the claim that HT-enabled chips are somehow disadvantaged by single-threaded benchmarks is a load of bull from a person who only kind of understands CPUs, but not really.

NotTooLate · Feb 11, 2021

What a weird comparison to make. Decide how many threads you want to run in your test and the job you are trying to complete, and run it. You pick one thread? Let’s compare results. You pick two? Let’s compare results. You allow max utilization of the cores? Let’s compare results. The reason to measure single-thread performance is to get some kind of prediction of how single-threaded applications will run, and those are still popular workloads. When you go multithreaded, we should test that as well. You want to use two threads and utilize SMT? Be my guest, but my machine will use its resources the best it can. SMT might shine when you have more threads than cores, so you may see the M1 falling behind x86 CPUs with larger core counts in applications that can leverage those cores.

TLDR - always run your use case and measure results; don’t get hung up on the internal implementation of the design.
