No it doesn’t. It helps *x86* CPUs with multicore performance, because in x86 CPUs it’s difficult to keep the pipelines full (because of narrow issue, small register files, and difficulty with instruction reordering caused by too small a look-ahead window due to the difficulty of decoding).
That's not really the core issue.
We have already seen that M1 does a remarkable job of keeping the pipelines full. So if the cores are already completely busy in M1, which they essentially are (when running a big multi-core job), how can HT add anything further? The processor would need to stop a running thread to substitute in another, even though the first thread hasn’t hit a bubble.
That is in part true because the M1 simply avoids the workloads where SMT ("hyperthreading") has more traction, rather than being vastly more immune or inherently better. Apple also has power gating, so they can just turn off units that aren't being filled.
HT is a solution to problems caused by difficulties caused by CISC, and is of little benefit to CPUs that have heterogeneous cores that already run at very high IPC.
"Hyperthreading" is a marketing name that Intel created partially to offset the reality that they didn't invent SMT.
The initial research into SMT was done on a modified DEC Alpha design. From the first paper's abstract:
"...
The increase in component density on modern microprocessors has led to a substantial increase in on-chip parallelism. In particular, modern superscalar RISCs can issue several instructions to independent functional units each cycle. However, the benefit of such superscalar architectures is ultimately limited by the parallelism available in a single thread.
This paper examines simultaneous multithreading, a technique permitting several independent threads to issue instructions to a superscalar's multiple functional units in a single cycle. In the most general case, the binding between thread and functional unit is completely dynamic. We present several models of simultaneous multithreading and compare them with wide superscalar, fine-grain multithreaded, and single-chip, multiple-issue multiprocessing architectures. To perform these evaluations, we simulate a simultaneous multithreaded architecture based on the DEC Alpha 21164 design, and execute code generated by the Multiflow trace scheduling compiler. ..."
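To make the abstract's point concrete, here is a deliberately crude toy model (all numbers are invented round figures, not measurements of any real core): a wide core wastes issue slots whenever its one thread stalls, and a second thread can fill many of those slots.

```python
import random

random.seed(0)

ISSUE_WIDTH = 4       # instructions the core can issue per cycle (invented)
CYCLES = 100_000      # simulated cycles
P_STALL = 0.05        # chance per cycle that a running thread hits a long stall
STALL_CYCLES = 30     # cycles a stalled thread cannot issue (think: cache miss)

def slot_utilisation(num_threads):
    """Fraction of issue slots filled when num_threads share one core."""
    stall_left = [0] * num_threads      # remaining stall cycles per thread
    used = 0
    for _ in range(CYCLES):
        runnable = False
        for t in range(num_threads):
            if stall_left[t] > 0:
                stall_left[t] -= 1      # still waiting on its miss
            else:
                runnable = True         # this thread could issue this cycle
                if random.random() < P_STALL:
                    stall_left[t] = STALL_CYCLES
        if runnable:
            used += ISSUE_WIDTH         # some thread fills the slots
    return used / (CYCLES * ISSUE_WIDTH)

print(f"1 thread : {slot_utilisation(1):.0%} of issue slots filled")
print(f"2 threads: {slot_utilisation(2):.0%} of issue slots filled")
```

Note what the toy model also shows: if the single thread already keeps the slots nearly full (a low stall rate), the second thread adds almost nothing, which is the M1 argument quoted above; if it stalls a lot, the second thread adds a great deal.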
So the notion that this is some CISC versus RISC thing is mostly revisionist history. Every RISC implementation that stuck around in the "big iron" market into the current century picked up SMT (Power, for example). Neither CISC nor RISC code has inherently more single-thread parallelism in it. There are limits to the out-of-order execution and prediction that can be done on both, and the gap between the two isn't as material as the order-of-magnitude gaps in memory access latency between the various levels of the memory hierarchy.
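A rough back-of-the-envelope (again with invented but order-of-magnitude numbers) shows why: one load that goes all the way to DRAM can cost more cycles than hundreds of instructions' worth of issue, no matter how those instructions are encoded.

```python
# Invented round numbers, purely to show the order of magnitude involved.
peak_ipc = 4            # instructions per cycle when nothing misses
miss_every = 200        # one load goes all the way to DRAM every N instructions
dram_latency = 300      # cycles to service that miss, with nothing to overlap it

cycles_per_block = miss_every / peak_ipc + dram_latency   # 50 + 300 cycles
effective_ipc = miss_every / cycles_per_block

print(f"effective IPC ≈ {effective_ipc:.2f} against a peak of {peak_ipc}")
# ≈ 0.57: the memory hierarchy, not CISC-vs-RISC decode, sets the ceiling here
```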
Apple systems tend to have one and only one storage drive attached. The iOS devices never had SATA storage drives, and the Macs have for the most part purged them. APFS runs slowly on HDDs and eschews any notion of RAID or large-volume management. Apple's macOS is now also pretty aggressive about caching persistent storage in memory, taking that additional memory-hierarchy level out of the picture for average workloads.
SMT's utility often doesn't show up in tech-porn benchmarks whose working sets are easily sucked into the L3 cache (and out of RAM entirely), so Apple gets some wins where aspects of their target market are driven by those. A single user with a single, largely sequential data stream is also a low-traction area (versus a high number of users with aggregate random data streams).
Arm put SMT on their E1 server baseline design.
"... Simultaneous multithread (SMT) enables Neoverse E1 to execute two threads concurrently resulting in improved aggregate throughput performance. ..."
https://www.arm.com/products/silicon-ip-cpu/neoverse/neoverse-e1
There is no huge instruction set change between N1 and E1. There is a targeted workload change. That workload focus is the key issue. Not the instruction sets.
With the N2 they aren't chasing SMT so much, because the core has a relatively small implementation area. So just more, but lower-power-consuming, cores are an offset to not having SMT.
"... In fact, Arm says the ratio is around 3:1 in terms of replacing traditional SMT threads with cores, power-wise, which allows a large core-count-based Neoverse N2 SoC to compete well against traditional x86 SoCs with comparable thread count. ..."
Arm launches its next-generation server CPUs - Neoverse N2 and Neoverse V1 (formerly Perseus and Zeus). Targeting high-performance servers and the HPC market, the new cores bring 1.4-1.5x higher IPC, SVE support, BFloat16, and the ARMv9 architecture. (fuse.wikichip.org)
On some workloads that will pan out. On others that are effectively more random-access, it may not.
IBM Power has an SMT8 mode option. Is that a CISC processor? Not even close.
Intel has it because they need to stretch their baseline x86_64 microarchitecture over a far wider set of CPU products than Apple does. Apple has sub-64-core limits woven into their kernel, and they don't even compete in some markets where x86_64 is doing essential workloads every day.
However, to save implementation area and power, Intel's Alder Lake uses a relatively high number of E-cores (Gracemont, "Atom"-class). Those don't have SMT, but they also save on those two factors. If the Windows and Linux schedulers can be made to use those cores in the most appropriate places, that will probably keep Intel "in the game" on generic consumer workloads until they can sort out their fab process issues.
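As a hedged illustration of what "using those cores in the most appropriate places" means, here is a minimal Linux-only sketch that pins work to a chosen subset of CPUs with Python's os.sched_setaffinity. The P-core/E-core index sets are hypothetical; on a real Alder Lake machine you would read the split from lscpu or /sys/devices/system/cpu/ rather than hard-coding it, and you would need at least eight logical CPUs for these indices to exist.

```python
import os

# Hypothetical core indices for an 8-thread hybrid part; a real machine's
# P/E split should be read from lscpu or /sys/devices/system/cpu/.
P_CORES = {0, 1, 2, 3}   # performance cores (assumed indices)
E_CORES = {4, 5, 6, 7}   # efficiency cores (assumed indices)

def run_on(cpus, work):
    """Restrict this process to the given CPUs, run work, then restore."""
    previous = os.sched_getaffinity(0)   # Linux-only call
    os.sched_setaffinity(0, cpus)
    try:
        return work()
    finally:
        os.sched_setaffinity(0, previous)

# Latency-sensitive work to the P-cores, background churn to the E-cores.
run_on(P_CORES, lambda: sum(range(2_000_000)))
run_on(E_CORES, lambda: sum(range(2_000_000)))
```

The whole point of a hybrid-aware scheduler is that applications shouldn't have to do this by hand.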
SMT is not on Apple's path far more because they aren't trying to be an "everything for everybody" CPU implementation than because of anything about the instruction set. Apple is quite content to detour around some workloads that they don't consider to be important.