If load-store vs reg/mem is the crucial distinction, why not talk about that instead of using non-transparent labels like RISC and CISC? There is nothing "reduced" about the ARM instruction set, nor is the x86-64 particularly "complex" in comparison (messy and convoluted, maybe). Modern ARM certainly has more addressing modes and instructions than x86. And there is a huge difference in complexity between ARM and core RISC-V, for example.
The problem is that the original acronym was misleading, and as a result, what the general public thinks it means is very different from how it's used by the small community of people who get to design ISAs.
RISC wasn't ever merely about reducing instruction count. Perhaps the most important innovation of the RISC movement was bringing much more analytical rigor to ISA design. For example, at one time lots of people believed that future ISAs should close the "semantic gap" between CPU instructions and high level languages by making the former much more like the latter. This wasn't based on much beyond feeling it was the right thing to do. RISC rejected this in favor of a more scientific, data-driven approach.
As for the "reduced" theme, that wasn't primarily about counting up instructions, even though that has been a very popular interpretation suggested by the acronym. The important thing to reduce is the number of implementation pain points. This makes it easier and cheaper to design high performance implementations, and reduces their gate count and power draw. By itself, instruction count isn't a great predictor of implementation complexity.
In the CPU design community, RISC is also used as a shorthand for ISAs which are recognizably in the family tree of the original 1980s RISC ISAs: load/store, usually 32 general purpose registers, usually 32-bit fixed size instruction word, limited yet sufficient addressing modes, and several other things.
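To make the load/store point concrete, here's a tiny C example; the instruction sequences in the comments are my own rough illustration of what a compiler typically emits, not exact output:

    /* Load/store vs register-memory, in one line of C. */
    int accumulate(const int *p, int total) {
        /* x86-64 can fold the memory read into the arithmetic (register-memory form):
         *     add   eax, dword ptr [rdi]
         * A load/store ISA like arm64 needs an explicit load first; arithmetic
         * only ever operates on registers:
         *     ldr   w8, [x0]
         *     add   w0, w1, w8
         */
        return total + *p;
    }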
32-bit Arm was an outlier among those early RISCs; you could make an argument that it shouldn't have been lumped in with the rest. However, modern arm64 is a very orthodox RISC ISA. Ignore the high instruction count, that's not important. Most arm64 instructions are just variations on a theme, and none look very complex to implement.
Here's an example. I've frequently seen people cite arm64's "Javascript instruction" as evidence that it's not really a RISC. When you look this instruction up, it's a variant of floating point to integer conversion with a specialized rounding mode. For reasons I won't go into here, this variant is extremely important to Javascript performance.
The extra gate count required by this instruction is almost nothing: it's a low-cost extra mode for execution resources which have to exist anyway for other FP-to-int conversion instructions. The payoff, however, is high, thanks to how important JS is in today's world. So arm64's ISA architects decided it was worth burning a single opcode (one of the most precious ISA resources, from their perspective) on it.
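For context on what that opcode buys you: the arm64 instruction in question is FJCVTZS, and what it computes is essentially ECMAScript's ToInt32() conversion. A rough portable-C sketch of that conversion (mine, purely illustrative) looks like this; without the instruction, a JS engine has to emit something along these lines, plus a fast path, on every double-to-int32 coercion:

    #include <math.h>
    #include <stdint.h>

    /* Sketch of ECMAScript ToInt32(): truncate toward zero, wrap modulo 2^32,
     * then reinterpret the result as a signed 32-bit value. */
    static int32_t js_to_int32(double x) {
        if (!isfinite(x))                    /* NaN and +/-Infinity become 0 */
            return 0;
        double t = trunc(x);                 /* round toward zero            */
        double m = fmod(t, 4294967296.0);    /* reduce modulo 2^32...        */
        if (m < 0.0)
            m += 4294967296.0;               /* ...into the range [0, 2^32)  */
        int64_t s = (int64_t)m;
        if (s >= 2147483648LL)               /* top half encodes negatives   */
            s -= 4294967296LL;
        return (int32_t)s;
    }

Bitwise operations and the x|0 idiom run this conversion constantly in hot JS code, which is why folding it into one variant of an existing FP-to-int conversion is such a high-leverage tweak.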
arm64 is full of things like that. They clearly did a ton of homework trying to figure out places where they could offer high-leverage, low-cost variants of common operations. It doesn't mean that the resulting ISA isn't RISC, as it's still an extremely regular and simple ISA design.
I'd say Leman is right here. The number of general purpose registers available is not standardised among chips claiming to be either RISC or CISC. AArch64 IIRC has 32 GPRs, x64 has 16, and the 68K has 8 plus 8 address registers. And what about vector registers like AVX, NEON, SVE, etc.? There's no point grouping the ISAs into CISC or RISC camps when you need to consider each chip individually anyway. Each ISA also has vastly different instructions available, and, much to Leman's point, the only thing that really seems to determine whether something categorises as CISC or RISC is whether you perform load/store or can do register-memory operations directly. With things like SVE2 it's hardly like AArch64 is that reduced after all; there are still quite advanced and numerous instructions in there.
With what I've written above, do you see that the important thing is not how many instructions or even whether they are "advanced"? Keep in mind that some things which seem 'advanced' from the software perspective are dead easy when designing gates, and some things which seem trivial are a giant pain in the butt.
This hasn’t been the case for many years. ARM instructions on current CPUs do not execute any faster than x86 instructions (if anything, it’s a property of the implementation and not the ISA).
There are two factors at work here.
One is that while x86 can and should be classified as a CISC ISA, reality is messier than a pure binary one-or-the-other kind of thing. x86 was one of the RISC-iest of the CISC ISAs. You noted that x86 doesn't have tons of addressing modes, and addressing modes are one of the key metrics which can make an ISA more or less "CISCy". Just like everything else, one mustn't get hung up on the number of modes; it's really about implementation complexity. Do any of the modes make life really difficult for hardware designers? Mostly by accident, x86 avoided some of the common addressing mode pitfalls other pre-RISC ISAs fell into, and that was very important to x86 managing to survive the 1980s.
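To illustrate with my own example (not something from the parent): every x86 addressing mode boils down to base + index*scale + displacement, so an operand needs at most one memory access and a simple address calculation. The classic pre-RISC pitfall was modes where the operand's address itself has to be fetched from memory, as on the VAX:

    /* The most elaborate x86 form is still just one memory access:
     *     mov   eax, [rdi + rsi*4 + 16]
     * Contrast a VAX-style memory-indirect ("deferred") mode, where one operand
     * needs two dependent memory accesses (fetch the pointer, then the data),
     * plus a register side effect:
     *     MOVL  @(R1)+, R0
     */
    int element(const int *base, long i) {
        return base[i + 4];    /* compiles to the base + index*scale + disp form */
    }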
And ARM has its fair share of instructions that execute in multiple steps. For example, shift+add is one instruction in ARM, but pretty much every modern implementation (including Apple) executes it in two steps. And ARM designs until very recently even used micro-ops, just like x86 CPUs. So I really don’t see how this applies to the current situation.
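(For anyone unfamiliar, the shift+add mentioned there is arm64's shifted-register operand form; a quick C-level sketch of what it encodes:)

    /* One arm64 instruction:  add x0, x1, x2, lsl #3   i.e.  x0 = x1 + (x2 << 3).
     * Whether a given core executes that as one micro-op or cracks it into two
     * is an implementation choice, not something the ISA dictates. */
    long scaled_add(long base, long idx) {
        return base + (idx << 3);
    }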
I don't think there's any significant use of microcode in mainstream high-performance 64-bit Arm cores. Maybe in those which still have support for AArch32, but in cores like Apple's (where AArch32 is a distant and unsupported memory), not so much.
More importantly, you have to look at all the outcomes, not just clock speed. For example, consider Zen 4 vs M1, as that's as close to the same process node as we can compare. The Zen 4 core is much larger than Apple's Firestorm core. Zen 4 scales to higher frequencies, but Apple's core delivers profoundly better perf/Hz and perf/W. If ISA doesn't matter at all, one would expect such differences to be far less pronounced.
I suspect the story is that x86 did offer a few “high-level” instructions at some point, to simplify programming in assembler. These instructions were never much used and were made obsolete a long time ago.
No, x86 was never particularly high-level.
The 8086 was the successor of Intel's 8080 and 8085. The biggest new feature was support for a 20-bit (1MB) address space, up from 16-bit (64KB). 8086 wasn't binary compatible with the 8085, but was intentionally mostly assembly language source compatible, as that was an important selling feature in many of the markets 8080 and 8085 had sold into.
Because the 8080 and 8085 were designed in the early 1970s, there just wasn't the transistor budget to do anything fancy. The 8086 didn't go much beyond that, because when that project kicked off, Intel already had a team working on their extremely ambitious, all-new 32-bit architecture of the future, the iAPX 432. The 432 was a "close the semantic gap" design: it had HLL features (capabilities, objects, garbage collection) baked into the ISA and microcode. 8086 was just a side project to keep existing 8085 customers loyal to Intel while the 432 team finished their work.
But the 432 was a dismal failure. Extremely late, incredibly slow, and ironically, its advanced ISA features made it extremely difficult to port existing operating systems and applications. It was a complete disaster, far worse than Itanium.
Around the time the 432 began to fail, x86 received the windfall of IBM selecting it for the IBM PC, and the PC's success meant x86 got allocated resources for some upgrades. After some false steps in the 286, Intel came up with some decent ideas for cleaning up the ugliest aspects of the 8086 ISA in the 386, and perhaps even more importantly, didn't succumb to the temptation to add too much.
If any of this had gone a little bit differently, we wouldn't have x86 as we know it today. For example, if 8086 had been regarded as the important project, it might have gotten the resources to be more ambitious, and that might have resulted in the inclusion of base ISA features too difficult to paper over in the long term. Designing microprocessor ISA features for ease of pipelined, superscalar, and out-of-order implementation was not something on anyone's mind in the 1970s; it really was a weird historical accident that x86 managed to avoid problems common to its contemporaries.