Fair enough: everyone values compact, efficient code. But Arm made the decision to value fetch and decode efficiency more highly than bytes per instruction, while Intel decided that letting instruction length vary between 1 and 15 bytes was the way to go.
If you do a quick web search, you'll find any number of sources still arguing that the advantage of variable-length instructions and the CISC mentality is higher code density. Having never actually thought about it much, I've happily gone along without questioning that conventional wisdom. What we're seeing in practice, though, is that it is largely a myth. There's a lesson there.
There remains a difference to discuss between fixed-length and variable-length instructions. x86 still achieves fewer bytes per instruction compared to Arm, though that advantage appears to have faded further in the move to 64-bit. I think that's two parts of the lesson: fewer bytes per instruction doesn't automatically equate to higher code density, and don't just look at what's best for today; look at how the architecture will need to evolve and plan for it.
@Sydde's reply is excellent, as usual. From my side, I like to analyse an ISA from two perspectives. One regards information packaging: how much useful information does an instruction carry? How well can the ISA express common algorithmic problems and patterns? The other is ease of hardware implementation: how much work does the CPU need to perform to decode an instruction? How much overhead is there in tracking dependencies? Are there special rules the hardware needs to be aware of? How much additional processing/repackaging needs to be done before the instruction can be executed?
For example, the big motivating point for early CISC designs (and in fact, their very justification) was to encode common algorithmic patterns as shorter instructions while making special rules for other common patterns (like a multiply instruction that hard-codes where the operation's result goes). This kind of design both improved information packaging (you could express more intent with the same amount of binary code), which was great at a time when RAM was at a premium, and facilitated a simplified hardware implementation (multiplication didn't have to deal with the combinatorial complexity of register outputs), which was great when transistors and die area were at a premium.
But both software and hardware have evolved. On the software side, algorithms and control-flow patterns are vastly more complex compared to what people did in the '70s and '80s. On the hardware side, we have superscalar execution, where hundreds of instructions and their dependencies are tracked simultaneously and several are executed in parallel (and that's on single-threaded, serial code). So the requirements changed too. What constituted good design in the '80s (saving some wires and execution unit complexity) was no longer good design in 2010, when register dependencies were already tracked in hardware. So the compact and carefully crafted instruction spaces of early CISC ISAs stopped being an advantage and turned into an annoyance: the problems they were designed to solve didn't exist anymore, and they didn't do much to help with the new problems, like decoding multiple instructions in parallel to feed all the superscalar machinery or simplifying dependency tracking. As a result, the code became more RISC-like. When you look at x86-64 disassembly of modern complex applications, you will barely even see any register-memory operations (probably the signature feature of the x86 architecture), simply because they're not that useful in practice. And as
@Sydde already mentioned, the more complex "CISC" instructions like ENTER and LEAVE are basically extinct.
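To make the register-memory point a bit more concrete, here is a minimal sketch. The assembly in the comments shows the general shape of the two encodings; the register choices are illustrative, not verbatim compiler output.

```c
#include <stdint.h>

/* Adds one array element to a running sum: sum + data[i]. */
int64_t accumulate(const int64_t *data, int64_t i, int64_t sum) {
    /* Classic x86 register-memory encoding (one instruction, assuming
     * sum lives in rax, data in rdi, i in rsi):
     *     add rax, QWORD PTR [rdi + rsi*8]
     *
     * RISC-style split that modern compilers commonly emit instead:
     *     mov rcx, QWORD PTR [rdi + rsi*8]
     *     add rax, rcx
     */
    return sum + data[i];
}
```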
This is where the beauty of 64-bit ARM comes in. It has been meticulously designed to optimise both useful information per instruction and ease of implementation, all of it fine-tuned for the needs of both contemporary software and superscalar hardware. The ISA is highly complex, because it aims to exploit frequently used software patterns as well as provide expression symmetry for more flexibility. The fact that the instructions are fixed-size is easily misleading: one immediately thinks that this is just for ease of decoding, the obvious drawback being code density, and leaves it there. But one has to look at what's actually there. ARM64 addressing modes are designed to express common pointer arithmetic as compactly as possible (e.g. fetching a struct field from an array and incrementing a loop counter is one single instruction). It has things like multi-register loads (in one instruction), more architectural registers (reducing memory spills and moves), a link register for more efficient calling of small routines, and many other things. And despite all this complexity the instructions can be decoded very efficiently and operations can be issued to the execution hardware swiftly, without extra processing, since the dependencies are clearly inferable from the instruction itself. Just one illustration: ARM64 can perform an operand shift as part of an arithmetic instruction (this is a common pattern for address computation, for example). This will be executed in multiple steps on most hardware (first the shift and then the addition), but it is cheaper to have it as one instruction, where the operation dependency is clearly specified, than to execute it as a separate instruction pair, where you have to do separate register dependency tracking.
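A small sketch of those addressing modes and the shifted-operand form. The ARM64 instructions in the comments show the general shape a compiler can use; they are illustrative, not verbatim compiler output, and the register assignments are assumptions.

```c
#include <stddef.h>
#include <stdint.h>

int64_t sum_array(const int64_t *a, size_t n) {
    int64_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        /* One load covers the whole address computation a + i*8
         * (scaled register offset):
         *     ldr  x3, [x0, x2, lsl #3]
         * Walking a pointer instead uses post-index addressing, which
         * loads and bumps the pointer in a single instruction:
         *     ldr  x3, [x0], #8
         * And a pair of consecutive elements can come in via one
         * multi-register load:
         *     ldp  x3, x4, [x0], #16
         */
        sum += a[i];
    }
    return sum;
}

int64_t *element_addr(int64_t *base, size_t i) {
    /* The shifted-operand form folds the scaling into the add itself:
     *     add  x0, x0, x1, lsl #3
     */
    return base + i;
}
```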
As a final example (in what ends up being a way too long post), I want to mention RISC-V. The design focus here is simplicity and ease of implementation on simple hardware. They don't go for dense information packaging like ARM does; their instructions do only one operation (most of the time at least). This is pretty much the "classical RISC" as it is commonly described and understood. It makes for an elegant and minimalist ISA, but it creates a problem, as common software patterns (like addressing into arrays of structs) now require multiple instructions. This problem becomes apparent when you want to make a really fast CPU, because you have to pay a higher cost for decoding multiple instructions and tracking dependencies to get to the same information state that an ARM CPU reaches with just a single instruction. And even more, you have to perform some additional information repackaging (aka plumbing) to get the format you need for efficient execution. Say your memory load unit can perform address shifts (it's a common thing, as you often have to generate addresses like 8*i + k). Well, RISC-V does not have shifts as part of its load instruction, only an immediate offset. The shift has to be done as a preceding arithmetic instruction. So if you want to make use of your fancy hardware address generation unit, you need to track the shift instruction, check if it uses a shift amount your address generation unit supports, eliminate this instruction from the flow, and forward the information to the address generation unit. That's quite a lot of work to do just to get information parity with ARM!
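Here is the 8*i + k pattern side by side, assuming plain RV64I with no address-generation extensions. The instruction sequences in the comments are the straightforward encodings, not verbatim compiler output.

```c
#include <stddef.h>
#include <stdint.h>

int64_t load_elem(const int64_t *a, size_t i) {
    /* RV64: the scaled index needs its own shift and add before the load:
     *     slli t0, a1, 3        # t0 = i * 8
     *     add  t0, a0, t0       # t0 = a + i*8
     *     ld   a0, 0(t0)        # load a[i]
     *
     * ARM64 expresses the same thing in one instruction:
     *     ldr  x0, [x0, x1, lsl #3]
     */
    return a[i];
}
```

High-performance RISC-V designs typically recover some of this via macro-op fusion in the decoder, recognising the shift+add+load sequence and merging it, which is exactly the extra tracking and repackaging work described above.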