That is backwards to the operation, though. The semantics of it is

A = B <op> (C:<shift_ct>)

so the shift happens first. In the original AArch32, there was no discrete shift operation: it simply performed a move with a shifted/rotated source. AArch64 is structured a little differently and does have discrete shift ops. But the 64-bit version ought to be able to just feed the C operand through the barrel shifter on its way to the ALU – why it takes an extra cycle is unclear (unless it is for energy efficiency reasons, to reduce excess logic at the cost of 1 cycle for a composition that is only moderately used).

Ah, I didn't look up the actual semantics and assumed something like

A = (B op C) shift N

Barrel shifters are relatively expensive (substantially bigger than an adder), so I think that yeah, this probably just wasn't compelling enough to spend the resources on it.
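
To make the difference between the two readings concrete, here is a rough Python sketch. The function names are made up for illustration, and it only models an add with a left shift on 64-bit values, not the real instruction encoding or any other shift type.

```python
MASK64 = (1 << 64) - 1  # model 64-bit register wrap-around


def op_with_shifted_operand(b, c, shift):
    """A = B + (C << shift): the shift is applied to operand C first."""
    return (b + ((c << shift) & MASK64)) & MASK64


def op_then_shift(b, c, shift):
    """A = (B + C) << shift: the mistaken reading, shift applied to the result."""
    return ((b + c) << shift) & MASK64


if __name__ == "__main__":
    b, c, n = 1, 2, 3
    print(op_with_shifted_operand(b, c, n))  # 1 + (2 << 3) = 17
    print(op_then_shift(b, c, n))            # (1 + 2) << 3 = 24
```

In the first form the shifter sits on the path of one operand into the ALU, which is why it seems like it could be folded into the same cycle; in the second it would have to wait for the ALU result.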