@Sledneck52 I won’t try to change your conclusion. However, I will attempt to explain why the “wow factor” varies.
Processing Cores
You mention your iMac is a 2013 21.5” i7 model, which has a quad-core CPU. The M1 is an octo-core CPU: four high-performance and four high-efficiency cores. However, let’s ignore the high-efficiency cores, as they’ll be assigned small, background tasks such as I/O traffic (e.g., Wi-Fi, USB, SSD), Notification Center operations, etc. It’s also important to note that multi-core designs have a limitation: not every workload can or does utilize all of the cores.
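As a rough illustration of that limitation, here’s a minimal Python sketch of Amdahl’s law (the serial fractions are made-up examples, not measurements of any real app): once part of a workload can’t be parallelized, adding cores gives diminishing returns.

```python
# Amdahl's law: speedup is capped by the portion of a workload that must run
# serially. The serial fractions below are illustrative assumptions only.

def amdahl_speedup(serial_fraction: float, cores: int) -> float:
    """Theoretical speedup of a workload spread across `cores` cores."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / cores)

for serial in (0.05, 0.25, 0.50):   # 5%, 25%, 50% of the work is serial
    for cores in (2, 4, 8):
        print(f"serial={serial:.0%}, cores={cores}: "
              f"{amdahl_speedup(serial, cores):.2f}x")
```

Even with only 25% serial work, eight cores only get you roughly a 2.9x speedup, which is why more cores don’t automatically mean a proportional “wow factor.”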
CPU Frequency
Your iMac’s Core i7 has a clock speed of up to 3.1GHz per core. The M1’s four high-performance cores run at up to 3.2GHz each. Considering just these values, the M1 would be only ~3% faster. In contrast, my current Mac has a 2.5GHz dual-core CPU. Sticking with simple arithmetic, 2 x 2.5 = 5 vs. 3.2 x 4 = 12.8 (a 156% increase). Intel CPUs do have Turbo Boost (automatic overclocking), but again, let’s set it aside because that feature mainly benefits a single core and is normally applied in bursts. Lastly, every processor manufacturer has hit a practical clock speed ceiling (~5.5GHz), which is why we’ve been seeing CPU and GPU designs with more and more cores.
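For what it’s worth, here’s the back-of-the-envelope math above as a tiny Python sketch. It’s a deliberately naive model (cores x base clock) that ignores IPC, Turbo Boost, caches, and memory, purely to show where the percentages come from:

```python
# Naive aggregate-frequency comparison: cores x base clock (GHz).
# Intentionally ignores IPC, Turbo Boost, caches, and memory bandwidth.

def aggregate_ghz(cores: int, ghz_per_core: float) -> float:
    return cores * ghz_per_core

imac_i7 = aggregate_ghz(4, 3.1)   # 2013 21.5" iMac, quad-core i7
m1_perf = aggregate_ghz(4, 3.2)   # M1, high-performance cores only
my_mac  = aggregate_ghz(2, 2.5)   # my 2.5GHz dual-core Mac

print(f"iMac i7 -> M1:   {m1_perf / imac_i7 - 1:+.0%}")   # ~ +3%
print(f"dual-core -> M1: {m1_perf / my_mac - 1:+.0%}")    # ~ +156%
```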
CPU Efficiency
Beyond adding more cores, CPU engineers have increased the onboard memory cache and the number of instructions/operations cores can execute per cycle.
On consumer-class CPUs, the cache amount isn’t especially significant (e.g., 3 to 16MB). On the other hand, workstation-class CPUs can include 128MB+, but they also have dozens of processing cores.
Unfortunately, the instructions per cycle (IPC) improvements have similarly been minor for each CPU generation.
CPU Grade has a nice visualization.
CPU Grade said:
Cinebench R15 — Normalized to 3.00 GHz Single-Threaded Performance
In order to initiate an instruction throughput comparison, we must first normalize the processors to one fixed frequency. All models will comfortably operate at 3.00 GHz, and in the case that we might want to add older architectures to the graphs at a later date, the lower frequency also ensures that we don't need to start from scratch with what we have already. All architectures dating back to AMD's K8 and Intel's NetBurst are capable of reaching 3.00 GHz. Perfect.
To simplify a bit, there has been only a ~30% IPC increase from Intel over seven years (the Sandy Bridge to Coffee Lake microarchitectures), at least according to a single benchmark, though I don’t think you’ll disagree with these results.
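To make the normalization idea from that quote concrete, here’s a hedged Python sketch of the approach; the scores and clocks below are placeholder values I made up for illustration, not CPU Grade’s published Cinebench data:

```python
# Frequency-normalize single-threaded benchmark scores to a common 3.00 GHz
# so that the remaining difference reflects IPC rather than clock speed.
# The scores/clocks are hypothetical placeholders, NOT CPU Grade's data.

def normalize_to_3ghz(score: float, clock_ghz: float) -> float:
    return score * (3.0 / clock_ghz)

sandy_bridge = normalize_to_3ghz(100, 3.0)   # hypothetical single-thread run
coffee_lake  = normalize_to_3ghz(186, 4.3)   # hypothetical single-thread run

print(f"IPC gain: {coffee_lake / sandy_bridge - 1:+.0%}")   # ~ +30%
```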
Optimization
Instruction set extensions (e.g., SSE4, AVX) are designed for specific tasks (e.g., video conversion, floating-point calculations, machine learning), helping software developers perform these operations as efficiently as possible. Beyond CPUs, this type of optimization has also been available in add-on card format (e.g., graphics cards, audio cards, video capture cards). Apple’s M1 combines these approaches. While the Arm architecture has its own instruction set extensions, the M1 system on a chip (SoC) also integrates hardware blocks that behave like accelerator add-on cards (e.g., the Neural Engine). When software developers fully utilize these, performance should be as close to optimal as possible (e.g., Apple used Intel’s SSE in iTunes for audio file conversions), which is where the biggest boost in performance comes from. Of course, proper and full implementation requires significant time and effort, and it also depends on whether an applicable, optimized component exists for the use case.
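As a loose analogy for what “let optimized instructions/hardware do the work” means in practice, here’s a small Python sketch comparing a plain loop with NumPy, whose routines are backed by optimized native code (often SIMD-accelerated). It’s only meant to illustrate the general idea behind SSE/AVX and dedicated accelerators, not how Apple’s frameworks are actually implemented:

```python
# Plain Python loop vs. NumPy's vectorized multiply-add. NumPy dispatches to
# optimized native code, which is the same general idea as SSE/AVX and
# dedicated accelerators: purpose-built code paths do the heavy lifting.
import time
import numpy as np

n = 5_000_000
a = np.random.rand(n)
b = np.random.rand(n)

start = time.perf_counter()
out_loop = [a[i] * b[i] + 1.0 for i in range(n)]   # one element at a time
loop_time = time.perf_counter() - start

start = time.perf_counter()
out_vec = a * b + 1.0                              # whole arrays at once
vec_time = time.perf_counter() - start

print(f"loop: {loop_time:.2f}s, numpy: {vec_time:.3f}s, "
      f"speedup: ~{loop_time / vec_time:.0f}x")
```

The kind of gap you’d see there is why software that actually targets the M1’s dedicated hardware (e.g., the Neural Engine) can feel much faster than the raw CPU specs alone would suggest.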