The subjective "snappiness" boils down to latency, which in this case means how quickly the system can react to user input (or to things like animation timers). This is not the same as "performance".
Apple has worked for years to build systems with extremely low latency. Here are some things we know (or can reasonably speculate) about Apple Silicon in general, and M1 in particular:
- Very fast CPU interrupts and inter-CPU communication. Interrupts are the mechanism by which the CPU reacts to hardware events (timers, I/O and other things). Interrupts are traditionally heavy, since they require the CPU to pause the current task, save its state, switch to the interrupt handler code and do whatever needs to be done. Apple Silicon relies heavily on a custom interrupt system based on ARM FIQs, which are blazing fast compared to x86 interrupts.
- Fast context switches for threads running on the same CPU core (although that is speculation, and I don't think it has been measured yet).
- Likely very fast CPU and GPU power mode switching and frequency adjustment. This is very important because power-hungry silicon is usually shut down between work packets to save energy. Cycling the power state takes time, and if I remember correctly it usually takes milliseconds to adjust the boost clock. That is time the CPU needs before it can even start processing requests at full speed. GPUs tend to be particularly slow here because it's not something that GPU manufacturers optimize for. Apple does. Just think about how quickly the display wakes up from sleep on M1 machines.
- Efficiency cores. I agree with @Ritsuka that they alone cannot account for the low latency we observe with M1 Macs, but they do help, since they free up the main cores for more performance-critical work. That is why I believe efficiency cores are as important for desktop use as they are for mobile: they let the system divert performance to where it's needed.
- Fast single-threaded performance (this goes without saying).
- Hardware GPU scheduling and full GPU async, which in combination with cache-coherent unified memory allows the GPU to process rendering requests with lower latency than traditional hardware (due to lower overhead in handling those requests).
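
If you want a rough feel for the latency-versus-performance distinction on your own machine, here is a minimal Python sketch. It is my own illustration, not an Apple tool or an established benchmark, and the function name `measure_wakeup_latency` is made up for this post. It asks the OS for a short sleep many times and records how late each wake-up arrives. That overshoot lumps together timer, scheduler and power-state effects, so treat it as a crude proxy and not as a measurement of any single item in the list above.

```python
import statistics
import time


def measure_wakeup_latency(interval_s=0.001, samples=200):
    """Return how late each sleep woke up, in microseconds.

    Hypothetical helper for illustration: the overshoot is the time the
    system needed to react to an expired timer, which is a latency figure
    and says nothing about raw throughput.
    """
    overshoots_us = []
    for _ in range(samples):
        start = time.perf_counter()
        time.sleep(interval_s)  # ask the OS to wake us after interval_s
        elapsed = time.perf_counter() - start
        overshoots_us.append((elapsed - interval_s) * 1e6)
    return overshoots_us


if __name__ == "__main__":
    late = sorted(measure_wakeup_latency())
    print(f"median overshoot: {statistics.median(late):.0f} us")
    print(f"p99 overshoot:    {late[int(len(late) * 0.99) - 1]:.0f} us")
    print(f"worst overshoot:  {late[-1]:.0f} us")
```

The numbers depend heavily on the OS, the Python build and whatever else is running, so they are only meaningful as a comparison between runs on the same setup. The tail (p99 and worst case) tends to matter more for perceived snappiness than the median.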