The topic of whether the M1 "gets more out of its RAM" has sparked a lot of heated debate, both on these forums and on other platforms. The discussion is driven by testimonies from users who can do work on an 8GB config that would allegedly have brought an Intel Mac to a crawl, and by various YouTube videos (especially by Max Tech) showing M1 machines staying surprisingly responsive under RAM pressure. There are some who claim that 8GB on M1 gives you as much benefit as 16GB on an Intel machine, some who claim that all of this is completely ridiculous, and finally some who claim that the perceived differences can be explained by a) M1 machines generally being snappier and b) macOS on M1 having more agile memory paging (swap). The last point in particular would allow the OS to quickly page user-facing applications between RAM and the SSD with very little interruption, creating the illusion that the machine has more RAM than it actually has.
I don't really have the means to "solve" this discussion, but I did come across something interesting that could give us a way to approach the problem scientifically. There is a recent tool called pmbench that aims to analyze the performance of OS paging. Unfortunately, it is only available for Linux and Windows, but the accompanying 2018 paper by Yang & Seymour focuses on paging using low-latency SSDs (like the ones found in modern Macs) and offers some very interesting discussion. Some quotes from the paper (emphasis mine):
Section 4.2.:
As shown by the peak of the graph, the most frequent access latency is at 14.1 μs. This implies that the majority of major faults that go through the frequent path have software overhead of about 9 μs – almost twice the SSD latency.
Furthermore, there are too many long-latency faults: analysis shows that 33% of total execution time is spent on faults taking longer than 100 μs to handle, and 12.5% on faults taking 1 ms or more.
Section 4.3.:
Windows is 8.7 times slower than Linux in processing faults in low-memory conditions. Windows suffers from slow frequent-path and inefficient scheduling policy as discussed in Section 4.2. Linux, though faring better than Windows, still has room for improvement: the average 3.6 μs OS overhead translates to a substantial 14,400 clock cycles
Basically, the authors conclude that the OS software overhead of paging is higher than the cost of accessing the SSD itself. Furthermore, they observed many latency spikes across the tests (12.5% of total execution time was spent on faults taking 1 ms or more). These delays are significant. If you have multiple page faults in rapid succession (dozens or hundreds, which is quite possible when switching between apps on a memory-starved system), these delays can accumulate to fractions of a second, which will absolutely wreck your user experience. And these are not hardware problems; it is all just software overhead.
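To put the paper's numbers in perspective, here is some back-of-the-envelope arithmetic. The burst size and the tail fraction are my assumptions, not measurements; the two latencies come from the quotes above:

```python
# Assumed scenario: 200 page faults fired in a burst while switching apps.
faults = 200
fast_latency = 14.1e-6   # s: the frequent-path latency reported in the paper
tail_latency = 1e-3      # s: the >= 1 ms slow tail
tail_fraction = 0.10     # assumption: 10% of faults hit the slow tail

stall = (faults * (1 - tail_fraction) * fast_latency
         + faults * tail_fraction * tail_latency)
print(f"estimated stall: {stall * 1e3:.1f} ms")  # → estimated stall: 22.5 ms
```

Even though 90% of the faults here are cheap, the 1 ms tail alone contributes 20 ms of stall, longer than an entire frame at 60 Hz.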
So let's get back to Apple Silicon Macs. Now, I can't run pmbench on macOS, so I can't verify any of this, but I don't think it is far-fetched to assume that Apple was aware of these problems and optimized both the hardware and the software of their new Macs for low-latency swapping. What we do know is that the M1 features low-latency hardware interrupts (with an overhead of just a few machine cycles), 16KB memory pages (so it can process 4x as much memory in a single swap operation as an x86 system with 4KB pages), and an on-chip SSD controller with an NVMe SSD presumably connected via some sort of proprietary bus. If this is not a setup for low-latency swap, I don't know what would be. All it takes is OS software that is aware of these hardware features and takes advantage of them. That is not trivial, but also not too complex when you consider that they only need to support a single hardware configuration.
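The 16KB page size alone makes a concrete difference. As a rough sketch (the 256MB working-set size is just an assumed figure for illustration), compare the number of faults needed to page an app's working set back in:

```python
# Assumed: a 256 MB working set that must be paged back in on app switch.
working_set = 256 * 1024 * 1024

faults_4k = working_set // (4 * 1024)    # 4 KB pages, typical x86 setup
faults_16k = working_set // (16 * 1024)  # 16 KB pages, Apple Silicon

print(faults_4k, faults_16k)  # → 65536 16384
```

With a roughly fixed per-fault software overhead, that is a quarter of the faults, and a quarter of the overhead, to move the same amount of memory.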
If pmbench were ported to the Mac (maybe someone would be interested in taking a swing at that?), I'd expect to see significantly lower software overhead and a much shorter latency tail. If all paging operations can be carried out in under 20 μs, the system would stay responsive even with constant swapping. I think there is enough circumstantial evidence to claim that this is indeed what we are seeing on these new Macs.
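Until someone does port pmbench, a very crude way to watch fault-handling cost is to time first-touch accesses to an anonymous mmap. This is only a sketch, not a pmbench replacement: it measures minor zero-fill faults rather than swap-ins, and Python's own call overhead inflates every sample, so treat the absolute numbers with suspicion:

```python
import mmap
import time

PAGE = mmap.PAGESIZE
N = 1024  # number of pages to touch

# Anonymous mapping: pages are not resident until first touched.
buf = mmap.mmap(-1, N * PAGE)

latencies = []
for i in range(N):
    t0 = time.perf_counter_ns()
    buf[i * PAGE] = 1  # first write to the page triggers a minor fault
    latencies.append(time.perf_counter_ns() - t0)

latencies.sort()
print(f"median: {latencies[N // 2]} ns, p99: {latencies[int(N * 0.99)]} ns")
```

A real port would also need to generate actual memory pressure so that faults are serviced from compressed memory or swap, which is exactly the path whose tail behavior we care about here.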