The topic of whether the M1 "gets more out of its RAM" has sparked a lot of heated debate, both on these forums and on other platforms. The discussion is driven by testimonies from users who can do work on an 8GB config that would allegedly have brought an Intel Mac to a crawl, and by various YouTube videos (especially by Max Tech) showing M1 machines staying surprisingly responsive under RAM pressure. There are some who claim that 8GB on M1 gives you as much benefit as 16GB on an Intel machine, some who claim that all of this is completely ridiculous, and finally some who claim that the perceived differences can be explained by a) M1 machines generally being snappier and b) macOS on M1 having more agile memory paging (swap). The last point in particular would allow the OS to quickly page user-facing applications between RAM and the SSD with very little interruption, creating the illusion that the machine has more RAM than it actually has.
I don't really have the means to "solve" this discussion, but I did come across something interesting that could give us a way to approach the problem scientifically. There is a recent tool called pmbench that aims to analyze the performance of OS paging. Unfortunately, it is only available for Linux and Windows, but the accompanying 2018 paper by Yang & Seymour focuses on paging using low-latency SSDs (like the ones found in modern Macs) and offers some very interesting discussion. Some quotes from the paper (emphasis mine):
Section 4.2.:
As shown by the peak of the graph, the most frequent access latency is at 14.1 μs. This implies that the majority of major faults that go through the frequent path have software overhead of about 9 μs – almost twice the SSD latency.
Furthermore, there are too many long-latency faults: analysis shows that 33% of total execution time is spent on faults taking longer than 100 μs to handle, and 12.5% on faults taking 1 ms or more.
Section 4.3.:
Windows is 8.7 times slower than Linux in processing faults in low-memory conditions. Windows suffers from slow frequent-path and inefficient scheduling policy as discussed in Section 4.2. Linux, though faring better than Windows, still has room for improvement: the average 3.6 μs OS overhead translates to a substantial 14,400 clock cycles
Basically, the authors conclude that the OS software overhead of paging is higher than the cost of accessing the SSD itself. Furthermore, they observed many latency spikes across the tests (12.5% of total execution time was spent on faults taking 1 ms or more). These delays are significant. If you have multiple page faults in rapid succession (dozens or hundreds, which is quite possible when switching between apps on a memory-starved system), these delays can accumulate to fractions of a second, which will absolutely wreck your user experience. And these are not hardware problems; it is all just software overhead.
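To put the paper's numbers in perspective, here is some back-of-the-envelope arithmetic. The burst size and the tail fraction are my assumptions, not measurements; the two latencies come from the quotes above:

```python
# Assumed scenario: 200 page faults fired in a burst while switching apps.
faults = 200
fast_latency = 14.1e-6   # s: the frequent-path latency reported in the paper
tail_latency = 1e-3      # s: the >= 1 ms slow tail
tail_fraction = 0.10     # assumption: 10% of faults hit the slow tail

stall = (faults * (1 - tail_fraction) * fast_latency
         + faults * tail_fraction * tail_latency)
print(f"estimated stall: {stall * 1e3:.1f} ms")  # → estimated stall: 22.5 ms
```

Even though 90% of the faults here are cheap, the 1 ms tail alone contributes 20 ms of stall, longer than an entire frame at 60 Hz.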
So let's get back to Apple Silicon Macs. Now, I can't run pmbench on macOS, so I can't verify any of this, but I don't think it is far-fetched to assume that Apple was aware of these problems and optimized both the hardware and the software of their new Macs for low-latency swapping. What we do know is that the M1 features low-latency hardware interrupts (with an overhead of just a few machine cycles), 16KB memory pages (so it can process 4x as much memory in a single swap operation as an x86 system with 4KB pages), and an on-chip SSD controller with an NVMe SSD presumably connected via some sort of proprietary bus. If this is not a setup for low-latency swap, I don't know what would be. All it takes is OS software that is aware of these hardware features and takes advantage of them. That is not trivial, but also not too complex when you consider that they only need to support a single hardware configuration.
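The 16KB page size alone makes a concrete difference. As a rough sketch (the 256MB working-set size is just an assumed figure for illustration), compare the number of faults needed to page an app's working set back in:

```python
# Assumed: a 256 MB working set that must be paged back in on app switch.
working_set = 256 * 1024 * 1024

faults_4k = working_set // (4 * 1024)    # 4 KB pages, typical x86 setup
faults_16k = working_set // (16 * 1024)  # 16 KB pages, Apple Silicon

print(faults_4k, faults_16k)  # → 65536 16384
```

With a roughly fixed per-fault software overhead, that is a quarter of the faults, and a quarter of the overhead, to move the same amount of memory.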
If pmbench were ported to the Mac (maybe someone would be interested in taking a swing at that?), I'd expect to see significantly lower software overhead and a much shorter latency tail. If all paging operations can be carried out in under 20 μs, the system would stay responsive even with constant swapping. I think there is enough circumstantial evidence to claim that this is indeed what we are seeing on these new Macs.
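Until someone does port pmbench, a very crude way to watch fault-handling cost is to time first-touch accesses to an anonymous mmap. This is only a sketch, not a pmbench replacement: it measures minor zero-fill faults rather than swap-ins, and Python's own call overhead inflates every sample, so treat the absolute numbers with suspicion:

```python
import mmap
import time

PAGE = mmap.PAGESIZE
N = 1024  # number of pages to touch

# Anonymous mapping: pages are not resident until first touched.
buf = mmap.mmap(-1, N * PAGE)

latencies = []
for i in range(N):
    t0 = time.perf_counter_ns()
    buf[i * PAGE] = 1  # first write to the page triggers a minor fault
    latencies.append(time.perf_counter_ns() - t0)

latencies.sort()
print(f"median: {latencies[N // 2]} ns, p99: {latencies[int(N * 0.99)]} ns")
```

A real port would also need to generate actual memory pressure so that faults are serviced from compressed memory or swap, which is exactly the path whose tail behavior we care about here.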