Thinking different about the role of storage and RAM in Apple Silicon

quarkysg · Dec 10, 2020

ADGrant said:
I see no reason why Big Sur would treat swap and memory any differently between Intel and Apple Silicon. That would be a significant and unnecessary change to the OS kernel and Apple had plenty of other things to work on.

Well, Apple most definitely started the ARM builds years ago, so I suppose it's probably part of the design road map of macOS for ARM. OS design for different CPU and platform architectures likely demand slightly differing strategies from a performance perspective, and also borrowing from lessons learnt from iOS.

ADGrant said:
BTW almost all current Intel Macs have a T2 chip (I think the 21" iMac is the only exception) and the SSD on a T2 Mac is always encrypted (the T2 chip is a modified A10. running something called bridge os). For older Intel Macs without the T2 chip, encryption is an optional feature. One significant difference between the T2 and other Intel Macs is they are SSD only. Many of the non-T2 Macs shipped with hard drives.

Big Sur supports Macs released from 2013, so it's a fairly big set of Macs to support, with a big gap in performance characteristics. So ensuring a consistent experience for Intel based Macs will be a lot more challenging. The biggest difference is, as you've pointed out, is that not all supported Macs have SSDs as standard, and even then, not all Macs that shipped with SSDs are performant enough. Encrypting the Macs storage (regardless of whether it's magnetic or SSDs storage) using CPU will result in high storage access latencies and lower thruput. So treating swap space as additional 'RAM' is probably out of the question, even for situations that can tolerate lower thruput and higher latencies.

Anyway, Big Sur's low level behaviour could be similar for both Intel and M1 builds, but we'll probably never know unless someone from the core macOS team decides to chime in on the discussion

.

t0pher · Dec 11, 2020

ADGrant said:
What is inaccurate is your suggestion "that the M1 doesn’t need as much RAM as older systems like we’ve seen from Intel, AIM etc" and that the M1 unified memory architecture includes the SSD.

The Unified Memory architecture refers to the CPU, GPU and other co-processors on the SOC. iOS memory management does not work the same as MacOS memory management (iOS does not swap data from memory to storage) but that doesn't mean that MacOS on ARM manages memory like iOS. It works the same way as MacOS on Intel because it's the same operating system.

The 8GB M1 Macs have less total memory than the video memory on some of the high end Intel Macs. Therefore, tasks requiring a lot of GPU memory will run faster on those Macs.

i never suggested or wrote that the Unified Memory includes the storage

t0pher said:
As the storage on M1 is so much faster than traditional spinning disks or even most aftermarket ssd‘s and the M1 has the notion of unified memory, and also remember RAM is really where the CPU caches data from storage so it can access it quicker and process the instructions quicker, M1 is able to put into swap or just read from where it already is in storage bits of programs that would normally be read into RAM.

The SoC does integrate the storage controllers for faster access to storage.

Apple unleashes M1

Apple today announced the biggest leap forward for the Mac with M1, the first system on a chip designed specifically for the Mac.

www.apple.com

A high-performance storage controller with AES encryption hardware for faster and more secure SSD performance.

t0pher · Dec 11, 2020

ADGrant said:
ADGrant said:

I see no reason why Big Sur would treat swap and memory any differently between Intel and Apple Silicon. That would be a significant and unnecessary change to the OS kernel and Apple had plenty of other things to work on.

Click to expand...

Intel is CISC x64 and Arm is RISC & a completely different architecture which does things differently to x64. Apple have optimised their Arm implementation to their needs so why wouldn't they do things differently on Arm if it resulted in better performance?

theluggage · Dec 11, 2020

t0pher said:
Intel is CISC x64 and Arm is RISC & a completely different architecture which does things differently to x64.

First - the 1990s called and want their RISC vs. CISC debate back. That war is over folks, and - spoiler - RISC won, CISC really doesn't mean much nowadays and it's really just x86 vs. everything else. Even x86 chips assimilated RISC design principle and are now, effectively, RISC-like cores driven by a hardware x86-to-internal RISC instruction translator.

A significant part of ARM/ASi's advantage is simply dumping that translator stage, and other legacy stuff reaching back to the 1970s, simplifying and streamlining everything, allowing more cores, more powerful GPUs and other accelerator gizmos (Neural engine, etc) in the same space/thermal constraints.

There are, of course, a million details that differ between x86 and ASi but there's no real fundamental architectural difference between the way they use and address system RAM, virtual memory and storage. Randomly accessing data in system RAM (that's what the R stands for) is still vastly faster than SSD swap (especially for write operations) and if a job on your 8GB M1 is merrily swapping away then it is gonna be hosed by a M1 with 16GB or a M2 with 64GB.

The trouble with anything more radical, architecture wise, is that it would also call for an equally radical change in operating systems and software. ASi is doing it's stuff with basically the same old MacOS that runs on Intel, the same old software written for Intel Macs and, at the moment, mostly running via x86-to-ARM translation by Rosetta.

Hardware changes that need a radical change in software design don't usually end well: See the graves of Itanium, Cell, Transputer... even the original ARM, back in the days when anything that didn't run on DOS/Windows was "radical", had to spend 20 years hiding in an embedded computing/set-top box ghetto...

Toutou · Dec 11, 2020

t0pher said:
Normally and for decades, programs and data are copied from storage into RAM and CPU’S process those programs and data from RAM writing to disk when the data needs to saved.

This is a simplified overview of how most processes get and persist their data, yes.

t0pher said:
The whole point of copying from storage to ram is to reduce the time the cpu has to wait to read the data it needs to perform its work.

t0pher said:
RAM is effectively a faster cache for the storage

These both are incorrect. RAM is a hardware implementation of what we call "main memory". It's basically the only memory the computer cares about. CPU reads instructions and accesses any other data from the main memory. Your IO devices, be it your SSDs, HDDs, peripherals, networks interfaces or anything else, are managed by the kernel and are not really relevant to the high-level architecture of CPU + memory.

t0pher said:
Traditionally reading from RAM was thousands of times quicker than reading from storage.

And it still is. Latency is extremely important and accessing a random file (including the data in your swap partition/file) takes AGES compared to accessing stuff from the main memory (especially when it's already cached).

t0pher said:
programs contain lots of data but most of that data isn’t used most of time, consuming RAM that could be used by something else.

Well, not really, the actual binaries are tiny, a few kilobytes to some tens of megabytes for huge stuff like kernels. Most of the process' memory footprint is what's called the "heap". This is a place where the process can do its stuff, allocate objects, keep their internal states, change stuff around. You can't really put this into swap, because the process NEEDS its heap in order to do basically anything. Then there's the thing called "garbage collection" that, again, absolutely needs to have the whole heap in main memory (RAM) in order to run effectively.

What I'm trying to say is: Most of the stuff in your RAM is, at any moment, what the running processes need to have in RAM. If they didn't need it, it would've been swapped already. It's what operating systems do right now. The processes themselves have no say in what's going into RAM and what's swapped. All they see is virtual memory that is transparently split into pages and swapped in and out of RAM by the kernel. We already have this.

t0pher said:
also remember RAM is really where the CPU caches data from storage so it can access it quicker and process the instructions quicker, M1 is able to put into swap or just read from where it already is in storage bits of programs that would normally be read into RAM.

No, again, RAM is not a cache. RAM is the main memory, the ONLY memory the CPU even knows about. Anything you want the CPU to see or execute absolutely needs to go through RAM. This is how we build our computers. This is what AMD machines do, what Intel machines and also what Apple Silicon machines do. Whole concepts of segmentation, virtual memory, process isolation and filesystems are built upon the notion of a CPU and its memory. The slow persistent storage (tapes, HDD, SSDs) and faster CPU caches are both mere additions to the basic concept. You can absolutely run a computer without any persistent storage and everything will work as usual.

t0pher said:
Programs can run from storage (virtual memory is an example) programs and data is loaded into RAM so the CPU has faster access to it and not wasting cycles waiting for data from slow storage.

t0pher said:
traditional thinking was always storage -> RAM -> CPU, its still that but what is loaded into RAM and what stays on storage isn't exactly the same now.

Again, these are very inaccurate and now you know why. Also "virtual memory" is a different concept and NOT an example of how "programs can run from storage" (which they can't).

t0pher said:
so long as apple silicon and OS X is smart enough to only use RAM for what is actually used then less ram is needed to quickly feed the cpu and typically the disk is fast enough to swap into RAM any bits that are needed to the extent that the user typically won’t notice

t0pher said:
if there is gigs of stuff in RAM that isn’t read frequently it doesn’t need to be in ram It can be in lower tier access like quick storage

t0pher said:
this all means that the M1 doesn’t need as much RAM as older systems like we’ve seen from Intel

What you're trying to say is basically "you won't notice swapping nearly as much when you're swapping onto a really fast SSD", and maybe "SSD equiped systems run faster than HDD ones".

Both of those ideas are absolutely correct, but they are not inherent to M1 Macs or even Macs in general. macOS does have pretty efficient RAM management, yes, but you can get a surprisingly lean and well-tuned Linux environment that negates this. You can get ultra fast SSDs and run Windows off them, which, too, allows you to get away with less RAM.

Maybe memory access is faster on M1 chips, maybe the kernel does something even smarter on Apple Silicon, so the machines run really nice with 8 gigs of RAM, but it's not because we've changed the basic computer architecture that has been around for decades. We're not fusing SSDs and RAM together and there hasn't been a revolution.

Krevnik · Dec 11, 2020

quarkysg said:
The use of UMA also probably freed up system memory that is required to be reserved for Intel based Macs using iGPUs. Having said that, if a workload requires more RAM, it'll need more RAM. No two ways about it. It's encouraging tho. that we have reports from many actual users reporting that the 8GB base models M1 Macs being sufficient for their needs.

It does free up some memory, but you won’t get it all back switching to UMA from Shared. First, you still need to use some of that for texture and frame buffer memory like you were before. Second, recent Intel iGPUs appear to actually use a dynamic allocation for GPU memory, adjusting to need, but maxing out around 1.65GB. But in most cases its using a lot less when just moving around macOS.

So yes, there are definitely savings, but I suspect it’s not quite as big as people expect. But those on 8GB systems are reaping the biggest benefit from the change, with 16GB systems seeing a smaller difference.

Phil A. said:
The problem with that approach is that the reason iOS is so frugal with RAM is that it suspends apps in the background after a few seconds and releases the memory if necessary. That wouldn’t be feasible or acceptable on MacOS

Correct. iOS tends to outright purge, and doesn’t use a swap file, last I checked. With iOS, developers have to save state explicitly and then restore that state on app launch. With macOS, swap is used as a way to keep apps launched and ready. One one hand, iOS writes less out to disk, so less I/O is potentially required to restore state. On the other, macOS doesn’t require applications restore their state manually.

M1 uses the macOS approach.

t0pher said:
what bit is inaccurate?

Programs can run from storage (virtual memory is an example) programs and data is loaded into RAM so the CPU has faster access to it and not wasting cycles waiting for data from slow storage.

This bit is inaccurate for a start, since you misunderstand what virtual memory is, and the purpose of RAM. Yes, RAM is faster than storage, but the CPU can only directly address RAM. RAM is also the “working storage” for all processes. Any data I’m working on has to be in RAM for the CPU to be able to do anything with it. If I have application state, that has to exist in RAM as well.

Virtual memory is a neat trick to let the kernel step in and help manage RAM pages, enabling the swap file to exist, and for the system to use more memory than physically exists in the system as RAM. Yes, there are also other neat little features like memory mapped files, but keep in mind that when you access part of a memory mapped file that’s not already in RAM, the kernel has to step in, read the data from disk into RAM, and update the CPU’s memory map to let it know what it changed.

That said, one of the nice things about memory mapped application binaries that I like is that it means you don’t need the entire application code in RAM at the same time. For very large binaries like projects I’ve worked on, this means you can have something like 50MB in your TEXT segment (the bit that holds the compiled machine code), but then only have say, 16MB of it in RAM because the user is only using a fraction of what the application can do. I’ve worked on projects where we optimized our builds to take advantage of this fact back when stuff like the iPad 2 was still a common device folks used, and having only 512MB of RAM with a 50MB binary is a big deal.

t0pher said:
Just witness how much more useable older machines are when the HDD is swapped with a speedy SSD, programs load into RAM faster and launch quicker.

True. A lot of this has to do with the much better latency of SSDs when having to read/write RAM pages. Because every time the kernel needs to “fault” in order to move pages into memory, processes are frozen. So good latency is a must.

t0pher said:
have a read of RobbieTT's post

his 8GB M1 reduced its RAM pressure after 2 weeks of operation

M1 Real-world performance, multitasking & beachballs

I'm a few weeks in to the M1 experience and before getting into my thoughts on actual performance I must offer a few alibis & excuses. I am probably not the average user (whatever that is) - my machines have to work hard most of the time and work together with my wider network of Macs. My...

forums.macrumors.com

The M1's appear to be operating differently to what conventional wisdom says they should.

Anecdotes are not data. Especially when talking about a single data point with an unreproducible workload spread out over a week. There would need to be hundreds to thousands of runs of this sort of test to deal with the statistical noise.

I’ve observed similar behavior in Intel machines prior to Big Sur, as Apple has updated the macOS memory manager to be more aggressive pushing things to swap when not needed. A lot of that being driven by the easier access to NVMe SSDs which provide some very good latencies to enable this sort of behavior.

The M1 has no new trick up it’s sleeve to magically reduce memory pressure here. It’s using RAM, Swap and Memory Mapped files the same way as under Intel. Honestly, probably the biggest change to M1 memory management is that it now uses 16KiB memory pages instead of 4KiB which have been standard for ages. That makes operations where pages need to be read from, or written to, disk more efficient, and it will cut down on the number of page faults, improving performance that way.

tdar · Dec 11, 2020

As we have seen in other contexts, there is a hole in our understanding of the ways that Apple Silicon is different from the past. This is a entirely new architectural design and it includes things that a cpu has never had before. Things like the neural engine, that allows for onboard Ai and ML. These SoC’s seem to have the ability to learn the workload that they are running and improve memory management as they go. They seem to be able to calculate the performance cost of swapping to very fast SSD connected to fabric over system RAM.
There is much that is not understood about how they work internally. Not at the programming level but at the Architectural level.
But one thing is clear, they work differently. Leave your intel expectations at the door.

Krevnik · Dec 11, 2020

tdar said:
As we have seen in other contexts, there is a hole in our understanding of the ways that Apple Silicon is different from the past. This is a entirely new architectural design and it includes things that a cpu has never had before. Things like the neural engine, that allows for onboard Ai and ML. These SoC’s seem to have the ability to learn the workload that they are running and improve memory management as they go. They seem to be able to calculate the performance cost of swapping to very fast SSD connected to fabric over system RAM.
There is much that is not understood about how they work internally. Not at the programming level but at the Architectural level.
But one thing is clear, they work differently. Leave your intel expectations at the door.

It's been said time is a flat circle.

For those of us who were just old enough to remember the rise of the current PC architecture, or played with older machines that predate the current PC architecture when we were young, recognize that Apple's really going back to the beginning in a lot of ways, where computers were effectively custom-built and designed as a unit with specific I/O in mind.

So is it a new architectural design? Not really. A lot of what Apple is doing is stuff I learned in college a couple decades ago. It's more that Apple has decided to throw out the assumption that the computer needs to be a modular beast and is reaping the benefits from going back to tightly integrated computers.

That said, I do think you might have a point about ML. Two possible catches there though:

1) Doing any sort of ML computation during a page fault is going to be expensive, so any ML is more likely to be used as part of a preemptive "garbage collection" meant to find pages that can be swapped out before the RAM is even needed. Something the Intel systems already do, but potentially not as efficiently.

2) To do this work, the kernel needs the driver to be able to feed models to the neural engine, which would be new work. Some footprint for this should show up in the Darwin OSS project once the Big Sur source is available, so someone could dig in and find out what's going on then. For the most part, Apple has been using ML in user land system services, and Core ML is a user land framework.

At least in principle, there's nothing stopping Apple from using ML in this same way on Intel systems, using more limited models to keep performance reasonable. And to be honest, you wouldn't really want a complicated model here anyways.

mr_roboto · Dec 11, 2020

It might be helpful for those not versed in computer architecture to put some numbers on things.

A decent ballpark figure for the latency of a random DRAM read access is 50ns. This is actually roughly constant across any interface technology, whether it's LPDDR4X or DDR3 or HBM. It's just a fundamental limit of the underlying memory technology; the number has moved very little for decades.

It's harder to track down exact numbers for NAND flash, but it too has certain fundamental limits. As far as I can tell, 10µs random read latency would be a very fast flash chip.

10e-6 / 50e-9 = 200x. That's an enormous ratio. And I was finding lots of references to flash latencies more along the lines of 50µs (or worse), which gives a 1000x ratio.

But wait, it gets worse! Unlike DRAM, you can't access any byte of flash on demand. Instead you have to read an entire page (multiple kilobytes). That 10µs or 50µs is for the first byte to come back, so when the data you need is at the end of the page, too bad. You have to wait for everything else to come out of the memory before you get it.

That's fine for emulating a disk drive, where you're dealing in large blocks of bytes, but not great for CPUs, which want to read anything at any time.

Writing to flash is even worse. Without going into way too much detail, random write latency is both much worse and much more variable than read latency. The only way it's made tolerable on SSDs is through buffering: the SSD has some local volatile memory which buffers up incoming writes and the drive then starts working on making them actually happen.

Flash is just impractical to use like RAM, which is why Apple hasn't done that. M1 SSDs still show up as NVME devices in System Profiler, just like T1, T2, and third party SSDs do on Intel Macs. Swap still works exactly the same way as before.

t0pher · Dec 16, 2020

There are some reports of significant SSD speeds from these M1 macs with speculation that the storage controller built into M1 is perhaps responsible

How fast is an M1 Target Disk?

How to connect your M1 Mac in Target Disk mode, avoiding an endless restart loop, and how fast to expect it to perform. Plus more on benchmarks.

eclecticlight.co

looks like reads in chunks of 8MiB through 64 MiB are seriously fast, reads are consistent from 64 KiB through 64MiB.

i look forward to the further analysis.

macnmac · Dec 16, 2020

hasnt swapping been around for years already?

t0pher · Dec 16, 2020

and this little gem from

How fast is the SSD inside an M1 Mac?

How fast is the internal SSD? Benchmarks range from around 2.7 GB/s up to nearly 20 GB/s. Which are right? Some new figures, in need of more measurements.

eclecticlight.co

definitely something very different going on here

Krevnik · Dec 16, 2020

t0pher said:
and this little gem from

How fast is the SSD inside an M1 Mac?

How fast is the internal SSD? Benchmarks range from around 2.7 GB/s up to nearly 20 GB/s. Which are right? Some new figures, in need of more measurements.

eclecticlight.co

definitely something very different going on here

In terms of swap behavior, the key indicators are 4KiB (Intel) and 16KiB (M1). Those are the page sizes for macOS on the two architectures, so page faults will be using those read/write sizes. Some of the other read metrics are important, but in more specific scenarios where we might be loading many pages into memory at once and can issue larger bulk reads instead of treating them as a series of page faults. App launch comes to mind as a good candidate for this.

So two things to point out:

The 16KiB read speeds are pretty similar between the two platforms. While the 16KiB write on the M1 is much faster. Makes me wonder what sort of write buffer Apple has here.
The 16KiB read speeds on both platforms look about 4x faster than the 4KiB.

So the general take-away is that if I have to fault pages in on the M1, it takes about the same time to fault in a 16KiB page on the M1 as it does a 4KiB page on Intel. So there's two ways a process benefits. One is that they page fault less frequently when working with small chunks that might be nearby other small chunks. Second is that when a process reads in a larger chunk of data that lives in swap, say a 128KiB buffer, the M1 needs only 8 page faults, while the Intel system needs 32. So the M1 will still read that larger buffer into RAM 4x faster, despite not having a distinct read speed advantage at 16KiB read sizes vs Intel.

The higher write speed also probably helps, to a point. But since macOS aggressively writes to swap before it needs to, then it's more likely that a page fault will only ever be reading a new page into RAM, and not doing both a write and read. So the write speed at 16KiB is probably not as big a deal as it seems at first.

But there's read/write latency that would be important here too. If the M1 SSD controller has good latencies at 16KiB, then that's another way to pull even further ahead in swap performance over Intel.

t0pher · Dec 17, 2020

Krevnik said:
In terms of swap behavior, the key indicators are 4KiB (Intel) and 16KiB (M1). Those are the page sizes for macOS on the two architectures, so page faults will be using those read/write sizes. Some of the other read metrics are important, but in more specific scenarios where we might be loading many pages into memory at once and can issue larger bulk reads instead of treating them as a series of page faults. App launch comes to mind as a good candidate for this.

So two things to point out:

The 16KiB read speeds are pretty similar between the two platforms. While the 16KiB write on the M1 is much faster. Makes me wonder what sort of write buffer Apple has here.

The 16KiB read speeds on both platforms look about 4x faster than the 4KiB.

So the general take-away is that if I have to fault pages in on the M1, it takes about the same time to fault in a 16KiB page on the M1 as it does a 4KiB page on Intel. So there's two ways a process benefits. One is that they page fault less frequently when working with small chunks that might be nearby other small chunks. Second is that when a process reads in a larger chunk of data that lives in swap, say a 128KiB buffer, the M1 needs only 8 page faults, while the Intel system needs 32. So the M1 will still read that larger buffer into RAM 4x faster, despite not having a distinct read speed advantage at 16KiB read sizes vs Intel.

The higher write speed also probably helps, to a point. But since macOS aggressively writes to swap before it needs to, then it's more likely that a page fault will only ever be reading a new page into RAM, and not doing both a write and read. So the write speed at 16KiB is probably not as big a deal as it seems at first.

But there's read/write latency that would be important here too. If the M1 SSD controller has good latencies at 16KiB, then that's another way to pull even further ahead in swap performance over Intel.

according to the charts it looks like it would be more efficient to read in bigger chunks starting from 2MiB.

Toutou · Dec 17, 2020

t0pher said:
according to the charts it looks like it would be more efficient to read in bigger chunks starting from 2MiB.

This effect (more throughput with larger reads) was also present and much more pronounced in HDDs, because the seek time and the rotational latency was there no matter how big a chunk you were trying to read.

So the idea of loading big chunks of data to RAM at once is mostly correct, but in order to enjoy the sweet efficiency for swapped-out data you would need to increase the page size even more, to or above 2 MiB, so that a page fault results in the 2 MiB read that we know will be fast. The problem is that page size is always a compromise between fast and memory-efficient. A process that allocates a few kilobytes of memory here and there and runs with a total memory footprint of under a megabyte on a 16KiB page-sized system can very well consume tens of megabytes on a system where each tiny allocation effectively sits in the middle of a huge 2 MB page. And with more than two hundred processes running on a clean, idling macOS install, that's a lot of wasted memory.

Krevnik · Dec 17, 2020

Toutou said:
So the idea of loading big chunks of data to RAM at once is mostly correct, but in order to enjoy the sweet efficiency for swapped-out data you would need to increase the page size even more, to or above 2 MiB, so that a page fault results in the 2 MiB read that we know will be fast. The problem is that page size is always a compromise between fast and memory-efficient. A process that allocates a few kilobytes of memory here and there and runs with a total memory footprint of under a megabyte on a 16KiB page-sized system can very well consume tens of megabytes on a system where each tiny allocation effectively sits in the middle of a huge 2 MB page. And with more than two hundred processes running on a clean, idling macOS install, that's a lot of wasted memory.

Yup, memory fragmentation is a real annoyance. And the larger page sizes also means higher likelihood of unrelated allocations being on the same page. Makes it harder to evict pages from RAM, and makes it more likely you have to swap pages back in sooner than with more appropriately sized pages.

If you went to really large page sizes, you'd probably want to use handles instead of pointers so you could compact the heap to avoid some of the issues mentioned. But that sort of heap management went away with the advent of memory virtualization in the first place. Performance of using pages and swap just beats having to do heap compaction any day of the week. Plus it enabled new tricks that we still use today to deal with larger binaries and the amount of RAM spent on code pages, making it possible to keep chunks of binaries out of RAM entirely until needed, rather than all-or-nothing loading a library or executable.

pshufd · Dec 17, 2020

t0pher said:
what bit is inaccurate?

Programs can run from storage (virtual memory is an example) programs and data is loaded into RAM so the CPU has faster access to it and not wasting cycles waiting for data from slow storage. The CPU caches frequent instructions in its onboard caches which is typically measured in MB or KB not GB. CPU's wait far less for their onboard cache than from RAM.

Just witness how much more useable older machines are when the HDD is swapped with a speedy SSD, programs load into RAM faster and launch quicker.

its all about getting data into the CPU as quick as possible, traditional thinking was always storage -> RAM -> CPU, its still that but what is loaded into RAM and what stays on storage isn't exactly the same now.

have a read of RobbieTT's post

his 8GB M1 reduced its RAM pressure after 2 weeks of operation

M1 Real-world performance, multitasking & beachballs

I'm a few weeks in to the M1 experience and before getting into my thoughts on actual performance I must offer a few alibis & excuses. I am probably not the average user (whatever that is) - my machines have to work hard most of the time and work together with my wider network of Macs. My...

forums.macrumors.com

The M1's appear to be operating differently to what conventional wisdom says they should.

It depends on what you're doing. Adding a lot of RAM to old machines may result in much better performance compared to adding an SSD.

I did just that recently on a Late 2009 iMac. I upgraded it from 4 GB to 16 GB. So all of the programs and a lot of frequently used files are in memory. No need to go to the Hard Disk Drive at all for the vast majority of work. HDD operations are slow but I rarely do HDD operations.

I upgraded a 2008 Dell XPS Studio 435mt to 48 GB of RAM and that thing is quite usable given its age. It has an i7-920 so far more CPU horsepower than the Late 2009 iMac and 48 GB of RAM means that everything is in memory.

My current i7-10700 desktop:

Could I run this with 16 GB of RAM and swap like mad? Sure. But why bother when RAM is dirt cheap.

ChrisA · Dec 17, 2020

t0pher said:
Normally and for decades, programs and data are copied from storage into RAM and CPU’S process those programs and data from RAM writing to disk when the data needs to saved. The whole point of copying from storage to ram is to reduce the time the cpu has to wait to read the data it needs to perform its work. Traditionally reading from RAM was thousands of times quicker than reading from storage.

The average time to access a given piece of data "T" is close to this simplified model

T = Pr * Tr + Ps * Ts

Where:
Pr = probability the data is in RAM
Ps = probability the data is in Storage
Tr = Access time for RAM
Ts = Access time for Storage

Note that in our simplified model Pr + Ps = 1.0

What the OP said in effect is the if Tr and Ts are very close then we need not care about the values of Pr and Ps. This is obviously true. But today with our current SSD and RAM Tr is still maybe three orders of magnitude faster than Ts. The imbalance used to be much worse.

To a first approximation, you can DOUBLE that value of Pr by doubling the size of RAM. This almost cuts T in half.

RAM still matters as long as the Ts/Tr ratio is large and it will always be not less than about 100.

Why the above hardly matters... After saying the above, for many tasks speed is unimportant. If you are watching a 10 minute Youtube clip it is going to take 10 minutes no matter how low is "T". Same goes with web browsering. You have to wait fo the page to download from the server. MOST things that normal users do don't need much speed.

What does need speed is most kinds of content editing, where the computer is actually processing tons of data Simply scrolling 6 tracks of 4K video goes better with more RAM cache

t0pher · Dec 18, 2020

ChrisA said:
What does need speed is most kinds of content editing, where the computer is actually processing tons of data Simply scrolling 6 tracks of 4K video goes better with more RAM cache

i wouldn't disagree, yet a lot of pro's are amazed at the capabilities of the 8GB M1 despite its lack of RAM.

DaVinci Resolve + M1: Can it edit 8K RAW?

armoured · Dec 18, 2020

ChrisA said:
The average time to access a given piece of data "T" is close to this simplified model

T = Pr * Tr + Ps * Ts

Where:
Pr = probability the data is in RAM
Ps = probability the data is in Storage
Tr = Access time for RAM
Ts = Access time for Storage

Note that in our simplified model Pr + Ps = 1.0

What the OP said in effect is the if Tr and Ts are very close then we need not care about the values of Pr and Ps. This is obviously true. But today with our current SSD and RAM Tr is still maybe three orders of magnitude faster than Ts. The imbalance used to be much worse.

To a first approximation, you can DOUBLE that value of Pr by doubling the size of RAM. This almost cuts T in half.

RAM still matters as long as the Ts/Tr ratio is large and it will always be not less than about 100.

Why the above hardly matters... After saying the above, for many tasks speed is unimportant.

Thanks, I found that a helpful framework to think this through (although I think mathematically you have to say that Ps is halved when you double ram, Pr can't go above 1, meaning a different shape for the diff of Pr).

Thinking through the implications I think the other key insight from that to start is that the Ps isn't stable, it's very low in "light" use mode, but starts to become noticeable (swapping) in some intermediate mode, and starts grinding (heavy swapping) when Ps gets to some level (plug in a number, but say anything above 40%). The nature of the Pr + Ps = 1 means that slowdowns can get worse very rapidly i.e. the deterioration in performance from Ps = 20% to 40% may be dramatic.

Users who are always in light mode never have an issue; those who are in medium infrequently / almost never in heavy are probably happy with the amount of ram they have. Those who are in medium often / heavy even once a day really have too little ram. Goal for anyone should be to have the time they spend in heavy be close to zero.

I _think_ the other part of this is that speeding up storage (Ts) becomes most noticeable in the Medium range (or alternatively pushes the 'starts to become noticeable' point further out.

(I was thinking of extending your framework but ended up with too many subjective interrelated concepts, i.e. 'starts to become noticeable' would have to be defined; and frankly I think I've got these Ps variables/use profiles endogenous to each other, but still helped me think through a little the partially objective/partially subject "I have enough ram / don't notice" experiences users here report)

[This has been my daily conceptual geekfest time, sorry if not of interest to others]

1240766 · Dec 18, 2020

It is all about thinking and doing things differently, and Apple does it.

On a MBA 8/8/8/512gb - I loaded an 800mb file XML at the same time - in VS Code, in Text Editor, in Excel, and in the Browser - the swap used at one time was 10gb, the apps loading the data became non-responsive for a few seconds as expected, but the system was fully functional opening other apps and multitasking while the apps were loading the data... this is amazing. On an Intel chip my whole system would've stopped responding.

RAM alone would've chocked my my system. I think for people debating where to get more RAM or more SSD space, if you can get both go for both, otherwise SSD is IMO more importantly so the system has space to swap. A 16gb RAM will still use swap quite a bit, that is how apple does it, and rightfully IMO...

JeepGuy · Dec 18, 2020

pshufd said:
I upgraded a 2008 Dell XPS Studio 435mt to 48 GB of RAM and that thing is quite usable given its age. It has an i7-920 so far more CPU horsepower than the Late 2009 iMac and 48 GB of RAM means that everything is in memory.

I have the same system, with 24gb i7-960, I always thought that was the max ram.

pshufd · Dec 18, 2020

JeepGuy said:
I have the same system, with 24gb i7-960, I always thought that was the max ram.

I did too because every place I looked said it was the max. But they set the max at what they test for and I guess the 8 GB DIMMs weren't out then. So I took a chance and bought two sticks (they're cheap), and it worked, so I bought another four. I't nice for running VMs and it's overall fast because all of your files are basically cached.

matrix07 · Dec 21, 2020

Obviously SSD is now more important than RAM in usual tasks. I'm using base MBA with 8 GB of RAM and it has never slowed down, but for now when SSD is low (~40 GB). Moves files to external and it becomes snappy again.

armoured · Dec 21, 2020

matrix07 said:
Obviously SSD is now more important than RAM in usual tasks. I'm using base MBA with 8 GB of RAM and it has never slowed down, but for now when SSD is low (~40 GB). Moves files to external and it becomes snappy again.

Yes, this is a well-known issue with SSDs, pasting a link below that explains a bit (you may know but in case of interest from others).

But I'd note: if a base machine with 8gb has to swap much, the effects will be cumulative - i.e. the more it writes and reads and deletes items from swap, the more likely the slower write operations of a more-full drive will be noticeable (for the reasons noted in the article). The effect of these cumulative slowdowns are probably not linear - combination of swapping + write slowdowns far worse than either of these individually, because they're going to overlap quite a bit.

In other words: in a sense the performance hit you're seeing is not just the ssd-full issue, and in fact boosting the ram might be an important part of a solution.

Beyond that, it's going to depend on your 'usual tasks' and specific use profile and whether having an external is a good/cost effective solution for you, or more ram, or a larger internal SSD or some combination. For some the external drive may be just what they want, for others, inconvenient. Even the task of moving files off computer and then finding when you need them involves some time and effort, which should also be included in 'slowdowns' in some sense.

Again, I'm not being prescriptive - I know I want / need the 16gb, and will plump for a larger ssd as well. But that's for my needs and use.

Why Solid-State Drives Slow Down As You Fill Them Up

The benchmarks are clear: Solid-state drives slow down as you fill them up.

www.howtogeek.com

Thinking different about the role of storage and RAM in Apple Silicon

macrumors 65816

macrumors regular

macrumors regular

macrumors G3

macrumors 65816

macrumors 601

macrumors 68020

macrumors 601

macrumors 6502a

macrumors regular

macrumors 6502a

macrumors regular

macrumors 601

macrumors regular

macrumors 65816

macrumors 601

macrumors G4

macrumors G5

macrumors regular

macrumors regular

Cancelled

macrumors 6502

macrumors G4

macrumors G3

macrumors regular

Our Staff