
Toutou

macrumors 65816
Jan 6, 2015
1,082
1,575
Prague, Czech Republic
However, they are indeed very different and 8GB of RAM on an M1 system is not the same as 8GB on an x86 system. It’s greatly more efficient due to physically being soldered on to the M1 chip itself, allowing it to be used as unified memory therefore both the GPU and the CPU have zero-copy direct access to all the memory they could ever want, which is a lot more efficient than the way x86 systems do it. All the parts of the M1 SoC can access any data in memory they need at the exact same address. The whole overhead of the CPU needing to access the memory of the GPU and vice-versa has been completely eliminated with the M1.

Sorry, I get your enthusiasm for the new platform, but that's not true. And what's worse, it's not even false, most of the sentences just don't make sense.

Greatly more efficient due to being soldered? How? Why?

Allowing to be used as unified memory? How is that implied?

CPU has "zero-copy direct access", what's that? As opposed to what, some kind of indirect access? Which CPU does that?

"overhead of the CPU needing to access the memory of the GPU and vice-versa has been ... eliminated" ? No, it hasn't, it's just not that simple. Also, some of the possible meanings of that sentence are things we've had for years (DMA, Intel's Unified Memory) or things that we still don't have (automatic guaranteed zero-copy buffers with zero additional code).
 

TheSynchronizer

macrumors 6502
Dec 2, 2014
443
729
Sorry, I get your enthusiasm for the new platform, but that's not true. And what's worse, it's not even false, most of the sentences just don't make sense.

Greatly more efficient due to being soldered? How? Why?

Allowing to be used as unified memory? How is that implied?

CPU has "zero-copy direct access", what's that? As opposed to what, some kind of indirect access? Which CPU does that?

"overhead of the CPU needing to access the memory of the GPU and vice-versa has been ... eliminated" ? No, it hasn't, it's just not that simple. Also, some of the possible meanings of that sentence are things we've had for years (DMA, Intel's Unified Memory) or things that we still don't have (automatic guaranteed zero-copy buffers with zero additional code).
It’s pretty simple.

The memory is more efficient not due to being soldered, but due to where it is soldered. The M1 has all of the unified memory used by the system soldered right onto the SoC, whereas traditional x86 systems will have memory, soldered or not, somewhere on the motherboard near the CPU/GPU/SSD etc. It’s basic physics - there is physically a much shorter distance for electrons to travel, and even if this seems like an insignificant difference to a human, when this is being done millions of times a second it makes a big difference.

The memory of an M1 system being unified memory is a fact of how it functions. It doesn’t need to be implied anywhere - Apple themselves sell M1 systems clearly stating that they have unified memory. I’m not sure what else I need to imply..?

Data stored in memory can be accessed by every single component of an M1 system at the exact same memory address. Therefore it doesn’t need to be copied anywhere to be accessed or processed, e.g. it doesn’t need to be copied from GPU VRAM to main memory to CPU cache for the CPU to be able to access it and work on it. Apple isn’t the first to do this with the M1, as AMD APUs function in a similar way, but again, they do not come with memory soldered onto their chip, so it is not the same.

The overhead has been eliminated because the data exists in memory at one address and can be accessed by the CPU, the GPU, and any other component of the M1 at this one address, without needing to be copied or moved anywhere. And this can be done essentially simultaneously by all components (instantaneous switching of access by any component).
 

leman

macrumors Core
Oct 14, 2008
19,521
19,678
It’s pretty simple.

First of all, I wouldn't recommend arguing with @Toutou on these matters, they are one of the very few users here who actually know this stuff.

The memory is more efficient not due to being soldered, but due to where it is soldered. The M1 has all of the unified memory used by the system soldered right onto the SoC, whereas traditional x86 systems will have memory, soldered or not, somewhere on the motherboard near the CPU/GPU/SSD etc. It’s basic physics - there is physically a much shorter distance for electrons to travel, and even if this seems like an insignificant difference to a human, when this is being done millions of times a second it makes a big difference.

Real-world tests show that M1 RAM has exactly the same latency and bandwidth as similarly configured LPDDR4X soldered onto the mainboard. You are correct that shorter pathways would theoretically improve the latency, but there is also the LPDDR4 protocol to think about. The truth is, it does not make any practical difference for performance. Where M1's on-package memory does seem to have a big advantage is in power usage (it draws a laughably small amount of power, even by LPDDR4 standards).

The memory of an M1 system being unified memory is a fact of how it functions. It doesn’t need to be implied anywhere - Apple themselves sell M1 systems clearly stating that they have unified memory. I’m not sure what else I need to imply..?

Data stored in memory can be accessed by every single component of an M1 system at the exact same memory address. Therefore it doesn’t need to be copied anywhere to be accessed or processed, e.g. it doesn’t need to be copied from GPU VRAM to main memory to CPU cache for the CPU to be able to access it and work on it. Apple isn’t the first to do this with the M1, as AMD APUs function in a similar way, but again, they do not come with memory soldered onto their chip, so it is not the same.

The overhead has been eliminated because the data exists in memory at one address and can be accessed by the CPU, the GPU, and any other component of the M1 at this one address, without needing to be copied or moved anywhere. And this can be done essentially simultaneously by all components (instantaneous switching of access by any component).

Yes and no. The truth is that unified memory does not eliminate all memory copies between CPU and GPU, simply because GPU APIs are fundamentally based on copying data between host and device memory. You can get zero copy, but it will require changes to your application. Where unified memory does shine, however, is if you need to map and manipulate the contents of GPU-owned memory on the CPU. This can indeed be done without any overhead on Apple Silicon, but it's not something that is done very often anyway in everyday software. Professional applications (content creation) are a primary beneficiary here since they frequently need to apply GPU and CPU (and more recently, NPU) processing to the same data. Games, not so much, since most game engines are designed under the assumption that CPU/GPU memory synchronization is slow. Of course, if you are targeting Apple Silicon specifically, you can do some neat things that exploit unified memory (e.g. in my game prototype I use the CPU to dynamically generate geometry and immediately send it to the GPU without any memory copies).
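
To make the zero-copy part concrete, here is a minimal sketch of the idea, assuming Metal on Apple Silicon (illustrative only, not my actual prototype code): with .storageModeShared there is a single allocation visible to both the CPU and the GPU, so the CPU can generate vertex data in place and the GPU reads it without any blit step.

```swift
import Metal

// Minimal sketch: one allocation, visible to both CPU and GPU on Apple Silicon.
guard let device = MTLCreateSystemDefaultDevice() else {
    fatalError("No Metal device available")
}

let vertexCount = 3
let buffer = device.makeBuffer(
    length: vertexCount * MemoryLayout<SIMD4<Float>>.stride,
    options: .storageModeShared  // CPU and GPU share the same backing memory
)!

// The CPU writes straight into the memory the GPU will read. On a
// discrete-GPU Mac you would instead use .storageModeManaged plus
// didModifyRange(), or an explicit blit into VRAM.
let vertices = buffer.contents()
    .bindMemory(to: SIMD4<Float>.self, capacity: vertexCount)
vertices[0] = SIMD4(0, 1, 0, 1)
vertices[1] = SIMD4(-1, -1, 0, 1)
vertices[2] = SIMD4(1, -1, 0, 1)

// The buffer can now be bound directly on a render encoder, e.g.
// encoder.setVertexBuffer(buffer, offset: 0, index: 0), and the GPU
// consumes the CPU-generated geometry with no copy in between.
```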
 

Apple Knowledge Navigator

macrumors 68040
Mar 28, 2010
3,692
12,912
Sorry, I get your enthusiasm for the new platform, but that's not true. And what's worse, it's not even false, most of the sentences just don't make sense.

Greatly more efficient due to being soldered? How? Why?
🤷‍♂️

You don't need to visit these forums to understand how and why - just visit Apple's website.

The location of the RAM dies is exactly what makes the architecture 'unified'. It's the equivalent of town planning with the greatest road efficiency in mind, featuring fewer, shorter lanes of traffic that allow faster access to every destination and lower energy requirements.

M1Graphic.png


Put simply, the memory is unified because it can be directly accessed by the processing units, without having to take the 'scenic route' of a traditional computer architecture.

And remember that unified memory doesn't change the laws of physics. 8GB is still 8GB, 16GB is still 16GB. All that's changing is the speed of access, and unfortunately it's this feature that has caused the misconception that M1 RAM allocations are, magically, worth double the stated amount - all because users are no longer noticing the SSD writes.
 

leman

macrumors Core
Oct 14, 2008
19,521
19,678
You don't need to visit these forums to understand how and why - just visit Apple's website.

I am quite sure the person you are quoting understands these things better than most here.

The location of the RAM dies is exactly what makes the architecture 'unified'.

No, it does not. The memory hierarchy is what makes it unified. The location does not matter much. As I have pointed out in my previous post, M1's on-package RAM is not any faster than the same LPDDR4X in an Intel system.
 

thunng8

macrumors 65816
Feb 8, 2006
1,032
417
I don‘t think I’ve ever seen anyone say they’re equal. Saying that is completely false.

However, they are indeed very different and 8GB of RAM on an M1 system is not the same as 8GB on an x86 system. It’s greatly more efficient due to physically being soldered on to the M1 chip itself, allowing it to be used as unified memory therefore both the GPU and the CPU have zero-copy direct access to all the memory they could ever want, which is a lot more efficient than the way x86 systems do it. All the parts of the M1 SoC can access any data in memory they need at the exact same address. The whole overhead of the CPU needing to access the memory of the GPU and vice-versa has been completely eliminated with the M1. This fact coupled with the much faster SSD controller, and faster access to the memory itself due to it being soldered on the SoC, means that an 8GB M1 system performs a lot better than an 8GB x86 system in terms of memory management, efficiency, and total size required.

No amount of memory on an M1 system is equal to any amount of memory on an x86 system as they simply function completely differently, so there is no logical way to call them equal.

However, as a fact: an 8GB m1 system can handle a lot more memory intensive workloads than an 8GB x86 system, and many workloads which you would need (>8GB) 16GB RAM for on an x86 system, you can do just fine on the M1. Hence why the 8GB M1 is plenty enough for a lot of people.
That seems correct from my experience. I’m amazed how much a 16GB M1 can handle. I wouldn’t even try running multiple large applications on a 16GB Windows machine. It would slow down to a crawl.

Here is an example


Skip to approximately the 15-minute mark. Hardly any slowdown on the M1, while Windows was struggling and laggy.
 

BigPotatoLobbyist

macrumors 6502
Dec 25, 2020
301
155
1. Compressed memory.
2. 16K paging. This is a large portion of it (see the quick check below).
3. macOS caching/speculation, which is quite aggressive.
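
If you want to verify point 2 on your own machine, here's a quick check (macOS only; a sketch you can run as a Swift script):

```swift
import Darwin

// Prints 16384 (16K) on Apple Silicon Macs and 4096 (4K) on Intel Macs.
print("sysconf page size: \(sysconf(_SC_PAGESIZE)) bytes")
print("vm_page_size:      \(vm_page_size) bytes")
```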
 

BigPotatoLobbyist

macrumors 6502
Dec 25, 2020
301
155
That seems correct from my experience. I’m amazed how much a 16GB M1 can handle. I wouldn’t even try running multiple large applications on a 16GB Windows machine. It would slow down to a crawl.

Here is an example


Skip to approximately the 15-minute mark. Hardly any slowdown on the M1, while Windows was struggling and laggy.
Yeah it's absurd. Today for the first time I noticed a mild slowdown -

Running full loads of Xcode Instruments + 67 tabs in Safari, no adblock, with the battery drawing down to around 10%.
 

BigPotatoLobbyist

macrumors 6502
Dec 25, 2020
301
155
Sorry, I get your enthusiasm for the new platform, but that's not true. And what's worse, it's not even false, most of the sentences just don't make sense.

Greatly more efficient due to being soldered? How? Why?

Allowing to be used as unified memory? How is that implied?

CPU has "zero-copy direct access", what's that? As opposed to what, some kind of indirect access? Which CPU does that?

"overhead of the CPU needing to access the memory of the GPU and vice-versa has been ... eliminated" ? No, it hasn't, it's just not that simple. Also, some of the possible meanings of that sentence are things we've had for years (DMA, Intel's Unified Memory) or things that we still don't have (automatic guaranteed zero-copy buffers with zero additional code).
Yeah the "unified memory" **** really ****ing grinds my gears. Like holy hell, not sure UHD graphics were on my list of high points in the last ten years, weird to take this ******** advertising line.

Most of the memory strangeness is about how macOS treats memory, both in terms of eating up any possible forks in the road + caching previous ones, and the paging at 16K now over 4K, willingness to swap, which.....



Surely Microsoft will match that soon with Intel and AMD? The 16K, that is.
 

JMacHack

Suspended
Mar 16, 2017
1,965
2,424
A few MB here and there doesn't help whatsoever as most users know macOS will slow down after a week of use and need a reboot.
Not apples to apples (ha), but are you sure on that timeframe? I have a Mac Pro 5,1 running Mojave and it usually takes a month before slowdowns and glitchiness start showing up.
 
  • Like
Reactions: BigPotatoLobbyist

BigPotatoLobbyist

macrumors 6502
Dec 25, 2020
301
155
🤷‍♂️

You don't need to visit these forums to understand how and why - just visit Apple's website.

The location of the RAM dies is exactly what makes the architecture 'unified'. It's the equivalent of town planning with the greatest road efficiency in mind, featuring fewer, shorter lanes of traffic that allow faster access to every destination and lower energy requirements.

M1Graphic.png


Put simply, the memory is unified because it can be directly accessed by the processing units, without having to take the 'scenic route' of a traditional computer architecture.

And remember that unified memory doesn't change the laws of physics. 8GB is still 8GB, 16GB is still 16GB. All that's changing is the speed of access, and unfortunately it's this feature that has caused the misconception that M1 RAM allocations are, magically, worth double the stated amount - all because users are no longer noticing the SSD writes.
This post makes me want to relinquish every Apple product I own



Truly, Apple's real mastery with this chip has been one of marketing, to a degree, haha.
 

BigPotatoLobbyist

macrumors 6502
Dec 25, 2020
301
155
That seems correct from my experience. I’m amazed how much a 16gb m1 can handle. I wouldn’t even try running multiple large applications on a 16gb windows machine. It would slow down to a crawl.

Here is an example


skip to approx 15 min section. Hardly any slowdown on an m1, while windows was struggling and laggy.
Yeah but this isn't really about the "unified memory" per se.
The XPS line is notorious for ghastly thermal throttling. The "15W TDP" figure is more like 28W at reasonable loads, and frankly higher depending on internal settings.

By contrast, well, the M1 at *full boost* can consistently run without slowdown (modulo the MacBook Air after 30 minutes to an hour). Like, it's only a few minutes before an XPS or Envy shifts to PL1.

Edit: not even re minute
 

leman

macrumors Core
Oct 14, 2008
19,521
19,678
Yeah the "unified memory" **** really ****ing grinds my gears. Like holy hell, not sure UHD graphics were on my list of high points in the last ten years, weird to take this ******** advertising line.

Yeah, folks really took this a bit too far. But I don’t think it was bad advertising from Apple. Quite the contrary. Unified memory is nothing new, every budget computer has it, but it has been used as a way to save money. High performance user systems don’t use UM since it would be too expensive and too complicated to orchestrate between different vendors (except some supercomputers or gaming consoles, which rely on unified memory extensively). Apple is now bringing unified memory to high performance user systems for the first time ever. This has major implications for software design, especially in the pro space. It radically simplifies your assumptions and frees you to mix CPU/GPU/ML processing of the same data as you want. It’s really exciting stuff for us devs.

Surely Microsoft will match that soon with Intel and AMD? The 16K, that is.

Unlikely. You wouldn’t believe how much software is hardcoded to use 4K pages. If Windows tried to move to 16K, it would wreak havoc.
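
To illustrate the kind of hardcoding I mean, a made-up but typical example (not from any particular codebase): address math that assumes 4096-byte pages works by accident on 4K-page systems and silently mis-aligns on a 16K-page kernel.

```swift
import Darwin

// Hypothetical example of the classic bug: page-aligning an address
// (e.g. for mprotect or mmap) with a hardcoded 4K page size.
let assumedPageSize: UInt = 4096
let actualPageSize = UInt(vm_page_size)  // 16384 on Apple Silicon

func alignDown(_ addr: UInt, to pageSize: UInt) -> UInt {
    addr & ~(pageSize - 1)
}

let addr: UInt = 0x7000
print("4K-aligned:  0x" + String(alignDown(addr, to: assumedPageSize), radix: 16)) // 0x7000
print("16K-aligned: 0x" + String(alignDown(addr, to: actualPageSize), radix: 16))  // 0x4000

// 0x7000 is 4K-aligned but not 16K-aligned, so passing the "4K-aligned"
// address to mprotect() on a 16K-page kernel fails with EINVAL.
```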
 

Gnattu

macrumors 65816
Sep 18, 2020
1,107
1,671
Unlikely. You wouldn’t believe how much software is hardcoded to use 4K pages. If Windows tried to move to 16K, it would wreak havoc.
A fun fact is that Red Hat hardcoded their aarch64 RHEL Linux kernel to use a 64K page size, which made RHEL-based distros unusable on M1.
 
  • Like
Reactions: BigPotatoLobbyist

dogslobber

macrumors 601
Oct 19, 2014
4,670
7,809
Apple Campus, Cupertino CA
Not apples to apples (ha), but are you sure on that timeframe? I have a Mac Pro 5,1 running Mojave and it usually takes a month before slowdowns and glitchiness start showing up.
Mojave runs OK in 4GB and is great in 8GB in my experience, as I usually run it in a VM nowadays. Big Sur is really bad in my experience, in that it can't run as efficiently on the same hardware. I went back to Catalina for most of my systems and Mojave for VMs.
 
  • Like
Reactions: BigPotatoLobbyist

JMacHack

Suspended
Mar 16, 2017
1,965
2,424
Mojave runs OK in 4GB and is great in 8GB in my experience, as I usually run it in a VM nowadays. Big Sur is really bad in my experience, in that it can't run as efficiently on the same hardware. I went back to Catalina for most of my systems and Mojave for VMs.
No kidding, I updated the work machine from Catalina to Big Sur and the experience has been much better with Big Sur. Catalina was a ******** of kernel panics. I’m running 64 gigs of RAM though, take that as you will.
 

BigMcGuire

Cancelled
Jan 10, 2012
9,832
14,032
No kidding, I updated the work machine from Catalina to Big Sur and the experience has been much better with Big Sur. Catalina was a ******** of kernel panics. I’m running 64 gigs of RAM though, take that as you will.
Jealous. My work-provided MBP has 16GB and trying to do VS 2019 with a 200k+ file application is a lot of fun. It's been my experience too that Big Sur is pretty stable.
 
  • Like
Reactions: BigPotatoLobbyist

TopherMan12

macrumors 6502a
Oct 10, 2019
786
899
Atlanta, GA
Sorry, I get your enthusiasm for the new platform, but that's not true. And what's worse, it's not even false, most of the sentences just don't make sense.

Greatly more efficient due to being soldered? How? Why?

Allowing to be used as unified memory? How is that implied?

CPU has "zero-copy direct access", what's that? As opposed to what, some kind of indirect access? Which CPU does that?

"overhead of the CPU needing to access the memory of the GPU and vice-versa has been ... eliminated" ? No, it hasn't, it's just not that simple. Also, some of the possible meanings of that sentence are things we've had for years (DMA, Intel's Unified Memory) or things that we still don't have (automatic guaranteed zero-copy buffers with zero additional code).

So what is the difference between the RAM on the chip and CPU cache?
 

BigPotatoLobbyist

macrumors 6502
Dec 25, 2020
301
155
Mojave runs OK in 4GB and is great in 8GB in my experience, as I usually run it in a VM nowadays. Big Sur is really bad in my experience, in that it can't run as efficiently on the same hardware. I went back to Catalina for most of my systems and Mojave for VMs.
Christ on a stick, thank you. Big Sur made my longstanding complaints about macOS's bloatedness feel much more widespread, contra the narratives about Windows vs. macOS on similar hardware. Been like this since 2013. I ran Windows 8 on a Mac & even given a deluge of games and files it just felt cleaner, more fluid.

Found the exact same to my dismay in 2020 with an Ice Lake MBP experimenting with Boot Camp. I mean, people meme nebulous "optimization" to rationalize x, y, and z - and yeah, "optimization" assuredly does exist in various contexts - but has anyone considered that the direction of the function may not lie in the precise coordinates they've assumed (namely that MS having to target myriad devices is a pernicious thing)? Certainly the fragmentation could grow annoying, I imagine, but I think Windows's day-to-day performance is a bit better off for it. Hell, they've got a stable ABI, so.

Android is the counterexample here, but then again, WP8 was smooth as hell, so my guess is Google just likes f*cking with higher-level languages + libraries.
 

BigPotatoLobbyist

macrumors 6502
Dec 25, 2020
301
155
Yeah, folks really took this a bit too far. But I don’t think it was bad advertising from Apple. Quite the contrary. Unified memory is nothing new, every budget computer has it, but it has been used as a way to save money. High performance user systems don’t use UM since it would be too expensive and too complicated to orchestrate between different vendors (except some supercomputers or gaming consoles, which rely on unified memory extensively). Apple is now bringing unified memory to high performance user systems for the first time ever. This has major implications for software design, especially in the pro space. It radically simplifies your assumptions and frees you to mix CPU/GPU/ML processing of the same data as you want. It’s really exciting stuff for us devs.



Unlikely. You wouldn’t believe how much software is hardcoded to use 4K pages. If Windows tried to move to 16K, it would wreak havoc.
hmmm I guess we'll see
 

Fomalhaut

macrumors 68000
Oct 6, 2020
1,993
1,724
So what is the difference between the RAM on the chip and CPU cache?
CPU cache (L1 & L2) is part of the silicon die, and is used (L1 at least) to feed instruction decoding and branch prediction (I think). There is a big difference in latency between L1, L2, L3 (if present) and DRAM, which is why CPU cache is so important to performance in modern CPUs.

In the M1, the RAM is not part of the die, and is on separate modules located next to the main CPU die. I posted some photos of the die at https://forums.macrumors.com/thread...the-new-mac-book-pro-16.2294435/post-29860116
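
If you want to feel that latency gap yourself, here's a rough pointer-chasing sketch (illustrative, not a rigorous benchmark; the sizes and hop counts are arbitrary, and it needs to be compiled with optimizations, e.g. swiftc -O):

```swift
import Dispatch

// Chase a random cyclic permutation through arrays of increasing size.
// Once the working set stops fitting in L1/L2, the average time per
// access jumps toward DRAM latency.

// Sattolo's algorithm: a random permutation that forms one single cycle,
// so the chase is forced to visit every element.
func randomCycle(_ n: Int) -> [Int] {
    var a = Array(0..<n)
    var i = n - 1
    while i > 0 {
        let j = Int.random(in: 0..<i)
        a.swapAt(i, j)
        i -= 1
    }
    return a
}

func nsPerAccess(entries: Int, hops: Int = 2_000_000) -> Double {
    let chain = randomCycle(entries)
    var idx = 0
    let start = DispatchTime.now().uptimeNanoseconds
    for _ in 0..<hops { idx = chain[idx] }  // each access depends on the last
    let elapsed = DispatchTime.now().uptimeNanoseconds - start
    precondition(idx >= 0)  // keep the compiler from discarding the loop
    return Double(elapsed) / Double(hops)
}

for kb in [16, 128, 1024, 16384, 131072] {  // spans L1-ish to DRAM-ish sizes
    let entries = kb * 1024 / MemoryLayout<Int>.stride
    print("\(kb) KB working set: \(nsPerAccess(entries: entries)) ns/access")
}
```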
 
Last edited:

SlCKB0Y

macrumors 68040
Feb 25, 2012
3,431
557
Sydney, Australia
Feel free not to participate if it's above your level or you don't like the truth. Several people were claiming 8GB on M1 is equal to 16GB on other architectures. Were you one of them?
Lol. It's YOU who is participating in a conversation above your technical level. The fact that you even started a thread to discuss this proves my point.
 
  • Like
Reactions: 09872738

themanwithaplan

macrumors newbie
Mar 18, 2021
1
3
As others have said in this thread, macOS uses aggressive caching algorithms, which means it uses a lot of RAM for cache. Windows actually does this as well (albeit to a lesser degree); in fact, if you go to Task Manager and click on the memory section you will see how much memory is being used for caching under the "Cached" section. The difference is that under Windows, the memory being used for caching is not counted in the general memory usage graph shown in Task Manager, and you will only see it if you click on the RAM usage section. For example, if you have 16GB of RAM under Windows and 3GB is being used for caching rather than running services/applications, you will not see the 3GB under the memory usage graph unless you go deeper into the RAM usage section.

Comparing macOS RAM usage to Linux is pointless because out of the box most Linux distributions do little to no predictive preloading of applications. You can, however, somewhat emulate this caching behavior under Linux by installing a package called 'preload.' This package uses an algorithm to cache commonly accessed files in RAM, up to a user-configurable amount. Predictably, if you do this or use a Linux distribution with preload configured out of the box, you will see that it uses a lot more RAM with preload enabled than it would with it disabled.

And of course, no matter the RAM usage you see in a system monitor, it will not affect performance unless the system starts using a significant amount of swap memory.
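
On the macOS side, the counters Activity Monitor summarizes can be read straight from the kernel. Here's a sketch using the Mach host_statistics64() call (macOS only; the mapping to Activity Monitor's labels is approximate):

```swift
import Darwin

// Sketch: read the VM counters behind Activity Monitor's memory tab.
var stats = vm_statistics64()
var count = mach_msg_type_number_t(
    MemoryLayout<vm_statistics64>.stride / MemoryLayout<integer_t>.stride)
let kr = withUnsafeMutablePointer(to: &stats) {
    $0.withMemoryRebound(to: integer_t.self, capacity: Int(count)) {
        host_statistics64(mach_host_self(), HOST_VM_INFO64, $0, &count)
    }
}
guard kr == KERN_SUCCESS else { fatalError("host_statistics64 failed: \(kr)") }

let pageMB = Double(vm_page_size) / 1_048_576
// Rough mapping: "free" is genuinely unused RAM; file-backed ("external")
// and purgeable pages are what Activity Monitor folds into "Cached Files".
print("Free:        \(Double(stats.free_count) * pageMB) MB")
print("Active:      \(Double(stats.active_count) * pageMB) MB")
print("Wired:       \(Double(stats.wire_count) * pageMB) MB")
print("File-backed: \(Double(stats.external_page_count) * pageMB) MB")
print("Purgeable:   \(Double(stats.purgeable_count) * pageMB) MB")
```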
 

Significant1

macrumors 68000
Dec 20, 2014
1,686
780
As others have said in this thread, macOS uses aggressive caching algorithms, which means it uses a lot of RAM for cache. Windows actually does this as well (albeit to a lesser degree); in fact, if you go to Task Manager and click on the memory section you will see how much memory is being used for caching under the "Cached" section. The difference is that under Windows, the memory being used for caching is not counted in the general memory usage graph shown in Task Manager, and you will only see it if you click on the RAM usage section. For example, if you have 16GB of RAM under Windows and 3GB is being used for caching rather than running services/applications, you will not see the 3GB under the memory usage graph unless you go deeper into the RAM usage section.

Comparing macOS RAM usage to Linux is pointless because out of the box most Linux distributions do little to no predictive preloading of applications. You can, however, somewhat emulate this caching behavior under Linux by installing a package called 'preload.' This package uses an algorithm to cache commonly accessed files in RAM, up to a user-configurable amount. Predictably, if you do this or use a Linux distribution with preload configured out of the box, you will see that it uses a lot more RAM with preload enabled than it would with it disabled.

And of course, no matter the RAM usage you see in a system monitor, it will not affect performance unless the system starts using a significant amount of swap memory.
Exactly. iStat Menus shows both Apple's memory pressure method and the traditional way. (Sorry about the language on the image: Tryk = pressure, Hukommelse = memory, and Ledig = free.)

1620638201341.png
 
  • Like
Reactions: BigPotatoLobbyist

dogslobber

macrumors 601
Oct 19, 2014
4,670
7,809
Apple Campus, Cupertino CA
6.93GB+4.70GB seems excessive though. Can understand the 4.70GB cache memory getting freed up for apps in theory although I've seen it use up nearly all 16GB RAM and not free up cache but go straight to using swap. Aside from cache, 6.93GB OS/app usage seems excessive when Windows 10 uses half and Linux less than a third.
For sure macOS uses more memory than the others. Think of macOS like a Hummer whereas Linux is a Subaru.
 