If load-store vs reg/mem is the crucial distinction, why not talk about that instead of using non-transparent labels like RISC and CISC? There is nothing "reduced" about the ARM instruction set, nor is the x86-64 particularly "complex" in comparison (messy and convoluted, maybe). Modern ARM certainly has more addressing modes and instructions than x86. And there is a huge difference in complexity between ARM and core RISC-V, for example.
The problem is that the original acronym was misleading, and as a result, what the general public thinks it means is very different from how it's used by the small community of people who get to design ISAs.
RISC wasn't ever merely about reducing instruction count. Perhaps the most important innovation of the RISC movement was bringing much more analytical rigor to ISA design. For example, at one time lots of people believed that future ISAs should close the "semantic gap" between CPU instructions and high level languages by making the former much more like the latter. This wasn't based on much beyond feeling it was the right thing to do. RISC rejected this in favor of a more scientific, data-driven approach.
As for the "reduced" theme, that wasn't primarily about counting up instructions, even though that has been a very popular interpretation suggested by the acronym. The important thing to reduce is the number of implementation pain points. This makes it easier and cheaper to design high performance implementations, and reduces their gate count and power draw. By itself, instruction count isn't a great predictor of implementation complexity.
In the CPU design community, RISC is also used as a shorthand for ISAs which are recognizably in the family tree of the original 1980s RISC ISAs: load/store, usually 32 general purpose registers, usually 32-bit fixed size instruction word, limited yet sufficient addressing modes, and several other things.
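To make the load/store point concrete, here's a tiny C example; the instruction sequences in the comments are my own rough illustration of what a compiler typically emits, not exact output:

    /* Load/store vs register-memory, in one line of C. */
    int accumulate(const int *p, int total) {
        /* x86-64 can fold the memory read into the arithmetic (register-memory form):
         *     add   eax, dword ptr [rdi]
         * A load/store ISA like arm64 needs an explicit load first; arithmetic
         * only ever operates on registers:
         *     ldr   w8, [x0]
         *     add   w0, w1, w8
         */
        return total + *p;
    }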
32-bit Arm was an outlier among those early RISCs; you could make an argument that it shouldn't have been lumped in with the rest. However, modern arm64 is a very orthodox RISC ISA. Ignore the high instruction count, that's not important. Most arm64 instructions are just variations on a theme, and none look very complex to implement.
Here's an example. I've frequently seen people cite arm64's "Javascript instruction" as evidence that it's not really a RISC. When you look this instruction up, it's a variant of floating point to integer conversion with a specialized rounding mode. For reasons I won't go into here, this variant is extremely important to Javascript performance.
The extra gate count required by this instruction is almost nothing: it's a low-cost extra mode for execution resources which have to exist anyway for other FP-to-int conversion instructions. The payoff, however, is high, thanks to how important JS is in today's world. So arm64's ISA architects decided it was worth burning a single opcode (one of the most precious ISA resources, from their perspective) on it.
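For context on what that opcode buys you: the arm64 instruction in question is FJCVTZS, and what it computes is essentially ECMAScript's ToInt32() conversion. A rough portable-C sketch of that conversion (mine, purely illustrative) looks like this; without the instruction, a JS engine has to emit something along these lines, plus a fast path, on every double-to-int32 coercion:

    #include <math.h>
    #include <stdint.h>

    /* Sketch of ECMAScript ToInt32(): truncate toward zero, wrap modulo 2^32,
     * then reinterpret the result as a signed 32-bit value. */
    static int32_t js_to_int32(double x) {
        if (!isfinite(x))                    /* NaN and +/-Infinity become 0 */
            return 0;
        double t = trunc(x);                 /* round toward zero            */
        double m = fmod(t, 4294967296.0);    /* reduce modulo 2^32...        */
        if (m < 0.0)
            m += 4294967296.0;               /* ...into the range [0, 2^32)  */
        int64_t s = (int64_t)m;
        if (s >= 2147483648LL)               /* top half encodes negatives   */
            s -= 4294967296LL;
        return (int32_t)s;
    }

Bitwise operations and the x|0 idiom run this conversion constantly in hot JS code, which is why folding it into one variant of an existing FP-to-int conversion is such a high-leverage tweak.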
arm64 is full of things like that. They clearly did a ton of homework trying to figure out places where they could offer high-leverage, low-cost variants of common operations. It doesn't mean that the resulting ISA isn't RISC, as it's still an extremely regular and simple ISA design.
I'd say Leman is right here. The number of general purpose registers available is not standardised among chips claiming to be either RISC or CISC. AArch64 IIRC has 32 GPRs, x64 has 16, and the 68K has 8 plus 8 address registers. And what about vector registers like AVX, NEON, SVE, etc.? There's no point grouping the ISAs into CISC or RISC camps when you need to consider each chip individually anyway. Each ISA also has vastly different instructions available, and, much to Leman's point, the only thing that really seems to determine whether something categorises as CISC or RISC is whether you perform load/store or can do register-memory operations directly. With things like SVE2 it's hardly like AArch64 is that reduced after all; there are still quite advanced and numerous instructions in there.
With what I've written above, do you see that the important thing is not how many instructions or even whether they are "advanced"? Keep in mind that some things which seem 'advanced' from the software perspective are dead easy when designing gates, and some things which seem trivial are a giant pain in the butt.
This hasn’t been the case for many years. ARM instructions on current CPUs do not execute any faster than x86 instructions (if anything, it’s a property of the implementation and not the ISA).
There are two factors at work here.
One is that while x86 can and should be classified as a CISC ISA, reality is messier than a pure binary one-or-the-other kind of thing. x86 was one of the RISC-iest of the CISC ISAs. You noted that x86 doesn't have tons of addressing modes, and addressing modes are one of the key metrics which can make an ISA more or less "CISCy". Just like everything else, one mustn't get hung up on the number of modes; it's really about implementation complexity. Do any of the modes make life really difficult for hardware designers? Mostly by accident, x86 avoided some of the common addressing mode pitfalls other pre-RISC ISAs fell into, and that was very important to x86 managing to survive the 1980s.
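To illustrate with my own example (not something from the parent): every x86 addressing mode boils down to base + index*scale + displacement, so an operand needs at most one memory access and a simple address calculation. The classic pre-RISC pitfall was modes where the operand's address itself has to be fetched from memory, as on the VAX:

    /* The most elaborate x86 form is still just one memory access:
     *     mov   eax, [rdi + rsi*4 + 16]
     * Contrast a VAX-style memory-indirect ("deferred") mode, where one operand
     * needs two dependent memory accesses (fetch the pointer, then the data),
     * plus a register side effect:
     *     MOVL  @(R1)+, R0
     */
    int element(const int *base, long i) {
        return base[i + 4];    /* compiles to the base + index*scale + disp form */
    }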
And ARM has its fair share of instructions that execute in multiple steps. For example, shift+add is one instruction in ARM, but pretty much every modern implementation (including Apple) executes it in two steps. And ARM designs until very recently even used micro-ops, just like x86 CPUs. So I really don’t see how this applies to the current situation.
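(For anyone unfamiliar, the shift+add mentioned there is arm64's shifted-register operand form; a quick C-level sketch of what it encodes:)

    /* One arm64 instruction:  add x0, x1, x2, lsl #3   i.e.  x0 = x1 + (x2 << 3).
     * Whether a given core executes that as one micro-op or cracks it into two
     * is an implementation choice, not something the ISA dictates. */
    long scaled_add(long base, long idx) {
        return base + (idx << 3);
    }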
I don't think there's any significant use of microcode in mainstream high-performance 64-bit Arm cores. Maybe in those which still have support for AArch32, but in cores like Apple's (where AArch32 is a distant and unsupported memory), not so much.
More importantly, you have to look at all the outcomes, not just clock speed. For example, consider Zen 4 vs M1, as that's as close to the same process node as we can compare. The Zen 4 core is much larger than Apple's Firestorm core. Zen 4 scales to higher frequencies, but Apple's core delivers profoundly better perf/Hz and perf/W. If ISA doesn't matter at all, one would expect such differences to be far less pronounced.
I suspect the story is that x86 did offer a few “high-level” instructions at some point, to simplify programming in assembler. These instructions were never much used and were made obsolete a long time ago.
No, x86 was never particularly high-level.
The 8086 was the successor of Intel's 8080 and 8085. The biggest new feature was support for a 20-bit (1MB) address space, up from 16-bit (64KB). 8086 wasn't binary compatible with the 8085, but was intentionally mostly assembly language source compatible, as that was an important selling feature in many of the markets 8080 and 8085 had sold into.
Because the 8080 and 8085 were designed in the early 1970s, there just wasn't the transistor budget to do anything fancy. The 8086 didn't go much beyond that, because when that project kicked off, Intel already had a team working on their extremely ambitious, all-new 32-bit architecture of the future, the iAPX 432. The 432 was a "close the semantic gap" design: it had HLL features (capabilities, objects, garbage collection) baked into the ISA and microcode. 8086 was just a side project to keep existing 8085 customers loyal to Intel while the 432 team finished their work.
But the 432 was a dismal failure. Extremely late, incredibly slow, and ironically, its advanced ISA features made it extremely difficult to port existing operating systems and applications. It was a complete disaster, far worse than Itanium.
Around the time the 432 began to fail, x86 received the windfall of IBM selecting it for the IBM PC, and the PC's success meant x86 got allocated resources for some upgrades. After some false steps in the 286, Intel came up with some decent ideas for cleaning up the ugliest aspects of the 8086 ISA in the 386, and perhaps even more importantly, didn't succumb to the temptation to add too much.
If any of this had gone a little bit differently, we wouldn't have x86 as we know it today. For example, if 8086 had been regarded as the important project, it might have gotten the resources to be more ambitious, and that might have resulted in the inclusion of base ISA features too difficult to paper over in the long term. Designing microprocessor ISA features for ease of pipelined, superscalar, and out-of-order implementation was not something on anyone's mind in the 1970s; it really was a weird historical accident that x86 managed to avoid problems common to its contemporaries.