With the switch to AS is there a change in how the system utilizes RAM? I’m curious if it will be more efficient and therefore need less RAM than an Intel Mac. iPads seem to be very snappy with lower amounts of RAM.
I don’t think there will be much difference. It’s still macOS, and thus it still needs to carry and support more stuff compared to iOS. AS will simply improve performance per watt, but imo RAM utilization won’t dramatically change. I’d argue that people will actually need more RAM early on, as some apps will still be using Rosetta 2 before everything goes native.
That’s an interesting take. It explains why Apple seems to be pushing more RAM recently on iPads and iPhones. Well, remember how Apple kept to 1GB and 2GB of RAM? And then suddenly they ramped up pretty quickly to 4GB, and now 6GB for the Pro models. Seems that they realized the limitations.

The iPad Pro fails miserably when trying to add multi-layer effects and filters on a high-res photo in the Affinity apps. The general editing experience is top-notch, though.
The same goes for LumaFusion. The timeline smoothness and applying effects are beyond anything you can experience on macOS; however, try adding a big video file and applying complex edits and the app crashes, mostly due to the limited amount of RAM.
Also, as a RISC instruction set uses simpler instructions, usually increasing the number of instructions needed for the same work, we could expect slightly more RAM usage compared to the most optimised CISC devices. Still, as the individual instructions complete in less time, it could balance out. I would say it totally depends on the work needed: either instruction set could be less RAM-hungry depending on the use case.
The only significant difference I expect in RAM use has to do with how much RAM is occupied by program code.
Basically, while AArch64 is a relatively dense RISC instruction set, x86_64 should be denser for most programs. This means that if you compile the same program twice, once for ARM and once for x86, the compiled x86 binary will be smaller than the ARM one.
This will not be a huge difference in practice. The amount of RAM occupied by binary code is typically far smaller than the amount used for data.
Another related thing that's going to hurt ARM Macs for a while: when Rosetta is used, I expect higher memory overhead for program code than when running a native program. macOS must load a different version of every system framework used by a Rosetta process. Furthermore, the translation process is likely to expand code size well above what you'd get if the program had been properly ported and recompiled for ARM. In cases where a program cannot use Rosetta's ahead-of-time full-program translation mode, it will run in JIT mode, so you'll have even more overhead, since both the original x86 code and the cached translated ARM code are resident in memory.
Once again, this isn't necessarily the end of the world. You can still expect data to occupy a lot more RAM than code. But it is real extra overhead which doesn't exist when running that same x86_64 binary on an Intel Mac running Catalina or Big Sur.
For some perspective, this isn't a new type of memory overhead in macOS. It has always existed to some degree during times when Apple supported multiple CPU types. On Mojave, for example, there is dual-library overhead when running both x86_64 and x86_32 programs. Probably the worst case was the early Intel Mac era, when 64-bit Intel Macs could run PPC32, PPC64, x86_32, and x86_64 programs.
Hopefully Apple is not being cheap on RAM for AS Macs. But I expect 8GB as standard, with 16GB being the amount everybody should be on.
This is a common misconception. ARM instructions are not any “simpler” (better designed, maybe), and they certainly don’t do less work per instruction. The code density argument mostly boils down to two factors:
- ARM is fixed-length: each instruction takes four bytes. x86 is variable-length, with some instructions occupying only one or two bytes and some taking up to 15 bytes. If you are optimizing codegen for size, you have much more compression potential on the x86 side, if you are fine with generating subpar code
- ARM has separate instructions for loading and storing data from memory, whereas x86 incorporates memory operands into its regular instructions. If you have code that’s heavy on memory operations but light on everything else, you will get shorter code with x86
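To illustrate the load/store point, here is a minimal C sketch. The assembly in the comments is an approximate illustration of typical codegen, not output from an actual compiler run:

```c
#include <stdint.h>

/* A single read-modify-write in C. On x86-64 a compiler can fold the
 * memory access into the arithmetic instruction, roughly:
 *     add dword ptr [rdi], 1
 * On AArch64, a load/store ISA, it becomes a three-instruction
 * sequence, roughly:
 *     ldr w8, [x0]
 *     add w8, w8, #1
 *     str w8, [x0]
 * (Both sequences are approximations for illustration.) */
void increment(uint32_t *p) {
    *p += 1;
}
```

Three fixed-length 4-byte ARM instructions vs. one short x86 instruction is exactly the kind of case where x86 code comes out denser.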
I don’t expect a major difference here. Factors like code density are rather minor because code doesn’t occupy much space to begin with. Data layout and alignment are the same between x86-64 and AArch64. One notable difference is that Apple uses 16KB memory pages (finally!) instead of the 4KB pages used on Intel, but that shouldn’t be too big of a deal.
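One practical consequence of the page-size difference: code should query the page size at runtime rather than hard-coding 4096. A minimal sketch using the standard POSIX call:

```c
#include <unistd.h>

/* sysconf(_SC_PAGESIZE) reports the system page size at runtime:
 * 16384 on Apple Silicon Macs, typically 4096 on x86-64 systems.
 * Anything that rounds buffer sizes or mmap lengths to page
 * boundaries should use this instead of a 4096 literal. */
long current_page_size(void) {
    return sysconf(_SC_PAGESIZE);
}
```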
P.S. One thing does come to mind: on Intel Macs, Apple sometimes has to reserve some RAM for GPU data synchronization. That won’t be necessary with AS and its unified memory.
I read on the Pixelmator forums that the reason the Pixelmator Photo app has not released a Photos extension for iPadOS yet is that the required RAM is much higher than what Apple allows for extensions. Thus, I think the next iPad Pro should have at least 8GB of RAM (which is probably the highest we can hope for, though).
The wording I used ("simpler") was just to summarise the difference between the two instruction sets in one word. Otherwise, we could write pages of details. I would not want to cause any misunderstanding.
If you summarise the two instruction sets for anyone who doesn't want to go too deep, you can basically say one is more like offering add, sum, and read instructions while the other is like offering factorial, sine, and square (again, these are figurative terms, not the actual instructions), which is why CISC-based devices perform complex work more easily while RISC-based devices perform simpler tasks more easily. A RISC device would be like issuing multiple "multiply" instructions instead of just one "factorial" instruction, so the code will be longer on RISC compared to CISC.
When I compare 4K 10-bit HEVC video editing between my iPad Pro (LumaFusion) and MacBook Pro (Final Cut Pro), if the project is basic colour and light corrections, the iPad Pro performs the export in half the time. However, when I add too many effects, apply a LUT, or increase the number of layers, the MacBook Pro gets ahead.
A lot of legacy software isn't optimized for data alignment though, and that's something developers might want to address when migrating to RISC.
Can you elaborate on this a bit more? As far as I know, unaligned memory access is undefined behavior in C and C++ (forbidden by the standard). So any code that does weird pointer trickery to violate alignment is invalid code and might crash or produce wrong results on any platform. Also, as far as I know, modern ARM supports unaligned memory access similarly to modern x86 CPUs: it should work in most cases, but it will be slower. An exception is SIMD data types, where a misaligned access will likely trap.
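For reference, the standards-blessed way to do an unaligned read in C is to go through memcpy, which compilers typically lower to a single plain load on both x86-64 and AArch64, instead of casting a misaligned pointer (which is the undefined behavior mentioned above). A minimal sketch:

```c
#include <stdint.h>
#include <string.h>

/* Portable unaligned load: copy the bytes into a properly aligned
 * local. Dereferencing a misaligned uint32_t* directly would be
 * undefined behavior; memcpy is well-defined and optimizes away. */
uint32_t load_u32(const unsigned char *p) {
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}
```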
I'm talking about operations in a packed structure where you want to access something smaller than the natural word size. So if you want to access a word on ARM, you'd have to load the containing long and mask it. Or, if you wanted to write a word, you'd have to load the original long, modify just the part you wanted to change, and then write out the whole long, whereas this is just one instruction on CISC. If ARM has instructions to write out a single byte, word, etc., then I'm wrong; I've never really looked at ARM. My last look at RISC was PowerPC.
If you had a data structure with 16-bit words, the consideration might be to go with 64-bit longs to improve performance. There would be an obvious penalty in RAM usage, but it might be a worthwhile trade-off.
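That space-vs-alignment trade-off is easy to see with struct padding. A sketch (struct names are made up for illustration; the exact sizes assume a typical LP64 compiler such as GCC or Clang, and `__attribute__((packed))` is a GCC/Clang extension):

```c
#include <stdint.h>

/* Natural alignment: the compiler inserts 6 bytes of padding so the
 * 64-bit member starts on an 8-byte boundary. */
struct natural_rec {
    uint16_t tag;   /* 2 bytes, then 6 bytes of padding */
    uint64_t value; /* 8 bytes, naturally aligned */
};

/* Packed: no padding, so it is smaller in RAM, but 'value' is now
 * misaligned and accessing it may be slower (and needs memcpy-style
 * access to stay strictly portable). */
struct packed_rec {
    uint16_t tag;
    uint64_t value;
} __attribute__((packed));
```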
My point is that we should stop talking about RISC and CISC and should instead look at the real differences in the ISA, which for ARM64 vs. AMD64 basically boil down to instruction length and load/store vs. hybrid instruction design.
One would need to double-check the ARM manuals to be 100% certain, but I am quite sure that AArch64 supports all the common data type widths and can load/store them directly (including bytes, halfwords, etc.). You don't need to mask anything. And I am also quite sure that it supports unaligned memory reads, albeit your performance might suffer.
I don't care enough to read an architecture manual. I had a quick look at the V8 manual and couldn't find it definitively.
I can find what I need in the Intel x64 docs because I know where everything is.
Yeah, ARM assembly manuals can be confusing.
Anyway: https://developer.arm.com/architect...uction-set-architecture/loads-and-stores-size
As you can see, the instructions take a size suffix that determines the width of the data to be transferred. I couldn't find information on the available data sizes on the official ARM website, but I found this: https://modexp.wordpress.com/2018/10/30/arm64-assembly/#datatypes
So it seems to me that everything one would want is supported directly.
It sounds like ARM is more CISC than RISC.
Compared to x86, I find it very elegant. Of course, it’s a modern, newly designed ISA that makes a clean break vs. something that has grown over multiple decades.