With the switch to AS is there a change in how the system utilizes RAM? I’m curious if it will be more efficient and therefore need less RAM than an Intel Mac. iPads seem to be very snappy with lower amounts of RAM.
I don’t think there will be much difference. It’s still macOS, and thus it still needs to carry and support more stuff compared to iOS. AS will simply improve performance per watt, but imo RAM utilization won’t dramatically change. I’d argue that people will actually need more RAM early on, as some apps will still be using Rosetta 2 before everything goes native.
That’s an interesting take. It explains why Apple seems to be pushing more RAM recently on iPads and iPhones. Well, remember how Apple kept to 1GB and 2GB of RAM? And then suddenly they ramped up pretty quickly to 4GB, and now 6GB for the Pro models. Seems that they realized the limitations.

The iPad Pro fails miserably when trying to add multi-layer effects and filters on a high-res photo in the Affinity apps. The general editing experience is top-notch, though.
The same goes for LumaFusion. The timeline smoothness and applying effects are beyond anything you can experience on macOS; however, try adding a big video file and applying complex edits and the app crashes, mostly due to the limited amount of RAM.
Also, as a RISC instruction set uses simpler instructions, usually increasing the number of instructions needed for the same work, we could expect slightly more RAM usage compared to the most optimised CISC devices. Still, as the individual instructions complete in less time, it could balance out. I would say it totally depends on the work needed: either instruction set could be less RAM-hungry depending on the use case.
The only significant difference I expect in RAM use has to do with how much RAM is occupied by program code.
Basically, while AArch64 is a relatively dense RISC instruction set, x86_64 should be denser for most programs. This means that if you compile the same program twice, once for ARM and once for x86, the compiled x86 binary will be smaller than the ARM one.
This will not be a huge difference in practice. The amount of RAM occupied by binary code is typically far smaller than the amount used for data.
Another related thing that's going to hurt ARM Macs for a while: when Rosetta is used, I expect higher memory overhead for program code than when running a native program. macOS must load a different version of every system framework used by a Rosetta process. Furthermore, the translation process is likely to expand code size well above what you'd get if the program had been properly ported and recompiled for ARM. In cases where a program cannot use Rosetta's ahead-of-time full-program translation mode, it will run in JIT mode, so you'll have even more overhead, since both the original x86 code and the cached translated ARM code are resident in memory.
Once again, this isn't necessarily the end of the world. You can still expect data to occupy a lot more RAM than code. But it is real extra overhead which doesn't exist when running that same x86_64 binary on an Intel Mac running Catalina or Big Sur.
For some perspective, this isn't a new type of memory overhead in macOS. It has always existed to some degree during times when Apple supported multiple CPU types. On Mojave, for example, there is dual-library overhead when running both x86_64 and x86_32 programs. Probably the worst case was the early Intel Mac era, when 64-bit Intel Macs could run PPC32, PPC64, x86_32, and x86_64 programs.
Hopefully Apple is not being cheap on RAM for AS Macs. But I expect 8GB as standard, with 16GB being the amount everybody should be on.
This is a common misconception. ARM instructions are not any “simpler” (better designed, maybe), and they certainly don’t do less work per instruction. The code density argument mostly boils down to two factors:
- ARM is fixed-length: each instruction takes four bytes. x86 is variable-length, with some instructions occupying only one or two bytes and some taking up to 15 bytes. If you are optimizing codegen for size, you have much more compression potential on the x86 side, if you are fine with generating subpar code
- ARM has separate instructions for loading and storing data from memory, whereas x86 incorporates memory operands into its regular instructions. If you have code that’s heavy on memory operations but light on everything else, you will get shorter code with x86
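To illustrate the load/store point, here is a minimal C sketch. The assembly in the comments is an approximate illustration of typical codegen, not output from an actual compiler run:

```c
#include <stdint.h>

/* A single read-modify-write in C. On x86-64 a compiler can fold the
 * memory access into the arithmetic instruction, roughly:
 *     add dword ptr [rdi], 1
 * On AArch64, a load/store ISA, it becomes a three-instruction
 * sequence, roughly:
 *     ldr w8, [x0]
 *     add w8, w8, #1
 *     str w8, [x0]
 * (Both sequences are approximations for illustration.) */
void increment(uint32_t *p) {
    *p += 1;
}
```

Three fixed-length 4-byte ARM instructions vs. one short x86 instruction is exactly the kind of case where x86 code comes out denser.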
I don’t expect a major difference here. Factors like code density are rather minor because code doesn’t occupy much space to begin with. Data layout and alignment are the same between x86-64 and AArch64. One notable difference is that Apple uses 16KB memory pages (finally!) instead of the 4KB pages used on Intel, but that shouldn’t be too big of a deal.
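One practical consequence of the page-size difference: code should query the page size at runtime rather than hard-coding 4096. A minimal sketch using the standard POSIX call:

```c
#include <unistd.h>

/* sysconf(_SC_PAGESIZE) reports the system page size at runtime:
 * 16384 on Apple Silicon Macs, typically 4096 on x86-64 systems.
 * Anything that rounds buffer sizes or mmap lengths to page
 * boundaries should use this instead of a 4096 literal. */
long current_page_size(void) {
    return sysconf(_SC_PAGESIZE);
}
```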
P.S. One thing does come to mind: on Intel Macs, Apple sometimes has to reserve some RAM for GPU data synchronization. That won’t be necessary with AS and its unified memory.
I read on the Pixelmator forums that the reason the Pixelmator Photo app has not released a Photos extension for iPadOS yet is that the required RAM is much higher than what Apple allows for extensions. Thus, I think the next iPad Pro should have at least 8GB of RAM (which is probably the highest we can hope for, though).
The wording I used ("simpler") was just to summarise the difference between the two instruction sets in one word. Otherwise, we could write pages of details. I would not want to cause any misunderstanding.
If you summarise the two instruction sets for anyone who doesn't want to go too deep, you can basically say one is more like offering add, sum, and read instructions while the other is like offering factorial, sine, and square (again, these are figurative terms, not the actual instructions), which is why CISC-based devices perform complex work more easily while RISC-based devices perform simpler tasks more easily. A RISC device would be like issuing multiple "multiply" instructions instead of just one "factorial" instruction, so the code will be longer on RISC compared to CISC.
When I compare 4K 10-bit HEVC video editing between my iPad Pro (LumaFusion) and MacBook Pro (Final Cut Pro), if the project is basic colour and light corrections, the iPad Pro performs the export in half the time. However, when I add too many effects, apply a LUT, or increase the number of layers, the MacBook Pro gets ahead.
A lot of legacy software isn't optimized for data alignment though, and that's something developers might want to address when migrating to RISC.
Can you elaborate on this a bit more? As far as I know, unaligned memory access is undefined behavior in C and C++ (forbidden by the standard). So any code that does weird pointer trickery to violate alignment is invalid code and might crash or produce wrong results on any platform. Also, as far as I know, modern ARM supports unaligned memory access similarly to modern x86 CPUs: it should work in most cases, but it will be slower. An exception is SIMD data types, where a misaligned access will likely trap.
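For reference, the standards-blessed way to do an unaligned read in C is to go through memcpy, which compilers typically lower to a single plain load on both x86-64 and AArch64, instead of casting a misaligned pointer (which is the undefined behavior mentioned above). A minimal sketch:

```c
#include <stdint.h>
#include <string.h>

/* Portable unaligned load: copy the bytes into a properly aligned
 * local. Dereferencing a misaligned uint32_t* directly would be
 * undefined behavior; memcpy is well-defined and optimizes away. */
uint32_t load_u32(const unsigned char *p) {
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}
```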
I'm talking about operations in a packed structure where you want to access something smaller than the natural word size. So if you want to access a word on ARM, you'd have to load the containing long and mask it. Or, if you wanted to write a word, you'd have to load the original long, modify just the part you wanted to change, and then write out the whole long, whereas this is just one instruction on CISC. If ARM has instructions to write out a single byte, word, etc., then I'm wrong; I've never really looked at ARM. My last look at RISC was PowerPC.
If you had a data structure with 16-bit words, the consideration might be to go with 64-bit longs to improve performance. There would be an obvious penalty in RAM usage, but it might be a worthwhile trade-off.
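That space-vs-alignment trade-off is easy to see with struct padding. A sketch (struct names are made up for illustration; the exact sizes assume a typical LP64 compiler such as GCC or Clang, and `__attribute__((packed))` is a GCC/Clang extension):

```c
#include <stdint.h>

/* Natural alignment: the compiler inserts 6 bytes of padding so the
 * 64-bit member starts on an 8-byte boundary. */
struct natural_rec {
    uint16_t tag;   /* 2 bytes, then 6 bytes of padding */
    uint64_t value; /* 8 bytes, naturally aligned */
};

/* Packed: no padding, so it is smaller in RAM, but 'value' is now
 * misaligned and accessing it may be slower (and needs memcpy-style
 * access to stay strictly portable). */
struct packed_rec {
    uint16_t tag;
    uint64_t value;
} __attribute__((packed));
```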
My point is that we should stop talking about RISC and CISC and should instead look at the real differences in the ISA, which for ARM64 vs. AMD64 basically boil down to instruction length and load/store vs. hybrid instruction design.
One would need to double-check the ARM manuals to be 100% certain, but I am quite sure that AArch64 supports all the common data type widths and can load/store them directly (including bytes, halfwords, etc.). You don't need to mask anything. And I am also quite sure that it supports unaligned memory reads, albeit your performance might suffer.
I don't care enough to read an architecture manual. I had a quick look at the V8 manual and couldn't find it definitively.
I can find what I need in the Intel x64 docs because I know where everything is.
Yeah, ARM assembly manuals can be confusing.
Anyway: https://developer.arm.com/architect...uction-set-architecture/loads-and-stores-size
As you can see, the instructions take a size suffix that determines the width of the data to be transferred. I couldn't find information on the available data sizes on the official ARM website, but I found this: https://modexp.wordpress.com/2018/10/30/arm64-assembly/#datatypes
So it seems to me that everything one would want is supported directly.
It sounds like ARM is more CISC than RISC.
Compared to x86, I find it very elegant. Of course, it’s a modern, newly designed ISA that makes a clean break vs. something that has grown over multiple decades.