
m1maverick

macrumors 65816
Nov 22, 2020
1,368
1,267
here you are, right from the other thread :

Pipeline complexity,

x86 - 5+9×N
ARM - 4×N

Above assuming zero cache misses,

ARM can execute instructions without waiting for condition checks
ARM requires a lot fewer registers to move memory around
x86: everything has to be stored in memory; most of the code we had was moving data around, ARM not so much.

an example :

PCAP traffic required us to write it all in memory as it was coming off the line, then flush to the NVMe array. On ARM we can push data off the line right into the disk array using significantly less memory; I think we are around 12GB total vs 200ish GB before.

Inspecting that traffic can be done right from the array, vs loading large chunks into memory to inspect.


I would never argue more memory is bad. My argument is about how ARM handles memory in the first place, how the code is designed, how it works in a technical sense. Developers in the Apple ecosystem do not get a complete picture of how it works, since Apple is doing a lot of this on their own. If you go down the road of building ARM applications and understanding how they work, it becomes much clearer how it is more efficient.


all can be easily proven right in the white papers for AARCH.

I have been developing on Graviton for years now.
None of this addresses memory requirements. It speaks to performance of ARM over x64 which is not the topic of this discussion.

The M1 Macs have been out for approximately six months now. Are there no comparisons showing an 8GB M1 system with levels of memory paging comparable to a 16GB x64 system for data sets which require 16GB of RAM?
 

m1maverick

macrumors 65816
Nov 22, 2020
1,368
1,267
Wait, I don't follow. ARMv8 uses condition flags just like x86. What exactly do you mean here? Also, how does ARM use fewer registers to move memory? Did you mean to say that it does not suffer as much from register pressure since it has more registers?
ARM does include instructions which are conditionally executed. Obviously the processor has to perform a comparison to know whether or not to execute said instructions. With these instructions the programmer doesn't have to explicitly code a comparison / branch instruction, those are implicit in the instructions themselves. It's a nice optimization.
 

m1maverick

macrumors 65816
Nov 22, 2020
1,368
1,267
Different architectures or implementations may have different branch prediction units or strategies, but AFAIK all modern CPUs absolutely do execute instructions from one of the possible branches while waiting for the result of the condition, so I'm also confused about what he meant here.
I think what he's referring to are instructions for which some test is part of the instruction and, if the condition is true, the instruction is executed.

An example, which is completely made up for illustrative purposes:

INC R1

This instruction increments the R1 register regardless of any condition. Now, using a conditional instruction for the same purpose:

INCNZ R1

This instruction looks at the zero flag and, if it is not zero, increments the R1 register. To code this using "traditional" instructions you'd have something like this:

COMP R1
BRCZR somewhere ; BRCZR stands for "branch zero" in my hypothetical processor
INC R1

somewhere:
... code path continues

This simplifies coding, reduces memory requests / requirements (as two fewer instructions are required), and avoids a branch. It's a nice optimization and I see no reason it cannot be implemented in x64. What it doesn't do is halve the memory requirements.
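
For anyone who wants to see this outside a made-up instruction set, here is a minimal C sketch of the same conditional increment. The function names are hypothetical (nothing in this thread defines them), and what the compiler emits is an assumption that depends on the target and optimization level: an AArch64 compiler may turn either version into a compare followed by a conditional increment, and an x86-64 compiler may also avoid the branch. Look at the generated assembly (for example with -O2 -S) rather than trusting the shape of the source.

```c
#include <stdint.h>

/* Hypothetical helper names for illustration only. */

/* "Traditional" shape: test the value, branch around the increment. */
uint32_t inc_if_nonzero_branch(uint32_t r1)
{
    if (r1 == 0) {
        return r1;      /* the "BRCZR somewhere" path: skip the increment */
    }
    return r1 + 1;      /* the "INC R1" path */
}

/* Branch-free shape: the comparison result (0 or 1) is added directly. */
uint32_t inc_if_nonzero_select(uint32_t r1)
{
    return r1 + (uint32_t)(r1 != 0);
}
```

Both functions return the same value for every input. Either way a comparison still happens; the only question is whether a branch is needed to act on it.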
 

falainber

macrumors 68040
Mar 16, 2016
3,539
4,136
Wild West
JFC, so you want it spoonfed to you? Go down in the thread, and it applies in Rosetta as well. The M1 Macs can release unused memory more quickly than Intel ones - even under Rosetta. M1 uses a unified memory architecture, so no need to copy data between main RAM and VRAM, etc. All of the memory is available to the CPU, GPU, and Neural Engine. Less duplication of data, and no wasted time copying between pools of memory. Even with an iGPU sharing memory, the pools are completely separate and data must be copied back and forth between the two.

No, this does not add up to 8GB == 16GB, but M1 is more efficient with memory overall. This isn't necessarily restricted to arm64 vs x64, as the iPhone will still outperform other devices with more memory.
Let's analyze your claims one by one...
The M1 Macs can release unused memory more quickly than Intel ones - even if that was true, it would only increase the performance. It would not help with memory size in any way.

M1 uses a unified memory architecture, so no need to copy data between main RAM and VRAM, etc. - so you are saying that a system without unified memory actually has more memory than, say, the 8GB spec (because this spec does not account for the GPU's memory). For these systems it might, in theory, be possible to copy the data from RAM to VRAM and delete it from RAM. In reality, most video objects get created in VRAM and stay there, thus not using RAM at all, something that is not possible with a unified memory architecture.

All of the memory is available to the CPU, GPU, and Neural Engine. Less duplication of data, and no wasted time copying between pools of memory. Even with an iGPU sharing memory, the pools are completely separate and data must be copied back and forth between the two. - this has nothing to do with memory size.

This isn't necessarily restricted to arm64 vs x64, as the iPhone will still outperform other devices with more memory. - that's just neither here nor there. The main reason the iPhone can get away with less memory is the crippled multitasking, where iOS simply won't allow any real background apps/processes to operate. Apple can do similar tricks in a desktop OS, but then it would not be a true desktop OS.
 

leman

macrumors Core
Oct 14, 2008
19,521
19,678
ARM does include instructions which are conditionally executed. Obviously the processor has to perform a comparison to know whether or not to execute said instructions. With these instructions the programmer doesn't have to explicitly code a comparison / branch instruction, those are implicit in the instructions themselves. It's a nice optimization.

If you are talking about conditional moves, x86 has them too. If you are talking about instruction predication, that was dropped in 64-bit ARM because, contrary to popular belief, it's not good for performance (limits OOE/makes dependency tracking complicated).

Edit: just saw your example. Yes, ARMv8 does have conditional increment. While x86 can implement the same via inc+cmov or branch+inc and actually use less instruction encoding space, there are register pressure implications, so the ARM way might be a bit more efficient overall. Still, both need a preceding comparison instruction, so I still don't understand what @MacModMachine means with ARM not needing to wait for condition checks.
 

ha1o2surfer

macrumors 6502
Sep 24, 2013
425
46
I like how Apple talks about unified memory as something special. Haven't Intel/AMD, heck anybody with integrated graphics, been using this for like the past 10 years? lol
 

leman

macrumors Core
Oct 14, 2008
19,521
19,678
I like how Apple talks about unified memory as something special. Haven't Intel/AMD, heck anybody with integrated graphics, been using this for like the past 10 years? lol

Yes, that is indeed correct. But shared/unified memory has been an economy feature in previous implementations. Apple's innovation is that they are bringing it to the high-performance consumer market.

M1 is not quite there yet, it's an entry-level chip after all, but it shows what is possible.
 