
name99

macrumors 68020
Jun 21, 2004
2,410
2,317
ARMv9 is ARMv8 with a few extra instructions. It's really much less of a deal than people make of it.
The real question is whether Apple believes in SVE or not. It's unclear that SVE is a good idea.
The concept of "length independent vector processing" is a good idea, but the specific details of SIMD are not the only way to do this; and there's plenty of evidence that Apple has a very different sort of idea in mind.

But if there's anything you should know about Tim's Apple, it's that they don't ship until they are ready. The ideas they have for indefinite-length vector processing were first thought about around 2000, with constant revision for the next ten years. My guess is there was then a hiatus during which they negotiated with ARM (apparently unsuccessfully)?

If they are going to ship in the next few years, they probably want good answers as to how this interacts with AMX (and the way AMX is growing to become something like AVX512 for Apple)?
If they are willing to make the instruction set for AMX visible (in a way that it is not right now; and probably after redesigning it based on what they have learned in the past few years, so that generic code compiles to it, not just function calls through Accelerate) perhaps that's an optimal solution?
 

leman

macrumors Core
Oct 14, 2008
19,521
19,674
The real question is whether Apple believes in SVE or not. It's unclear that SVE is a good idea.
The concept of "length independent vector processing" is a good idea, but the specific details of SIMD are not the only way to do this; and there's plenty of evidence that Apple has a very different sort of idea in mind.

I would love to have SVE just for the masks and better support for arrays of arbitrary length. The vector width can stay 128-bit for all I care. Frankly, I'd take another 128-bit EU (or two) over widening the SIMD width to 256 bits.

If they are going to ship in the next few years, they probably want good answers as to how this interacts with AMX (and the way AMX is growing to become something like AVX512 for Apple)?
If they are willing to make the instruction set for AMX visible (in a way that it is not right now; and probably after redesigning it based on what they have learned in the past few years, so that generic code compiles to it, not just function calls through Accelerate) perhaps that's an optimal solution?

That's the interesting question, isn't it? I still have a strong suspicion that streaming SVE+SME was modeled after Apple's AMX; after all, it was the first hardware unit of this type on the market, if I understand correctly (and both use outer products instead of a full matmul). Apple could probably support streaming SVE/SME for AMX programming fairly easily, but from what I understand AMX actually has additional capabilities beyond that (like the very interesting LUT instruction, and some addressing modes? Not sure).

I just hope that we will have a stable public ISA for programming the AMX at some point, as BLAS alone is not cutting it.
 

s1oplus

macrumors regular
Aug 24, 2023
105
23
I think they should make 16GB the default RAM size, along with a 512GB SSD. 10-core CPU / 12-core GPU on the lower end and 10-core CPU / 14-core GPU on the higher end. Less power consumption and heat. Plus I think they will finally redesign the 13-inch MacBook Pro or discontinue it with this new chip.
 

dmccloud

macrumors 68040
Sep 7, 2009
3,142
1,899
Anchorage, AK
I think this requires an architecture change. I don't think Apple wants to do another transition, because they are still in the middle of the Intel-to-ARM transition.

Apple's licensing terms for ARM cover the ISA only. They are not beholden to the existing ARM architecture like Qualcomm or other ARM customers. Apple has been making its own changes to the SoC design of its A-series (and now M-series) chips for years at this point. The other difference with Apple's licensing terms is that they can add their own instructions on top of the ISA, some of which have been backported into the ARM instruction set itself.
 

WilliApple

macrumors 6502a
Feb 19, 2022
984
1,427
Colorado
Apple's licensing terms for ARM cover the ISA only. They are not beholden to the existing ARM architecture like Qualcomm or other ARM customers. Apple has been making its own changes to the SoC design of its A-series (and now M-series) chips for years at this point. The other difference with Apple's licensing terms is that they can add their own instructions on top of the ISA, some of which have been backported into the ARM instruction set itself.
What I meant was the move from 32-bit on the A6 to 64-bit on the A7. I consider that an architecture change, because Apple eventually deprecated the 32-bit support. I believe ARMv8 to ARMv9 will be a similar thing. Apple is already doing x86 to ARM.
 

leman

macrumors Core
Oct 14, 2008
19,521
19,674
What I meant was the move from 32-bit on the A6 to 64-bit on the A7. I consider that an architecture change, because Apple eventually deprecated the 32-bit support. I believe ARMv8 to ARMv9 will be a similar thing.

Yes, moving from aarch32 to aarch64 was an architectural change. ARMv9 does not include any changes of this nature at all. I am confused what you are basing your beliefs on. Did you read the ARM architecture documentation?
 

NT1440

macrumors Pentium
May 18, 2008
15,092
22,158
Yes, moving from aarch32 to aarch64 was an architectural change. ARMv9 does not include any changes of this nature at all. I am confused what you are basing your beliefs on. Did you read the ARM architecture documentation?
There seems to be a total misunderstanding of v9 floating around out there. Last I looked into it, it’s basically v8 with some additional security stuff that Apple long ago already implemented (and got folded into the v9 spec). Little to nothing that would relate to performance gains.

Am I way off base here?
 

leman

macrumors Core
Oct 14, 2008
19,521
19,674
There seems to be a total misunderstanding of v9 floating around out there. Last I looked into it, it’s basically v8 with some additional security stuff that Apple long ago already implemented (and got folded into the v9 spec). Little to nothing that would relate to performance gains.

Am I way off base here?

ARMv9 is just ARMv8.5+ with some new instructions. It's fully backwards compatible with previous versions. From what I understand, the focus of the release is on secure virtualisation (mostly seems relevant for server/cloud), but they also introduce some new vector and matrix instructions (SVE2).

But it is definitely not an architecture change. Code written for previous AArch64 versions will continue running as expected on newer ARMv9 CPUs.
 

WilliApple

macrumors 6502a
Feb 19, 2022
984
1,427
Colorado
Yes, moving from aarch32 to aarch64 was an architectural change. ARMv9 does not include any changes of this nature at all. I am confused what you are basing your beliefs on. Did you read the ARM architecture documentation?
Oh, I was just assuming.
 

Rychiar

macrumors 68040
May 16, 2006
3,065
6,514
Waterbury, CT
Corrected, thanks

The max RAM is sadly short of the 2019 Mac Pro's 1.5TB. At the rate Apple's doing this it may take a decade or two to reach that amount. All for the sake of economies of scale.
That was 1.5TB of third-party RAM. 1.5TB of Apple RAM would prob be like 50 grand lol
 

name99

macrumors 68020
Jun 21, 2004
2,410
2,317
I heard "10% performance improvement" - That doesn't seem like a huge bump over A16...
More interesting is the claim of wider decode and execution; we expected the improved branch prediction, I've already written up the various elements of that.

Maybe on the A chip they took all the performance boost in the form of IPC, and kept frequency flat or even reduced it to save energy? Certainly it seems like for phones people want an extra 10% of battery life more than a 10% faster P-core?
 

Confused-User

macrumors 6502a
Oct 14, 2014
852
986
I heard "10% performance improvement" - That doesn't seem like a huge bump over A16...
Yes. Incredibly disappointing. Shocking, even. I mean, there's a small chance they've reduced clocks and the IPC gain is better but I'm not counting on it.

The news on the NPUs is good - potentially VERY significant for some people. For GPUs, who can say? The quoted performance for ray tracing is weird (4x software?!? what does that even mean?), and if all the general performance boost is from the extra core, then I guess that means they haven't been working on improving the rest of the GPU all that much. Which, maybe, is fair, RT is big. We'll have to see. AV1 decode is good, lack of encode sucks but isn't surprising. Could possibly still appear in the Mx, though I'm not counting on it.

Lack of Thunderbolt sucks for people doing lots of video. 10Gbps USB 3 is fine for the rest of us, though I'd have appreciated TB3 or USB4 for doing backups. I'm surprised, honestly: they've got the controller silicon, but I guess they thought the area price was too high.

For a phone, this is fantastic. As a harbinger of the M3, it's likely a tragedy, and not because I have to eat my words about the CPU core. :-(
 

Confused-User

macrumors 6502a
Oct 14, 2014
852
986
More interesting is the claim of wider decode and execution; we expected the improved branch prediction, I've already written up the various elements of that.
Wait, did I miss something? I don't remember that. I'll go back and rewatch...

Edit: Oh right. I did see that. I dunno, maybe they did reduce clocks. Guess we need to wait a few more days to find out.
Maybe on the A chip they took all the performance boost in the form of IPC, and kept frequency flat or even reduced it to save energy? Certainly it seems like for phones people want an extra 10% of battery life more than a 10% faster P-core?
You're reaching, just like I did. :) I hope so, but I'm not counting on it.
 

name99

macrumors 68020
Jun 21, 2004
2,410
2,317
Yes. Incredibly disappointing. Shocking, even. I mean, there's a small chance they've reduced clocks and the IPC gain is better but I'm not counting on it.

The news on the NPUs is good - potentially VERY significant for some people. For GPUs, who can say? The quoted performance for ray tracing is weird (4x software?!? what does that even mean?), and if all the general performance boost is from the extra core, then I guess that means they haven't been working on improving the rest of the GPU all that much. Which, maybe, is fair, RT is big. We'll have to see. AV1 decode is good, lack of encode sucks but isn't surprising. Could possibly still appear in the Mx, though I'm not counting on it.
nVidia (at "comparable" hardware, very handwavy) gets about 6x the ray-tracing performance of an M1. So a 4x boost is not bad, but also perhaps not the maximum possible.

At the most basic level, what you want in your ray tracing hardware is
- instructions to accelerate BVH traversal (apparently Apple already has these as of M1)
- instructions to accelerate triangle intersection (Apple was missing these, surely now has them)
BUT ALSO
- nVidia puts these instructions in separate hardware (the RT unit) that can be given some work and then execute autonomously from the rest of the GPU. This is good in that RT and "generic" GPU work happen in parallel – but it means that the RT hardware sits unused when you're not doing ray tracing.
It's possible, given Apple's constraints, that they thought this was sub-optimal for them, so they run ray tracing in their generic hardware. Less performant for ray tracing games; more performant for everything else.

- you probably want to recoalesce threads that diverge as the rays propagate, either for branch coalescing or for memory coalescing. NVidia surely does some of this (unclear how aggressively), Apple may not do it in this first round, or again may do it differently (optimizing the coalescing for energy rather than performance)


There are more interesting questions buried in this claim of "new Apple designed shader architecture". Many possibilities exist, most of which are very important for some use cases and not at all for others, for example:
- add a 33rd lane for uniform calculations
- split int and FP pipelines like nVidia
- issue two instructions per quadrant per cycle (like nVidia has done, in multiple different ways, from double-pumped execution units to dispatching two instructions from the same thread, to dispatching two instructions from different threads)

- a different GPU for Pro/Max vs A/M (compare eg Lovelace vs Hopper) with the Pro/Max GPU also getting, eg, some degree of 64b support (definitely FP64, but maybe also support for larger indexes/pointers with int64?)
- more independent lane execution/synchronization (like nVidia introduced for Volta). This sounds boring but, along with 64b atomics, is REALLY important if Apple wants to compete with CUDA in terms of mostly recompile and drop-in sophisticated algorithms for non-AI purposes.
 

leman

macrumors Core
Oct 14, 2008
19,521
19,674
Regarding the disappointing 10% CPU boost… I would like to see some in-depth tests first. The A16 is already faster than any competitor by far; it makes sense to invest in efficiency more than performance. I mean, 10% SC over A16 is 13900K territory – that alone would make the A17 Pro faster than any Xeon workstation ^^

So far, the prosumer M-series chips have been clocked up to 15% faster than their A-series counterparts. Compound the two and you are looking at 20-30% over M2.
 