
name99

macrumors 68020
Jun 21, 2004
2,410
2,317
ARMv9 is ARMv8 with a few extra instructions. It's really much less of a deal than people make of it.
The real question is whether Apple believes in SVE or not. It's unclear that SVE is a good idea.
The concept of "length independent vector processing" is a good idea, but the specific details of SIMD are not the only way to do this; and there's plenty of evidence that Apple has a very different sort of idea in mind.

But if there's anything you should know about Tim's Apple, it's that they don't ship until they are ready. The ideas they have for indefinite-length vector processing were first thought about around 2000, with constant revision for the next ten years. My guess is there was then a hiatus during which they negotiated with ARM (apparently unsuccessfully)?

If they are going to ship in the next few years, they probably want good answers as to how this interacts with AMX (and the way AMX is growing to become something like AVX512 for Apple)?
If they are willing to make the instruction set for AMX visible (in a way that it is not right now; and probably after redesigning it based on what they have learned in the past few years, so that generic code compiles to it, not just function calls through Accelerate) perhaps that's an optimal solution?
 

leman

macrumors Core
Oct 14, 2008
19,521
19,674
The real question is whether Apple believes in SVE or not. It's unclear that SVE is a good idea.
The concept of "length independent vector processing" is a good idea, but the specific details of SIMD are not the only way to do this; and there's plenty of evidence that Apple has a very different sort of idea in mind.

I would love to have SVE just for the masks and better support for arrays of arbitrary length. The vector width can stay 128-bit for all I care. Frankly, I'd take another 128-bit EU (or two) over widening the SIMD width to 256 bits.

If they are going to ship in the next few years, they probably want good answers as to how this interacts with AMX (and the way AMX is growing to become something like AVX512 for Apple)?
If they are willing to make the instruction set for AMX visible (in a way that it is not right now; and probably after redesigning it based on what they have learned in the past few years, so that generic code compiles to it, not just function calls through Accelerate) perhaps that's an optimal solution?

That's the interesting question, isn't it? I still have a strong suspicion that streaming SVE+SME was modeled after Apple's AMX; after all, it was the first hardware unit of this type on the market, if I understand correctly (and both use outer products instead of a full matmul). Apple could probably support streaming SVE/SME for AMX programming fairly easily, but from what I understand AMX actually has additional capabilities beyond that (like the very interesting LUT instruction, and some addressing modes? Not sure).

I just hope that we will have a stable public ISA for programming the AMX at some point, as BLAS alone is not cutting it.
 

s1oplus

macrumors regular
Aug 24, 2023
105
23
I think they should make 16GB the default RAM size, along with a 512GB SSD. 10-core CPU / 12-core GPU on the lower end and 10-core CPU / 14-core GPU on the higher end. Less power consumption and heat. Plus I think they will finally redesign the 13-inch MacBook Pro or discontinue it with this new chip.
 

dmccloud

macrumors 68040
Sep 7, 2009
3,142
1,899
Anchorage, AK
I think this requires an architecture change. I don't think Apple wants to do another transition, because they are still in the middle of the Intel-to-ARM transition.

Apple's licensing terms for ARM cover the ISA only. They are not beholden to the existing ARM architecture like Qualcomm or other ARM customers. Apple has been making its own changes to the SoC design of its A-series (and now M-series) chips for years at this point. The other difference with Apple's licensing terms is that they can add their own instructions on top of the ISA, some of which have been backported into the ARM instruction set itself.
 

WilliApple

macrumors 6502a
Feb 19, 2022
984
1,427
Colorado
Apple's licensing terms for ARM cover the ISA only. They are not beholden to the existing ARM architecture like Qualcomm or other ARM customers. Apple has been making its own changes to the SoC design of its A-series (and now M-series) chips for years at this point. The other difference with Apple's licensing terms is that they can add their own instructions on top of the ISA, some of which have been backported into the ARM instruction set itself.
What I meant was the move from 32-bit on the A6 to 64-bit on the A7. I consider that an architecture change, because Apple eventually deprecated the 32-bit support. I believe ARMv8 to ARMv9 will be a similar thing. Apple is already doing x86 to ARM.
 

leman

macrumors Core
Oct 14, 2008
19,521
19,674
What I meant was the move from 32-bit on the A6 to 64-bit on the A7. I consider that an architecture change, because Apple eventually deprecated the 32-bit support. I believe ARMv8 to ARMv9 will be a similar thing.

Yes, moving from aarch32 to aarch64 was an architectural change. ARMv9 does not include any changes of this nature at all. I am confused what you are basing your beliefs on. Did you read the ARM architecture documentation?
 

NT1440

macrumors Pentium
May 18, 2008
15,092
22,158
Yes, moving from aarch32 to aarch64 was an architectural change. ARMv9 does not include any changes of this nature at all. I am confused what you are basing your beliefs on. Did you read the ARM architecture documentation?
There seems to be a total misunderstanding of v9 floating around out there. Last I looked into it, it’s basically v8 with some additional security stuff that Apple long ago already implemented (and got folded into the v9 spec). Little to nothing that would relate to performance gains.

Am I way off base here?
 

leman

macrumors Core
Oct 14, 2008
19,521
19,674
There seems to be a total misunderstanding of v9 floating around out there. Last I looked into it, it’s basically v8 with some additional security stuff that Apple long ago already implemented (and got folded into the v9 spec). Little to nothing that would relate to performance gains.

Am I way off base here?

ARMv9 is just ARMv8.5+ with some new instructions. It's fully backwards compatible with previous versions. From what I understand, the focus of the release is on secure virtualisation (mostly seems relevant for server/cloud), but they also introduce some new vector and matrix instructions (SVE2).

But it is definitely not an architecture change. Code written for previous AArch64 versions will continue running as expected on newer ARMv9 CPUs.
 

WilliApple

macrumors 6502a
Feb 19, 2022
984
1,427
Colorado
Yes, moving from aarch32 to aarch64 was an architectural change. ARMv9 does not include any changes of this nature at all. I am confused what you are basing your beliefs on. Did you read the ARM architecture documentation?
Oh, I was just assuming.
 

Rychiar

macrumors 68040
May 16, 2006
3,065
6,514
Waterbury, CT
Corrected, thanks

The max RAM is sadly short of the 2019 Mac Pro's 1.5TB. At the rate Apple's doing this it may take a decade or two to reach that amount. All for the sake of economies of scale.
That was 1.5TB of third-party RAM. 1.5TB of Apple RAM would prob be like 50 grand lol
 

name99

macrumors 68020
Jun 21, 2004
2,410
2,317
I heard "10% performance improvement" - That doesn't seem like a huge bump over A16...
More interesting is the claim of wider decode and execution; we expected the improved branch prediction, I've already written up the various elements of that.

Maybe on the A chip they took all the performance boost in the form of IPC, and kept frequency flat or even reduced it to save energy? Certainly it seems like for phones people want an extra 10% of battery life more than a 10% faster P-core?
 

Confused-User

macrumors 6502a
Oct 14, 2014
852
986
I heard "10% performance improvement" - That doesn't seem like a huge bump over A16...
Yes. Incredibly disappointing. Shocking, even. I mean, there's a small chance they've reduced clocks and the IPC gain is better but I'm not counting on it.

The news on the NPUs is good - potentially VERY significant for some people. For GPUs, who can say? The quoted performance for ray tracing is weird (4x software?!? what does that even mean?), and if all the general performance boost is from the extra core, then I guess that means they haven't been working on improving the rest of the GPU all that much. Which, maybe, is fair, RT is big. We'll have to see. AV1 decode is good, lack of encode sucks but isn't surprising. Could possibly still appear in the Mx, though I'm not counting on it.

Lack of Thunderbolt sucks for people doing lots of video. 10Gbps USB 3 is fine for the rest of us, though I'd have appreciated TB3 or USB4 for doing backups. I'm surprised, honestly: they've got the controller silicon, but I guess they thought the area price was too high.

For a phone, this is fantastic. As a harbinger of the M3, it's likely a tragedy, and not because I have to eat my words about the CPU core. :-(
 

Confused-User

macrumors 6502a
Oct 14, 2014
852
986
More interesting is the claim of wider decode and execution; we expected the improved branch prediction, I've already written up the various elements of that.
Wait, did I miss something? I don't remember that. I'll go back and rewatch...

Edit: Oh right. I did see that. I dunno, maybe they did reduce clocks. Guess we need to wait a few more days to find out.
Maybe on the A chip they took all the performance boost in the form of IPC, and kept frequency flat or even reduced it to save energy? Certainly it seems like for phones people want an extra 10% of battery life more than a 10% faster P-core?
You're reaching, just like I did. :) I hope so, but I'm not counting on it.
 

name99

macrumors 68020
Jun 21, 2004
2,410
2,317
Yes. Incredibly disappointing. Shocking, even. I mean, there's a small chance they've reduced clocks and the IPC gain is better but I'm not counting on it.

The news on the NPUs is good - potentially VERY significant for some people. For GPUs, who can say? The quoted performance for ray tracing is weird (4x software?!? what does that even mean?), and if all the general performance boost is from the extra core, then I guess that means they haven't been working on improving the rest of the GPU all that much. Which, maybe, is fair, RT is big. We'll have to see. AV1 decode is good, lack of encode sucks but isn't surprising. Could possibly still appear in the Mx, though I'm not counting on it.
nVidia (at "comparable" hardware, very handwavy) gets about 6x the ray-tracing performance of an M1. So a 4x boost is not bad, but also perhaps not the maximum possible.

At the most basic level, what you want in your ray tracing hardware is
- instructions to accelerate BVH traversal (apparently Apple already has these as of M1)
- instructions to accelerate triangle intersection (Apple was missing these, surely now has them)
BUT ALSO
- nVidia puts these instructions in separate hardware (the RT unit) that can be given some work and then execute autonomously from the rest of the GPU. This is good in that RT and "generic" GPU work happen in parallel – but it means that the RT hardware sits unused when you're not doing ray tracing.
It's possible, given Apple's constraints, that they thought this was sub-optimal for them, so they run ray tracing in their generic hardware. Less performant for ray tracing games; more performant for everything else.

- you probably want to recoalesce threads that diverge as the rays propagate, either for branch coalescing or for memory coalescing. NVidia surely does some of this (unclear how aggressively), Apple may not do it in this first round, or again may do it differently (optimizing the coalescing for energy rather than performance)


There are more interesting questions buried in this claim of "new Apple designed shader architecture". Many possibilities exist, most of which are very important for some use cases and not at all for others, for example:
- add a 33rd lane for uniform calculations
- split int and FP pipelines like nVidia
- issue two instructions per quadrant per cycle (like nVidia has done, in multiple different ways, from double-pumped execution units to dispatching two instructions from the same thread, to dispatching two instructions from different threads)

- a different GPU for Pro/Max vs A/M (compare eg Lovelace vs Hopper) with the Pro/Max GPU also getting, eg, some degree of 64b support (definitely FP64, but maybe also support for larger indexes/pointers with int64?)
- more independent lane execution/synchronization (like nVidia introduced for Volta). This sounds boring but, along with 64b atomics, is REALLY important if Apple wants to compete with CUDA in terms of mostly recompile and drop-in sophisticated algorithms for non-AI purposes.
 

leman

macrumors Core
Oct 14, 2008
19,521
19,674
Regarding the disappointing 10% CPU boost… I would like to see some in-depth tests first. The A16 is already faster than any competitor by far; it makes sense to invest in efficiency more than performance. I mean, 10% SC over A16 is 13900K territory – that alone would make the A17 Pro faster than any Xeon workstation ^^

So far, the prosumer M-series chips have been clocked up to 15% faster than their A-series counterparts. Compound the two and you are looking at 20-30% over M2.
 