M3 Chip Generation - Discussion Megathread

koyoot · Jul 18, 2023

257Loner said:
Millions of dollars are being poured into research and development to erode the difference between CPUs and GPUs. The evidence?

This evidence:
General-purpose computing on GPUs
CUDA
OpenCL
OpenGL
Unified Memory Architecture (What Apple Silicon uses)
Heterogeneous System Architecture
AMD APUs (Used in the PlayStation 4 and Xbox One)

If you give computer engineers enough time and money, they'll continue to erode the difference between CPUs and GPUs and change computing again.

Do you understand what you are talking about? Do you understand what those link mean, and what they are talking about?

Serious questions.

Xiao_Xi · Jul 18, 2023

leman said:
a truck has such large cargo space some describe it as a container with an engine

Like this one? Yes.

Wouldn't you describe the iMac as a display with a computer inside?

257Loner · Jul 18, 2023

koyoot said:
Do you understand what you are talking about? Do you understand what those link mean, and what they are talking about?

Serious questions.

I do. Unless you have something constructive to say, relax and let it go.

ChrisA · Jul 18, 2023

scottrichardson said:
What about stylised rendering techniques that don't need to look hyper-real? Won't that still need rasterisation?

Yes, this would be most 3D user interfaces and 3D CAD and maybe apps like Blender. Really, any app except games. Even some kinds of games don't want to be photo-realistic.

koyoot · Jul 18, 2023

257Loner said:
I do. Unless you have something constructive to say, relax and let it go.

If you do, then I will ask you this:

How is this picture, an SOC that is made from three separate dies fit into your picture?

Is it a GPU with integrated CPU?

How is Dragon Range APU:

fit into this picture, where you have two separate CPU chiplets, and the GPU is on a third chiplet?

Is it a GPU with integrated CPU?

How does all of it it fit into Unified Memory architecture?

Unified memory architecture, and General-Purpose Compute is all about homogenization of software, so that you can accelarate execution with the GPU, not the other way around.

Software has to always execute on the CPU and GPU, or any other accelerator has the capability of expanding its capabilities.

altaic · Jul 18, 2023

257Loner said:
I would like to ask all y'all a question: Wouldst thou agree with the analysis that Apple's high-end M-series chips (Max and above) are basically dedicated GPUs with integrated CPUs? The reason provided for this analysis, and I can't remember who said it first, was that most of the die space is dedicated to GPU cores, and so these chips are GPUs first, and CPUs second.

That was me, here, during the M1 Pro/Max event. It was off-the-cuff, and was in the context of people endlessly arguing about ill-defined terms, integrated vs “dedicated” GPUs (“discrete” is the proper terminology, btw) where the usual suspects’ toxic claims were that toy-like phone iGPUs couldn’t possibly perform well enough, etc., ad nauseam.

Well, it turned out that the M1 Pro/Max had an enormous amount of throughput and large caches; qualities of a well designed GPU (rather than a tacked on iGPU with anemic throughput). The fact is the ASi chips are SoCs with cores and other bits that are really capable, unique, well integrated with a healthy amount of secret sauce & magic, and overall pretty much amazing.

Sure, if you squint, it's a CPU with an iGPU, or a GPU with an iCPU, or whatever you all want to argue about. But, in the end it's still an SoC and all you all end up with is crows feet and frown lines.

257Loner · Jul 18, 2023

altaic said:
That was me, here, during the M1 Pro/Max event. It was off-the-cuff, and was in the context of people endlessly arguing about ill-defined terms, integrated vs “dedicated” GPUs (“discrete” is the proper terminology, btw) where the usual suspects’ toxic claims were that toy-like phone iGPUs couldn’t possibly perform well enough, etc., ad nauseam.

Well, it turned out that the M1 Pro/Max had an enormous amount of throughput and large caches; qualities of a well designed GPU (rather than a tacked on iGPU with anemic throughput). The fact is the ASi chips are SoCs with cores and other bits that are really capable, unique, well integrated with a healthy amount of secret sauce & magic, and overall pretty much amazing.

Sure, if you squint, it's a CPU with an iGPU, or a GPU with an iCPU, or whatever you all want to argue about. But, in the end it's still an SoC and all you all end up with is crows feet and frown lines.

Thank you for letting me know it was you. And thank you for providing the context surrounding your original statement. Now that you're here, in this lovely topic, please stick around if you would like to be argued to death.

iPadified · Jul 18, 2023

It is nearly a philosophical questions regarding CPU/GPU hierarchy. Are not a SoC like a human organism, with a brain (CPU) and muscle (GPU)? A human need both to function. The whole system function is more important than its part.

leman · Jul 18, 2023

iPadified said:
It is nearly a philosophical questions regarding CPU/GPU hierarchy.

As one of my favourite language philosophers once wrote "there is not much mystery in these things unless we go and make one out of it"

Saying something like 'wow, this SoC is more GPU as an GPU' (like @altaic did) is a great informal observation and a way to draw attention to some interesting features of the implementation. But attempting to make a technical discourse out of it (like @257Loner appears to do) very quickly derails into nonsense.

257Loner said:
Millions of dollars are being poured into research and development to erode the difference between CPUs and GPUs. The evidence?

This evidence:
General-purpose computing on GPUs
CUDA
OpenCL
OpenGL
Unified Memory Architecture (What Apple Silicon uses)
Heterogeneous System Architecture
AMD APUs (Used in the PlayStation 4 and Xbox One)

If you give computer engineers enough time and money, they'll continue to erode the difference between CPUs and GPUs and change computing again.

This is not about eroding the difference between CPUs and GPUs, but about making GPUs more useful outside of graphics domain. You are looking at the false dichotomy. This is not about "CPU" vs "GPU", this is about different class of problems. The GPU is a processor designed to solve massively parallel computational problems. And to achieves this capability you have to sacrifice something else. We have GPUs because the tasks they perform cannot be solved efficiently using the classical CPU approach. We have CPUs because the tasks they perform cannot be solved efficiently using a massively parallel processor. The magical word is "specialisation". Just like it's impossible to build a car that will excel both at top speed and cargo capacity, it's impossible to build a device that will excel both at serial and parallel computation. As long as we need both capabilities, we will have both types of devices, no matter how you call them or what shape they take (e.g. modern CPUs also have GPU-like parallel computation capabilities, just on a smaller scale, and an Apple GPU includes a custom ARM core to coordinate GPU work scheduling and execution ). And if anything, the industry goes more towards specialisation (so the opposite of what you are claiming). In addition to the serial specialists (CPUs) and parallel specialists (GPUs) we also now have convolution/matrix specialists (various NPUs/tensor units), compression specialists (video en/decoding, data compression hardware), image processing specialists, spatial data and audio specialists, the list goes on.

Basic75 · Jul 19, 2023

257Loner said:
I would like to ask all y'all a question: Wouldst thou agree with the analysis that Apple's high-end M-series chips (Max and above) are basically dedicated GPUs with integrated CPUs? The reason provided for this analysis, and I can't remember who said it first, was that most of the die space is dedicated to GPU cores, and so these chips are GPUs first, and CPUs second.

What purpose would this binary classification serve? There is more grey in the real world than just black and white. Anyhow, in my opinion they are neither. They are SoCs that contain CPU and GPU and a bunch of other things.

However if I do accept, for a moment, the premise of your question, I'd point out that the Mx Max still uses regular LPDDR5 memory while the Playstation 5 and Xbox Series X/S both use GDDR6 memory just like real GPUs do.

Zest28 · Jul 19, 2023

I speculate that the M3 is going to be faster than the M2 and M1.

altaic · Jul 19, 2023

leman said:
As one of my favourite language philosophers once wrote "there is not much mystery in these things unless we go and make one out of it"

I’ve been a long time observer of TOP, and my recent comments here are purely to help out. Maybe I’ll yammer you all soon.

leman · Jul 19, 2023

altaic said:
I’ve been a long time observer of TOP, and my recent comments here are purely to help out. Maybe I’ll yammer you all soon.

What's TOP?

altaic · Jul 19, 2023

leman said:
What's TOP?

Technically Overboard with Peeps. The Other Persuasion. Choose your own adventure.

leman · Jul 19, 2023

altaic said:
Technically Overboard with Peeps. The Other Persuasion. Choose your own adventure.

Am I too old, too young, or maybe just too tired to get this?

altaic · Jul 19, 2023

leman said:
Am I too old, too young, or maybe just too tired to get this?

The other place. My apologies.

Basic75 · Jul 19, 2023

Back on topic, I'm wondering whether the M3 generation will increase the size of the P-core clusters. The M1/2 Pro/Max/Ultra all group the P-cores into clusters of 4 that share an L2 cache. And each P-core can only access the L2 cache belonging to its cluster. Will Apple make a jump like AMD did when going from Zen 2's 4-core to Zen 3's 8-core clusters?

scottrichardson · Jul 19, 2023

Basic75 said:
Back on topic, I'm wondering whether the M3 generation will increase the size of the P-core clusters. The M1/2 Pro/Max/Ultra all group the P-cores into clusters of 4 that share an L2 cache. And each P-core can only access the L2 cache belonging to its cluster. Will Apple make a jump like AMD did when going from Zen 2's 4-core to Zen 3's 8-core clusters?

Does it need to be done in a linear fashion? Could it be a cluster of 5 or 6? I appreciate odd numbers result in n odd layout, but nothing a little Tetris can’t fix!

TigeRick · Jul 19, 2023

Basic75 said:
Back on topic, I'm wondering whether the M3 generation will increase the size of the P-core clusters. The M1/2 Pro/Max/Ultra all group the P-cores into clusters of 4 that share an L2 cache. And each P-core can only access the L2 cache belonging to its cluster. Will Apple make a jump like AMD did when going from Zen 2's 4-core to Zen 3's 8-core clusters?

I don't think so, the extra memory bandwidth most likely will be used by GPU cores which I expect would be 1536 ALU.

Apple would increase CPU cores when LPDDR6 appears, earliest would be in 2026...

leman · Jul 19, 2023

TigeRick said:
I don't think so, the extra memory bandwidth most likely will be used by GPU cores which I expect would be 1536 ALU.

You think Apple will bump the M3 GPU core count to 12?

koyoot · Jul 19, 2023

leman said:
You think Apple will bump the M3 GPU core count to 12?

Either monolithic and 12 core GPU, with 4P/4E CPU, or tile based and 20 core GPU with 6P/6E CPU, with 192 bit bus.

TigeRick · Jul 19, 2023

leman said:
You think Apple will bump the M3 GPU core count to 12?

And with 128-bit LPDDR5x-8533 memory support

Basic75 · Jul 19, 2023

TigeRick said:
I don't think so, the extra memory bandwidth most likely will be used by GPU cores which I expect would be 1536 ALU.

The cluster size has nothing to do with (extra) memory bandwidth. It's about how many cores share an L2 cache, and how many cores enjoy the fastest core-to-core latency. Take the M1 Pro for example. The 8 P-cores are organised as two 4-core clusters, each with 12MB of L2. So while the combined L2 of the 8 P-cores is 24MB, each individual core can only make use of (at most) 12MB. As for the latencies: https://github.com/nviennot/core-to-core-latency

TigeRick · Jul 19, 2023

Basic75 said:
The cluster size has nothing to do with (extra) memory bandwidth. It's about how many cores share an L2 cache, and how many cores enjoy the fastest core-to-core latency. Take the M1 Pro for example. The 8 P-cores are organised as two 4-core clusters, each with 12MB of L2. So while the combined L2 of the 8 P-cores is 24MB, each individual core can only make use of (at most) 12MB. As for the latencies: https://github.com/nviennot/core-to-core-latency

CPU still need to be feed with memory data, and they (CPU and GPU) all require enough memory bandwidth to support it. That's why M2 Pro needs 256-bit memory bus to support higher core count. GPU actually more sensitive to memory bandwidth cause they consists of small ALU to process. That's why it is combination of CPU and GPU needs for memory bandwidth. Of course power TDP and clock speed also have to count in here...

Basic75 · Jul 19, 2023

TigeRick said:
CPU still need to be feed with memory data, and they (CPU and GPU) all require enough memory bandwidth to support it. That's why M2 Pro needs 256-bit memory bus to support higher core count. GPU actually more sensitive to memory bandwidth cause they consists of small ALU to process. That's why it is combination of CPU and GPU needs for memory bandwidth. Of course power TDP also have to count in here...

Yes, the CPU needs to be fed with memory data, but that has nothing to do with my question about whether 8 P-cores are organised as one 8x cluster, two 4x clusters, or whatever.

M3 Chip Generation - Discussion Megathread

macrumors 603

macrumors 68000

macrumors 6502a

macrumors G5

macrumors 603

macrumors 6502a

macrumors 6502a

macrumors 68020

macrumors Core

macrumors 68020

macrumors 68030

macrumors 6502a

macrumors Core

macrumors 6502a

macrumors Core

macrumors 6502a

macrumors 68020

macrumors 6502a

macrumors regular

macrumors Core

macrumors 603

macrumors regular

macrumors 68020

macrumors regular

macrumors 68020

Our Staff