Well, you can measure the peak memory bandwidth, but what good will it do? It’s going to be 350GB/s. We know how many channels there are and how wide the bus is.
Actually we DON'T know what that peak memory bandwidth will be...
On the one hand, the memory may be compressed, so that effective bandwidth is larger than 300GB/s.
On the other hand, that effective bandwidth may not be visible to a cluster (or to compute generally) if the bandwidth between a cluster and the SLC is still capped at around 100GB/s.
On the third hand, maybe that bandwidth is capped higher, since this is a newer SoC.
On the fourth hand, with six rather than four CPUs in a cluster, a higher cluster<->SLC bandwidth makes sense.
So the bottom line is:
- unless you're willing to do a serious deep dive into the new architecture, testing a lot of different bandwidths under different conditions (multiple threads, GPU bandwidth, compressible vs incompressible data sets, etc.), you're unlikely to be able to conclude much of interest or validity from a single number derived from code you can't modify and control.
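Just to illustrate the shape of that testing, here's a rough sketch of one such micro-benchmark: compare copy bandwidth for compressible (all-zero) vs incompressible (random) buffers, with one stream and then several concurrent streams. The buffer size, repetition count, stream counts, and the plain copy loop are arbitrary choices of mine, not a validated methodology.

```python
# Sketch of a bandwidth micro-benchmark along the lines described above.
# Buffer size, repetition count, and stream counts are illustrative, not validated.
import time
import numpy as np
from multiprocessing import Pool

BUF_BYTES = 1 << 28   # 256 MiB per buffer: well beyond any SLC, still OK on an 8GB machine
REPS = 8              # copies per measurement

def copy_bandwidth(compressible: bool) -> float:
    """Measure GB/s for REPS copies of one buffer (counting read + write traffic)."""
    if compressible:
        src = np.zeros(BUF_BYTES, dtype=np.uint8)                   # all zeros: trivially compressible
    else:
        src = np.random.randint(0, 256, BUF_BYTES, dtype=np.uint8)  # incompressible
    dst = np.empty_like(src)
    start = time.perf_counter()
    for _ in range(REPS):
        np.copyto(dst, src)
    elapsed = time.perf_counter() - start
    return (2 * BUF_BYTES * REPS) / elapsed / 1e9

if __name__ == "__main__":
    print(f"1 stream, zeros  : {copy_bandwidth(True):6.1f} GB/s")
    print(f"1 stream, random : {copy_bandwidth(False):6.1f} GB/s")
    # Several concurrent streams (one process each, to sidestep the GIL); the aggregate
    # number is rough, since the streams don't start and stop in lockstep.
    for nproc in (2, 4, 6):
        with Pool(nproc) as pool:
            per_stream = pool.map(copy_bandwidth, [False] * nproc)
        print(f"{nproc} streams, random: {sum(per_stream):6.1f} GB/s aggregate")
```

Even this only scratches the surface: it says nothing about GPU traffic, and a copy loop is just one access pattern among many.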
What COULD be done to investigate this, by amateurs, is to take a low-end machine (i.e. an 8GB machine), create the scenarios that people claim lead to thrashing on these machines (which appears to be something like opening 20 tabs in Chrome), do the same thing on an equivalent 8GB M2 machine, and see if there is a noticeable difference.
I don't think you can do a perfect investigation right now; the memory footprints between the M3 Pro and the M2 generation don't quite match. But you could try this sort of thing and at least see what happens: whether there is a noticeable performance drop at, say, 1.5x as many tabs, or whether you can go quite a bit further (which would suggest some sort of transparent [non-page-based!] memory compression).
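For the tab experiment, something as crude as the following could get the ball rolling (macOS only; Chrome must already be running, and the URL, batch size, and settle time are placeholders I made up). It just opens tabs in batches and logs the swap and compressor counters from vm_stat after each batch, so the two machines can be compared step by step.

```python
# Rough sketch of the tab-thrashing comparison (macOS). Run the same script on the
# 8GB M2 and the 8GB M3-generation machine and compare where the counters take off.
import subprocess
import time

URL = "https://en.wikipedia.org/wiki/Special:Random"   # placeholder: any reasonably heavy page
BATCH = 5       # tabs opened per step
STEPS = 8       # up to 40 tabs total
SETTLE = 20     # seconds to let pages load and the pager react

KEYS = ("Pageouts", "Swapouts", "Compressions", "Pages occupied by compressor")

def vm_counters():
    """Return the vm_stat lines for the counters we care about, unparsed."""
    out = subprocess.run(["vm_stat"], capture_output=True, text=True).stdout
    return [line.strip() for line in out.splitlines()
            if any(line.startswith(k) for k in KEYS)]

for step in range(1, STEPS + 1):
    for _ in range(BATCH):
        # `open -a` hands the URL to the running Chrome instance as a new tab
        subprocess.run(["open", "-a", "Google Chrome", URL], check=True)
    time.sleep(SETTLE)
    print(f"--- {step * BATCH} tabs open ---")
    for line in vm_counters():
        print(line)
```

If the newer machine keeps leaning on the compressor noticeably longer before Swapouts start climbing, that would at least be weakly consistent with the transparent-compression idea; identical curves would argue against it.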