[CPU only] Apple M1/2(Max/Ultra) (TSMC 5nm) vs AMD Zen 4 (TSMC 5nm) - Technical Analysis

dmccloud · Aug 20, 2023

Xiao_Xi said:
Now that Chips and Cheese does posts about ARM cores, I can see the possibility of them writing about Apple cores.

ARM’s Cortex A710: Winning by Default

ARM Ltd has been dominating the Android world for the better part of the last decade, with their 7-series cores at the forefront of their success. Throughout the late 2010s, the Cortex A73, A75, an…

chipsandcheese.com

ARM’s Neoverse N2: Cortex A710 for Servers

ARM’s Neoverse N1 was based on ARM’s Cortex A76 mobile core, but enhancements like instruction cache coherency and 48-bit physical addressing made it usable in servers.

chipsandcheese.com

Anyway, how good are their blog posts?

On the surface, their articles read like the old AnandTech articles where they went on a very deep and granular dive into the processor specs and performance. What's interesting in the A710 article is that while they mention Samsung, Qualcomm, etc. using ARM's core designs, they make no mention of ARM with respect to either the iPhone (only addressing the Android market) or any computer using custom SoCs.

Xiao_Xi · Aug 20, 2023

By the way, Hot Chips is coming soon with some great talks.

Advance Program

A Symposium on High Performance Chips

www.hotchips.org

But unfortunately the videos will not be available to the public until December.

In December 2023 all conference videos and presentations will be made available to the public for an indefinite period of time.

Registration

A Symposium on High Performance Chips

www.hotchips.org

leman · Aug 20, 2023

dmccloud said:
On the surface, their articles read like the old AnandTech articles where they went on a very deep and granular dive into the processor specs and performance. What's interesting in the A710 article is that while they mention Samsung, Qualcomm, etc. using ARM's core designs, they make no mention of ARM with respect to either the iPhone (only addressing the Android market) or any computer using custom SoCs.

They are describing individual chips, not architectures. I think their analysis is usually in depth and they are passionate about their work. They are obviously mainstream x86 focused, which is fair enough. I don’t expect them to cover Apple any time soon, simply because Apple doesn’t release documentation or tooling for in-depth investigations.

Pressure · Aug 20, 2023

The Snapdragon 8+ Gen 1 has worse efficiency at the low end and doesn't even beat the Apple A12 core design until it uses over 4 Watts of power. At over 8 Watts of power the high performance cores finally eke out a win over the Apple A14. This efficiency is very dependent on the foundry used and the chips using Samsung fabs are just the worst, like the Snapdragon 8 Gen 1 barely beating out the Snapdragon 855 at the high-end (8+ Watts of power).

You can find more cool data from the video made by Geekerwan.

dmccloud · Aug 21, 2023

Pressure said:
The Snapdragon 8+ Gen 1 has worse efficiency at the low end and doesn't even beat the Apple A12 core design until it uses over 4 Watts of power. At over 8 Watts of power the high performance cores finally eke out a win over the Apple A14. This efficiency is very dependent on the foundry used and the chips using Samsung fabs are just the worst, like the Snapdragon 8 Gen 1 barely beating out the Snapdragon 855 at the high-end (8+ Watts of power).

View attachment 2248563

You can find more cool data from the video made by Geekerwan.

What would be interesting is a comparison between the Qualcomm chip Samsung is using in the US for the S23 series against the Exynos chip Samsung designed and fabricated in-house for other markets.

leman · Aug 21, 2023

Pressure said:
The Snapdragon 8+ Gen 1 has worse efficiency at the low end and doesn't even beat the Apple A12 core design until it uses over 4 Watts of power. At over 8 Watts of power the high performance cores finally eke out a win over the Apple A14. This efficiency is very dependent on the foundry used and the chips using Samsung fabs are just the worst, like the Snapdragon 8 Gen 1 barely beating out the Snapdragon 855 at the high-end (8+ Watts of power).

View attachment 2248563

You can find more cool data from the video made by Geekerwan.

One thing to keep in mind is that Android SoCs have more CPU cores and can afford to clock them more conservatively for multicore work. I assume that if one would look at single-core, Apple would have a considerable lead.
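
To make the "more cores, lower clocks" point concrete, here is a minimal sketch under a textbook assumption that is not taken from the chart: per-core dynamic power grows roughly with the cube of frequency (because voltage has to rise along with clock speed), and the work is perfectly parallel. The function names and both configurations are hypothetical, and the model ignores static power, voltage floors, and memory limits.

```python
def relative_power(cores, freq_ghz, k=1.0):
    """Toy model: per-core dynamic power ~ k * f^3 (assumes voltage scales with f)."""
    return cores * k * freq_ghz ** 3


def relative_throughput(cores, freq_ghz):
    """Toy model: perfectly parallel work, throughput ~ cores * frequency."""
    return cores * freq_ghz


# Two hypothetical configurations chosen to deliver the same total throughput.
for cores, freq in [(4, 3.0), (8, 1.5)]:
    print(
        f"{cores} cores @ {freq} GHz: "
        f"throughput={relative_throughput(cores, freq):.1f}, "
        f"power={relative_power(cores, freq):.1f}"
    )
```

Under these assumptions the wider, slower configuration matches the throughput at a fraction of the power, which is why a multicore efficiency curve says little about how the individual cores compare.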

RedWeasel · Sep 5, 2023

JouniS said:
Linux is the default operating system in many fields, both in academia and in industry. It's free, which means that it does what you want. You are not at the mercy of the bureaucrats of Microsoft, who keep introducing new features to justify their continued employment, breaking random things in the process.

You must have never been in touch with GNOME…

sunny5 · Sep 17, 2023

Any real-life testing between the M2 Ultra and the AMD 7950X? According to Cinebench R24, both perform almost identically, but the M2 Ultra consumes way less power. I know that Ryzen 7000 is quite a failure, but I'm curious to see real-life testing.

pshufd · Sep 17, 2023

RedWeasel said:
You must have never been in touch with GNOME…

I used to work in a Linux shop. We had a support staff of 6 in our building for the operating system, tools and keeping things running. We also had our own version of Linux. Really small companies usually don't run on Linux because it's a lot easier to find Microsoft expertise out there to set things up and keep them running. I was the Mac guy in the building. When one of the IT folks got a service request on Macs, they often sent them to my office.

thenewperson · Sep 17, 2023

sunny5 said:
I know that Ryzen 7000 is quite a failure

Really? Never got that impression from reviewers. It's the RDNA3 family that has seemed a bit underwhelming (from a performance ceiling (and efficiency) standpoint).

Xiao_Xi · Sep 17, 2023

Now, we have all the information to focus on the main topic of the thread.
- AMD's Phoenix SoC

Hot Chips 2023: AMD’s Phoenix SoC

AMD’s mobile and small form factor journey has been arduous.

chipsandcheese.com

- Apple Silicon
Is this infographic the best explanation of Apple Silicon microarchitecture?

Unfortunately, I can't find the link to the original source. Is it yours? @name99

- Geekbench results

HP HP EliteBook 835 13 inch G10 Notebook PC vs MacBook Air (2022) - Geekbench

Some results don't make any sense. The 7840U wins all but three of the multicore tests. Of the three, the Object Detection results are the strangest because M2 wins by 10% on multicore, but loses by 10% on single core. How is this possible?

sunny5 · Sep 18, 2023

thenewperson said:
Really? Never got that impression from reviewers. It's the RDNA3 family that has seemed a bit underwhelming (from a performance ceiling (and efficiency) standpoint).

Despite using a new architecture and 5nm, it didn't really improve that much, and instead it consumes quite a lot of power.

name99 · Sep 18, 2023

Xiao_Xi said:
Now, we have all the information to focus on the main topic of the thread.
- AMD's Phoenix SoC

Hot Chips 2023: AMD’s Phoenix SoC

AMD’s mobile and small form factor journey has been arduous.

chipsandcheese.com

- Apple Silicon
Is this infographic the best explanation of Apple Silicon microarchitecture?

View attachment 2266831
Unfortunately, I can't find the link to the original source. Is it yours? @name99

- Geekbench results

HP HP EliteBook 835 13 inch G10 Notebook PC vs MacBook Air (2022) - Geekbench

Some results don't make any sense. The 7840U wins all but three of the multicore tests. Of the three, the Object Detection results are the strangest because M2 wins by 10% on multicore, but loses by 10% on single core. How is this possible?

This is by Dougall Johnson and comes from https://dougallj.github.io/applecpu/firestorm.html
It is about the best you can find, but Dougall and I differ on a few points:

- We have different theories of how the ROB is laid out. I think we agree on the basic point, but I find his language of "coalesced" retirement incomprehensible (and maybe he thinks the same regarding my language!)

- I believe (based on multiple patents) that at least some of the scheduler queues are paired, so that if one queue can find no runnable instructions, it will issue the second-choice runnable instruction from the paired queue.

- And I don't think the LS scheduling queue is a single large queue.
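
The paired-queue idea in the second point can be sketched as a toy model. This is only one reading of the description above, not Apple's actual design: the function name, the queue representation, and the one-issue-per-port-per-cycle rule are all assumptions made for illustration.

```python
def pick_issue(queue_a, queue_b):
    """Choose what two paired issue ports run in one cycle.

    Each queue is a list of (name, ready) tuples in age order (oldest first).
    Returns (issued_on_port_a, issued_on_port_b); None means the port idles.
    """
    ready_a = [name for name, ready in queue_a if ready]
    ready_b = [name for name, ready in queue_b if ready]

    # First choice: each port takes the oldest ready instruction of its own queue.
    issue_a = ready_a[0] if ready_a else None
    issue_b = ready_b[0] if ready_b else None

    # If one queue has nothing runnable, borrow the partner's second choice.
    if issue_a is None and len(ready_b) > 1:
        issue_a = ready_b[1]
    elif issue_b is None and len(ready_a) > 1:
        issue_b = ready_a[1]
    return issue_a, issue_b


# Queue A is stalled on operands; queue B has two ready instructions.
print(pick_issue([("a1", False)], [("b1", True), ("b2", True)]))
```

In this model a stalled queue no longer wastes its issue port as long as its partner has spare ready work.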

But we have been doing very different work, using very different investigative techniques, so there's no reason to believe either of us is the absolute truth! Both of us have had to be content with quick scans of the territory, rather than the sort of careful detailed investigation of just one subsystem that you get in the x86 world, because there is so much new territory.

AMD uses 8 P cores (with hyperthreading as their kinda sorta equiv of E-cores) so the fact that they lose in ANYTHING in multicore is noteworthy.
Background Blur presumably reflects AVX512 (or whatever version this particular Zen uses); IF Apple opened up AMX a compiler might be able to route the instructions to AMX and get similar performance, but who knows when that will happen.
I don't know if Ray Tracer (single core) is written to be vectorized (with predicates). If it is, that likewise explains the great Zen performance in that case; predicates as part of "NEON" are the one thing it would be nice if Apple picked up from SVE...

As for Object Detection, well unfortunately, unlike SPEC, we don't know which subsystems GB6 stresses. Maybe Object Detection is PRIMARILY a memory bandwidth test, and running one copy is fine for both memory systems, but multiple copies overload the memory systems (of both of them, but more so HP)? You'll note that even on M2 its multicore version scales worse than the other benchmarks.
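
A toy model of that bandwidth hypothesis, with every number made up purely for illustration (nothing here is measured from GB6, the M2, or the 7840U): each item of work generates a fixed amount of memory traffic, and total throughput is capped by whichever runs out first, the cores or the shared memory system.

```python
def throughput(cores, per_core_rate, bytes_per_item, mem_bandwidth):
    """Items/s for `cores` copies of a workload sharing one memory system.

    per_core_rate : items/s one core achieves when memory is not a limit
    bytes_per_item: memory traffic generated per item
    mem_bandwidth : bytes/s the memory system can sustain in total
    """
    compute_limit = cores * per_core_rate
    memory_limit = mem_bandwidth / bytes_per_item
    return min(compute_limit, memory_limit)


# Hypothetical chips: A has the faster core, B has more memory bandwidth.
chip_a = dict(per_core_rate=110.0, mem_bandwidth=90e9)
chip_b = dict(per_core_rate=100.0, mem_bandwidth=100e9)
BYTES_PER_ITEM = 0.125e9  # hypothetical traffic per item

for cores in (1, 8):
    a = throughput(cores, bytes_per_item=BYTES_PER_ITEM, **chip_a)
    b = throughput(cores, bytes_per_item=BYTES_PER_ITEM, **chip_b)
    print(f"{cores} core(s): A={a:.0f} items/s, B={b:.0f} items/s")
```

With these made-up parameters chip A leads with one core, but with eight cores it hits its memory ceiling first and chip B leads, which is the shape of the single-core versus multicore flip being discussed.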

Xiao_Xi · Sep 18, 2023

name99 said:
Background Blur presumably reflects AVX512 (or whatever version this particular Zen uses);

According to documentation:

The Background Blur workload separates the background from the foreground in a video stream and blurs the background. It models background blurring features in video conferencing apps (such as Zoom, Slack Huddles, and Microsoft Teams). This workload uses DeepLabV3+ as its network and blurs 10 frames from a 1080p video stream.

name99 said:
As for Object Detection, well unfortunately, unlike SPEC, we don't know which subsystems GB6 stresses.

According to documentation:

The Object Detection workload uses machine learning to detect and classify objects in photos and then highlight them in the photo. It models features in photo apps (such as Google Photos or Apple photos) that identify people, animals, and objects in photos. This workload uses the convolutional neural network (CNN) MobileNet v1 SSD to detect and classify objects in 16 photos (300 X 300 px).

https://www.geekbench.com/doc/geekbench6-cpu-workloads.pdf

I wish Geekbench could explain the benchmark better. Some of the results don't make sense.
- M2 Pro (12 cores) vs M2 Pro (10 cores)

MacBook Pro (14-inch, 2023) vs MacBook Pro (14-inch, 2023) - Geekbench

- M2 Max (12 cores) vs M2 Max (12 cores)

MacBook Pro (16-inch, 2023) vs MacBook Pro (16-inch, 2023) - Geekbench

name99 · Sep 18, 2023

Xiao_Xi said:
According to documentation:

According to documentation:

The Object Detection workload uses machine learning to detect and classify objects in photos and then highlight them in the photo. It models features in photo apps (such as Google Photos or Apple photos) that identify people, animals, and objects in photos. This workload uses the convolutional neural network (CNN) MobileNet v1 SSD to detect and classify objects in 16 photos (300 X 300 px).

I don't know why people think this is useful information – this is absolutely USELESS in terms of understanding what subsystems the benchmark stresses.
Does it randomly access a large address range (ie stress TLB)? No idea?
Does it involve a lot of difficult to predict branches? Probably not, but maybe it's using a sparsely compressed NN, so that decompressing it involves lots of branches?
etc etc

[various results]
I think you obsess way too much about trivial differences.
10% differences in a single run, IMHO, are noise. 5% differences in statistics are noise.
To my eye, the people who obsess over this level of difference are the people who care about these results as tribal affiliation, not the people who care about them as engineering to be understood.

Xiao_Xi · Sep 20, 2023

name99 said:
I think you obsess way too much about trivial differences.

I failed to communicate what I wanted to express. I find it intriguing how an increase in core count or clock speed affects the results. For instance, in the comparison between 12 cores versus 10 cores, the results for Structure from Motion, Photo Filter and Text Processing are almost the same. However, in the comparison between Pro and Max, the results for Horizon Detection and Photo Filter are almost the same.

name99 said:
10% differences in a single run, IMHO, are noise. 5% differences in statistics are noise.

Statisticians use the standard deviation, not the percentage, to establish whether two points are significantly different or not.
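
As a rough sketch of what that means in practice, the snippet below compares two sets of repeated runs with Welch's t-statistic, which scales the gap between the means by the run-to-run spread. The scores are hypothetical, not real Geekbench data, and `welch_t` is a helper written for this example.

```python
import math
import statistics


def welch_t(a, b):
    """Welch's t-statistic for two independent samples of benchmark scores."""
    var_a = statistics.variance(a)  # sample variance (n - 1 denominator)
    var_b = statistics.variance(b)
    standard_error = math.sqrt(var_a / len(a) + var_b / len(b))
    return (statistics.mean(a) - statistics.mean(b)) / standard_error


# Hypothetical multicore scores from five runs on each machine.
machine_a = [14200, 14350, 14100, 14280, 14190]
machine_b = [14900, 14050, 15300, 13800, 14700]

gap_pct = 100 * (statistics.mean(machine_b) / statistics.mean(machine_a) - 1)
print(f"mean gap: {gap_pct:.1f}%")
print(f"stdev A: {statistics.stdev(machine_a):.0f}, "
      f"stdev B: {statistics.stdev(machine_b):.0f}")
print(f"Welch t: {welch_t(machine_b, machine_a):.2f}")
```

As a rule of thumb, with only a handful of runs per machine a |t| below roughly 2 to 3 means the gap cannot be told apart from run-to-run variation, however large the percentage looks.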

Xiao_Xi · Nov 23, 2023

For those who are lost with computer architecture jargon, Intel made a very interesting video explaining the basic components of a CPU core.

Kristain · Nov 23, 2023

sunny5 said:
Any real-life testing between the M2 Ultra and the AMD 7950X? According to Cinebench R24, both perform almost identically, but the M2 Ultra consumes way less power. I know that Ryzen 7000 is quite a failure, but I'm curious to see real-life testing.

Notebookcheck compared the M3 Max to the Ryzen 7945 hard limited to 55 watts (same as M3 Max) running Cinebench R23, and the Ryzen was actually slightly faster (and hence more efficient) than the M3 Max.

senttoschool · Nov 23, 2023

Kristain said:
Notebookcheck compared the M3 Max to the Ryzen 7945 hard limited to 55 watts (same as M3 Max) running Cinebench R23, and the Ryzen was actually slightly faster (and hence more efficient) than the M3 Max.

Cinebench is the best case scenario for AMD chips and the worst case scenario for Apple chips. Especially R23.

leman · Nov 23, 2023

Kristain said:
Notebookcheck compared the M3 Max to the Ryzen 7945 hard limited to 55 watts (same as M3 Max) running Cinebench R23, and the Ryzen was actually slightly faster (and hence more efficient) than the M3 Max.

Two questions:

- why do they use a benchmark known to have performance issues on Apple instead of the new CB 2024?

- is that 55W an actual power consumption limit or a TDP limit? Because in AMD land 55W TDP means around 70-80 watts

Xiao_Xi · Nov 23, 2023

senttoschool said:
Cinebench is the best case scenario for AMD chips and the worst case scenario for Apple chips.

Has anyone tried to analyze what kind of instructions Cinebench R23 uses, like Chips and Cheese did with Cinebench 2024?

Cinebench 2024: Reviewing the Benchmark

Maxon’s Cinebench is a perennial benchmark favorite. It’s free, easy to run, and scales across as many cores as you can give it. Its $0 cost allows the internet to provide plenty of res…

chipsandcheese.com

jeanlain · Nov 23, 2023

leman said:
- is that 55W an actual power consumption limit or a TDP limit? Because in AMD land 55W TDP means around 70-80 watts

In fact, it would have been useful to compare the power consumption of the SoCs in the test. Not only may the AMD part have consumed more than 55W, it's likely that the M3 Max consumed less. Cinebench R23 was notorious for not using much power on Apple Silicon due to optimisation issues.
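
A minimal sketch of why the distinction matters: efficiency computed against a nominal limit and against measured power can rank two chips differently. Every figure below is a hypothetical placeholder, not a measurement from the Notebookcheck test, and `points_per_watt` is a helper written for this example.

```python
def points_per_watt(score, watts):
    """Benchmark points per watt of power drawn during the run."""
    return score / watts


# All figures are hypothetical placeholders.
results = {
    # name: (score, nominal limit in W, measured average package power in W)
    "chip X": (20000, 55, 75),
    "chip Y": (19000, 55, 40),
}

for name, (score, nominal_w, measured_w) in results.items():
    print(
        f"{name}: {points_per_watt(score, nominal_w):.0f} pts/W at the nominal limit, "
        f"{points_per_watt(score, measured_w):.0f} pts/W at measured power"
    )
```

With these placeholder numbers chip X looks more efficient when both are assumed to draw the nominal 55 W, and clearly less efficient once the measured draw is used.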

Kristain · Nov 24, 2023

leman said:
Two questions:

- why do they use a benchmark known to have performance issues on Apple instead of the new CB 2024?

- is that 55W an actual power consumption limit or a TDP limit? Because in AMD land 55W TDP means around 70-80 watts

Agreed. I just mentioned it because it was the only power-limited test I've seen. It will be really interesting to see the results when they all move to 3nm and Intel brings out Core Ultra.

pshufd · Nov 24, 2023

I think that I'd run the test using a power meter to record total system power used for the test.

leman · Nov 24, 2023

pshufd said:
I think that I'd run the test using a power meter to record total system power used for the test.

You can do this on a stationary computer. But anything with a display and a battery is a problem…
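
For the stationary case, a minimal sketch of how wall-meter readings could be turned into energy for the run: integrate the power samples over time and subtract an idle baseline. The readings and the idle figure are hypothetical, and the approach assumes no battery is charging or discharging during the run, which is exactly what makes laptops hard to measure this way.

```python
def energy_joules(samples):
    """Trapezoidal integral of (time_s, watts) samples, in joules."""
    total = 0.0
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        total += 0.5 * (p0 + p1) * (t1 - t0)
    return total


# Hypothetical wall-meter readings: (seconds since start, watts).
idle_watts = 12.0  # assumed steady idle draw of the whole system
run = [(0, 12.0), (1, 60.0), (2, 62.0), (3, 61.0), (4, 12.0)]

duration = run[-1][0] - run[0][0]
total = energy_joules(run)
above_idle = total - idle_watts * duration
print(f"total: {total:.1f} J, above idle: {above_idle:.1f} J")
```

Dividing the benchmark score by the above-idle energy gives a whole-system efficiency figure that does not depend on what the vendor calls TDP.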
