Yeah, more benchmarking BS. They're comparing a 12-core processor (AMD Ryzen AI 9 HX 370) to an 8-core (the M3 in the Air), using MC CPU workloads. E.g., for GB6 (SC/MC), the AMD is 2,879/14,888, while the 8-core M3 is 3,082/12,087 (all scores taken from tomshardware.com for consistency), so of course they're going to report the MC scores, and conveniently omit the difference in SC performance.
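To make the cherry-picking concrete, here's the arithmetic on the GB6 scores quoted above (just ratios of the quoted numbers, nothing new):

```javascript
// Geekbench 6 scores quoted above (tomshardware.com): SC and MC
const amd = { sc: 2879, mc: 14888 }; // Ryzen AI 9 HX 370, 12 cores
const m3  = { sc: 3082, mc: 12087 }; // Apple M3, 8 cores (4P+4E)

console.log((m3.sc / amd.sc).toFixed(2)); // 1.07 -> M3 is ~7% faster single-core
console.log((amd.mc / m3.mc).toFixed(2)); // 1.23 -> AMD is ~23% faster multi-core
```

So the chart only shows the ~23% column and never the ~7% one.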

And what kind of apps are people most commonly using on thin&lights? That's right: SC.
Even worse when you account for the fact that four of the M3's eight cores are efficiency cores (i.e., comparing a 4P+4E CPU to a 12C/24T CPU).
 
You missed my point. The chart isn't BS because they compared an 8-core and 12-core processor. It's BS because they showed the kinds of tasks on which the AMD is faster (MC), but conveniently omitted those on which the M3 is faster (SC)—and it's particularly BS b/c the overwhelming majority of apps used on thin&lights are SC rather than MC.

And it's not just that they showed MC rather than SC. They additionally concealed the fact that the scores were MC by labelling them as, e.g., "Geekbench - CPU Score" rather than "Geekbench - MC CPU Score".

Now you might argue that mfrs. do that all the time: they show the scores on which their devices do best. If it were just a matter of cherry-picking certain apps, that would be expected. But to omit an entire class of tasks, when that's the most common class used on that category of device, and to conceal that, is an entirely different level of marketing BS.
Another way to say this is to note the lack of appropriate web benchmarks...
Where web benchmarks (and related nonsense like Electron apps) are, as you say, what many people will be running on this sort of device.
 
Another way to say this is to note the lack of appropriate web benchmarks...
Where web benchmarks (and related nonsense like Electron apps) are, as you say, what many people will be running on this sort of device.
I was thinking of desktop apps, rather than web-based apps. But I suppose if you had a web-based app where a lot of the computation was done on the client side, then that might be interesting to benchmark as well.
 
I was thinking of desktop apps, rather than web-based apps. But I suppose if you had a web-based app where a lot of the computation was done on the client side, then that might be interesting to benchmark as well.
Well theoretically a well designed web-based app should be using worker threads. But a well designed web-based app is kind of an oxymoron.
 
What does "worker threads" mean?
Probably Web Workers.
 
Doesn't the tech press use the same video games to make historical comparisons?

AMD is using games with built-in benchmark tools for convenience, but that doesn't make them accurate on the Mac. They should use Lies of P, Death Stranding, or Resident Evil, but accuracy is often not a priority in marketing material.
 
Probably Web Workers.
Exactly.
 
Reviews of AMD's latest SoC have been released. While AMD's new SoC seems to have taken a big leap in the right direction, Apple's M4 still outperforms it.

[benchmark chart images from the review]

 
Enticing as a 45 to 55W MiniPC with 128GB unified memory, if they don't overprice it. It will have access to a larger software ecosystem and full Linux support, and won't be weighed down by compatibility-layer overhead.

 
Enticing as a 45 to 55W MiniPC with 128GB unified memory, if they don't overprice it. It will have access to a larger software ecosystem and full Linux support, and won't be weighed down by compatibility-layer overhead.
Would like to see a head-to-head token/sec match vs the M4 Max 128GB on the largest LLM model that will fit in 96GB (max to CPU?)
 
Would like to see a head-to-head token/sec match vs the M4 Max 128GB on the largest LLM model that will fit in 96GB (max to CPU?)
96GB is the max to the GPU for the Halo. The M4 Max would probably win that rather easily, as the Halo's bandwidth is less than the M4 Pro's, and the Pro generally beats the Halo in the AI workloads measured. You can see that in the video review @Xiao_Xi posted above. So an M4 Max, with double the bandwidth of the Pro, would be substantially faster.
 
96GB is the max to the GPU for the Halo. The M4 Max would probably win that rather easily, as the Halo's bandwidth is less than the M4 Pro's, and the Pro generally beats the Halo in the AI workloads measured. You can see that in the video review @Xiao_Xi posted above. So an M4 Max, with double the bandwidth of the Pro, would be substantially faster.
I meant to type 96GB max to GPU (not CPU). On macOS, sysctl iogpu.wired_limit_mb can be adjusted a bit to give more than the default 75% for large RAM sizes. That's 96GB for 128GB M4 Max, but I don't know the maximum permissible, e.g. could we have 120GB for GPU and reserve 8GB for CPU on macOS?
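The numbers work out like this (the `iogpu.wired_limit_mb` sysctl key is from the post above; whether macOS actually accepts a value as high as 120GB is exactly the open question, so treat the override line as hypothetical):

```javascript
const totalGB = 128;                                      // M4 Max config discussed
const defaultLimitMB = Math.floor(totalGB * 1024 * 0.75); // macOS default: ~75% of RAM to GPU
console.log(defaultLimitMB);                              // 98304 MB = the 96GB default

// Hypothetical override leaving 8GB for the CPU; untested whether
// macOS honors a limit this high.
const desiredMB = 120 * 1024;
console.log(`sudo sysctl iogpu.wired_limit_mb=${desiredMB}`);
```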
 
I meant to type 96GB max to GPU (not CPU). On macOS, sysctl iogpu.wired_limit_mb can be adjusted a bit to give more than the default 75% for large RAM sizes. That's 96GB for 128GB M4 Max, but I don't know the maximum permissible, e.g. could we have 120GB for GPU and reserve 8GB for CPU on macOS?
You can go over the 75% recommended limit for macOS. However, @leman did the actual tests and I'm just going to quote him directly rather than filter it through me:

There are some caveats. I just did a quick test to make sure.

- There is a practical limit to a single buffer size, around 20GB on my system. You only get an error when you try to use it, not at allocation time
- The working size of resident GPU memory cannot exceed the total RAM amount. E.g. if you attempt to access 32GB worth of buffers in a single compute pass, you will get an "out of memory" runtime failure
- You *can* however use more than total RAM worth of buffers in separate compute passes. I had no problem writing to 80GB worth of buffers on my 36GB machine as long as I did not bind more than 27GB worth of data per pass. I suppose the system will swap between passes as needed.

Overall, it appears that the actively accessed GPU data needs to be resident in RAM at all times. The system will swap data in and out as needed. The GPU command submission serves as a boundary for residency management. But it does not seem as if the GPU can currently interrupt execution to swap data in and out. And even if it can, it won't (which is understandable; the performance would be very bad).

There are also sparse resources, which might add an additional dimension to all this, but I don't have experience working with them.

P.S. I also managed to hard freeze my laptop by trying to sequentially process multiple 20+GB buffers. I suppose it hit a slow path in the GPU firmware and the machine reset after a timeout. It was completely unresponsive for a minute or two. I previously had a similar experience experimenting with the texture count limits. It seems like Apple engineers don't expect one to do dumb stuff like that (and who would blame them).
 
Enticing as a 45 to 55W MiniPC with 128GB unified memory, if they don't overprice it. It will have access to a larger software ecosystem and full Linux support, and won't be weighed down by compatibility-layer overhead.
You can get an RTX 4060 laptop with an AMD CPU for $1,000 today. Why would anyone pay double that for similar performance?

Because it has 256GB/s of bandwidth?

Why would local LLM people want 128GB of unified memory when it has only 256GB/s of bandwidth? It'd be painfully slow for models that require that much RAM - practically useless. If you load a 70B model on it, it'd manage a maximum of about 3 tokens/s, without accounting for other bottlenecks and overhead. That's torture to use.
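That ~3 tokens/s figure checks out as a rough memory-bound ceiling if you assume a 70B model quantized to 8-bit (~70GB of weights) and that each generated token must stream the full weight set once (a simplification that ignores the KV cache and any overlap):

```javascript
const bandwidthGBs = 256; // Strix Halo memory bandwidth
const weightsGB = 70;     // 70B params at 8-bit ≈ 70 GB of weights

// Memory-bound ceiling: one full pass over the weights per generated token.
const maxTokensPerSec = bandwidthGBs / weightsGB;
console.log(maxTokensPerSec.toFixed(1)); // ≈ 3.7, before any other overhead
```

Real-world throughput lands below this ceiling, which is why ~3 tokens/s is a fair estimate.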

I think this laptop would be great if it were $1,500, not $2,100 and certainly not $2,800.

Right now, this SoC does not have product market fit. It gets beat by much cheaper RTX laptops in gaming. It gets beat by Apple in performance and efficiency. I can't think of a market for this. If there is one, it must be very small.
 
You can get an RTX 4060 laptop with an AMD CPU for $1,000 today. Why would anyone pay double that for similar performance?

Because it has 256GB/s of bandwidth?

Why would local LLM people want 128GB of unified memory when it has only 256GB/s of bandwidth? It'd be painfully slow for models that require that much RAM - practically useless. If you load a 70B model on it, it'd manage a maximum of about 3 tokens/s, without accounting for other bottlenecks and overhead. That's torture to use.

It is unlikely that more bandwidth would result in better performance: the chip's bandwidth-to-compute ratio for ML is already better than a 4090's. For mixed-precision inference, these SKUs should work well enough.

I think this laptop would be great if it were $1,500, not $2,100 and certainly not $2,800.

Right now, this SoC does not have product market fit. It gets beat by much cheaper RTX laptops in gaming. It gets beat by Apple in performance and efficiency. I can't think of a market for this. If there is one, it must be very small.

This I agree with. There are better products in almost every category for less money. And if you need CPU performance, a Mac is a better deal.
 
Promising: only a $500 premium to upgrade from 32GB to 128GB of RAM, vs. Apple's $1K to go from 48GB to 128GB.

https://rog.asus.com/us/laptops/rog-flow/rog-flow-z13-2025/spec/
But the memory is slower, so for 96 usable GB on the AMD you're limited by the ~256GB/s bus? In newer models, a lot of optimization goes into minimizing memory transfers, since TOPS aren't limiting performance anymore (in most cases).

The M4 Ultra would for sure be a lot better, if we get it soon, and the M4 Max is better right now. The NPU on the AMD (same as Intel's) seems to be faster than Apple's, but how easy it is to utilize, or whether the stated performance is truthful, I have no idea.

As for Apple, you are limited to static graphs, so LLM workloads are pretty limited if you want to utilize the ANE (you need autoregressive prediction in the decoder part; there is a new API in macOS 15+, but I didn't have any luck making it faster than good Metal kernels on Pro+ devices).
 