My personal problem with all these frameworks that let you mix CPU and GPU code in one source (aka C++ embedded DSLs) is that they look convenient on paper, but they lock you into a specific compiler dialect and toolchain and remove flexibility. This is the fragmentation I am talking about. If you write a program that uses CUDA or SYCL, you are not writing a valid C++ program according to the standard. In particular, integrating such code into larger codebases (especially if you want to go cross-platform or ship an app) can create additional headaches. I fully understand why Nvidia pushed this model with CUDA: they were targeting academia (where people are generally sloppy and always in a rush), and single-source is great for locking people in.
I don't have a problem with GPU-specific dialects per se; after all, GPUs need specialised constructs. Both CUDA and Metal generally do a good job staying within the boundaries of the standard. What I have a problem with is the idea of the "same source" for CPU and GPU code. Personally, I like Apple's approach best: CPU and GPU code are kept as separate languages, but they share interface headers that describe the memory passed between the two sides. Where Metal falls short is the amount of host-side plumbing it requires (shader libraries, pipeline objects, command encoders, buffers, etc.) and the inability to invoke new kernels directly from within kernels.
What I would really like to see is a framework that cleanly separates the CPU (serial) code from the GPU (parallel) code, but makes the IPC layer mostly disappear by promoting GPU kernels to first-class citizens. That is, separate source files with shared interface declarations (as Metal does now), but with GPU kernels linked as first-class function-like objects that can be invoked directly from CPU code, without plumbing or setup. No special buffer objects or GPU-side allocations, no queues or encoders (of course, the API should be adaptive, letting you drop down to the base primitives when you need more performance or explicit synchronisation). But at the base level, invoking the GPU should be as easy as using Grand Central Dispatch.
@Apple, if you are reading this and are interested, give me a call, we can discuss details