
leman

macrumors Core
Oct 14, 2008
19,516
19,664
In addition to Metal, Apple could also support SYCL

Why would they need to support any dumb thing just because it comes out of Khronos? Personally, all these C++ DSLs can go burn in hell. They only create fragmentation… we need proper handling of CPU/GPU hybrid code, not another compiler dialect.

and Vulkan.

What for? MoltenVK does a good enough job as it is.
 

Xiao_Xi

macrumors 68000
Oct 27, 2021
1,627
1,101
all these C++ DSLs can go burn in hell.
Isn't MSL a C++ DSL?

They only create fragmentation…
The main goal of open standards is to reduce fragmentation. A mediocre open standard is much better than a proprietary standard that I can't use.

we need proper handling of CPU/GPU hybrid code, not another compiler dialect.
How is SYCL different from what you want?
 

leman

macrumors Core
Oct 14, 2008
19,516
19,664
Isn't MSL a C++ DSL?


The main goal of open standards is to reduce fragmentation. A mediocre open standard is much better than a proprietary standard that I can't use.


How is SYCL different from what you want?

My personal problem with all these frameworks that let you mix CPU and GPU code (aka C++ embedded DSLs) is that they look convenient on paper, but they lock you into a specific compiler dialect and tooling and remove flexibility. This is the fragmentation I am talking about. If you write a program that uses CUDA or SYCL, you are not writing a valid C++ program according to the standard. In particular, integrating this into codebases (especially if you want to go cross-platform or ship an app) can create additional headaches. I fully understand why Nvidia pushed this with CUDA; after all, they were targeting academia (where people are generally sloppy and always in a rush), and this model is great for locking people in.

I don't have a problem with GPU-specific dialects; after all, they need specialised constructs. Both CUDA and Metal generally do a good job staying within the boundaries of the standard. What I have a problem with is the idea of the "same source" for CPU and GPU code. Personally, I like Apple's approach best — CPU and GPU code are kept in separate languages, but both use the same interface headers that describe the shared-memory IPC. Where Metal falls short is the need for host plumbing (shader libraries, pipeline objects, command encoders, buffers etc.) and the inability to directly invoke new kernels from kernels.
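To sketch the shared-header idea (a minimal example, names entirely made up, not Apple sample code): a plain C struct lives in one header that is included from both the host C++ translation unit and the .metal source, so both sides agree on the memory layout without any single-source compiler being involved.

Code:
// ShaderTypes.h (hypothetical) — included by both the host C++ code and the .metal kernel source.
// The simd types keep the struct layout identical on the CPU and GPU sides.
#pragma once
#include <simd/simd.h>

// Argument block shared between the CPU encoder and the GPU kernel.
struct GridParams {
    simd_float4x4 transform;   // per-dispatch transform
    simd_float2   cellSize;    // size of one simulation cell
    unsigned int  cellCount;   // number of cells to process
};

The host fills a GridParams and hands it to the kernel (setBytes, an argument buffer, whatever); the kernel declares the same struct as its parameter, and neither side ever has to see the other's language.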

What I would really like to see is a framework that cleanly separates the CPU (serial) and GPU (parallel) code, but makes the IPC layer mostly go away by promoting GPU kernels to first-class citizens. That is, separate source files with shared interface declarations (as Metal does now), but with GPU kernels linked as first-class function-like objects that can be invoked directly from the CPU code, without plumbing or setup. No special buffer objects or GPU-side allocations, no queues or encoders (of course, the API should be adaptive, letting you use the base primitives if you need more performance or explicit synchronisation). But at the base level, invoking the GPU should be as easy as using Grand Central Dispatch. @Apple, if you are reading this and are interested, give me a call, we can discuss details :D
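For comparison, this is the bar GCD already sets on the CPU side: a minimal sketch using the plain C libdispatch API, with no pipelines, encoders or special buffer objects in sight.

Code:
// gcd_apply.cpp — data-parallel loop via libdispatch (compile with clang++ on macOS).
#include <dispatch/dispatch.h>
#include <vector>
#include <cstdio>

int main() {
    std::vector<float> data(1 << 20, 1.0f);

    // dispatch_apply_f runs the callback once per index on a concurrent queue
    // and blocks until every iteration has finished.
    auto queue = dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0);
    dispatch_apply_f(data.size(), queue, &data,
                     [](void *ctx, size_t i) {
                         auto &v = *static_cast<std::vector<float> *>(ctx);
                         v[i] = v[i] * 2.0f + 1.0f;   // embarrassingly parallel work
                     });

    std::printf("data[0] = %f\n", data[0]);
    return 0;
}

Dispatching a Metal compute kernel for the same loop takes an order of magnitude more ceremony, and that is the gap I would like to see closed.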
 

leman

macrumors Core
Oct 14, 2008
19,516
19,664
BTW, just as chance has it, there is currently a discussion about OpenCL on the C programming subreddit, and here is a statement from someone who claims to have worked on the initial implementation (I have no reason to doubt them):

So, I actually worked on the original version of OpenCL for one of the original Khronos group partners, which meant I got a ringside seat to how parts of the standard were developed. Lots of REALLY smart people who (for the most part) were very dedicated to making something pretty awesome. Unfortunately, this was right when GPU acceleration and several other new processor techniques were just taking off. The various partners had invested enormous sums into their respective technologies, and at some levels things got rather cutthroat. There were more than a few parts of the standard which were not exactly crippled, but designed in such a way as to put a competitor at a disadvantage. If AMD needed a feature to enable some capacity but Nvidia and Apple didn't, they might not get it, even though it didn't harm anyone else. There were instances of this from all parties, but Nvidia was already gaining ground with Cuda and tended to swing the biggest stick. AMD and IBM had their own models as well, and I think Apple did too. No one wanted to miss out on being in the "standard" but most of them had vested interests in seeing it be not-quite-as-good as their own proprietary stuff.

I think this is spot on and illustrates the problem with open standards.
 

Xiao_Xi

macrumors 68000
Oct 27, 2021
1,627
1,101
illustrates the problem with open standards.
It just shows that they tried to do it too late.
The various partners had invested enormous sums into their respective technologies, and at some levels things got rather cutthroat.

You want to believe this even after plenty of evidence to the contrary, and repeated explanations of why it doesn't work that way from people inside the system? OK...
You and @leman have repeated the same example over and over again: how a group of companies couldn't agree on a GPU graphics and compute API.

Do you think HTTP/3, the European electric-car charger, or passkeys had those problems?

This is why it is so important to adopt the standard as soon as possible, to discourage companies from trying to develop their own standard.
 
Last edited:

Joe Dohn

macrumors 6502a
Jul 6, 2020
840
748
Sure, but relying on Linux for mission-critical applications may sometimes not be wise.

Depends on how much money you are willing to throw at it to make it suit your needs.
Linux CAN be rock-solid if you write custom patches and a custom framework, but that usually requires a six-figure investment.
 

quarkysg

macrumors 65816
Oct 12, 2019
1,247
841
Depends on how much money you are willing to throw at it to make it suit your needs.
Linux CAN be rock-solid if you write custom patches and a custom framework, but that usually requires a six-figure investment.
I think you hit the crux of the matter. Money. Usually it’s in short supply.
 
  • Like
Reactions: Joe Dohn

Pet3rK

macrumors member
May 7, 2023
57
34
I think you hit the crux of the matter. Money. Usually it’s in short supply.
Is that why Red Hat has a paid version? Is Red Hat one of the most stable Linux distros? In the field I am entering, there's a trend of recommending the paid version of RHEL. They also support macOS, but I thought it was weird that they push the paid version of a Linux distro.
 

Xiao_Xi

macrumors 68000
Oct 27, 2021
1,627
1,101
Is that why Red Hat has a paid version? Is Red Hat one of the most stable Linux distros? In the field I am entering, there's a trend of recommending the paid version of RHEL.
RHEL is considered the standard in many fields because it has 10 years of support and many applications only run on it. Fortunately, there are other distros compatible with it such as RockyLinux or AlmaLinux. However, Red Hat is not happy about it and has decided to do everything possible to kill them.
 

sunny5

macrumors 68000
Jun 11, 2021
1,835
1,706
Since the M2 Ultra is slower than the Intel i9-13900K in Cinebench and Blender testing, I wouldn't expect too much from it.
 

name99

macrumors 68020
Jun 21, 2004
2,407
2,308
My personal problem with all these frameworks that let you mix CPU and GPU code (aka C++ embedded DSLs) is that they look convenient on paper, but they lock you into a specific compiler dialect and tooling and remove flexibility. This is the fragmentation I am talking about. If you write a program that uses CUDA or SYCL, you are not writing a valid C++ program according to the standard. In particular, integrating this into codebases (especially if you want to go cross-platform or ship an app) can create additional headaches. I fully understand why Nvidia pushed this with CUDA; after all, they were targeting academia (where people are generally sloppy and always in a rush), and this model is great for locking people in.

I don't have a problem with GPU-specific dialects; after all, they need specialised constructs. Both CUDA and Metal generally do a good job staying within the boundaries of the standard. What I have a problem with is the idea of the "same source" for CPU and GPU code. Personally, I like Apple's approach best — CPU and GPU code are kept in separate languages, but both use the same interface headers that describe the shared-memory IPC. Where Metal falls short is the need for host plumbing (shader libraries, pipeline objects, command encoders, buffers etc.) and the inability to directly invoke new kernels from kernels.

What I would really like to see is a framework that cleanly separates the CPU (serial) and GPU (parallel) code, but makes the IPC layer mostly go away by promoting GPU kernels to first-class citizens. That is, separate source files with shared interface declarations (as Metal does now), but with GPU kernels linked as first-class function-like objects that can be invoked directly from the CPU code, without plumbing or setup. No special buffer objects or GPU-side allocations, no queues or encoders (of course, the API should be adaptive, letting you use the base primitives if you need more performance or explicit synchronisation). But at the base level, invoking the GPU should be as easy as using Grand Central Dispatch. @Apple, if you are reading this and are interested, give me a call, we can discuss details :D
As I've said before, current Metal is constrained by the need for AMD compatibility...

With the Mac Pro, that goes away, so I'm guessing Apple will be fairly aggressive (three, four years?) in rolling out Metal 5 (or Metal Pro, whatever) where they can get rid of a lot of this historical cross-device baggage.
 

bcortens

macrumors 65816
Aug 16, 2007
1,324
1,796
Canada
As I've said before, current Metal is constrained by the need for AMD compatibility...

With the Mac Pro, that goes away, so I'm guessing Apple will be fairly aggressive (three, four years?) in rolling out Metal 5 (or Metal Pro, whatever) where they can get rid of a lot of this historical cross-device baggage.
This is partially why I don't think Vulkan is a good choice. Metal allows Apple to tailor the API to their GPU and platform architecture in a way Vulkan will never allow. Vulkan will always have to take into account non-uniform memory architectures as well as immediate-mode rendering systems.
 

name99

macrumors 68020
Jun 21, 2004
2,407
2,308
The correct question is why codec naming changed from h.xxx to MPEG.



Like I said, you can be naive and always believe the official account. Or you can listen to people who were actually part of the process...

[IBM long term support]
Do you have a link to that?
Here's an example:

IBM is all over the place, in part because the names are unfamiliar, but also in part because their clients are of the sort where, if the Pentagon or the Fed ask (and pay...) to maintain OS/360 for another ten years, IBM's not going to say no...
 
  • Like
Reactions: Xiao_Xi

Xiao_Xi

macrumors 68000
Oct 27, 2021
1,627
1,101
Like I said, you can be naive and always believe the official account. Or you can listen to people who were actually part of the process...
What would you give more credibility to: a Wikipedia page or someone's story from a forum?
 

leman

macrumors Core
Oct 14, 2008
19,516
19,664
As I've said before, current Metal is constrained by the need for AMD compatibility...

With the Mac Pro, that goes away, so I'm guessing Apple will be fairly aggressive (three, four years?) in rolling out Metal 5 (or Metal Pro, whatever) where they can get rid of a lot of this historical cross-device baggage.

I have my doubts whether, for example, Metal's current inability to share a virtual address space between CPU and GPU is because Apple also has to support Intel and AMD. So far Metal has had no problem providing different feature sets — and different behaviour — on Apple GPUs and third-party GPUs. Undoubtedly, removing support for Intel/AMD will allow Apple to move more freely, but I would guess that there is something about the very architecture of Apple GPUs that makes unified virtual memory not feasible at this point...

This is partially why I don't think Vulkan is a good choice. Metal allows Apple to tailor the API to their GPU and platform architecture in a way Vulkan will never allow. Vulkan will always have to take into account non-uniform memory architectures as well as immediate-mode rendering systems.

Vulkan is full of compromises and I really have to scratch my head at some of their choices. For example, the new descriptor buffer extension lets implementations choose arbitrary sizes for various resource descriptors, which makes handling addresses in the user code a total nightmare. They could have fixed the size at 8 bytes (like Apple did) or 16 bytes (to also include AMD's extended texture descriptors) and still cover 99.99% of all interesting hardware save for some weird Chinese smartphones that seem to have 64-byte data pointers and will never be used with bindless rendering to begin with...
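To make that concrete, here is roughly what the query looks like (a sketch, assuming Vulkan 1.1+ and headers recent enough to ship VK_EXT_descriptor_buffer). Every descriptor size is an implementation-defined property that then has to be dragged through all of your offset arithmetic:

Code:
#include <vulkan/vulkan.h>
#include <cstdio>

// The descriptor sizes are runtime properties of the implementation, not fixed
// constants, so all descriptor-buffer offset math has to be parameterised on them.
void printDescriptorSizes(VkPhysicalDevice gpu) {
    VkPhysicalDeviceDescriptorBufferPropertiesEXT dbProps{};
    dbProps.sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_DESCRIPTOR_BUFFER_PROPERTIES_EXT;

    VkPhysicalDeviceProperties2 props{};
    props.sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_PROPERTIES_2;
    props.pNext = &dbProps;

    vkGetPhysicalDeviceProperties2(gpu, &props);

    std::printf("storage buffer descriptor: %zu bytes\n", dbProps.storageBufferDescriptorSize);
    std::printf("sampled image descriptor:  %zu bytes\n", dbProps.sampledImageDescriptorSize);
    std::printf("sampler descriptor:        %zu bytes\n", dbProps.samplerDescriptorSize);
}

With a fixed 8- or 16-byte size, all of that would collapse to a compile-time constant.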
 

leman

macrumors Core
Oct 14, 2008
19,516
19,664
What would you give more credibility to: a Wikipedia page or someone's story from a forum?

I have recently encountered firsthand how frustrating Wikipedia can be. The M1 and M2 pages have some obviously invalid information about the GPU. I tried fixing it, but my edits were promptly reverted with the reasoning that my sources are not authoritative enough. When I pointed out that there is no authoritative source because the manufacturer does not release the info, and that the information on Wikipedia doesn't have any source either, the reply was basically "yes, but it's already there, so it is more valuable than expert opinion or reverse-engineered Linux drivers". I then opened a discussion, which has been ignored.

The point is, Wikipedia is good for some stuff. It's not that good for some other stuff. If @name99 says he has first-hand knowledge of the codec standardisation process, I have no reason to doubt him. Besides, his work on QuickTime and MPEG is a matter of public record.
 

Xiao_Xi

macrumors 68000
Oct 27, 2021
1,627
1,101
Why would this be the ultimate goal? Why not let the technology develop at its own pace?
To avoid situations like you described:
My personal problem with all these frameworks that let you mix CPU and GPU code (aka C++ embedded DSLs) is that they look convenient on paper, but they lock you into a specific compiler dialect and tooling and remove flexibility.
It is very difficult to adopt standards when they are not established from the beginning. When do you think all countries will adopt the same plugs and sockets, or drive on the same side of the road?

One particular example I have in mind is const generics. The talent was rushing to ship async because of the pressure of the corporate backer
Although const generics are great, I think async/await has had more influence in making Rust a better alternative to C in networking projects. I mentioned C because, for many people, Rust can replace C in many projects.

If @name99 says he has first-hand knowledge of the codec standardisation process, I have no reason to doubt him.
Unless I can read something about it, I have no reason to believe him either.
 

leman

macrumors Core
Oct 14, 2008
19,516
19,664
To avoid situations like you described:

It is very difficult to adopt standards when they are not established from the beginning. When do you think all countries will adopt the same plugs and sockets, or drive on the same side of the road?

No, this is not what I was talking about. My criticism is not of standards or the lack of standards, but of the very idea of embedded C++ DSLs. The fragmentation I am talking about is not about having multiple standards, but the fragmentation of the C++ ecosystem and of programming in general. Even if there is a super standard everyone agrees upon, you will still have to deal with a specialised compiler and its C++ dialect. And even if (by some divine intervention) this eDSL standard gets integrated into official C++, you still have a problem if you want to use your own language.

I want to keep the CPU and GPU code strictly separate (using different language dialects where appropriate), while focusing on ease of interfacing and flexible tooling support, not making one fragile unwieldy mega-tool like SYCL.

Although const generics are great, I think async/await has more influence in making Rust a better alternative to C in networking projects. I have used C, because for many, Rust can replace C in many projects.

There is no doubt that the async push made it better for servers; I just believe these kinds of pushes should not happen at the expense of the base language. But I am ranting. As to C, it hasn't been a good choice for server backends in over a decade, so I am not even sure why you are mentioning it. Most backend stuff these days runs either on Java or JavaScript.

Unless I can read something about it, I have no reason to believe him either.

Well, look it up. He is a former Apple engineer who worked on Quicktime/MPEG and the author of the most detailed analysis of Apple Silicon architecture. You will be hard pressed to find a more credible user on MR.
 
  • Like
Reactions: EugW and jdb8167

Xiao_Xi

macrumors 68000
Oct 27, 2021
1,627
1,101
Even if there is a super standard everyone agrees upon, you will still have to deal with a specialised compiler and its C++ dialect.
I doubt you would say that if CPU and GPU programming were as integrated as synchronous and asynchronous programming is in some languages.

As to C, it hasn't been a good choice for server backends in over a decade, so I am not even sure why you are mentioning it. Most backend stuff these days runs either on Java or JavaScript.
I didn't mean the server backend like Node.js, but the server infrastructure like Apache Server and Nginx (both written in C). Some companies are using Rust to replace those kinds of tools.

Carl Lerche, a principal AWS engineer, says Rust and Tokio [an asynchronous runtime] give AWS the ability to write services that respond fast, reliably, and that help us offer a better customer experience.
 

Romain_H

macrumors 6502a
Sep 20, 2021
520
438
No, this is not what I was talking about. My criticism is not of standards or the lack of standards, but of the very idea of embedded C++ DSLs. The fragmentation I am talking about is not about having multiple standards, but the fragmentation of the C++ ecosystem and of programming in general. Even if there is a super standard everyone agrees upon, you will still have to deal with a specialised compiler and its C++ dialect.
Clang supports CUDA, iirc. Haven't tried it yet, though.

My issue (talking CUDA) is integration into development environments. It took me three days or so to get it integrated into my IDE (Qt Creator / C++).
Not sure how it would integrate with, e.g., IntelliJ or *insert your IDE here*.

Maybe it's just me, but I found the integration nightmare-ish.
 

leman

macrumors Core
Oct 14, 2008
19,516
19,664
I doubt you would say that if CPU and GPU programming were as integrated as synchronous and asynchronous programming is in some languages.

I believe this is a pipe dream. CPUs and GPUs are different devices that require different mindsets and algorithmic approaches. They should not be integrated.

Of course, it's just my opinion, and I acknowledge that mixing multi-purpose and multi-device code has become popular. I have tried multiple such frameworks (from CUDA to web frameworks like SvelteKit) and they all leave a bitter taste in my mouth. Just too much complexity, and difficult to reason about. Also, inflexible.

Clang supports CUDA, iirc. Haven't tried it yet, though.

Yeah, it's arguably a saner approach than nvcc because at least everything goes through the same parser. And yet, I don't really understand why there has to be one tool that does it all. This neat modern approach already breaks down if I want to use a host language other than C++. If instead the model cleanly separated the CPU and GPU codebases while defining a standard C ABI for invoking compiled kernels, interfacing with GPU code simply becomes an exercise in C FFI, and integration with host code boils down to C FFI integration + an additional build step for the GPU code. Less complexity, cleaner overall, more flexibility.
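Purely as a sketch of what I mean (every symbol and library name below is made up; no such ABI exists today), the host side would reduce to ordinary dynamic loading plus a plain C function call:

Code:
// host.cpp — hypothetical C-ABI interface to a separately compiled GPU kernel library.
#include <dlfcn.h>
#include <cstddef>
#include <cstdio>

// Imagined entry point exported by the GPU-side build artifact: launches the
// kernel over `count` elements and blocks until completion.
extern "C" {
typedef int (*saxpy_launch_fn)(float a, float *x, const float *y, size_t count);
}

int main() {
    // The GPU code lives in its own artifact, produced by its own toolchain.
    void *lib = dlopen("./libsaxpy_gpu.so", RTLD_NOW);   // hypothetical library
    if (!lib) { std::fprintf(stderr, "%s\n", dlerror()); return 1; }

    auto launch = reinterpret_cast<saxpy_launch_fn>(dlsym(lib, "saxpy_launch"));
    if (!launch) { std::fprintf(stderr, "%s\n", dlerror()); return 1; }

    float x[4] = {1, 2, 3, 4}, y[4] = {4, 3, 2, 1};
    launch(2.0f, x, y, 4);   // plain function call: no encoders, no pipeline objects

    dlclose(lib);
    return 0;
}

Any language with a C FFI could call that, and the GPU toolchain stays a build-step detail instead of taking over the whole host compiler.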
 

Xiao_Xi

macrumors 68000
Oct 27, 2021
1,627
1,101
This neat modern approach already breaks down if I want to use a host language other than C++. If instead the model cleanly separated the CPU and GPU codebases while defining a standard C ABI for invoking compiled kernels, interfacing with GPU code simply becomes an exercise in C FFI, and integration with host code boils down to C FFI integration + an additional build step for the GPU code.
Why would Nvidia agree to do that?

it's arguably a saner approach than nvcc because at least everything goes through the same parser.
There is no formal CUDA spec, and clang and nvcc speak slightly different dialects of the language.

Lack of open standards hinders competition and progress.

I want to keep the CPU and GPU code strictly separate (using different language dialects where appropriate), while focusing on ease of interfacing and flexible tooling support, not making one fragile unwieldy mega-tool like SYCL.
In this regard, Michael Wong, Head of the Canadian Delegation for the ISO C++ Standard, once said:
You cannot type check between code that is running on the CPU and this code that is running on the GPU.

Out of curiosity, what do you think of the C++ executors?
 
Last edited:

Xiao_Xi

macrumors 68000
Oct 27, 2021
1,627
1,101
Now that Chips and Cheese is doing posts about ARM cores, I can see the possibility of them writing about Apple cores.

Anyway, how good are their blog posts?
 

thenewperson

macrumors 6502a
Mar 27, 2011
992
912
Anyway, how good are their blog posts?
Well, they’re long at least ¯\_(ツ)_/¯

Anyway, here’s hoping they do plan to step up to fill the gap left by Andrei leaving AT. Until the person that writes about them gets hired by some chip company again 😒
 
  • Haha
Reactions: Xiao_Xi