My personal problem with all these frameworks that let you mix CPU and GPU code in one source (aka C++ embedded DSLs) is that they look convenient on paper, but they lock you into a specific compiler dialect and toolchain and remove flexibility. This is the fragmentation I am talking about. If you write a program that uses CUDA or SYCL, you are not writing a valid C++ program according to the standard. In particular, integrating such code into larger codebases (especially if you want to go cross-platform or ship an app) can create additional headaches. I fully understand why Nvidia pushed this model with CUDA: they were targeting academia (where people are generally sloppy and always in a rush), and single-source is great for locking people in.
I don't have a problem with GPU-specific dialects per se; after all, GPUs need specialised constructs. Both CUDA and Metal generally do a good job staying within the boundaries of the standard. What I have a problem with is the idea of the "same source" for CPU and GPU code. Personally, I like Apple's approach best: CPU and GPU code are kept as separate languages, but they share interface headers that describe the memory passed between the two sides. Where Metal falls short is the amount of host-side plumbing it requires (shader libraries, pipeline objects, command encoders, buffers, etc.) and the inability to invoke new kernels directly from within kernels.
What I would really like to see is a framework that cleanly separates the CPU (serial) code from the GPU (parallel) code, but makes the IPC layer mostly disappear by promoting GPU kernels to first-class citizens. That is, separate source files with shared interface declarations (as Metal does now), but with GPU kernels linked as first-class function-like objects that can be invoked directly from CPU code, without plumbing or setup. No special buffer objects or GPU-side allocations, no queues or encoders (of course, the API should be adaptive, letting you drop down to the base primitives when you need more performance or explicit synchronisation). But at the base level, invoking the GPU should be as easy as using Grand Central Dispatch.
@Apple, if you are reading this and are interested, give me a call, we can discuss details