
deconstruct60

macrumors G5
Mar 10, 2009
12,493
4,053


AMD's main problem is its poor support for desktop GPUs.

Last generation AMD had a desktop MI210 that was 'cut down' to be a better workstation CDNA option. This MI300 generation they went extra big, so there is no PCI-e card at all. I have a suspicion that wasn't always the 'plan'. The problem for AMD is that they are trying to get this all "off the ground and running while making money". It likely isn't "poor support" so much as that they don't want to lower the margins and give away lots of 'almost free' cards.

The more divergent base-level silicon architectures that ROCm has to target, the slower the path to a more optimized product. [It is pure "mythical man-month" thinking to claim that you can just hire more bodies, throw more people at the software project, and it will mature faster. That typically doesn't work, for several reasons. Intel hired a whole horde of folks to do their dGPU drivers. That didn't really work well, which is not surprising viewed from outside the wealthy groupthink bubble inside Intel. Intel has blown a giant money-pit hole in the ground here. AMD cannot afford a hole the size of the one Intel dug, even if they wanted to try.]

AMD has somewhat painted themselves into a corner, though, by not having a 'normal' PCI-e card option for the MI300 series. Nvidia has also bought itself lots of market share by handing out "free" cards to researchers. AMD is constraining consumer ROCm support to the top end of the mainstream GPUs, both for the least amount of fratricide (with the much higher priced options) and to keep the supported architecture footprint the same (the XTX, XT, and GRE are all the exact same chiplets in different combos and configuration settings).

AMD has to grow bigger in order to do a broader array of stuff. Too broad, too soon is a major part of what kept AMD in second place versus Intel for a long time. ROCm working extremely well on the MI300 matters way more than running on the consumer stuff.

The 7900 GRE card is going to sell for around $549. Relative to MI200/MI300 prices, that is far, far more accessible. Not accessible to a home hobbyist on a tight budget, but that isn't really who AMD needs to reach in the short-to-intermediate term. It is also low enough that AMD probably should consider handing a fair number of those out for free to open-source and AI/ML library maintainers. (Give away one $8,000 MI card or 14 7900 GREs: which one is going to have more ecosystem-building impact? It is already a higher number; going even cheaper to grow the freebie count gets into the diminishing-returns zone pretty quickly.) ROCm's library coverage needs to get broader so the MI300 and MI400 products have more traction.


Sinking below a 16GB VRAM capacity threshold also wouldn't match the "go big on VRAM" approach of the MI300.

The other major factor, though, is that Microsoft has its own generic ML APIs that AMD also needs to keep up with. Getting the mainstream GPUs covered by those should be a much higher priority than ROCm. The overwhelming majority of those cards are going to get deployed with Windows. Missing what may be perceived as an essential Windows API is a huge potential blow to traction. (if AMD lets Qualcomm and/or Nvidia do a better job here, that will be losses in more than just GPU space.)
 
Last edited:

Xiao_Xi

macrumors 68000
Oct 27, 2021
1,627
1,101
if AMD lets Qualcomm and/or Nvidia do a better job here, that will be losses in more than just GPU space.
So it seems. Yesterday Qualcomm uploaded 84 models to Hugging Face.

It has more models than AMD.

I imagine Qualcomm used Nvidia GPUs to train them.
 
  • Like
Reactions: Chuckeee

theorist9

macrumors 68040
May 28, 2015
3,880
3,059
Besides, AMD's strategy appears to be direct CUDA support, which would make a separate TF/PyTorch backend unnecessary (at least to a certain degree).
Is this a new agreement between NVIDIA and AMD? CUDA is one of NVIDIA's main selling points for those doing GPU compute, so I'm surprised NVIDIA would enter an agreement with AMD to allow CUDA to run directly on AMD GPUs. That would be akin to Apple allowing macOS to run on non-Mac systems.

I've read about this project to port CUDA code to run on AMD GPUs, but that's different, since in this case the AMD GPUs wouldn't be running CUDA directly; instead, the CUDA code would be translated into something that could run on the AMD GPUs:

 
Last edited:
  • Like
Reactions: Chuckeee

leman

macrumors Core
Oct 14, 2008
19,520
19,670
I've read about this project to port CUDA code to run on AMD GPUs, but that's different, since in this case the AMD GPUs wouldn't be running CUDA directly; instead, the CUDA code would be translated into something that could run on the AMD GPUs:


That’s what I mean, yes.
 
  • Like
Reactions: theorist9

theorist9

macrumors 68040
May 28, 2015
3,880
3,059
That’s what I mean, yes.
Could the same thing be done with AS, or is AS's unified architecture too different?

And would Apple want to do this, or is it more interested in pushing its own GPU-compute API?
 

leman

macrumors Core
Oct 14, 2008
19,520
19,670
Could the same thing be done with AS, or is AS's unified architecture too different?

And would Apple want to do this, or is it more interested in pushing its own GPU-compute API?

I think a few days ago I wrote a post just about this (don't remember in which thread). CUDA has a number of features that are not supported by modern Metal, so no, not until those incompatibilities are sorted out.

And I doubt that Apple would be interested in doing something like this. They are more likely to offer you some porting tools (like similar APIs) and ask you to use Metal instead. But a third-party group could fairly quickly build a PTX-to-Metal compiler.
 

diamond.g

macrumors G4
Mar 20, 2007
11,438
2,663
OBX
I think a few days ago I wrote a post just about this (don't remember in which thread). CUDA has a number of features that are not supported by modern Metal, so no, not until those incompatibilities are sorted out.

And I doubt that Apple would be interested in doing something like this. They are more likely to offer you some porting tools (like similar APIs) and ask you to use Metal instead. But a third-party group could fairly quickly build a PTX-to-Metal compiler.
This thread, because I asked the same question.
 

deconstruct60

macrumors G5
Mar 10, 2009
12,493
4,053
You can always just buy a 2019 Mac Pro.
They will be supported for a LONG time.

'LONG' time? Which definition of "long" is that? The MP 2019 was 3.5 years old when it got superseded/replaced. That isn't going to 'add' to its longevity. (e.g., the MP 2013 dried up relatively quickly after being replaced in 2019, in part due to its hyper-extended service life of 9 years.)

For the last 3 years, Apple has been chopping off Intel Macs primarily at 5 years after they were replaced. There is likely no 6-7+ year service lifetime coming.

Pretty good chance Apple started the obsolete/vintage countdown clock at the end of 2022, when the transition was originally scheduled to finish. At best, it likely has until 2027. Three years isn't a 'short' time, but it isn't an emphasized "LONG" either.


The Mac Pro took longer than they wanted, but it's unlikely they are going to grant an extension just because they missed their own deadline. (The bulk of the Intel systems did hit their marks and are winding down relatively rapidly. The Mac Pro (and the upper quartile, or less, of Minis) is not likely viable by itself. Apple positioned the Mac Studio to 'eat into' the bottom quarter of the Mac Pro space. Pretty good chance Mac Pro sales were sagging toward the end. And a substantial number of early buyers just bought 6000-series AMD GPUs to 'run out the clock'... which also meant fewer Mac Pro sales.)

The MP 2009-10-12 leveraged several quirks to slow-roll into obsolete status. Primarily, that was other, newer Intel Macs extending dGPU support to newer GPU family models. That is completely over. The MP 2019 got GPU upgrades while it was alive, but that whole 'engine' has completely evaporated. Projecting what happened with the 2010 model onto the 2019 model is deeply misguided. Hackery native firmware upgrades like the 2010... nope. New GPUs... nope (at least in macOS). New models using the T2 (or a new T3)... nope. Other Intel CPU cores with extended drivers... nope. Same kernel extension model... nope; kexts are deprecated and likely will not live past Intel macOS dropping off, if not sooner (they reached deprecated status at WWDC 2019, a year before the M-series even arrived, and it wouldn't be surprising if they died before Intel updates stopped, at the same offset from the start of the CPU transition).

By 2027 anyone seriously supporting drivers is going to be on System Extensions, not kexts, and very likely Arm System Extensions, because that is overwhelmingly who is buying the new devices that need drivers. (Drivers largely get 'sold' bundled to new hardware.)

Windows 10? Nope (although Microsoft seems likely to sell extended support for money). Tap-dancing onto Windows 11 isn't technically supported, but it will 'happen to work' for a while longer. When Windows 12 comes, the MP 2019 is probably out (especially if Microsoft throws some minimal NPU threshold on it along with even more rigid TPM/Pluton requirements).

Will some folks squat on the MP 2019 for a 'LONG time'? Probably, but not with official macOS technical support.
 

deconstruct60

macrumors G5
Mar 10, 2009
12,493
4,053
Is this a new agreement between NVIDIA and AMD? CUDA is one of NVIDIA's main selling points for those doing GPU compute, so I'm surprised NVIDIA would enter an agreement with AMD to allow CUDA to run directly on AMD GPUs. That would be akin to Apple allowing macOS to run on non-Mac systems.

More akin to Apple or Microsoft asking for Intel/AMD's permission to do x86 recompilers. Apple didn't ask.
Apple also didn't do the 'whole thing' either (it skipped virtualization, the kernel, etc.).

The approach just skips duplicating all of the source-code movement and conversion optimizations and looks only at the finished binary. (Any target-mode settings on the CUDA compiler for specific GPU features, e.g., data movement over NVLink or NVLink clusters, it can just skip.)
 

deconstruct60

macrumors G5
Mar 10, 2009
12,493
4,053
And would Apple want to do this, or is it more interested in pushing its own GPU-compute API?

If Apple was less hostile toward OpenCL and/or SYCL, they could attract some third-party compute accelerators to the platform. (Pretty hard to bring ROCm or oneAPI into the loop, or get them to move in a more open direction, while blowing up the foundation layers there.) Apple doesn't want to contribute to alleviating the CUDA-swamp problem either; not via CUDA, and not via any open solution either.
 

zakarhino

Contributor
Sep 13, 2014
2,611
6,963
No kidding..

I recently got a 4070 Super FE

What an absolute beast (totally sweet industrial design also)

It's hilarious to me that some folks (like Rene Ritchie) were claiming that Apple was going to trounce NVIDIA on GPUs

Yeah -- umm... "no" and it's not close

Rene Ritchie tries too hard to make the mundane sound miraculous. He also refuses to outright criticize Apple; rather, he tries to add nuance to things that really don't need it. One of the funniest things ever was when YouTube hired him as "Community Liaison" because they recognized that his unique ability to run prose-laden PR-disguised-as-critique for Apple would be useful in communicating YouTube's monthly decisions to make the platform worse for everyone.

I haven't seen his take on the Vision Pro, but if I were a betting man I would assume he's declared it a technological return of the Mahdi.
 

JordanNZ

macrumors 6502a
Apr 29, 2004
779
290
Auckland, New Zealand
If Apple was less hostile toward OpenCL and/or SYCL, they could attract some third-party compute accelerators to the platform. (Pretty hard to bring ROCm or oneAPI into the loop, or get them to move in a more open direction, while blowing up the foundation layers there.) Apple doesn't want to contribute to alleviating the CUDA-swamp problem either; not via CUDA, and not via any open solution either.
Apple DONATED OpenCL to Khronos… After what happened there, I don’t blame them for wanting nothing to do with any more open standards by committee when it comes to ‘competing with Nvidia’.
 

leman

macrumors Core
Oct 14, 2008
19,520
19,670
Apple DONATED OpenCL to Khronos… After what happened there, I don’t blame them for wanting nothing to do with any more open standards by committee when it comes to ‘competing with Nvidia’.

Yeah, I agree with this. Khronos really messed things up big time. Apple is actively engaged in open standards where it matters (e.g. WebGPU). Vulkan is awful, so I'm not surprised they are not interested. If Khronos had actually shipped a decent graphics API instead of this overcomplicated nonsense, things might have been different.
 
  • Like
Reactions: AlphaCentauri

Xiao_Xi

macrumors 68000
Oct 27, 2021
1,627
1,101
Khronos really messed things up big time. Apple is actively engaged in open standards where it matters (e.g. WebGPU).
Those meetings must have been fun.
Apple is not comfortable working under Khronos IP framework, because of dispute between Apple Legal & Khronos which is private. Can’t talk about the substance of this dispute. Can’t make any statement for Apple to agree to Khronos IP framework.
 
  • Wow
Reactions: Chuckeee

leman

macrumors Core
Oct 14, 2008
19,520
19,670
can you expand on this? I’m not disagreeing but I’m interested in hearing more about it.

Obviously I am being a bit facetious here. A lot of very serious people do serious work with Vulkan. It's just that I fundamentally disagree with pretty much everything this API stands for. It's not an API for users, it's an API for driver and large engine developers. Just a few things that I really dislike about Vulkan:

- it makes everything much more complex than it could have been
- it introduces multiple levels of indirection and complexity to accommodate niche hardware; this punishes the majority of users for no reason at all. I just don't understand why a modern API needs to assume that the size of a hardware pointer can be arbitrary; just fix it to 16 bytes, which will cover 99.9% of cases
- there is still no sane shading language; almost all modern hardware has support for pointers, so just give them to me without some weird syntax that has to accommodate historical accidents. Apple has had pointers and compile-time programming since forever
 

deconstruct60

macrumors G5
Mar 10, 2009
12,493
4,053
- it introduces multiple levels of indirection and complexity to accommodate niche hardware; this punishes the majority of users for no reason at all. I just don't understand why a modern API needs to assume that the size of a hardware pointer can be arbitrary; just fix it to 16 bytes, which will cover 99.9% of cases

What would GPUs need with a 128-bit pointer? 64 bits (8 bytes) can address 16 EiB of address space (millions of TB). 2^48 is in the 281 trillion (about 256 TiB) ballpark. With a 64-bit pointer, that still leaves 14-16 bits for any non-address data you want to stuff in there. If you need another 64 bits of non-address data... that is a struct, not a pointer.

I suspect it isn't so much about 'arbitrary' as about not settling on 32 or 64 bits (4 or 8 bytes). It's the same transition issue that CPUs went through. For GPUs normally paired up with 32-bit CPUs, there is likely more synergy with 32-bit GPU pointers. Ditto for 64 bits.
Vulkan objects are referenced via handles.
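A minimal sketch of that tag-bits idea (names are hypothetical, and it assumes a platform with 48-bit canonical addresses):

Code:
#include <cstdint>

// Hypothetical tagged-pointer helpers: with 48-bit virtual addresses,
// the top 16 bits of a 64-bit pointer are free for metadata.
constexpr uint64_t kAddrMask = (1ULL << 48) - 1;

inline uint64_t pack(const void *p, uint16_t tag) {
    return (reinterpret_cast<uint64_t>(p) & kAddrMask) |
           (static_cast<uint64_t>(tag) << 48);
}

inline void *address(uint64_t tagged) {
    return reinterpret_cast<void *>(tagged & kAddrMask);
}

inline uint16_t tag_bits(uint64_t tagged) {
    return static_cast<uint16_t>(tagged >> 48);
}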
 

leman

macrumors Core
Oct 14, 2008
19,520
19,670
What would GPUs need with a 128-bit pointer? 64 bits (8 bytes) can address 16 EiB of address space (millions of TB). 2^48 is in the 281 trillion (about 256 TiB) ballpark. With a 64-bit pointer, that still leaves 14-16 bits for any non-address data you want to stuff in there. If you need another 64 bits of non-address data... that is a struct, not a pointer.

I suspect it isn't so much about 'arbitrary' as about not settling on 32 or 64 bits (4 or 8 bytes). It's the same transition issue that CPUs went through. For GPUs normally paired up with 32-bit CPUs, there is likely more synergy with 32-bit GPU pointers. Ditto for 64 bits.
Vulkan objects are referenced via handles.

Here I was talking specifically about the EXT_descriptor_buffer extension. It's recent functionality proposed for Vulkan where you can describe bindings to GPU resources such as buffers and textures directly in memory (as an opaque-layout struct) instead of going via the verbose and often difficult-to-handle descriptor pool/set API abstraction. Basically, the idea is that you define structs like

Code:
struct material {
  texture color;
  texture relief;
  buffer  additional_data;
};

which you can then populate with resource information and copy to GPU memory. Much more convenient than the usual descriptor pool/set dance, and also much more flexible, and it's the way Metal has been doing things since 2017.
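For comparison, a minimal Metal-side sketch (the names Material/shade and the members are hypothetical; the point is that argument buffers let textures and plain device pointers sit directly in a struct with a fixed layout):

Code:
#include <metal_stdlib>
using namespace metal;

// Hypothetical material layout as a Metal argument buffer: textures and
// device pointers are ordinary struct members with a known, fixed layout.
struct Material {
    texture2d<float>    color;
    texture2d<float>    relief;
    device const float *additional_data;  // plain 64-bit GPU pointer
};

kernel void shade(device const Material *materials [[buffer(0)]],
                  uint tid [[thread_position_in_grid]])
{
    // Ordinary struct/pointer access; no driver-specific descriptor sizes
    // or manual offset math anywhere.
    device const Material &m = materials[tid];
    float first = m.additional_data[0];
    (void)first;
}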

Only, of course, the Vulkan designers had to make it "hardware-friendly" and decided that the size of a descriptor is driver-specific. That is, you can't really use a struct; you have to do the size and offset computation manually. Instead of writing
Code:
bindings.additional_data
you have to query the descriptor size and do
Code:
(buffer*)(bindings + 2*texture_descriptor_size)
, and I am not 100% clear whether you also have to take care of padding rules and other stuff. In other words, have fun with more complex data definitions.
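Roughly, the host-side bookkeeping looks something like this minimal sketch (not from any shipping code; the function name, the binding index 2, and the variable names are placeholders, and it assumes VK_EXT_descriptor_buffer is enabled with its entry points already loaded, e.g. via vkGetDeviceProcAddr):

Code:
#include <vulkan/vulkan.h>
#include <cstdint>

// Sketch: write one storage-buffer descriptor ("additional_data", binding 2
// of a hypothetical material layout) into a host-mapped descriptor buffer.
// Every size and offset below is driver-specific.
void write_additional_data_descriptor(VkPhysicalDevice phys,
                                      VkDevice device,
                                      VkDescriptorSetLayout material_layout,
                                      VkDeviceAddress data_address,  // from vkGetBufferDeviceAddress
                                      VkDeviceSize data_size,
                                      uint8_t *mapped_descriptor_buffer)
{
    // 1. Ask the driver how big its descriptors are (16..280 bytes in the wild).
    VkPhysicalDeviceDescriptorBufferPropertiesEXT desc_props{
        VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_DESCRIPTOR_BUFFER_PROPERTIES_EXT};
    VkPhysicalDeviceProperties2 props2{
        VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_PROPERTIES_2, &desc_props};
    vkGetPhysicalDeviceProperties2(phys, &props2);

    // 2. Ask the driver where binding 2 lives inside the set's memory blob.
    VkDeviceSize binding_offset = 0;
    vkGetDescriptorSetLayoutBindingOffsetEXT(device, material_layout,
                                             /*binding=*/2, &binding_offset);

    // 3. Have the driver encode the descriptor into that spot.
    VkDescriptorAddressInfoEXT addr{VK_STRUCTURE_TYPE_DESCRIPTOR_ADDRESS_INFO_EXT};
    addr.address = data_address;
    addr.range   = data_size;

    VkDescriptorGetInfoEXT info{VK_STRUCTURE_TYPE_DESCRIPTOR_GET_INFO_EXT};
    info.type = VK_DESCRIPTOR_TYPE_STORAGE_BUFFER;
    info.data.pStorageBuffer = &addr;

    vkGetDescriptorEXT(device, &info,
                       desc_props.storageBufferDescriptorSize,
                       mapped_descriptor_buffer + binding_offset);
}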

According to the Vulkan driver database (vulkan.gpuinfo.org), the size of a descriptor for a storage buffer (your basic GPU-visible memory allocation) ranges from 16 bytes to 280 bytes(!!!!). Why the hell an implementation needs 280 bytes to describe a pointer to GPU memory is beyond me. Uniform buffers (a kind of GPU-side memory that is usually faster to access) use a different descriptor size (obviously) that ranges from 8 to 280 bytes. Finally, texture descriptors are between 4 and, yes, you guessed it, 280 bytes long.

I just don't understand why one has to do things like that. The authors of the extension claim that this will help accommodate different types of hardware as well as improve performance. To me it seems like just fixing everything to 16 bytes and telling exotic implementations to deal with it via lookup tables or some other means would work just as well in practice and result in a much saner programming model. I mean, if some weird smartphone GPU needs 200+ bytes just to describe a data buffer, are you really going to use it for high-performance applications where complex resource tables are required? No, you are not. So why does the API need to support this kind of weirdness?

How does Apple do it? Everything is 64-bit. If the hardware can accommodate this directly, great. If not, the driver has to introduce an additional level of indirection. Frankly, I'd take an extra load instruction any time if it means that I can write sane code.
 

Xiao_Xi

macrumors 68000
Oct 27, 2021
1,627
1,101
It is normal, these are the main contributors to Vulkan.

Nvidia has created a new shading language.

leman

macrumors Core
Oct 14, 2008
19,520
19,670
It is normal, these are the main contributors to Vulkan.

Well, I don't like it. I think an API should be accessible to developers. Vulkan was supposed to be a new interface that makes GPU programming easier and more efficient, not more cumbersome and expensive. It was supposed to replace the obsolete OpenGL as a new way for developers to harness the power of GPUs. Instead we still have people advocating for using OpenGL because it's simpler. This is just a bad way to do things.

Nvidia has created a new shading language.

Great, so they've remade the Metal shading language with some elements of Go and CUDA. And they still have to use weird things like "opaque types" for buffer bindings instead of pointers to work around Vulkan's idiosyncrasies.
 

sunny5

macrumors 68000
Jun 11, 2021
1,835
1,706
It's just better for Apple to develop their own graphics cards with their own chips, but the problem is that it's SoC-based, so their chip design is totally limited.
 

Chuckeee

macrumors 68040
Aug 18, 2023
3,060
8,722
Southern California
It's just better for Apple to develop their own graphics cards with their own chips, but the problem is that it's SoC-based, so their chip design is totally limited.
That would be very much an Apple mindset, and totally the wrong approach. Apple always seems to think that if they make really good, innovative hardware, the software developers will come flocking to the platform. That's just magical thinking that has been shown not to work over and over again. I'm afraid an Apple-specific graphics card would be just the same.
 

sunny5

macrumors 68000
Jun 11, 2021
1,835
1,706
That would be very much an Apple mindset, and totally the wrong approach. Apple always seems to think that if they make really good, innovative hardware, the software developers will come flocking to the platform. That's just magical thinking that has been shown not to work over and over again. I'm afraid an Apple-specific graphics card would be just the same.

BUT... Apple needs to make their own servers sooner or later, especially for AI. Currently, only Nvidia has AI strength in both hardware and software, and therefore Apple has to buy x86 and Nvidia-based servers, which is quite ironic. Apple has neither the hardware nor the software, so how can they even enter the AI market? At least a server is something they can justify making.


The only problem is the SoC approach, which is extremely inefficient for laptops, desktops, workstations, and servers in terms of chip design. At least Nvidia makes separate CPUs and GPUs on a large board. But since Apple can only make chips up to the Max, that's a huge problem. That's why they need to develop a chiplet design ASAP, just like AMD and Intel are doing.

This is also why Apple needs to keep making a Mac Pro workstation, unlike the Mac Pro 2023. It is useful for servers, and Nvidia actually makes both servers and workstations together. Apple really needs to make a Mac Pro with superior hardware again in order to start working on the software and its ecosystem; otherwise, the Mac will be too limited.
 
Last edited:
  • Wow
Reactions: gusmula