Mac Pro should use NVIDIA cards

deconstruct60 · Feb 26, 2024

Xiao_Xi said:
....

AMD's main problem is its poor support for desktop GPUs.

System requirements (Linux) — ROCm installation (Linux)

System requirements for AMD ROCm

rocm.docs.amd.com

Last generation AMD had a desktop MI210 that was 'cut down' to be a better workstation CDNA option. This MI300 generation they went extra big, so there is no PCI-e card at all. I have a suspicion that wasn't always the 'plan'. The problem for AMD is that they are trying to get this all "off the ground and running while making money". It likely isn't "poor support" so much as that they don't want to lower the margin and give away lots of 'almost free' cards.

The more divergent base-level silicon architectures that ROCm has to target, the slower the path to a more optimized product. [ It is mostly "mythical man-month" thinking to claim that they can just hire more bodies, throw more people at it, and the software project will mature faster. That typically doesn't work, for several reasons. Intel hired a whole horde of folks to do their dGPU drivers. That didn't really work well. Really not surprising when viewed from outside the wealthy groupthink bubble inside Intel. Intel has blown a giant money pit of a hole in the ground here. AMD cannot afford the size of the hole that Intel dug, even if they wanted to try. ]

AMD has somewhat painted themselves into a corner, though, by not having a 'normal' PCI-e card option for the MI300 series. Nvidia has also bought themselves lots of market share by handing out "free" cards to researchers. AMD is constraining ROCm support to the top end of the mainstream GPUs, both for the least amount of fratricide (on the much higher priced options) and to keep the architecture expansion the same. (XTX, XT, and GRE are all the exact same chiplets in different combos and configuration settings.)

AMD has to grow bigger in order to do a broader array of stuff. Too broad, too soon is a major part of what kept AMD in second place versus Intel for a long time. ROCm working extremely well on MI300 matters way more than running on the consumer stuff.

The 7900 GRE card is going to sell for around $549. Relative to the MI200/MI300 prices, that is far, far more accessible. Not accessible to a home hobbyist on a tight budget, but that isn't really who AMD needs to get to in the short-to-intermediate term. It is also low enough that AMD probably should consider handing a fair number of those out for free to open source and AI/ML library maintainers. (Give away one $8,000 MI card or 14 7900 GREs: which one is going to have more ecosystem-building impact? It is already a higher number. Going even cheaper to grow the freebie count gets into the diminishing returns zone pretty quick.) ROCm's library synergy needs to get broader so the MI300 -- MI400 stuff has more traction.

Also, sinking lower than a 16GB VRAM capacity threshold isn't going to match the "go big on VRAM" approach of the MI300.

The other major factor, though, is that Microsoft has their own generic ML libraries and ML APIs that AMD also needs to keep up with. Getting the mainstream GPUs covered by those should be a much higher priority than ROCm. The overwhelming majority of those cards are going to get deployed with Windows. Missing what may be perceived as an essential Windows API is a huge potential blow to traction. (If AMD lets Qualcomm and/or Nvidia do a better job here, that will be losses in more than just GPU space.)

Xiao_Xi · Feb 26, 2024

deconstruct60 said:
if AMD lets Qualcomm and/or Nvidia do a better job here, that will be losses in more than just GPU space.

So it seems. Yesterday Qualcomm uploaded 84 models to Hugging Face.

Qualcomm Continues to Bring the Generative AI Revolution to Devices and Empowers Developers with Qualcomm AI Hub - Edge AI and Vision Alliance

Highlights: Qualcomm now enables at-scale on-device AI commercialization across next-generation PCs, smartphones, software-defined vehicles, XR devices, IoT and more – bringing intelligent computing everywhere. Qualcomm AI Hub offers 75+ optimized AI models for Snapdragon and Qualcomm platforms...

www.edge-ai-vision.com

It has more models than AMD.

qualcomm (Qualcomm)

We’re scaling AI to create new possibilities.

huggingface.co

amd (AMD)

Org profile for AMD on Hugging Face, the AI community building the future.

huggingface.co

I imagine Qualcomm used Nvidia GPUs to train them.

theorist9 · Feb 26, 2024

leman said:
Besides, AMD's strategy appears to be direct CUDA support, which would make a separate TF/PyTorch backend unnecessary (at least to a certain degree).

Is this a new agreement between NVIDIA and AMD? CUDA is one of NVIDIA's main selling points for those doing GPU compute, so I'm surprised NVIDIA would enter an agreement with AMD to allow CUDA to run directly on AMD GPUs. That would be akin to Apple allowing macOS to run on non-Mac systems.

I've read about this project to port CUDA code to run on AMD GPUs, but that's different, since in this case the AMD GPUs wouldn't be running CUDA directly; instead, the CUDA code would be translated into something that could run on the AMD GPUs:

AMD HIP SDK Now Available: Making CUDA Applications Run Across Consumer, Pro GPUs & APUs

AMD's HIP SDK is now available as a part of the ROCm ecosystem bringing CUDA support for professional and consumer GPUs.

wccftech.com

leman · Feb 26, 2024

theorist9 said:
I've read about this project to port CUDA code to run on AMD GPUs, but that's different, since in this case the AMD GPUs wouldn't be running CUDA directly; instead, the CUDA code would be translated into something that could run on the AMD GPUs:

AMD HIP SDK Now Available: Making CUDA Applications Run Across Consumer, Pro GPUs & APUs

AMD's HIP SDK is now available as a part of the ROCm ecosystem bringing CUDA support for professional and consumer GPUs.

wccftech.com

That’s what I mean, yes.

theorist9 · Feb 26, 2024

leman said:
That’s what I mean, yes.

Could the same thing be done with AS, or is AS's unified architecture too different?

And would Apple want to do this, or is it more interested in pushing its own GPU-compute API?

leman · Feb 26, 2024

theorist9 said:
Could the same thing be done with AS, or is AS's unified architecture too different?

And would Apple want to do this, or is it more interested in pushing its own GPU-compute API?

I think a few days ago I wrote a post just about this (don’t remember in which thread). CUDA has a number of features that are not supported by modern Metal, so no, not until those incompatibilities are sorted out.

And I doubt that Apple would be interested in doing something like this. They are more likely to offer you some porting tools (like similar APIs) and ask you to use Metal instead. But a third-party group could fairly quickly build a PTX-to-Metal compiler.

diamond.g · Feb 26, 2024

leman said:
I think a few days ago I wrote a post just about this (don’t remember in which thread). CUDA has a number of features that are not supported by modern Metal, so no, not until those incompatibilities are sorted out.

And I doubt that Apple would be interested in doing something like this. They are more likely to offer you some porting tools (like similar APIs) and ask you to use Metal instead. But a third-party group could fairly quickly build a PTX-to-Metal compiler.

This thread, because I asked the same question.

deconstruct60 · Feb 26, 2024

LangdonS said:
You can always just buy a 2019 Mac Pro.
They will be supported for a LONG time.

'LONG' time? Which definition of "long" is that? The MP 2019 was 3.5 years old when it got superseded/replaced. That isn't going to 'add' to its longevity. (E.g., the MP 2013 dried up relatively quickly after being replaced in 2019, in part due to its hyper-extended service life of 9 years.)

For the last 3 years, Apple has been chopping off Intel Macs primarily at 5 years after they were replaced. There is likely no 6-7+ year service lifetime coming.

Pretty good chance Apple started the obsolete/vintage countdown clock at the end of 2022, when the transition was originally scheduled to finish. At best, it likely has until 2027. 3 years isn't a 'short' time, but it isn't an emphasized "LONG" either.

The Mac Pro took longer than they wanted, but unlikely they are going to put an extension on because they missed their deadline. ( the bulk of the Intel systems did hit their marks and are relatively rapidly winding down. The Mac Pro ( and upper quartile, or less, Minis ) are not likely viable by themselves. Apple positioned the Mac Studio to 'eat into' the bottom quarter of the Mac Pro space. Pretty good chance Mac Pro sales were sagging toward the end. And substantially more early buyers just bought 6000-series AMD GPUs to 'run out the clock' ... which also meant fewer Mac Pro sales. )

The MP 2009-10-12 leveraged several quirks to slow-roll into obsolete status. Primarily, that was other newer Intel Macs extending dGPU support to newer GPU family models. That is completely over. The MP 2019 got GPU upgrades while it was alive, but that whole 'engine' has completely evaporated. To project what happened with the 2010 model onto the 2019 model is deeply misguided. Hacked native firmware upgrades like 2010 ... nope. New GPUs ... nope (at least in macOS). New models using T2 (or a new T3) ... nope. Other Intel CPU cores extending drivers ... nope. Same kernel extension model ... nope. Kext is deprecated and likely will not live past Intel macOS dropping off, if not sooner. (It got to deprecation status at WWDC 2019, a year before M-series even arrived. Wouldn't be surprising if kexts died before Intel updates stopped; same offset from the CPU transition starting.) By 2027 anyone seriously supporting drivers is going to be on System Extensions, not kexts, and very likely Arm System Extensions, because that is overwhelmingly who is buying the new devices that need drivers. (Drivers largely get 'sold' bundled with new hardware.)

Windows 10 ? Nope. ( although Microsoft seems likely to sell extended time for money ). Tap dancing to Windows 11 isn't technically supported, but it will 'happen to work' for a while longer. When Windows 12 comes, it is probably out (especially if Microsoft throws some minimal NPU threshold on it along with even more rigid TPM/Pluton requirements. ).

Will some folks squat on the MP 2019 for a 'LONG time' ? Probably, but not on official macOS technical support.

deconstruct60 · Feb 26, 2024

theorist9 said:
Is this a new agreement between NVIDIA and AMD? CUDA is one of NVIDIA's main selling points for those doing GPU compute, so I'm surprised NVIDIA would enter an agreement with AMD to allow CUDA to run directly on AMD GPUs. That would be akin to Apple allowing macOS to run on non-Mac systems.

More akin to Apple or Microsoft asking for Intel/AMD's permission to do x86 recompilers. Apple didn't ask.
Apple didn't do the 'whole thing' either. (They skipped virtualization, kernel, etc.)

Just going to skip duplicating all of the source code movement and conversion optimizations and just look at the finished binary. (Any target mode settings on the CUDA compiler for specific GPU features, e.g., data movement over NVLink or NVLink clusters, can just be skipped.)

deconstruct60 · Feb 26, 2024

theorist9 said:
And would Apple want to do this, or is it more interested in pushing its own GPU-compute API?

If Apple was less hostile toward OpenCL and/or SYCL they could attract some 3rd-party compute accelerators to the platform. (Pretty hard to bring ROCm or oneAPI into the loop, or get them to move in a more open direction, when blowing up the foundation layers there.) Apple doesn't want to contribute to alleviating the CUDA-swamp problem either. Not just with CUDA, but with any open solution.

zakarhino · Feb 26, 2024

turbineseaplane said:
No kidding..

I recently got a 4070 Super FE

What an absolute beast (totally sweet industrial design also)

It's hilarious to me that some folks (like Rene Ritchie) were claiming that Apple was going to trounce NVIDIA on GPUs

Yeah -- umm... "no" and it's not close

Rene Ritchie tries too hard to make the mundane sound miraculous. He also refuses to outright criticize Apple; rather, he tries to add nuance to something that really doesn't need it. One of the funniest things ever was when YouTube hired him as "Community Liaison" because they recognized that his unique ability to run prose-laden PR-disguised-as-critique for Apple would be useful in communicating YouTube's monthly decision to make the platform worse for everyone.

I haven't seen his take on Vision Pro but if I were a betting man I would assume he's declared it as a technological return of the Mahdi.

JordanNZ · Feb 26, 2024

deconstruct60 said:
If Apple was less hostile toward OpenCL and/or SYCL they could attract some 3rd-party compute accelerators to the platform. (Pretty hard to bring ROCm or oneAPI into the loop, or get them to move in a more open direction, when blowing up the foundation layers there.) Apple doesn't want to contribute to alleviating the CUDA-swamp problem either. Not just with CUDA, but with any open solution.

Apple DONATED OpenCL to Khronos… After what happened there, I don’t blame them for wanting nothing to do with any more open standards by committee when it comes to ‘competing with Nvidia’.

leman · Feb 26, 2024

JordanNZ said:
Apple DONATED OpenCL to Khronos… After what happened there, I don’t blame them for wanting nothing to do with any more open standards by committee when it comes to ‘competing with Nvidia’.

Yeah, I agree with this. Khronos really messed things up big time. Apple is actively engaged in open standards where it matters (e.g. WebGPU). Vulkan is awful, so I’m not surprised they are not interested. If Khronos actually shipped a decent graphics API instead of this overcomplicated nonsense, things might have been different.

Agent007 · Feb 26, 2024

Bro, your shares have gone up enough, stop trying to pump them lol

Xiao_Xi · Feb 26, 2024

leman said:
Khronos really messed things up big time. Apple is actively engaged in open standards where it matters (e.g. WebGPU).

Those meetings must have been fun.

Apple is not comfortable working under Khronos IP framework, because of dispute between Apple Legal & Khronos which is private. Can’t talk about the substance of this dispute. Can’t make any statement for Apple to agree to Khronos IP framework.

Agenda / Minutes for GPU Web meeting 2019-12-09

GPU Web 2019-12-09 Chair: Corentin Scribe: Ken Location: Google Meet TL;DR Neil Trevett presented slides discussing how SPIR-V could be used by WebGPU and under which constraints. Lots of good discussion on the various aspects of what would happen if WebGPU used SPIR-V. See detailed minutes. Ten...

docs.google.com

leman · Feb 27, 2024

Xiao_Xi said:
Those meetings must have been fun.

I hope one day we will know what this is about…

MRMSFC · Feb 27, 2024

leman said:
Vulkan is awful, so I’m not surprised they are not interested.

Can you expand on this? I’m not disagreeing, but I’m interested in hearing more about it.

leman · Feb 27, 2024

MRMSFC said:
Can you expand on this? I’m not disagreeing, but I’m interested in hearing more about it.

Obviously I am being a bit facetious here. A lot of very serious people do serious work with Vulkan. It's just that I fundamentally disagree with pretty much everything this API stands for. It's not an API for users, it's an API for driver and large engine developers. Just a few things that I really dislike about Vulkan:

- it makes everything much more complex than it could have been
- it introduces multiple levels of indirection and complexity to accommodate niche hardware; this punishes the majority of users for no reason at all. I just don't understand why a modern API needs to assume that the size of a hardware pointer can be arbitrary; just fix it to 16 bytes, which will cover 99.9% of cases
- there is still no sane shading language — almost all modern hardware has support for pointers, just give them to me without some weird syntax that has to accommodate historical accidents; Apple had pointers and compile-time programming since forever

deconstruct60 · Feb 29, 2024

leman said:
- it introduces multiple levels of indirection and complexity to accommodate niche hardware; this punishes the majority of users for no reason at all. I just don't understand why a modern API needs to assume that the size of a hardware pointer can be arbitrary; just fix it to 16 bytes, which will cover 99.9% of cases

What would GPUs need with a 128 bit pointer? 64 bits (8 bytes) can address 16 EiB of address space (millions of TB). 2^48 is in the 281 trillion ballpark. For a 64 bit pointer size, that still leaves 14-16 bits for any non-address data you want to stuff in there. If you need another 64 bits of non-address data ... that is a struct, not a pointer.

I suspect it isn't as much about 'arbitrary' as about not settling on 32 or 64 bits (4 or 8 bytes). Same transition issue that CPUs went through. Where GPUs are normally paired up with 32 bit CPUs, there is likely more synergy with 32 bit GPUs. Ditto for 64 bits.
Vulkan objects are referenced via handles.
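
The address-space figures in this exchange are easy to sanity-check. A minimal Python sketch follows; the helper names are made up for illustration, and this is plain integer arithmetic rather than any GPU API:

```python
# Address-space arithmetic for common pointer widths (pure integer math).
EIB = 2**60  # bytes in one exbibyte
TIB = 2**40  # bytes in one tebibyte


def addressable_bytes(bits: int) -> int:
    """Number of distinct byte addresses a pointer of the given width can name."""
    return 2**bits


def spare_bits(pointer_bits: int, address_bits: int) -> int:
    """Bits left over for non-address data (tags, flags) inside the pointer."""
    return pointer_bits - address_bits


print(addressable_bytes(64) // EIB)  # 16 (EiB)
print(addressable_bytes(48))         # 281474976710656, i.e. about 281 trillion
print(addressable_bytes(48) // TIB)  # 256 (TiB)
print(spare_bits(64, 48))            # 16
```

So a full 64-bit pointer spans 16 EiB, and a 48-bit address inside it leaves 16 bits free for tagging.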

leman · Mar 1, 2024

deconstruct60 said:
What would GPUs need with a 128 bit pointer? 64 bits (8 bytes) can address 16 EiB of address space (millions of TB). 2^48 is in the 281 trillion ballpark. For a 64 bit pointer size, that still leaves 14-16 bits for any non-address data you want to stuff in there. If you need another 64 bits of non-address data ... that is a struct, not a pointer.

I suspect it isn't as much about 'arbitrary' as about not settling on 32 or 64 bits (4 or 8 bytes). Same transition issue that CPUs went through. Where GPUs are normally paired up with 32 bit CPUs, there is likely more synergy with 32 bit GPUs. Ditto for 64 bits.
Vulkan objects are referenced via handles.

Here I was talking specifically about the VK_EXT_descriptor_buffer extension. It's recent functionality proposed for Vulkan where you can describe bindings to GPU resources such as buffers and textures in memory directly (as an opaque-layout struct) instead of going via the verbose and often difficult to handle descriptor pool/set API abstraction. Basically, the idea is that you define structs like

Code:

// Pseudocode: texture and buffer stand for GPU resource descriptors
struct material {
  texture color;
  texture relief;
  buffer  additional_data;
};

which you can then populate with resource information and copy to the GPU memory. Much more convenient than the usual descriptor pool/set dance, and also much more flexible, and it is the way Metal has been doing things since 2017.

Only of course Vulkan designers had to make it "hardware-friendly" and decided that the size of the descriptor is driver-specific. That is, you can't really use a struct, you have to do size and offset computation manually. Instead of writing

Code:

bindings.additional_data

you have to query the descriptor size and do

Code:

(buffer*)(bindings + 2*texture_descriptor_size)

, and I am not 100% clear whether you also have to take care of padding rules and other stuff. In other words, have fun with more complex data definitions.
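
To make the manual bookkeeping concrete, here is a rough Python sketch of the offset computation described above. It doesn't call Vulkan at all: the `DescriptorSizes` values are hypothetical stand-ins for what a driver would report at runtime, and the `alignment` parameter is an assumption, since it is left open above whether padding rules apply.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DescriptorSizes:
    """Hypothetical per-driver descriptor sizes in bytes (queried at runtime in real code)."""
    texture: int
    buffer: int


def align_up(offset: int, alignment: int) -> int:
    """Round offset up to the next multiple of alignment."""
    return (offset + alignment - 1) // alignment * alignment


def material_layout(sizes: DescriptorSizes, alignment: int = 1):
    """Return ({field: byte offset}, total size) for the 'material' struct.

    Field order mirrors the struct: color, relief, additional_data.
    The alignment rule is an assumption made for illustration.
    """
    fields = (
        ("color", sizes.texture),
        ("relief", sizes.texture),
        ("additional_data", sizes.buffer),
    )
    offsets = {}
    cursor = 0
    for name, size in fields:
        cursor = align_up(cursor, alignment)
        offsets[name] = cursor
        cursor += size
    return offsets, cursor


# Two made-up drivers: the same logical struct ends up with very different layouts.
small = DescriptorSizes(texture=4, buffer=16)
large = DescriptorSizes(texture=280, buffer=280)
print(material_layout(small))  # additional_data at offset 8, total 24 bytes
print(material_layout(large))  # additional_data at offset 560, total 840 bytes
```

With unpacked layout the `additional_data` offset is exactly `2 * texture_descriptor_size`, which is the expression from the snippet above; any padding requirement changes it again.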

According to the Vulkan driver database (vulkan.gpuinfo.org), the size of a descriptor for a storage buffer (your basic GPU-visible memory allocation) ranges from 16 bytes to 280 bytes(!!!!). Why the hell an implementation needs 280 bytes to describe a pointer to GPU memory is beyond me. Uniform buffers (a kind of GPU-side memory that is usually faster to access) use a different descriptor size (obviously) that ranges from 8 to 280 bytes. Finally, texture descriptors are between 4 and, yes, you guessed it, 280 bytes long.

I just don't understand why one has to do things like that. The authors of the extension claim that this will help accommodate different types of hardware as well as improve performance. To me it seems like just fixing everything to 16 bytes and telling exotic implementations to deal with it via lookup tables or some other means would work just as well in practice and result in a much more sane programming model. I mean, if some weird smartphone GPU needs 200+ bytes to just describe a data buffer, are you really going to use it for high-performance applications where complex resource tables are required? No, you are not. So why does the API need to support this kind of weirdness?

How does Apple do it? Everything is 64-bit. If the hardware can accommodate this directly, great. If not, the driver has to introduce an additional level of indirection. Frankly, I'd take an extra load instruction any time if it means that I can write sane code.
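
A rough illustration of that trade-off, in Python: every resource reference is a fixed 64-bit handle, and a driver whose hardware needs something wider resolves the handle through a table. This is only a sketch of the idea in this post, not Apple's actual implementation; `DescriptorTable` and the other names are invented for the example.

```python
import struct


class DescriptorTable:
    """Hypothetical driver-side table: fixed-size handle -> wide native descriptor."""

    def __init__(self):
        self._native = []

    def register(self, native_descriptor: bytes) -> int:
        """Store a native descriptor of any size and return a small integer handle."""
        self._native.append(native_descriptor)
        return len(self._native) - 1

    def resolve(self, handle: int) -> bytes:
        """The extra level of indirection: one lookup per access."""
        return self._native[handle]


# The application-visible struct: three little-endian 64-bit handles
# (color, relief, additional_data). Its layout never depends on the driver.
MATERIAL = struct.Struct("<3Q")


def pack_material(color: int, relief: int, additional_data: int) -> bytes:
    return MATERIAL.pack(color, relief, additional_data)


def additional_data_handle(blob: bytes) -> int:
    # Fixed layout, so the field position is known without querying anything.
    return MATERIAL.unpack(blob)[2]


table = DescriptorTable()
color = table.register(bytes(280))   # pretend this hardware needs 280 bytes
relief = table.register(bytes(280))
extra = table.register(bytes(16))
blob = pack_material(color, relief, extra)
print(len(blob))                                         # 24
print(len(table.resolve(additional_data_handle(blob))))  # 16
```

The struct stays 24 bytes whatever the native descriptor size is; the cost is the table lookup, which is the "extra load instruction" mentioned above.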

Xiao_Xi · Mar 1, 2024

leman said:
It's not an API for users, it's an API for driver and large engine developers.

That is normal; these are the main contributors to Vulkan.

Nvidia has created a new shading language.

GitHub - shader-slang/slang: Making it easier to work with shaders

Making it easier to work with shaders. Contribute to shader-slang/slang development by creating an account on GitHub.

github.com

You may find this presentation interesting.

leman · Mar 1, 2024

Xiao_Xi said:
That is normal; these are the main contributors to Vulkan.

Well, I don't like it. I think an API should be accessible to developers. Vulkan was supposed to be a new interface that makes GPU programming easier and more efficient, not more cumbersome and expensive. It was supposed to replace the morally obsolete OpenGL as a new way for developers to harness the power of GPUs. Instead we still have people advocating for using OpenGL because it's simpler. This is just a bad way to do things.

Xiao_Xi said:
Nvidia has created a new shading language.

GitHub - shader-slang/slang: Making it easier to work with shaders

Making it easier to work with shaders. Contribute to shader-slang/slang development by creating an account on GitHub.

github.com

Great, so they've remade the Metal shading language with some elements of Go and CUDA. And they still have to use weird things like "opaque types" for buffer bindings instead of pointers to work around Vulkan's idiosyncrasies.

sunny5 · Mar 1, 2024

It's just better for Apple to develop their own graphics cards with their own chip, but the problem is that it's SoC-based, so their chip design is totally limited.

Chuckeee · Mar 1, 2024

sunny5 said:
It's just better for Apple to develop their own graphics cards with their own chip, but the problem is that it's SoC-based, so their chip design is totally limited.

That would be very much an Apple mindset, and totally the wrong approach. Apple always seems to think that if they make really good, innovative hardware, the software developers will come flocking to their platform. That’s just magical thinking that has been shown not to work over and over again. I’m afraid an Apple-specific graphics card would be just the same.

sunny5 · Mar 1, 2024

Chuckeee said:
That would be very much an Apple mindset, and totally the wrong approach. Apple always seems to think that if they make really good, innovative hardware, the software developers will come flocking to their platform. That’s just magical thinking that has been shown not to work over and over again. I’m afraid an Apple-specific graphics card would be just the same.

iOS 18: AI Server Industry Aiming to Win Business From Apple

AI server makers are hoping to obtain orders from Apple ahead of its highly anticipated unveiling of new AI features later this year, according to...

www.macrumors.com

BUT... Apple needs to make their own server, sooner or later, especially for AI. Currently, only Nvidia has the power of AI on both hardware and software, and therefore Apple has to buy x86 and Nvidia based servers, which is quite ironic. Apple has neither the hardware nor the software, so how can they even enter the AI market? At least a server is something they can justify making.


The only problem is the SoC, which is extremely inefficient for laptops, desktops, workstations, and servers in terms of chip design. At least Nvidia made separate CPU and GPU on a large board. But since Apple can only make the Max chip, that's a huge problem. That's why they need to develop a chiplet design asap, just like AMD and Intel are doing.

This is also why Apple needs to keep making a Mac Pro workstation, just not one like the Mac Pro 2023. It is useful for servers, and Nvidia actually made both server and workstation together. Apple really needs to make a Mac Pro with superior hardware again in order to start working on software and its ecosystem; otherwise, the Mac will be too limited.
