Agreed, other software tools are now available that are comparable to CUDA. But what about hardware? Does AMD make anything to compete with Volta? Or perhaps a better question is: Are there use cases where NVIDIA's offerings are clearly superior?
Nvidia currently still has the superior offerings, for the most part. There are certain fields where AMD has an edge, or has developed very specialised cards, but overall Nvidia has the strongest hardware. That could very well change in the not-too-distant future, though, and with a programming interface that works on both Nvidia and AMD - and even Intel, don't forget they're (again) working on big accelerators now - you'd have an easier time switching should that time come. That would also put more pressure on Nvidia to keep making better products and to keep prices down, since their position would be more fragile. If everyone is locked into CUDA, Nvidia wouldn't care if their GPUs were 10% slower than competing solutions at everything - they could still charge more, since it's not just the cost of the hardware, but also the cost of rewriting things if you've already started in CUDA.
And on that point: while Nvidia is generally superior on performance and perf/watt, their cards often cost a lot more than AMD's in the performance segments where AMD has competing products.
Also, how do the AMD graphics cards offered on the Mac Pro compare with AMD's Instinct (which I've read doesn't work on the Mac Pro) for GPGPU computing?
There are several Instinct cards, and to be entirely honest I'm not very familiar with any of them. I know there are specialised Instinct cards that focus very heavily on certain compute needs, like FP64 or INT8. For FP64, the Vega cards in the Mac Pro aren't actually that great, so if you need high precision you're a bit out of luck - well, at least unless Apple has done something special. On AMD's website the Vega II is listed as having rather slow FP64, though I can't remember the specific ratio, but it's a fact that Vega 20 GPUs can have 1:2 FP64 performance, so if Apple has permission to unlock that capability it would be good. I doubt it, though, since it's not listed on AMD's site under the Vega II, and I think only Apple has the GPU. The equivalent Instinct card does have FP64 enabled, though.
But for FP32, the Vega II is actually slightly faster than the equivalent Instinct card, owing to slightly higher clocks and an additional 4 CUs - the Instinct is similar to the Radeon VII in having 4 CUs disabled (though I think there was also a variant of the Instinct with all 64 CUs, but it's basically been limited stock).
All in all, it depends what computations you're doing, but if we're talking traditional FP32, the Vega II will hold its own against any Instinct card.
On an entirely different and unrelated strand of thought: quite a while ago now, AMD also made the Radeon SSG (Solid State Graphics), where they put an entire SSD directly on the GPU so that data could go straight from the drive into VRAM without having to go through the processor. I believe that if you work with huge GPU datasets - like hundreds and hundreds of gigabytes that need to be processed on a GPU - that card is still one of the fastest options out there, even though the GPU itself isn't as fast as most of Nvidia's, or even newer AMD, offerings. While an SSD is of course many times slower than VRAM or system memory, once the dataset becomes large enough, being able to skip the trip through the CPU helps.
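To make that data path concrete, here's a rough CUDA sketch (my own illustration, not SSG code - the file name and chunk size are made up) of the conventional route: read from the drive into a host buffer, then copy from system memory into VRAM. The SSG-style design cuts out that middle staging step.

```cpp
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    const size_t chunkBytes = 256UL * 1024 * 1024;  // process a 256 MB chunk at a time (illustrative size)

    // Conventional path: drive -> host RAM -> VRAM, with the CPU in the middle.
    FILE *f = fopen("huge_dataset.bin", "rb");      // hypothetical dataset file
    if (!f) { perror("fopen"); return 1; }

    void *hostBuf = nullptr;
    cudaMallocHost(&hostBuf, chunkBytes);           // pinned host staging buffer

    void *devBuf = nullptr;
    cudaMalloc(&devBuf, chunkBytes);                // destination in VRAM

    size_t got;
    while ((got = fread(hostBuf, 1, chunkBytes, f)) > 0) {
        // This is the step the SSG-style design removes: every byte is
        // staged through system memory before it ever reaches the GPU.
        cudaMemcpy(devBuf, hostBuf, got, cudaMemcpyHostToDevice);
        // ... launch kernels on devBuf here ...
    }

    fclose(f);
    cudaFree(devBuf);
    cudaFreeHost(hostBuf);
    return 0;
}
```

With the SSG (and later with things like Nvidia's GPUDirect Storage), the fread/cudaMemcpy pair effectively becomes a direct drive-to-VRAM transfer, which is why the host round trip stops being the bottleneck once datasets get huge.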
I think you guys are missing the point; it's not about CUDA itself. It's all the software and support that NVIDIA offers. AMD is more than a decade behind and they're never going to catch up. Let's say I want to do research on autonomous driving: I simply throw $1 million (cheaper with the academic discount) at NVIDIA and I get a full system - simulation, real sensor models, hardware to connect real sensors... I can start within hours. Want to do genetics? Physical simulation? Astrophysics? Climate modelling? <insert anything here>? Same thing: place the order for hardware or download the software package of your choice and just start. Having trouble getting things to work or porting them to GPUs? Visit an NVIDIA supercomputer center for free, bring your students, and NVIDIA will help. Need more power? NVIDIA, Dell, Lenovo and others will happily supply GPU clusters. 500, 1,000 and more GPUs, no problem, including same-day service.
And AMD? (imagine chirping cricket sound here) Nothing! They don't have the software, they don't have the support, they can't supply the hardware in numbers and they don't have the cluster solutions that NVIDIA has. And no, porting software and maintaining it yourself is not an option; it's way too time-consuming. Researchers want to get work done, so it's up to AMD to supply the necessary tools, both hardware and software.
I bought a Radeon VII for my MBP, just to play around a little here and there. But the real work? Done on a Titan RTX for playing around, an RTX 8000 in the workstation under the desk for digging a little deeper, and V100s in the server cluster for real number crunching. AMD doesn't even try; otherwise they'd hire a few thousand people and put a few billion dollars into this. That ship has sailed.
You are absolutely right. Nvidia has a great stack of software on top of CUDA, like Drive and everything packaged under CUDA-X. It's similar to Apple's MPS (Metal Performance Shaders) - essentially a library of already-written functionality that works with the GPU - but of course much broader and more extensive.
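As a small illustration of what "already-written functionality" means in practice, here's a rough sketch (my own, with arbitrary sizes and no error checking) of using cuBLAS - one of the CUDA-X libraries - for a single-precision matrix multiply instead of writing and tuning a GEMM kernel yourself:

```cpp
#include <vector>
#include <cublas_v2.h>
#include <cuda_runtime.h>

int main() {
    const int n = 1024;                        // arbitrary square-matrix size for illustration
    const size_t bytes = n * n * sizeof(float);

    // Host matrices with dummy contents.
    std::vector<float> hA(n * n, 1.0f), hB(n * n, 2.0f), hC(n * n, 0.0f);

    float *dA, *dB, *dC;
    cudaMalloc((void **)&dA, bytes);
    cudaMalloc((void **)&dB, bytes);
    cudaMalloc((void **)&dC, bytes);
    cudaMemcpy(dA, hA.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, hB.data(), bytes, cudaMemcpyHostToDevice);

    // The library call replaces what would otherwise be a hand-tuned GEMM kernel.
    cublasHandle_t handle;
    cublasCreate(&handle);
    const float alpha = 1.0f, beta = 0.0f;
    cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N,
                n, n, n, &alpha, dA, n, dB, n, &beta, dC, n);

    cudaMemcpy(hC.data(), dC, bytes, cudaMemcpyDeviceToHost);

    cublasDestroy(handle);
    cudaFree(dA); cudaFree(dB); cudaFree(dC);
    return 0;
}
```

That convenience is exactly the pull: you get tuned GPU routines for free, but every line of it ties you a little more firmly to Nvidia hardware.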
And if you're researching fluid dynamics or doing genetic modelling or something, go for whatever's easiest to get working fast - I get that entirely.
What I'm talking about is at a much smaller scale than that, but still in academia. For example, one of my ex-TAs is currently doing a Ph.D. project on a way of, sort of, tricking GPUs into doing MIMD instead of just SIMD; in other words, not just performing one instruction on a whole bunch of data across a GPU core, but performing several independent instructions on each dataset, by moving the instruction pointer around during execution. It's a bit beyond me at this point and there's no working code yet that fully does it, only theory, but I've asked her to send me sample code when it's working. In any case, it's research on the nature of GPUs. It doesn't require a lot of horsepower, or a massive cluster. The university paid for a laptop for her with an Nvidia GPU in it, and it was all but demanded that the research be conducted with CUDA, because "that's the academic standard, and anybody reading the paper will expect CUDA and won't care to understand OpenCL or anything else - make it work in CUDA." That's what I'm opposed to: the rigidity in the academic community, of which I am a part. There may also be many projects that are performance-sensitive but not large-budget or huge, where you could get much more for the available resources by going with an open standard and an AMD-based solution, but "academic tradition" forces CUDA on you.
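To be clear, none of this is her code (there isn't any yet) - just a standard little illustration of the SIMT baseline that kind of research pushes against: all threads in a warp share one instruction pointer, so a divergent branch like the one below gets serialized rather than genuinely running two instruction streams at once.

```cpp
#include <cuda_runtime.h>

// Standard SIMT behaviour: the 32 threads of a warp share one instruction
// pointer, so when the branch below diverges, the two paths execute one
// after the other instead of truly in parallel.
__global__ void divergent(const int *in, int *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    if (in[i] % 2 == 0) {
        out[i] = in[i] * 3;      // taken by the even-element threads
    } else {
        out[i] = in[i] + 100;    // taken by the odd-element threads, serialized after the first path
    }
}

int main() {
    const int n = 1 << 20;                   // arbitrary problem size for illustration
    const size_t bytes = n * sizeof(int);

    int *dIn, *dOut;
    cudaMalloc((void **)&dIn, bytes);
    cudaMalloc((void **)&dOut, bytes);
    cudaMemset(dIn, 0, bytes);               // contents don't matter for the illustration

    divergent<<<(n + 255) / 256, 256>>>(dIn, dOut, n);
    cudaDeviceSynchronize();

    cudaFree(dIn);
    cudaFree(dOut);
    return 0;
}
```

Getting each thread to follow its own instruction stream, rather than paying that serialization cost, is the kind of thing her project is about - and it's GPU-architecture research, not something that inherently needs CUDA over any other interface.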
CUDA locks people in. Use it when it's appropriate, but I don't like how it's gotten the status of "industry standard" in so many circles, to the point where other tools aren't even considered.
EDIT:
Could you imagine a world in which all universities needed to do everything in C# and no project would ever be allowed to be written in C++, Java, Python, whatever, because some large infrastructure existed around C#?
Again, when you need the scale that Nvidia can offer you, take advantage of it. When you don't, at least consider your options. That's all I ask for.