NVIDIA RTX or bust...

Asgorath · Sep 26, 2018

cube said:
Miners prefer AMD GPUs precisely because of compute performance and price.

Quite often one sees that a comparable AMD card offers about double the FP64 power in the consumer space.

Miners prefer custom ASICs for both Bitcoin and Ethereum mining now, though I didn't realize the Mac Pro forum on MacRumors was a place that focused on cryptocurrency mining.

NVIDIA targets FP64 performance with their TITAN cards. A quick Google search reveals:

https://www.pcper.com/reviews/Graphics-Cards/NVIDIA-TITAN-V-Review-Part-2-Compute-Performance

which lists FP64 performance of the TITAN V at 7.45 TFLOPs (boost clocks) vs 0.85 TFLOPs for the Vega 64. I'm not quite sure how you can claim the Vega cards are better at FP64, anyone who is doing serious FP64 work would just buy a TITAN card (since FP64 workloads are typically in the profesional/prosumer space, not consumer space).

Again, still not sure how you can claim the Turing cards have "very low compute performance" as a general blanket statement.

cube · Sep 26, 2018

Asgorath said:
Miners prefer custom ASICs for both Bitcoin and Ethereum mining now, though I didn't realize the Mac Pro forum on MacRumors was a place that focused on cryptocurrency mining.

NVIDIA targets FP64 performance with their TITAN cards. A quick Google search reveals:

https://www.pcper.com/reviews/Graphics-Cards/NVIDIA-TITAN-V-Review-Part-2-Compute-Performance

which lists FP64 performance of the TITAN V at 7.45 TFLOPs (boost clocks) vs 0.85 TFLOPs for the Vega 64. I'm not quite sure how you can claim the Vega cards are better at FP64, anyone who is doing serious FP64 work would just buy a TITAN card (since FP64 workloads are typically in the profesional/prosumer space, not consumer space).

Again, still not sure how you can claim the Turing cards have "very low compute performance" as a general blanket statement.

Mining is an example, anybody seriously looking for FP64 performance in a consumer card will consider AMD.

There's no AMD equivalent to Titan cards usually (maybe Frontier was at some point, not consumer).

The RTX cards offer puny FP64 power compared to Vega.

GrumpyCoder · Sep 26, 2018

Asgorath said:
NVIDIA targets FP64 performance with their TITAN cards..

Some mining application require SHA256 and therein is the advantage of AMD GPUs which offer better integer performance which in turn is required by the 32 bit right rotation operation required for some algorithms. AMD used be lightyears ahead of NVIDIA on this, but NVIDIA got better as well with their funnel shifter. It's been a while since I looked at this stuff, so you'd have to run some benchmarks to get exact numbers on current generation cards. It's probably the reason why AMD GPUs are still more widely used by miners these days (depends on what you're mining of course).

Asgorath · Sep 26, 2018

cube said:
Mining is an example, anybody seriously looking for FP64 performance in a consumer card will consider AMD.

There's no AMD equivalent to Titan cards usually (maybe Frontier was at some point, not consumer).

The RTX cards offer puny FP64 power compared to Vega.

My point was that the number of people who are seriously interested in FP64 performance on "consumer" cards is vanishingly small. If you are really serious about FP64 performance, then an 8.75x improvement (7.45 vs 0.85) in performance for a 5x increase in price ($3k vs $600) is a no-brainer.

There's a pretty simple reason why NVIDIA's consumer cards don't have FP64 -- no consumer workload uses FP64, so it's just wasted die space to put a ton of FP64 horsepower that will be sitting idle while you're playing a game or doing other consumer workloads.

koyoot · Sep 26, 2018

GrumpyCoder said:
I've said this before, in some cases AMD is faster than NVIDIA, but let's put them roughly on the same level for some professional applications. The problem that remains is, if I have to choose between two similar performing cards, but one is giving me much better gaming performance, which one would I choose?

You have a GPU that costs 700$ and offers 100% reference point of compute performance, and 100% graphics performance. Second GPU has 110% of compute performance, and 80% gaming performance, but costs 500$.

Which one do you pick?

GrumpyCoder said:
And here's another problem. How are we going to write those nice OpenCL programs with OpenCL being deprecated in Mojave? I can't get around CUDA in my research area, some stuff works with OpenCL, but not everything. Metal 2 only is going to be problematic. Again, for some stuff this might work, but how do I scale it to clusters for real number crunching? I guess they have to bring back the Xserve with major GPU support then.

Metal has OpenCL inside itself. It combines OpenCL for compute and OpenGL for graphcis in one command.

And please guys, leave this AMD vs Nvidia circle jerk. Its only your fault that you locked yourselves to CUDA applications. If you are locked to CUDA - there is plenty of options for you - and if you need UNIX ecosystem - Linux is for you.

Asgorath said:
My point was that the number of people who are seriously interested in FP64 performance on "consumer" cards is vanishingly small. If you are really serious about FP64 performance, then an 8.75x improvement (7.45 vs 0.85) in performance for a 5x increase in price ($3k vs $600) is a no-brainer.

There's a pretty simple reason why NVIDIA's consumer cards don't have FP64 -- no consumer workload uses FP64, so it's just wasted die space to put a ton of FP64 horsepower that will be sitting idle while you're playing a game or doing other consumer workloads.

No consumer wokload uses FP64 with Nvidia GPUs

.

All of Hawaii cards had full FP64 performance, regardless whether they were consumer, or professional GPUs. It was up to you to decide whether your optimization will use those cores, or not.

Hooray for open standards!

goMac · Sep 26, 2018

koyoot said:
Metal has OpenCL inside itself. It combines OpenCL for compute and OpenGL for graphcis in one command.

Metal has a compute language, but it's not OpenCL.

(Metal's graphics layer is also very much not the same thing as OpenGL. OpenCL is at least a little closer to Metal's compute language.)

cube · Sep 26, 2018

Asgorath said:
My point was that the number of people who are seriously interested in FP64 performance on "consumer" cards is vanishingly small. If you are really serious about FP64 performance, then an 8.75x improvement (7.45 vs 0.85) in performance for a 5x increase in price ($3k vs $600) is a no-brainer.

There's a pretty simple reason why NVIDIA's consumer cards don't have FP64 -- no consumer workload uses FP64, so it's just wasted die space to put a ton of FP64 horsepower that will be sitting idle while you're playing a game or doing other consumer workloads.

Not every professional can afford a $3K card, that's the point.

Asgorath · Sep 26, 2018

cube said:
Not every professional can afford a $3K card, that's the point.

So what applications that use a lot of FP64 math are these people using?

cube · Sep 26, 2018

Asgorath said:
So what applications that use a lot of FP64 math are these people using?

It could be custom programs.

droog · Sep 26, 2018

RTX smells like Geforce3. Price hike with tech nobody is ready for. Historically, this is when AMD makes strides with price/perf.

koyoot · Sep 26, 2018

droog said:
RTX smells like Geforce3. Price hike with tech nobody is ready for. Historically, this is when AMD makes strides with price/perf.

Price hike is because of die size hike. You are paying 500$ for RTX 2070, which has almost the same die size as 699$ GTX 1080 Ti had.

The other side of this coin is that Pascal is more efficient than Turing in terms of performance/mm2 and in performance/dollar.

Also RTX is nothing new. It is proprietary way for Nvidia to lock more people to CUDA ecosystem, on professional side of things if you are interested in RT. You could do the same thing on ANY GPU for past two years with AMD ProRender Engine implemented in your applications.

But who cared about it at that time? When Nvidia starts to offer something like this it gets enough tracition, even if it essentially inferior product. Because CUDA RT will not work anywhere else than Nvidia GPUs. ProRender works on everything.

The other side of this coin is that Navi will have most likely dedicated hardware(acceleration) for this, but this still remains to be confirmed. And don't expect from Navi to be large die. It will be very small dies, compared to Turing. Big Navi is coming in 2020.

droog · Sep 26, 2018

koyoot said:
Price hike is because of die size hike. You are paying 500$ for RTX 2070, which has almost the same die size as 699$ GTX 1080 Ti had.

The other side of this coin is that Pascal is more efficient than Turing in terms of performance/mm2 and in performance/dollar.

Also RTX is nothing new. It is proprietary way for Nvidia to lock more people to CUDA ecosystem, on professional side of things if you are interested in RT. You could do the same thing on ANY GPU for past two years with AMD ProRender Engine implemented in your applications.

But who cared about it at that time? When Nvidia starts to offer something like this it gets enough tracition, even if it essentially inferior product. Because CUDA RT will not work anywhere else than Nvidia GPUs. ProRender works on everything.

The other side of this coin is that Navi will have most likely dedicated hardware(acceleration) for this, but this still remains to be confirmed. And don't expect from Navi to be large die. It will be very small dies, compared to Turing. Big Navi is coming in 2020.

It really doesn't make a difference for me. I can only afford consumer cards.

Asgorath · Sep 26, 2018

cube said:
It could be custom programs.

Okay, so you're claiming "very low compute performance" on a custom program that requires FP64 for a customer that doesn't want to buy a TITAN card. Fair enough then, such a customer should go and buy a Vega GPU, sure.

If you have specific comparisons of where Turing is significanly worse than Vega, then it's worth highlighting those with as much detail as possible, rather than just making blanket statements like "very low compute performance". You said yourself in the next post that the 2080 Ti's FP32 performance is higher than Vega 64, so it's hard to follow what the usage case you're specifically interested in actually is and why a Turing GPU wouldn't be a good fit for it.

Having said that, it's hard to recommend buying an NVIDIA GPU for macOS at all these days, so this whole discussion is somewhat moot.
[doublepost=1538002489][/doublepost]

droog said:
It really doesn't make a difference for me. I can only afford consumer cards.

And as always, you should buy the GPU you can afford that runs the programs you care about the best. If you're one of the folks who wants good FP64 performance at a $500 price point, great, go and buy a Vega card.

cube · Sep 26, 2018

Asgorath said:
Okay, so you're claiming "very low compute performance" on a custom program that requires FP64 for a customer that doesn't want to buy a TITAN card. Fair enough then, such a customer should go and buy a Vega GPU, sure.

If you have specific comparisons of where Turing is significanly worse than Vega, then it's worth highlighting those with as much detail as possible, rather than just making blanket statements like "very low compute performance". You said yourself in the next post that the 2080 Ti's FP32 performance is higher than Vega 64, so it's hard to follow what the usage case you're specifically interested in actually is and why a Turing GPU wouldn't be a good fit for it.

Having said that, it's hard to recommend buying an NVIDIA GPU for macOS at all these days, so this whole discussion is somewhat moot.

But the 2080 Ti is very expensive, so you could buy a Vega 64 instead of a 2080 for more FP32 power, besides more than double FP64.

koyoot · Sep 26, 2018

cube said:
But the 2080 Ti is very expensive, so you could buy a Vega 64 instead of a 2080 for more FP32 power, besides more than double FP64.

For a price of single 2080 Ti, you can buy two Vega 64's. And three Vega 56's.

cube · Sep 26, 2018

koyoot said:
For a price of single 2080 Ti, you can buy two Vega 64's. And three Vega 56's.

If people are going to put 2,3,4 GPUs in a PC for compute, I guess AMD better bring back Crossfire for graphics.

Pressure · Sep 27, 2018

cube said:
If people are going to put 2,3,4 GPUs in a PC for compute, I guess AMD better bring back Crossfire for graphics.

Professional apps can use any and all compute resources available (CPUs and GPUs). No need for Crossfire or anything exotic like that.

People already put 4 GPUs in their workstations.

cube · Sep 27, 2018

Pressure said:
Professional apps can use any and all compute resources available (CPUs and GPUs). No need for Crossfire or anything exotic like that.

People already put 4 GPUs in their workstations.

I said graphics. Professional programs are more likely to use OpenGL than Vulkan or DX12.

Pressure · Sep 27, 2018

cube said:
I said graphics. Professional programs are more likely to use OpenGL than Vulkan or DX12.

When you say graphics, do you mean games? Otherwise I do not understand your point. ProRender can leverage all available resources, mixed together, graphic cards and processors. Neither really uses OpenGL (deprecated now that Vulkan is here), it's either OpenCL or CUDA.

cube · Sep 27, 2018

Pressure said:
When you say graphics, do you mean games? Otherwise I do not understand your point. ProRender can leverage all available resources, mixed together, graphic cards and processors. Neither really uses OpenGL (deprecated now that Vulkan is here), it's either OpenCL or CUDA.

No, I mean scientific and engineering programs. AAA games can use Vulkan or DX12.

bsbeamer · Sep 27, 2018

Used to work with a client that had Mac Pro's with external PCIe expanders that utilized MANY GPUs for CUDA processing. Believe they had a rig of at least 10 identical GPUs on one of their systems at their facility. This was not exactly common at the time, but was possible.

Another client had a franken-rig of PCIe ribbon cables and bakers rack style wire shelving powering multiple GPUs for CUDA processing.

All of that being said, what type of processing are you looking for? Metal, CUDA, OpenCL, OpenGL, something else?

GrumpyCoder · Sep 27, 2018

koyoot said:
You have a GPU that costs 700$ and offers 100% reference point of compute performance, and 100% graphics performance. Second GPU has 110% of compute performance, and 80% gaming performance, but costs 500$.

I think your numbers are off. And the prices as well. Until recently (6 months ago) a Vega 64 was in the €900+ price range (it's less than €500 today). In the same time frame a 1080Ti did cost the same (~€900 to less than €700 today). Probably thanks for the mining hype. I think AMD has a very interesting option in the +11GB range. When your market is machine learning and image processing in particular, it's not always about best compute power but memory. You can get their 32GB cards for a lot less than what NVIDIA is charging with their Quadro cards. This is fine for trying a few things on a local machine. At some point a cluster is needed and than AMD is not really an option.

koyoot said:
Metal has OpenCL inside itself. It combines OpenCL for compute and OpenGL for graphcis in one command.

Not sure I understand what you're saying. Are you saying the principle of Metal and OpenCL (parallel computation) is the same? Or are you saying you can actually run OpenCL code using Metal?

I agree on the first, but the same could be said for OpenGL and DirectX, which share the same principle (e.g. using shaders) but are not compatible. However, you can not simply run OpenCL code using Metal. It would be cool if you could, but whatever you have in OpenCL it has to be properly ported to Metal API and that's time consuming and kind of like reinventing the wheel over and over again. If you release a software product, that's probably fine. But if you need to use libraries and tools that are not your own, good luck. This is particularly annoying in the scientific community, trying to reproduce research results. And the gold standard here is CUDA, thanks to NVIDIAs massiv presence in HPC. AMD is nowhere to be found.

bsbeamer said:
All of that being said, what type of processing are you looking for? Metal, CUDA, OpenCL, OpenGL, something else?

In a perfect world, all of it. Metal is nice when doing native development for macOS or iOS. CUDA is a must have for serious scientific work. OpenCL comes in handy if you port your work to single board computers like ODROID-XU4 (with ARM CPU) and OpenGL is great for anything not Windows-native. Those times are over I guess. I don't blame Apple for making the step to Metal2, it's probably the best option for them and a closed eco system. But therein lies the problem, for anything else but a platform specific application, it makes things much harder.

bsbeamer · Sep 27, 2018

GrumpyCoder said:
In a perfect world, all of it. Metal is nice when doing native development for macOS or iOS. CUDA is a must have for serious scientific work. OpenCL comes in handy if you port your work to single board computers like ODROID-XU4 (with ARM CPU) and OpenGL is great for anything not Windows-native. Those times are over I guess. I don't blame Apple for making the step to Metal2, it's probably the best option for them and a closed eco system. But therein lies the problem, for anything else but a platform specific application, it makes things much harder.

NVIDIA cards are the only way to use CUDA. If it's needed, you'll need NVIDIA. Would suggest waiting to see if/when NVIDIA Web Drivers are available for Mojave. That should give us a little more of an indication about the future of NVIDIA cards on Mac. I'm at a similar crossroads right now with video...

NVIDIA added VOLTA drivers in the latest Web Drivers, but that version was pulled after release. Future additions and drivers seem likely, but nothing has been confirmed. There will always be a delay for NVIDIA additions to the Mac side. If you need to live on the bleeding edge with latest GPUs when they are released, best to move to another OS/platform.

(FYI, some users had install issues with the .108 driver and NVIDIA pulled it. It is working great for me with GTX 1080 FE on 10.13.6.)

koyoot · Sep 27, 2018

GrumpyCoder said:
I think your numbers are off. And the prices as well. Until recently (6 months ago) a Vega 64 was in the €900+ price range (it's less than €500 today). In the same time frame a 1080Ti did cost the same (~€900 to less than €700 today). Probably thanks for the mining hype. I think AMD has a very interesting option in the +11GB range. When your market is machine learning and image processing in particular, it's not always about best compute power but memory. You can get their 32GB cards for a lot less than what NVIDIA is charging with their Quadro cards. This is fine for trying a few things on a local machine. At some point a cluster is needed and than AMD is not really an option.

MSRP vs MSRP pricing compared. And no. My numbers are not off.

About the last part: moving the goalpost, eh?

Funnier even: AMD Vega does not need 32 GB of RAM because it has HBCC which helps with ginormous data sets. And for the same thing Nvidia requires you to have 32 GB GPU, for which you pay more, AMD does the same thing with 16 GB frame buffer. If Vega has a lot of meme tech inside it, HBCC actually WORKS. And works very well.

But who the **** cares about it, right?

GrumpyCoder said:
Not sure I understand what you're saying. Are you saying the principle of Metal and OpenCL (parallel computation) is the same? Or are you saying you can actually run OpenCL code using Metal?

I agree on the first, but the same could be said for OpenGL and DirectX, which share the same principle (e.g. using shaders) but are not compatible. However, you can not simply run OpenCL code using Metal. It would be cool if you could, but whatever you have in OpenCL it has to be properly ported to Metal API and that's time consuming and kind of like reinventing the wheel over and over again. If you release a software product, that's probably fine. But if you need to use libraries and tools that are not your own, good luck. This is particularly annoying in the scientific community, trying to reproduce research results. And the gold standard here is CUDA, thanks to NVIDIAs massiv presence in HPC. AMD is nowhere to be found.

Have you actually ported anything from OpenCL to Metal, or what you have written is your opinion based on your assumption that that has to be the case?

No, you don't have to port your application from OpenCL to Metal per-se. Metal is very close to OpenCL in its philosophy and the code OpenCL code can be executed in Metal easily.

CUDA is gold standard because it was the first implementation of GPUs for compute. There is no gold standard Here. AMD's ROCm platform is great, and approaches full OpenCL 2.0 certification, with ROCm 1.9, on Linux Platform. Nvidia by not opening the platform did great for themselves, but f****** up the whole industry, in essence. Mindshare is too strong, that is why people oppose any changes, people cannot even COMPREHEND that there can be a better way than CUDA.

In Pre-Volta architectures, Nvidia had software advantage over everybody in the market. GCN in compute was most advanced but was dragged down by lack of software, lack of libraries, etc. Right now software is catching up, but the hardware is the other factor. IMO Volta is for compute best GPU there is. HOWEVER - it is WAY too expensive. Single GPU costs at least 3000$, and that is 6 times more than single Vega 64. And you can get 90-95% in machine learning performance on Vega 64, with proper drivers, and proper software, for 1/6th of a price of cheapest Volta.

GrumpyCoder · Sep 27, 2018

koyoot said:
And no. My numbers are not off.

Funny, some benchmarks, especially in gaming disagree with you and put the gaming performance around 50% in comparison with a 1080Ti. But that also depends on the game of course and where the bottle neck is.

koyoot said:
About the last part: moving the goalpost, eh?

Not at all. It's actually what a lot of people are looking for in a local machine. For the serious number crunching a big cluster is needed anyway. Neither AMD or NVIDIA have single card that does it all.

koyoot said:
Funnier even: AMD Vega does not need 32 GB of RAM because it has HBCC which helps with ginormous data sets.

Are we back at the RAM doubler days we had with the G3 and G4? I can't even believe we're discussing this. When your dataset you're currently working on is over 30GB then the one thing you need is memory. Sure, you can make smaller and more batches. There are advantages and disadvantages doing this, also when you update your weights and how. It's a mood point discussing this here as it's a current hot research topic with plenty of papers published and also a lot of unsolved problems, especially when it comes to uncertainties in bayesian nets.

koyoot said:
Have you actually ported anything from OpenCL to Metal, or what you have written is your opinion based on your assumption that that has to be the case?

I have, my research group has, my students during their regular courses and thesis' have and researchers around the world I'm in contact with have. But thanks for asking.

koyoot said:
No, you don't have to port your application from OpenCL to Metal per-se. Metal is very close to OpenCL in its philosophy and the code OpenCL code can be executed in Metal easily.

So what your saying is, I can download any arbitrary code from a GitHub repository, let's say written in C++/OpenCL, push a button on a Mac and it just runs using Metal? No touch up, no code changes required? This would be the holy grail for reproducing results from other research groups (if they're using OpenCL which is unlikely). Sadly most of the time it doesn't even work with the same libraries. We've had our share of trouble running stuff from Google using Keras+Tensorflow and when we tried to run some stuff using C++/OpenCV/Tensorflow which worked flawlessly on Intel/NVIDIA it became a massive problem running it on a Jetson board. Solving these problems is wasting time no researcher or student has, especially if you have to publish x papers per year.

koyoot said:
There is no gold standard Here.

Have you set foot in a university or research center in the past couple of years? How many clusters running AMD cards have you seen? Where's the service from AMD that NVIDIA offers? I get regular invites from NVIDIA to bring my students to their research/compute centers to use their resources and they'll even help doing it. For free. When we buy compute clusters, they're there to help (a lot). When we need small boards for autonomous drone projects, they slice 50% off their Jetson boards for education. I'd say there is a gold standard, one that AMD does not offer. I wish they would, but going with AMD instead of NVIDIA in education and research is pretty much suicide. You can do both if you want, but you NEED NVIDIA.

koyoot said:
Nvidia by not opening the platform did great for themselves, but f****** up the whole industry, in essence.

Oh I agree, they should not have done that. In a perfect world CUDA would be available for AMD cards.

koyoot said:
Mindshare is too strong, that is why people oppose any changes, people cannot even COMPREHEND that there can be a better way than CUDA.

Leaving performance aside, it doesn't matter what's better or not. What matters is what people use and in my field, it's just not 100% possible to get around CUDA unless you want to reinvent the wheel over and over again and waste a lot of time. If I'd be in the business to develop an application from scratch and sell it, that's another story.

koyoot said:
Single GPU costs at least 3000$, and that is 6 times more than single Vega 64.

Oh I agree it is too expensive. It's cheaper to rent a VM in the cloud than to buy. The problem is, once the prototyping and test runs are done, you need to move to a cluster because a single card isn't enough. That's why it's called Big Data which runs on clusters, see above. And again, most of the work out there is done with CUDA. You'd be surprised how many researchers there are prototyping in Matlab and bring it to CUDA with the help of the Parallel Computation Toolbox. Similar attempts have been made for OpenCL and it's pretty much dead. Doing it for MPI from Matlab works better and is more wide spread that OpenCL. I'd happily switch to OpenCL (in fact I've tried years ago) or Metal2. The problem is, the rest of the world would have to do the same and that's just not going to happen anytime soon.

NVIDIA RTX or bust...

macrumors 68000

Suspended

macrumors 68020

macrumors 68000

macrumors 603

macrumors 604

Suspended

macrumors 68000

Suspended

Suspended

macrumors 603

Suspended

macrumors 68000

Suspended

macrumors 603

Suspended

macrumors 603

Suspended

macrumors 603

Suspended

macrumors 601

macrumors 68020

macrumors 601

macrumors 603

macrumors 68020

Our Staff