Miners prefer AMD GPUs precisely because of compute performance and price.

Quite often one sees that a comparable AMD card offers about double the FP64 power in the consumer space.

Miners prefer custom ASICs for both Bitcoin and Ethereum mining now, though I didn't realize the Mac Pro forum on MacRumors was a place that focused on cryptocurrency mining.

NVIDIA targets FP64 performance with their TITAN cards. A quick Google search reveals:

https://www.pcper.com/reviews/Graphics-Cards/NVIDIA-TITAN-V-Review-Part-2-Compute-Performance

which lists the FP64 performance of the TITAN V at 7.45 TFLOPs (boost clocks) vs 0.85 TFLOPs for the Vega 64. I'm not quite sure how you can claim the Vega cards are better at FP64; anyone doing serious FP64 work would just buy a TITAN card (since FP64 workloads are typically in the professional/prosumer space, not the consumer space).

Again, still not sure how you can claim the Turing cards have "very low compute performance" as a general blanket statement.
 
Mining is an example; anybody seriously looking for FP64 performance in a consumer card will consider AMD.

There's usually no AMD equivalent to the Titan cards (maybe the Frontier Edition was at some point, and that's not really consumer).

The RTX cards offer puny FP64 power compared to Vega.
 
NVIDIA targets FP64 performance with their TITAN cards.
Some mining applications require SHA-256, and therein lies the advantage of AMD GPUs, which offer better integer performance; in particular, the 32-bit right-rotate operation is required by some of these algorithms. AMD used to be light-years ahead of NVIDIA on this, but NVIDIA got better as well with their funnel shifter. It's been a while since I looked at this stuff, so you'd have to run some benchmarks to get exact numbers on current-generation cards. It's probably the reason why AMD GPUs are still more widely used by miners these days (depending on what you're mining, of course).
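Roughly, this is the primitive in question: SHA-256's sigma functions are built from 32-bit right rotations, so hardware that can do a rotate (or funnel shift) in one instruction has an edge. A minimal C++ sketch, illustrative only and nothing like a tuned miner:

[CODE]
#include <cstdint>

// 32-bit rotate right. On GPUs with a rotate or funnel-shift unit this is a
// single instruction (CUDA exposes __funnelshift_r(), for example); otherwise
// it costs two shifts plus an OR, which is where integer-throughput
// differences between architectures show up in SHA-256-style workloads.
static inline uint32_t rotr32(uint32_t x, unsigned n) {
    return (x >> n) | (x << (32u - n));
}

// One of SHA-256's message-schedule functions, just to show where the rotates go.
static inline uint32_t sigma0(uint32_t x) {
    return rotr32(x, 7) ^ rotr32(x, 18) ^ (x >> 3);
}
[/CODE]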
 
Mining is an example; anybody seriously looking for FP64 performance in a consumer card will consider AMD.

There's usually no AMD equivalent to the Titan cards (maybe the Frontier Edition was at some point, and that's not really consumer).

The RTX cards offer puny FP64 power compared to Vega.

My point was that the number of people who are seriously interested in FP64 performance on "consumer" cards is vanishingly small. If you are really serious about FP64 performance, then an 8.75x improvement (7.45 vs 0.85) in performance for a 5x increase in price ($3k vs $600) is a no-brainer.

There's a pretty simple reason why NVIDIA's consumer cards don't have FP64 -- no consumer workload uses FP64, so it's just wasted die space to put a ton of FP64 horsepower that will be sitting idle while you're playing a game or doing other consumer workloads.
 
I've said this before: in some cases AMD is faster than NVIDIA, but let's put them roughly on the same level for some professional applications. The problem that remains is: if I have to choose between two similarly performing cards, but one gives me much better gaming performance, which one would I choose?
You have a GPU that costs $700 and serves as the reference point: 100% compute performance and 100% graphics performance. A second GPU offers 110% of the compute performance and 80% of the gaming performance, but costs $500.

Which one do you pick?
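Taking those hypothetical numbers at face value, the value-for-money arithmetic is easy to spell out (a throwaway C++ sketch, nothing more):

[CODE]
#include <cstdio>

int main() {
    // Hypothetical cards from the scenario above.
    const double priceA = 700, computeA = 100, graphicsA = 100;  // reference card
    const double priceB = 500, computeB = 110, graphicsB = 80;

    std::printf("A: %.3f compute/$ , %.3f graphics/$\n", computeA / priceA, graphicsA / priceA);
    std::printf("B: %.3f compute/$ , %.3f graphics/$\n", computeB / priceB, graphicsB / priceB);
    // Roughly 0.143/0.143 for A vs 0.220/0.160 for B: B wins both per-dollar
    // metrics, so the real question is how much the missing 20% of absolute
    // gaming performance is worth to you.
    return 0;
}
[/CODE]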
And here's another problem: how are we going to write those nice OpenCL programs with OpenCL being deprecated in Mojave? I can't get around CUDA in my research area; some stuff works with OpenCL, but not everything. Metal 2 alone is going to be problematic. Again, for some stuff this might work, but how do I scale it to clusters for real number crunching? I guess they'll have to bring back the Xserve with major GPU support then. :D
Metal has OpenCL inside itself. It combines OpenCL for compute and OpenGL for graphics in one API.

And please, guys, leave this AMD vs. Nvidia circle jerk behind. It's only your fault that you locked yourselves into CUDA applications. If you are locked into CUDA, there are plenty of options for you, and if you need a UNIX ecosystem, Linux is for you.

My point was that the number of people who are seriously interested in FP64 performance on "consumer" cards is vanishingly small. If you are really serious about FP64 performance, then an 8.75x improvement (7.45 vs 0.85) in performance for a 5x increase in price ($3k vs $600) is a no-brainer.

There's a pretty simple reason why NVIDIA's consumer cards don't have FP64 -- no consumer workload uses FP64, so it's just wasted die space to put a ton of FP64 horsepower that will be sitting idle while you're playing a game or doing other consumer workloads.
No consumer workload uses FP64 with Nvidia GPUs ;).

All Hawaii cards had full FP64 performance, regardless of whether they were consumer or professional GPUs. It was up to you to decide whether your optimization would use those cores or not.

Hooray for open standards!
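On the "up to you to use the FP64 cores" point: in OpenCL the opt-in really is just a device query plus a kernel pragma. A minimal host-side sketch (names are illustrative, error handling omitted):

[CODE]
#include <CL/cl.h>
#include <cstdio>

// Double-precision kernels must enable the cl_khr_fp64 extension themselves.
static const char* kKernelSrc = R"(
#pragma OPENCL EXTENSION cl_khr_fp64 : enable
__kernel void scale(__global double* x, const double a) {
    size_t i = get_global_id(0);
    x[i] *= a;
}
)";

// CL_DEVICE_DOUBLE_FP_CONFIG is 0 on devices without FP64 support.
static bool device_supports_fp64(cl_device_id dev) {
    cl_device_fp_config cfg = 0;
    clGetDeviceInfo(dev, CL_DEVICE_DOUBLE_FP_CONFIG, sizeof(cfg), &cfg, nullptr);
    return cfg != 0;
}

int main() {
    cl_platform_id platform;
    cl_device_id dev;
    clGetPlatformIDs(1, &platform, nullptr);
    clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &dev, nullptr);
    std::printf("FP64 supported: %s\n", device_supports_fp64(dev) ? "yes" : "no");
    // Only build/enqueue kKernelSrc when the query succeeds; otherwise fall
    // back to a float path.
    return 0;
}
[/CODE]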
 
Metal has OpenCL inside itself. It combines OpenCL for compute and OpenGL for graphics in one API.

Metal has a compute language, but it's not OpenCL.

(Metal's graphics layer is also very much not the same thing as OpenGL. OpenCL is at least a little closer to Metal's compute language.)
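To illustrate "a compute language, but not OpenCL": here is what a trivial compute kernel looks like in the Metal Shading Language (a sketch; buffer indices and names are arbitrary). It's C++-based and dispatched through Metal command buffers, so OpenCL C kernels need porting rather than a straight copy-paste.

[CODE]
// Metal Shading Language (C++14-based), not OpenCL C: different address-space
// qualifiers, attribute syntax, and dispatch model.
#include <metal_stdlib>
using namespace metal;

kernel void saxpy(device const float* x  [[buffer(0)]],
                  device float*       y  [[buffer(1)]],
                  constant float&     a  [[buffer(2)]],
                  uint                i  [[thread_position_in_grid]])
{
    // The OpenCL C equivalent would use __kernel, __global, and get_global_id(0).
    y[i] = a * x[i] + y[i];
}
[/CODE]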
 
My point was that the number of people who are seriously interested in FP64 performance on "consumer" cards is vanishingly small. If you are really serious about FP64 performance, then an 8.75x improvement (7.45 vs 0.85) in performance for a 5x increase in price ($3k vs $600) is a no-brainer.

There's a pretty simple reason why NVIDIA's consumer cards don't have FP64 -- no consumer workload uses FP64, so it's just wasted die space to put a ton of FP64 horsepower that will be sitting idle while you're playing a game or doing other consumer workloads.
Not every professional can afford a $3K card; that's the point.
 
RTX smells like the GeForce 3: a price hike with tech nobody is ready for. Historically, this is when AMD makes strides in price/performance.
 
The price hike is because of a die-size hike. You are paying $500 for an RTX 2070, which has almost the same die size as the $699 GTX 1080 Ti had.

The other side of this coin is that Pascal is more efficient than Turing in terms of performance per mm² and performance per dollar.

Also, RTX is nothing new. It is a proprietary way for Nvidia to lock more people into the CUDA ecosystem on the professional side of things, if you are interested in ray tracing. You could have done the same thing on ANY GPU for the past two years with AMD's ProRender engine implemented in your applications.

But who cared about it at that time? When Nvidia starts to offer something like this it gets enough traction, even if it is essentially an inferior product, because CUDA ray tracing will not work anywhere other than on Nvidia GPUs. ProRender works on everything.

The other side of this coin is that Navi will most likely have dedicated hardware acceleration for this, but that still remains to be confirmed. And don't expect Navi to be a large die: it will be very small dies compared to Turing. Big Navi is coming in 2020.
 
It really doesn't make a difference for me. I can only afford consumer cards.
 
It could be custom programs.

Okay, so you're claiming "very low compute performance" on a custom program that requires FP64 for a customer that doesn't want to buy a TITAN card. Fair enough then, such a customer should go and buy a Vega GPU, sure.

If you have specific comparisons where Turing is significantly worse than Vega, then it's worth highlighting those with as much detail as possible, rather than just making blanket statements like "very low compute performance". You said yourself in the next post that the 2080 Ti's FP32 performance is higher than the Vega 64's, so it's hard to follow what the use case you're specifically interested in actually is and why a Turing GPU wouldn't be a good fit for it.

Having said that, it's hard to recommend buying an NVIDIA GPU for macOS at all these days, so this whole discussion is somewhat moot.
It really doesn't make a difference for me. I can only afford consumer cards.

And as always, you should buy the GPU you can afford that runs the programs you care about the best. If you're one of the folks who wants good FP64 performance at a $500 price point, great, go and buy a Vega card.
 
Okay, so you're claiming "very low compute performance" on a custom program that requires FP64 for a customer that doesn't want to buy a TITAN card. Fair enough then, such a customer should go and buy a Vega GPU, sure.

If you have specific comparisons where Turing is significantly worse than Vega, then it's worth highlighting those with as much detail as possible, rather than just making blanket statements like "very low compute performance". You said yourself in the next post that the 2080 Ti's FP32 performance is higher than the Vega 64's, so it's hard to follow what the use case you're specifically interested in actually is and why a Turing GPU wouldn't be a good fit for it.

Having said that, it's hard to recommend buying an NVIDIA GPU for macOS at all these days, so this whole discussion is somewhat moot.
But the 2080 Ti is very expensive, so you could buy a Vega 64 instead of a 2080 for more FP32 power, on top of more than double the FP64.
 
For the price of a single 2080 Ti, you can buy two Vega 64s, or three Vega 56s.
 
If people are going to put 2, 3, or 4 GPUs in a PC for compute, I guess AMD had better bring back Crossfire for graphics.

Professional apps can use any and all compute resources available (CPUs and GPUs). No need for Crossfire or anything exotic like that.

People already put 4 GPUs in their workstations.
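For what it's worth, this is why no Crossfire-style link is needed for compute: to an API like OpenCL every card is simply another device to enumerate and feed work. A minimal host-side sketch (error handling mostly omitted):

[CODE]
#include <CL/cl.h>
#include <cstdio>
#include <vector>

int main() {
    cl_uint np = 0;
    clGetPlatformIDs(0, nullptr, &np);
    if (np == 0) return 0;
    std::vector<cl_platform_id> platforms(np);
    clGetPlatformIDs(np, platforms.data(), nullptr);

    for (cl_uint p = 0; p < np; ++p) {
        cl_uint nd = 0;
        if (clGetDeviceIDs(platforms[p], CL_DEVICE_TYPE_GPU, 0, nullptr, &nd) != CL_SUCCESS || nd == 0)
            continue;
        std::vector<cl_device_id> devices(nd);
        clGetDeviceIDs(platforms[p], CL_DEVICE_TYPE_GPU, nd, devices.data(), nullptr);

        for (cl_uint d = 0; d < nd; ++d) {
            char name[256] = {0};
            clGetDeviceInfo(devices[d], CL_DEVICE_NAME, sizeof(name), name, nullptr);
            std::printf("GPU %u on platform %u: %s\n", d, p, name);
            // In a real app, each device gets its own context/queue and a slice of the work.
        }
    }
    return 0;
}
[/CODE]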
 
I said graphics. Professional programs are more likely to use OpenGL than Vulkan or DX12.
 

When you say graphics, do you mean games? Otherwise I do not understand your point. ProRender can leverage all available resources mixed together: graphics cards and processors. Neither really uses OpenGL (deprecated now that Vulkan is here); it's either OpenCL or CUDA.
 
No, I mean scientific and engineering programs. AAA games can use Vulkan or DX12.
 
I used to work with a client that had Mac Pros with external PCIe expanders that utilized MANY GPUs for CUDA processing. I believe they had a rig of at least 10 identical GPUs on one of their systems at their facility. This was not exactly common at the time, but it was possible.

Another client had a franken-rig of PCIe ribbon cables and baker's-rack-style wire shelving powering multiple GPUs for CUDA processing.

All of that being said, what type of processing are you looking for? Metal, CUDA, OpenCL, OpenGL, something else?
 
You have a GPU that costs $700 and serves as the reference point: 100% compute performance and 100% graphics performance. A second GPU offers 110% of the compute performance and 80% of the gaming performance, but costs $500.
I think your numbers are off, and the prices as well. Until recently (6 months ago) a Vega 64 was in the €900+ price range (it's less than €500 today). In the same time frame a 1080 Ti cost about the same (~€900 then, less than €700 today), probably thanks to the mining hype. I think AMD has a very interesting option in the above-11GB range. When your market is machine learning and image processing in particular, it's not always about the best compute power but about memory. You can get their 32GB cards for a lot less than what NVIDIA is charging for their Quadro cards. This is fine for trying a few things on a local machine; at some point a cluster is needed, and then AMD is not really an option.


Metal has OpenCL inside itself. It combines OpenCL for compute and OpenGL for graphics in one API.
Not sure I understand what you're saying. Are you saying the principle of Metal and OpenCL (parallel computation) is the same? Or are you saying you can actually run OpenCL code using Metal?

I agree on the first, but the same could be said for OpenGL and DirectX, which share the same principles (e.g. using shaders) but are not compatible. However, you cannot simply run OpenCL code using Metal. It would be cool if you could, but whatever you have in OpenCL has to be properly ported to the Metal API, and that's time-consuming and kind of like reinventing the wheel over and over again. If you release a software product, that's probably fine. But if you need to use libraries and tools that are not your own, good luck. This is particularly annoying in the scientific community when trying to reproduce research results. And the gold standard here is CUDA, thanks to NVIDIA's massive presence in HPC. AMD is nowhere to be found.

All of that being said, what type of processing are you looking for? Metal, CUDA, OpenCL, OpenGL, something else?
In a perfect world, all of it. Metal is nice when doing native development for macOS or iOS. CUDA is a must-have for serious scientific work. OpenCL comes in handy if you port your work to single-board computers like the ODROID-XU4 (with an ARM CPU), and OpenGL is great for anything not Windows-native. Those times are over, I guess. I don't blame Apple for making the step to Metal 2; it's probably the best option for them and a closed ecosystem. But therein lies the problem: for anything but a platform-specific application, it makes things much harder.
 

NVIDIA cards are the only way to use CUDA. If it's needed, you'll need NVIDIA. Would suggest waiting to see if/when NVIDIA Web Drivers are available for Mojave. That should give us a little more of an indication about the future of NVIDIA cards on Mac. I'm at a similar crossroads right now with video...

NVIDIA added Volta support in the latest Web Drivers, but that version was pulled after release. Future additions and drivers seem likely, but nothing has been confirmed. There will always be a delay for NVIDIA additions on the Mac side. If you need to live on the bleeding edge with the latest GPUs when they are released, it's best to move to another OS/platform.

(FYI, some users had install issues with the .108 driver and NVIDIA pulled it. It is working great for me with GTX 1080 FE on 10.13.6.)
 
I think your numbers are off, and the prices as well. Until recently (6 months ago) a Vega 64 was in the €900+ price range (it's less than €500 today). In the same time frame a 1080 Ti cost about the same (~€900 then, less than €700 today), probably thanks to the mining hype. I think AMD has a very interesting option in the above-11GB range. When your market is machine learning and image processing in particular, it's not always about the best compute power but about memory. You can get their 32GB cards for a lot less than what NVIDIA is charging for their Quadro cards. This is fine for trying a few things on a local machine; at some point a cluster is needed, and then AMD is not really an option.
I compared MSRP vs. MSRP pricing. And no, my numbers are not off.

About the last part: moving the goalposts, eh?

Funnier still: AMD Vega does not need 32 GB of RAM because it has HBCC, which helps with ginormous data sets. Where Nvidia requires you to have a 32 GB GPU, for which you pay more, AMD does the same thing with a 16 GB frame buffer. Even if Vega has a lot of meme tech inside it, HBCC actually WORKS, and works very well.

But who the **** cares about it, right?
Not sure I understand what you're saying. Are you saying the principle of Metal and OpenCL (parallel computation) is the same? Or are you saying you can actually run OpenCL code using Metal?

I agree on the first, but the same could be said for OpenGL and DirectX, which share the same principles (e.g. using shaders) but are not compatible. However, you cannot simply run OpenCL code using Metal. It would be cool if you could, but whatever you have in OpenCL has to be properly ported to the Metal API, and that's time-consuming and kind of like reinventing the wheel over and over again. If you release a software product, that's probably fine. But if you need to use libraries and tools that are not your own, good luck. This is particularly annoying in the scientific community when trying to reproduce research results. And the gold standard here is CUDA, thanks to NVIDIA's massive presence in HPC. AMD is nowhere to be found.
Have you actually ported anything from OpenCL to Metal, or is what you have written your opinion, based on your assumption that that has to be the case?

No, you don't have to port your application from OpenCL to Metal per se. Metal is very close to OpenCL in its philosophy, and OpenCL code can be executed in Metal easily.

CUDA is the gold standard because it was the first implementation of GPU compute. There is no gold standard here. AMD's ROCm platform is great, and approaches full OpenCL 2.0 certification with ROCm 1.9 on the Linux platform. By not opening the platform, Nvidia did great for themselves but f****** up the whole industry, in essence. Mindshare is too strong; that is why people oppose any changes, and why people cannot even COMPREHEND that there can be a better way than CUDA.

In pre-Volta architectures, Nvidia had a software advantage over everybody in the market. GCN was the most advanced compute architecture but was dragged down by a lack of software, a lack of libraries, etc. Right now the software is catching up, but the hardware is the other factor. IMO Volta is the best GPU there is for compute. HOWEVER, it is WAY too expensive. A single GPU costs at least $3,000, and that is 6 times more than a single Vega 64. And you can get 90-95% of its machine-learning performance on a Vega 64, with proper drivers and proper software, for 1/6th the price of the cheapest Volta.
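As a concrete example of what the ROCm route looks like: HIP is ROCm's CUDA-style C++ dialect, and the same source can be built for either vendor's GPUs. A minimal sketch (assumes a hipcc toolchain; buffer initialization omitted):

[CODE]
#include <hip/hip_runtime.h>

// CUDA-style kernel; hipcc builds it for AMD GPUs, or for NVIDIA GPUs via the
// CUDA backend, from the same source.
__global__ void saxpy(int n, float a, const float* x, float* y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];
}

int main() {
    const int n = 1 << 20;
    float *x = nullptr, *y = nullptr;
    hipMalloc(reinterpret_cast<void**>(&x), n * sizeof(float));
    hipMalloc(reinterpret_cast<void**>(&y), n * sizeof(float));
    // (In a real program, copy input data in with hipMemcpy before launching.)

    hipLaunchKernelGGL(saxpy, dim3((n + 255) / 256), dim3(256), 0, 0, n, 2.0f, x, y);
    hipDeviceSynchronize();

    hipFree(x);
    hipFree(y);
    return 0;
}
[/CODE]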
 
And no, my numbers are not off.
Funny, some benchmarks, especially in gaming, disagree with you and put the gaming performance at around 50% in comparison with a 1080 Ti. But that also depends on the game, of course, and where the bottleneck is.

About the last part: moving the goalposts, eh?
Not at all. It's actually what a lot of people are looking for in a local machine. For serious number crunching a big cluster is needed anyway. Neither AMD nor NVIDIA has a single card that does it all.

Funnier still: AMD Vega does not need 32 GB of RAM because it has HBCC, which helps with ginormous data sets.
Are we back at the RAM Doubler days we had with the G3 and G4? I can't even believe we're discussing this. When the dataset you're currently working on is over 30 GB, the one thing you need is memory. Sure, you can use smaller and more numerous batches; there are advantages and disadvantages to doing this, also regarding when and how you update your weights. It's a moot point discussing it here, as it's a hot current research topic with plenty of papers published and also a lot of unsolved problems, especially when it comes to uncertainties in Bayesian nets.

Have you actually ported anything from OpenCL to Metal, or is what you have written your opinion, based on your assumption that that has to be the case?
I have, my research group has, my students during their regular courses and theses have, and researchers around the world I'm in contact with have. But thanks for asking.

No, you don't have to port your application from OpenCL to Metal per se. Metal is very close to OpenCL in its philosophy, and OpenCL code can be executed in Metal easily.
So what you're saying is, I can download arbitrary code from a GitHub repository, let's say written in C++/OpenCL, push a button on a Mac, and it just runs using Metal? No touch-up, no code changes required? That would be the holy grail for reproducing results from other research groups (if they're using OpenCL, which is unlikely). Sadly, most of the time it doesn't even work with the same libraries. We've had our share of trouble running stuff from Google using Keras+TensorFlow, and when we tried to run some stuff using C++/OpenCV/TensorFlow that worked flawlessly on Intel/NVIDIA, it became a massive problem on a Jetson board. Solving these problems wastes time no researcher or student has, especially if you have to publish x papers per year.

There is no gold standard here.
Have you set foot in a university or research center in the past couple of years? How many clusters running AMD cards have you seen? Where's the service from AMD that NVIDIA offers? I get regular invites from NVIDIA to bring my students to their research/compute centers to use their resources, and they'll even help us do it. For free. When we buy compute clusters, they're there to help (a lot). When we need small boards for autonomous drone projects, they slice 50% off their Jetson boards for education. I'd say there is a gold standard, one that AMD does not offer. I wish they would, but going with AMD instead of NVIDIA in education and research is pretty much suicide. You can do both if you want, but you NEED NVIDIA.

By not opening the platform, Nvidia did great for themselves but f****** up the whole industry, in essence.
Oh I agree, they should not have done that. In a perfect world CUDA would be available for AMD cards.

Mindshare is too strong; that is why people oppose any changes, and why people cannot even COMPREHEND that there can be a better way than CUDA.
Leaving performance aside, it doesn't matter what's better or not. What matters is what people use, and in my field it's just not 100% possible to get around CUDA unless you want to reinvent the wheel over and over again and waste a lot of time. If I were in the business of developing an application from scratch and selling it, that would be another story.

A single GPU costs at least $3,000, and that is 6 times more than a single Vega 64.
Oh, I agree it is too expensive; it's cheaper to rent a VM in the cloud than to buy. The problem is, once the prototyping and test runs are done, you need to move to a cluster because a single card isn't enough. That's why it's called Big Data, which runs on clusters; see above. And again, most of the work out there is done with CUDA. You'd be surprised how many researchers prototype in MATLAB and bring it to CUDA with the help of the Parallel Computing Toolbox. Similar attempts have been made for OpenCL and they're pretty much dead. Doing it for MPI from MATLAB works better and is more widespread than OpenCL. I'd happily switch to OpenCL (in fact I tried years ago) or Metal 2. The problem is, the rest of the world would have to do the same, and that's just not going to happen anytime soon.
 