Gaming GPU vs. workstation GPU: gaming GPUs are great for gaming but don't do as well at computations, while workstation GPUs do better at computations but aren't as good for gaming.

My D700 and my wife's GTX 680 are both in the 3.1-3.5 TFLOP range. My D700 kills her GTX 680 in computational benchmarks like OpenCL, while her GTX 680 wins in rendering benchmarks like Valley.

This is an incorrect assessment of the situation. TFLOPs is a measurement of raw computational power. If the RX 470 has more TFLOPs than the D700s, then it has more raw computational horsepower. However, this raw power is very rarely the bottleneck in benchmarks, for both OpenCL and OpenGL/Metal. Many OpenCL benchmarks have been written for or tuned for the AMD architecture, and thus run extremely inefficiently on the NVIDIA architecture (since they are fundamentally different). Most compute code written/tuned for NVIDIA uses CUDA, as it exposes more of the underlying architecture to the application. There are a few OpenCL examples like Oceanwave and a face recognition benchmark that run much faster on NVIDIA than AMD, but again, that's probably because they were written on NVIDIA and thus have an implicit bias for that architecture.

As always, it really just boils down to the applications you want to run. If you care about LuxMark, then buy an AMD card. If you care about DaVinci Resolve, then buy an NVIDIA card.
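For reference, here's where the headline TFLOPs numbers come from: peak FP32 throughput is just 2 FLOPs (one fused multiply-add) per shader per clock. A quick sketch in Python, using approximate published shader counts and clocks (spec-sheet figures, not measurements):

```python
# Back-of-envelope peak FP32 throughput: 2 FLOPs (one FMA) per shader per clock.
# Shader counts and clocks are approximate spec-sheet figures, not measurements.
def peak_tflops(shaders, clock_ghz):
    return 2 * shaders * clock_ghz / 1000.0  # GFLOPs -> TFLOPs

cards = {
    "FirePro D700 (per GPU)": (2048, 0.850),  # ~3.5 TFLOPs
    "Radeon RX 470":          (2048, 1.206),  # ~4.9 TFLOPs
    "GeForce GTX 680":        (1536, 1.006),  # ~3.1 TFLOPs
}

for name, (shaders, clock_ghz) in cards.items():
    print(f"{name}: {peak_tflops(shaders, clock_ghz):.1f} TFLOPs peak FP32")
```

That number says nothing about how a real application will run; it's just what the spec sheets quote.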
 
Both are AMD cards, other than the noted differences in models. Geekbench supports Metal/OpenCL/CUDA, but looking up the stats for Geekbench 4, a staffer does say it only uses one GPU at a time for the compute benchmark. Interesting...

So should my Metal score actually be 107,852 for two cards? Haha... lol
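For what it's worth, that one-GPU-at-a-time behaviour is the normal pattern: an OpenCL app has to pick a specific device (or explicitly split its work across several), so a single-device benchmark only ever exercises one of the two D700s. A minimal enumeration sketch, assuming pyopencl is installed:

```python
# List the OpenCL devices the driver exposes (assumes `pip install pyopencl`).
# A single-device benchmark like Geekbench's compute test picks exactly one of
# these; the second D700 simply sits idle during the run.
import pyopencl as cl

for platform in cl.get_platforms():
    for dev in platform.get_devices():
        print(f"{platform.name}: {dev.name}")
```

Unless an application explicitly targets both GPUs, the single-card score is the one that matters.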
 
Gaming GPU vs. workstation GPU: gaming GPUs are great for gaming but don't do as well at computations, while workstation GPUs do better at computations but aren't as good for gaming.

My D700 and my wife's GTX 680 are both in the 3.1-3.5 TFLOP range. My D700 kills her GTX 680 in computational benchmarks like OpenCL, while her GTX 680 wins in rendering benchmarks like Valley.
Ok, cheers. Makes sense.
 
If you go back to the original discussion, the complaint was that the RX 470 (an AMD card) is slower at Metal than the D700 (also AMD) even though it has more TFLOPs. So while the AMD vs. Nvidia comparison is valid, and benchmarks run better on whatever they've been optimized for, that doesn't explain why a "faster" gaming GPU is slower running the Metal benchmark than a "slower" D700. I still stand by it being gaming vs. workstation GPU and what each is optimized for. Workstation/computational GPUs with more power simply don't do as well at gaming, and gaming GPUs don't do as well with computations, regardless of their TFLOPs.

Just my $0.02...
 
Wow, in CUDA it scores 237366?

? No, in CUDA it scores 139735: https://browser.geekbench.com/v4/compute/614412

Well, I could install my GTX 980 Ti as second GPU in the Mac Pro to achieve that score, but it is already in my Kaby Lake-PC. :D

[Attachment: CUDA.png]
 
I was specifically commenting on this:

"My D700 and my wife's GTX 680 are both in the 3.1-3.5 tflop range, my D700 kills her GTX 680 in computational benchmarks like OpenCL, her GTX 680 wins when doing rendering benchmarks like Valley."

which is an AMD vs NVIDIA comparison. Also, the OpenCL tests might make good use of 2 GPUs and thus the 2 D700s could beat a single RX 470 if the test isn't limited by raw TFLOPs.

Edit: My main point is that we see a lot of posts along the lines of "GPU X has more TFLOPs than GPU Y but GPU Y runs application Z faster, what's up?". The simple answer is that most applications are not limited by raw GPU TFLOPs and the limiting factor is something else. As a result, you should always take the raw TFLOPs numbers with a huge grain of salt.
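To make the "limiting factor is something else" point concrete, here's a rough sketch of a purely memory-bound workload (think of a streaming copy/scale kernel). The 264 GB/s D700 bandwidth is from Apple's spec sheet quoted further down the thread; the ~211 GB/s RX 470 figure and the workload proportions are assumptions for illustration:

```python
# Rough sketch: a hypothetical kernel that must move `gigabytes` of data and
# execute `tflops_of_work` worth of math. Its runtime is roughly the larger of
# the two lower bounds. Spec-sheet numbers, not measurements; the RX 470
# bandwidth is the reference-card figure and is an assumption here.
cards = {
    #                (peak TFLOPs, memory bandwidth GB/s)
    "FirePro D700":  (3.5, 264),   # bandwidth from Apple's spec sheet
    "Radeon RX 470": (4.9, 211),   # reference-card figures (assumption)
}

gigabytes = 100.0      # data the hypothetical kernel streams
tflops_of_work = 1.0   # math the same kernel performs

for name, (tflops, bw_gbs) in cards.items():
    t_mem = gigabytes / bw_gbs        # seconds if bandwidth were the only limit
    t_alu = tflops_of_work / tflops   # seconds if raw FP32 were the only limit
    bound = "memory" if t_mem > t_alu else "ALU"
    print(f"{name}: ~{max(t_mem, t_alu):.2f}s ({bound}-bound)")
```

With those made-up proportions both cards end up memory-bound, and the D700's wider bus wins even though the RX 470 has more TFLOPs. Real benchmarks add driver, API and host-side overheads on top of this.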
 
I was specifically commenting on this:

"My D700 and my wife's GTX 680 are both in the 3.1-3.5 tflop range, my D700 kills her GTX 680 in computational benchmarks like OpenCL, her GTX 680 wins when doing rendering benchmarks like Valley."

which is an AMD vs NVIDIA comparison. Also, the OpenCL tests might make good use of 2 GPUs and thus the 2 D700s could beat a single RX 470 if the test isn't limited by raw TFLOPs.

Edit: My main point is that we see a lot of posts along the lines of "GPU X has more TFLOPs than GPU Y but GPU Y runs application Z faster, what's up?". The simple answer is that most applications are not limited by raw GPU TFLOPs and the limiting factor is something else. As a result, you should always take the raw TFLOPs numbers with a huge grain of salt.

Emphasis added by me. You are incorrect regarding the Geekbench OpenCL benchmarks as well: these only run on a single GPU at a time. One of my D700 GPUs handily beats my wife's GTX 680. Yes, it's AMD vs Nvidia, but you seem to be commenting on things you're not fully understanding, either. Which is fine. And yes, raw TFLOPs is just a number that doesn't translate well into real-world performance expectations.

My point also stands. Gaming GPUs and computing GPUs are better at different things. My car may have 500HP but my 350HP truck will easily tow more, faster, and for longer periods.
 
I'm just new to this, and I was thinking of taking back the new 1080 Ti for 2 x R9 390X due to my heavy use of FCPX. However, after the ego boost and virtual forum whipping stick that Geekbench just gave me, I might just stick with it.

All jokes aside, what are the thoughts on future support on the Nvidia side to push for better OpenCL capability, as my work depends on it? Y'all think I should keep the 1080 Ti or switch to CrossFire 390Xs?

6700K
32GB RAM
MSI 1080 Ti Aero OC (can't overclock it in OS X as far as I know; any suggestions?)
250GB 960 EVO M.2
 

[Attachments: CPU.jpeg, cuda.jpeg, metal.jpeg]
I actually took my 1080 Ti back yesterday. I had 14 days before I couldn't return it anymore and those days were up so...

The 1080 Ti is great hardware in itself. I found performance in Resolve in CUDA mode to be great. I use Maxwell Render 4 with GPU support and saw nice performance there too, but not in all conditions.

FCPX playback wasn't problematic per se, but BruceX could take a minute to export. F1 2016 hung every now and then. Luxmark Luxball worked, but the heavier scenes didn't. Geekbench OpenCL didn't work.

As a test I put my RX 480 in again and tested the F1 2016 benchmark that made the 1080 Ti hang, and it turned out not only didn't the 480 hang, but it also beat the 1080 Ti in performance.

So... ups and downs. In the end it came down to simply recognising that the drivers aren't completely up to speed yet. They may, or may not, work better in the future, but I decided not to wait to find out and returned the card.

I'll try to wait for Vega and see if that will work. I also think the Radeon Pro Duo looks very interesting at 11.5 TFLOPs for $995. I could even drop two Pro Duos in the Mac Pro for some sweet 23 TFLOPs. =)
 
Benchmark suites are odd things. My RX470 has a higher teraflop output than both of your D700s combined yet gets a lower score.

No, the RX470 is NOT stronger than 2x D700.

Also, the driver for the D700 is very mature and highly optimised. On the other hand, there is no official support for the RX 470; making it work with a kext edit doesn't mean the driver can release the card's full potential. In fact, macOS may not even be able to use all 32 CUs.
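One way to sanity-check that last point, assuming pyopencl is available: ask the driver how many compute units it actually reports for the card and compare that against the 32 CUs an RX 470 physically has.

```python
# Sketch: print how many compute units the OpenCL driver exposes per GPU.
# An RX 470 has 32 CUs; a smaller number reported here would suggest the
# patched/unofficial driver isn't exposing the whole chip. (pyopencl assumed.)
import pyopencl as cl

for platform in cl.get_platforms():
    for dev in platform.get_devices():
        if dev.type & cl.device_type.GPU:
            print(f"{dev.name}: {dev.max_compute_units} compute units reported")
```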
 
My bad. I find it funny that they would post individual specs for memory but not for throughput, i.e. they say 6GB VRAM each but not 3.5 TFLOPs each.

Dual AMD FirePro D700 graphics processors with 6GB of GDDR5 VRAM each

  • 2,048 stream processors
  • 384-bit-wide memory bus
  • 264 GB/s memory bandwidth
  • 3.5 teraflops performance
 
Yes, it really does come down to whether your intended usage gets the speed increase from the new hardware.

I also bought the 1080 Ti the other day to see if it would accelerate my mostly 4K Adobe CC workflow over the Titan X Maxwell I am currently using. All the synthetic benchmarks were indeed showing a roughly 70% increase in CUDA and OpenCL performance, via Geekbench, LuxMark and Unigine.

But then I tried out some real-world rendering tests, pertinent to my daily workload.

In Adobe Premiere, I rendered out a DCI 4K ProRes (HQ) 30-second clip with 4 effects applied: Lumetri Color with 2 LUTs applied, another Lumetri Color with optical mask tracking, a Colorista/Mojo filter, and NeatVideo noise reduction. (The NeatVideo filter itself can be assigned the full resources of your hardware, so I applied 11 of my physical CPU cores and 100% of the VRAM and compute from the GPUs to the filter applied to the footage.)

This is where I was a bit surprised with the results:

Titan X - CUDA - 05:59
Titan X - OpenCL - 06:04

1080 Ti - CUDA - 05:53
1080 Ti - OpenCL - 05:52

Then I took a 02:15 DCI 4K ProRes (HQ) clip and exported it out in Adobe Media Encoder as a 2K H.264 master at 25 Mbps.

Titan X - AME - CUDA - 01:29
Titan X - AME - OpenCL - 01:29

1080 Ti - AME - CUDA - 01:30
1080 Ti - AME - OpenCL - 01:29

Suffice it to say, though the new GPU hardware itself was more powerful, the results were mostly the same as with my older GPU in my workflow.
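A back-of-envelope way to reconcile the ~70% synthetic gain with the flat render times is Amdahl's law: if only a small slice of the total render is GPU work, a much faster GPU barely moves the total. A sketch with made-up fractions:

```python
# Amdahl's-law sketch with made-up fractions: if only `gpu_fraction` of the
# render time is GPU-accelerated work, a GPU that is `gpu_speedup`x faster
# changes the total by far less than the synthetic benchmarks suggest.
def overall_speedup(gpu_fraction, gpu_speedup):
    return 1.0 / ((1.0 - gpu_fraction) + gpu_fraction / gpu_speedup)

gpu_speedup = 1.7  # ~70% faster in synthetic CUDA/OpenCL tests
for gpu_fraction in (0.1, 0.3, 0.9):
    s = overall_speedup(gpu_fraction, gpu_speedup)
    print(f"GPU share {gpu_fraction:.0%}: whole render {s:.2f}x faster")
```

If the GPU-bound share of a NeatVideo-heavy, CPU-assisted render is down around 10% (a guess), a 70% faster card only buys a few percent overall, which lines up with the near-identical times above.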
 
Did these same tests on this forum more than a year ago. Same AME results on OpenCL and CUDA. Then someone informed us that the GPU doesn't encode H.264. Sure enough, I turned on software rendering and the result was the same. There's no GPU rendering for some codecs on macOS.

Then I rebooted into Bootcamp and the AME render result was exactly 4x faster than macOS. On software rendering! On the same machine!
 
Yes, it seems like AME is mostly using the GPU in the same way as Premiere Pro. So if someone is rendering out a standalone master clip to, say, H.264, the GPU is only handling the scaling, if there is any at all. Otherwise, it is mostly CPU in this case (my 12-core Mac Pro was using all cores in this instance).

The GPU looks like it will come into play far more in AME, if exporting from a Premiere timeline that hasn't been rendered out. In that case, the GPU will accelerate any effects/scaling/etc. that have been optimized as using the GPU for such.

So GPU acceleration can still have quite an impact, but it depends on how one works on their machine.
 
The big problem with the theory you've picked (gaming vs workstation) is that the D700 is really just an HD7970. They are the same card, they use the same drivers, and they bench the same. Heck, they even have the exact same ID--AMD didn't bother to give the D700 a different one. There have been many discussions about this in the past. The D700 is not a workstation card except in branding.

ATI did an article where they explained the difference between workstation and gaming cards. Workstation cards are not faster than (or slower than) their equivalent gaming cards. The exception is where there are highly optimized workstation-GPU-only drivers. But these are on Windows only, not OS X, and wouldn't help a D700 there anyway because the card reports itself identically to an HD7970.

Your D700 vs GTX680 comparison is apples and oranges. Differences in benchmarks for that particular pair of cards can be explained many different ways, from having different architectures, to using different drivers, and to which brand the software is optimized for. A 7970 will perform just as well as a D700 against a GTX680 in OpenCL and just as poorly in Valley, so "workstation" vs "gaming" is not the explanation.
 
RX 460 here. I'm having second thoughts about this card. I don't think it's as OOB (out of the box) as one might think; it seems to crash some programs after running for a while.
 

[Attachment: Capture.JPG]
Again, AMD and Nvidia base a lot of their FirePro/Quadro cards on their other Radeon/GTX offerings, aside from differences in drivers/support and perhaps slight chip differences. I would not call them workstation cards just because of that distinction alone.
 
So CUDA is more than three times faster than Metal.

SAD! ^H^H^H^H^H That's rather disappointing.
Ahem. There is an M395X from an iMac in there, and it still scores higher in OpenCL than the GTX 1080 Ti does in Metal ;).

So it appears it's not a matter of Metal, but a matter of Nvidia's rubbish Metal/OpenCL drivers.

P.S. The R9 M395X has 3.7 TFLOPs of compute power; the GTX 1080 Ti, 11.5 TFLOPs.

So CUDA performance actually does not reflect the difference in performance that should be apparent between the two GPUs.

But this is macOS.
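For what it's worth, the raw-spec gap being alluded to, i.e. what the scores "should" look like if TFLOPs were the only thing that mattered (a big if, per the earlier posts):

```python
# Spec-sheet ratio only; says nothing about drivers, APIs or real workloads.
m395x_tflops = 3.7
gtx_1080_ti_tflops = 11.5
print(f"Raw FP32 ratio: {gtx_1080_ti_tflops / m395x_tflops:.1f}x")  # ~3.1x
```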
 