Some people actually think games benchmarks are the only thing that matters.. ;)

They're important for two reasons:

1. It gives you an idea of how future-proof your investment is. If the best GPU right now can barely hit a 50 FPS average at 4K on ultra settings, then you can hold off on purchasing the card and a 4K monitor for a while longer if you need 4K gaming.

2. Multiplayer online people are nuts and they don't want to skip a frame or see dips in frame rate.
 

Right, my point was that LuxMark has obviously been tuned for the AMD compute architecture, so it's not really clear how useful it is as a benchmark overall. It's pretty easy to write OpenCL code that runs fantastically on one GPU and really poorly on all other GPUs, even other GPUs from the same vendor as the one that was used for the tuning.
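To make that concrete, here is a toy OpenCL C kernel of my own (it has nothing to do with LuxMark's actual code). The work-group size and local-memory tile are constants you typically settle on while profiling one particular GPU, and the same source can end up with much lower occupancy on a GPU whose wavefront width, register file or local-memory budget is shaped differently:

#define WG_SIZE 256   /* constant picked while profiling one specific GPU */

/* Per-work-group partial sum. How WG_SIZE and the local tile below interact
 * with a device's wavefront width, register file and local-memory budget
 * decides occupancy, so a value that is ideal on one architecture can be
 * mediocre on another. */
__kernel __attribute__((reqd_work_group_size(WG_SIZE, 1, 1)))
void partial_sum(__global const float *in, __global float *out, const int n)
{
    __local float tile[WG_SIZE];

    const int gid = (int)get_global_id(0);
    const int lid = (int)get_local_id(0);

    tile[lid] = (gid < n) ? in[gid] : 0.0f;
    barrier(CLK_LOCAL_MEM_FENCE);

    /* Tree reduction within the work-group. */
    for (int stride = WG_SIZE / 2; stride > 0; stride >>= 1) {
        if (lid < stride)
            tile[lid] += tile[lid + stride];
        barrier(CLK_LOCAL_MEM_FENCE);
    }

    if (lid == 0)
        out[get_group_id(0)] = tile[0];
}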
 
It doesn't seem to me that Polaris 10 has anything more hidden; it looks full fat as it is.
So either the 490 uses GDDR5X, which I already doubt the memory controller supports, or it's Vega, which is also unlikely. An overclocked 480 also seems doubtful because of the excess heat and power draw already.
 
They're important for two reasons:

1. It gives you an idea of how future-proof your investment is. If the best GPU right now can barely hit a 50 FPS average at 4K on ultra settings, then you can hold off on purchasing the card and a 4K monitor for a while longer if you need 4K gaming.

2. Multiplayer online people are nuts and they don't want to skip a frame or see dips in frame rate.
Yes, we all know this, Soy. The problem begins when you base your opinion of a GPU brand on gaming benchmarks alone, without even considering compute benchmarks.

You know perfectly well what, and who in particular, I am writing this about.
Right, my point was that LuxMark has obviously been tuned for the AMD compute architecture, so it's not really clear how useful it is as a benchmark overall. It's pretty easy to write OpenCL code that runs fantastically on one GPU and really poorly on all other GPUs, even other GPUs from the same vendor as the one that was used for the tuning.
It is tuned for OpenCL. Every time something shows that AMD is better than Nvidia at something, does someone have to say that it is tuned for the AMD architecture?
 
Yes, we all know this, Soy. The problem begins when you base your opinion of a GPU brand on gaming benchmarks alone, without even considering compute benchmarks.

You know perfectly well what, and who in particular, I am writing this about.

It is tuned for OpenCL. Every time something shows that AMD is better than Nvidia at something, does someone have to say that it is tuned for the AMD architecture?

Oh yes indeed, Nvidia is targeting the gaming market mostly and they haven't been serious about compute since the original Titan.
 
Oho, trouble in Redland? Lower compute power but less power hungry and more power efficient?!
http://videocardz.com/61753/nvidia-geforce-gtx-1060-specifications-leaked-faster-than-rx-480
http://wccftech.com/nvidia-gtx-1060-specifications-benchmarks-leaked/
They might come down with the price point to really hurt AMD.

Their current pricing structure doesn't look like they were interested in a price war with AMD. Yes, their current Pascal lineup doesn't have direct competition, but I don't think they'll leave much more than a $100 gap between GTX 1060 and 1070, which would mean >$300 for the GTX 1060 Founders Edition.

GTX 1070 and 1080 prices would look even more ridiculous if they introduced the 1060 e.g. for $219.
 
Oh yes indeed, Nvidia is targeting the gaming market mostly and they haven't been serious about compute since the original Titan.

I guess that explains the lack of CUDA usage or number of supercomputers that are built with AMD GPUs, right? Not sure how you can say NVIDIA doesn't care about compute, they just don't care about OpenCL as much as CUDA.
It is tuned for OpenCL. Every time something shows that AMD is better than Nvidia at something, does someone have to say that it is tuned for the AMD architecture?

There's some other benchmark (face detection?) where NVIDIA crushes AMD in OpenCL performance. I'd consider that a case that is either tuned for or is just a very good fit for their architecture. Am I going to make purchasing decisions based on that one test? No, because I don't think it's representative of compute performance in general, just like I personally don't consider Luxmark to be all that representative either.
 
I guess that explains the lack of CUDA usage or number of supercomputers that are built with AMD GPUs, right? Not sure how you can say NVIDIA doesn't care about compute, they just don't care about OpenCL as much as CUDA.

There's some other benchmark (face detection?) where NVIDIA crushes AMD in OpenCL performance. I'd consider that a case that is either tuned for or is just a very good fit for their architecture. Am I going to make purchasing decisions based on that one test? No, because I don't think it's representative of compute performance in general, just like I personally don't consider Luxmark to be all that representative either.
First of all, CUDA was the first compute API that simplified programming applications for GPU acceleration. Before that, all compute was done on the CPU. OpenCL was brought to life after CUDA.

OpenCL is an API that exposes the whole hardware to the application. Its results show how powerful a particular architecture is at compute. If Nvidia is slower in it than AMD, there are two possible reasons: either the drivers are not optimized for it, or... the architecture has very weak compute capabilities.

Compare OpenCL results from GPUs with similar compute power: the R9 390X and the GTX 980 Ti. Very often in OpenCL the R9 390X is much faster than its Nvidia counterpart. It doesn't matter whether that is due to drivers or weaker compute capabilities. We see a similar thing with, for example, Final Cut Pro X, which is optimized for OpenCL but brand agnostic.

On the other hand, CUDA allows graphics capabilities to be "polymorphed" into compute, and vice versa. Therefore the Nvidia architecture is more... universal, whereas the AMD architecture is focused solely on compute.

As for face detection, no. It simply shows that Nvidia hardware is much, much better at image analysis than AMD hardware.
 
This is all explained in the Nvidia sticky: after the GTX 5xx cards, with the GTX 680 and later they moved away from compute performance and towards pure gaming in their GTX cards. If you're talking supercomputers, then Tesla and Quadro cards are the more compute-optimized ones (to stop supercomputers and pro users from using the cheap GTX cards).
AMD has not made this split, so they still have compute power in their consumer cards.
If you're only playing games this may not matter (maybe Vulkan or DX12 will change that), but if you're doing work that relies on that compute power, say for OpenCL, then it will matter, so it depends on what you want from your graphics card.
(It depends on your workflow.)

Even CUDA per core was slower on the GTX 6xx cards, but that was kind of brute-forced with the number of cores.

I think that's correct.
 
https://forums.overclockers.co.uk/showpost.php?p=29723592&postcount=75

2,000 units sold in 24 hours on OC.uk alone. Looks like quite a big success for AMD.

Manuel, I posted a direct link to the AMD forum thread about this problem, but... afterwards I realized that the guys didn't provide ANY proof on the matter. I would wait for conclusions until they bring some proof, or they are just trolling on the AMD forum.

Especially since this is someone posting for the first time on the AMD forums.
 
Right now supply is being purposefully controlled by Nvidia distribution to keep profit margins high. The retailer I have known for many years told me himself how the supply is being controlled and how much profit margin there is.

Risky, and that's why I chose the 1070, but please let's make sure we see solid evidence of these board failures. After Britain, a whole country, ****ed itself because of Internet stories, I think we should now be properly awakened to the damage of this so-called misinformation superhighway before Idiocracy becomes full-scale reality.
 
This is all explained in the Nvidia sticky: after the GTX 5xx cards, with the GTX 680 and later they moved away from compute performance and towards pure gaming in their GTX cards. If you're talking supercomputers, then Tesla and Quadro cards are the more compute-optimized ones (to stop supercomputers and pro users from using the cheap GTX cards).
AMD has not made this split, so they still have compute power in their consumer cards.
If you're only playing games this may not matter (maybe Vulkan or DX12 will change that), but if you're doing work that relies on that compute power, say for OpenCL, then it will matter, so it depends on what you want from your graphics card.
(It depends on your workflow.)

Even CUDA per core was slower on the GTX 6xx cards, but that was kind of brute-forced with the number of cores.

I think that's correct.

This is true for double precision performance only. So yeah, if you care about double precision performance, Fermi cards or the original TITAN had a better ratio of double precision to single precision horsepower. Nothing can compare with the GP100 though, which will likely only be found as a high-end Quadro card.

NVIDIA's "gaming" cards have tons of single-precision horsepower, so if your compute algorithm needs that, then it can run fine on pretty much any card. If you tune your compute algorithm to run well on the AMD design (factoring in things like their cache hierarchy, balance between math and texture throughput, and so on), then it will likely run poorly on other GPUs. The same can be said for compute algorithms that are tuned for the NVIDIA architecture. Given the significant differences between Kepler/Maxwell/Pascal and GCN, and how important that level of tuning is for compute performance in general, it really is difficult to come up with a single compute application that runs well on every GPU in the world.
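For what it's worth, one partial mitigation is to ask the runtime for launch parameters instead of hard-coding them. Here's a rough sketch of mine using the standard clGetKernelWorkGroupInfo queries; it adapts the work-group size per device, but it obviously can't fix deeper algorithmic choices like how you block for a particular cache hierarchy.

/* Sketch: pick a 1-D work-group size from what the runtime reports for this
 * kernel/device pair instead of hard-coding one. Assumes OpenCL 1.1+.
 * (Use #include <OpenCL/opencl.h> on macOS.) */
#include <CL/cl.h>

static size_t pick_local_size(cl_kernel kernel, cl_device_id device)
{
    size_t max_wg = 1;     /* largest group this kernel can run with here */
    size_t multiple = 1;   /* width the device schedules efficiently      */

    clGetKernelWorkGroupInfo(kernel, device, CL_KERNEL_WORK_GROUP_SIZE,
                             sizeof(max_wg), &max_wg, NULL);
    clGetKernelWorkGroupInfo(kernel, device,
                             CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE,
                             sizeof(multiple), &multiple, NULL);

    /* Largest multiple of the preferred width that still fits. */
    size_t local = (max_wg / multiple) * multiple;
    return local ? local : 1;
}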

But hey, you guys can keep blaming the NVIDIA OpenCL drivers as being ****, that's a much easier explanation. How many supercomputers are using AMD GPUs? Zero, as far as I know. Given the number that are running on NVIDIA GPUs, they must be doing something right.
 
Right now supply is being purposefully controlled by Nvidia distribution to keep profit margins high. The retailer I have known for many years told me himself how the supply is being controlled and how much profit margin there is.

This sounds like a conspiracy theory. I think reality is a bit simpler. Nvidia decided to launch cards more quickly than in previous launches. Instead of building up a large supply and then launching, they launched almost as soon as they started producing cards. Thus, the Founders Edition is the premium you pay to get a card before supply has caught up with demand. This is a win for Nvidia in that they get to charge a higher price for reference cards and for those who want the cards first, and then in a couple of months AIB manufacturers can sell all their cards at the suggested retail price.
 
This is true for double precision performance only. So yeah, if you care about double precision performance, Fermi cards or the original TITAN had a better ratio of double precision to single precision horsepower. Nothing can compare with the GP100 though, which will likely only be found as a high-end Quadro card.

NVIDIA's "gaming" cards have tons of single-precision horsepower, so if your compute algorithm needs that, then it can run fine on pretty much any card. If you tune your compute algorithm to run well on the AMD design (factoring in things like their cache hierarchy, balance between math and texture throughput, and so on), then it will likely run poorly on other GPUs. The same can be said for compute algorithms that are tuned for the NVIDIA architecture. Given the significant differences between Kepler/Maxwell/Pascal and GCN, and how important that level of tuning is for compute performance in general, it really is difficult to come up with a single compute application that runs well on every GPU in the world.

But hey, you guys can keep blaming the NVIDIA OpenCL drivers as being ****, that's a much easier explanation. How many supercomputers are using AMD GPUs? Zero, as far as I know. Given the number that are running on NVIDIA GPUs, they must be doing something right.
What you describe is true for gaming scenarios, where optimization is required for a specific architecture. Compute is just that: compute. Those are mathematical algorithms. The only optimization is done in software, be it CUDA or OpenCL. You tune your application for the API, not the hardware; that is where you have to optimize. Don't conflate those two things. OpenCL, which you appear not to understand, is a brand-agnostic API. It can be used on any hardware capable of running compute algorithms, even on the mobile GPUs in smartphones. What matters in it, however, is how powerful your hardware is at compute.

P.S. One recent supercomputer dropped CUDA in favor of OpenCL and AMD GPUs. You can find information about it on the internet. Thanks to the FirePro S9300X2, and AMD HIP (Boltzmann Initiative) with its CUDA-to-OpenCL compiler.
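To illustrate the brand-agnostic part: the host code below (a minimal sketch of mine, nothing vendor-specific in it) just enumerates whatever OpenCL platforms and devices are installed, whether they are from AMD, Nvidia, Intel or a mobile vendor. How fast a given kernel then runs on each of them is exactly the compute-power question.

/* Sketch: list every OpenCL platform and device the ICD loader exposes,
 * regardless of vendor. (Use #include <OpenCL/opencl.h> on macOS.) */
#include <stdio.h>
#include <CL/cl.h>

int main(void)
{
    cl_platform_id plats[16];
    cl_uint nplat = 0;
    if (clGetPlatformIDs(16, plats, &nplat) != CL_SUCCESS)
        return 1;
    if (nplat > 16) nplat = 16;

    for (cl_uint p = 0; p < nplat; ++p) {
        char pname[256] = "";
        clGetPlatformInfo(plats[p], CL_PLATFORM_NAME, sizeof(pname), pname, NULL);

        cl_device_id devs[16];
        cl_uint ndev = 0;
        if (clGetDeviceIDs(plats[p], CL_DEVICE_TYPE_ALL, 16, devs, &ndev) != CL_SUCCESS)
            continue;
        if (ndev > 16) ndev = 16;

        for (cl_uint d = 0; d < ndev; ++d) {
            char dname[256] = "";
            clGetDeviceInfo(devs[d], CL_DEVICE_NAME, sizeof(dname), dname, NULL);
            printf("%s: %s\n", pname, dname);
        }
    }
    return 0;
}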
 
A conspiracy is when people work together in secret. This is nothing like that. It's a standard business practice for all companies that have a tiered product lineup.
http://www.bitsandchips.it/english/52-english-news/7196-how-to-slow-down-rx480-sales
It's simple. You have to start spreading doubts (NVIDIA has the best marketing department!): how fast is the GTX 1060? Will it be better than the RX 480? Will it be cheaper than the RX 480? Etc. Thanks to this strategy, a lot of users will not buy the AMD cards.

According to our sources, NVIDIA has limited 16nm production slots at TSMC, so it has to choose carefully which GPUs to produce. Intel is attacking the HPC market with Xeon Phi, and AMD is attacking the consumer market with Polaris 10.

Here are some numbers (NOT official):

  • At the present moment TSMC's output is about 60k wafers/month @ 16nm (80k during 4Q16). The main customers are Apple, NVIDIA, Huawei/HiSilicon, Qualcomm, LG, MediaTek and Xilinx (and others). 60k/7 = about 9k wafers/month per company
  • GloFo's output (14nm) is about 40k wafers/month (60k during 4Q16). The main customers are AMD and Qualcomm = 20k wafers/month per company
To add to all this production misery, TSMC will also be producing the A9, A9X, A10, and A10X for Apple.
 
Right now supply is being purposefully controlled by Nvidia distribution to keep profit margins high. The retailer I have known for many years told me himself how the supply is being controlled and how much profit margin there is.


Risky, and that's why I chose the 1070, but please let's make sure we see solid evidence of these board failures. After Britain, a whole country, ****ed itself because of Internet stories, I think we should now be properly awakened to the damage of this so-called misinformation superhighway before Idiocracy becomes full-scale reality.

What are you talking about, Britain ****ed itself?
 
What you describe is true for gaming scenarios, where optimization is required for a specific architecture. Compute is just that: compute. Those are mathematical algorithms. The only optimization is done in software, be it CUDA or OpenCL. You tune your application for the API, not the hardware; that is where you have to optimize. Don't conflate those two things. OpenCL, which you appear not to understand, is a brand-agnostic API. It can be used on any hardware capable of running compute algorithms, even on the mobile GPUs in smartphones. What matters in it, however, is how powerful your hardware is at compute.

P.S. One recent supercomputer dropped CUDA in favor of OpenCL and AMD GPUs. You can find information about it on the internet. Thanks to the FirePro S9300X2, and AMD HIP (Boltzmann Initiative) with its CUDA-to-OpenCL compiler.

Here's a short list of things that can greatly affect compute performance.
  • Grid dimensions
  • Memory access patterns
  • Ratio of math to memory accesses
If you think you can just write one OpenCL program and achieve maximum performance on every GPU out there, you are sorely mistaken. For example, AMD has a whole chapter on optimizing OpenCL for GCN:

http://developer.amd.com/tools-and-...processing-app-sdk/opencl-optimization-guide/

Why would they need a guide like this if all OpenCL programs magically ran at peak performance? An OpenCL kernel that runs well on your phone will generally be radically different to one that runs well on a high-end desktop GPU. For example, if you don't tune things so that you get full occupancy on the GPU, then you can end up running orders of magnitude slower than you expect.

Edit: Here's a really specific example. Let's say your GPU has 2MB of L2 cache. You tune your algorithm to fit perfectly into that L2 cache, so that you avoid thrashing. Now you try and run your OpenCL kernel on a GPU that only has 512KB of L2 cache. Instead of running at peak performance, the GPU spends a ton of time thrashing the L2 cache and the memory accesses are now a major bottleneck.
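If you wanted the adaptive version of that example, it would look roughly like the sketch below (my own, and hedged accordingly: some runtimes report 0 for the cache size, so you still need a fallback). It sizes the working tile from what the device actually reports instead of assuming a 2MB L2.

/* Sketch: size a float working tile from the device's reported global-memory
 * cache instead of assuming a 2MB L2. Some implementations report 0, hence
 * the fallback. (Use #include <OpenCL/opencl.h> on macOS.) */
#include <CL/cl.h>

static size_t pick_tile_elems(cl_device_id device)
{
    cl_ulong cache_bytes = 0;
    clGetDeviceInfo(device, CL_DEVICE_GLOBAL_MEM_CACHE_SIZE,
                    sizeof(cache_bytes), &cache_bytes, NULL);

    if (cache_bytes == 0)
        cache_bytes = 256 * 1024;          /* conservative fallback */

    /* Aim for about half the cache, leaving room for other traffic,
     * rounded down to a multiple of 1024 elements. */
    size_t elems = (size_t)(cache_bytes / 2 / sizeof(float));
    elems = (elems / 1024) * 1024;
    return elems ? elems : 1024;
}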
 