From the first data I have had access to for the GV100 chip, it appears that FP32 performance is, clock-for-clock and core-for-core, 7% higher than it was for GP100.

So there you may have proof that, without increasing core throughput but simply by rebalancing the scheduling, you can get higher performance.

Interesting times for consumers. Nvidia has to use the GV100 architectural layout for consumer GPUs... It may actually have more than 50% more performance per clock, per core, than consumer Pascal had...
 
From the first data I have had access to for the GV100 chip, it appears that FP32 performance is, clock-for-clock and core-for-core, 7% higher than it was for GP100.
Why do you continue to focus on "clock-for-clock"? (Other than maybe that's what you read in ATI press releases.)

And "theoretical TFLOPs" are nearly as useless. If the GPUs internal datapaths and schedulers can't hit the theoretical numbers, it's just a fraud.

For example, say that I have a 1 GHz clock on a GPU that delivers in real benchmarks 5 FP32 TFLOPs.

And, for example, say that I have another GPU that has a 2 GHz clock that delivers (in the same real benchmarks) 7 TFLOPs.

Will you say that the 7 TFLOP GPU is inferior to the 5 TFLOP GPU because it's not so good on the ridiculous "clock for clock" metric?

I think that most people would prefer the 7 TFLOP GPU over the 5 TFLOP GPU. (Especially if the 5 TFLOP GPU needs 100 more watts for poorer performance - looking at you Vega.)
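To put rough numbers on that hypothetical (purely illustrative, using only the figures above, nothing measured):

```python
# Hypothetical GPUs from the example above (figures are illustrative only).
gpu_a = {"clock_ghz": 1.0, "benchmarked_tflops": 5.0}
gpu_b = {"clock_ghz": 2.0, "benchmarked_tflops": 7.0}

for name, gpu in (("A (1 GHz)", gpu_a), ("B (2 GHz)", gpu_b)):
    # "Clock-for-clock" throughput: FP32 operations delivered per cycle.
    flops_per_cycle = gpu["benchmarked_tflops"] * 1e12 / (gpu["clock_ghz"] * 1e9)
    print(f"GPU {name}: {flops_per_cycle:,.0f} FP32 ops per clock, "
          f"{gpu['benchmarked_tflops']} TFLOPs delivered")

# GPU A does 5,000 ops per clock vs 3,500 for GPU B, yet GPU B still
# gets more work done per second - which is the whole point.
```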
 
If you look at the on-paper spec for Volta and take the Tensor cores away (which won't be in consumer GPUs), it has barely more TFLOPs than Vega.

Yes, TFLOPs and game performance are two different things, but in terms of getting compute done, Vega looks to compare pretty well to Volta, to be honest. It's certainly much, much cheaper.
 
Maybe everyone should stop looking at spec sheets for a minute and look at what the software everyone is using is actually going to use. Some will use CUDA, some won't; some use OpenCL or OpenGL, FP32 or FP64.

Spec sheets won't tell us what the software wants.
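If you want to see what a given device actually exposes to your software, something like this rough sketch works (it assumes the third-party pyopencl package and a working OpenCL driver; the property names are pyopencl's):

```python
# Rough sketch: list what each OpenCL device actually exposes.
# Assumes the third-party pyopencl package and a working OpenCL driver.
import pyopencl as cl

for platform in cl.get_platforms():
    for device in platform.get_devices():
        has_fp64 = "cl_khr_fp64" in device.extensions
        print(f"{device.name}: {device.max_compute_units} compute units @ "
              f"{device.max_clock_frequency} MHz, "
              f"FP64 {'supported' if has_fp64 else 'not supported'}")
```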
 
If you look at the on-paper spec for Volta and take the Tensor cores away (which won't be in consumer GPUs), it has barely more TFLOPs than Vega.

Yes, TFLOPs and game performance are two different things, but in terms of getting compute done, Vega looks to compare pretty well to Volta, to be honest. It's certainly much, much cheaper.

Vega 64 has more theoretical TFLOPs than a GTX 1080 Ti (12.7 vs 11.3), and yet can barely compete with a GTX 1080 (9) in real applications. So, that 41% increase in TFLOPs that Vega 64 has over 1080 means basically nothing in real apps. Volta has 15, which is considerably more than the Vega 64 (15 vs 12.7 or +18%). However, NVIDIA's architecture is generally way better at converting this theoretical performance into real-world performance (as seen by how badly the 1080 Ti beats Vega in games).
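For reference, those headline TFLOPs figures fall straight out of cores x clock. A quick sketch; the core counts and boost clocks below are the publicly listed ones (approximate), so treat the output as ballpark:

```python
# Theoretical FP32 TFLOPs = 2 ops per core per clock (FMA) x cores x clock.
# Core counts and boost clocks are the publicly listed figures (approximate).
def theoretical_tflops(cores, boost_ghz):
    return 2 * cores * boost_ghz / 1000.0

cards = {
    "Vega 64":     (4096, 1.546),
    "GTX 1080 Ti": (3584, 1.582),
    "GTX 1080":    (2560, 1.733),
}

for name, (cores, clock) in cards.items():
    print(f"{name:12s} ~{theoretical_tflops(cores, clock):.1f} TFLOPs")

# Relative gaps quoted above:
print(f"Vega 64 vs GTX 1080:   +{(12.7 / 9.0 - 1) * 100:.0f}%")    # ~41%
print(f"Volta (15) vs Vega 64: +{(15.0 / 12.7 - 1) * 100:.0f}%")   # ~18%
```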
 
Vega 64 has more theoretical TFLOPs than a GTX 1080 Ti (12.7 vs 11.3), and yet can barely compete with a GTX 1080 (9) in real applications. So, that 41% increase in TFLOPs that Vega 64 has over 1080 means basically nothing in real apps. Volta has 15, which is considerably more than the Vega 64 (15 vs 12.7 or +18%). However, NVIDIA's architecture is generally way better at converting this theoretical performance into real-world performance (as seen by how badly the 1080 Ti beats Vega in games).

And it smashes the Geforce Titan XP in other applications...

Games are a bit of an anomaly due to Nvidia gameworks, in real world compute, Vega hits Nvidia pretty hard.

You talk about "real apps" and then use games for reference...
 
I have been doing a lot of research lately on video cards, as I plan on buying a new one soon. I had a hard time finding benchmarks that did not relate to gaming; my Mac Pro is not a gaming machine, so those benchmarks weren't relevant.

To the rescue, though, was Puget Systems. They have done Nvidia GPU benchmarking for a few of the more common editing apps, CAD software, and various compute programs.

The cards I was interested in on their lists were mostly Pascals. It was interesting to see that, while there were performance differences between the lower-end cards and the powerful cards in the apps I use, the difference was not massive.

Generally the difference in time taken to do a task was a matter of seconds between cards, in some cases up to half a minute between the lowest card and the highest. But generally the differences between cards were less than 10 seconds.

There were benefits in some cases to having the more powerful cards, where you need the extra VRAM, say for high-resolution playback or multiple high-res monitors. But still the difference was seconds, not minutes.
And it smashes the Geforce Titan XP in other applications...

Games are a bit of an anomaly due to Nvidia gameworks, in real world compute, Vega hits Nvidia pretty hard.

You talk about "real apps" and then use games for reference...

What other applications? I think it would be beneficial to the conversation if the apps AMD is better at were mentioned.

I don't think this needs to be an AMD/Nvidia war; going off what I see most people saying, it is really a war of wallets rather than a war of performance.

I think saying which apps one brand's cards perform better in than the other's will allow people to make good decisions based on their needs.
 
Vega beats the Titan XP in...

SolidWorks (by 50%)
Catia (by 28%)
If you set AA to CMAA instead of TSAA, it is competitive with the 1080 Ti in Dirt 4 also (this is one of those game anomalies... you need to be careful what settings are used when comparing games, because some of the tech in them heavily favours team red or team green due to drivers).

Anything that can use FP16 extensively (like deep learning applications), Vega stomps the 1080/Ti or Titan XP, as it gets 2x FP16 rate rather than the 1/64 rate of the consumer Pascal cards (rough numbers below).

Although it doesn't have application-certified drivers like the Quadros, it also beats Quadros costing 2-3x the price in the above applications as well.

Vega driver support is very new.
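As a back-of-the-envelope illustration of the FP16 point (the FP32 baselines are the usual spec-sheet figures and the rate factors are the commonly cited ones, so treat the numbers as approximate):

```python
# Rough FP16 throughput = FP32 throughput x the architecture's FP16 rate factor.
# FP32 baselines and rate factors are commonly cited spec-sheet values (approximate).
cards = {
    #            FP32 TFLOPs, FP16 rate vs FP32
    "Vega 64":   (12.7,       2.0),     # packed FP16 ("Rapid Packed Math")
    "Titan Xp":  (12.1,       1 / 64),  # consumer Pascal runs FP16 at 1/64 rate
}

for name, (fp32, rate) in cards.items():
    print(f"{name}: ~{fp32 * rate:.2f} FP16 TFLOPs")

# Vega 64 lands around 25 FP16 TFLOPs vs well under 1 for consumer Pascal,
# which is why FP16-heavy workloads favour it so strongly.
```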
 
Vega beats the Titan XP in...

SolidWorks (by 50%)
Catia (by 28%)
If you set AA to CMAA instead of TSAA it is competitive with the 1080 TI in Dirt 4 also.
Anything that can use FP16 extensively (like deep learning applications) Vega stomps the 1080/TI or Titan XP

Vega driver support is very new.

I will not deny this; why would a 1080 Ti or a Titan XP, neither of which is exactly a deep-learning compute card (though the Titan does lean more that way), be as good? Just because a card is high performance doesn't make it good at thinking.

I personally would be willing to question the choice of geforce cards in those applications.
 
Nvidia artificially gimps the GeForce cards so they can charge more for Quadros, which are essentially GeForce cards without the driver crippling them.

The fact that they released a driver update to unlock a 300% improvement in performance for the Titan XP (in Maya, from memory, immediately after the Vega release) just reinforces this. The Nvidia GeForce hardware is capable of performing a lot better than Nvidia allows it to, purely due to marketing.

The Titan XP could and should be a deep learning card if Nvidia didn't artificially cripple it via firmware/drivers to force those markets to buy Quadros.
 
Nvidia artificially gimps the GeForce cards so they can charge more for Quadros, which are essentially GeForce cards without the driver crippling them.

The fact that they released a driver update to unlock a 300% improvement in performance for the Titan XP (in Maya, from memory, immediately after the Vega release) just reinforces this. The Nvidia GeForce hardware is capable of performing a lot better than Nvidia allows it to, purely due to marketing.

The Titan XP could and should be a deep learning card if Nvidia didn't artificially cripple it via firmware/drivers to force those markets to buy Quadros.

For all intents and purposes the Titan XP is the inbred child of a GeForce father and a Quadro mother. What it lacks in some things it gains in others.
Also, we tend to forget just what kind of outfits buy these cards. There are places out there that will spend stupidly high amounts of money just for the peace of mind that something will work, even if it doesn't perform any better.
 
Why do you continue to focus on "clock-for-clock"? (Other than maybe that's what you read in ATI press releases.)

And "theoretical TFLOPs" are nearly as useless. If the GPUs internal datapaths and schedulers can't hit the theoretical numbers, it's just a fraud.

For example, say that I have a 1 GHz clock on a GPU that delivers in real benchmarks 5 FP32 TFLOPs.

And, for example, say that I have another GPU that has a 2 GHz clock that delivers (in the same real benchmarks) 7 TFLOPs.

Will you say that the 7 TFLOP GPU is inferior to the 5 TFLOP GPU because it's not so good on the ridiculous "clock for clock" metric?

I think that most people would prefer the 7 TFLOP GPU over the 5 TFLOP GPU. (Especially if the 5 TFLOP GPU needs 100 more watts for poorer performance - looking at you Vega.)
It appears to me that you have zero clue what clock-for-clock, core-for-core means...


It basically means that, in FP32, a 3840-CUDA-core, 1.1 GHz GP100 chip is 38% faster than a 3840-core, 1.1 GHz GP102 chip.
It basically means that the normalized performance of a 3840-CUDA-core, 1.1 GHz GV100 chip is 7% higher than that of the GP100 chip.

Core-for-core, clock-for-clock has zero to do with TFLOPs numbers. I'm dumbfounded that you had no idea about this. This is what defines "IPC" performance.

Even if a GPU has 7 TFLOPs it can still be slower, because two things define the performance of a GPU: core throughput and software performance.

Code for the Kepler architecture was able to extract about 70% of its maximum potential; code for Maxwell/consumer Pascal, 75-80%. Those architectures were pretty straightforward: low IPC, high clock speeds.

For example, code for GCN1 extracted about 30% of its maximum potential. Code for GCN4, even though the architecture has not changed a bit from a compute perspective, extracts around 60%.

In the IPC comparison between GP100 and GV100, the code is the same for both architectures. Even knowing this, and remembering that the code should still be slightly tweaked for GV100 because of its different scheduling and core layout, the fact that it still showed 7% higher IPC than GP100 is very meaningful, and very telling.

In essence: that 5 TFLOPs GPU can still be faster than the 7 TFLOPs GPU. And this has happened in the past: the GTX 980 was faster than the GTX 780 Ti in both compute and games, because of its new, higher-throughput core layout.
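For clarity, the "clock-for-clock, core-for-core" normalization described above is just a measured score divided by cores x clock. A minimal sketch; the scores are placeholders, not real measurements:

```python
# "Clock-for-clock, core-for-core": normalize a measured benchmark score
# by (cores x clock) so two chips can be compared per core, per clock.
# The scores below are placeholders, not real measurements.
def per_core_per_clock(score, cores, clock_ghz):
    return score / (cores * clock_ghz)

gp100 = per_core_per_clock(score=100.0, cores=3840, clock_ghz=1.1)
gv100 = per_core_per_clock(score=107.0, cores=3840, clock_ghz=1.1)

print(f"GV100 vs GP100, per core per clock: "
      f"{(gv100 / gp100 - 1) * 100:+.0f}%")   # +7% with these placeholder scores
```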
And it smashes the Geforce Titan XP in other applications...

Games are a bit of an anomaly due to Nvidia gameworks, in real world compute, Vega hits Nvidia pretty hard.

You talk about "real apps" and then use games for reference...
This forum, if you hang around for a while, will tell you that the only way you should judge a GPU's usefulness is by looking at game benchmarks. Not compute, not editing, not anything meaningful - just games. That's how professionals judge GPUs these days.

I suppose we are on Pro-Gamer forum. ;)
 
This forum, if you hang around for a while, will tell you that the only way you should judge a GPU's usefulness is by looking at game benchmarks. Not compute, not editing, not anything meaningful - just games. That's how professionals judge GPUs these days.

I suppose we are on Pro-Gamer forum. ;)

Haha, I would laugh, but it's kind of sad how right this statement is as a description.
I would add one more thing as a statement that describes this forum: it's better to use slower hardware, for which software is optimized, rather than to optimize the software for faster hardware.

Unfortunately we don't have much say on software development. It's like my post further up this page regarding my recent experience researching GPU performance in various software.
 
We agree.

And I'm talking about benchmarked performance, not theoretical TFLOPs.
Benchmarked performance has zero to do with hardware, but is all about software. It is beyond me how you are unable to see this.
 
Benchmarked performance has zero to do with hardware, but is all about software. It is beyond me how you are unable to see this.

Oh, so the only reason a GTX 1030 can't beat a Vega 64 is that NVIDIA has crippled its performance in the drivers? Wow.
 
[snip wall of charts]

This is why my "clock-for-clock, core-for-core" comparison is meaningful. It's a way of measuring the IPC of each GPU. TFLOPs numbers are meaningless in this context. It's always core throughput and software maturity.
It's still not useful if you are looking for the fastest card for your important tasks.

You look at benchmarks - actual delivered performance - for that.

If the 1 GHz card runs your job in 100 seconds, and the 2 GHz card runs it in 75 seconds - you'll probably want the 2 GHz card unless its price and/or power consumption make it impractical.

I doubt that anyone would buy the 1 GHz card just because it has better performance per clock (PPC). They might buy it because it's cheaper, or because it fits their power envelope - but they're not going to buy a slower card because of abstract PPC specs.

That's why I think that PPC is a rather useless metric.

And I say "PPC" because that's what you are talking about. If a GPU has instructions that do more (short vector arithmetic, tensor cores, ...) per instruction, then the actual IPC might be much lower than the PPC. (A GPU executing fewer, but more powerful, instructions per clock might outperform a GPU executing more, but simpler, instructions per clock.)
 
[MOD NOTE]
Closing the thread down, as it's devolved into bickering and arguing.
 