That's fine, keep ignoring my point.

If the game is limited by nothing but raw shader math horsepower, then TFLOPs will give an accurate representation of how well a given GPU will run the game. The game runs faster on GTX 1060 than raw TFLOPs suggests. My conclusion is that the game is not limited by raw TFLOPs. Your conclusion is that I'm an idiot and NVIDIA and AMD are artificially limiting the performance in this game in a grand conspiracy to make the GTX 1060 look better than it actually is. I wonder which is more likely to be correct?

Is stuff like the delta memory compression taken into consideration in your raw TFLOPs measurement?

http://www.anandtech.com/show/10325/the-nvidia-geforce-gtx-1080-and-1070-founders-edition-review/8

I don't think it is, and it's one of the major new things in the Pascal architecture.

Oh look, there's a better implementation of asynchronous compute as well:

http://www.anandtech.com/show/10325/the-nvidia-geforce-gtx-1080-and-1070-founders-edition-review/9

Let's just ignore all of the non-shader related improvements in Pascal, and the real benchmark data, and conclude that I don't know what I'm talking about.

Again, my fundamental point relates to the comparison between the 1060 and the RX 480. On paper, the RX 480 should destroy the 1060, but in nearly all game benchmarks the 1060 wins (and often wins by a significant margin). How do you explain this?
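
For reference, here's how those on-paper numbers are usually derived. This is a minimal sketch: the formula is the standard 2 FLOPs (one fused multiply-add) per shader core per clock for FP32, and the clocks are the commonly quoted reference boost clocks, so actual cards will vary.

```python
# On-paper FP32 TFLOPs = 2 FLOPs (one fused multiply-add) x shader cores x clock.
# Clocks below are commonly quoted reference boost clocks; real cards vary.
gpus = {
    "GTX 1060":   (1280, 1.708),   # (shader cores, clock in GHz)
    "GTX 980 Ti": (2816, 1.075),
    "RX 480":     (2304, 1.266),
}

for name, (cores, clock_ghz) in gpus.items():
    tflops = 2 * cores * clock_ghz / 1000
    print(f"{name:11s} {tflops:.2f} TFLOPs")

# GTX 1060    4.37 TFLOPs
# GTX 980 Ti  6.05 TFLOPs
# RX 480      5.83 TFLOPs   (~33% more than the 1060 on paper)
```
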
Delta memory compression can help in a memory-bandwidth-constrained environment. But the GTX 980 Ti has the same compression as the GTX 1060 and much higher memory bandwidth, so it will not help here.

Asynchronous compute would help. But what we have seen to date is that the GTX 1060 loses performance with asynchronous compute turned on in DX12 games. Secondly, for Pascal GPUs the difference is within 1-2% of performance when this setting is on.

And again: the difference in raw performance between the two GPUs is 4.4 vs 5.9 TFLOPs.

I do not know why you keep bringing up the RX 480.
 

What? Pascal has a whole new generation of memory compression, it's not the same as Maxwell.

[Image: NVIDIA Pascal Editor's Day slide on memory compression]


Let's circle back to the original point. 980 Ti beats 1060 in raw TFLOPs by a lot. 1060 beats the 980 Ti in a real game. Why do you think this result has anything to do with the raw TFLOPs score? Why is the simple answer of "the game is not limited by raw TFLOPs" so hard for you to accept?

The reason I keep bringing up the RX 480 is because it also "beats" the 1060 in terms of raw TFLOPs by something like 30%. How do you explain results like this, then?

[Image: game benchmark chart]


Is AMD also limiting the performance of their new generation of cards to make the 1060 look better than it actually is? I find that very hard to believe, but that's a logical extension of the argument you're making.
 
Again, memory compression will help you in memory-constrained environments: for example at 4K resolution, or if your GPU has a small amount of memory, like the GTX 1060 3 GB. It will not help you in a situation where you are not memory constrained, like this particular case.

Maybe I have not put this simply enough for you to understand: there is nothing in Pascal GPUs that would make them faster clock for clock than the Maxwell architecture. The only thing that can make the GTX 980 Ti slower in any scenario is gimping its performance through drivers. The end.
 

Another piece of evidence against that async compute claim is the 3DMark Time Spy benchmark, which specifically tests async compute performance. Pascal-based GPUs saw an increase of 5-7% with async compute enabled, versus no performance difference on Maxwell-based GPUs.
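
As a rough sketch of the mechanism (a toy model, not how 3DMark measures anything, and with made-up workload numbers): async compute only pays off when the scheduler can fill idle shader time with compute work.

```python
# Toy model: a frame's graphics work leaves some fraction of the shader array
# idle (fixed-function bottlenecks, stalls). With dynamic scheduling (Pascal),
# compute work can overlap into those idle slots; without it (Maxwell, in
# practice), graphics and compute end up effectively serialized.
def frame_time_ms(graphics, compute, idle_fraction, can_overlap):
    if not can_overlap:
        return graphics + compute           # serialized
    absorbed = min(compute, graphics * idle_fraction)
    return graphics + compute - absorbed    # overlapped into idle bubbles

g, c = 15.0, 1.0   # made-up per-frame workloads, in milliseconds
print(frame_time_ms(g, c, 0.10, can_overlap=False))  # 16.0 ms
print(frame_time_ms(g, c, 0.10, can_overlap=True))   # 15.0 ms, ~6-7% faster
```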


And like I said above, nothing needs to make Pascal faster clock for clock; it can simply use the same resources more efficiently, which is exactly what Pascal does in DX12.
 

Okay I'm done, putting you on ignore with SCSC. I'm just really tired of discussing these things with rabid AMD fanboys. We've pointed out several things that have been proven to make Pascal perform better than Maxwell, which you then simply ignore. Occam's Razor suggests the simple explanation of "the performance must be affected by something other than raw TFLOPs" is very likely what's going on, but sure, keep believing that NVIDIA and AMD are artificially limiting performance in this and other games.
 
Somebody should have warned you. Sorry about that.
 
It doesn't use resources any more efficiently. It's exactly the same architecture, with only the addition of dynamic scheduling, which does not help anywhere other than asynchronous compute. In raw performance numbers, again, the GTX 980 Ti should be faster than the GTX 1060 in DX12. There is nothing in the GTX 1060 that would make it otherwise.
That something other than raw TFLOPs is Nvidia's drivers. You are free to believe otherwise.

So far the only person trying to spin this into Nvidia vs. AMD is you. Goodbye, anyway.

The only thing apart from Nvidia drivers that would make the GTX 980 Ti perform so badly is the fact that Nvidia Maxwell GPU performance tanks in DX12.
 

Maybe go and read the page about the 4th-generation memory compression added in Pascal that I posted a link to? I'm pretty sure that counts as something new that will help Pascal GPUs perform better than Maxwell, even in DX12. And think about why AMD's RX 480 has something like 30% more raw horsepower, yet still loses to the 1060.

This is why TFLOPs is not the right metric for comparing GPUs, and why game benchmarks matter. If everything were simply TFLOPs limited, then all the tech review sites would have to do is list the TFLOPs number and call it a day, because it would perfectly predict performance in every application out there. Thing is, it doesn't predict real-world performance, because not every application is limited by raw TFLOPs horsepower, which is why those sites actually run games and post their results.
 
It helps in memory-constrained situations, not when you have enough memory. There is no difference between a 4 GB texture file on the GTX 980 Ti and 2.3 GB on the GTX 1060; that's how it works. You would see performance tank if it exceeded the bounds of physical memory size, like we see with the GTX 1060 3 GB in 4K in Gears of War 4 (0.1 FPS) vs the GTX 1060 6 GB (27.3 FPS) at the same resolution. How fast is the GTX 980 Ti here? 30.6 FPS. A 10% difference between GPUs that have over a 30% difference in overall performance.

The only thing apart from drivers that would make the GTX 980 Ti behave like this is the asynchronous compute feature being constantly on in the game's engine. And we know how that affects the performance of Maxwell GPUs.
 

That's not how NVIDIA's memory compression works. If you're rendering to a 4K surface, Pascal can do 4:1 and 8:1 compression (i.e. reducing bandwidth by up to 8x). Maxwell is limited to 2:1, so Pascal can get 4x better throughput in ideal cases. It doesn't magically make your 4 GB texture take up less space in video memory; it's just that the contents may be stored in a compressed format that requires significantly less memory bandwidth to read and write.

[Image: NVIDIA Pascal Editor's Day slide on delta color compression]


So, in the case in the bottom right, the GPU only has to read 8 pixels in order to get the full 8x8 pixel grid's worth of data. This can obviously (or maybe it's not so obvious?) have a massive impact in real-world performance in games like Gears of War.
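
A rough sketch of that anchor-plus-deltas idea (hypothetical and greatly simplified; NVIDIA's actual scheme is proprietary and works per channel, and the ratio labels here are illustrative):

```python
# Simplified delta color compression for one 8x8 tile of a single 8-bit channel:
# store an anchor pixel plus per-pixel deltas. If all deltas are tiny, the tile
# can be read/written in a compact form, saving bandwidth; otherwise store raw.
def compress_tile(tile):                    # tile: 64 one-channel pixel values
    anchor = tile[0]
    max_delta = max(abs(p - anchor) for p in tile)
    if max_delta == 0:
        return "8:1"                        # flat tile: anchor alone suffices
    if max_delta < 8:
        return "2:1"                        # deltas fit in a few bits
    return "1:1 (raw)"                      # too much variance: stored raw

flat     = [100] * 64
gradient = [100 + i // 16 for i in range(64)]        # gentle ramp, deltas 0..3
noisy    = [100 + (i * 37) % 50 for i in range(64)]  # high-variance content

for name, tile in [("flat", flat), ("gradient", gradient), ("noisy", noisy)]:
    print(f"{name}: {compress_tile(tile)}")
# flat: 8:1, gradient: 2:1, noisy: 1:1 (raw)
```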
 
That does not matter. What matters is how it affects the performance of the GPU, and the overall amount consumed from the physical memory pool. Memory compression will not help you in situations where you are not memory constrained. And the GTX 980 Ti is not constrained by the amount of GPU memory it has, and will not be for a few months to come.

In the Gears of War 4 benchmarks at 4K, the GTX 1060 3 GB is constrained by memory size, which is why it scores 0.1 FPS at that resolution. The GTX 1060 6 GB is not. Neither is the GTX 980 Ti.
 

Okay, two points:

- Raw TFLOPs will not help you when you are not constrained by shader horsepower. Thus, just because GPU A has more TFLOPs than GPU B doesn't mean it will run any faster for certain applications. Right?

- Again, this has nothing to do with memory footprint, though that can be an issue. If the working set of a game does not fit in video memory, then as you point out, the game will run very poorly. Pascal's memory compression has nothing to do with the overall working set size. It has everything to do with the GPU reading from and writing to resources in that working set, which is generally a very important aspect of a modern game engine.

Let's take a concrete example. Let's say you have a post-processing pass in your game engine that is ROP limited. Maxwell has 2:1 memory compression, and so for a given 8x8 or 64-pixel tile, it can write out only 32 pixels (e.g. 128 bytes for RGBA8). Pascal can get up to 8:1 memory compression, which means for a given 8x8 or 64-pixel tile, it can write out only 8 pixels (e.g. 32 bytes for RGBA8). So, Pascal is capable of writing out 4 times less data for the same shader pass, which might translate into an overall 4x speedup for the given shader execution. If you apply this kind of speedup to a bunch of different ROP-limited shaders, you might see a 10, 20, or even 30% gain across the whole frame.
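
Putting rough numbers on that example (same assumptions as the paragraph above: an RGBA8 render target, 8x8 tiles, and ideal compression on every tile):

```python
# Bytes written per 8x8 tile of an RGBA8 target: 64 pixels x 4 bytes = 256 raw.
tile_bytes_raw = 64 * 4

for arch, ratio in [("Maxwell, 2:1", 2), ("Pascal, 8:1", 8)]:
    print(f"{arch}: {tile_bytes_raw // ratio} bytes per tile")

# Maxwell, 2:1: 128 bytes per tile
# Pascal,  8:1:  32 bytes per tile  -> 4x less write traffic for the same pass
```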

Oh hey, this kind of lines up with the real-world benchmark data that you were posting, where Pascal is performing better than the raw TFLOPs suggests it should. That indicates that the game in question is not simply limited by raw TFLOPS, and many other things (including a reduction in the amount of memory that needs to be read and written via Pascal's 4th generation compression) can improve performance.
 
Would it kill you to write one or two sentences that give us a little hint as to why we might want to look at that link?

I don't follow blind links (and even if the full URL gives a good hint as to the content, the board often trims the URL and obscures that).

For example, ".../editorial-nvidia-geforce-20-series" is uninteresting.

And "...rx480-in-tomb-raiders-2017" is uninteresting.

But ".../pascal-1060-destroys-rx480-in-tomb-raiders-2017" would probably get a click.

Seriously, koyoot, short descriptions of what's behind the links would help you a lot.
 
When even wccftech (in my opinion little more than a rumor copy-and-paste site) says it's pure, unadulterated speculation, you've got to believe it is! Or do they only say that about other sites' pure, unadulterated speculation?
We will see if this is speculation. Rumors are already floating around the industry about a Pascal refresh. Why would Nvidia do this? Well, in this industry nothing happens without a reason. AMD's Vega architecture is coming, and quite possibly a Polaris refresh on Samsung's process (the same one making the embedded versions of Polaris GPUs).

Nvidia has a brand to protect.
 
I have a sneaking suspicion Apple will not put new Nvidia cards in any of the Macs until all cMPs have gone obsolete. Same intention as not implementing Pascal support in the newest driver releases.

However, I can wait; I installed a GTX Titan X, which will serve me well for the near future. I hope it will not be the last card in my cMP.
 
Guys, let's cut to the chase. There will be no Nvidia GPUs in Macs for the foreseeable future. The end.
 

True, but I don't think anyone is speculating about Apple including Pascal in Macs.

There are no Maxwell GPUs delivered in Macs either, yet there are Maxwell drivers, so it's not unreasonable for people to speculate about Pascal drivers.
 
If that thing can be called a Maxwell driver. In the PC world, if they had such bad drivers there would be petitions, protests, and class-action lawsuits. They make enough noise if a game is released with simple bugs.
 

https://en.wikipedia.org/wiki/Software_release_life_cycle#Beta

Beta phase generally begins when the software is feature complete but likely to contain a number of known or unknown bugs. Software in the beta phase will generally have many more bugs in it than completed software, as well as speed/performance issues and may still cause crashes or data loss.

NVIDIA has been pretty clear about the level of support for Maxwell GPUs on macOS. They explicitly say it's beta support, which means it's not perfect. Don't use beta drivers on gaming cards for professional work, e.g. Adobe apps, if you don't want to deal with any instability.
 
First reviews for GTX 1050 Ti. Very good performance and efficiency in gaming. I like this GPU very much.
 

GP107 is a nice little chip. It maintains the same efficiency as other Pascal GPUs despite being manufactured on Samsung/GlobalFoundries' process rather than TSMC's.

I gotta admit I am starting to get worried about AMD. Nvidia continues to get more performance per watt and performance per die area. Hopefully Vega delivers and they can keep getting better gains out of modern APIs.
 