OK, I did not want to put it this way. Because DirectX 11 is a serial API, it will always bottleneck AMD GPUs. What happens when we look at DirectX 12 benchmarks? A completely different story.
Let's compare a 180W GPU with a 180W GPU, both with around 4 TFLOPs of compute power.
http://cdn.wccftech.com/wp-content/uploads/2016/03/Hitman-PC-DirectX-12-Benchmarks_3.jpg
Can you count the TFLOPs numbers? And the performance of both GPUs?
Now let's look at 250W GPUs, both with the same core count and similar TFLOPs performance.
http://cdn.wccftech.com/wp-content/uploads/2016/03/Hitman-PC-DirectX-12-Benchmarks_2.jpg Reference GPUs.
How come a 4 TFLOPs, 180W GPU has almost exactly the same level of performance as another GPU with 4 TFLOPs of compute power and 180W of power draw?
How come a 6 TFLOPs, 250W GPU has similar performance to another 250W GPU with 6 TFLOPs of compute power?
Because the software is not bottlenecking it. You know what is even funnier? In the latest DX11 games the R9 290X is ALMOST on par with the GTX 980 Ti. How come? Because the hardware has a similar level of performance, and AMD's DX11 drivers are much better now. The same thing goes for... the 4 TFLOPs R9 380X and the 4 TFLOPs GTX 980. Like here, for example:
http://pic.yupoo.com/ztwss/Ftoyshb7/medish.jpg
http://pic.yupoo.com/ztwss/FtoysJQA/medish.jpg
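For anyone wondering where the "4 TFLOPs" and "6 TFLOPs" figures come from: they are theoretical FP32 peaks, roughly 2 operations (one fused multiply-add) per shader core per clock. Here is a rough sketch of that arithmetic. The core counts and clocks below are reference specs as I recall them, not numbers taken from the charts, so treat them as assumptions; actual boost clocks will move the results a bit.

```python
def fp32_tflops(shader_cores: int, clock_mhz: float) -> float:
    """Theoretical peak FP32 throughput in TFLOPs.

    Assumes 2 floating-point operations (one fused multiply-add)
    per shader core per clock cycle.
    """
    return 2 * shader_cores * clock_mhz * 1e6 / 1e12


# ASSUMED reference specs: (shader cores, clock in MHz).
# These are illustrative and not read off the benchmark charts.
cards = {
    "R9 380X": (2048, 970),
    "GTX 980": (2048, 1126),
    "R9 290X": (2816, 1000),
    "GTX 980 Ti": (2816, 1000),
}

for name, (cores, mhz) in cards.items():
    print(f"{name}: {fp32_tflops(cores, mhz):.2f} TFLOPs")
```

With those assumed specs the pairs land in the same ballpark on paper, which is the point: when the API is not the bottleneck, similar compute gives similar frame rates.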
Because of mindshare, and people like MVC, AMD is not considered comparable to Nvidia, even though their software has caught up. Most of the industry has caught up on software. That was the whole point at the beginning of my discussion with Zarniwoop.
But hey, MVC has a contract with Nvidia. He will always threadcrap about how awful AMD is, and how awful Apple is for using their hardware.