nVidia hardware is so far behind AMD that it jumps ahead of it in all the gaming benchmarks?
Show me the benchmarks. So far the only benchmarks where Nvidia pulls ahead are GameWorks titles and the high end, and even that is changing. Currently the Fury X ties the GTX 980 Ti in TechPowerUp's review suite at 1440p, and it is faster than the REFERENCE GTX 980 Ti in 11 of 15 games at 4K. Every other price bracket is dominated by AMD, which also has better pricing.
https://forums.macrumors.com/threads/2016-nmp.1952250/page-8#post-22545804 Read it. The whole thread is extremely interesting. Also check the Beyond3D and Guru3D forums, in the threads about Asynchronous Compute, DX12, etc.
All of this has been written already.
MVC does not know the answer to my questions, so I will answer them myself. At the hardware level, Nvidia GPUs have the GigaThread Engine, CUDA cores, cache and VRAM, and that's pretty much it. All of the "magic" and "efficiency" of Nvidia hardware is thanks to the drivers and software. There is absolutely no hardware scheduling. Nvidia got rid of it because the last time their hardware had it (Fermi), it ran hot and was inefficient and unreliable. It also made everyone reliant on Nvidia's driver optimization, so they could control the lifespan of their GPUs.
The GigaThread Engine feeds the cores, but without the drivers it has absolutely no clue what to do with an application. The drivers do all of the scheduling work. Now let's think about a world with low-level access to the hardware. What happens when your hardware must manage itself? Its performance tanks. What happens when it has to be managed asynchronously and the hardware has nothing useful on that side? Performance tanks. Every current DirectX 12 game benchmark shows that this is the case on the Nvidia side. And we have to remember that Ashes of the Singularity does not have compute done in the engine yet. It will in the future. What will happen to Nvidia then? Performance will tank. If any of you think Pascal will change this, there is no chance. It is basically Maxwell with FP64 and deep learning features.
AMD went the other way. They have hardware schedulers (only Fiji and Tonga so far) and Asynchronous Compute Engines. The hardware adapts itself to the task within the borders set by the engine. That is the whole point of low-level access. All you have in the drivers is the device name, the API driver and the system drivers, and that's it. No per-application optimization, nothing. All of it is done at the application level by developers. That is one of the reasons Apple went with AMD: they will not need to fiddle with drivers for each application. They will provide the API, and that's it. Developers will get to do everything from the ground up. To open this up, AMD recently launched an initiative called GPUOpen.
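To illustrate why asynchronous compute matters in this argument, here is a toy model, not a description of any real GPU: a frame's graphics work leaves some fraction of the shader cores idle, and compute work submitted on a separate queue can fill that slack instead of running afterwards. The function names, the idle fraction and the millisecond figures are all hypothetical.

```python
# Toy model only (hypothetical numbers, not measured on any GPU).
# Graphics work leaves a fraction of shader capacity idle; async compute
# can use that idle capacity instead of waiting for graphics to finish.


def serial_frame_time(graphics_ms: float, compute_ms: float) -> float:
    """Graphics and compute run back to back on a single queue."""
    return graphics_ms + compute_ms


def overlapped_frame_time(graphics_ms: float, compute_ms: float,
                          idle_fraction: float) -> float:
    """Compute runs concurrently and first soaks up the idle shader capacity
    left by graphics; whatever does not fit runs after graphics finishes."""
    if not 0.0 <= idle_fraction <= 1.0:
        raise ValueError("idle_fraction must be between 0 and 1")
    idle_capacity_ms = graphics_ms * idle_fraction
    leftover_ms = max(0.0, compute_ms - idle_capacity_ms)
    return graphics_ms + leftover_ms


if __name__ == "__main__":
    print(serial_frame_time(10.0, 3.0))            # 13.0
    print(overlapped_frame_time(10.0, 3.0, 0.25))  # 10.5
```

With 10 ms of graphics work, 3 ms of compute and a quarter of the cores idle, the serial frame takes 13 ms and the overlapped one 10.5 ms. The model deliberately ignores contention for caches, bandwidth and registers, which in practice can shrink or even erase the gain.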
If any of you think this is the wrong way, look up similar material about Vulkan, SPIR-V and OpenCL, and about Metal, OpenCL, OpenGL and Swift. What is even funnier is that Vulkan will include OpenCL.
Polaris will have a modified core, with a scheduler that can handle both DirectX 12 and DirectX 11. Here is an example of it:
http://forums.anandtech.com/showpost.php?p=38011479&postcount=171
(All of this means that the core count of AMD GPUs will not map onto performance the way it did in the past: a lower core count will deliver higher performance than it used to.)
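One way to read the point about core counts: effective shader throughput depends on how busy the cores are kept, not only on how many there are. Below is a back-of-the-envelope sketch, counting one fused multiply-add per core per clock as two floating-point operations. The utilization factor u is an assumption of this illustration, and the core counts and clocks are hypothetical round numbers, not real products.

```latex
\[
\text{FLOPS}_{\text{eff}} = 2 \cdot N_{\text{cores}} \cdot f_{\text{clock}} \cdot u
\]
\[
\text{GPU A: } 2 \cdot 4000 \cdot 1.0\,\text{GHz} \cdot 0.6 = 4800\ \text{GFLOPS}
\qquad
\text{GPU B: } 2 \cdot 2500 \cdot 1.2\,\text{GHz} \cdot 0.9 = 5400\ \text{GFLOPS}
\]
```

Under these assumed numbers, the GPU with 1500 fewer cores comes out ahead because it keeps them busier.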
Again, I encourage everyone to read this thread and pay attention to the posts from Zlatan, who is a game developer, and Mahigan. Also check the Beyond3D and Guru3D forums and their threads about DX12, scheduling, Asynchronous Compute, etc. Eye-opening. Link:
http://forums.anandtech.com/showthread.php?t=2462951&page=3