For those who are interested in hardware, not fanboyism.
Nvidia has finally released the Volta whitepaper.
https://images.nvidia.com/content/volta-architecture/pdf/Volta-Architecture-Whitepaper-v1.0.pdf
Two things are important.
Simultaneous Execution of FP32 and INT32 Operations

Unlike Pascal GPUs, which could not execute FP32 and INT32 instructions simultaneously, the Volta GV100 SM includes separate FP32 and INT32 cores, allowing simultaneous execution of FP32 and INT32 operations at full throughput, while also increasing instruction issue throughput. Dependent instruction issue latency is also reduced for core FMA math operations, requiring only four clock cycles on Volta, compared to six cycles on Pascal.
Many applications have inner loops that perform pointer arithmetic (integer memory address calculations) combined with floating-point computations that will benefit from simultaneous execution of FP32 and INT32 instructions. Each iteration of a pipelined loop can update addresses (INT32 pointer arithmetic) and load data for the next iteration while simultaneously processing the current iteration in FP32.
This basically means that Nvidia finally has 1:1 parity with GCN in compute throughput, and will not have to rely on software.
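To put rough numbers on the quoted passage, here is a back-of-the-envelope sketch in Python. It is a toy model, not how the scheduler really works: I assume each pipe issues one instruction per cycle, ignore memory stalls and dual-issue limits, and the function names and the 4 FMA + 2 integer-op loop body are made up for illustration. Only the 4-cycle and 6-cycle latencies come from the whitepaper.

```python
def issue_cycles(fp32_ops, int32_ops, separate_pipes):
    """Cycles to issue one loop iteration's math in the toy model.

    Shared pipe: FP32 and INT32 instructions queue behind each other.
    Separate pipes: the two streams overlap, so the longer one dominates.
    """
    if separate_pipes:
        return max(fp32_ops, int32_ops)
    return fp32_ops + int32_ops


def dependent_chain_cycles(n_fma, latency):
    """Cycles for a chain of FMAs where each one waits on the previous result."""
    return n_fma * latency


# Hypothetical loop body: 4 FMAs plus 2 integer address updates.
print(issue_cycles(4, 2, separate_pipes=False))  # 6 (shared pipe, Pascal-like)
print(issue_cycles(4, 2, separate_pipes=True))   # 4 (separate pipes, Volta-like)

# 100 back-to-back dependent FMAs, latencies from the whitepaper.
print(dependent_chain_cycles(100, latency=6))    # 600 (Pascal)
print(dependent_chain_cycles(100, latency=4))    # 400 (Volta)
```

In this model the integer address math is free as long as a loop has fewer integer ops than FP32 ops, which is the case the whitepaper describes.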
We have to look at the layout this way:
64 FP32 cores per SM with a 256 KB register file, a warp size of 32 threads, and a 4-cycle dependent issue latency. This is the first hardware layout from Nvidia that I am content with, and the first that I know will not need software to gain performance, because the hardware alone is capable enough. The separate FP32 and INT32 cores also matter for both throughput and latency. One last thing: the larger L1 cache will reduce latency even more, and increase the bandwidth and resources available to the cores.
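For a feel of what a 256 KB register file means per thread, here is a small Python calculation. The 256 KB and the warp size of 32 are from the numbers above. The limit of 2048 resident threads per SM is my assumption from the CUDA limits for this generation, and the helper names are made up. Real hardware allocates registers in coarser chunks, so treat the results as upper bounds.

```python
REGISTER_FILE_BYTES = 256 * 1024  # per SM, from the whitepaper
BYTES_PER_REGISTER = 4            # 32-bit registers
WARP_SIZE = 32                    # threads per warp
MAX_THREADS_PER_SM = 2048         # assumption: resident-thread limit per SM


def registers_per_sm():
    """Number of 32-bit registers in one SM's register file."""
    return REGISTER_FILE_BYTES // BYTES_PER_REGISTER


def max_resident_warps(regs_per_thread):
    """Warps that fit in the register file, capped by the thread limit.

    Ignores allocation granularity, shared memory, and block-size limits.
    """
    by_registers = registers_per_sm() // (regs_per_thread * WARP_SIZE)
    by_threads = MAX_THREADS_PER_SM // WARP_SIZE
    return min(by_registers, by_threads)


print(registers_per_sm())       # 65536
print(max_resident_warps(32))   # 64 (full occupancy)
print(max_resident_warps(64))   # 32
print(max_resident_warps(255))  # 8
```

So a kernel can use up to 32 registers per thread and still keep the SM full of warps, under the assumptions above.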
Volta finally has proper compute capabilities, on par with GCN in sheer throughput. AMD will have a huge problem competing with Nvidia, because for the first time Nvidia has compute hardware that is, if not ahead, at least on par, and its software is simply better.