For the second paragraph: Nvidia will cut manufacturing costs, and we know that GDDR6 is coming to GPUs next year. AMD is not going to use it; they are going to use HBM2, so Nvidia is the obvious candidate. And those memory controllers cost both die size and power consumption. That covers your first paragraph. I'd like to see links for the speculation in your second paragraph.
The third paragraph is clearly opinion, so no explicit need for links, although it would be nice if you explained your reasoning.
As for the third paragraph: if there is no process change for the design, and we are looking at a bigger GPU on the same process as GV102, with similar but slightly different memory tech, it will result in higher TDP and similar theoretical performance. Bear in mind, however, that we are most likely looking at a 64-core/256 KB register file SM layout, so the throughput of the cores will be higher.
Compare performance between GP100 and GP102 to see how much more performance you can expect from the new layout. There are benchmarks in the wild comparing both architectures.
I have no idea where you see fanboyism in my post, but I guess fanboyism is in the eye of the beholder... Fine, let's try another way to estimate performance.
GP106 - 200 mm2 - 5120 GFLOPS - 25.6 GFLOPS/mm2
GP104 - 314 mm2 - 9216 GFLOPS - 29.4 GFLOPS/mm2
GP102 - 471 mm2 - 12150 GFLOPS - 25.8 GFLOPS/mm2
So without thinking about GP100 or GV100, let's see what we get with a potential GV102.
First, let's be conservative and say we keep the ~25 GFLOPS/mm2 of GP102:
GV102 - 600 mm2 - 25 GFLOPS/mm2 - 15000 GFLOPS
Or let's say Nvidia does a little better and manages 27.5 GFLOPS/mm2:
GV102 - 600 mm2 - 27.5 GFLOPS/mm2 - 16500 GFLOPS
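The extrapolation above is simple enough to check yourself. A quick sketch, using only the die sizes, throughput figures, and assumed densities from this post (the 600 mm2 GV102 is hypothetical):

```python
# Project GV102 throughput from the GFLOPS-per-mm2 density of Pascal dies.
# (die area in mm2, theoretical GFLOPS) -- figures as quoted above.
pascal = {
    "GP106": (200, 5120),
    "GP104": (314, 9216),
    "GP102": (471, 12150),
}

# Compute density for each known die.
for name, (area_mm2, gflops) in pascal.items():
    print(f"{name}: {gflops / area_mm2:.1f} GFLOPS/mm2")

# Hypothetical 600 mm2 GV102 at a conservative and a slightly
# optimistic density.
for density in (25.0, 27.5):
    print(f"GV102 @ {density} GFLOPS/mm2 -> {600 * density:.0f} GFLOPS")
```

Nothing fancy, just area times density, but it makes clear which numbers are measured and which are assumed.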
Nvidia is relatively predictable in what they do with their GPUs, although it was very surprising to see them announce a gigantic 815 mm2 GPU. Don't forget that Nvidia could go bigger than 600 mm2 with GV102, since GV100 shows they obviously can.
Math, not fanboyism, is the best way to project their next-generation GPUs.
So how do you think Nvidia will be able to achieve 16 TFLOPS on the same node, with a smaller die that has more memory controllers, which cut into die size even more?
The 815 mm2 GPU, with just 4 memory controllers that are smaller than GDDR6 ones, has only 5376 CUDA cores.
How many CUDA cores will you be able to fit in a 600 mm2 die on 12 nm (really 16 nm), with twelve 32-bit memory controllers giving a 384-bit memory bus, even accounting for higher density?
Let me give you an example, if you want maths. To get to 16500 GFLOPS, you need 5120 CUDA cores clocked at 1.6 GHz. Doable, knowing those requirements?
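For the core-count side of that check, assuming the usual 2 FMA FLOPs per core per cycle (the 5120-core / 1.6 GHz configuration is the hypothetical one from this post, not an announced part):

```python
# Theoretical GFLOPS = cores * clock_GHz * FLOPs per cycle.
# One fused multiply-add counts as 2 FLOPs, hence the default of 2.
def gflops(cores, clock_ghz, flops_per_cycle=2):
    return cores * clock_ghz * flops_per_cycle

# Hypothetical GV102-class config: 5120 cores at 1.6 GHz.
print(gflops(5120, 1.6))  # 16384.0 GFLOPS, i.e. ~16.4 TFLOPS
```

So 5120 cores at 1.6 GHz lands right around the 16500 GFLOPS target from the density estimate above.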
That is why I don't expect more than 13-14 TFLOPS out of a 600 mm2 GPU on 12 nm in a 250 W TDP, and not more than 13 TFLOPS out of a GPU on 16 nm FF+ in a 300 W TDP.