IMO, it will still be a similar setup to today's: a 140W Intel CPU plus dual 125-130W GPUs.
I was theorycrafting what is possible with 16 nm FF+, based on what Nvidia was able to achieve on this process.
They pretty much got that 70% reduction in power consumption, or 60% higher frequency, at about twice the density (roughly 50% less area for the same logic).
The 3840 CC chip from Nvidia (GP102) has a 476 mm2 die. Let's compare this to the 3072 CC, 600 mm2 chip from the 28 nm generation (GM200).
Same TDP on both chips, 250W, but the clocks are on average 55% higher on GP102. Everything they achieved there was due to the process itself.
GTX 1080 - 1.9 GHz boost clocks, at 200W average power consumption, with 2560 CCs.
Tesla P4 - the same GP104 chip, with a 1000 MHz core clock, and... 75W TDP. That is about a third of the 2816 CC chip from the 28 nm generation (GTX 980 Ti).
Now let's apply this to what AMD can have. There are three ways to reach the rumored 12 TFLOPs of compute performance:
3072 GCN core chip with 2 GHz core clock.
4096 GCN core chip with 1.5 GHz core clock.
6144 GCN core chip with 1 GHz core clock.
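A quick sanity check on those three options. This is just a sketch that assumes the usual peak FP32 formula (one fused multiply-add, so 2 FLOPs, per core per clock); the function name is made up for illustration.

```python
def fp32_tflops(cores: int, clock_ghz: float) -> float:
    """Peak FP32 throughput, assuming 2 FLOPs (one FMA) per core per clock."""
    return cores * 2 * clock_ghz / 1000.0

# The three configurations listed above: (GCN cores, core clock in GHz)
configs = [(3072, 2.0), (4096, 1.5), (6144, 1.0)]

for cores, clock in configs:
    print(f"{cores} cores @ {clock} GHz -> {fp32_tflops(cores, clock):.2f} TFLOPs")
```

All three land on the same peak figure, a bit above 12 TFLOPs, so the real question is which mix of width and clock is the most power efficient.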
AMD touts that the upcoming Vega architecture will have 3 times higher perf/watt compared to previous generation GPUs. If they count it from FPS in games, no problem at all: new graphics technology, plus much improved throughput of the GPU and much higher clocks, will tell the whole story.
So let's simply use Fiji as a reference here. The 4096 GCN core chip on the 28 nm process consumed 246W on average, with 280W peaks under load. Scaling it to 16 nm FF+ gives us around 120W on average under load, with 140W peaks, at... 1.25 GHz core clocks. The process scales for up to a 70% reduction in power, so it can be a 50% reduction in power and 20% higher frequency at the same time. Here, however, we have 1.5 GHz core clocks. So how will it fare? Knowing AMD's past, it will consume much more: 50-60W more from the clocks alone. So we are looking at a 1.5 GHz, 4096 GCN core chip with around 185W of TDP.
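Here is the same back-of-the-envelope estimate as a few lines of Python, so the numbers can be tweaked. It is only a sketch of my reasoning: the 50% power cut and the 20% clock bump are the split assumed above, the 50-60W penalty for going to 1.5 GHz is my guess, and the 1.05 GHz Fiji clock is the Fury X reference clock (not something stated above).

```python
FIJI_AVG_W = 246.0      # average power under load, 28 nm
FIJI_PEAK_W = 280.0     # peak power under load, 28 nm
FIJI_CLOCK_GHZ = 1.05   # assumption: Fury X reference core clock

POWER_REDUCTION = 0.50  # assumed share of the process gain spent on power
CLOCK_GAIN = 0.20       # assumed share of the process gain spent on frequency
EXTRA_W_RANGE = (50.0, 60.0)  # guessed penalty for pushing clocks to 1.5 GHz

def scale_power(power_w: float, reduction: float) -> float:
    """Scale a power figure down by a fractional reduction."""
    return power_w * (1.0 - reduction)

avg_16nm = scale_power(FIJI_AVG_W, POWER_REDUCTION)
peak_16nm = scale_power(FIJI_PEAK_W, POWER_REDUCTION)
clock_16nm = FIJI_CLOCK_GHZ * (1.0 + CLOCK_GAIN)

# Estimated average power once the chip is pushed from ~1.25 GHz to 1.5 GHz
est_low = avg_16nm + EXTRA_W_RANGE[0]
est_high = avg_16nm + EXTRA_W_RANGE[1]

print(f"Shrunk Fiji: {avg_16nm:.0f}W avg, {peak_16nm:.0f}W peak @ {clock_16nm:.2f} GHz")
print(f"At 1.5 GHz: roughly {est_low:.0f}-{est_high:.0f}W")
```

With these inputs the shrunk chip sits at about 123W average and 140W peak at 1.26 GHz, and the 1.5 GHz estimate comes out a little under the 185W I quoted, which is close enough for this kind of guesswork.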
It's hard to predict how it will behave. If AMD has created a more balanced architecture, with much higher throughput, we can expect a breakthrough for them, similar to what Maxwell was for Nvidia. I know you will say that this still leaves them behind Nvidia. I will not debate it until I see the end results; however, I am highly skeptical that it will leave them behind Nvidia. AMD already had a Maxwell-like increase in perf/watt from the process shrink alone. Same thing as Nvidia with Pascal: they reused the Maxwell arch on 16 nm, called it Pascal, and finally added dynamic scheduling.
Why do I talk about graphics throughput? There is absolutely no point in GPUs having core clocks higher than 1 GHz if their graphics throughput is not very robust. Maxwell gave Nvidia a big leap in throughput, and they used very high clocks for the 28 nm process on all of their GPUs (1.2 GHz was not uncommon). Pascal only increased this throughput further, with even higher core clocks.
Now it's AMD's turn. I'm not very optimistic about them achieving 3 times higher perf/watt if we directly compare power consumption to compute output. On graphics, it depends on what they change in their Graphics IPv9 compared to Polaris, Tonga and Fiji. Here they can get much, much closer to this.
If I have to give my opinion on how it will perform, and on the specs: a 4096 GCN core chip at a 1.5 GHz core clock, with 16 GB of HBM2 at 1 TB/s and 96 ROPs, can be 20% faster than the Pascal Titan X, and directly compete with Volta/GP100 reused in a mainstream core.
The last bit: is this the smaller or the bigger Vega chip? There will be quite a big performance gap between the RX 480 and, potentially, the RX 490/RX Fury.
And one last bit. For the Mac Pro, the best thing would be a coherent fabric that unifies all of the components of the computer, so that it gets the highest possible efficiency/performance from the parts. If the Mac Pro uses Vega GPUs, they will be extremely heavily bottlenecked by the CPUs. The same thing happens with the Pascal Titan X.
Edit: It turns out that Videocardz has quite a nice compilation of Vega rumors.
http://videocardz.com/63700/exclusive-first-details-about-amd-vega10-and-vega20
Vega 10 - 225W TBP, most likely because it has only a single 8-pin connector. So it can even fall in line with my personal expectations.
Vega 11 - fewer than 4096 GCN cores, more than 2304, so my sources were correct on this.