Hard to follow this. Can you explain, in plain terms, how you think the video is wired up and allocated?
The layout I think most probable, because Thunderbolt (TB) controllers are sensitive to placement and the designers will be juggling the distance between each GPU and its TB controller (option C in the original list; the other variants should be obvious derivatives of the one below):
GPU1 output 1 --> TB controller 1 input 1
GPU1 output 2 --> TB controller 1 input 2
GPU1 output 3 --> HDMI
GPU2 output 1 --> TB controller 2 input 1
GPU2 output 2 --> TB controller 2 input 2
GPU2 output 3 --> TB controller 3 input 1
GPU2 output 4 --> TB controller 3 input 2
The groups above are split the same way the edge connectors would be segmented on GPU cards that route their outputs to edge connectors. The user manual could simply say to start with the top two TB ports for video over TB and/or DisplayPort chains. HDMI is obviously just doing video output (or digital audio in weird corner cases). If you want to engage both GPU cards to split the load across two screens, start using the TB ports vertically from the top.
The current TB controllers have a maximum of two DisplayPort v1.1 inputs. I'm not sure whether DisplayPort v1.2 will keep that the same, whether the pass-through section will need more inputs to get more than two streams out on pure DP v1.2 traffic, or whether DP 1.2 just puts more streams on the wires already there (there has to be a limit to that, though).
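To make the layout above concrete, here is a minimal sketch that encodes it as a routing table and checks it against the two-DP-inputs-per-controller limit mentioned above. The names (ROUTING, MAX_DP_INPUTS_PER_TB, the helper functions) are hypothetical and only model the speculated wiring; this is not how Apple describes the board.

```python
from collections import defaultdict

# Hypothetical model of the speculated "option C" routing.
# Key: (gpu, output number). Value: ("TB", controller, input) or ("HDMI",).
ROUTING = {
    ("GPU1", 1): ("TB", 1, 1),
    ("GPU1", 2): ("TB", 1, 2),
    ("GPU1", 3): ("HDMI",),
    ("GPU2", 1): ("TB", 2, 1),
    ("GPU2", 2): ("TB", 2, 2),
    ("GPU2", 3): ("TB", 3, 1),
    ("GPU2", 4): ("TB", 3, 2),
}

# Assumption from the post: current TB controllers take at most two DP inputs.
MAX_DP_INPUTS_PER_TB = 2


def dp_inputs_per_controller(routing):
    """Count how many GPU outputs feed each TB controller."""
    counts = defaultdict(int)
    for dest in routing.values():
        if dest[0] == "TB":
            counts[dest[1]] += 1
    return dict(counts)


def feeding_gpus(routing):
    """Return the set of GPUs that feed each TB controller."""
    sources = defaultdict(set)
    for (gpu, _output), dest in routing.items():
        if dest[0] == "TB":
            sources[dest[1]].add(gpu)
    return dict(sources)


def within_limits(routing, limit=MAX_DP_INPUTS_PER_TB):
    """True if no TB controller is fed more DP inputs than the limit."""
    return all(n <= limit for n in dp_inputs_per_controller(routing).values())


if __name__ == "__main__":
    print(dp_inputs_per_controller(ROUTING))  # {1: 2, 2: 2, 3: 2}
    print(feeding_gpus(ROUTING))
    print(within_limits(ROUTING))  # True
```

Each controller is fed by exactly one GPU, which is the property that lets the controllers sit close to "their" GPU on the board.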
Frankly, if GPU2 is used primarily for GPGPU work, those outputs go unused most of the time. If folks only have one or two monitors, that is just fine. The DP inputs are there so the system passes TB compliance testing, which requires that the controllers carry video out.
In the current Macs with two GPUs, you can have something like:
GPU1 output 1 --------|
GPU2 output 1 ----> DP switch --> TB input 1
GPU2 output 2 ----> DP switch ---> TB input 2
GPU1 output 2 --------|
where the switches are synchronized so that both TB inputs come from GPU1 or both come from GPU2. That approach, or a more complicated routing approach, increases complexity for little gain. In a mobile machine you may want to switch because the GPUs are asymmetric (one is a much bigger power saver than the other; if they are closely matched, there isn't a lot of upside). With more TB controllers, the complexity goes up further.
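A minimal sketch of the synchronized switch idea, assuming a single shared select signal drives both DP switches. SWITCH_INPUTS and route() are hypothetical names used only for illustration.

```python
# Hypothetical model of two DP switches sharing one select signal.
# TB input -> {selected GPU: the GPU output that reaches this TB input}
SWITCH_INPUTS = {
    1: {"GPU1": ("GPU1", 1), "GPU2": ("GPU2", 1)},
    2: {"GPU1": ("GPU1", 2), "GPU2": ("GPU2", 2)},
}


def route(selected_gpu):
    """Return the GPU output seen on each TB input for one shared switch setting.

    Because the select signal is shared, every TB input is fed by the same GPU.
    """
    if selected_gpu not in ("GPU1", "GPU2"):
        raise ValueError(f"unknown GPU: {selected_gpu}")
    return {tb_input: sources[selected_gpu] for tb_input, sources in SWITCH_INPUTS.items()}


if __name__ == "__main__":
    print(route("GPU1"))  # {1: ('GPU1', 1), 2: ('GPU1', 2)}
    print(route("GPU2"))  # {1: ('GPU2', 1), 2: ('GPU2', 2)}
```

Every extra TB controller would need its own pair of switch entries here, which is the added complexity the post argues buys little when the two GPUs are matched.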
Making GPU2 push all of its frame buffer into GPU1 doesn't really make things any easier. In fact, it probably slightly increases trace lengths, since with the split layout the TB controllers can be skewed slightly on the board to sit closer to the GPU they hook to, while all three stay packed relatively close to the physical ports. The all-on-GPU1 alternative would look like:
GPU1 output 1 --> TB 1 input 1
GPU1 output 2 --> TB 1 input 2
GPU1 output 3 --> TB 2 input 1
GPU1 output 4 --> TB 2 input 2
GPU1 output 5 --> TB 3 input 1
GPU1 output 6 --> TB 3 input 2
GPU1 output 7 --> HDMI
GPU1 is now so loaded down that it would have to farm work out to GPU2 and still deliver high performance.
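A quick way to see the load difference is to count how many display outputs each GPU must drive in the two layouts. This is a minimal sketch; the layout lists and destination labels are hypothetical shorthand for the routings listed in this post.

```python
from collections import Counter

# Hypothetical (gpu, destination) pairs for the two layouts discussed above.
SPLIT_LAYOUT = [
    ("GPU1", "TB1-in1"), ("GPU1", "TB1-in2"), ("GPU1", "HDMI"),
    ("GPU2", "TB2-in1"), ("GPU2", "TB2-in2"),
    ("GPU2", "TB3-in1"), ("GPU2", "TB3-in2"),
]

ALL_ON_GPU1_LAYOUT = [
    ("GPU1", dest)
    for dest in ("TB1-in1", "TB1-in2", "TB2-in1", "TB2-in2", "TB3-in1", "TB3-in2", "HDMI")
]


def outputs_per_gpu(layout):
    """Count how many display outputs each GPU has to drive."""
    return dict(Counter(gpu for gpu, _dest in layout))


if __name__ == "__main__":
    print(outputs_per_gpu(SPLIT_LAYOUT))        # {'GPU1': 3, 'GPU2': 4}
    print(outputs_per_gpu(ALL_ON_GPU1_LAYOUT))  # {'GPU1': 7}
```

Both layouts serve the same seven destinations; the difference is that the second one puts all seven on GPU1 and leaves GPU2 with no display outputs at all.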