.....
But I'm still unclear about bus and port. If each TB2 port can handle 20 Gb/s of data each way, does that mean the bandwidth of the bus is 40 Gb/s each way, since there are two TB2 ports per bus?
Only if you double count. TB controllers are switches, so the figure that matters is the bandwidth across the switch, not the sum of its ports.
TB-dev1 <--- 20 Gb/s ---> TB-dev2 <--- 20 Gb/s ---> TB-dev3
If TB-dev1 wants to pass 20 Gb/s of data to TB-dev3, then one port of TB-dev2 has to take in 20 Gb/s and the other port has to output 20 Gb/s. It is the same data coming in and going out. That is the max bandwidth of TB data that is being specified. The only way to "max blast" a TB port is to use another TB port.
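A rough way to see the double counting, as a minimal sketch. The function names are made up for illustration; the only figure taken from the discussion is the 20 Gb/s per-port limit.

```python
PORT_LIMIT_GBPS = 20  # per-port TB2 figure quoted above


def port_traffic(through_gbps):
    """Traffic seen on each port of the middle device (TB-dev2) when
    TB-dev1 sends `through_gbps` to TB-dev3."""
    return {"in_port": through_gbps, "out_port": through_gbps}


def naive_sum(ports):
    # Adds both ports together, which counts the same data twice.
    return sum(ports.values())


def switched_throughput(ports):
    # Data actually moved across the switch: what comes in is what goes out.
    return min(ports.values())


ports = port_traffic(PORT_LIMIT_GBPS)
print(naive_sum(ports))            # 40
print(switched_throughput(ports))  # 20
```

The 40 only shows up when the incoming and outgoing copies of the same stream are added together.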
At any one specific target, the 'freeway' off-ramps are x4 PCI-e and DisplayPort 1.2. If you are offloading through only one of those at the destination, you are going to be extremely hard pressed to get 20 Gb/s "off the freeway". The individual off-ramps are not as 'wide' as the freeway, nor should they be.
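To put rough numbers on the off-ramps, here is a back-of-the-envelope sketch. It assumes the x4 PCI-e link is PCIe 2.0 (the generation is not stated above) and uses the standard 8b/10b line encoding for both PCIe 2.0 and DisplayPort 1.2; protocol overhead beyond the line encoding is ignored.

```python
TB2_CHANNEL_GBPS = 20.0  # TB2 figure quoted in this thread


def pcie2_payload_gbps(lanes):
    # Assumption: PCIe 2.0, 5 GT/s per lane, 8b/10b encoding.
    return lanes * 5.0 * 8 / 10


def dp12_payload_gbps(lanes=4):
    # DisplayPort 1.2 (HBR2): 5.4 Gb/s per lane, 8b/10b encoding.
    return lanes * 5.4 * 8 / 10


pcie = pcie2_payload_gbps(4)
dp = dp12_payload_gbps()
print(f"x4 PCIe 2.0 payload: {pcie:.2f} Gb/s")
print(f"DP 1.2 payload:      {dp:.2f} Gb/s")
print(pcie < TB2_CHANNEL_GBPS and dp < TB2_CHANNEL_GBPS)
```

Under those assumptions each single off-ramp tops out below 20 Gb/s, which is the point being made about the ramps being narrower than the freeway.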
Lastly, I read somewhere that you can't or shouldn't connect two 4k displays on the same bus, so you have to remember to connect the displays to say, port #1, #3, and #5.
Actually #1 and #3 are on the same TB bus.
http://support.apple.com/kb/HT5918
Apple's directive is not to hook more than 2 monitors to a single "TB bus". Pragmatically, a 4K monitor is going to count as at least 2 (since it has 4x the pixels of an HD display). The DisplayPort bandwidth to each "TB bus" is likely limited. Three 4K monitors probably means something like #1, #2, and one of HDMI, #5, or #6.
By "Thunderbolt bus" I think Apple means the combination of the TB controller and its two types of input: PCI-e and DisplayPort (there are multiple, but finite, DP signals coming in).
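The rule of thumb can be sketched as a small check. The port-to-bus table below is an assumption: it matches the statements here (#1 and #3 share a bus, and #1, #2 plus HDMI/#5/#6 works for three 4K displays), but it should be verified against the Apple article linked above. Counting a 4K display as 2 units is this post's estimate, not an Apple figure.

```python
# Assumed port-to-bus layout; verify against the Apple support article.
BUS_OF_PORT = {"1": 1, "3": 1, "2": 2, "4": 2, "5": 0, "6": 0, "HDMI": 0}
MAX_UNITS_PER_BUS = 2        # "no more than 2 monitors per bus"
UNITS = {"HD": 1, "4K": 2}   # rule of thumb: a 4K display counts as 2


def overloaded_buses(layout):
    """layout maps port name -> display kind; returns buses over the limit."""
    load = {}
    for port, kind in layout.items():
        bus = BUS_OF_PORT[port]
        load[bus] = load.get(bus, 0) + UNITS[kind]
    return sorted(bus for bus, units in load.items() if units > MAX_UNITS_PER_BUS)


# A 4K panel has four times the pixels of a 1080p panel.
print((3840 * 2160) // (1920 * 1080))                          # 4
print(overloaded_buses({"1": "4K", "3": "4K", "5": "4K"}))     # [1]
print(overloaded_buses({"1": "4K", "2": "4K", "HDMI": "4K"}))  # []
```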
I don't understand what you mean: Displays don't use the data of the TB network, so daisy chaining an external storage device off a 4K display doesn't take away any Gb/s from the external storage device?
When you hook a DisplayPort device directly to a host computer, the TB controller is effectively bypassed. The "backward compatibility" is more pragmatically a "bypass" mode. The DisplayPort signal is never encoded into TB protocol and never put on the TB network. It is put directly on a DisplayPort network. On the port with the DisplayPort device plugged in there is no TB traffic at all. Hence TB bandwidth consumption is zero. DP bandwidth is nonzero, but that isn't TB bandwidth. The bandwidth maximums on that "bypass" to DP port are going to be DP's maximums, not Thunderbolt's.
If you daisy chain so that the DisplayPort device is on some external, downstream device, then the DP traffic is encoded into TB traffic and moved to that last TB device. At that point it is decoded back into DP data. On that device's output port it is just DP data going out. But it came in as encoded TB data on the other port.
However, if the display is plugged in directly at the host, there is zero transport done. No transport. No TB bandwidth consumption.
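The two cases can be sketched as follows. This is a simplified model, not a description of how the controller accounts for bandwidth: the hop numbering is hypothetical, and the pixel rate counts only active pixels at an assumed 24 bits per pixel, ignoring blanking and encoding overhead.

```python
def tb_links_carrying_display(display_hop):
    """Return the TB links that carry the encoded DP stream.

    Hop 0: display plugged straight into the host port (bypass, no TB).
    Hop k: display hangs off the k-th TB device down the chain, so the
    stream crosses links 1..k as TB traffic before being decoded.
    """
    return list(range(1, display_hop + 1))


def raw_pixel_rate_gbps(width, height, hz, bits_per_pixel=24):
    # Active pixels only; blanking and protocol overhead are ignored.
    return width * height * hz * bits_per_pixel / 1e9


rate = raw_pixel_rate_gbps(3840, 2160, 60)
print(tb_links_carrying_display(0))  # []      display on the host port
print(tb_links_carrying_display(2))  # [1, 2]  display behind two TB devices
print(f"{rate:.2f} Gb/s of display data on each of those links")
```

In the host-port case the list is empty, so storage further along a chain on another port sees no TB bandwidth taken by the display. In the daisy-chained case every listed link carries the display stream alongside any other traffic.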
Similarly with the HDMI port. More than likely the DisplayPort signal is diverted before it ever gets to the TB controller. I'm pretty sure there is next to no good reason to send it there at all.