TB now is on chip with new CPUs so it's not going anywhere, i
It is on some 'ultrabook'/'Project Athena' laptop chips. It is unclear whether even Intel is going to normalize that across most of the consumer lineup (i.e., into the mainstream desktop). It is quite unlikely to make it to the Xeon SP and W class CPUs, as there is no natural GPU to pair with.
No, it is not only this. Embedding into the laptop iGPUs also inherently (and more cost-effectively) couples the output from that GPU into the Thunderbolt output.
TB has been discrete in part to be able to loop in discrete GPU output as well. Embedding the controllers into the CPU die doesn't enable that. In fact, it gets in the way. (Yet another indicator that this isn't going to be uniform across CPU product lineups.)
and sometimes the 4 lanes are split over ports
Thunderbolt is, in part, basically a switch, so this "split" is pretty much what switches do.
so i assume the next progresion will just be a move to x8 or x16 or more independent lanes
Errrr, technically no. That is quite unlikely.
"... has 16 PCIe 3.0 lanes for external use, although there are actually 32 in the design but 16 of these are tied up with Thunderbolt support. ..."
https://www.anandtech.com/show/14514/examining-intels-ice-lake-microarchitecture-and-sunny-cove
"... Each Ice Lake CPU can support up to four TB3 ports, with each TB3 port getting a full PCIe 3.0 x4 root complex link internally for full bandwidth. (For those keeping count, it means Ice Lake technically has 32 PCIe 3.0 lanes total). ..."
Pragmatically, what Intel has done with Ice Lake is create two TB controllers internal to the CPU. The only shift is that each controller now has two x4 inputs. This both substantially increases the individual port bandwidth and cuts back a bit on the switching. I wouldn't bet the farm on that happening with the discrete TB controllers, which will also evolve later.
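To make the lane counting concrete, here is a rough sketch of the arithmetic behind the AnandTech figures quoted above. The numbers come from those quotes; the variable names are just mine for illustration.

```python
# Lane accounting for Ice Lake, using only the figures in the AnandTech quotes.
# Variable names are illustrative, not Intel terminology.
TB3_PORTS = 4            # "up to four TB3 ports"
LANES_PER_TB3_PORT = 4   # "a full PCIe 3.0 x4 root complex link" per port
EXTERNAL_LANES = 16      # "16 PCIe 3.0 lanes for external use"

# Lanes consumed internally by Thunderbolt support.
tb_lanes = TB3_PORTS * LANES_PER_TB3_PORT

# Total lanes in the design: external lanes plus the ones tied up with TB.
total_lanes = EXTERNAL_LANES + tb_lanes

print(f"Thunderbolt lanes: {tb_lanes}")
print(f"Total PCIe 3.0 lanes in the design: {total_lanes}")
```

So the "32 lanes" are not 32 lanes you can hang cards off; half of them never leave the package.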
This is fed through the CPU package's internal data bus, so you may not necessarily see a 4x increase in total aggregate data throughput if there is other major internal CPU traffic, but in bursts there should be a difference.
In most cases (e.g., mainstream laptop/desktop CPU packages) the PCI-e input lane budget is going to be smaller than that. An iteration at x4 PCI-e v4 (or more) might get 'stepped down' into the two x4s here in this embedded version.
Also, we will more likely see future discrete USB/TB controllers extended to do the "all traffic going out" DisplayPort v2.0 "alternative" mode. Again, this applies more so in the embedded, integrated GPU context.
but at the same time on consumer CPU’s there’s so few lanes im not sure how more than x4 can happen.
See above... at least where integrated. The key is avoiding more pinouts and trace running. In the narrow niche of "box with slots", we will then see individual lane bandwidth bumps over time (v4, then v5) that can be allowed. Again, this limits pins and trace running.
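As a back-of-the-envelope check on why per-lane bumps can stand in for more lanes, here is a small sketch. It assumes the standard raw per-lane signaling rates (8, 16, and 32 GT/s for PCI-e v3, v4, and v5) with 128b/130b encoding, and it ignores protocol overhead, so treat the results as upper bounds rather than measured throughput. The function name is hypothetical.

```python
# Raw per-lane signaling rate in GT/s for each PCI-e generation (v3, v4, v5).
GT_PER_S = {3: 8.0, 4: 16.0, 5: 32.0}

# v3 and later use 128b/130b line encoding: 128 payload bits per 130 bits sent.
ENCODING_EFFICIENCY = 128 / 130


def link_gbytes_per_s(gen: int, lanes: int) -> float:
    """Approximate one-direction link bandwidth in GB/s, before protocol overhead."""
    return GT_PER_S[gen] * ENCODING_EFFICIENCY * lanes / 8  # 8 bits per byte


# Same x4 width (same pins and traces), newer generation each time.
for gen in (3, 4, 5):
    print(f"PCI-e v{gen} x4: {link_gbytes_per_s(gen, 4):.2f} GB/s")

# For comparison: widening to x16 while staying at v3.
print(f"PCI-e v3 x16: {link_gbytes_per_s(3, 16):.2f} GB/s")
```

On raw rate alone, x4 at v5 lands at the same figure as x16 at v3, with a quarter of the lanes to route.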