PCI is actually kind of slow and will go away soon enough in favor of things like TB4. Same with RAM sticks, they are just too slow and will go away soon enough
Thunderbolt is *based* on PCIe, in much the same way that 802.11 WiFi builds on the same IEEE 802 family as Ethernet (the 802.x standards for the physical and data link layers).
TB multiplexes signals from PCIe lanes and DisplayPort outputs in order to send them all over a single TB cable, then demultiplexes them at the other end. Because of this mux/demux process and the physical limits of signaling over a TB cable, the bandwidth of TB will always be less than that of a full set of PCIe lanes inside a computer.
Ideally, those PCIe lanes run directly to and from the CPU if it has an integrated northbridge, or more or less directly if they pass through a motherboard chipset northbridge, so there's no mux/demux process, compression, packetization, etc.
Each PCIe device on the motherboard has multiple pairs of wires for sending and receiving, grouped into lanes: a lane is one transmit pair plus one receive pair, i.e. four signal wires. Therefore, a 16x (16-lane) slot has 64 signal wires connecting it to the CPU or northbridge. If you have a high-end workstation motherboard with, say, eight full-length 16x slots, that means 512 wires. If they're running at PCIe 4.0 speeds, each lane can carry about 2 gigabytes per second in each direction, so across all 128 lanes that's 512 GB/s of aggregate data transfer (presuming an ideal scenario, no bottlenecks, etc.)
Thunderbolt 4 has a 40 gigabit/s bidirectional link; dividing by 8 to get bytes, that's only 5 GB/s each way, so TB4 really can't replace the bandwidth of actual PCIe lanes inside a computer.
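The back-of-the-envelope arithmetic above is easy to sanity-check. This is just a sketch of the comparison; the eight-slot board is hypothetical and the per-lane figure assumes the ideal no-bottleneck case:

```python
# Rough bandwidth comparison: hypothetical 8 x16-slot PCIe 4.0 board vs. TB4.
# Variable names and the ideal-case assumptions are mine, not from any spec.

PCIE4_GB_PER_LANE = 2          # GB/s per lane, per direction (PCIe 4.0)
LANES = 8 * 16                 # eight full-length x16 slots
DIRECTIONS = 2                 # full duplex: count send and receive

pcie4_total = PCIE4_GB_PER_LANE * LANES * DIRECTIONS
print(pcie4_total)             # 512 GB/s aggregate, ideal case

TB4_GBIT = 40                  # Thunderbolt 4 link rate, gigabits/s
tb4_gb = TB4_GBIT / 8          # 5.0 GB/s per direction

# How many times more bandwidth the internal lanes offer:
print(pcie4_total / tb4_gb)    # 102.4
```

Even granting TB4 its full 5 GB/s, the internal lanes of this (admittedly extreme) workstation have two orders of magnitude more headroom.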
Also, PCIe 5.0 is 2x the speed of 4.0, so that's 4 GB/s per lane in each direction, and in our theoretical workstation, that's a full terabyte of data that can be shuttled around the bus, every single second.
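Doubling the per-lane rate in the same hypothetical setup bears out the terabyte figure:

```python
# Same imaginary 8 x16-slot board as before, at PCIe 5.0 rates.
PCIE5_GB_PER_LANE = 4               # GB/s per lane, per direction
LANES = 8 * 16                      # 128 lanes total
total = PCIE5_GB_PER_LANE * LANES * 2   # count both directions

print(total)                        # 1024 GB/s, i.e. ~1 TB/s aggregate
```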
Regarding RAM, there are always tradeoffs: backwards compatibility and expandability versus possibly having only a fixed amount of memory in the machine. The SoC approach packages memory closer to the CPU, and in fact the kind of RAM in there is good old package-on-package DDR4, according to Apple's Jonathan Kang. It's not just the type of RAM, it's how it's deployed that counts.
"Unified memory" doesn't just mean a single pool of RAM shared by the CPU and GPU; it also means heterogeneous cache coherency. The system doesn't have to waste cycles copying data out to off-CPU storage and back; it can do direct cache-to-cache transfers within that pool, which speeds up operations and requires drastically less power.
In theory, that kind of memory management approach
could be used with traditional expandable memory (i.e. like a giant RAM disk) but at the cost of latency and higher power use.