A. The iMac Thunderbolt bus is effectively akin to one TB3 connection. It just shares it across two physical USB-C style TB3 ports. Bandwidth is per bus rather than per port.
Well, it should be per port, but there are limits in current discrete Thunderbolt controllers (Alpine Ridge, Titan Ridge). They have 4 lanes of PCIe 3.0 (31.5 Gbps) so the total combined (two ports) available PCIe bandwidth should be more like 28 Gbps, rather than 23 Gbps which is basically a flaw in the design. I have a Maple Ridge (Thunderbolt 4 controller) but haven't tried a Thunderbolt connection yet.
If you look at integrated Thunderbolt solutions such as the M1 Mac, or the Ice Lake based MacBook Pro or the Tiger Lake PC laptops with two Thunderbolt ports, you'll find that they do not have the 23 Gbps limit and you can get more like 40 Gbps bandwidth (no relation to Thunderbolt 40 Gbps).
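For reference, the 31.5 Gbps figure for four PCIe 3.0 lanes can be reproduced with a little arithmetic. This is only a sketch: the 8 GT/s per-lane rate and 128b/130b encoding are PCIe 3.0 properties not spelled out above, and the function name is made up for illustration. Packet overhead is not included, which is why the realistic number is lower still.

```python
# PCIe 3.0 signals at 8 GT/s per lane and uses 128b/130b line encoding
# (both taken from the PCIe 3.0 spec, not from the measurements in this post).
PCIE3_GT_PER_LANE = 8.0
ENCODING_EFFICIENCY = 128 / 130


def pcie3_data_rate_gbps(lanes):
    """Data rate in Gbps after line encoding, before any packet overhead."""
    return PCIE3_GT_PER_LANE * lanes * ENCODING_EFFICIENCY


x4 = pcie3_data_rate_gbps(4)
print(f"PCIe 3.0 x4: {x4:.1f} Gbps")
```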
B. In reality, a TB3 connection is limited to 22 Gbps in each direction when all traffic being converted to Thunderbolt packets originated as PCIe. Not to despair, though (I was thinking TB3 on an iMac equated to just 22 Gbps shared by the bus across both ports): you can still exploit the remaining bandwidth by adding DisplayPort data into the tunnelling, which can utilise the remaining bandwidth up to 40 Gbps (minus any overhead), or on its own use up to 25.92 Gbps (the limit of DisplayPort 1.4 HBR3), alongside other PCIe data, up to 40 Gbps combined.
Correct. Do note that the 22 Gbps number is just an approximation. I have some benchmarks that show slightly above 23 Gbps (2874 MB/s read speed from an NVMe). There's a YouTube video showing nearly 3000 MB/s (24 Gbps) from Thunderbolt 4 on Windows.
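To compare benchmark numbers in MB/s with link speeds in Gbps, a quick conversion helps. The sketch below assumes the benchmark reports decimal megabytes (1 MB = 1,000,000 bytes); if a tool reports binary units instead, the Gbps figures come out somewhat higher. The helper name is just for illustration.

```python
def mb_per_s_to_gbps(mb_per_s):
    """Convert decimal megabytes per second to gigabits per second."""
    return mb_per_s * 8 / 1000


# The two read speeds mentioned above.
for label, rate in [("NVMe read", 2874), ("Thunderbolt 4 on Windows", 3000)]:
    print(f"{label}: {rate} MB/s = {mb_per_s_to_gbps(rate):.2f} Gbps")
```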
For DisplayPort, remember that Thunderbolt can have two separate DisplayPort connections, so you can do two HBR2 connections or one HBR3 connection with a HBR connection. Then there's Apple's trick of having two HBR3 connections for the XDR. DisplayPort traffic has priority (because a display doesn't work if it doesn't get all the bits at the same rate all the time). PCIe can have whatever is remaining.
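The combinations above can be put into numbers. This is a rough sketch: the per-lane rates for HBR and HBR2 (2.7 and 5.4 Gbps) come from the DisplayPort spec rather than from this thread, and the totals are the maximum payload of each link, not what a particular display actually uses.

```python
# Per-lane line rates in Gbps for each DisplayPort link rate.
LINK_RATE_GBPS = {"HBR": 2.7, "HBR2": 5.4, "HBR3": 8.1}
LANES = 4
EFFICIENCY_8B10B = 8 / 10  # 10 bits on the wire carry 8 bits of data


def dp_payload_gbps(link):
    """Maximum data rate of a 4-lane DisplayPort link after 8b/10b encoding."""
    return LINK_RATE_GBPS[link] * LANES * EFFICIENCY_8B10B


for combo in (("HBR2", "HBR2"), ("HBR3", "HBR"), ("HBR3", "HBR3")):
    total = sum(dp_payload_gbps(link) for link in combo)
    print(" + ".join(combo), f"= {total:.2f} Gbps")
```

The first two combinations come to the same total, and only the two HBR3 case goes past 40 Gbps on paper.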
C. The theoretical TB3 bandwidth works bi-directionally, simultaneously across 4 lanes. 2 lanes can use up to 40 Gbps output while 2 lanes concurrently use up to 40 Gbps input. So in theory, up to 80 Gbps net transmission (minus any overheads).
Correct. But I don't know of a situation where receiving more than 24 Gbps is possible. That's PCIe traffic. I don't think you can receive DisplayPort traffic. I'm not sure how the iMac Thunderbolt Target Display Mode works - but the iMacs that support that are only 2560x1440 (7 Gbps, which is less than an HBR link).
D. A device connected to a port or in a chain will not consume bandwidth as long as it is inactive. Two ports on a single controller could both have many daisy-chained devices running to them that would exceed the total bandwidth of the Thunderbolt controller, but as long as enough devices are inactive for the total traffic to come in at under 40 Gbps, having them all connected at once would not cause an issue.
Yup. Should note that devices further away from the host in a chain will have slightly less performance (latency). Max chain length is 6 for Thunderbolt 3/2/1. I think it's 5 for Thunderbolt 4/USB4.
1.
From your first reply. When re-reading everything in the thread and doing more research, this got me confused. All the discussion seems to talk of the whole Thunderbolt bus being limited to 40 Gbps max, and 22 Gbps where PCIe data is concerned, with both ports SHARING that limit rather than each having it individually. It's just that this statement seems at odds with that and reads like each of the two ports can do 22 Gbps of PCIe transmission on a single shared bus. Unless this is intended to mean that each single port can transmit ≈22 Gbps on its own if it's the only port being actively used?
Right. Connect one NVMe to each port. You can read 22 Gbps from one if you're not reading from the other. If you try to read from both at the same time (like with a software RAID 0) then you'll see something like 12 Gbps from each. You can read 22 Gbps from one and write 22 Gbps to the other simultaneously though.
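A toy model of that behaviour, purely as an illustration: it assumes a per-port PCIe limit of about 22 Gbps and a two-port limit of about 24 Gbps, each applied independently per direction, and it splits the bus proportionally when demand exceeds it. The constants and the `share` function are my own simplification, not how the controller is actually implemented.

```python
PORT_LIMIT = 22.0  # approximate PCIe limit of one port, per direction (Gbps)
BUS_LIMIT = 24.0   # approximate PCIe limit of both ports combined, per direction


def share(demands):
    """Return the bandwidth each port gets in one direction of the bus."""
    capped = [min(d, PORT_LIMIT) for d in demands]
    total = sum(capped)
    if total <= BUS_LIMIT:
        return capped
    scale = BUS_LIMIT / total  # shrink every port by the same factor
    return [d * scale for d in capped]


# Reading from both NVMe drives at once (software RAID 0 style).
print("read both:", [round(x, 1) for x in share([22, 22])])

# Reading from port 1 while writing to port 2: each direction has one user.
print("read:", share([22, 0]), "write:", share([0, 22]))
```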
2.
Had a bit of trouble resolving this. I may have point C above incorrect as a result. I thought the controller carried 40 Gbps total in either direction, made up of two 20 Gbps lanes going each way, which seems to be depicted here on page 4:
Yes. 40 Gbps each direction for a single Thunderbolt port. A Thunderbolt controller usually has two ports. So that would be 80 Gbps each direction total.
Also, I'm curious to understand what is meant by it not being able to fill either direction? Is this some sort of real world limitation when data is being transmitted and received simultaneously?
The Thunderbolt controller can send/receive PCIe up to 24 Gbps (two ports total). The Thunderbolt controller can send 51.84 Gbps of DisplayPort (one DisplayPort HBR3 connection per Thunderbolt port for this max). 24 + 51.84 = 75.84 Gbps transmit but only 24 Gbps receive (because host can't receive DisplayPort, only transmit it). I don't think other methods of sending/receiving Thunderbolt data (USB tunnelling or host to host communication) will add to that.
Just because the connection can't be filled with actual data doesn't mean stuff isn't happening. Thunderbolt probably sends idle packets or something to fill the extra space.
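Putting the numbers from that answer side by side, as a minimal sketch (the constant names are only for illustration; the values are the approximate figures quoted above):

```python
PCIE_LIMIT = 24.0     # approx. PCIe limit for both ports, each direction (Gbps)
HBR3_PAYLOAD = 25.92  # one DisplayPort HBR3 connection (Gbps)
PORTS = 2
PORT_CAPACITY = 40.0  # Thunderbolt 3, each direction, per port (Gbps)

transmit = PCIE_LIMIT + HBR3_PAYLOAD * PORTS  # PCIe plus one HBR3 per port
receive = PCIE_LIMIT                          # host can't receive DisplayPort
capacity = PORT_CAPACITY * PORTS

print(f"transmit: {transmit:.2f} of {capacity:.0f} Gbps")
print(f"receive:  {receive:.2f} of {capacity:.0f} Gbps")
```

So neither direction can be filled with real data, and receive falls far shorter than transmit.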
3. How does a Thunderbolt controller, or the one in the 2019 iMac specifically (if different manufacturers wire things differently), allocate the bandwidth between the 2 ports that share it? I had presumed this was dynamic so that one port could take more than half the bandwidth if its data throughput required it. But I've read here in a discussion regarding the 2019 iMac that when both ports have a physical connection plugged in, the bandwidth is automatically divided exactly in half, with a 50% limit of 20 Gbps imposed on each port.
discussions.apple.com
If that was correct, it seems that scenarios like the ones below would not be possible, as each port on the iMac would only be able to transmit 20 Gbps and receive 20 Gbps maximum, so a combined total of 40 Gbps transmit and 40 Gbps receive for the whole controller?
That would be ridiculous. How it works is probably something like this: You request a read of x bytes from NVMe #1 on port 1 and a read of x bytes from NVMe #2 on port 2. They share a single upstream connection (the Thunderbolt controller has two downstream ports but only one upstream PCIe connection to the host). So each transaction must come after the other. They are scheduled or occur in the order received. Bytes are received from NVMe #1 so we request x more bytes from NVMe #1 but that can't happen until we get the bytes from NVMe #2. In that way the bandwidth is shared. The requests may be split up into smaller chunks by PCIe or whatever to better share the bandwidth and make sure everyone gets a turn.
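Here is a small simulation of that guess, to show why sharing falls out naturally without any fixed 50% split. It is only a sketch of the "probably something like this" description: one upstream link serves one chunk per time slot in arrival order, and each device queues its next request as soon as the previous one completes. The function name, chunk size and slot count are arbitrary.

```python
from collections import deque


def simulate(ports, slots, chunk=4096):
    """Serve one chunk per slot from a FIFO of pending read requests.

    Each port always has exactly one request outstanding; when it completes,
    the port immediately queues its next request behind the others.
    """
    queue = deque(ports)  # initial requests, in arrival order
    delivered = {port: 0 for port in ports}
    for _ in range(slots):
        port = queue.popleft()  # the single upstream link handles one at a time
        delivered[port] += chunk
        queue.append(port)      # follow-up request waits its turn
    return delivered


both = simulate(["NVMe #1", "NVMe #2"], slots=1000)
alone = simulate(["NVMe #1"], slots=1000)
print("both active:", both)
print("one active: ", alone)
```

With both devices busy each one ends up with half of the slots, and a device on its own gets all of them.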
4.
Thanks again for taking the time to illustrate data throughput with examples. I'm struggling with interpreting the direction of data for my setup. The nomenclature seems interchangeable - would it be correct to group it as such:
Output/Send/Transmit -and- Input/Receive?
Yes.
a. What is the PCIe data in these examples? Maybe this 12 Gbps both transmit and receive was in reference to the Decklink PCIe cards previously discussed?
12 Gbps is half of 24 Gbps which is approximately the total PCIe bandwidth for both ports simultaneously. It's a nice round number for illustrative purposes. I suppose the best example is a RAID 0 of two NVMe devices, one connected to each port. RAID 0 is probably the best way to test total bandwidth unless you can find a benchmark that can send/receive to multiple devices at the same time and add them up. ATTO Disk Benchmark.app can do that without a RAID 0. There is an app called CL!ng.app that can measure bandwidth to/from a GPU but it doesn't have an option for multiple GPUs.
b. The DisplayPort element (HBR) in these examples must be transmit only if, along with 12 Gbps PCIe transmit, the total transmit is 37 Gbps but there is no contribution to the receive total? Does this imply that where displays are concerned data is only being transmitted *to* the display, i.e. a display always receives a video signal but doesn't transmit any data back in the opposite direction?
A display can transmit EDID and DPCD data but that's a very small amount of data and it doesn't happen often - mostly just when you connect the display. Maybe there's a vertical sync interrupt once per frame. Is horizontal sync interrupt possible?
5. Why is PCIe data limited to 22 Gbps over Thunderbolt 3 when the PCIe 3.0 x4 connection that TB3 uses is capable of 32 Gbps? I have read that this is to reserve bandwidth for USB 3 Gen 2 over Thunderbolt, which Intel guarantees as 10 Gbps? Would make sense.
Reserving bandwidth for something that isn't connected seems silly to me. Anyway, USB over Thunderbolt 3 is just PCIe so that doesn't make sense. I don't know the cause of the single port limit or the two port limit. I don't know how much overhead a Thunderbolt packet has when encoding a PCIe packet. Most of this info can be found in the USB4 spec. The USB4 spec does say for PCIe tunnelling: "The amount of buffering at the PCIe Adapter is implementation specific as it balances the tradeoff between PCIe tunneling performance and PCIe link latency. It is recommended that implementations make the amount of buffers configurable." It says the same thing for USB3 tunnelling (a new feature of USB4/Thunderbolt 4). If it were configurable, then you could just poke some bytes and see a change in performance. There were cases where eGPUs had lower than expected bandwidth over Thunderbolt - the bandwidth could be increased with a different Thunderbolt firmware.
I presume Thunderbolt 4 will also increase the usable bandwidth by using PCIe 4.0? Even with a 10 Gbps reservation for USB 3.2 Gen 2x1, this should allow the PCIe data cap to go theoretically as high as 30 Gbps, as PCIe 4.0 could share the full 40 Gbps of available bandwidth. Although with the arrival of USB 4.0 who knows what rules will govern how it will all be shared? Hopefully just more buses and ports on newer machines to make it easier to begin with!
Currently, the only discrete Thunderbolt 4 chip is Maple Ridge which uses PCIe 3.0 like previous discrete Thunderbolt controllers. The integrated Thunderbolt controllers in the M1 Mac or Ice Lake MacBook Pro or Tiger Lake PC laptops don't use a PCIe connection to the host and they don't have the two port bandwidth limit (though they may have a limit with more than two ports). Like I said before, I haven't tested Thunderbolt performance of Maple Ridge yet (I have GC-ALPINE RIDGE, GC-TITAN RIDGE, and now ThunderboltEX 4 in my MacPro3,1).
The USB4 spec is open so nothing is stopping someone from making a USB4 controller with better performance (at least better performance for multiple ports, if not for a single port).
6. Video/data over Thunderbolt - PCIe is always data. Is that correct? Even video monitor and capture cards are converting video signals to and from PCIe data. If so, then Displayport Alt Mode is the only protocol that is truly using a video signal over Thunderbolt?
Correct. Video has been digital since DVI/HDMI (ignoring older stuff like the 1 bit modes of the Apple II or Mac or the 1 bit per RGB component modes of CGA). DisplayPort (Alt Mode or not) sends DisplayPort data over the DisplayPort cable. Whether it's PCIe or DisplayPort or DVI/HDMI (or any other connection type/protocol like USB, SATA, FireWire), it's all just bits and bytes. PCIe is a method to connect the controllers that handle the other connection types/protocols to the CPU or RAM (PCIe controllers use DMA to transfer data directly to the RAM so that the CPU can do other stuff - the drivers executed by the CPU setup the DMA stuff for each transfer).
Since DisplayPort is just bits and bytes, Thunderbolt can encapsulate the DisplayPort data and send it as Thunderbolt data. With this method, you can mix other types of data such as PCIe and USB. On the other hand, DisplayPort Alt Mode uses separate lines for DisplayPort and USB which means that reducing USB bandwidth won't let you have more DisplayPort bandwidth. DisplayPort Alt Mode has two modes - 4 lanes of DisplayPort + USB 2.0, and 2 lanes of DisplayPort + USB 3.x. Thunderbolt basically has one mode: 2 lanes of Thunderbolt (ignoring Thunderbolt 1). So the Thunderbolt mode of mixing different types of data makes more efficient use of the connection. One improvement would be to allow Thunderbolt to use 3 lanes for transmit and 1 lane for receive, in the case where it's sending a lot of DisplayPort data. Another improvement would be to replace the USB 2.0 lines with another lane of Thunderbolt. Then you could have up to 5 lanes in one direction (6 lanes total). VirtualLink is an example of a USB-C Alt Mode that repurposes the USB 2.0 lines. With 5 lanes for transmit, you could include DisplayPort 2.0.
Clearing up:
Found the website again which caused a lot of confusion for me:
It states:
'Of the ~40 Gbps = 5 GB/sec bandwidth of Thunderbolt 3, ~8 Gbps can be used ONLY for video'.
That sounded absolute but is not even true. Not to mention that later on they list display resolutions with higher data rates and mention that video bandwidth can eat into the available bandwidth for regular data when it is greater than 8 Gbps, so I'm not sure why they worded the above statement the way they did.
A confusing coincidence for me is that they also state that the maximum theoretical bandwidth for non-video data is 32.4 Gbps, with a true *usable* bandwidth that drops to 25.92 Gbps. 25.92 Gbps also happens to be the bandwidth required by DisplayPort 1.4 HBR3. Got me really confused on whether this figure was the max for video or non-video data!
I had a long e-mail discussion back in August with Lloyd Chambers about his conclusions regarding DisplayPort, DSC, and Thunderbolt. I'm not sure I was able to change his mind even with benchmarks. He's a busy guy with not enough time to read my long winded remarks or something like that (also, inadequate air conditioning, forest fires, workload, etc.)
HBR3 is 8.1 Gbps per lane. That is the encoding on the wire. DisplayPort uses 8b/10b encoding which means it takes 10 bits to transmit 8 bits of data. DisplayPort has 4 lanes, so the total is 8.1 Gbps x 4 lanes x 8b/10b = 25.92 Gbps. Thunderbolt does not use the 8b/10b encoding when transmitting Thunderbolt DisplayPort packets. Thunderbolt uses 64b/66b encoding. The 10 Gbps and 20 Gbps numbers for Thunderbolt take that into account - Thunderbolt actually transmits at 10.3125 Gbps or 20.625 Gbps. Thunderbolt does not transmit the DisplayPort stuffing symbols that DisplayPort uses to fill the DisplayPort bandwidth - this is how Apple can send two HBR3 connections (>50 Gbps) over a single Thunderbolt 3 cable - it works because each of the 3008x3384 60Hz tiles of the 6K display does not require all the bandwidth of HBR3.
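The encoding arithmetic in that paragraph can be checked in a few lines. This is just a sketch with made-up helper names; the inputs are the figures quoted above.

```python
def data_rate(line_rate_gbps, payload_bits, coded_bits):
    """Usable data rate for a given on-the-wire rate and line encoding."""
    return line_rate_gbps * payload_bits / coded_bits


def line_rate(data_rate_gbps, payload_bits, coded_bits):
    """On-the-wire rate needed to carry a given data rate."""
    return data_rate_gbps * coded_bits / payload_bits


# DisplayPort HBR3: 8.1 Gbps per lane on the wire, 4 lanes, 8b/10b encoding.
hbr3 = data_rate(8.1 * 4, 8, 10)

# Thunderbolt: 10 or 20 Gbps of data per lane, 64b/66b encoding.
tb_10 = line_rate(10, 64, 66)
tb_20 = line_rate(20, 64, 66)

print(f"HBR3 data rate: {hbr3:.2f} Gbps")
print(f"Thunderbolt line rates: {tb_10} and {tb_20} Gbps")
```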
Here's some of the info I came up with (sent to Lloyd on August 23):
I redid the tests with higher bandwidth devices.
- Replaced RAID with eGPU.
- Replaced 3840x2160 60Hz 10 bpc displays with 4096x2304 68.595Hz 8 bpc displays.
Setup:
Two displays 4096 x 2304 x 68.595 Hz x 24 bpp = 15.5 Gbps (681.76 MHz pixel clock)
Using eGPU connected to port 1, below are read / write Gbps for eGPU with total that includes display(s) bandwidth.
Benchmark using CL!ng.app from the App Store.
Benchmarks (receive/transmit Gbps):
Code:
1) 2 displays alone:                                 total (1 port ) =  0.0 / 31.0
2) eGPU alone:                         21.6 / 16.3   total (1 port ) = 21.6 / 16.3
3) with both displays at port 2:       21.7 / 16.3   total (2 ports) = 21.7 / 47.3
4) 1 display at end of eGPU chain:     20.2 / 13.8   total (1 port ) = 20.2 / 29.3 (corrected from 28.8)
5) 2 displays at end of eGPU chain:    11.2 /  4.1   total (1 port ) = 11.2 / 35.1
Comments:
1) 31.0 Gbps is very near 34.56 Gbps DisplayPort 1.2 limit of Thunderbolt 3.
2) 21.6 Gbps is very near 22 Gbps PCIe limit of Thunderbolt 3.
3) 47.3 Gbps means that a bus can exceed 40 Gbps limit of a single port; also, displays on other port don't impact read/write of first port.
4) A single 15.5 Gbps display can impact write speed of a port.
5) Two displays destroy write speed; surprisingly, they also nearly halve read speed. Total is only 35.1 Gbps meaning there is 4.9 Gbps of overhead. Including horizontal and vertical sync pixels makes the total 36.8 Gbps which still leaves 3.2 Gbps of overhead.
Notes:
The eGPU has no displays connected - the displays are powered by the iGPU of the Mac Mini 2018 in all tests.
The eGPU is a Sapphire RX 580. It is connected to the PCIe slot of a OWC Mercury Helios 3 using a PCIe riser cable and external ATX PSU.
Displays are connected using Thunderbolt 3 to Dual DisplayPort adapter which is attached to Mac Mini in test 1 and 3, and attached to Helios 3 in test 4 and 5.
SwitchResX is used to create maximum resolution/bandwidth timing to maximize DisplayPort 1.2 bandwidth.
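The display bandwidth and the totals for test 5 can be reproduced from the setup numbers. A minimal sketch, with helper names of my own choosing; the per-display figure is rounded to one decimal place because that is how it was used in the totals above.

```python
PORT_CAPACITY = 40.0  # Thunderbolt 3, one direction (Gbps)
EGPU_WRITE = 4.1      # eGPU write speed measured in test 5 (Gbps)


def active_gbps(width, height, hz, bpp):
    """Bandwidth of the visible pixels only."""
    return width * height * hz * bpp / 1e9


def clocked_gbps(pixel_clock_mhz, bpp):
    """Bandwidth including the blanking (sync) pixels."""
    return pixel_clock_mhz * 1e6 * bpp / 1e9


per_display = round(active_gbps(4096, 2304, 68.595, 24), 1)  # rounded as in the post
per_display_blanked = clocked_gbps(681.76, 24)

for label, rate in (("active pixels", per_display),
                    ("with blanking", per_display_blanked)):
    total = EGPU_WRITE + 2 * rate
    print(f"{label}: total {total:.1f} Gbps, "
          f"unaccounted {PORT_CAPACITY - total:.1f} Gbps")
```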
7. Outliers in the discussion - USB and HDMI:
I'm not sure if Thunderbolt tunnels USB data in the same way as PCIe and Displayport? Since both use the same style USB-C port it gets confusing. Is there a USB controller behind the ports too?
A USB4 or Thunderbolt 4 host can tunnel USB to a USB device connected to a downstream USB4 or Thunderbolt 4 device. The way to tell if this is happening is by looking in System Information.app. First check that the USB4 or Thunderbolt 4 device is connected as such in the Thunderbolt tab. Then in the USB tab, find the USB device and see if it is connected to the USB controller of the host. If it is, USB is being tunnelled. If it is instead connected to the USB controller of the downstream USB4 or Thunderbolt 4 device, USB is not being tunnelled (that controller is a PCIe device reached over tunnelled PCIe). Here's a screenshot which appears to show USB tunnelling:
https://forums.macrumors.com/thread...lso-definitely-not-usb4.2269777/post-29828042
The AppleT8013USBXHCI is the M1 Mac's USB controller. The Intel 0x0b40 is the USB hub of the Goshen Ridge Thunderbolt 4 controller in the OWC Thunderbolt 4 Dock. The OWC 0xde48 is the second hub in the Dock. The Ugreen Storage Device is the USB device being tested.
Now, on my Thunderbolt 3 Mac mini 2018, there is no USB tunnelling, so when I connect a USB device to the CalDigit Element Hub (a Thunderbolt 4 hub that also uses the Goshen Ridge), I see that the USB device is connected to the Goshen Ridge's PCIe USB controller (8086:0b27) which uses the AppleUSBXHCITR driver. Between the USB device and the USB controller is the same Intel 0x0b40 USB hub. The CalDigit includes a second hub (CalDigit 0x0032) for the 4 USB Type A ports.
I have a USB 3.2 Gen 2x1 SSD connected to one of my Thunderbolt ports and it seems from previous discussion that it is counted within the use of the total Thunderbolt 3 bandwidth. If so, is it counted within the PCIe element of this bandwidth? Not sure where USB falls in all this. Is it one and the same as PCIe data?
I haven't tested how a USB device connected to one port affects the Thunderbolt or USB bandwidth of the second port. The USB controller is a PCIe device. I tried it now using ATTO Disk Benchmark.app which can test multiple disks at the same time. Results are read/write MB/s. Mac mini 2018.
USB alone port 1: 1033/1059
Thunderbolt alone port 2: 2636/1466 (ADATA XPG SX8200 Pro 2TB)
USB port 1 + Thunderbolt port 2: 2687/2481
---
So we seem to be hitting the discrete Thunderbolt controller two port limit even with a USB device instead of a Thunderbolt device. The Intel Mac mini 2018 has two different Thunderbolt controllers/buses, so all I need to do is put each device on a different bus to get max performance (4000 or 5000 MB/s or whatever). With two Thunderbolt NVMe on separate buses and the internal drive, ATTO shows 8076/7052 MB/s.
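To make the comparison explicit, here is a short sketch that sets the combined ATTO result against the sum of the two solo runs. It assumes ATTO reports decimal MB/s; the variable names are only for illustration.

```python
# ATTO results from above, as (read, write) in MB/s.
usb_alone = (1033, 1059)
tb_alone = (2636, 1466)
together = (2687, 2481)


def to_gbps(mb_per_s):
    """Convert decimal megabytes per second to gigabits per second."""
    return mb_per_s * 8 / 1000


for i, direction in enumerate(("read", "write")):
    solo_sum = usb_alone[i] + tb_alone[i]
    measured = together[i]
    print(f"{direction}: sum of solo runs {solo_sum} MB/s, "
          f"together {measured} MB/s ({to_gbps(measured):.1f} Gbps)")
```

Reads together fall well short of the sum of the solo runs, while writes come close to it.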
How does HDMI factor into all of the above? It's video data presumably? Is it transmitted as PCIe from video card connections, or does it become DisplayPort through converter cables or when it is connected via hardware that terminates in a Thunderbolt connection back to a computer? In the case of the Mac Mini though, I notice it has an HDMI port alongside Thunderbolt/USB 4 ports. Would its video data be handled separately from the Thunderbolt bandwidth in this configuration on the M1 Mini?
For Mac mini 2018 or M1 Mac mini or Apple TV, HDMI starts out as DisplayPort from the GPU and is converted to HDMI using the MegaChips MCDP2920A4 DisplayPort 1.4 to HDMI 2.0b converter.
https://www.ifixit.com/Teardown/Mac+mini+Late+2018+Teardown/115210
https://megachips.com/mcdp2900-displayport1-4-to-hdmi2-0-converter/
A Thunderbolt Dock needs a similar HDMI converter chip to convert DisplayPort to HDMI (after the Thunderbolt dock converts tunnelled DisplayPort to DisplayPort). I don't like HDMI ports because it means they might have chosen an inferior DisplayPort to HDMI converter that cannot be upgraded. Perhaps in the future Apple will switch to the Realtek RTD2173 chip which does HDMI 2.1 which is used in DisplayPort 1.4 to HDMI 2.1 adapters.
https://www.anandtech.com/show/1453...s-rtd2173-displayport-14-to-hdmi-21-converter
A discrete GPU will probably produce an HDMI signal directly (AGDCDiagnose output won't have DisplayPort DPCD registers for the HDMI port).
DisplayPort data doesn't affect PCIe bandwidth unless it is being transmitted on the same Thunderbolt line as the PCIe data and the PCIe data could exceed the remaining bandwidth of Thunderbolt. The HDMI output of the M1 Mac mini is not related to Thunderbolt at all.
OWC has done some tests on M1 Macs with strange results showing that connecting a Thunderbolt dock (maybe with a display) may increase performance of the other Thunderbolt port by a significant amount (at least according to benchmarks).
https://eshop.macsales.com/blog/74383-faster-drive-performance-with-m1-mac/