
mattspace

macrumors 68040
Original poster
Jun 5, 2013
3,342
2,975
Australia
Just looking for a confirmation about my suspicions here:
[Screenshot: Expansion Slot Utility showing Pool B at 50% allocation with only the x4 I/O card assigned]


My Pool B is showing 50% allocation with only the x4 I/O card assigned. Is it the DP return streams from the GPUs that are using the "missing" 4 lanes (as in, I would expect x4 to be 25%)? Does that mean I'm going to go over budget if I plug in an x8 device?
 

joevt

macrumors 604
Jun 21, 2012
6,965
4,259
You only have to worry about combining DisplayPort and PCIe outside the Mac Pro when they are transmitted over a Thunderbolt cable.

DisplayPort and PCIe are completely separate inside the Mac Pro.

There are 4 pools: Slot 1, Slot 3, Pool A, and Pool B.

Pool B also includes the Thunderbolt controllers of any MPX modules in Slot 1 or Slot 3. An MPX module may have up to two Thunderbolt controllers.

Does switching Slot 8 to Pool A decrease Pool B allocation to 25% and increase Pool A allocation to 125%?

Each Thunderbolt controller is x4. The Mac Pro has a built-in Thunderbolt controller for the top Thunderbolt ports (not changeable to Pool A?) and another Thunderbolt controller on the I/O card in Slot 8. Maybe these account for Pool B's 50% Allocation? That would be x8, so another x8 should give 100%?

Do you have any MPX modules? Your signature says you have two W5700X. That's 4 Thunderbolt controllers. Wouldn't that be 100% right there? Maybe it doesn't count Thunderbolt controllers as x4? Or maybe it doesn't count Thunderbolt controllers that don't have Thunderbolt devices connected? What if you disconnect a W5700X?

If no Thunderbolt device using PCIe tunnelling is connected to them, then they are basically x0. Also, any idling device (idle means not sending or receiving data) can be considered x0. You only need to worry about Pool allocation if you plan on doing more than 126 Gbps of data transfer in one direction (either receive or transmit). Receive and transmit use separate lines, so you can transmit 126 Gbps while receiving 126 Gbps.

A Thunderbolt port is ≈25 Gbps PCIe max even though the Thunderbolt cable can do 40 Gbps (only DisplayPort can fill the full 40 Gbps; whatever remains can be used by PCIe, up to ≈25 Gbps).
A Thunderbolt controller is also ≈25 Gbps even though a Thunderbolt controller has two Thunderbolt ports and even though x4 should allow ≈31.5 Gbps.
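
Those numbers are just PCIe gen 3 arithmetic. A rough sketch (the ≈25 Gbps ceiling is the Thunderbolt PCIe-tunnelling limit quoted above, not something you can derive from the lane count):

```python
# PCIe gen 3: 8 GT/s per lane with 128b/130b encoding.
GEN3_LANE_GBPS = 8 * 128 / 130         # ≈ 7.877 Gbps per lane, per direction

x4_link = 4 * GEN3_LANE_GBPS           # ≈ 31.5 Gbps — raw x4 capacity
x16_pool = 16 * GEN3_LANE_GBPS         # ≈ 126 Gbps — one pool's budget, per direction

TB_PCIE_CAP_GBPS = 25                  # approximate Thunderbolt PCIe-tunnelling ceiling

print(f"per lane: {GEN3_LANE_GBPS:.3f} Gbps")
print(f"x4 link:  {x4_link:.1f} Gbps (a Thunderbolt controller tops out near {TB_PCIE_CAP_GBPS} Gbps)")
print(f"x16 pool: {x16_pool:.1f} Gbps per direction")
```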

All other I/O devices (SATA, NVMe, Ethernet, USB, etc.) go through a separate DMI connection to the PCH.

ioreg or pcitree.sh can show what PCI devices are connected to each pool.
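
Something like this rough sketch (assuming ioreg's usual tree formatting) just lists the PCI devices; working out which pool each one hangs off still means following the bridge hierarchy in the full output, which pcitree.sh does more thoroughly:

```python
# List IOPCIDevice entries from the I/O Registry (macOS only).
import subprocess

out = subprocess.run(
    ["ioreg", "-c", "IOPCIDevice", "-l", "-w", "0"],
    capture_output=True, text=True, check=True,
).stdout

for line in out.splitlines():
    # Tree lines look roughly like "| +-o GFX0@0  <class IOPCIDevice, ...>"
    if "<class IOPCIDevice" in line and "+-o" in line:
        name = line.split("+-o", 1)[1].split("<", 1)[0].strip()
        print(name)
```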

https://www.apple.com/by/mac-pro/pdf/Mac_Pro_White_Paper_Aug_2021.pdf

There's this thread:
https://forums.macrumors.com/threads/allocating-pci-pools.2246466/

A Thunderbolt controller has an NHI device and a USB device. Each reports as PCIe gen 1 x4 even though they can transmit more data than that. I don't know if the Pool calculations take this into consideration.
A Thunderbolt controller has downstream bridges for the Thunderbolt ports but I don't think the Pool calculations include downstream bridges. I don't know if the Pool calculations include downstream PCIe devices.
 

mattspace

macrumors 68040
Original poster
Jun 5, 2013
3,342
2,975
Australia
You only have to worry about combining DisplayPort and PCIe outside the Mac Pro when they are transmitted over a Thunderbolt cable.

DisplayPort and PCIe are completely separate inside the Mac Pro.

I was more thinking about the DP return to the system via the MPX slot. My understanding is the MPX bay is DMI x16 to exchange data with the GPU, and then the second set of pins reroutes the second PCI Slot in an MPX bay (Slot 2 & 4) when an MPX GPU is connected to supply TB peripheral bandwidth, and bring DP-Out back into the system, via the Switched A & B PCI pools.


There are 4 pools: Slot 1, Slot 3, Pool A, and Pool B.

Pool B also includes the Thunderbolt controllers of any MPX modules in Slot 1 or Slot 3. An MPX module may have up to two Thunderbolt controllers.

Does switching Slot 8 to Pool A decrease Pool B allocation to 25% and increase Pool A allocation to 125%?

Slot 8 can't be changed off Pool B, but switching the x16 Afterburner to Pool B sets it to 150%

I think what I'm unclear on is whether the PCI switch actually works to manage a greater number of lanes than the processor supports - especially if things are idle. For example, I'm not using any displays plugged into the top or I/O cards - all my displays are via Type-C to DP connectors, so do the lanes that would be supplying Slot 2 (which AFAIK are assigned to the MPX DP-Return) get returned to the pool?


Each Thunderbolt controller is x4. The Mac Pro has a built-in Thunderbolt controller for the top Thunderbolt ports (not changeable to Pool A?) and another Thunderbolt controller on the I/O card in Slot 8. Maybe these account for Pool B's 50% Allocation? That would be x8, so another x8 should give 100%?

As I understand it, for a TB-equipped MPX GPU, TB Bus 0 is the HDMI, TB Port 1, the top and the I/O card. Bus 1 & 2 are the 4 remaining TB ports on the card. So Bus 0 should account for x4 lanes.

From this table:

[Attached image: table of slot lane counts and pool assignments]


So there's 32 DMI lanes for MPX GPUs, leaving 32 CPU lanes, distributed across a potential 60 lanes on the non-DMI slots, that are managed through a 96 lane switch.

Currently I have 16 for the Afterburner (100% of Pool A), and 4 for the I/O card (50% of Pool B) - theoretically that should be 12 CPU lanes remaining...

If the full x8 of Slot 2 is assigned to cover Thunderbolt duties for Bus 1 & 2, that would leave 4 lanes available... which would make the I/O card having 4 lanes and using 50% of the pool work out correctly for the maths.

So Pool B is I/O Card (TB Bus 0: x4, DP In), plus Slot 2 (TB Bus 1: x4, TB Bus 2: x4).

So that would mean there's only 4 lanes available... unless the switch is managing over-subscription, and can scavenge lanes from TB Bus 1 & 2 if they're not in use for TB devices.
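
Tallying that up (a quick sketch of my own working assumptions above, not confirmed figures):

```python
# Pool B lane budget under the assumptions above: x4 I/O card plus
# an x8 worth of Thunderbolt buses rerouted through Slot 2.
POOL_B_UPLINK = 16                     # x16 from the switch up to the CPU

assigned = {
    "I/O card (TB Bus 0)": 4,
    "Slot 2 (TB Bus 1)":   4,
    "Slot 2 (TB Bus 2)":   4,
}

used = sum(assigned.values())
print(f"assigned x{used}, leaving x{POOL_B_UPLINK - used} of the x16 uplink")
# -> assigned x12, leaving x4 — the "only 4 lanes available" mentioned above
```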

I'll see what the Mac Pro team specialists at Apple Support have to say about it (that support contract has to be good for something)...
 

mattspace

macrumors 68040
Original poster
Jun 5, 2013
3,342
2,975
Australia
Hmm I wonder if the afterburner is not the x factor somehow...

Yeah, I mean it's sitting there using 16 lanes, and is completely idle... it really does irk me that Apple didn't make it dynamically reconfigurable. "It's a software reconfigurable FPGA, for which we'll never offer a reconfiguration"
 

ZombiePhysicist

Suspended
May 22, 2014
2,884
2,794
Yeah, I mean it's sitting there using 16 lanes, and is completely idle... it really does irk me that Apple didn't make it dynamically reconfigurable. "It's a software reconfigurable FPGA, for which we'll never offer a reconfiguration"

I hate to say it, and maybe it won't work with your workflow, but maybe yanking it out and running for a while with your controller card in that slot might be a good test?
 

mattspace

macrumors 68040
Original poster
Jun 5, 2013
3,342
2,975
Australia
I hate to say it, and maybe it won't work with your workflow, but maybe yanking it out and running for a while with your controller card in that slot might be a good test?

It's a thought, though interestingly in the Mac Pro white paper from 2021 they specifically posit an example machine with dual MPX GPUs, an Afterburner, AND suggest the remaining slots are suitable for SSDs and Fibre Channel (how many lanes does that use?).

I can't recall if it's been discussed here previously, but I'm still unclear on how the 96 lane PCI switch functions to manage bandwidth over 32 physical lanes. As in, did they use a 96 lane switch because there wasn't a 32 lane one of sufficient performance, or are you supposed to be able to over-subscribe those lanes 3:1?

We'll see what Apple engineering say. If I only have 4 lanes remaining, I might just put a single SSD on a card, and put my photo library on a 4TB SATA SSD, rather than an M.2 🤷‍♂️
 

joevt

macrumors 604
Jun 21, 2012
6,965
4,259
I was more thinking about the DP return to the system via the MPX slot. My understanding is the MPX bay is DMI x16 to exchange data with the GPU, and then the second set of pins reroutes the second PCI Slot in an MPX bay (Slot 2 & 4) when an MPX GPU is connected to supply TB peripheral bandwidth, and bring DP-Out back into the system, via the Switched A & B PCI pools.
Like I said, DisplayPort is completely separate from PCIe. MPX slots have different lines for the DisplayPort signals that are sent to the top or I/O card's Thunderbolt controller.

MPX doesn't use DMI. DMI is the connection from the CPU to the PCH. The PCH has the other devices not connected to the PCIe switch (SATA, NVMe, Ethernet, WiFi, USB, etc.).

Slot 8 can't be changed off Pool B, but switching the x16 Afterburner to Pool B sets it to 150%
That's logical. If Pool B is 150% then Pool A is reduced to 0%?

I think what I'm unclear on is whether the PCI switch actually works to manage a greater number of lanes than the processor supports - especially if things are idle. For example, I'm not using any displays plugged into the top or I/O cards - all my displays are via Type-C to DP connectors, so do the lanes that would be supplying Slot 2 (which AFAIK are assigned to the MPX DP-Return) get returned to the pool?
The CPU has 64 lanes: 16 for Slot 1, 16 for Slot 3, 16 for Pool A, and 16 for Pool B. This is on page 11 of the Mac Pro White Paper.

Pool A and Pool B are controlled by the 96 lane PCIe switch. Since 32 lanes are used for the upstream connections (16 for Pool A and 16 for Pool B), that leaves 64 lanes for the downstream slots and devices.

The Mac Pro therefore has 128 total usable lanes controlled by 64 lanes from the CPU.
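
A quick tally of those numbers (my reading of the White Paper, not an official breakdown):

```python
# Lane topology of the MacPro7,1 as described above.
cpu_to_slot1  = 16
cpu_to_slot3  = 16
cpu_to_pool_a = 16          # uplink into the PCIe switch
cpu_to_pool_b = 16          # uplink into the PCIe switch

switch_lanes      = 96
switch_upstream   = cpu_to_pool_a + cpu_to_pool_b       # 32
switch_downstream = switch_lanes - switch_upstream      # 64 for the pooled slots/devices

cpu_total = cpu_to_slot1 + cpu_to_slot3 + switch_upstream   # 64 lanes from the CPU
print(cpu_total, switch_downstream)   # 64, 64 -> the 128 "usable" lanes counted above
```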

A display does not use PCIe lanes unless it is a Thunderbolt display with PCIe devices (USB controller, Ethernet controller, ...).

If things are idle then it doesn't matter if a Pool is 100%, 200%, or 300%. Over allocation becomes a problem only if you happen to be sending > 126 Gbps at any given moment in one direction.

An MPX module in slot 1 will assign the PCIe lanes of slot 2 to the PCIe lanes of the Thunderbolt controllers of the MPX module. I'm not sure if slot 3 is the same - slot 4 has 16 lanes so does it get changed to x8? The White Paper says slot 4 gets disabled.

The Radeon Pro 580X MPX and Radeon Pro W5500X modules don't have any Thunderbolt controllers and are only double wide, so they shouldn't affect slot 2 or slot 4.
Only the quad wide MPX modules have Thunderbolt controllers.

As I understand it, for a TB-equipped MPX GPU, TB Bus 0 is the HDMI, TB Port 1, the top and the I/O card. Bus 1 & 2 are the 4 remaining TB ports on the card. So Bus 0 should account for x4 lanes.
I don't know what the bus numbers are. Each Thunderbolt controller is a separate Thunderbolt bus. There can be between 2 and 6 Thunderbolt buses. Each Thunderbolt bus has two Thunderbolt ports.
- I/O card
- Mac Pro top Thunderbolt ports
- 1st Thunderbolt controller of MPX module in slot 1
- 2nd Thunderbolt controller of MPX module in slot 1
- 1st Thunderbolt controller of MPX module in slot 3
- 2nd Thunderbolt controller of MPX module in slot 3

HDMI is separate from the Thunderbolt buses. What you're confusing here is the DisplayPort outputs of the GPU. A GPU has up to 6 DisplayPort outputs. A Thunderbolt bus has 2 DisplayPort inputs.

The W5700X has a switch (MUX) for one of the DisplayPort outputs of the GPU. The switch switches the DisplayPort output between a DisplayPort to HDMI converter and a DisplayPort input of one of the Thunderbolt controllers. In this case, there are 7 display outputs to choose from but only 6 are usable because the GPU has only 6 DisplayPort outputs and one of them is switched.

So there's 32 DMI lanes for MPX GPUs, leaving 32 CPU lanes, distributed across a potential 60 lanes on the non-DMI slots, that are managed through a 96 lane switch.
Not DMI. They are all PCIe lanes. 64 lanes from the CPU. 32 to the PCIe switch. 32 to slot 1 and slot 3. There are 64 downstream lanes from the PCIe switch but the white paper only shows 56 of them.

Currently I have 16 for the Afterburner (100% of Pool A), and 4 for the I/O card (50% of Pool B) - theoretically that should be 12 CPU lanes remaining...
I don't know if the I/O card is using 50% or 25%. Disconnect it to find out. Remember there are also PCIe lanes going to the Thunderbolt controller for the Mac Pro's top Thunderbolt ports.

Yeah, I mean it's sitting there using 16 lanes, and is completely idle... it really does irk me that Apple didn't make it dynamically reconfigurable. "It's a software reconfigurable FPGA, for which we'll never offer a reconfiguration"
If it's idle then it's not using bandwidth and doesn't affect anything. All the bandwidth can be used by something else while it's idle.

I can't recall if it's been discussed here previously, but I'm still unclear on how the 96 lane PCI switch functions to manage bandwidth over 32 physical lanes. As in, did they use a 96 lane switch because there wasn't a 32 lane one of sufficient performance, or are you supposed to be able to over-subscribe those lanes 3:1?

We'll see what Apple engineering say. If I only have 4 lanes remaining, I might just put a single SSD on a card, and put my photo library on a 4TB SATA SSD, rather than an M.2 🤷‍♂️
The PCIe bus is like a network. You can connect 100 devices to a single PCIe lane using PCIe switches, much like a network switch.

Similar to USB. You can connect many devices to a single USB port. A USB hub handles moving traffic to the proper USB device.

Don't worry about over allocation. Think about what devices you are going to be using at the same time. Can they together send 126 Gbps? Or receive 126 Gbps? If so then shuffle them around if possible.
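
For example (a made-up mix of devices, just to show the kind of tally I mean):

```python
# Made-up example: worst-case simultaneous transmit (or receive) per device
# versus one pool's ≈126 Gbps per-direction budget. The devices and figures
# below are hypothetical, not anyone's actual configuration.
POOL_BUDGET_GBPS = 16 * 8 * 128 / 130      # ≈ 126 Gbps

devices_gbps = {
    "x8 NVMe RAID card (flat out)": 8 * 8 * 128 / 130,   # ≈ 63 Gbps
    "Thunderbolt SSD":              25,                   # PCIe-tunnelling ceiling
    "10 GbE card":                  10,
}

total = sum(devices_gbps.values())
verdict = "fine" if total <= POOL_BUDGET_GBPS else "consider shuffling slots between pools"
print(f"worst case ≈{total:.0f} Gbps vs ≈{POOL_BUDGET_GBPS:.0f} Gbps budget -> {verdict}")
```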
 

mattspace

macrumors 68040
Original poster
Jun 5, 2013
3,342
2,975
Australia
Don't worry about over allocation. Think about what devices you are going to be using at the same time. Can they together send 126 Gbps? Or receive 126 Gbps? If so then shuffle them around if possible.

Ahh, I had assumed over-allocating would be a "your Mac will become unusable with warning alerts" situation.

So theoretically, if I'm not using any TB peripherals on my main display GPU - that's 2 TB busses worth of PCI I have in hand to feed an SSD card? I was primarily thinking about what the effect of PCI starvation has on the stability of an operating system running from that SSD card.

I assume a Type-C to type-A USB adapter plugged into a TB port on a GPU would activate PCI lanes... but might only actually use bandwidth on them when transferring data...?
 

joevt

macrumors 604
Jun 21, 2012
6,965
4,259
So theoretically, if I'm not using any TB peripherals on my main display GPU - that's 2 TB busses worth of PCI I have in hand to feed an SSD card? I was primarily thinking about what the effect of PCI starvation has on the stability of an operating system running from that SSD card.
PCI starvation isn't really a thing. If you exceed 126 Gbps then something is going to be slightly slower than usual. That's all. The devices will take turns transmitting or receiving data so no one device can monopolize the bus.

You can connect an x16 GPU to an x1 PCIe slot and it will still do GPU work. It might take longer to send info for a frame, but you can compensate by making the GPU render higher resolution frames so that it's not doing nothing while waiting for the next frame.

The OWC Express 4M2 connects 4 gen 3 NVMe devices using x1 for each device. So each one is limited to 780 MB/s instead of the usual 2800 MB/s.
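
Roughly where those per-lane figures come from (arithmetic, not a benchmark):

```python
# PCIe gen 3 per-lane throughput, converted to MB/s.
gen3_lane_gbps = 8 * 128 / 130              # ≈ 7.877 Gbps per lane, per direction
gen3_lane_mbs  = gen3_lane_gbps * 1000 / 8  # ≈ 985 MB/s raw

print(f"x1: ≈{gen3_lane_mbs:.0f} MB/s raw; the ~780 MB/s above is what's left after packet/protocol overhead")
print(f"x4: ≈{4 * gen3_lane_mbs:.0f} MB/s raw, so a ~2800 MB/s drive isn't lane-limited at x4")
```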

I assume a Type-C to type-A USB adapter plugged into a TB port on a GPU would activate PCI lanes... but might only actually use bandwidth on them when transferring data...?
True, a Thunderbolt controller includes a USB 3.x controller so that will use up to 9.7 Gbps for each USB-C port.
I believe a USB 2.0 device connected to a Thunderbolt controller in the MacPro7,1 will use separate USB 2.0 lines to the USB controller of the PCH.
In any case, if a device isn't sending or receiving at the moment, then something else can use that bandwidth.

One PCIe lane is < 7.877 Gbps.
 

mattspace

macrumors 68040
Original poster
Jun 5, 2013
3,342
2,975
Australia
PCI starvation isn't really a thing. If you exceed 126 Gbps then something is going to be slightly slower than usual. That's all. The devices will take turns transmitting or receiving data so no one device can monopolize the bus.

This really seems to be something Apple isn't very good at - they don't like informing users because some graphic designer thinks providing explanations in-situ looks messy, so they give just enough information to cause worry (*glares at Expansion Slot Utility*), but not enough to reassure.
 