
Thunderbolt Vs Upgradeable GPU + PCIe slots?

  • Thunderbolt ports + proprietary, non-upgradeable GPUs, NO free PCIe slots [new Mac Pro]
    Votes: 61 (32.4%)
  • Four PCIe 3.0 slots sharing 40 lanes with NO Thunderbolt at all
    Votes: 127 (67.6%)

  Total voters: 188
The only thing Falcon Ridge does over Cactus Ridge is aggregate the two 10Gbps TB channels on each connector/cable into a single 20Gbps channel so that it can pass 4K DisplayPort signals.

Here's the issue I have: an x4 PCIe v2 controller with two ports that promises up to 20Gb/s cannot deliver on that promise unless only one port is used.
 

Yeah, the marketing and tech pundits have distorted the truth on this. When TB2 rumors first started circulating, you saw headlines like "TB2 doubles bandwidth to 20Gbps" which of course, is very misleading.

At any rate, in only the most extreme cases will the constraints become a problem.

Let's say you're lucky enough (or rich enough) to have three 4K displays, a 10GigE TB adapter, a TB RAID0 array of SSDs that can do 2GB/s and a BlackMagic video capture card. Even this is not going to bottleneck anywhere if you connect your 4K displays to different TB controllers and each of your data peripherals to a different controller. No worries.
 

Well, doesn't it strike you as a bad design decision that you need to know how your ports are wired to not limit your bandwidth? To me it doesn't make sense to have the left port deliver 20Gb/s and the right port 0Gb/s.

The only way to mitigate this while staying with x4 PCIe v2 controllers is to only use one port per controller.

And regarding marketing and tech pundits, there are more outlets than Anand and the way Apple puts it makes it seem like it's indeed 20Gb/s per port. I will at least remain cautiously optimistic about it until release.

Apple said:
Up to 20Gb/s data transfer speed

Thunderbolt is the fastest, most versatile I/O technology there is. And with Mac Pro, we’re jumping even further ahead. Thunderbolt 2 delivers twice the throughput, providing up to 20Gb/s of bandwidth to each external device. So you’re more than ready for the next generation of high-performance peripherals. You can connect massive amounts of storage, add a PCI expansion chassis, and work with the latest external displays — including 4K desktop displays and peripheral devices capable of broadcast monitoring in 4K. And since each Thunderbolt 2 port allows you to daisy-chain up to six peripherals, you can go all out by plugging in up to 36 external devices via Thunderbolt alone.
 
Here's some additional reading on the difference between Cactus and Falcon...
http://www.anandtech.com/show/7049/intel-thunderbolt-2-everything-you-need-to-know

(see the 2nd to last paragraph)...


Here's my attempt to explain/understand it...

The only thing Falcon Ridge does over Cactus Ridge is aggregate the two 10Gbps TB channels on each connector/cable into a single 20Gbps channel so that it can pass 4K DisplayPort signals.

In more detail... With TB1, one channel is reserved for PCIe, the other for DisplayPort: 10Gbps for each type of signal. The problem with that is that a 10Gbps channel is not enough bandwidth for a 4K display signal (which is about 16Gbps). So in order for Intel to support 4K displays, they needed to combine the two 10Gbps channels in TB1 into a single 20Gbps channel, and they called that TB2. Now, instead of PCIe x4 and DisplayPort having their own 10Gbps channels, they are muxed together on a single 20Gbps channel.

Now, even though the controller can toggle each connector at 20Gbps, it still appears that it only has a PCIe x4 input from the computer to send across either of those connectors. So I assume (I don't know this for sure) that it's switching PCIe x4 across both connectors. So you could hook up an x4 peripheral to either connector and it would work at full speed, as long as they weren't both trying to saturate the bus at the same time. If you hooked up an x4 peripheral to both connectors at the same time and they were both saturating their buses, they would be fighting for that same x4 connection to the computer and would bottleneck each other. It's the only way I can see it making sense.
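That switching hypothesis can be sketched as a toy model. This is purely an illustration of the thread's assumption (a shared x4 PCIe 2.0 backhaul of roughly 16Gb/s behind two 20Gb/s ports), not a confirmed description of the controller:

```python
def per_port_throughput_gbps(demands, backhaul_gbps=16.0, port_gbps=20.0):
    """Toy model, per this thread's guess (not an Intel spec): each TB2
    port can signal up to port_gbps, but both ports on one controller
    share a single PCIe x4 v2 backhaul of about backhaul_gbps.
    Demands are split fairly when they oversubscribe the backhaul."""
    demands = [min(d, port_gbps) for d in demands]
    total = sum(demands)
    if total <= backhaul_gbps:
        return demands
    # fair-share scaling when both ports saturate the shared x4 link
    scale = backhaul_gbps / total
    return [d * scale for d in demands]

print(per_port_throughput_gbps([16.0]))        # one busy port: [16.0]
print(per_port_throughput_gbps([16.0, 16.0]))  # both busy: [8.0, 8.0]
```

Under this model, one busy peripheral gets the full x4 bandwidth, but two busy peripherals on the same controller split it, which is exactly the "fighting for that same x4 connection" scenario described above.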

EDIT: With this design, three TB2 controllers in the new Mac Pro would utilize a total of 12 PCIe 2.0 lanes (x4 for each controller). This makes sense from a PCIe lane budget perspective...

Lanes available: 40 PCIe 3.0 lanes on CPU, 8 PCIe 2.0 lanes on PCH
- GPU 1 = 16 lanes
- GPU 2 = 16 lanes
- TB Controller 1 = 4 lanes
- TB Controller 2 = 4 lanes
- TB Controller 3 = 4 lanes
- PCIe SSD 1 = 2 lanes
- PCIe SSD 2 (?) = 2 lanes

That's 48 lanes which is all the system has.
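The tally above can be checked with a quick sketch (the allocations are this thread's guesses, not a confirmed Apple spec):

```python
# Hypothetical PCIe lane budget for the new Mac Pro, as guessed in this
# thread -- Apple has not published the actual allocation.
lanes = {
    "GPU 1": 16,
    "GPU 2": 16,
    "TB controller 1": 4,
    "TB controller 2": 4,
    "TB controller 3": 4,
    "PCIe SSD 1": 2,
    "PCIe SSD 2": 2,
}

total = sum(lanes.values())
available = 40 + 8  # 40 PCIe 3.0 lanes on the CPU + 8 PCIe 2.0 on the PCH
print(total, available)  # 48 48 -- the budget just balances
```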

Well that's a pretty good explanation for that case. I hope you don't mind if I try something here tho.

- GPU 1 = 8 lanes
- GPU 2 = 8 lanes
- TB Controller 1 = 8 lanes (2x4)
- TB Controller 2 = 8 lanes (2x4)
- TB Controller 3 = 8 lanes (2x4)
- PCIe SSD 1 = 4 lanes (needed for the 1.25GB/s Apple claims) PCIe v2
- PCIe SSD 2 (?) = 4 lanes (PCIe v2)

48 lanes total (40 v3 and 8 v2). Hmmm. :) So far I haven't met a GPU that is faster given 16 lanes than when set to 8 on a v3 bus, and there's barely a difference between 8 and 16 on a v2 bus, so this could be a possibility. They did it with their rMBP too according to that article you linked - "GeForce GT 650M in the system only gets the use of 8 PCIe 3.0 lanes instead of the full 16, but with PCIe 3.0 this is not an issue (it wouldn’t be an issue with PCIe 2.0 either to be honest)."
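The near-equivalence of x8 v3 and x16 v2 falls out of the per-lane arithmetic (approximate payload figures after 8b/10b vs. 128b/130b encoding overhead):

```python
# Approximate usable bandwidth per PCIe lane, in GB/s: v2 uses 8b/10b
# encoding (~0.5 GB/s/lane), v3 uses 128b/130b (~0.985 GB/s/lane).
GB_PER_LANE = {"v2": 0.5, "v3": 0.985}

def link_gb_s(version, lanes):
    """Rough usable bandwidth of a PCIe link."""
    return GB_PER_LANE[version] * lanes

print(round(link_gb_s("v3", 8), 1))   # x8 v3: ~7.9 GB/s
print(round(link_gb_s("v2", 16), 1))  # x16 v2: ~8.0 GB/s, about the same
```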

Where I find the room for my attempted logic is in the descriptions given for TB1. They all say that the 10Gb lines were unidirectional. One line could only send and the other line could only receive - like USB is. One line is Tx the other Rx if I'm not mistaken. And each connector had 4 for those (I think) - 2 Tx and 2 Rx where either pair could be Data or Display with reservations. etc.

I'm starting to think you're right and I've got it mixed up but there's still a possibility for either and without something more clear from Intel it's pretty frustrating to think about let alone discuss. :D

Also in that article you linked there's another sneaky sentence - even for Cactus Ridge: "I suspect if I had another Pegasus SSD array I’d be able to approach 1800MB/s, all while driving video over the ports." OK, but if it's really only x4 v2 switched between 2 ports then where's the bandwidth for video coming from or is he talking about 640x480 @ 24fps? :p
 

I agree that x16 3.0 lanes for a GPU is overkill but given the aspirations that Apple has for the GPU playing a key role in computational tasks, maybe not for long.

Anyway, I'll be very surprised if the new TB2 controllers double the PCIe capacity... there's just nothing written to support that and that Anand link I posted above indicates fairly clearly that there is no added PCIe bandwidth.

----------

Well, doesn't it strike you as a bad design decision that you need to know how your ports are wired to not limit your bandwidth? To me it doesn't make sense to have the left port deliver 20Gb/s and the right port 0Gb/s.

The only way to mitigate this while staying with x4 PCIe v2 controllers is to only use one port per controller.

And regarding marketing and tech pundits, there are more outlets than Anand and the way Apple puts it makes it seem like it's indeed 20Gb/s per port. I will at least remain cautiously optimistic about it until release.

Yes, you're not getting it though. There IS 20Gbps per port, but there are only two inputs to the controller to be carried over either port... PCIe x4 and/or DisplayPort. Don't confuse the signalling speed of the TB connection with the speed of the traffic it's carrying. The slower PCIe and DP signals are being MUXED onto the 20Gbps TB pipe. In the end, you're still only getting DP or PCIe x4 through that pipe.

And if you had only one port per controller, your 4k display would bottleneck your data peripheral. Having two ports per controller allows you to connect a 4K display to one and your PCIe peripheral to the other and not have them bottleneck each other. Agreed, you're not getting a full 20Gbps out of either port, but you ARE getting the full bandwidth of the inputs to the TB controller which is all that matters.
 
Anyway, I'll be very surprised if the new TB2 controllers double the PCIe capacity... there's just nothing written to support that and that Anand link I posted above indicates fairly clearly that there is no added PCIe bandwidth.

Yeah, they do... err, he does. I wonder how much he knows and how much he's guessing at, tho. The article clearly shows he's not opposed to publishing guesses.

Oh, well. It's been a nice discussion and a decent debate. Unusual for here lately. :rolleyes: I hope there are more to come in the future. It'll be very interesting to test these theories when the nMP ships - if indeed Apple or Intel don't clear the air before then.
 

I think the added PCIe bandwidth they are talking about is the fact that TB2 removes the 10Gbps channel cap on the PCIe data. Which means they can get the full PCIe x4 bandwidth through a TB connection now. They were fairly clear that TB2 is still only PCIe x4. But I guess we'll know for sure soon enough. I suspect details will be announced at Intel's IDF in early Sept.

And, yeah, testing it will be somewhat of a challenge. You will need a couple of SSD arrays and 4K displays to really push this thing.
 
Yes, you're not getting it though. There IS 20Gbps per port, but there are only two inputs to the controller to be carried over either port... PCIe x4 and/or DisplayPort. Don't confuse the signalling speed of the TB connection with the speed of the traffic it's carrying.

The traffic is bound to the signaling speed. I have no idea what you mean by that last sentence. Thunderbolt v1 also had 20Gb/s if you count the DisplayPort.

And if you had only one port per controller, your 4k display would bottleneck your data peripheral. Having two ports per controller allows you to connect a 4K display to one and your PCIe peripheral to the other and not have them bottleneck each other. Agreed, you're not getting a full 20Gbps out of either port, but you ARE getting the full bandwidth of the inputs to the TB controller which is all that matters.

What data peripheral? There will surely be one port controllers, there already are, in the MacBook Air for example.
 

Right, but TB2's payload is PCIe x4 and DisplayPort 1.2... that's what goes into the TB controller and that's what comes out the other end. Nothing more. If TB2 were 100Gbps per port, it wouldn't matter. You'd still only get PCIe x4 data (16Gbps) and DisplayPort 1.2 (4K display = 16Gbps) carried over that link.

And if there is only one port, such as on the MacBook Air, it's limited by the capacity of that one port. If it could pass a 4K display signal (16Gbps), there would only be 4Gbps left for a data peripheral daisy chained. However, if the next MBA offers two ports, then it can support a 4K display (16Gbps) on one port and a full x4 PCIe peripheral on the other (16Gbps).

Now if Intel upped the capacity of a TB channel to 32Gbps, then a single port could carry both a 4K display signal and an x4 PCIe connection without bottlenecking. But of course, that's not the case today.
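The arithmetic behind that, using the thread's round figures:

```python
DP_4K_GBPS = 16       # approx. 4K DisplayPort 1.2 signal, per this thread
PCIE_X4_V2_GBPS = 16  # approx. x4 PCIe 2.0 payload bandwidth
TB2_CHANNEL_GBPS = 20

combined = DP_4K_GBPS + PCIE_X4_V2_GBPS
print(combined > TB2_CHANNEL_GBPS)  # True: one 20Gb/s channel can't carry both
print(combined <= 32)               # True: a hypothetical 32Gb/s channel could
```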
 
The only thing Falcon Ridge does over Cactus Ridge is aggregate the two 10Gbps TB channels on each connector/cable into a single 20Gbps channel so that it can pass 4K DisplayPort signals.

There is Redwood Ridge between those two, and Falcon Ridge also picks up the quite useful ability to run the ports in DisplayPort v1.2 backward/pass-through mode.

A DP v1.2 monitor hooked up directly to a Mac Pro TB port is not going to consume TB bandwidth.

EDIT: With this design, three TB2 controllers in the new Mac Pro would utilize a total of 12 PCIe 2.0 lanes (x4 for each controller). This makes sense from a PCIe lane budget perspective...

Lanes available: 40 PCIe 3.0 lanes on CPU, 8 PCIe 2.0 lanes on PCH
- GPU 1 = 16 lanes (3.0)
- GPU 2 = 16 lanes (3.0)
- TB Controller 1 = 4 lanes (3.0 or 2.0)
- TB Controller 2 = 4 lanes (2.0)
- TB Controller 3 = 4 lanes (2.0)
- PCIe SSD 1 = 2 lanes (3.0)
- PCIe SSD 2 (?) = 2 lanes (3.0)


You are undercounting sources/users of PCI-e lanes.

- 2x 1GbE Ethernet ports = 2 lanes (2.0)
- WiFi/Bluetooth = 1 lane (2.0)
- USB 3.0 = 1 lane (2.0)

So there are another 4 lanes.

That's 48 lanes which is all the system has.

Actually more than it has, once you include the missing devices. Likely workaround candidates are to:

a. Put GPU 1 and the SSD on a PCIe switch sharing that x16.
(GPU 2, being more of a typical GPGPU target with more data traffic at peak use, would keep the full x16.)

An x16 v3 GPU with an effective x14 v3 throughput is still 150% the performance of an old x16 v2 link. Extreme 3D might have some blips, but it will work. If you have extreme 3D and low GPGPU needs, just flip which one you primarily leverage for display by using another socket.

(Second SSD on the 2nd GPU card... same trick if it's a BTO option.)

b. Kneecap the other I/O ports (or one of the controllers) by making a TB controller share their x4 worth of bandwidth.

c. Kneecap two of the TB controllers by making them share a x4 v2.0 link.

The lowest-impact kneecapping is a.

Any notion that Apple would go one TB controller per TB port pushes oversubscription into looney-toon land.
 
... So far I haven't met a GPU that is faster given 16 lanes than when set to 8 on a v3 buss and barely a difference between 8 and 16 on v2 buss, so this could be a possibility. They did it with their rMBP too according to that article you linked - "GeForce GT 650M in the system only gets the use of 8 PCIe 3.0 lanes instead of the full 16, but with PCIe 3.0 this is not an issue (it wouldn’t be an issue with PCIe 2.0 either to be honest)."

Low impact if you're not pushing them with GPGPU (OpenCL or CUDA) traffic, perhaps. Likewise if you're not pushing much texture and/or model data over to the GPU.

The IOHub/PCH chipset on the rMBP 15" has to deal with the SATA SSD. In the Mac Pro, all the SATA traffic has been de-allocated from the IOHub/PCH. A chipset that provisions 10 SATA ports has a decent chunk of unused internal infrastructure bandwidth.

It will be interesting to see if that TB vs. SSD bundling to the CPU PCIe lane allocation flip-flops in this year's rMBP 15". Shaving just x2 lanes off has a lower kneecap impact.


Where I find the room for my attempted logic is in the descriptions given for TB1. They all say that the 10Gb lines were unidirectional. One line could only send and the other line could only receive - like USB is.
One line is Tx the other Rx if I'm not mistaken. And each connector had 4 for those (I think)

The Mini DisplayPort socket has 4 sets of wires.

".. The DisplayPort connector can have 1, 2, or 4 differential data pairs (lanes) in a Main Link ..."
http://en.wikipedia.org/wiki/Display_Port



OK, but if it's really only x4 v2 switched between 2 ports then where's the bandwidth for video coming from or is he talking about 640x480 @ 24fps? :p

In TB v1, it is largely moving in the other direction on a different set of wires from a completely different source than PCI-e.
 
I think the added PCIe bandwidth they are talking about is the fact that TB2 removes the 10Gbps channel cap on the PCIe data. Which means they can get the full PCIe x4 bandwidth through a TB connection now. They were fairly clear that TB2 is still only PCIe x4. But I guess we'll know for sure soon enough. I suspect details will be announced at Intel's IDF in early Sept.

And, yeah, testing it will be somewhat of a challenge. You will need a couple of SSD arrays and 4K displays to really push this thing.

6 or 8 SSDs would do. Put 3 or 4 SSDs in each RAID0 TB2 enclosure, connect the enclosures to the same controller's two ports, and then run sequential sustained I/O benchmarks on each array at the same time.

If you get 2GB/s from both, then it's case A; if not, and it's down around 1GB/s, then it's case B. :)
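As a sketch, the decision rule for that test might look like this (the thresholds are rough estimates from this thread, and the measured numbers would come from the actual benchmark runs):

```python
def diagnose_controller(port1_mb_s, port2_mb_s):
    """Classify the simultaneous throughput of two RAID0 arrays on one
    controller's two ports.  Case A: the controller has enough upstream
    bandwidth for both.  Case B: the ports share a single x4 PCIe 2.0
    link (~2GB/s total).  Thresholds are rough, per this thread."""
    total = port1_mb_s + port2_mb_s
    if port1_mb_s >= 1800 and port2_mb_s >= 1800:
        return "A"  # both arrays near 2GB/s: no shared bottleneck
    if total <= 2200:
        return "B"  # combined throughput pinned near one x4 v2 link
    return "inconclusive"

print(diagnose_controller(1950, 1900))  # A
print(diagnose_controller(1000, 1000))  # B
```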
 
Right, but TB2's payload is PCIe x4 and DisplayPort 1.2... that's what goes into the TB controller and that's what comes out the other end. Nothing more.

There is no question about this, but TB2 aggregates the 4 existing lanes into two, which can be used for data and/or display. As you can see in the illustration, TB1 has two channels with two lanes, one in each direction; Thunderbolt 2 has one channel with two lanes, each 20Gb/s. Anand's hypothesis about what this will do for storage was ~1.5GB/s or so, from your link.

[Image: Thunderbolt 1 vs. Thunderbolt 2 channel diagram]


And if there is only one port, such as on the MacBook Air, it's limited by the capacity of that one port.

Sure, and that's 20Gb/s for Thunderbolt 2.

If it could pass a 4K display signal (16Gbps), there would only be 4Gbps left for a data peripheral daisy chained. However, if the next MBA offers two ports, then it can support a 4K display (16Gbps) on one port and a full x4 PCIe peripheral on the other (16Gbps).

Well, no, because support for the 4K display is achieved by aggregating the lanes, as the illustration above shows. The limitation that a daisy chain adds is still there on a dual-port controller. It's a different topic really. What I'm interested in is the max theoretical I/O with one device on one port (the "up to 20Gb/s data transfer speed" claim).
 
The traffic is bound to the signaling speed. I have no idea what you mean by that last sentence. Thunderbolt v1 also had 20Gb/s if you count the DisplayPort.

There has to be a signal for there to be signaling speed.

A USB 3.0 controller is only going to put around 4Gb/s onto the Thunderbolt network. Peripherals don't get any faster than their native controllers when they are displaced in location by Thunderbolt.

If there is a freeway with a metering light that only lets cars onto the freeway once every minute, and all the cars want to go two exits down the road, then their arrival rate is going to be one car/min. It doesn't really matter if the freeway speed is 55, 65, or 85 mph. The flow on is limited, so the flow off is also limited.
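The metering-light analogy boils down to this: end-to-end throughput is the minimum of all the stages in the chain, regardless of how fast the Thunderbolt link itself signals. A minimal sketch:

```python
def chain_throughput_gbps(*links_gbps):
    """End-to-end throughput of a daisy chain is limited by its slowest
    stage, no matter how fast the TB link rate is."""
    return min(links_gbps)

# A USB 3.0 device behind a TB2 link delivers only the ~4Gb/s its
# native controller can source, not the 20Gb/s the cable can signal.
print(chain_throughput_gbps(4, 20))  # 4
```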
 
Low impact if not pushing them with GPGPU ( OpenCL or CUDA ) traffic perhaps. Likewise if not pushing much texture and/or model data over to GPU.

That doesn't make sense. OpenCL and CUDA are very, very low bandwidth! Tested. Sure, filling the cards' respective 6GB with textures or vertex data might take 100ms instead of 80ms - but only if it were already stored in system RAM. And it would probably only need to do that once per job.

I'm pretty sure the fastest GPUs going never, or almost never, utilize x16 v3 (~16GB/s). I'm going to go ahead and say, yeah, never! Further, I'm going to say the same in regards to x16 v2 (8GB/s) - yeah, never. We might wanna think so tho. ;)

With an extremely heavy load, x8 v2 (4GB/s) might top out... mmm... 1% to 3% of the time - not noticeable to users without a highly accurate stopwatch and an I/O meter. And again, x8 v3 (~8GB/s) - no, never. :p

It's a nice dream tho. :)
 
I think the added PCIe bandwidth they are talking about is the fact that TB2 removes the 10Gbps channel cap on the PCIe data.

But it also removes the segregation. It is only a cap removal if the DisplayPort traffic doesn't jump in and become a road hog. In some contexts, it can be a net loss in bandwidth if the video traffic is very high.

TB v2 is going to have to do more proactive bandwidth capping/allocating if it is to keep up with isochronous constraints when there are multiple concurrent users in the same direction.

Typically it isn't going to be a problem. Say you're editing 4K data streams. Largely the user is going to pull those off the disk and view them back out as video. Even when saving and viewing, the data aspect is going from multiple streams down to one. So going back out, the data is smaller.

If 4K video data is a road hog, the simple solution (at least on a six-port Mac Pro) is not to put it on the TB network. Relegate it purely to a DisplayPort network. Problem solved.
 
That doesn't make sense. OpenCL and CUDA are very very low bandwidth! Tested. Sure to fill the cards' respective 6GB with textures or vertex data might take 100ms instead of 80ms - but only if it were already stored in system RAM. And it would probably only need to do that once per job.

You are largely mapping OpenCL and CUDA back onto video games and narrow graphics programs, with a bit of "small enough to fit" sprinkled on top. In most simulations of the real world, the vast majority of the model data is not immutable, static data like textures. Iterations of the model change the data. The data doesn't necessarily all fit in local RAM (or VRAM).

If you're actually doing real-time stream analysis, there is a steady stream of data from the outside world, and results/analysis have to be pushed back out in a timely enough manner to make a difference.

The GP in GPGPU is for "general purpose," not corner cases. GPGPUs are never going to be scalar-only generalists, but it's a far broader problem space than immutable bulk data and vertexes.



I'm pretty sure the fastest GPUs going never to almost never utilize x16 v3 (~16GB/s). I'm going to go ahead and say ya, Never!

640K ought to be enough for everybody. *cough* OpenCL and other frameworks picking up uniform memory access capabilities is going to change that. Your computational architecture worldview is looking in the rearview mirror. The foundation for uniform memory access is already in PCIe v3.

Lots of existing software has been warped by the hardware constraints it had to operate under... not that those were the optimal solutions/algorithms.
 
Yes, I understand the meaning of what is said, but not the relevance to the discussion.

Discussion of a situation that is unlikely to ever exist? With two ports/connectors, the only way to get zero PCI-e data traffic on one and 20Gb/s data traffic on the other is if the first port is making no requests. No read/write requests means traffic will be zero. That isn't going to happen for an extended period of time unless the chain is full of switched-off devices.


If you're trying to say that if you crank the PCI-e data traffic on one port/connector up to 16Gb/s then the other port will be starved... that isn't going to happen. Thunderbolt is largely a switch. If resource sharing is initiated, then bandwidth is going to get reallocated. For PCI-e data and a singular target, the size of the pie being sliced up is the target's PCI-e pie, not the Thunderbolt pie. Two different things.


There is nothing particularly wrong with the design at all. Pretending a switch is not a switch isn't going to give deep insight into how it works or into optimizing the utilization of it.

The 20Gb/s bandwidth of TB v2 isn't there just for DisplayPort data. It is not there just for PCI-e data. It is there for both. The same aggregate 20Gb/s was there for both in v1; they just interacted in a less dynamic fashion.
 
The only thing ya can't do through those devices is play the latest heaviest games over the newest GPUs

Really? I think the other thing you can't do is... oh yeah, anything requiring more than 2GB/s per device/controller.

Gosh, why do they even include PCIe 3.0 x16? Apparently all we needed was a few buses running at 1/8 the speed!

Do I need 36 GPUs which I get to use one at a time without physically unplugging them, or do I need one/two GPUs plus a 4-port SAS controller running at 40Gb/s, plus maybe a 3GB/s hard drive array and a capture card?

Yes, I understand, you can take unused bandwidth and shift it from one device to another. Yes, you can plug in more devices at a time. This does not make up for the individual limit on device throughput.

As a practical matter, there's not a ton of stuff that requires 2GB/s, but the things that do are literally not usable under any circumstance over Thunderbolt 2. This includes controller cards for large arrays, 3000MB/s SSDs, and yes, GPUs.
 
Discussion of a situation that is unlikely to ever exist? With two ports/connectors, the only way to get zero PCI-e data traffic on one and 20Gb/s data traffic on the other is if the first port is making no requests. No read/write requests means traffic will be zero. That isn't going to happen for an extended period of time unless the chain is full of switched-off devices.

No, go back and re-read what VirtualRain said.

But since you hijacked here, let's do this... The example is made to illustrate that there isn't enough bandwidth behind the controller to truly deliver 20Gb/s on two ports simultaneously. On Thunderbolt v1 this isn't an issue, since more than 10Gb/s cannot be sent anyway and the PCIe lanes divide perfectly between the ports of a two-port controller.

If you're trying to say that if you crank the PCI-e data traffic on one port/connector up to 16Gb/s then the other port will be starved... that isn't going to happen. Thunderbolt is largely a switch. If resource sharing is initiated, then bandwidth is going to get reallocated. For PCI-e data and a singular target, the size of the pie being sliced up is the target's PCI-e pie, not the Thunderbolt pie. Two different things.

There is nothing particularly wrong with the design at all. Pretending a switch is not a switch isn't going to give deep insight into how it works or into optimizing the utilization of it.

It's not largely a switch at all. It contains a switch (two, actually), but so does the PCIe fabric on your motherboard; it's also so much more. You must have mentioned this fundamental about one hundred times by now. I know, and I knew it before you mentioned it the first time.

With that out of the way: if you have a device on port 1 that is consuming 2GB/s and then add another identical device to port 2, the resource gets reallocated, and device 1 gets starved since half of the resource disappeared to port 2. In effect this starves both ports. Again, not an issue on v1, since there the resource actually can feed both ports at their full potential.

The issue is not pretending a switch is not a switch; it's that the resource is too small to actually deliver a consistent speed on all ports, regardless of which controller, or whether anything else is plugged in. Again, in v1 this is not an issue.

The 20Gb/s bandwidth of TB v2 isn't there just for DisplayPort data. It is not there just for PCI-e data. It is there for both. The same aggregate 20Gb/s was there for both in v1 they just interacted in a less dynamic fashion.

Well, it wasn't aggregated, and they didn't interact at all, since one channel was dedicated to DisplayPort and the other to PCIe.

Now add all this up with what Apple is saying and the equation isn't balanced. The only way to get it balanced is with a one-port controller (Intel already has one-port controllers backed by x4 PCIe: http://ark.intel.com/products/67022/Intel-DSL3310-Thunderbolt-Controller), or a dual-port controller backed by x8 PCIe. Which is why I have mentioned this, over and over and over.
 
Not even close to being right. The Thunderbolt controllers are effectively switches for the TB network. If you pump 20Gb/s in one port, they can pump the same 20Gb/s back out on the other port at full speed at the same time.

What they can't do is pump all 20Gb/s inside of the device they reside in. That has little to do with whether you can run the two external-facing ports at full speed.

....aaaand that's all I was saying. Can't run 2 devices doing the same thing at the same time.

And the reason why this is important: Tessalator is convinced you could arrange a RAID using all the plugs of the new Mac Pro and aggregating the speed. Clearly this will not go very far.

Thanks for all that.
 

I agree you couldn't use all the plugs, but I believe you could run a RAID array across the three controllers and get 6GB/s of total bandwidth that way since from what I've seen, each controller has access to 4 PCIe 2.0 lanes. So... If for example you had a few SSDs capable of 2GB/s each (that can max out a x4 PCIe 2 bus) and you plugged one into a port for each of the three controllers, you could then RAID0 those and enjoy 6GB/s throughput. All while running a trio of 4K displays on the other three TB ports.
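The 6GB/s figure is just this multiplication (assuming, as above, ~2GB/s of usable PCIe 2.0 bandwidth per controller, which is an estimate from this thread, not a spec):

```python
CONTROLLERS = 3
USABLE_GB_S_PER_CONTROLLER = 2.0  # x4 PCIe 2.0, this thread's estimate

# One fast SSD per controller, striped in RAID0 across the three
# controllers, while the other three ports drive displays:
aggregate = CONTROLLERS * USABLE_GB_S_PER_CONTROLLER
print(aggregate)  # 6.0 GB/s total
```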
 
....aaaand that's all I was saying. Can't run 2 devices doing the same thing at the same time.

And the reason why this is important: Tessalator is convinced you could arrange a RAID using all the plugs of the new Mac Pro and aggregating the speed. Clearly this will not go very far.

Of course you could do that. The question is about the total bandwidth available on all ports.
 