
Thunderbolt Vs Upgradeable GPU + PCIe slots?

  • Thunderbolt ports + Proprietary, non-upgradeable GPUs, NO free PCIe slots [new Mac Pro]
    Votes: 61 (32.4%)
  • Four PCIe 3.0 slots sharing 40 lanes with NO Thunderbolt at all
    Votes: 127 (67.6%)
  • Total voters: 188
However, if you plug the drive array into the second connector instead (still the same TB2 controller) will your drive array then run at full speed? Or, do you need to plug it into a whole separate TB2 controller to get full speed? (Is there any documentation that makes this clear or are people assuming stuff?)

No, there is no documentation that makes this clear; people are assuming stuff. For the current v1 Thunderbolt this is not a problem because the four PCIe lanes are enough to give roughly 1GB/s to both ports.
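
For anyone who wants to check that arithmetic, here is a minimal sketch of the claim above. It assumes the commonly quoted ~500MB/s of usable throughput per PCIe 2.0 lane; these are rough figures, not official specs.

```python
# Back-of-the-envelope check: a Thunderbolt v1 controller sits behind an x4 PCIe 2.0 link.
PCIE2_LANE_MBPS = 500      # ~500 MB/s usable per PCIe 2.0 lane (8b/10b already accounted for)
LANES = 4
PORTS = 2

uplink_mbps = PCIE2_LANE_MBPS * LANES   # 2000 MB/s total into the controller
per_port_mbps = uplink_mbps / PORTS     # 1000 MB/s if both ports pull data at once

print(f"PCIe uplink: {uplink_mbps} MB/s; per port when both are busy: {per_port_mbps} MB/s")
```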

That is why it makes sense to only look at what Apple and others have said about it at this point. I mean, we are only months or possibly weeks from a release anyway, and then we will know. I suppose someone needs to make a Thunderbolt 2 equipped device first as well...
 
No, there is no documentation that makes this clear, people are assuming stuff.

Exactly. From the wording in three different advertisement-type announcements (two from Intel and one from Apple) and by looking at the hardware specs, it seems pretty clear that the two connectors from each controller are independent and each capable of supplying 20Gb/s independently. So no slowdown in VirtualRain's second example.

Maybe this is something I should get on the Intel site and ask about. ;) But every time I have a straight question for them like this they give me a run-around saying crap like: "Ask the MB manufacturer" or "Sorry we have a gag agreement with Apple so STHU and go ask them". Grrr...
 
We know the new Mac Pro is not expandable (as in you cannot add extra components), but they didn't say anything about it being upgradeable (replacing existing components with better-performing equivalents). The RAM certainly looks upgradeable, so there appears to be that distinction. The SSD also looks replaceable.

The GPUs are accessible and attached with bolts. I think they might also be upgradeable, although it will only be possible with other GPUs designed for the Mac Pro. Hopefully it will be possible to extend the usable life of a Mac Pro by adding components from later generations.
 
Maybe this is something I should get on the Intel site and ask about. ;) But every time I have a straight question for them like this they give me a run-around saying crap like: "Ask the MB manufacturer" or "Sorry we have a gag agreement with Apple so STHU and go ask them". Grrr...

That's an interesting idea :D but as you say they are likely to have an excuse or refer you to join some membership area.

Long-time Intel insider and technical writer John Montgomery writes this:

"Expansion (Thunderbolt 2): Six Thunderbolt 2 connections, each with 20Gbps bandwidth"

when listing out the nMP specs in one of his tech blog posts.


Interesting, it's good to know it's not just you and me who believe this to be the case. Also, Deconstruct60 mentioned an endpoint controller previously; in any case, that would need four lanes for one port to support 20Gb/s. It would be odd if the endpoint were more capable than the host, the way I see it.
 
  1. That's an interesting idea :D but as you say they are likely to have an excuse or refer you to join some membership area.
  2. Interesting, it's good to know it's not just you and me who believe this to be the case. Also, Deconstruct60 mentioned an endpoint controller previously; in any case, that would need four lanes for one port to support 20Gb/s. It would be odd if the endpoint were more capable than the host, the way I see it.

  1. There's an open area - or I somehow have membership to a large group of sub-boards there. But trying to get info out of them is like pulling teeth. :p
  2. Right, TB1 devices will not operate at TB2 speeds. It'll work the same way USB works, in that a USB2 device on a USB3 bus operates at USB2 speeds. That said, many manufacturers are saying they will offer to upgrade their adapter boards for a small fee - probably less than the cost of a whole new TB2 device - according to several people's tech blogs.
 
To a specific host or specific peripheral the data has to travel through a x4 PCIe v2 link. That link theoretically maxes out at 2GB/s (4 * 500MB/s). That's 16Gb/s if converted to bytes.
I don't follow.

Typo/error in the last word: it should say bits, not bytes, to match the symbol (b = bits). The 2GB/s of the previous sentence is converted to the 16Gb/s of the second. The 8b/10b encoding adjustment is already accounted for by starting with the 500MB/s figure.
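
As a quick illustration of that conversion (plain 8 bits per byte; the 500MB/s-per-lane figure already includes the encoding overhead):

```python
# GB/s to Gb/s conversion used above. 8b/10b overhead is already baked into
# the 500 MB/s-per-lane figure, so this is a plain bytes-to-bits multiply.
lanes = 4
per_lane_gb_per_s = 0.5                     # 500 MB/s per PCIe 2.0 lane
total_gb_per_s = lanes * per_lane_gb_per_s  # 2 GB/s
total_gbit_per_s = total_gb_per_s * 8       # 16 Gb/s

print(f"x4 PCIe 2.0 ~ {total_gb_per_s} GB/s ~ {total_gbit_per_s} Gb/s")
```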


You keep pretending that you know something about this new controller which no-one has seen.

Intel hasn't seen the new TB v2.0 controller.... Not! Your comment is immaterial if folks have discussed this with Intel. Which Anandtech.com did:

".... Thunderbolt 2/Falcon Ridge still feed off of the same x4 PCIe 2.0 interface as the previous generation designs. Backwards compatibility is also maintained with existing Thunderbolt devices since the underlying architecture doesn't really change. ... "
http://www.anandtech.com/show/7049/intel-thunderbolt-2-everything-you-need-to-know

The x4 PCIe v2 throughput is a very real constraint in both v1 and v2 Thunderbolt. The underlying switching architecture is basically the same.


Imagine the following where both devices gets data from the host:

[host] <---> [device 1] <---> [device 2]

If device 1 consumes 1GB/s and device 2 consumes 2GB/s, then device 2 can never get 2GB/s, because the link between the host and device 1 would need to carry 3GB/s of data for that to happen, which is more than that link can provide. That is unrelated to what we are talking about, but it's one reason to use the "up to" wording.
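
A toy model of that daisy-chain point, purely illustrative: traffic between the host and any device has to cross every link between them, so upstream links carry the sum of all downstream demand.

```python
# Toy model of the chain [host] <-> [device 1] <-> [device 2].
def link_loads(demands_gb_per_s):
    """demands_gb_per_s[i] = bandwidth device i+1 wants from the host.
    Returns the load on each hop, starting with the host<->device1 link."""
    return [sum(demands_gb_per_s[i:]) for i in range(len(demands_gb_per_s))]

# Device 1 wants 1 GB/s, device 2 wants 2 GB/s, as in the example above.
print(link_loads([1, 2]))   # [3, 2] -> the first hop would have to carry 3 GB/s
```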

Generally, the same issue applies if you substitute 'PCIe switch' for 'host' above and 'port 1', 'port 2' for device 1/2, and orient the connection to the PCIe switch correctly.

On a PCI-e network, data traffic doesn't have to go to the host. A PCI-e device can transfer data to another PCI-e device. That device-to-device transport is what could be utilized to fully saturate the TB v2 network presuming no DisplayPort traffic was also a consumer of the same bandwidth. Device 1 would just have to be a consumer of data from device 2.

However, pragmatically for most of these Thunderbolt vs. internal slot discussions it really is about aggregating data down to a single choke point. In that context, it is the choke point's bandwidth, not Thunderbolt's, that is the issue.

You obviously do not know any of this.

Chuckles. Right because a 2 socket switch is going to cost the same as 4 / 6 / 24 port one. You can go back to the quote above where Intel informed the author that the basic architecture didn't change. Where is the source backing up your hand waving that it has radically changed?
 
I asked that Intel Insider guy this:
In your article you listed the Thunderbolt 2 specification on the New MacPro as:

- “Expansion (Thunderbolt 2): Six Thunderbolt 2 connections, each with 20Gbps bandwidth”

May I ask what your source is and why you think each of the 6 connections can independently support the full 20Gb/s when there are only three controllers? I do believe this is correct but there is some debate over this point in several forums and I have not been able to locate any documentation directly specifying this is indeed the case. Many people interpret the casual jargon from both Apple and Intel, used to describe TB2 as possibly meaning that each pair within the 6 ports share the 20Gb/s bandwidth.

Which is it? Is there a *total* of 60Gb/s or 120Gb/s from the 6 ports (3 controllers) offered on the MacPro?

Is it 3 controllers with 6 20Gb/s independent connections or 3 controllers at 20Gb/s offering two connections each?

Thanks!

Posted by Tesselator Tess on August 16, 2013 at 6:26 am
And he replied:


I think the confusion comes from the word “connector” as opposed to “controller”. I find it very clear that this is about each connector.

Regarding the current thunderbolt implementation:

“A Thunderbolt connector is capable of providing two full-duplex channels. Each channel provides bi-directional 10 Gbps of band-width. ” (https://thunderbolttechnology.net/tech/how-it-works)

For thunderbolt 2:

“It is achieved by combining the two previously independent 10Gbs channels into one 20Gbs bi-directional channel that supports data and/or display. Current versions of Thunderbolt, although faster than other PC I/O technologies on the market today, are limited to an individual 10Gbs channel” (http://blogs.intel.com/technology/2...ndwidth-enabling-4k-video-transfer-display-2/)

Of course, could be wrong — but this seems really clear to me.

Posted by John Montgomery on August 16, 2013 at 9:47 am​
 
That's interesting, Tesselator; that's how I read it as well.

It just strikes me as very odd if one port can completely starve the second on the host.



Intel hasn't seen the new TB v2.0 controller.... Not! Your comment is immaterial if folks have discussed this with Intel. Which Anandtech.com did:

".... Thunderbolt 2/Falcon Ridge still feed off of the same x4 PCIe 2.0 interface as the previous generation designs. Backwards compatibility is also maintained with existing Thunderbolt devices since the underlying architecture doesn't really change. ... "
http://www.anandtech.com/show/7049/intel-thunderbolt-2-everything-you-need-to-know

The x4 PCIe v2 throughput is a very real constraint in both v1 and v2 Thunderbolt. The underlying switching architecture is basically the same.

Intel has seen it, they have made it... We have been through this before: how many ports does it have? Intel has made both single and dual port host controllers previously. An endpoint controller has one port; to support 20Gb/s it needs four PCIe lanes, and if the host controller also had one port it would also get four PCIe lanes.


Generally, the same issue applies if you substitute 'PCIe switch' for 'host' above and 'port 1', 'port 2' for device 1/2, and orient the connection to the PCIe switch correctly.

On a PCI-e network, data traffic doesn't have to go to the host. A PCI-e device can transfer data to another PCI-e device. That device-to-device transport is what could be utilized to fully saturate the TB v2 network presuming no DisplayPort traffic was also a consumer of the same bandwidth. Device 1 would just have to be a consumer of data from device 2.

However, pragmatically for most of these Thunderbolt vs. internal slot discussions it really is about aggregating data down to a single choke point. In that context, it is the choke point's bandwidth, not Thunderbolt's, that is the issue.

Yes, but this isn't related to the discussion of 20Gb/s bandwidth per port (or not). It really is the same on the motherboard's PCIe fabric: it's a network with switches that can be saturated. It's no news.


Chuckles. Right because a 2 socket switch is going to cost the same as 4 / 6 / 24 port one. You can go back to the quote above where Intel informed the author that the basic architecture didn't change. Where is the source backing up your hand waving that it has radically changed?

It's assumptions; that's why I made the comment. You cannot know if you are not working on this at Intel or Apple. Intel already has single port host controllers, and that doesn't change the "architecture".
 
No, there is no documentation that makes this clear, people are assuming stuff.

Just go back to post 161 in this thread. Diagram makes it quite clear.

The distinction in the second part of VirtualRain's example is that one set of traffic is DisplayPort on 1/2 of those TB channels and one set is PCIe data on a different set of those TB channels (on a different socket). Thunderbolt controllers have two source/sinks that pass through the Thunderbolt switch. If one port is using just one of those source/sinks and the other port is using the other, there is nothing at all that shows why the TB switch would be a blocking constraint.

The independence is primarily being driven by the difference in data types and the internal switch, not by any inherent property of the socket/connector's connection to the TB switch.
 
Just go back to post 161 in this thread. Diagram makes it quite clear.

I know of the diagram; in fact I have posted it myself before we ever had these discussions. The diagram shows a dual port device controller. It was released by Intel when Thunderbolt 1 was released, and it does a good job of giving a schematic overview of what Thunderbolt is.

I'm also aware that Thunderbolt carries DisplayPort and PCIe, no need to keep repeating this.
 
The independence is primarily being driven by the difference in data types and the internal switch, not by any inherent property of the socket/connector's connection to the TB switch.

No. This part Intel made really clear.

The 2 connectors can be both DATA
The 2 connectors can be both Display
The two connectors can be 1 display and the other data
The two connectors can be Data+display on one and Data+display on the other.

Are you using TB1 specs to describe TB2 or something?

What they didn't make so clear (I mean PERFECTLY CLEAR) is whether that 20Gb/s is per controller or per connector, although all the language used so far IMPLIES that it is indeed per connector (2 connections per controller). This obviously would not apply to language describing single connector controllers. ;)


 
it seems pretty clear that the two connectors from each controller are independent and each capable of supplying 20Gb/s independently.

Not true. The Thunderbolt controller is basically composed of two/three switches. In v1, a PCIe one and a Thunderbolt one. In v2 you pragmatically need to conceptually add another, since DisplayPort v1.2 also supports multiplexed traffic.

No port on a switch is 100% independent of the others. A switch's internal crossbar is a shared resource. The ports are connected to one another; that is the whole point of a switch.

For an example that probably more folks have encountered: a switch with 5-8 1GbE ports has ports that each deliver up to 1GbE of data. But those ports are not independent. Hook a server with a 1GbE port to the switch and it can receive/send 1GbE over that link. Hook up 4 clients that all make demands on that server at the same time and they will not see 1GbE of throughput to the server each. They will all be timesliced into sharing that 1GbE of throughput to that single target. During their timeslice they may see up to 1GbE, but the timeslice is limited, so the effective bandwidth is less.

Point 8 active concurrent clients at the same 1GbE link and the effective bandwidth is going to go down even more, at an even higher rate, because the switching overhead is going to grow. (Internal crossbars are typically much better at hooking up multiple point-to-point connections than at everyone beating on a single port.)

If all 8 clients have sporadic connections that don't overlap, then the effective bandwidth is close to 1GbE. That is far more driven by the clients being decoupled/independent (in time), though, not by the ports on the switch.
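
A rough illustration of that GbE analogy. The 5% switching-overhead figure is an arbitrary fudge factor, not a measurement:

```python
# N clients all hammering the same 1 GbE server port share its line rate.
LINK_GBPS = 1.0

def effective_per_client(concurrent_clients, switching_overhead=0.0):
    """Per-client throughput when N clients target the same 1 GbE port at once.
    switching_overhead is a rough fudge factor (0..1) for crossbar/contention losses."""
    usable = LINK_GBPS * (1.0 - switching_overhead)
    return usable / max(concurrent_clients, 1)

for n in (1, 4, 8):
    overhead = 0.05 if n > 1 else 0.0
    print(n, "client(s) ->", round(effective_per_client(n, overhead), 3), "Gb/s each")
```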


Highly concurrent workloads are the distinction you are not making when you ask folks about Thunderbolt. The rate for low concurrency/activity is going to be close to 20Gb/s in v2. But that is a relatively lightweight network traffic load. Actually try to fill that v2 network to 100% capacity everywhere and you are not going to see that; you'll run into choke points outside the network (crossbar limitations, host bandwidth, etc.).
 
What I think you're missing is that the switch has 4 (four) 10Gb/s channels. That's two full bidirectional 20Gb/s connections per controller. ;)

4x10 = 40
2x20 = 40

:D


As far as total bandwidth, we won't know the full story till release, but Intel showed an early implementation (slower, they said) which was driving a 2K monitor (Display) and 1.2GB/s (Data) off of one connection (one port). Experts claim it could be as low as 1.5GB/s per port, but I guess it will be closer to 1.7GB/s, as TB1 could achieve 850MB/s under the best conditions.
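
Putting the channel arithmetic and the quoted throughput guesses side by side (the TB2 per-port figures are the speculation from above, not measurements):

```python
# Channel arithmetic: four 10 Gb/s channels per controller, bonded into two 20 Gb/s ports.
channels_per_controller = 4
channel_gbps = 10
bonded_ports = 2
per_port_gbps = channels_per_controller * channel_gbps / bonded_ports   # 20 Gb/s

tb1_best_gb_per_s = 0.85          # best-case TB1 data throughput quoted above
tb2_guess_gb_per_s = (1.5, 1.7)   # speculated TB2 per-port range quoted above

print(f"{per_port_gbps:.0f} Gb/s per bonded TB2 port; TB1 saw ~{tb1_best_gb_per_s} GB/s, "
      f"TB2 guesses: {tb2_guess_gb_per_s[0]}-{tb2_guess_gb_per_s[1]} GB/s")
```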
 
No. This part Intel made really clear.

The 2 connectors can be both DATA
The 2 connectors can be both Display
The two connectors can be 1 display and the other data
The two connectors can be Data+display on one and Data+display on the other.

All of which have absolutely nothing to do with accounting for the bisection bandwidth of the switches inside the controller.

Are you using TB1 specs to describe TB2 or something?

Only where they have been acknowledged to be the same. TB v2 isn't all that different from v1. Same stuff, repackaged slightly differently to do more aggregation.




What they didn't make so clear (I mean PERFECTLY CLEAR) is whether or not that 20Gb/s is per controller or per connector.

Different sub-components of the TB network have different top-end bandwidths. Any one of those subsets is not the same thing as measuring end-to-end data transfer.

The confusion folks seem to be stirring the pot with is flip-flopping between scopes. If you narrow the scope to only look at a single port at a time, then maybe you can convince yourself that it is 20Gb/s and completely independent. Actually look at the whole deployed network's bisection bandwidth and that is an entirely different scope and a very different picture of independence.

Cherry picking the scope does very little if talking about overall system throughput.
 
This is a good explanation of how TB1 works with two ports on the rMBP. According to this, TB supports the full bandwidth of TB1 on both ports. Hence, there's no reason to believe that has changed for TB2 (since all they did was aggregate the channels on a given port).

http://www.anandtech.com/show/6023/the-nextgen-macbook-pro-with-retina-display-review/11

However, while there is a total of 120Gb/s of TB bandwidth available across 6 ports, we have to keep in mind that not all of that is accessible by peripherals. It's important to note that each controller is constrained by only having x4 PCIe 2.0 lanes (2GB/s). So for things like drive arrays and other PCIe based peripherals, there's really only a total of 6GB/s (48Gbps). The remaining TB2 bandwidth is only consumable by DisplayPort signals.
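
The accounting above as arithmetic, assuming ~500MB/s of usable bandwidth per PCIe 2.0 lane:

```python
# Total TB2 link rate across the nMP's six ports vs. what PCIe peripherals can actually reach.
controllers = 3
ports_per_controller = 2
tb2_port_gbps = 20

total_tb_gbps = controllers * ports_per_controller * tb2_port_gbps   # 120 Gb/s of TB2 link rate
pcie_per_controller_gb_per_s = 4 * 0.5                               # x4 PCIe 2.0 ~ 2 GB/s
pcie_total_gb_per_s = controllers * pcie_per_controller_gb_per_s     # 6 GB/s
pcie_total_gbps = pcie_total_gb_per_s * 8                            # 48 Gb/s

print(total_tb_gbps, "Gb/s of TB2 bandwidth;", pcie_total_gbps, "Gb/s of it reachable by PCIe devices")
```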
 
Intel has made both single and dual port host controllers previously. An endpoint controller has one port; to support 20Gb/s it needs four PCIe lanes, and if the host controller also had one port it would also get four PCIe lanes.

They almost all have x4 PCIe. The variation among the single connector/port/socket models is how many TB channels they support. At this point there are two variants. One set supports two channels (and could conceptually go to 20Gb/s if those are bonded in v2) and one set supports just one channel and won't do 20Gb/s because there aren't two channels to bond/combine. [ The one-channel Port Ridge L2011 has both a chopped-down Thunderbolt switch and a chopped-down PCIe switch (just x2 PCIe v2).

http://ark.intel.com/products/66003/Intel-DSL2210-Thunderbolt-Controller

Surprise, surprise, surprise, it is also less expensive: just $5.25 versus $9.40-$13.00 for the alternatives.
http://ark.intel.com/products/series/67021 ]
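
A rough sketch of why the single-channel part can't hit 20Gb/s, going only by the channel counts and lane widths quoted above (the ~4Gb/s of usable bandwidth per PCIe 2.0 lane is an approximation):

```python
# TB data throughput is capped by whichever is smaller: the TB channels or the PCIe uplink.
def max_tb_data_gbps(tb_channels, pcie_lanes, lane_gbps=4):   # ~4 Gb/s usable per PCIe 2.0 lane
    return min(tb_channels * 10, pcie_lanes * lane_gbps)

print("Port Ridge (1 channel, x2 PCIe):", max_tb_data_gbps(1, 2), "Gb/s")   # capped at 8 by PCIe
print("Dual-channel part (2, x4 PCIe): ", max_tb_data_gbps(2, 4), "Gb/s")   # capped at 16 by PCIe
```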


It's assumptions, that's why I made the comment,

Assumptions that simpler switches aren't less expensive? Sure, chuckle. Simpler designs are generally cheaper, even more so when it is the same general speed and process implementation. There is tons of quantitative evidence to back that up, for example the TB controller pricing above.
 
This is a good explanation of how TB1 works with two ports on the rMBP. According to this, TB supports the full bandwidth of TB1 on both ports. Hence, there's no reason to believe that has changed for TB2 (since all they did was aggregate the channels on a given port).

http://www.anandtech.com/show/6023/the-nextgen-macbook-pro-with-retina-display-review/11

However, while there is a total of 120Gb/s of TB bandwidth available across 6 ports, we have to keep in mind that not all of that is accessible by peripherals. It's important to note that each controller is constrained by only having x4 PCIe 2.0 lanes (2GB/s). So for things like drive arrays and other PCIe based peripherals, there's really only a total of 6GB/s (48Gbps). The remaining TB2 bandwidth is only consumable by DisplayPort signals.

I was following you up till the bold there. Nowhere have I read that to be the case except on this forum by end users. Who says it's not PCIe x4 per connector? Two x4 PCIe connections per controller?

Who? Link me. :)

And don't link me to a single connection controller. :p


There was even a person here recently who calculated all the lanes needed for all the components and ports in the nMP, and there were enough for each port (6 total) to have x4 lanes each. <shrug>

I wonder where that post is? :p
 
I was following you up till the bold there. Nowhere have I read that to be the case except on this forum by end users. Who says it's not PCIe x4 per connector? Two x4 PCIe connections per controller?

Who? Link me. :)

Haha... :) Have a look at that block diagram on the Anandtech page I linked above.
 
This is a good explanation of how TB1 works with two ports on the rMBP. According to this, TB supports the full bandwidth of TB1 on both ports. Hence, there's no reason to believe that has changed for TB2 (since all they did was aggregate the channels on a given port).

Not really. That is partially because there is a fat-tree relationship between both ports being capped at 10Gb/s for the PCIe data he is testing and the x4 PCIe v2 link being about 16Gb/s max. From the article:

"... I suspect if I had another Pegasus SSD array I’d be able to approach 1800MB/s, ... "

In other words, approaching, but staying under, the PCIe interface boundary.

If doing a "read" driven test along with a display port out traffic they are heading in different directions and decoupled.

Thunderbolt v2 still has the exact same x4 PCIe v2 interface as v1. Same choke point. Only now you can use just one port/socket/connector to get to it.
 
I was following you up till the bold there. Nowhere have I read that to be the case except on this forum by end users. Who says it's not PCIe x4 per connector?

Insane. The bold text says controller not connector.

As for sources external to this forum on the controllers, the Intel ark pages should suffice (at least for the 3xxx series):

http://ark.intel.com/products/series/67021

and the

http://www.anandtech.com/show/7049/intel-thunderbolt-2-everything-you-need-to-know
 
They almost all have x4 PCIe. The variation among the single connector/port/socket models is how many TB channels they support. At this point there are two variants. One set supports two channels (and could conceptually go to 20Gb/s if those are bonded in v2) and one set supports just one channel and won't do 20Gb/s because there aren't two channels to bond/combine. [ The one-channel Port Ridge L2011 has both a chopped-down Thunderbolt switch and a chopped-down PCIe switch (just x2 PCIe v2).

http://ark.intel.com/products/66003/Intel-DSL2210-Thunderbolt-Controller

Surprise, surprise, surprise, it is also less expensive: just $5.25 versus $9.40-$13.00 for the alternatives.
http://ark.intel.com/products/series/67021 ]

Ok, so what is the problem then? It's only a question of how many ports they have. Almost all controllers are about the same price anyway, ~$10. How much do you think $30 extra affects the end price of a Mac Pro?

Those are some very interesting Intel docs, btw, in that there is price info readily available.


Assumptions that simpler switches aren't less expensive? Sure, chuckle. Simpler designs are generally cheaper, even more so when it is the same general speed and process implementation. There is tons of quantitative evidence to back that up, for example the TB controller pricing above.

No, assumptions about its simplicity and price, or any statements beyond what can be inferred from a block diagram.
 
OK, I read it. I'm still confused tho. That's not Falcon Ridge (MacPro). They're talking about Cactus Ridge in that article (LapTop).

http://en.wikipedia.org/wiki/Thunderbolt_(interface)#Controllers

Here's some additional reading on the difference between Cactus and Falcon...
http://www.anandtech.com/show/7049/intel-thunderbolt-2-everything-you-need-to-know

(see the 2nd to last paragraph)...
Thunderbolt 2/Falcon Ridge still feed off of the same x4 PCIe 2.0 interface as the previous generation designs. Backwards compatibility is also maintained with existing Thunderbolt devices since the underlying architecture doesn't really change.

Here's my attempt to explain/understand it...

The only thing Falcon Ridge does over Cactus Ridge is aggregate the two 10Gbps TB channels on each connector/cable into a single 20Gbps channel so that it can pass 4K DisplayPort signals.

In more detail... With TB1, one channel is reserved for PCIe, the other for DisplayPort: 10Gbps for each type of signal. The problem with that is that a 10Gbps channel is not enough bandwidth for a 4K display signal (which is about 16Gbps). So in order for Intel to support 4K displays, they needed to combine the two 10Gbps channels in TB1 into a single 20Gbps channel, and they called that TB2. Now, instead of PCIe x4 and DisplayPort having their own 10Gbps channels, they are muxed together on a single 20Gbps channel.
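
The 4K arithmetic from that paragraph, spelled out (the ~16Gbps figure for a 4K stream is the one quoted above, not an exact number):

```python
# Why the two channels were bonded for TB2, per the explanation above.
tb1_channel_gbps = 10
tb2_bonded_gbps = 2 * tb1_channel_gbps   # 20 Gb/s after combining the two channels
four_k_stream_gbps = 16                  # approximate 4K DisplayPort requirement quoted above

print("4K fits in one TB1 channel:     ", four_k_stream_gbps <= tb1_channel_gbps)   # False
print("4K fits in a bonded TB2 channel:", four_k_stream_gbps <= tb2_bonded_gbps)    # True
```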

Now, even though the controller can toggle each connector at 20Gbps, it still appears that it only has a PCIe x4 input from the computer to send across either of those connectors. So I assume (I don't know this for sure) that it's switching PCIe x4 across both connectors. So you could hook up an x4 peripheral to either connector and it would work at full speed, as long as they weren't trying to saturate the bus at the same time. If you hooked up an x4 peripheral to both connectors at the same time and they were both saturating their bus, they would be fighting for that same x4 connection to the computer and would bottleneck each other. It's the only way I can see it making sense.
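
A toy model of that contention scenario. It assumes a naive fair split of the shared x4 uplink; the real arbitration behaviour isn't documented anywhere I've seen:

```python
# Two connectors sharing one x4 PCIe 2.0 uplink (~2 GB/s), split evenly under contention.
UPLINK_GB_PER_S = 2.0

def per_device_throughput(demands_gb_per_s):
    """Naive fair-share split of the uplink among devices demanding bandwidth at once."""
    total_demand = sum(demands_gb_per_s)
    if total_demand <= UPLINK_GB_PER_S:
        return demands_gb_per_s
    return [d * UPLINK_GB_PER_S / total_demand for d in demands_gb_per_s]

print(per_device_throughput([1.2]))        # one array alone: gets its full 1.2 GB/s
print(per_device_throughput([1.2, 1.2]))   # both saturating: ~1.0 GB/s each
```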

EDIT: With this design, three TB2 controllers in the new Mac Pro would utilize a total of 12 PCIe 2.0 lanes (x4 for each controller). This makes sense from a PCIe lane budget perspective...

Lanes available: 40 PCIe 3.0 lanes on CPU, 8 PCIe 2.0 lanes on PCH
- GPU 1 = 16 lanes (3.0)
- GPU 2 = 16 lanes (3.0)
- TB Controller 1 = 4 lanes (3.0 or 2.0)
- TB Controller 2 = 4 lanes (2.0)
- TB Controller 3 = 4 lanes (2.0)
- PCIe SSD 1 = 2 lanes (3.0)
- PCIe SSD 2 (?) = 2 lanes (3.0)

That's 48 lanes which is all the system has.
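
Tallying that budget (a speculative allocation, not an Apple spec):

```python
# PCIe lane budget for the new Mac Pro as guessed above.
lane_budget = {
    "GPU 1": 16,
    "GPU 2": 16,
    "TB controller 1": 4,
    "TB controller 2": 4,
    "TB controller 3": 4,
    "PCIe SSD 1": 2,
    "PCIe SSD 2 (?)": 2,
}
allocated = sum(lane_budget.values())
available = 40 + 8   # 40 PCIe 3.0 lanes on the CPU + 8 PCIe 2.0 lanes on the PCH

print(f"Allocated: {allocated} lanes of {available} available")   # 48 of 48
```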
 