
Varmann

Original poster
I wonder if someone knows about the technological possibilities of Thunderbolt.

The new Mac Pro will have 6 TB2 ports.
- Do we know yet whether that equals a total bandwidth of 6x20 = 120 Gbit/s, or 3x20 = 60 Gbit/s (with switching)?

- Is it possible to aggregate several TB channels, like it is with Ethernet?
If it is, would the best use of multiple TB channels be to aggregate them into a "hub" and then balance the load of the external devices (high-performance storage, 4K displays, PCIe boxes, 40Gbit Ethernet) connected to this hub?

The reason I wonder is that storage performance is increasing very fast. Today it is rather cheap to get PCIe-connected SSD devices capable of >10 Gbit/s. Yesterday I read that Samsung is introducing the XS1715 (an NVM Express PCIe SSD: 1.6 TB capacity, 3,000 MB/s sequential read, 740,000 IOPS). Of course it is very expensive today, but it is nevertheless a single device needing the bandwidth of about 3 TB1 channels.
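
A quick back-of-the-envelope check of that "about 3 TB1 channels" figure (nominal rates, ignoring protocol overhead):

ssd_mb_per_s = 3000                        # XS1715 sequential read, MB/s
ssd_gbit_per_s = ssd_mb_per_s * 8 / 1000   # = 24 Gbit/s
tb1_channel_gbit = 10                      # nominal TB1 channel rate
print(ssd_gbit_per_s / tb1_channel_gbit)   # 2.4 -> roughly three TB1 channels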

Without aggregation, TB might be a bottleneck if the bandwidth is not increased significantly in TB versions 3 and 4.
 
First off, the total bandwidth will actually be around ~6GBps (~60Gbps). That means when running at full bandwidth, each port will be throttled to half its rated speed. Each controller is only capable of 2 TB channels (10Gbit each), and each controller feeds two 2-channel ports, so those 2 channels are shared across 4. LGA2011 is only capable of 40 lanes, and with 32 lanes taken up by the two video cards, that doesn't leave much.

Second, aggregating should theoretically be possible, but according to some decent sources on this board, there will be a HEAVY loss. Two Thunderbolt 2 ports may aggregate at around 2-3GBps (a 25-50% loss). It may be possible to create a software RAID with multiple <2GBps drive controllers without much loss, but linking the channels together won't work well.

For instance: a 3GBps SSD (PCIe only) was just announced. There is no way this would work at full speed with the new Mac Pro.
 

LGA2011 isn't limited to 40 lanes. The PCIe controller is inside the CPU and is CPU dependent. The i series are 40 lanes, the E series is 80 lanes, and the X series CPUs are 144 lanes.
 
LGA2011 isn't limited to 40 lanes.

Apparently Intel doesn't think so... For example, the E5 2620:

http://ark.intel.com/products/64594...E5-2620-15M-Cache-2_00-GHz-7_20-GTs-Intel-QPI

Max # of PCI Express Lanes 40

On the LGA 2011 socket, there is a max of 40 lanes.

The i series are 40 lanes,

Again wrong, unless you're being sloppy and implicitly limiting "i series" to only the LGA 2011 subset that is i7 (i7 x9xx).

the E series is 80 lanes

Wrong. The E5 1600 series is limited to one socket, so even if you want to play the multiple-socket game, it is still 40, just like the i7 variants of the E5 1600.

If you do want to play the multiple-socket game, the E5 4600 series can go up to 4 sockets, and the E7 can go past 4 if you use the right chipset support configuration.


There is only a single socket in the new Mac Pro, so wandering off into the swamp of multiple-socket land isn't really addressing the thread's question.
 
Second, aggregating should theoretically be possible,

Not really.

First, for TB v2 they are already aggregated on the individual controllers. That is actually the change from TB v1 to TB v2: Intel bonded/aggregated the 10 Gb/s channels into 20 Gb/s. There isn't another possible level of aggregation on top of that.

Second, for ports on a single TB controller they are not independent. They sit on the same switch infrastructure. So trying to "tie" those two together somehow is moot for very similar reasons highlighted in the first point above. They already are tied together so applying that again isn't going to do much.

The notion of tying the TB controllers together is equally whacked, based on the same reality of the controller's core property of being a switch. You aggregate ports, not switches. Stacking up three independent Ethernet/Fibre Channel/etc. switches isn't going to get you a coherent virtual switch of combined bandwidth. The internal crossbars of each switch are independent (that is actually the mechanism that is used to do the bonding). You can't really use the mechanism on itself.


There could be some kind of Rube Goldberg kludge where some external device also had 3 TB controllers, and some sort of aggregating mechanism consumed the three independent streams from the Mac Pro and merged them inside the peripheral. I doubt that will ever appear, any more than there are PCI-e cards that plug into multiple slots at the same time.



~6GBps (~60Gbps)

60 / 8 ==> 7.5GBps (there are 8 bits in a byte).

However, that is TB bandwidth, not PCI-e data bandwidth. They aren't necessarily the same thing, especially if trying to pull it all from a single node on a TB network.

Max PCI-e data from a single TB node is pragmatically about x3 PCI-e v2 (~12Gb/s). Even the on-paper theoretical x4 is only 16Gb/s.
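
For reference, a quick sketch of the nominal numbers behind those figures (no protocol overhead accounted for):

tb2_total_gbit = 3 * 20          # three TB2 controllers at 20Gb/s each = 60 Gb/s
print(tb2_total_gbit / 8)        # 7.5 GB/s of raw TB bandwidth

pcie_v2_lane_gbit = 4            # 5 GT/s per lane with 8b/10b encoding, ~4 Gb/s usable
print(4 * pcie_v2_lane_gbit)     # x4 PCI-e v2: 16 Gb/s on paper (~2 GB/s)
print(3 * pcie_v2_lane_gbit)     # the pragmatic "about x3" figure: ~12 Gb/s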



Two Thunderbolt 2 ports may aggregate at around 2-3GBps.

Thunderbolt ports are not additive in bandwidth. It is a switch. You just can't go to a switch and add up the ports. It doesn't work that way.
 
On the LGA 2011 socket, there is a max of 40 lanes. [...] There is only a single socket in the new Mac Pro, so wandering off into the swamp of multiple-socket land isn't really addressing the thread's question.

I may have erred in stating that the limitation was in the CPU. It's something that I remembered reading about a while ago. I'm man enough to admit I was wrong on that point.

But I was responding to a general statement that an LGA2011-based system is limited to 40 lanes, which is false. Besides, I wasn't talking to you, but I guess you never fail to butt in with your silly WoT... You must be a real joy at parties.

The simple addition of 2 PLX PEX8747 bridges brings the x16 slot count to 4, or an x16/8/8/8/8/8/8 layout, like you can get on this board:

http://www.newegg.com/Product/Product.aspx?Item=N82E16813157327

So it is possible to address more than 40 lanes on an LGA2011-based system.

You can read more about the bridge here:

http://www.plxtech.com/products/expresslane/pex8747
 
But I was responding to a general statement that an LGA2011-based system is limited to 40 lanes, which is false.

Yeah, because there are actually going to be 48 in a normal system. The C600 series IOHub chipset has an additional x8 PCI-e v2 lanes. In the vast majority of host PCs with Thunderbolt controllers, that is exactly where the TB controller is attached. The Mac Pro is probably the first "odd-ball" system (relative to deployed TB-capable hosts) to hit the market that actually attaches TB controllers to the CPU.

, but I guess you never fail to butt in with your silly WoT...

If you actually fact-checked before writing your posts and even had evidence for what you claim, you probably would not see as many replies.


The simple addition of 2 PLX PEX8747 bridges brings the x16 slot count to 4, or an x16/8/8/8/8/8/8 layout, like you can get on this board:

The TB controller is in part a PCI-e switch. The existence of a TB controller on systems is another example of lane creation through bandwidth sharing.

However, it would be highly dubious to add a PCI-e switch in order to add another switch (e.g., a TB controller). That should be a red flag that you are doing something substantially wrong in the system's design or objectives.


So it is possible to address more than 40 lanes on an LGA2011-based system.

But that doesn't add bandwidth. It dilutes it. The thread's core question is about bandwidth.
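
Rough, illustrative numbers for why a bridge dilutes rather than adds (the lane counts here are hypothetical, per-lane rates are nominal):

# Illustrative only: a PCI-e switch multiplies lanes, not bandwidth; everything
# downstream still funnels through its upstream link.
upstream_lanes = 16
v3_lane_mb = 985                           # ~985 MB/s per PCI-e v3 lane (8 GT/s, 128b/130b)
upstream_mb = upstream_lanes * v3_lane_mb  # ~15,760 MB/s, fixed by the upstream link

downstream_lanes = 32                      # e.g. fanned out to four x8 slots
print(upstream_mb / downstream_lanes)      # ~492 MB/s per lane if every slot is busy at once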
 

But it's only you, you see. You can't help it...

The statement was that it was impossible to have more than 40 PCIe lanes on an LGA2011-based system. This is false, as demonstrated.

Also, TB, while being PCIe, is way below x16 bandwidth-wise. Personally I would go with 4 x16 (64 lanes) vs what TB offers, which is way less.
 
Yesterday I read that Samsung is introducing the XS1715 (an NVM Express PCIe SSD: 1.6 TB capacity, 3,000 MB/s sequential read, 740,000 IOPS).

It is a somewhat odd duck. It is a 2.5" drive but doesn't have a SATA connector. It has the new SFF-8639 connector.

http://www.anandtech.com/show/6294/breaking-the-sata-barrier-sata-express-and-sff8639-connectors

It is x4 PCI-e, but it is an SFF-8639 instance of x4 PCI-e v3 (not v2, which is what TB handles). Pragmatically it is an x8 PCI-e v2 device in terms of bandwidth.
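
The rough per-lane arithmetic behind that statement (nominal rates):

v2_lane = 5.0 * 8 / 10      # PCI-e v2: 5 GT/s, 8b/10b encoding -> 4.0 Gb/s per lane
v3_lane = 8.0 * 128 / 130   # PCI-e v3: 8 GT/s, 128b/130b       -> ~7.9 Gb/s per lane
print(4 * v3_lane)          # x4 v3 ~= 31.5 Gb/s
print(8 * v2_lane)          # x8 v2  = 32.0 Gb/s (roughly the same budget)
print(4 * v2_lane)          # x4 v2  = 16.0 Gb/s (the most TB can reach today)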

In the Thunderbolt space the solution would be more to use two independent PCIe SSDs on two different TB controllers. At that point, aggregating them with RAID 0 effectively gets the same bandwidth. For storage, the aggregation duties can just be shifted a bit to accomplish the same thing.

As mentioned in another post, conceptually it is possible to make a peripheral with two (or three) TB controllers in it. For example, a 2-port (**) box that had two x4 PCI slots, each one connected to a TB controller sitting behind each port. You could consolidate the power supply for the cards, but there would be two cables out to the Mac Pro. You'd have to hook up to ports on different controllers.

If no other multiple TB controller host systems show up on the market I doubt such a box ever gets built.




At the core this isn't an aggregation issue but a PCI-e v3 versus v2 mismatch issue. Will some future version of Thunderbolt move to PCI-e v3? Eventually, if it continues to grow. I wouldn't count on it any time in the next 2-3 years though.

There are two external factors TB probably needs before it goes to v3 coverage: affordable fiber cables, and more v3.0 lanes allocated inside standard systems (e.g., the mainstream IOHub chipset allocating v3 lanes). Before that happens it doesn't make much sense for TB to move up, because there is no infrastructure to move up to.

If SATA Express and SFF-8639 go mainstream in v3 (Gen 3) formats, maybe that will lead the way. That isn't going to happen any time soon.



** If the primary intent is to completely saturate the TB link, there is no sense in pretending you are going to share bandwidth with devices further down the chain. Since there are two controllers, this also limits the cost increase by going with chain-ender controllers, which are somewhat cheaper.
 
Disclaimer: I've not seen the system described below with my own eyes, but I think it's reasonable to trust the numbers I've been given.

A source at (company name withheld) in Hollywood had a massive array hooked up to an iMac via Thunderbolt in a "2-channel system" (that's how he described it... I think it means two TB1 connections) and it got about 2100MB/sec... bounces around 2050-2200 in the screen capture I saw.

Since then, they got a mysterious box brought in that has "six channels" of Thunderbolt (new Mac Pro, anyone?) and from THAT box, they've seen a maximum of only about 4600MB/sec. The source said it was theorized that the memory bandwidth was holding it back from the 6000MB/sec they were expecting to see.

Take that for what it's worth, given it's only information passed to me from someone who *may* be testing Thunderbolt bandwidth on a new Mac Pro.
 
So you know a guy at a company who knows a guy that may or may not be testing a yet to be released product. Not real credible. :rolleyes:
 
So you know a guy at a company who knows a guy that may or may not be testing a yet to be released product. Not real credible. :rolleyes:
I knew I could count on at least one person to think the info is worth zero. That's fine. I roll my eyes right back at you, big daddy. Soon enough, it won't be a secret any longer.
 
I knew I could count on at least one person to think the info is worth zero. That's fine. I roll my eyes right back at you, big daddy. Soon enough, it won't be a secret any longer.

There is nothing to back that up and anyone could easily make up information like that so I am not sure what you were expecting. If you want your story to be believable then it needs specifics.
 
There is nothing to back that up and anyone could easily make up information like that so I am not sure what you were expecting. If you want your story to be believable then it needs specifics.
I really don't care if you believe it, and I agree that anyone could make it up. I expected disdain from people such as yourself, but I also expected some discussion about the logic in the numbers as well.

I'm telling you what I've been shown in a screen capture from someone that *I* find credible. I tried to go see it for myself, but apparently, people freaked out when they saw tests being run on gear that "nobody was supposed to know about" at the facility, and access was further restricted. Despite this, info has been leaked out.

When the nMP comes out, you can buy one, and then look back on this old thread and say, "Well, how do you like that... he was right!"
 
When the nMP comes out, you can buy one, and then look back on this old thread and say, "Well, how do you like that... he was right!"

Just because the new Mac Pro is released does not confirm that you were right or that this was not made up.
 
Just because the new Mac Pro is released does not confirm that you were right or that this was not made up.
Haha, nice one. Sure.

Are you this skeptical of the shape of the Earth, too? I've not seen it from space, myself. :p
 
.... I think it means two TB1 connections) and it got about 2100MB/sec... bounces around 2050-2200 in the screen capture I saw.

Anything shipping that is sustaining over 2000MB/s in a demo is rigged.

The Thunderbolt controllers in the currently deployed Macs are connected via an x4 PCI-e v2 link. The theoretical max on that is 2000MB/s. So if you are seeing numbers higher than that, it is either rigged or the test is hitting the internal file cache. The former is just pure deception and the latter doesn't say much of anything about Thunderbolt. A USB 2.0 device whose data is in the RAM cache of OS X is going to show high rates too.

You can goose TB v1 by aiming at a single host from two directions at once. Send ~1000MB/s of data down one port and ~1000MB/s down the second port and hope the single x4 connection they are aimed at can juggle both with relatively small overhead.
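
In rough numbers (using the theoretical x4 PCI-e v2 ceiling mentioned above):

per_port_mb = 1000                 # ~1000 MB/s pushed down each of the two TB v1 ports
x4_v2_ceiling_mb = 4 * 500         # 4 lanes at ~500 MB/s each, theoretical
print(2 * per_port_mb)             # 2000 MB/s offered by the two ports...
print(x4_v2_ceiling_mb)            # ...against a 2000 MB/s back-end ceiling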

"... Whereas most Thunderbolt storage devices top out at 800 - 900MB/s, Thunderbolt 2 should raise that to around 1500MB/s (overhead and PCIe limits will stop you from getting anywhere near the max spec). ... "
http://www.anandtech.com/show/7049/intel-thunderbolt-2-everything-you-need-to-know


Since then, they got a mysterious box brought in that has "six channels" of Thunderbolt (new Mac Pro, anyone?) and from THAT box, they've seen a maximum of only about 4600MB/sec.

Do the same trick at the 2-3 different controllers. The box probably doesn't have TB v2 controllers (e.g., a prototype with v1 controllers as stand-ins). If they are v2 controllers, they are relatively early engineering samples and won't necessarily run full blast. (Even Intel's NAB demo with early engineering samples topped out at 1200MB/s.)


Since Apple already did the sneak peek, prototype boxes leaving Cupertino would not be completely out of the question if secured in a room with extremely limited access.


What is missing from the hype here is how this "massive storage array" is physically structured. Quite likely this isn't a single unit array, but actually a collection of TB v1 storage logically treated as one storage array.
 
The statement was that it was impossible to have more than 40 PCIe lanes on an LGA2011-based system. This is false, as demonstrated.

What is false is the characterization above. His statement was that LGA2011 is limited to 40 (the socket, not the system). It is. That isn't false. The link I provided from Intel confirms that.

The screw-up you both made is that the LGA2011 socket is NOT the only source of PCI-e lanes in an LGA2011 host system. The CPU isn't going to work without the chipset. The chipset adds another 8.

Sure, some of those are sucked up by USB 3.0, 2 Ethernet ports, and WiFi/Bluetooth, but there are more than 40 even in the complete system he was alluding to.




Also, TB, while being PCIe, is way below x16 bandwidth-wise. Personally I would go with 4 x16 (64 lanes) vs what TB offers, which is way less.

If my aunt had balls she'd be my uncle. It would be much easier to blow 12 PCI-e v3 lanes on v2-limited TB controllers if there were two E5s inside the system and still provision 4 x16 slots. It would also cost substantially more.

In this universe though, the Mac Pro is a single-E5 system. So 3 TB controllers (12), 2 Ethernet (2), WiFi/Bluetooth (1), USB 3.0 (1), two GPUs (32), and a PCIe-based SSD (probably 2) make for an oversubscribed PCI-e lane budget: 50 in total. At 48 lanes though, it's pretty close and manageable.
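
Spelling that budget out (the per-device lane counts are the estimates above, not confirmed specs):

lanes = {
    "3x Thunderbolt controllers": 12,  # x4 each
    "2x Gigabit Ethernet":         2,
    "WiFi/Bluetooth":              1,
    "USB 3.0":                     1,
    "2x GPUs":                    32,  # x16 each
    "PCIe SSD":                    2,  # probably x2
}
print(sum(lanes.values()))  # 50 requested vs. 48 available (40 CPU + 8 chipset)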
 
Here's some info on the array:

"We have some interesting pieces of hardware in a new test lab. This isn't the off the shelf boxes with Marvill or via controllers. These are PPC based SoC RAID controllers. The arrays are 24 15k 600GB SAS RAID 6, each with 96GB of DDR3 for cache. Each box has two arrays, two controllers. I assume they use a custom kext, as the two arrays are physically separate, other then being in the same rack chassis, yet they are presented to the system as a single drive...

They are in house built boxes...I also know that several of the chips on the boards are completely unmarked, so I have no idea what they are or who makes them."​

The source in question is another MacRumors member, so if he wants to expand, update or correct any information, I leave it up to him.
 
Thunderbolt ports are not additive in bandwidth. It is a switch. You just can't go to a switch and add up the ports. It doesn't work that way.

What's funny is that I was paraphrasing a post you made before--you were my source for this. You basically said that they don't aggregate well and threw out some numbers on how bad it would be.

I guess now you've changed your mind.

Edit: Oh I see, you're saying that you can't aggregate through a single port, but that you could theoretically have a device with three TB inputs and aggregate BW that way...? That's exactly what I was talking about.


----------

LGA2011 isn't limited to 40 lanes. The PCIe controller is inside the CPU and is CPU dependent. The i series are 40 lanes, the E series is 80 lanes, and the X series CPUs are 144 lanes.

40 lanes / socket, as stated by other users.

----------

However, that is TB bandwidth, not PCI-e data bandwidth. They aren't necessarily the same thing, especially if trying to pull it all from a single node on a TB network.

So does a TB channel count as a "PCIe" lane on a system, then? My understanding is that it does.

Also, are you saying that the TB controllers on the nMP will count off the chipset's lanes and not the CPU?
 
It's absolutely a tetrahedron! This was leaked last February by NASA. :)

 
- Is it possible to aggregate several TB channels, like it is with Ethernet?

I'm way behind on the TB tech, but Ethernet trunking/link aggregation is not as straightforward as it seems, bar a few custom software vendors who have spent a lot of time trying to make it work (I'm talking specifically about streaming video).

I imagine it may be a similar story with TB if, as you guys have said, it is a switched controller.

Aggregating links does not mean you magically send a single stream of data across multiple channels and everything works fine... usually it's broken up and sent down each channel in separate parts and it takes a lot of trickery to re-assemble it at the other end.

Maybe TB works differently?
 
I'm way behind on the TB tech,

From the "Getting inside Thunderbolt" section of a article:

[Block diagram of the Thunderbolt controller internals, from the Tom's Hardware review linked below]

[ http://www.tomshardware.com/reviews/thunderbolt-performance-z77a-gd80,3205-4.html ]

Those outputs on the TB switch are what Intel calls TB channels. Two of those are typically assigned to one physical port. (There is a corner-case controller, meant for chain enders, that only has one channel. However, that one-channel limitation will probably get tossed in the TB v2 version: still one physical port supported, but the minimum is pragmatically 2 channels with v2. Or there just won't be a v2 chain-ender controller; it would be a waste on something like an inexpensive FW or Ethernet dongle device.)



but Ethernet trunking/link aggregation is not as straightforward as it seems,

Not really. It is a standard.

http://en.wikipedia.org/wiki/Link_aggregation#Initial_release_802.3ad_in_2000

It has to be implemented, which isn't extremely trivial, but it is a well-solved problem for local area networks.
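
As a minimal illustration of the usual approach (the pick_link helper is hypothetical, not the real 802.3ad state machine): each flow's addresses are hashed onto one member link, so the frames of a flow stay in order on a single link, which is also why one stream can't go faster than one link.

import zlib

def pick_link(src_mac: str, dst_mac: str, num_links: int) -> int:
    # Hash the flow's addresses so every frame of this flow uses the same link.
    return zlib.crc32((src_mac + dst_mac).encode()) % num_links

print(pick_link("00:11:22:33:44:55", "66:77:88:99:aa:bb", 2))  # deterministic per flow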


bar a few custom software vendors who have spent a lot of time trying to make it work (I'm talking specifically about streaming video).

Standard link aggregation is transparent to the software layer. However, I have no doubt that some software folks have tried to re-invent the wheel and failed.

If this is about long-distance aggregation over numerous intermediate switches that are unaware of the aggregation, then perhaps, but that is a different problem for different reasons.


I imagine it may be a similar story with TB if, as you guys have said, it is a switched controller.

Somewhat similar, but to keep network complexity and overhead to a minimum (and minimize latency and maximize user data delivered), TB switches only consist of two TB ports. That also keeps costs down. Folks wail on TB controllers being $20 (or so), but 10GbE controllers are $100+. Ethernet can do fancier things with link aggregation at similar speeds, but it also costs a lot more.


Aggregating links does not mean you magically send a single stream of data across multiple channels and everything works fine... usually it's broken up and sent down each channel in separate parts and it takes a lot of trickery to re-assemble it at the other end.

Largely depends upon which side of the aggregation functionality you are standing on. If you are using the aggregation service, then yes, it just magically happens. There isn't a whole lot of trickery in sequencing packets; TCP/IP solved that a long time ago. Tag each packet with a sequence number and it isn't that hard to put them back into the correct order. Similarly, keep a logical/physical block mapping and a striped RAID controller can get your data back.
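
A toy version of that sequence-number idea, purely illustrative:

packets = [(seq, "chunk-%d" % seq) for seq in range(8)]
link_a = packets[0::2]              # even-numbered packets sent down link A
link_b = packets[1::2]              # odd-numbered packets sent down link B
received = sorted(link_a + link_b)  # receiver re-orders by sequence number
print([data for _, data in received])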


Maybe TB works differently?

It is more so a mismatched user mental model. Generally in Ethernet the ports on a host are independent, in terms of bandwidth, from the internals of the host. In Thunderbolt, they aren't.

In Ethernet a user can hook port 1 to switch A on subnet X and can hook port 2 to a switch on subnet Y. If want to put port 1 and port 2 on same subnet then hook them both to Switch C.

With Thunderbolt port 1 and port 2 are on the same "subnet". It is a switch itself. There is no node on the network that isn't a switch. Hooking the TB controllers together forms the LAN. There is a special role for the host PC's controller, so it is pragmatically hierarchical.

The generally completely independent ports on a TB device would be driven by multiple controllers. Ports 3 & 4 would be independent from 1 & 2 .

So it really isn't "port aggregation" as much as "controller aggregation" to push past the 20Gb/s bandwidth limit.

Neither TB , PCI-e , or Ethernet really have a concept of aggregating switches.
 