Boy, I really wish there was a higher resolution version of that block diagram.

Yeah no kidding... Is this official? :confused:

The key difference here is that it shows the USB hanging off the PEX switch instead of the PCH (Anand's assumption).

And if this is correct, it shows the SSD attached to the display GPU which blows any theory of interconnect pin constraints out the door.

EDIT: if you load the image into a tab, it's a bit more legible.

Other Noteworthy Observations:
- WiFi is actually PCIe 1.0 (still plenty of bandwidth though)
- USB3 is still only a single lane, despite the fact that it and the three TB controllers share 8 PCIe 3.0 lanes, the equivalent of 16 2.0 lanes (thus they could have opted to give USB x4 without impacting TB performance; an odd choice to artificially limit it, maybe a limitation of the USB chipset they selected?)
- The Crossfire connection is noted as "CVO Connection". I wonder what that stands for.
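To put rough numbers on that lane equivalence, here's a quick back-of-the-envelope sketch in Python (nominal line rates and encoding overheads only; real switch and protocol losses are ignored):

```python
# Back-of-the-envelope PCIe lane bandwidth (nominal figures; switch/uplift
# losses and protocol overhead beyond line encoding are ignored).

def lane_gbps(gen):
    """Usable bandwidth of one PCIe lane in Gb/s after line encoding."""
    if gen == 2:
        return 5.0 * 8 / 10     # 5 GT/s, 8b/10b encoding -> 4.0 Gb/s
    if gen == 3:
        return 8.0 * 128 / 130  # 8 GT/s, 128b/130b encoding -> ~7.88 Gb/s
    raise ValueError("unsupported generation")

x8_v3 = 8 * lane_gbps(3)    # the switch's x8 v3 uplink: ~63.0 Gb/s
x16_v2 = 16 * lane_gbps(2)  # 16 gen-2 lanes: 64.0 Gb/s

print(f"x8 v3.0 = {x8_v3:.1f} Gb/s, x16 v2.0 = {x16_v2:.1f} Gb/s")
# The two are within about 2% of each other, which is why x8 v3 gets
# treated as roughly equivalent to x16 v2 in the discussion above.
```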

Very interesting.

----------


Sorry, I missed this... Nice find! Some of what I thought was nonsense now makes sense. :eek:

Where does this come from? Is there a higher res version anywhere?

Anyway, thanks for sharing.
 
Yeah no kidding... Is this official? :confused:

The key difference here is that it shows the USB hanging off the PEX switch instead of the PCH (Anand's assumption).

The System Profiler used to show the PCIe switching hierarchy, which would have made guessing moot. But if this is official, it suggests that PCH throughput is an issue when coupled to these "super fast" PCIe SSDs. They have idled all USB and SATA traffic on the PCH.

If that's the issue, they may continue to idle the PCH's USB controller even after it gets USB 3.0.


They could be using the x4 layout as a placeholder for a later discrete USB 3.1 upgrade, but that is likely to "sag" the throughput of the other TB controllers if all four x4 controllers fully engage. Theoretically x8 PCIe v3.0 is equal to x16 v2.0 lanes, but there are going to be switching and "uplift" losses. Dangling an x1 off it still leaves the equivalent of x3 lanes' worth of bandwidth to soak up the losses.

If USB 3.1 controllers are just x2 PCIe v2 (~8 Gb/s), then perhaps. At just x2, that would leave x2 lanes' worth for soaking up losses.

Although dangling everything off one single x8 PCIe bus may mean there is a problem with fully populating the TB daisy chains. It may run out of addresses if one is used for this USB controller. Pragmatically, that's probably a non-issue, since practically no one is going to hook up the maximum number of devices anyway.




And if this is correct, it shows the SSD attached to the display GPU which blows any theory of interconnect pin constraints out the door.

There are around 300 pins. Interference on the connector ribbon was more likely the source of the issue than running out of actual pins.


- USB3 is still only a single lane, despite the fact that it and the three TB controllers share 8 PCIe 3.0 lanes, the equivalent of 16 2.0 lanes (thus they could have opted to give USB x4 without impacting TB performance; an odd choice to artificially limit it, maybe a limitation of the USB chipset they selected?)

Just about all discrete USB controllers are x1 PCIe v2. (They are all primarily designed to be hooked to the lane-limited PCH chipset.)
 
Two points:
  1. I count 4 unused lanes on that diagram - look for them

Must be the "new" math. On the CPU, the diagram lists three links coming off: x16, x16, and x8. That makes 40; zero left over.

On the PCH: x4 to the SSD and three x1. That makes x7, leaving just one lane free.

It is completely immaterial that there are 3 "free" lanes on the PLX switch (the three that the USB 3.0 controller doesn't use). Those 3 aren't usable by anything else and are dramatically far away from the other "free" one, so you can't combine them into a coherent whole x4. Those three would only be usable if you added another x4 -> (x1, x1, x1, x1) switch. They are dead-enders in this block diagram.
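The lane tally above can be checked mechanically. A small sketch, using the lane counts as read off the (possibly unofficial) diagram:

```python
# Lane budget as read off the block diagram (assumed, not official).
cpu_links = [16, 16, 8]   # x16 GPU, x16 GPU, x8 to the PLX switch
pch_links = [4, 1, 1, 1]  # x4 SSD plus three x1 devices

cpu_total = 40            # the Xeon E5 provides 40 PCIe v3 lanes
pch_total = 8             # the PCH provides 8 PCIe v2 lanes

print("CPU lanes used:", sum(cpu_links), "free:", cpu_total - sum(cpu_links))
print("PCH lanes used:", sum(pch_links), "free:", pch_total - sum(pch_links))
# CPU: 40 used, 0 free. PCH: 7 used, 1 free, matching the tally above.
```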


2. There's nothing wrong with modest oversubscription - of course if you run a "bandwidth virus" you'll see a slowdown, but almost never in real use

LOL. Adding extra x4 PCIe SSDs wouldn't be "modest oversubscription". Going from an x1 USB 3.0 controller to an x2 USB 3.1 controller, perhaps. However, most of the extras folks typically suggest, like multiple 6 Gb/s SATA connections or x4 SSDs, aren't really small (~x1 PCIe v2) deltas.

As for "bandwidth virus"... one person's virus is another's useful tool (e.g., the TB threads and the "but it can't handle my x8 SATA card" examples). "Virus" is a silly characterization.
 
Two points:
  1. I count 4 unused lanes on that diagram - look for them
  2. There's nothing wrong with modest oversubscription - of course if you run a "bandwidth virus" you'll see a slowdown, but almost never in real use

Actually, now that I see this diagram, you're right... they could move the USB to the unused lane on the PCH and use those four PCIe lanes on the PLX to facilitate another SSD without impacting much at all.
 
Thanks for the explanation!

So the huge advantage of building a computer from E-series chips (Xeon) vs. i-series (i7, i5) is that Xeon has many more "lanes" of bandwidth which are called PCI?

The i7 range is not uniform. As of recently, the i7 x9xx parts are Xeon E5 derivatives. But yes, the Xeon E5 & E7 designs have higher core counts and bandwidth. It really doesn't make much sense to do one without the other at a single technology and implementation process level.


Another is that the Xeon can take heat better so it wouldn't have to throttle down or throttle later compared to the i-series.

They don't necessarily take heat better. They are just designed to take more power. More power means more heat to dissipate, but it isn't as if they are designed to run better without fans or while exposed in the middle of a hot desert.

That larger power budget allows them to run at a higher clock if the number of cores running is permanently limited. If the design is mostly targeted at 6-8 cores running concurrently, then 4 cores can run higher within the same amount of power.


And finally, Xeon use ECC RAM whereas i-series can't.

Again, this is a matter of which features are turned on. The Xeon E3 has the same PCIe limitations as the mainstream Core i-series does: x16 on the CPU and x8 on the chipset. It does support ECC; ECC is consistent across Xeon products.

But ECC is not unique to Xeon. Perhaps Intel may later elevate the product to Xeon branding, but there is now an Atom that has ECC.

http://www.anandtech.com/show/6509/intel-launches-centerton-atom-s1200-family-first-atom-for-servers


Do you think the next nMP will have ECC RAM for the video cards?

If Apple can get away with selling mainstream GPU cards at FirePro/Quadro like prices I think they'll stick with playing the "cheaper Pro card" game. Given the launch success with backorders months after initial shipping, I would not be surprised if they stuck to that on the next design.

The OpenCL toolchain not being "best in class" and the lack of ECC on the current offerings are highly indicative that top-end computational users are not a top-10 priority for Apple (marketing hyperbole on web pages aside).

Perhaps Apple might add a 4th GPU option (high VRAM + ECC) to the line up but I doubt they are going to try to match the abilities of the more mainstream "Pro" cards across the broad spectrum of GPUs they offer.


Is there such a feature of ECC when writing to storage?

It's called using an SSD. ;-) Any decent SSD uses ECC. Many HDDs use ECC, and disk sectors can have ECC. Like RAM, these are typically transparent to the user.

But yes, you can add de facto ECC behavior at the file-system level, so that you get ECC protection from the RAM cache all the way down to the persistent data stored on drives. ZFS, Microsoft's ReFS, and Btrfs are file systems whose checksums and redundant (mirror/RAID) data storage can detect and correct errors.
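A toy illustration of that checksum-plus-redundancy idea (this is not how ZFS/ReFS/Btrfs are actually implemented; the `store`/`read` helpers and the two-mirror layout are purely hypothetical):

```python
import hashlib

def store(block: bytes):
    """Write a block to two 'mirror' copies along with its checksum."""
    digest = hashlib.sha256(block).hexdigest()
    return {"checksum": digest, "mirrors": [bytearray(block), bytearray(block)]}

def read(record) -> bytes:
    """Return a copy whose checksum verifies; self-heal any bad mirror."""
    good = None
    for copy in record["mirrors"]:
        if hashlib.sha256(bytes(copy)).hexdigest() == record["checksum"]:
            good = bytes(copy)
            break
    if good is None:
        raise IOError("all copies corrupt: unrecoverable")
    for i, copy in enumerate(record["mirrors"]):  # repair bad mirrors in place
        if bytes(copy) != good:
            record["mirrors"][i] = bytearray(good)
    return good

rec = store(b"important data")
rec["mirrors"][0][0] ^= 0xFF           # silently corrupt one mirror
assert read(rec) == b"important data"  # detected via checksum, healed from the other copy
```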

Conceptually, Apple could add something like that to CoreStorage over time, but they haven't as of now.
 
Actually, now that I see this diagram, you're right... they could move the USB to the unused lane on the PCH and use those four PCIe lanes on the PLX to facilitate another SSD without impacting much at all.

It looks like the PCH has PCIe 2.0 rather than the PCIe 3.0 that the PCIe switch has, so the USB controller would suffer even more than it already does.

The curious thing with regard to the USB specifically (assuming the diagram is correct) is that the USB controller is only x1 PCIe, but is connected to a x4 connection on the PCIe switch.
 
It looks like the PCH has PCIe 2.0 rather than the PCIe 3.0 that the PCIe switch has, so the USB controller would suffer even more than it already does.

The curious thing with regard to the USB specifically (assuming the diagram is correct) is that the USB controller is only x1 PCIe, but is connected to a x4 connection on the PCIe switch.

As Deconstruct60 points out above, USB 3 controllers are all x1 v2 PCIe devices.

What's particularly odd, is that they put USB 3 on the PLX and all the PCIe networking on the PCH. It probably would have been easier from a PCB design perspective to put all the I/O off the PLX keeping all related traces local to the I/O board.
 
As Deconstruct60 points out above, USB 3 controllers are all x1 v2 PCIe devices.

What's particularly odd, is that they put USB 3 on the PLX and all the PCIe networking on the PCH. It probably would have been easier from a PCB design perspective to put all the I/O off the PLX keeping all related traces local to the I/O board.

So something that is supposed to have 5 Gbit/s of throughput per port is actually limited to a theoretical 4 Gbit/s aggregate across all ports. USB is such a giant pile of crap for anything serious... but that doesn't really surprise me.

I'm guessing that typical PC motherboards have multiple controllers (like one controller for every 2 ports or something).
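The arithmetic behind that 4 Gbit/s ceiling, assuming a single x1 PCIe 2.0 uplink feeding four USB 3.0 ports (the port count here is an assumption):

```python
# Per-port USB 3.0 signaling vs. the controller's upstream link (nominal).
usb3_port_gbps = 5.0            # SuperSpeed signaling rate per port
pcie_v2_x1_gbps = 5.0 * 8 / 10  # one gen-2 lane after 8b/10b: 4.0 Gb/s
ports = 4                       # assumed port count on the machine

print(f"Aggregate demand if all ports saturate: {ports * usb3_port_gbps} Gb/s")
print(f"Upstream ceiling: {pcie_v2_x1_gbps} Gb/s")
# Even a single device can't hit its own 5 Gb/s line rate through a
# 4 Gb/s uplink, and all ports together share that same 4 Gb/s.
```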
 
As Deconstruct60 points out above, USB 3 controllers are all x1 v2 PCIe devices.

What's particularly odd, is that they put USB 3 on the PLX and all the PCIe networking on the PCH. It probably would have been easier from a PCB design perspective to put all the I/O off the PLX keeping all related traces local to the I/O board.

Perhaps this bizarre configuration created a more aesthetically pleasing PCB design, thus, was deemed necessary.
 
So something that is supposed to have 5 Gbit/s of throughput per port is actually limited to a theoretical 4 Gbit/s aggregate across all ports. USB is such a giant pile of crap for anything serious... but that doesn't really surprise me.

I'm guessing that typical PC motherboards have multiple controllers (like one controller for every 2 ports or something).

Yeah, this surprised me too, but I suspect PCs are the same... the limiting factor, especially in mainstream Intel systems, is the lack of PCIe lanes to work with. We're fortunate in that Apple has 48 lanes to work with between the CPU and PCH with this architecture; mainstream boards might have to get by with half that. So while you could design it so that every USB port theoretically had 5 Gbps, there would be choke points further upstream. For example, for years now Intel's DMI interface between the PCH and CPU has been limited to 2 GB/s, and in a typical PC that's shared by all SATA, USB, networking, audio, and peripheral PCIe devices. God forbid you actually wanted to use all of that at once.
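To illustrate that DMI choke point: the 2 GB/s figure is from the post above, but the per-device demand numbers below are made up purely for the sake of the example:

```python
# Illustrative only: hypothetical peak demands (MB/s) of devices behind the
# PCH in a typical PC, all funneled through the ~2 GB/s DMI link to the CPU.
dmi_mbps = 2000
demands = {
    "SATA SSD": 550,        # realistic ceiling for a 6 Gb/s SATA drive
    "SATA HDDs (x2)": 300,
    "USB 3.0": 400,
    "Gigabit Ethernet": 125,
    "misc PCIe x1 devices": 800,
}

total = sum(demands.values())
print(f"Aggregate demand: {total} MB/s vs DMI ceiling: {dmi_mbps} MB/s")
print("Oversubscribed" if total > dmi_mbps else "Fits")
```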
 
It looks like the PCH has PCIe 2.0 rather than the PCIe 3.0 that the PCIe switch has, so the USB controller would suffer even more than it already does.

The USB 3.0 controller is only a PCIe 2.0 device. It doesn't particularly "suffer" in either location. It has low impact (on the other things attached to the switch) and it is close by, so it got hooked up. The only way the USB controller would suffer is if the PCIe SSD saturated the PCH. At the very top-end speeds that is pretty close to happening; throw in the other connected I/O and even more so.

The curious thing with regard to the USB specifically (assuming the diagram is correct) is that the USB controller is only x1 PCIe, but is connected to a x4 connection on the PCIe switch.

It is likely a property of the switch. It is configured as an x8 -> (x4, x4, x4, x4) switch. There probably isn't a mode that gets three x4's and sprinkles the remainder around as individual x1's.

Doing the uplift into v3 for two pairs of x4 is going to be easier than chasing after many more downstream subnetworks/buses (x4, x4, x4, x1, x1, x1, x1, x1). There are more of them to split the x8 of bandwidth over, and they aren't symmetric either. The buffering is going to be a lot more complicated.

Most installed USB 3.0 cards are x4-slot oriented, and the vast majority of them only use x1 too. It's the same mismatch an aftermarket card would have been making, just without the physical card.

----------

What's particularly odd, is that they put USB 3 on the PLX and all the PCIe networking on the PCH. It probably would have been easier from a PCB design perspective to put all the I/O off the PLX keeping all related traces local to the I/O board.

If the PLX switch delivers x4, then you would need another PLX switch to "break" that into four x1 lanes. Clustering everything onto the I/O board is a simpler, more localized layout problem, but it is also more expensive. It is cheaper to pay someone once to lay out the three lanes down to the PCH than to pay for extra components on every single Mac Pro shipped.

It is a "cheaper" short-range layout, but it also takes bandwidth away from the TB controllers. If it creates latency and addressing hiccups on those controllers' expansion chains, you haven't really picked up a big win.
They already created addressing issues by clustering all the TB controllers onto one bus anyway (e.g., http://www.uaudio.com/support/uad/compatibility ).
 
If the PLX switch delivers x4, then you would need another PLX switch to "break" that into four x1 lanes.

Yeah, that makes sense. I had assumed you could probably treat each of the lanes on that x4 switch output independently, but I'm sure you're right... to do that would require another PLX.
 
It is likely a property of the switch. It is configured as an x8 -> (x4, x4, x4, x4) switch. There probably isn't a mode that gets three x4's and sprinkles the remainder around as individual x1's.

It can be configured with x1 output ports - the limit is 6 ports.

That means that x8 -> (x4, x4, x4, x2, x1) could be done (leaving one lane on the table). Or x8 -> (x4, x4, x4, x1) leaving three lanes wasted.

Still odd that Apple wastes one lane on the PCH and three lanes on the PLX, and has that nice empty space on the second GPU where an SSD socket could be put.
 
Still odd that Apple wastes one lane on the PCH and three lanes on the PLX, and has that nice empty space on the second GPU where an SSD socket could be put.

Totally agree. They really should move the USB to the unused lane on the PCH and then use the PLX for a second SSD. That would give you full unfettered RAID0 performance if you striped the two.
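For what it's worth, here's a minimal sketch of how RAID0 would map logical offsets across two such SSDs (the 128 KiB chunk size and the function name are hypothetical choices for illustration):

```python
def stripe_location(offset: int, chunk_size: int = 128 * 1024, devices: int = 2):
    """Map a logical byte offset to (device index, offset on that device) for RAID0."""
    chunk = offset // chunk_size
    device = chunk % devices             # chunks alternate across devices
    local_chunk = chunk // devices
    return device, local_chunk * chunk_size + offset % chunk_size

# Consecutive chunks land on alternating SSDs, so a large sequential read
# pulls bandwidth from both devices at once: that's where the striped
# throughput gain comes from.
print(stripe_location(0))            # (0, 0)
print(stripe_location(128 * 1024))   # (1, 0)
print(stripe_location(256 * 1024))   # (0, 131072)
```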
 
It can be configured with x1 output ports - the limit is 6 ports.
...
That means that x8 -> (x4, x4, x4, x2, x1) could be done (leaving one lane on the table). Or x8 -> (x4, x4, x4, x1) leaving three lanes wasted.

You can, but it isn't going to buy much. It still comes down to switching constraints.

The first of those is going to bring higher latencies (a fixed buffer size spread more thinly over more ports, along with smaller switch intervals), and the second isn't materially different from just leaving the last port at x4 and hooking the USB 3.0 controller to it. (The second also likely gives the USB 3.0 controller a better switch buffer with minimal effort. The 4th port as x1 seems likely to waste buffer as well as lanes, since those are probably coupled in the design constraints.)

If we were hooking up three x4 PCIe controllers, the marginal latency increase would not be a big deal, but with TB what the design is doing is layering switches on top of switches. Using all 6 possible ports isn't a particularly good idea, any more than maximizing Thunderbolt daisy-chain length when shorter chains are readily available. If you "have to" max it out, fine, but as a normal mode it is dubious.


Still odd that Apple wastes one lane on the PCH and three lanes on the PLX,

Not all that odd. The 4th x4 is already there (so shorter, simpler PCB routing), and it siphons bandwidth away from the almost-saturated PCH chipset. Why not? With no other internal storage, an extremely high percentage of Mac Pro users are going to be using DAS/SAN storage, which will most likely flow through this switch instead of the PCH. That leaves the PCH bandwidth free to maximize the internal SSD.


and has that nice empty space on the second GPU where an SSD socket could be put.

It is an empty space because both GPU boards use the same basic design. It is cheap to remove elements from a design that is already done: take the finished display card's design and just "delete" the DisplayPort and SSD elements (the components that cost money) from the PCB design. It is still going to work, with minimal R&D to create it.

Long term, if a future PCH picks up x4 (or x8) PCIe v3 lanes, they could split those into two x2 v3 bundles to support two SSDs with no drop-off. Or, if Intel could squeeze out another x2 v2 bundle (killing off some SATA lanes, which are completely dead here anyway), that could enable a 2nd SSD in a future version.


Sure, if you ignore all of the other design constraints, you can drag the PLX switch down to the logic/backplane board and thin out the bandwidth just to hook up an x2 SSD on the 'Compute' GPU. That would make the "only want to stick things inside the box" crowd happier, but those who have to fully leverage the external expansion probably won't be.
 
PCIe OtB?

A columnist for Mac Life wrote that PCIe OtB (outside the box) would take over and that TB will fade. He thought it was a mistake that Apple devoted so many outlets to TB. He didn't say it explicitly, but it sounds like he was saying TB will end up like FireWire.

What do you all think of PCIe OtB?
 
A columnist for Mac Life wrote that PCIe OtB (outside the box) would take over and that TB will fade. He thought it was a mistake that Apple devoted so many outlets to TB. He didn't say it explicitly, but it sounds like he was saying TB will end up like FireWire.

What do you all think of PCIe OtB?

Never heard of it. WTF is PCIe OtB? Isn't that what TB provides? I guess my ignorance says something about it. :eek:
 