
Cubemmal

Original poster
AFAIK there's still an open question as to how the nMP routes the GPUs. AnandTech wasn't able to figure it out. Please correct me if anybody has done this, but I believe all we know is that there are two GPUs which multiplex to three TB 2.0 buses, which are further multiplexed into six TB ports.

I've done some checking. I downloaded the free GPU stress test tool here. My nMP has three Cinema monitors ...

  • Hex-core D700 nMP
  • Two Thunderbolt Displays on one bus
  • An older 27" Cinema Display (mDP) on another bus
  • A third bus dedicated to data (Pegasus2)

I would hope that OS X would (somehow) dedicate one GPU to the older Cinema Display (center monitor), and the other GPU to the outer two monitors. However, running several instances of the GPU stress test seems to indicate otherwise. No matter which monitor the windows are placed on, they perform equally FPS-wise (15-30 FPS on these tests).

OK, so I launched the developer tool "OpenGL Driver Monitor" and looked at the AMDRadeonX4000GLDriver for FirePro D700 #1 and #2. The first sign is that the tool shows one of the GPUs as being "Offline". In Cocoa speak that means it's the "Offline Renderer", i.e. it's not routing to any display. Then again, it could mean something different, so let's sample some data.
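
Side note: you can see the same online/offline split programmatically through the CGL renderer properties, without the Driver Monitor. A minimal sketch (my own, based on the public CGL API, so treat it as illustrative rather than authoritative) that lists, for each active display, the renderers that can serve it and whether each is online:

Code:
/* cc renderers.c -framework OpenGL -framework ApplicationServices */
#include <OpenGL/OpenGL.h>
#include <ApplicationServices/ApplicationServices.h>
#include <stdio.h>

int main(void) {
    CGDirectDisplayID displays[8];
    uint32_t ndisp = 0;
    CGGetOnlineDisplayList(8, displays, &ndisp);
    for (uint32_t i = 0; i < ndisp; i++) {
        CGLRendererInfoObj info;
        GLint nrend = 0;
        if (CGLQueryRendererInfo(CGDisplayIDToOpenGLDisplayMask(displays[i]),
                                 &info, &nrend) != kCGLNoError)
            continue;
        for (GLint r = 0; r < nrend; r++) {
            GLint online = 0, rid = 0;
            CGLDescribeRenderer(info, r, kCGLRPOnline, &online);
            CGLDescribeRenderer(info, r, kCGLRPRendererID, &rid);
            printf("display %u: renderer 0x%06x is %s\n",
                   (unsigned)displays[i], rid, online ? "online" : "offline");
        }
        CGLDestroyRendererInfo(info);
    }
    return 0;
}

If the second D700 never shows up as online for any display, that would corroborate what the Driver Monitor is showing.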

There are many parameters to look at; I checked video RAM used, Contexts, and Surfaces. Uh-oh: these also showed that one of the cards was being heavily used while the other was sitting idle (I can't tell which corresponds to which in the Renderer Info window).

This seems to indicate that Apple simply plumbed one GPU to Thunderbolt to drive all your displays, and the other is always offline. The evidence is looking pretty clear. If they didn't, then OS X is doing a poor job of allocating GPUs, simply taking the dumb approach of having one do all the heavy lifting while the other sits idle.

What this means is that unless you use FCPX extensively, one of your expensive GPUs is probably sitting idle most of the time. I'm disappointed; again Apple drops the ball. I'd like to know whether the second GPU is plumbed to Thunderbolt at all.

I'll do more tests later; Civ uses offline rendering too.
 
This has been known since the launch of the nMP - the second GPU is for compute power only (under OS X), not for driving a display.

You need applications that take advantage of this, like FCPX, to see the benefit of the second GPU at the moment.
 
This has been known since the launch of the nMP - the second GPU is for compute power only (under OS X), not for driving a display.

How so? It has been assumed, but I've not seen any proof or validation from Apple. AnandTech tried to figure it out and couldn't, and they write the most comprehensive reviews on the planet.

Perhaps it was obvious; please show us some links.
 

There are some posts here from people who develop applications stating exactly what the case is. In short: the GPUs are indeed connected at the hardware level. In Windows, where the drivers enable the so-called Crossfire X, the GPUs share the load equally as long as the driver offers a proper profile for the application in use (otherwise, only one GPU is used there as well).

On OS X, Crossfire Pro is implemented, which allows each application to "decide" what to do with each of the GPUs. It also allows unequal usage of the GPUs. However, this means that nothing happens automatically on its own, and that each application must be aware of the existence of both GPUs in order to take advantage of the second one. In every case, one GPU is responsible for the display while you can assign tasks to the other (which otherwise sits idle).

Whether more applications will start supporting this in the near future, or the ATI drivers will start implementing Crossfire X as well, remains to be seen. The h/w is there, though.
 
There are some posts here from people who develop applications stating exactly what the case is. In short: the GPUs are indeed connected at the hardware level. In Windows, where the drivers enable the so-called Crossfire X, the GPUs share the load equally as long as the driver offers a proper profile for the application in use (otherwise, only one GPU is used there as well).

Ah! I'm a software developer too, and before writing any code I'm checking into this via this thread. I know about Crossfire support; however, you have to understand how it works: the load is shared between GPUs. What about the displays? Has anybody verified what happens under Windows: in a multi-display environment, are the displays spread out across the GPUs? Regardless, I'm mainly concerned with OS X in this post ...

On OS X, Crossfire Pro is implemented, which allows each application to "decide" what to do with each of the GPUs. It also allows unequal usage of the GPUs. However, this means that nothing happens automatically on its own, and that each application must be aware of the existence of both GPUs in order to take advantage of the second one. In every case, one GPU is responsible for the display while you can assign tasks to the other (which otherwise sits idle).

I'm not sure you can call it "Crossfire Pro" - it's Apple's Cocoa "Offline Rendering". This simply means the application can choose to do OpenCL on an offline renderer. However, this is entirely different from Crossfire, as you can see in their documentation on memory handling. The big cost with offline rendering is that data has to be mediated through main memory if you want to move it from one GPU to the other. For example, say an app got the bright idea to render alternate frames on the other GPU. Bad idea: each frame would have to upload to main RAM, then over to the other GPU's RAM before display, killing any performance gains.

Furthermore, the way the Apple programming model works, the app is only given one GPU as far as it's concerned. Any others (even if they are connected to displays) are "offline". If you move an app window from one display to another, Cocoa automatically transfers it to the other GPU. And finally, as I showed above, Cocoa/OS X clearly delineates online and offline, with one GPU not connected to a display at all.
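
For the curious, an app has to opt in to even seeing the offline GPU when it creates its pixel format. A minimal sketch using CGL (the Cocoa equivalent is the NSOpenGLPFAAllowOfflineRenderers attribute); illustrative only:

Code:
/* cc offline.c -framework OpenGL */
#include <OpenGL/OpenGL.h>
#include <stdio.h>

int main(void) {
    CGLPixelFormatAttribute attrs[] = {
        kCGLPFAAccelerated,
        kCGLPFAAllowOfflineRenderers, /* without this, the offline GPU is invisible */
        (CGLPixelFormatAttribute)0
    };
    CGLPixelFormatObj pix = NULL;
    GLint nscreens = 0;
    if (CGLChoosePixelFormat(attrs, &pix, &nscreens) == kCGLNoError) {
        printf("%d virtual screens (renderers) available\n", nscreens);
        CGLDestroyPixelFormat(pix);
    }
    return 0;
}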

Whether more applications will start supporting this in the near future, or the ATI drivers will start implementing Crossfire X as well, remains to be seen. The h/w is there, though.

No - again, that's just for DMA. We still don't know whether the two GPUs drive the Thunderbolt ports, or just one, AFAIK.
 
Maybe I can summarize; please correct any misunderstandings ... I believe these are the options available on the nMP:

I believe that one GPU is plumbed through TB and the other is not. DMA access (either via a special bus or through PCI) is available under Windows only.

  • Crossfire (Windows). The offline GPU can be used to offload frame rendering. This is accomplished via PCI bus mastering from one GPU to the other (DMA).
  • Eyefinity (Windows). Multiple TB displays can be driven as one single, large display. This works because the ports are all connected to a single GPU (an Eyefinity requirement).
  • Apple Offline Rendering. None of the above. The operating system does nothing; it's entirely up to the app developer. Furthermore, without bus-mastering Crossfire support, the only practical choice is to run OpenGL on one GPU and OpenCL on the other.

From this I believe that OS X has the most limitations for GPU support, in that you can only do graphics on one GPU and compute on the other. Given that the type of compute that can be done via GPGPU is limited, this makes OS X the most limited approach available.
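
At least the "OpenCL on the other" half is easy to reach. A minimal sketch against Apple's OpenCL framework that lists the GPU devices an app can target; on a dual-D700 nMP I'd expect both FirePros to be listed, though I haven't verified which index corresponds to the display GPU:

Code:
/* cc devices.c -framework OpenCL */
#include <OpenCL/opencl.h>
#include <stdio.h>

int main(void) {
    cl_device_id devs[4];
    cl_uint n = 0;
    clGetDeviceIDs(NULL, CL_DEVICE_TYPE_GPU, 4, devs, &n);
    for (cl_uint i = 0; i < n; i++) {
        char name[128] = {0};
        clGetDeviceInfo(devs[i], CL_DEVICE_NAME, sizeof name, name, NULL);
        printf("GPU %u: %s\n", i, name);
    }
    return 0;
}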
 
I've found those posts/threads from back in January that contain useful info (actually, you participated in one of the threads yourself, sharing useful information on the matter).

https://forums.macrumors.com/posts/18638178/

https://forums.macrumors.com/posts/18643702/

https://forums.macrumors.com/posts/18579092/

In the second link, leman demonstrates OS X handling two different GPUs on a MacBook Pro.

A few days ago I also read here that a game was patched to take advantage of the nMP's dual GPUs (IIRC it was SimCity or some such), but I have no idea to what extent.

But I think what you wrote in this thread is the most accurate.
 
In the second link, leman demonstrates OS X handling two different GPUs on a MacBook Pro.

A few days ago I also read here that a game was patched to take advantage of the nMP's dual GPUs (IIRC it was SimCity or some such), but I have no idea to what extent.


Thanks, good links. In the second link he is talking about offline rendering. And as he says, OS X WILL automatically handle the memory management for you. However, Apple developers caution us not to rely on that! It's horribly inefficient; they put the burden on us to manage the whole problem. If you're interested in the details, look up the documentation and developer videos on IOSurface.
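
For reference, IOSurface is the sharing mechanism Apple points developers to for moving image data between contexts (and GPUs). A minimal, hypothetical sketch of creating one; the property keys are the real ones, the dimensions are made-up example values:

Code:
/* cc surface.c -framework IOSurface -framework CoreFoundation */
#include <IOSurface/IOSurface.h>
#include <CoreFoundation/CoreFoundation.h>
#include <stdio.h>

static CFNumberRef i32(int32_t v) {
    return CFNumberCreate(NULL, kCFNumberSInt32Type, &v);
}

int main(void) {
    /* 1920x1080, 4 bytes per pixel (BGRA) - arbitrary example values */
    const void *keys[] = { kIOSurfaceWidth, kIOSurfaceHeight,
                           kIOSurfaceBytesPerElement };
    const void *vals[] = { i32(1920), i32(1080), i32(4) };
    CFDictionaryRef props = CFDictionaryCreate(NULL, keys, vals, 3,
        &kCFTypeDictionaryKeyCallBacks, &kCFTypeDictionaryValueCallBacks);
    IOSurfaceRef surf = IOSurfaceCreate(props);
    if (surf)
        printf("surface created, %zu bytes per row\n",
               IOSurfaceGetBytesPerRow(surf));
    /* such a surface can then be bound as a texture on either GPU,
       e.g. via CGLTexImageIOSurface2D */
    return 0;
}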

In the first link, again he's correct, but in practice he's wrong: the memory bandwidth (without DMA à la Crossfire) makes offline frame rendering useless.

As for games, yes, Civilization recently enabled support for multiple GPUs. That's simply for "texture unpacking", which is presumably a lightweight operation (just performed once). I haven't noticed a drastic speedup when playing the new Civ on the nMP, but it's there, I guess.

If this is all true, I think Apple has committed two sins. One, not routing both GPUs to Thunderbolt, at least allowing you to put some monitor(s) on one and some on the other. Two, not (apparently) enabling DMA between the GPUs (this is a pure software solution).
 
If this is all true, I think Apple has committed two sins. One, not routing both GPUs to Thunderbolt, at least allowing you to put some monitor(s) on one and some on the other. Two, not (apparently) enabling DMA between the GPUs (this is a pure software solution).

I see. Thanks for this great info. In that case, I'd call those sins #2 and #3. I think the first sin is that Apple has yet to explain thoroughly what the plan behind the dual GPUs is, what they can do, what they cannot, and whether they plan real support for them in the future.
 
I see. Thanks for this great info. In that case, I'd call those sins #2 and #3. I think the first sin is that Apple has yet to explain thoroughly what the plan behind the dual GPUs is, what they can do, what they cannot, and whether they plan real support for them in the future.


Apple, and Roadmap? Those two words aren't often seen together :)

Presently the plan is clear enough. One GPU is for OpenGL (display) and the other is for OpenCL (compute). Final Cut Pro X is how they see us using them.

For future plans, I could see them enabling DMA between the GPUs. That would make offline rendering more useful than it is now. The use cases available for Apple's model are quite limited, as I said: it's the developer's problem, and don't transfer data between GPUs.
 
Forcing everyone to go dual GPU seems wrong to me when many or most seem to be unable to take advantage of it. Yes, I know, anyone who doesn't need dual GPU OpenCL "is not a real pro". :mad:

The solution would be enabling OS X-wide universal Crossfire in OpenGL. Present two video cards to the OS as if they were one, if such a thing is possible.

But I'm not sure it is possible, considering Windows has had Crossfire for years and it's still to this day a mishmash of supported and unsupported apps, profile nonsense, and a dash of bugs affecting performance and quality.
 
nMP GPU routing

Forcing everyone to go dual GPU seems wrong to me when many or most seem to be unable to take advantage of it. Yes, I know, anyone who doesn't need dual GPU OpenCL "is not a real pro". :mad:

Agreed. What's surprising is that Apple's developer philosophy is to make things as easy as possible. It's called "inversion of control", which means that our apps are basically like device drivers. They are easier to write, but the downside is that things can be harder to debug (due to the thick Apple API) and you get less control.

Except here! Apple throws us under the bus, gives us all this powerful hardware, and says "Oh, go take care of it yourself, that's the right solution". AT LEAST enable GPU DMA for us.



The solution would be enabling OS X-wide universal Crossfire in OpenGL. Present two video cards to the OS as if they were one, if such a thing is possible.



But I'm not sure it is possible, considering Windows has had Crossfire for years and it's still to this day a mishmash of supported and unsupported apps, profile nonsense, and a dash of bugs affecting performance and quality.

Possibly. Offline rendering is extremely problematic, as you mention with Crossfire/SLI. It's certainly possible, but it's quite hard to get zero-artifact rendering (no screen tearing, etc.), consistent frame pacing, and so on. I don't mind that so much; I think Apple is taking a purist approach here. However, there are two mitigations they could take.

One, they could plumb both GPUs to the Thunderbolt ports. I assumed they did this. At least then you could, for example, put one monitor on one GPU and the other(s) on the second GPU. That way your offline GPU isn't a boat anchor, which it is right now for all the thousands of nMPs out there (unless you happen to be running FCPX).

Two, enable easier offline rendering. DMA, better tools, etc. At least GPU DMA, as I've said.
 
Two, enable easier offline rendering. DMA, better tools, etc. At least GPU DMA, as I've said.

To be fair, though, is this a task solely for Apple, or is it partially AMD's responsibility, including the drivers? If not for DMA, at least as far as Crossfire is concerned?
 
nMP GPU routing

To be fair, though, is this a task solely for Apple, or is it partially AMD's responsibility, including the drivers? If not for DMA, at least as far as Crossfire is concerned?


The D700 drivers are Apple-provided, so it's up to them to give us the goods. At any rate, IOSurface is a Core Graphics facility (I think that's where it lives), and it likely does the memory management. I don't think the low-level driver gets any control over that, so yes, I think it has to be Apple's responsibility.

Theoretically Apple could push memory management/DMA down to the driver level, and AMD could give us OS X drivers, but why would they? AMD only provides Windows drivers because of its competitive fight with Nvidia. On the nMP you get no choice of GPUs, so AMD has no reason to hand out custom drivers.

I'm sure AMD handed Apple their driver code, and I wouldn't be surprised if they handed over a few engineers to boot, certainly ones who know Crossfire. Apple controls their ecosystem tightly; this is entirely their call, and their fault.
 
Mavericks Technology Overview Document

One GPU runs the monitors, the other contributes to CPU computing power using OpenCL. Dual GPU usage is configured in the software app.

OK, you'll have to show us where it says that. I looked through the marketing and, other than a small reference to OpenCL, saw nothing related to what you said.
 
I'm curious to see how this article from Mac Performance Guide pans out...

"A certain software application vendor contacted me very recently, asking if I were interesting in testing a beta version of software that will utilize both GPUs on the 2013 Mac Pro. Naturally I said yes, and I hope I’ll be able to report on the results soon, the exciting part being that optimized use of both GPUs could deliver 3X to 7X the performance one gets from CPU-based processing. That’s particularly exciting for photography (video too, but it has fairly good GPU support already)."

http://macperformanceguide.com/blog/2014/20140329_1-MacPro2013-using-dual-GPUs.html
 
I'm curious to see how this article from Mac Performance Guide pans out...

"A certain software application vendor contacted me very recently, asking if I were interesting in testing a beta version of software that will utilize both GPUs on the 2013 Mac Pro. Naturally I said yes, and I hope I’ll be able to report on the results soon, the exciting part being that optimized use of both GPUs could deliver 3X to 7X the performance one gets from CPU-based processing. That’s particularly exciting for photography (video too, but it has fairly good GPU support already)."

http://macperformanceguide.com/blog/2014/20140329_1-MacPro2013-using-dual-GPUs.html

This isn't really surprising. Apple already has documentation and video guides on how to make software use both GPUs at once. They're a few years old and were originally written for the oMP.

I think it was also announced that Civ V is being upgraded to use both GPUs on the Mac Pro at the same time. And I've used CUDA across multiple GPUs at once on older MacBook Pros; OpenCL should have the same capabilities.
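
To make that concrete, the OpenCL pattern is one context spanning both devices with a command queue per GPU; independent kernels enqueued on the two queues can then run concurrently. A hypothetical minimal sketch (the real work is splitting the workload sensibly):

Code:
/* cc dual.c -framework OpenCL */
#include <OpenCL/opencl.h>

int main(void) {
    cl_device_id devs[2];
    cl_uint n = 0;
    cl_int err = 0;
    clGetDeviceIDs(NULL, CL_DEVICE_TYPE_GPU, 2, devs, &n);
    if (n < 2) return 1; /* need both GPUs visible */
    cl_context ctx = clCreateContext(NULL, n, devs, NULL, NULL, &err);
    /* one queue per GPU */
    cl_command_queue q0 = clCreateCommandQueue(ctx, devs[0], 0, &err);
    cl_command_queue q1 = clCreateCommandQueue(ctx, devs[1], 0, &err);
    /* ... build kernels, split the data, enqueue half on each queue ... */
    clReleaseCommandQueue(q0);
    clReleaseCommandQueue(q1);
    clReleaseContext(ctx);
    return 0;
}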
 
But are we talking about a benefit for CPU-intensive apps, or also for graphics-intensive apps? As the OP stated, using both GPUs for computational purposes is one thing. Using both GPUs as one, to double graphics performance, is another. As I understand it, Apple is still very far from achieving the latter.
 
I'm curious to see how this article from Mac Performance Guide pans out...

He's a photo guy; I think the main thing you can do with GPGPU computing there is applying filters. However, see below ...

This isn't really surprising. ...

As you say, OpenCL has been there for years, but practically nobody has enabled it. Why? Because Apple's implementation is poor in many ways (again, see below).

But are we talking about a benefit for CPU-intensive apps, or also for graphics-intensive apps?

They're talking about apps which currently use just the CPU for compute (and are somewhat GPU-intensive) and are going to use the GPU for compute too.

GPGPU - general-purpose GPU computing - isn't easy. There just aren't a lot of problems that fit into that space. Most computing (these days) is messy: short, bursty bits of non-deterministic work. This is why CPUs are the way they are, with branch prediction and all those other powerful features.

GPUs are so fast and so parallel because they strip out all that messiness and just concentrate on one job. And if you happen to have something that takes a while and falls into that category, you can now throw it over to a GPU. But the big problem is the time it takes to transfer your data over.

Usually that swamps the savings from doing it on the GPU. Or, more likely, before you even find that out, it swamps your development cycle and you don't have the time to mess with it.
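
Some back-of-the-envelope numbers (all assumed, not measured on the nMP) show why the transfer cost dominates:

Code:
/* Toy break-even estimate. Assumes roughly 6 GB/s effective
   host<->GPU bandwidth over PCIe; real figures will vary. */
#include <stdio.h>

int main(void) {
    double bytes = 512.0 * 1024 * 1024; /* a 512 MB working set       */
    double bw    = 6e9;                 /* assumed transfer rate, B/s */
    double trip  = 2.0 * bytes / bw;    /* copy to the GPU and back   */
    printf("round trip: %.0f ms\n", trip * 1000.0); /* about 179 ms   */
    /* The GPU must save more than this versus the CPU to break even. */
    return 0;
}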

Which gets back to my point that we paid for some expensive, overpriced GPUs here. Apple could have (apparently) done a few things for us. My only hope is that they did do some plumbing but haven't enabled it or told us yet, which is also typically Apple.
 
My only hope is that they did do some plumbing but haven't enabled it or told us yet, which is also typically Apple.

I stand by your final sentence, as this is indeed typically Apple. Generally speaking, in Apple's ecosystem, more often than not the h/w precedes the s/w, leaving a time window where the software is not on par with the hardware.

I just wish we had a better explanation from the official source about the logic behind this machine, even if it referred to future usage. It's a shame that the nMP was featured in two official presentations/keynotes before release (one of them developer-focused, too), and still everyone wondered what the nMP was all about. Definitely not the ideal way to promote a workstation.
 
I stand by your final sentence, as this is indeed typically Apple. Generally speaking, in Apple's ecosystem, more often than not the h/w precedes the s/w, leaving a time window where the software is not on par with the hardware.

I just wish we had a better explanation from the official source about the logic behind this machine, even if it referred to future usage. It's a shame that the nMP was featured in two official presentations/keynotes before release (one of them developer-focused, too), and still everyone wondered what the nMP was all about. Definitely not the ideal way to promote a workstation.

With my three monitors I noticed that OS X preferred them hooked up a certain way. I have one mDP display and two TB displays. Oddly, it seemed to remember only one arrangement (which display was the center one) across boots. I have a nagging thought that maybe I'm not seeing both GPUs driving pixels because of the way I hooked things up.

But I recall that OS X generally blows at remembering desktop configuration with more than two displays, and I don't want to experiment at the moment. Still, there's a small chance that my offline GPU is plumbed; I'll experiment soon.
 
An easier way to see where your displays are connected is to look at the System Information tool:

Apple > About This Mac > More Info

Hardware > Graphics/Displays

This shows that the Slot-2 GPU has all the displays attached, while the Slot-1 GPU has none of them.

I'd be curious if anybody saw a display show up on that second GPU. My guess is not.
 
"By default, one GPU is setup for display duties while the other is used exclusively for GPU compute workloads. "...

Good catch; however, the bit you left out is important too:

GPUs are notoriously bad at context switching, which can severely limit compute performance if the GPU also has to deal with the rendering workloads associated with display in a modern OS. NVIDIA sought to address a similar problem with their Maximus technology, combining Quadro and Tesla cards into a single system for display and compute.

I suspect this is what Apple is thinking. They take a purist approach, so they might be planning on never releasing the ability to split displays across GPUs. If they did, I'd expect to see a "Use GPU for compute" option or something like that in the Displays pane of System Preferences.

And it may not be plumbed. I verified that the display GPU can route to all of the Thunderbolt ports. Consider this diagram:

[Attached image: Thunderbolt_Technology_model_1_E.png]


It indicates that the DP signal is routed from the GPU directly to the controller, which then multiplexes it; that makes a lot of sense. There are three Thunderbolt controller chips, so the one GPU has three outputs, one going to each. This isn't difficult, as graphics cards often have four outputs out the back.

However, if they were to enable multi-GPU display, they'd have to have put a DisplayPort switch in front of each TB controller's DP input, fed from either chip. They certainly know how to do this already, as they are doing it with the MBPs. The other option is if the TB controller chips have two DP inputs - I don't know whether they do.

Did they do so? No idea.

EDIT: Look at the following documentation from Apple on TB programming. In there they indicate that for the 1st-gen 82524EF controller ...

In host mode, the controller has a Gen2 x4 uplink to the system PCI Express Root Complex and one or more DisplayPort (DP) inputs (depending on the graphics capabilities of the system).

And the block diagram clearly shows two DP inputs.

There you go - simple. Based on this, the TB controller clearly can take DP input from both GPUs, with no multiplexing needed since the TB chip handles that, and I think Apple probably did plumb DP from both GPUs to all three TB controllers. In software they're only enabling one GPU for compute and one for graphics, and it would be trivial to change this if so.

I bet they are not doing anything; as antonis said, hardware precedes software. Also, they probably want to encourage the industry to start optimizing for OpenCL.
 