How does OS X handle multiple GPUs?

goMac · Jul 8, 2013

jasonvp said:
Was this published anywhere or did someone pull it out of their backside and present it as "fact?" There's no reason to believe Apple will do any such thing.

...I heard it. ...Somewhere...

: innocent whistling :

jasonvp said:
Software coded against OpenCL can use multiple GPUs if the software engineer does the right thing.

Sure, but if FCPX is pegging one GPU with GL calls, that GPU doesn't have any bandwidth free anyway.

jasonvp said:
Why wouldn't Apple do that?

For the reason given above. Running OpenGL and OpenCL on the same GPU puts a performance hit on both.

If you have an app that doesn't use any OpenGL, sure, you can go wild with OpenCL and run on both. But otherwise OpenGL and OpenCL fight for resources. Given that FCPX is heavy on both GL and CL it doesn't seem like a good target for multi-GPU OpenCL.

MacVidCards · Jul 8, 2013

Test coming

Barefeats.com has a fun Multi-GPU test coming.

And not on vapourware, but on an actual Mac Pro running current software.

MattInOz · Jul 8, 2013

goMac said:
My understanding is that Apple will use one GPU for OpenCL and one for OpenGL.

So you will see an overall better experience, but they probably won't put much effort behind using two GPUs for OpenGL at the same time. It's not the point of dual GPUs. OpenCL and OpenGL fight for resources, so if they're running at the same time, best to put them on separate GPUs.

So I guess in a way that's multi GPU? But also not.

Maybe I'm dumb, but between tech like Quartz Composer, GCD and LLVM, Apple had plenty of scope to use multiple GPUs effectively for both OpenGL and OpenCL work.

I wonder if the changes to multiple desktops on OS X are a sign of improvements to Quartz Composer, maybe breaking it up into multiple composers so that each screen has its own, or even each desktop has one that App Naps as it goes off screen, with the ability to move windows between them.

It just seems to me Apple wants a hardware-agnostic solution, although the new Mac Pro must have pretty massive bandwidth between each GPU and the Thunderbolt connectors, and even between the GPUs themselves independent of the CPU, if they are going to drive three 4K screens out of those 6 Thunderbolt connectors.

goMac · Jul 8, 2013

MattInOz said:
Maybe I'm dumb, but between tech like Quartz Composer, GCD and LLVM, Apple had plenty of scope to use multiple GPUs effectively for both OpenGL and OpenCL work.

GCD and LLVM are CPU technologies only, not GPU. LLVM has some things to do with OpenCL, but without getting into details, it does absolutely nothing about multiple GPUs.

Quartz Composer is a design tool so I'm really not sure how that is involved at all.

tomvos · Jul 9, 2013

jasonvp said:
Was this published anywhere or did someone pull it out of their backside and present it as "fact?" There's no reason to believe Apple will do any such thing. Software coded against OpenCL can use multiple GPUs if the software engineer does the right thing.

Why wouldn't Apple do that?

Well, at least on Mac OS X 10.8 you notice a stuttering effect if you run OpenCL and Video on one GPU. In my Mac Pro I use a GTX680. If I run a GPU miner for litecoin I notice a stuttering effect in the overall video output. It is visible in the normal desktop UI as well as in video playback.

Now, if I think about an application like MARI, which was showcased at the WWDC ... this requires very smooth video frame rates while also putting a high load on the GPU for OpenCL.

If you want to use both GPUs for OpenCL kernels and still need to process your video on the same GPUs (or at least one of them), you need some kind of load balancer which keeps the video frame rate acceptable and assigns all remaining GPU capacity to the OpenCL kernels.
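The load-balancing idea above can be sketched as a per-frame time budget. This is only an illustration, assuming a 60 Hz display and that the cost of each frame's video work is known; the function name, the safety margin and the numbers are hypothetical, and nothing here reflects how OS X actually schedules GPU work.

```python
def compute_budget_ms(video_ms, refresh_hz=60.0, safety_ms=1.0):
    """Return how many milliseconds of each refresh interval are left for
    OpenCL kernels once that frame's video work is done.

    All parameters are illustrative assumptions, not measured values.
    """
    frame_ms = 1000.0 / refresh_hz           # time available per displayed frame
    leftover = frame_ms - video_ms - safety_ms
    return max(0.0, leftover)                # never hand out negative time


# Video needs 6 ms per frame: the rest of the interval can go to compute.
print(round(compute_budget_ms(6.0), 1))
# Video already overruns the frame: compute gets nothing.
print(compute_budget_ms(20.0))
```

A real balancer would also have to cope with kernels that cannot be interrupted once they start, which this sketch ignores.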

While this is not impossible, I have not heard of something like this in relation to Mac OS X [redacted]. So my guess is that Apple will start with a simple approach to the problem: one GPU does video, the other does OpenCL.

MattInOz · Jul 9, 2013

goMac said:
GCD and LLVM are CPU technologies only, not GPU. LLVM has some things to do with OpenCL, but without getting into details, it does absolutely nothing about multiple GPUs.

Quartz Composer is a design tool so I'm really not sure how that is involved at all.

Oops, my bad: I meant the Quartz Compositor, as described in the classic Ars Technica Tiger review ("Quartz in Tiger"). It's funny that since Tiger, Quartz hasn't gotten the attention it did for that release. Well, at least in Ars reviews.

LLVM is a compiler. AMD has an LLVM project for compiling OpenCL to both AMD GPUs and CPUs. Similarly, Nvidia uses it for CUDA compiling as well. Apple even developed OpenGL support so they could fake CPU features in x86. Nothing about it is CPU only.

OK, GCD is CPU-only according to the internet. It just seems like some sort of live resource coordinator like that is required to make it all come together.

GXPvince · Jul 9, 2013

I have a question along the same lines..
I play League of Legends once a week.
I have the 1900xt and a PC 5770 in my 1,1. (Shows up as ATI 5000)

Does the game know which card to use?

I am thinking it's using the 1900 because all my settings are low to achieve 40-50fps. When I only had the 5770 in, it was 60+ on high settings.

Is there a way for the 5770 to become the main card without removing the 1900?

goMac · Jul 9, 2013

MattInOz said:
LLVM is a compiler. AMD has an LLVM project for compiling OpenCL to both AMD GPUs and CPUs. Similarly, Nvidia uses it for CUDA compiling as well. Apple even developed OpenGL support so they could fake CPU features in x86. Nothing about it is CPU only.

LLVM can compile OpenCL code (this is the relation I was hinting at), but it doesn't do anything at all about multiple GPUs.

Things like CrossFire or SLI don't really apply in the same way to OpenCL as they do to GL. An app really has to be written with multiple GPU support in mind. LLVM doesn't have anything to magically take care of the problem.

Photovore · Jul 10, 2013

goMac said:
...For the reason given above. Running OpenGL and OpenCL on the same GPU puts a performance hit on both....

Your point looks true in the way you mean it -- presumably something like a game with heavy oGL shader code while something else tries to do oCL computation.

My circumstance is different, and I imagine some of these game coders are in my shoes too: the magic of OpenCL / OpenGL Interoperability saves lots of time. They are designed to cooperate (if you set things up properly) so that you can do heavy oCL calculations, then, when your kernel is finished, you don't copy the data off the card; just turn it over to OpenGL for display, and it goes straight out through the chosen video adapter (with whatever oGL interpolation or whatever you want).
This lets you display your frame without having to copy it back to the CPU first, which saves my app a whole frame-chunk on a 4,1 with a 5870. I.e., the projector's running at 60fps, and a frame calculates in 30ms. With ogl/ocl interop, the resulting frame time is 33ms, or 30fps. Without, it's 50ms, or 20fps. Big difference. [It's not transferring enough data to take nearly that much time; it just seems to cost one whole frame to do it.*]
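The 33ms and 50ms figures follow from output being locked to the 60Hz refresh. A minimal sketch of that arithmetic, assuming a finished frame can only appear on the next refresh tick and that the readback costs exactly one extra refresh interval as reported above (the function and parameter names are made up for illustration):

```python
import math


def displayed_frame_ms(work_ms, refresh_hz=60, extra_intervals=0):
    """Time between displayed frames when output is locked to the refresh rate.

    The work time is rounded up to a whole number of refresh intervals.
    extra_intervals models the one-interval readback penalty described in
    the post; it is an assumption, not a measured driver behaviour.
    """
    intervals = math.ceil(work_ms * refresh_hz / 1000) + extra_intervals
    return intervals * 1000 / refresh_hz


with_interop = displayed_frame_ms(30)                        # 2 refresh intervals
without_interop = displayed_frame_ms(30, extra_intervals=1)  # 3 refresh intervals
print(round(with_interop, 1), round(1000 / with_interop))        # 33.3 30
print(round(without_interop, 1), round(1000 / without_interop))  # 50.0 20
```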

This brings up my next point: I hope that Mavericks allows them to be treated as if with SLI/CrossFire. My code is written so that it can use two different GPUs and the CPU at the same time (yes, it is worth doing oCL on the CPU). But then the transfer time to combine the data for output negates the time saved on the 4,1. If Mavericks allows SLI, then no transfer time, just the massive combined computational power and then out to display in microseconds. [*Of course, Mavericks or a newer PCI spec or whatever may eliminate that whole-frame cost, which would be great.]

Umbongo · Jul 10, 2013

GXPvince said:
I have a question along the same lines..
I play League of Legends once a week.
I have the 1900xt and a PC 5770 in my 1,1. (Shows up as ATI 5000)

Does the game know which card to use?

I am thinking it's using the 1900 because all my settings are low to achieve 40-50fps. When I only had the 5770 in, it was 60+ on high settings.

Is there a way for the 5770 to become the main card without removing the 1900?

Is there a reason you are still using the 1900?

goMac · Jul 10, 2013

Photovore said:
My circumstance is different, and I imagine some of these game coders are in my shoes too: the magic of OpenCL / OpenGL Interoperability saves lots of time. They are designed to cooperate (if you set things up properly) so that you can do heavy oCL calculations, then, when your kernel is finished, you don't copy the data off the card; just turn it over to OpenGL for display, and it goes straight out through the chosen video adapter (with whatever oGL interpolation or whatever you want).
This lets you display your frame without having to copy it back to the CPU first, which saves my app a whole frame-chunk on a 4,1 with a 5870. I.e., the projector's running at 60fps, and a frame calculates in 30ms. With ogl/ocl interop, the resulting frame time is 33ms, or 30fps. Without, it's 50ms, or 20fps. Big difference. [It's not transferring enough data to take nearly that much time; it just seems to cost one whole frame to do it.*]

This brings up my next point: I hope that Mavericks allows them to be treated as if with SLI/CrossFire. My code is written so that it can use two different GPUs and the CPU at the same time (yes, it is worth doing oCL on the CPU). But then the transfer time to combine the data for output negates the time saved on the 4,1. If Mavericks allows SLI, then no transfer time, just the massive combined computational power and then out to display in microseconds. [*Of course, Mavericks or a newer PCI spec or whatever may eliminate that whole-frame cost, which would be great.]

I don't see any reason you couldn't force that behavior now, without Mavericks. Just use the same device for graphics output and for OpenCL. OS X never forces you onto a certain device; you get to pick your device for both GL and OpenCL (unless your GL context is pinned to a window, in which case you don't get to pick for GL).

It seems to me SLI and Crossfire would break your use case. SLI and Crossfire both have inherent transfer time in this situation. You still have to move your data off of the CPU to the VRAM on the cards, but now you have to do it twice.

Tutor · Jul 10, 2013

tomvos said:
Well, at least on Mac OS X 10.8 you notice a stuttering effect if you run OpenCL and Video on one GPU. In my Mac Pro I use a GTX680. If I run a GPU miner for litecoin I notice a stuttering effect in the overall video output. It is visible in the normal desktop UI as well as in video playback.

Now, if I think about an application like MARI, which was showcased at the WWDC ... this requires very smooth video frame rates while also putting a high load on the GPU for OpenCL.

If you want to use both GPUs for OpenCL kernels and still need to process your video on the same GPUs (or at least one of them), you need some kind of load balancer which keeps the video frame rate acceptable and assigns all remaining GPU capacity to the OpenCL kernels.

While this is not impossible, I have not heard of something like this in relation to Mac OS X [redacted]. So my guess is that Apple will start with a simple approach to the problem: one GPU does video, the other does OpenCL.

Apple's claim that the new Mac Pro will yield 7 teraflops of GPGPU computing power is directly contrary to your guess: a single FirePro W9000 yields 3.994 teraflops of computing power [ http://en.wikipedia.org/wiki/Comparison_of_AMD_graphics_processing_units#FirePro_Workstation_Series ], so while either or both of the GPUs does video, both do OpenCL. I'd rather have two or three ($1K each) Radeon HD 7990s in one of my self-builds, where each yields 8.2 teraflops of computing power (that's about 3x the GPGPU compute performance of the nMP high-end version) and I'd tweak them to widen the gap even further.

MacVidCards · Jul 10, 2013

GXPvince said:
I have a question along the same lines..
I play League of Legends once a week.
I have the 1900xt and a PC 5770 in my 1,1. (Shows up as ATI 5000)

Does the game know which card to use?

I am thinking it's using the 1900 because all my settings are low to achieve 40-50fps. When I only had the 5770 in, it was 60+ on high settings.

Is there a way for the 5770 to become the main card without removing the 1900?

Games use whatever card the display is connected to.

Ditch the X1900. It is a dinosaur with very little current support, and it is likely only getting you a boot screen, for which a 2600XT would be a better choice.

tomvos · Jul 11, 2013

Tutor said:
Apple's claim that the new Mac Pro will yield 7 teraflops of GPGPU computing power is directly contrary to your guess: a single FirePro W9000 yields 3.994 teraflops of computing power, so while either or both of the GPUs does video, both do OpenCL. I'd rather have two or three ($1K each) Radeon HD 7990s in one of my self-builds, where each yields 8.2 teraflops of computing power (that's about 3x the GPGPU compute performance of the nMP high-end version) and I'd tweak them to widen the gap even further.

I'd like to be proven wrong by Apple. I would be happy if the new Mac Pro has some compelling method to direct all 7 teraflops to whatever usage is required and even manages to shift the load seamlessly between OpenCL and OpenGL calls.

But generally marketing numbers related to GPU performance like to brag about peak performance and only seldom talk about sustained performance. As Photovore pointed out in his comment, as soon as you split a workload across several OpenCL devices, memory transfer and synchronization limit your peak performance. You'll only get close to peak performance if your task consists of completely independent small work units which are trivial to parallelize. Unfortunately, in most cases this is not applicable.
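A toy model of that point, assuming the compute part divides evenly across devices and that each extra device adds a fixed transfer/synchronization cost. Both assumptions and all of the numbers are hypothetical and only meant to show the shape of the trade-off:

```python
def speedup(devices, compute_ms, overhead_ms_per_device):
    """Estimated speedup from splitting one job across several devices.

    Assumes an even split of the compute time plus a fixed combine cost
    for every device beyond the first. Purely illustrative.
    """
    single = compute_ms
    split = compute_ms / devices + overhead_ms_per_device * (devices - 1)
    return single / split


# Independent work units, nothing to combine: two devices give the ideal 2x.
print(speedup(2, 30.0, 0.0))    # 2.0
# A 15 ms combine step cancels the gain from the second device.
print(speedup(2, 30.0, 15.0))   # 1.0
```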

Well, I guess we have to wait until OS X 10.9 and the new Mac Pro are really in the hands of independent reviewers to find out how Apple tackled the dual GPU questions.

Tutor · Jul 11, 2013

It's likely that Apple has already done what would make you happy.

tomvos said:
I'd like to be proven wrong by Apple. I would be happy if the new Mac Pro has some compelling method to direct all 7 teraflops to whatever usage is required and even manages to shift the load seamlessly between OpenCL and OpenGL calls.

A single FirePro W9000 yields 3.994 teraflops of computing power. 2 x 3.994 = 7.988, or just shy of 8 teraflops. However, Apple's ad says the yield in the nMP is 7 teraflops of computing power. Thus, about one teraflop of computing power is lost when one (or both?) of the cards is also tasked with OpenGL duties. In other words, the peak compute figure that video card manufacturers give is measured with the card tasked only with that function, whereas the figure that a computer system manufacturer/integrator like Apple gives may have been adjusted for a particular use case, e.g., no additional card dedicated to OpenGL only. Thus, if you use a third (TB-connected) video card for display, you might even get close to 8 teraflops on the two cards installed in the nMP. But I'd still prefer a trio of 7990s for 23.5+ teraflops.
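The arithmetic in the paragraph above, restated using only the figures quoted in this thread. It does not verify those figures, and attributing the gap to OpenGL duties remains the poster's interpretation rather than something the numbers show:

```python
W9000_TFLOPS = 3.994       # single FirePro W9000, as quoted in the post
APPLE_CLAIM_TFLOPS = 7.0   # Apple's advertised figure for the new Mac Pro
HD7990_TFLOPS = 8.2        # single Radeon HD 7990, as quoted in the post

two_w9000 = 2 * W9000_TFLOPS
print(round(two_w9000, 3))                       # 7.988
print(round(two_w9000 - APPLE_CLAIM_TFLOPS, 3))  # 0.988
print(round(3 * HD7990_TFLOPS, 1))               # 24.6
```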

The FirePro figures concern OpenCL, but the same type of phenomenon, i.e., a diminution of compute performance because the display is also using the GPU's resources, occurs with CUDA. Here's what happens when running Nvidia CUDA cards in Octane Render, according to Otoy, the Octane Render seller:

"Single PCI-E Slot
If the computer has a single PCI-E slot, the upgrade options are fewer. One could simply add a more powerful GPU as long as the power supply can provide enough power for the new GPU. Dual GPU, single slot card solutions like the GTX 590 or GTX 690 may also be used in this situation, again assuming that the power supply is sufficient to power the video card.
A second option is to use an external expansion box which contains multiple GPUs. This allows the use of multiple GPU's with a computer that only has a single PCI-E slot. For the smoothest user experience with OctaneRender™, it is recommended to dedicate one GPU for the display and OS to avoid slow and jerky interaction and navigation. The dedicated video card could be a cheap, low powered card since it will not be used for rendering and it should be unticked (off) in CUDA devices in the Device Manager/Preferences.

Two PCI-E Slot Motherboard
If the computer has two PCI-E slots the user is presented with many additional upgrade options. One option is to install a second graphics card along with the currently installed GPU. If the existing GPU is slow, it can be used to power the display only and the second card can be dedicated to OctaneRender™. This will allow the OS to be smooth and the computer will still be responsive while the second GPU is tasked with rendering. Another option would be to add an additional GPU to assist in rendering. In this situation, it is best to have both GPU's match in model and ram content. This allows multi-GPU rendering but the OS interface may still be slowed as all the GPU processing power is dedicated to the rendering process." [ http://render.otoy.com/downloads/OctaneRenderUserManual_1_20.pdf ]

So, in Octane Render, the bias is towards rendering, so interface interactivity may suffer if all cards are selected in preferences to render. The nMP appears to have the opposite bias, favoring OpenGL performance, given the emphasis on video display that Apple appears to have.

tomvos said:
But generally marketing numbers related to GPU performance like to brag about peak performance and only seldom talk about sustained performance. As Photovore pointed out in his comment, as soon as you split a workload across several OpenCL devices, memory transfer and synchronization limit your peak performance. You'll only get close to peak performance if your task consists of completely independent small work units which are trivial to parallelize. Unfortunately, in most cases this is not applicable.

In CUDA, at least when using Octane Render, each additional Nvidia card results in a virtually linear increase in performance if you dedicate a particular GPU to display and exclude it from any CUDA chores, or if you use an ATI card for interactivity. But otherwise, I agree with your point that marketing favors pitching peak over sustained performance.

davebean · Dec 7, 2016

So a card like the R9 295x2 is useless because OSX cannot handle/utilize dual GPU's on the same card? Strange considering the MacPro 6,1 has it onboard...

h9826790 · Dec 7, 2016

davebean said:
So a card like the R9 295x2 is useless because OSX cannot handle/utilize dual GPU's on the same card? Strange considering the MacPro 6,1 has it onboard...

My understanding is that the 6,1 has two graphics cards, not a single dual-GPU card.

The cMP can also handle multiple graphics cards, just not a dual-GPU graphics card.

davebean · Dec 7, 2016

too bad, it seems like a good space-saving solution. Lord knows we only have 4 slots to work with.

fhenry · Dec 8, 2016

davebean said:
So a card like the R9 295x2 is useless because OSX cannot handle/utilize dual GPU's on the same card? Strange considering the MacPro 6,1 has it onboard...

It seems to me that it is possible for Nvidia cards like the 490 or 590. Let me check.

Edit: https://forums.macrumors.com/thread...ted-performance.1333421/page-47#post-20483256

2*590 = 4 gpu

itdk92 · Dec 8, 2016

davebean said:
too bad, it seems like a good space-saving solution. Lord knows we only have 4 slots to work with.

still, let's say

slot 4 - USB3 card of your choice
slot 3 - multiple pcie ssds with adapter
slot 2 - GPU 1
slot 1 - GPU 2

What more do you need

cmabolt · Jan 11, 2018

itdk92 said:
still, let's say

slot 4 - USB3 card of your choice
slot 3 - multiple pcie ssds with adapter
slot 2 - GPU 1
slot 1 - GPU 2

What more do you need

Just out of curiosity, if using multiple GPUs on a single x16 PCIe slot, how many can a Mac support? 4? 8? Unlimited?

h9826790 · Jan 11, 2018

cmabolt said:
Just out of curiosity, if using multiple GPUs on a single x16 PCIe slot, how many can a Mac support? 4? 8? Unlimited?

AFAIK, the overall GPU limit won't change. E.g., if 4 GPUs is the max, you can install 4 single-GPU graphics cards or 2 dual-GPU graphics cards.

cmabolt · Jan 11, 2018

h9826790 said:
AFAIK, the overall GPU limit won't change. E.g., if 4 GPUs is the max, you can install 4 single-GPU graphics cards or 2 dual-GPU graphics cards.

So, if using an external GPU expander like Cubix, interfacing through the single x16 PCIe slot, 2 double-wide GPUs or 4 regular-width GPUs is all a 2013 5,1 Mac can support? Sorry to ask in another way. Just want to be absolutely clear, since we are speccing a GPU render system now.

h9826790 · Jan 11, 2018

cmabolt said:
So, if using an external GPU expander like Cubix, interfacing through the single x16 PCIe slot, 2 double-wide GPUs or 4 regular-width GPUs is all a 2013 5,1 Mac can support? Sorry to ask in another way. Just want to be absolutely clear, since we are speccing a GPU render system now.

Maybe you can ask William:

https://forums.macrumors.com/threads/amfeltec-pci-express-splitter.2027827/#post-24201730

bsbeamer · Jan 11, 2018

cmabolt said:
So, if using an external GPU expander like Cubix, interfacing through the single x16 PCIe slot, 2 double-wide GPUs or 4 regular-width GPUs is all a 2013 5,1 Mac can support? Sorry to ask in another way. Just want to be absolutely clear, since we are speccing a GPU render system now.

Contact Cubix to confirm, but that is/was the case. They are now listing Gen 3 PCIe in the specs for the HIC. I'm unsure whether they've updated anything in their hardware that now REQUIRES Gen 3 vs Gen 2. I would think it should function, just at reduced speed from the theoretical maximum.

https://www.cubix.com/store/#buy-cubix-xpander-desktop
