So you're talking about work done one time during shader compilation, which does not affect runtime CPU performance when those shaders are being used. I fail to see why NVIDIA is therefore more reliant on CPU performance during execution of an FCPX benchmark? The conclusion you're drawing here seems quite disingenuous, because shader compilation on the CPU generally has nothing to do with runtime performance (i.e. you can't claim that FCPX is slower on NVIDIA because their shader compiler is doing more/different work than AMD's compiler does).
I am not saying anything like that. I am only describing the difference between software optimized for specific things. It turns out you have posted a clue as to why Nvidia may not be fully utilized in FCPX and generally performs worse than AMD, when you take what I have written into account in the bigger scheme of things.

If FCPX is optimized for a specific execution path with as little CPU intervention as possible, you will see exactly what Prince134 described. FCPX may be optimized to use as little CPU as possible, while the Nvidia driver is designed to use as much as it needs. The effects are simple to predict. In the end it is always about software performance, and about optimization for a specific runtime.

What we see in the behavior of the GPU is a textbook example of the difference between dynamic and static scheduling. That is all that can be written here without drawing any further conclusions. We are gathering information on the differences, but overall the picture is being painted pretty clearly (differences in Nvidia's behavior in FCPX, on the Ryzen platform, in DX11 vs DX12, etc.). It is all about understanding more and more of what the hardware and software are doing, and in which way.
 

It's fascinating to watch the cognitive leaps you're making here. First of all, everyone posting about poor performance on NVIDIA who attached GPU usage charts shows well under 100% utilization of the GPU. Thus, shader execution throughput has absolutely nothing to do with these performance issues. Do you have evidence that shows the NVIDIA driver is burning tons of CPU cycles compiling shaders during these benchmark runs? No? Then everything you are talking about has absolutely nothing to do with what we're discussing. Again, shader compilation is a one-time thing that happens at startup (or at level loading time, in most games). It has nothing to do with general runtime performance, for both pro apps and games.

TL;DR - Apple changed something in FCP and there's a bottleneck somewhere other than the GPU shader cores. Most likely it has something to do with data transfer between system memory and the GPU's memory, because that's most often been the bottleneck with high-resolution video in FCP (because Apple decodes on the CPU, at least on NVIDIA). It might be that AMD has a ProRes decode path on their GPU, perhaps using their video decode hardware or via OpenCL or something. I'd bet that Apple has been working on improving FCP and obviously only tuning it to run well on AMD, since all their products use AMD GPUs these days. And, I'd bet they changed something that inadvertently made it run worse on NVIDIA. Last time I ran this benchmark, BruceX completed in 14 seconds on my GeForce TITAN X (Maxwell), but that was a long time ago.
 
I will put it this way: even if Nvidia GPU utilization is way below 100%, it does not mean that CPU utilization will go up at that moment.

That is the problem when you have software optimized for dynamic scheduling running on statically scheduled hardware. This is my theory, and nobody has to agree with it, but the observed behavior matches what the theory would predict.
 

Okay, my theory is that you are way off track and that dynamic vs static scheduling in GPU shader cores has absolutely nothing to do with FCPX performance. I don't even know how you'd write an OpenCL kernel that would specifically run faster on AMD's dynamically scheduled hardware and much slower on NVIDIA's statically scheduled hardware.
 
Compare notes from OP:

- I just tried with 4444 XQ. Same as 422 HQ: it took the card 30 seconds. I read somewhere that the codec doesn't make a difference in the BruceX sharing test.

I wonder if FCP in Sierra is CPU dependent. Proof is that I just tested my Dell T3500 Hackintosh with a Titan X Pascal: it takes 50 seconds. I am puzzled again. This Hackintosh's CPU is an X5687, only 4 cores / 8 threads but clocked at 3.6 GHz. My Mac Pro is a dual 3.46 GHz, 12 cores / 24 threads. It looks like the GPU is less utilized in Sierra than in El Capitan, so the CPU is involved more?

I remember a while ago I had two 7970s doing the same test, and the conclusion was that it's not CPU dependent, i.e. 6 cores or 12 cores doesn't matter. That was on Yosemite and El Capitan, but apparently not anymore on Sierra. Can anyone confirm this? -


Draw your own conclusions (you already have).
 
If the source of the poor performance were some fundamental hardware limitation of Nvidia GPUs being CPU-dependent, wouldn't the performance be as poor on El Capitan as it is on Sierra? The situation we seem to have here is a performance regression from 10.11 → 10.12, which seems to imply something other than the GPU being at fault.
 
That hardware architecture makes a difference in BruceX results in FCP was long known to me in the Yosemite era. 2x 7970 with my dual X5690 beat the Mac Pro 6,1 three years ago. Search this forum for "King of the Mac". Same as Dolphin842, there was inconsistency. And then with the recent update to 10.3.3, FCP got a bit faster than in 10.3.2.
 
Ok, new driver, new results! 1060 gets a good improvement:

10.12.4 / FCP 10.3.3 / Mac Pro 5,1 6-core / 48 GB RAM
Export setting: Apple ProRes 422 HQ
Export drive: Crucial MX100 256GB, lower drive bay

121s (125/119/118) : GTX 1060 6GB (original Pascal driver)
77s (74/75/83) : GTX 1060 6GB (378.05.05.05f02 driver)

Here's the graph (now with more height for more detail):
[attachment: fcp-f02.png]
CPU usage was the same. GPU had noticeably less time at 0% utilization, but still very sporadic utilization overall. The GTX 1060 is now on par with the 5770 at ~100% utilization.
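For reference, the per-run numbers above average out as follows. A trivial sanity check, not part of the original post, just the arithmetic behind the 121s and 77s headline figures:

```python
# Three BruceX runs per driver, taken from the post above.
old_driver = [125, 119, 118]   # original Pascal driver
new_driver = [74, 75, 83]      # 378.05.05.05f02 driver

mean_old = sum(old_driver) / len(old_driver)   # ~120.7 s
mean_new = sum(new_driver) / len(new_driver)   # ~77.3 s
speedup = mean_old / mean_new                  # ~1.56x faster

print(round(mean_old), round(mean_new), round(speedup, 2))  # 121 77 1.56
```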
 
Mine is the same 30 seconds with the Titan Pascal after updating to f02. :(
Wonder where your improvement comes from...
 

Wait, so, a GTX 1060 has the same performance as an HD5770 in FCP X?
 
Perhaps the problem is a slow export drive? Does anyone have a fast M.2 SSD to test this?

No.

There is a picture of an AMD GPU which gets fully used, and a shiny modern GPU which remains (for unknown reasons) idle.

I had a customer complaining about exactly this today too, on the Nvidia cMP I set up for him.
 
I haven't seen the export disk be a bottleneck, but it's not too difficult to make a small RAMdisk from the Terminal if we wanted to standardize somewhat.
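One way to standardize, assuming macOS's stock `hdiutil` and `diskutil` tools: `ram://` takes a size in 512-byte sectors, so 4194304 sectors gives a 2 GiB RAM disk, which is plenty for a BruceX export.

```shell
# Create a 2 GiB RAM disk and mount it at /Volumes/RAMDisk.
# ram:// takes a size in 512-byte sectors: 2 GiB / 512 B = 4194304 sectors.
SECTORS=$((2 * 1024 * 1024 * 1024 / 512))
diskutil erasevolume HFS+ 'RAMDisk' $(hdiutil attach -nomount ram://$SECTORS)

# Point the FCP export at /Volumes/RAMDisk, then eject to free the memory:
# diskutil eject /Volumes/RAMDisk
```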
 
In High Sierra, an RX 580 in a cMP 5,1 runs it in 16 seconds. Same as the iMac 27" in Sierra. No surprise.

It's not a fair game for Nvidia on the Mac. Since the nMP debuted in 2013, they have purposely maintained better FCPX performance on AMD. This is not news. When the Titan Maxwell closed the gap in El Capitan, Apple widened it in Sierra. I don't know exactly, but I think maybe Phil just wants to cover his ass.

We will see how Nvidia does in High Sierra. Hopefully right after High Sierra goes public. Things can change if the mMP is PCIe upgradable, so Nvidia can achieve better results (as long as Apple and Nvidia work together again!). As CEO Jensen said, it all depends on Apple. Nvidia can of course do a better job than what we see now.
 
Based on the existence of Pascal drivers and Nvidia's job postings a while back, it seems that Apple and Nvidia are at least back in touch again. Whether or not that will lead to meaningful collaboration to get their drivers up to par, we'll see.
 
Why do you believe that if AMD GPUs are faster in non-geometry compute, it must be because someone gimped Nvidia's hardware performance through software? Maybe it's Nvidia who is not able to optimize their drivers, and the hardware is simply slower in the compute metric that matters most for FCPX?

For example, Blender's current OpenCL implementation makes the RX 480 faster than the GTX 1060 running CUDA. How is that possible? For two reasons: the AMD GPU has higher compute throughput (5.7 TFLOPs vs 4.4), and, more importantly, the software is finally optimized for both vendors, not just one.
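The throughput figures quoted above come from the standard shader-count × 2 FMA ops × clock formula. A minimal sketch, using the published boost clocks (which is why the RX 480 result lands slightly above the 5.7 figure quoted here):

```python
def peak_tflops(shader_cores: int, clock_ghz: float) -> float:
    # Peak FP32 throughput: each core retires 2 ops per clock
    # (one fused multiply-add counts as two floating-point operations).
    return shader_cores * 2 * clock_ghz / 1000.0

rx480 = peak_tflops(2304, 1.266)    # ~5.8 TFLOPs (boost clock)
gtx1060 = peak_tflops(1280, 1.708)  # ~4.4 TFLOPs (boost clock)
print(round(rx480, 1), round(gtx1060, 1))  # 5.8 4.4
```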

Look at this: what does it tell you?
 
Nvidia may not be as fast as comparable AMD chips for compute, but it's clear from monitoring that something is up with the Nvidia drivers on Sierra that are causing them to be way under-utilized for the BruceX benchmark. Previously-fast Maxwell chips are now much slower, as are the newer Pascal products. Whether that's Apple's or Nvidia's fault, we don't know.
 
I got the GTX 780M on eBay from a seller named "ChooseYourDestiny", something like that.



I have tried several times again and again, following all the steps accordingly, especially No. 3 and No. 12 as mentioned, and I still get an average of 19 seconds. I think it is due to Apple's newly improved Metal.

The problem is that Nvidia GPUs don't support Metal!! Something does not compute here.
Only AMD GPUs support Metal and are therefore much more efficient for FCP.

That was an AMD GPU.
 

The newer Nvidia GPUs of course support Metal.
[screenshot attachment]

Its Metal performance is not bad at all, indeed.
[screenshot attachment]

But it won't help; FCPX performance is still very bad due to the lack of software optimisation.
 