
CouponPages

We've all seen lots of conflicting benchmarks that either confirm or disprove the real world functionality of the dual GPUs in FCPX.

What makes this difficult is there are so many combinations of CPU & GPU that you can't actually determine if the speed gaps are because of the CPU or the GPU.

For example, I've seen a lot of benchmarks that sometimes show a D300 beating a D500... but the reality is those benchmarks are also using different CPUs, so you can't tell if the D300 won simply because it was a single threaded program that runs better on a 4 core than an 8 core.

So, to get a definitive answer, I came up with a very simple benchmark that takes a few minutes to run and it doesn't require downloading any programs or video content.

If anyone is willing to run the test and post their CPU and GPU results, we can sort them by CPU so that the results show exactly what the advantages of each GPU are.

The goal is to only compare:

4 core D300 vs 4 core D500 vs 4 core D700.

6 core D300 vs 6 core D500 vs 6 core D700.

8 core D300 vs 8 core D500 vs 8 core D700.

12 core D300 vs 12 core D500 vs 12 core D700.

A home run would be to get the same test run on all three GPUs for at least one group of CPUs. For me, it doesn't matter which group (4, 6, 8, 12), but the real key is to get all three tests in the same class CPU.

A GRAND SLAM would be to some day get all 12 combinations. Then we would know how much a real world application like FCPX scales with the various configurations. Obviously there is a sweet spot somewhere.
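To keep track of which of the 12 combinations have been covered, something like this small Python sketch could help. The `reported` set is only an illustration (it holds just the 4 core D300), and the function names are my own invention.

```python
from itertools import product

CORES = [4, 6, 8, 12]
GPUS = ["D300", "D500", "D700"]

def missing_configs(reported):
    """Return every (cores, gpu) pair that has no result yet."""
    return [cfg for cfg in product(CORES, GPUS) if cfg not in reported]

def home_runs(reported):
    """Return the core counts for which all three GPUs have been tested."""
    return [c for c in CORES if all((c, g) in reported for g in GPUS)]

# Illustrative input: only the 4 core D300 has been benchmarked so far.
reported = {(4, "D300")}

print(len(missing_configs(reported)), "combinations still missing")
print("Home runs:", home_runs(reported))
```

Adding a tuple to `reported` for each new post is enough to see how close we are to a home run or the grand slam.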

===== HERE'S MY TEST =====

In order to simulate a real world FCPX project, I decided to shoot for something longer than BruceX, which is only 6 seconds. Short benchmarks are skewed easily by Turbo Boost, so this one creates an 8 minute 1080p video.

My test has 5 steps:

1. Create a 1920 x 1080, 30p project and place "Pages" from the Generators on the timeline. Set its duration to exactly 8 minutes, then time how long the background render takes.

2. Add the "Clouds" title to the start, change its duration to 8 minutes too... and time the background render.

3. Add the "Romantic" filter to the "Pages" clip and time the background render.

4. Export a ProRes 422 master, then time it.

5. Compress it with Compressor using the "Apple Devices HD (Best Quality)" preset... then time it.

---- My FIRST RESULTS ---- Mac Pro 4 Core, D300

Step 1, "Pages" 1:49

Step 2, "Clouds" 2:28

Step 3, "Romantic" 2:44

Step 4, Render ProRes 422: 36 seconds

Step 5, H.264 Compression: 6:25
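For anyone who wants to compare totals rather than individual steps, here is a minimal Python sketch that converts the times above to seconds and shows each step's share of the whole run. The helper name `to_seconds` is just something I made up for this example.

```python
def to_seconds(text):
    """Convert 'm:ss' or 'N seconds' into a number of seconds."""
    text = text.strip()
    if text.endswith("seconds"):
        return int(text.split()[0])
    minutes, seconds = text.split(":")
    return int(minutes) * 60 + int(seconds)

# Times copied from the 4 core D300 results above.
results = {
    "Pages": "1:49",
    "Clouds": "2:28",
    "Romantic": "2:44",
    "ProRes 422 export": "36 seconds",
    "H.264 compression": "6:25",
}

seconds = {step: to_seconds(t) for step, t in results.items()}
total = sum(seconds.values())

for step, s in seconds.items():
    print(f"{step:20s} {s:4d} s  {100 * s / total:5.1f}% of total")
print(f"Total: {total // 60}:{total % 60:02d}")
```

On these numbers the whole run adds up to 842 seconds (14:02), and the H.264 compression alone is 385 seconds, a bit under half of the total.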

Since the 4 core, D300 is the entry level, I would LOVE to see results from other configurations. A 4 core D500 or 4 Core D700 would truly answer a LOT of questions, because we could then rule out the CPU differences.

I'm hoping we can pull off a real home run or two... all three GPUs for any CPU.

Any takers?
 
I'm hoping it will put some questions to bed, once and for all. Every benchmark I see contradicts the last, because they always use different combinations.

The only way we will ever know how much FCPX taps into the GPUs is to see the same CPU with each of the 3 GPUs.

In theory, if these tasks heavily use the GPUs, there should be clear gains within each class. The D500 should beat the D300 and the D700 should beat them both... on the same CPU.
 
I can't imagine what I did differently, but I ran these steps on my 8C/64GB/D700/1TB and got numbers which don't really make sense compared to the numbers you have listed for the 4C/D300. 01:54 / 02:17 / 02:16 / 00:28 / 09:44.
 
Thanks for posting your results!

Unfortunately, they are somewhat consistent with some of the unusual results I've been measuring. Most are within a few seconds. The one that seems to be the wildcard is the compression. That one surprises me the most.

Many people say that Compressor is very CPU heavy... and some say that instead of loading Compressor as its own step, they get better results inside FCPX.

I didn't see much of a difference; if anything it seemed slower. Some people say there are ways to speed that part up, but I have yet to reproduce them.

The 28 second export is 8 seconds better than my 36, and that's a pretty big percentage (about 22%). The other ones are close enough to fall within the margin of error in timing.

My test environment was less than ideal, but I got mostly the same results in 3 different attempts. If anyone else has different rigs... or even the same as these two, please try the test and we can average them out. I'll keep a spreadsheet and post a link so we can share.
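If several people time the same step by hand, averaging could look something like this Python sketch. The three timings below are hypothetical placeholders, not my actual attempts, and `summarize` is a name I picked for the example.

```python
from statistics import mean, stdev

def summarize(runs):
    """Return (mean, sample standard deviation) of timings in seconds."""
    return mean(runs), stdev(runs)

# Hypothetical timings in seconds for three attempts at one step.
attempts = [109, 112, 107]

avg, spread = summarize(attempts)
print(f"average {avg:.1f} s, spread +/- {spread:.1f} s")
```

The spread gives a rough feel for how much of a gap between two machines is just stopwatch noise.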
 
Since the consensus is that the last step (Compression) relies less on the GPU and heavily on the CPU, I was wondering if you enabled multiple instances. From what I've read, on smaller jobs it's faster to have only one instance because of the time needed to divide the task up and splice it back together, but on larger jobs the additional instances speed it up. With 8 cores and 64GB of RAM, perhaps you should try enabling an extra instance or two.
 
Will measure the full suite later, but I do think your Compressor seems low. On my 6c D500, I export at 8:30.
 
Thanks. I look forward to your results. I've done the test more than a few times and 6:25 is pretty much the average using the default settings.

There are different opinions as to the compressor settings that work best, such as:

1. Saving the file, then exiting FCPX and loading it in Compressor

2. "Send to Compressor..."

3. Creating a Compressor Preset, then hitting share and selecting that preset (for some reason it renders it through another background task).

4. Sharing as a Master File, but picking H.264 instead of ProRes. I tried this too, it's close to the same time, and produces a larger than normal H.264 file, likely a higher bitrate.
 
It has been quiet for a while, but maybe we can get some more results/data to compare. (external SSD used)

nMP 8c / 32GB / 1TB / D700:

Step 1, "Pages" 00:51

Step 2, "Clouds" 1:41

Step 3, "Romantic" 1:47

Step 4, Render ProRes 422: 34 seconds

Step 5, N/A (don't have "Compressor")
 
nMP 6c / 32GB / 1TB / D300: (external USB SSD used)

Step 1, "Pages" 01:01 (about 20% slower than 8c/D700)

Step 2, "Clouds" 1:43 (2% slower than 8c/D700)

Step 3, "Romantic" 2:07 (19% slower than 8c/D700)

Step 4, Render ProRes 422: 34 seconds (the same as 8c/D700)

Step 5, N/A (don't have "Compressor")

Steps 1 and 3 make good use of the extra cores, but that was expected given the roughly 20% higher multicore speed of the 8c over the 6c.
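For reference, here is a rough Python sketch of how the "percent slower" figures can be worked out. It assumes one particular definition: the extra time divided by the faster machine's time. Dividing by the slower time instead gives smaller numbers, so it matters which one you pick. The function name is my own.

```python
def percent_slower(slow, fast):
    """Extra time taken by `slow`, as a percentage of `fast` (both in seconds)."""
    return 100 * (slow - fast) / fast

# Seconds per step, converted from the times posted in this thread.
steps = ["Pages", "Clouds", "Romantic", "ProRes 422"]
d700_8c = [51, 101, 107, 34]
d300_6c = [61, 103, 127, 34]

for step, fast, slow in zip(steps, d700_8c, d300_6c):
    print(f"{step:10s} {percent_slower(slow, fast):5.1f}% slower")
```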
 
nMP 12C/ 32GB / 512GB / D700

Step 1, "Pages" 00:55

Step 2, "Clouds" 02:20

Step 3, "Romantic" 01:12

Step 4, Render ProRes 422: 00:43

Step 5, N/A (don't have "Compressor")


And just for the lulz:

iMac 3.2 GHz Core i5 / 8GB / 256GB SSD / NVIDIA GeForce GT 755M 1024 MB

Step 1, "Pages" 02:09

Step 2, "Clouds" 04:52

Step 3, "Romantic" 02:11

Step 4, Render ProRes 422: 01:22

Step 5, N/A (don't have "Compressor")
 
6C, 24 GB RAM, 1 TB, D700

Pages: 1:09.23
Clouds: 2:13.29
Romantic: 2:21.66
Export: 0:30.12
Compressor: 6:05

edit: re-ran the export at the proper setting
 
nMP 6c / 32GB / 1TB / D300 with internal SSD instead of external USB SSD

OS X 10.9.3 (13D45a) instead of 10.9.2

Step 1, "Pages" 00:58 (10.9.2 01:01)

Step 2, "Clouds" 1:44 (10.9.2 01:43)

Step 3, "Romantic" 01:52 (10.9.2 02:07)

Step 4, Render ProRes 422: 00:25 (USB SSD/10.9.2 00:34)

Step 5, N/A (don't have "Compressor")
 
These results make no sense... Do they? I mean, how is this possible?
 
They seem to be quite consistent for these tasks. Compressor in particular is still the bottleneck because it doesn't tap into the dual GPUs. The other figures are not too shabby.

Once you get past the master export, the compression stage can only get faster using more CPUs. The type of GPU has very little impact.

What bothered me most was that the slowest part of my workflow has always been compression, so I hoped that the dual D700s would crush that step. Instead, it's nearly twice as slow as a MacBook Pro doing compression, because the MacBook Pro's CPU has Quick Sync and the Xeon does not.

I made a video showing this in real-time on a MacBook Pro 2011. Mac Pro beats the Mac Pro on everything, but the compression.

https://www.youtube.com/watch?v=ntoVIoM8cNg

Based upon your tests, the D700s are at least twice as fast as the D300s, but the compression is nearly identical.
 
not very clear

In summary:

Other than in the last step (compression), the latest results show that the D700 handily beat the D300 and the MacBook Pro 2011 I used in the YouTube clip.

The D300 and D700 had nearly identical compression times.

The MacBook Pro 2011 and every Quick Sync enabled Mac (i3, i5, i7, Sandy Bridge through Haswell) beat the Mac Pro at single pass compression by 50-100%. This is because single pass compression on those chips uses hardware encoding that is not available in Xeons.

The results are spread throughout this thread. Perhaps I should compile the results from this thread into a single spreadsheet or graph.
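As a starting point for that spreadsheet, here is a minimal Python sketch that sorts a subset of the results posted so far by core count and writes them out as CSV. The column names and the `to_csv` helper are my own choices, and only some of the posts are included.

```python
import csv
import io

# (cores, gpu, pages, clouds, romantic, export, compression), as posted.
# An empty string means the poster did not run that step.
rows = [
    (8, "D700", "0:51", "1:41", "1:47", "0:34", ""),
    (4, "D300", "1:49", "2:28", "2:44", "0:36", "6:25"),
    (12, "D700", "0:55", "2:20", "1:12", "0:43", ""),
    (6, "D300", "1:01", "1:43", "2:07", "0:34", ""),
]

def to_csv(rows):
    """Sort by core count, then GPU, and return the table as CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["cores", "gpu", "pages", "clouds",
                     "romantic", "export", "compression"])
    writer.writerows(sorted(rows, key=lambda r: (r[0], r[1])))
    return buffer.getvalue()

print(to_csv(rows))
```

The output can be pasted straight into any spreadsheet app, and sorting by cores first keeps same-CPU machines next to each other.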

----------

not very clear

Actually, I just caught that... that was just a typo.

"Mac Pro beats the Mac Pro on everything, but the compression."

That should've read:

"Mac Pro beats the MacBOOK Pro on everything, but the compression."

In my video, I compared the MacBook Pro 2011 to the base model Quad Core D300.

I could also say:

"Mac Pro D700 beats the Mac Pro D300 on everything, but the compression, which was a tie."
 
Would love to see a spec'd out late 2013 27" iMac do this test. Any takers?

I'll give it a go once I get home.

Edit: Ok just for more lulz
2013 iMac i7, 32GB 1867MHz, 256GB Flash, GTX780M

Pages - 1:36
Clouds - 3:55
Romantic - 1:39
Render ProRes 422 - 0:56
(*Don't have Compressor*)
 
Wow, interesting. Good times, thank you for that.
 
I disagree. For example, look at the two systems I quoted above in post #14... both are 6-core machines but the D300s perform the tasks quicker than the system with the D700s. How do you explain that?

It doesn't have to fill all 12 gigs of the D700's GDDR5 with data? :p
 
I think only 2 D300 benchmarks were posted so far, but only the original post included Compressor results. If that's the case, any small advantage in compression vs other D700 configurations could be because Compressor doesn't tap into the GPUs... so it's down to clock speeds. Quad Core models have faster clock rates.

However... I agree when you have 2 extra cores, Compressor should go faster because it does take advantage of CPUs, but from what I know, that's not the default behavior. You need to specifically configure it to use more than one instance. The funny thing is sometimes that improves speeds... sometimes it's actually slower because it needs to split, then merge the files together at the end. The consensus is using extra CPU instances only improves speeds with longer videos, which could explain why it's off by default.
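To illustrate why extra instances might only pay off on longer videos, here is a toy Python model. It is purely an assumption on my part, not how Compressor actually schedules work: the encode is split evenly across instances, and each extra instance adds a fixed split/merge cost. Both default numbers are made up.

```python
def encode_time(video_minutes, instances, overhead_s=60.0, s_per_minute=48.0):
    """Toy estimate of total encode time in seconds.

    All parameters are made up for illustration, not measured:
    - s_per_minute: seconds one instance needs per minute of video
    - overhead_s: fixed cost per extra instance for splitting and merging
    """
    work = video_minutes * s_per_minute / instances
    overhead = overhead_s * (instances - 1)
    return work + overhead

def best_instance_count(video_minutes, max_instances=4):
    """Return the instance count with the lowest estimated time."""
    return min(range(1, max_instances + 1),
               key=lambda n: encode_time(video_minutes, n))

for minutes in (2, 60):
    print(minutes, "min video -> best with",
          best_instance_count(minutes), "instance(s)")
```

With these invented numbers the short clip is fastest on a single instance and the hour long one is fastest on four, which matches the general pattern people describe. The real crossover point would have to be measured.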

This is what drove me nuts from day one. The biggest selling point in the ads was how the dual GPUs will improve the speed of video editing, but my tests have been mixed:

Sometimes it's a huge improvement.
Sometimes it's minor.
Sometimes it's slower.
 
nMP 6c / 32GB / 1TB / D700 - OS X 10.9.3 (13D65)

Step 1, "Pages" 00:56

Step 2, "Clouds" 1:38

Step 3, "Romantic" 01:44

Step 4, Render ProRes 422: 00:24

Step 5, N/A (don't have "Compressor")
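Since this thread now has a 6 core D300 and a 6 core D700 both on OS X 10.9.3, here is a small Python sketch that checks the "higher GPU should never be slower on the same CPU" expectation step by step. The function name is hypothetical, and keep in mind these are hand-timed results, possibly from different machines and OS builds.

```python
def no_slower_on_every_step(higher_tier, lower_tier):
    """True if the higher-tier GPU was at least as fast on every step."""
    return all(h <= l for h, l in zip(higher_tier, lower_tier))

# Seconds for Pages, Clouds, Romantic, ProRes 422 export,
# converted from the two 6 core, OS X 10.9.3 posts in this thread.
d300_6c = [58, 104, 112, 25]
d700_6c = [56, 98, 104, 24]

print("D700 at least as fast on every step:",
      no_slower_on_every_step(d700_6c, d300_6c))
print("Seconds saved per step:",
      [l - h for h, l in zip(d700_6c, d300_6c)])
```

On these numbers the D700 is ahead on every step, but only by 1 to 8 seconds, which is small enough that a few more runs would be needed to rule out timing noise.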
 
nMP 8c, D700, 32GB, 500GB
Test done on the internal drive

Pages: 1:33
Clouds: 2:12
Romantic: 2:20
ProRes Export: 0:32
Transcode: 5:12
 