I don't know why Dirt benches the way it does, but here are some more gaming benchmarks. As far as I can tell, the Maxwell drivers are at least as well optimized as they are for the Kepler cards.

[Attached benchmark chart: Screen Shot 2016-05-10 at 5.48.36 PM.png]

Source: http://barefeats.com/gtx980.html
 
I understand what CPU limiting is. However, it is not applicable in this case.

Please read post 3096 again. If those game benchmarks were CPU limited, then no matter how much Nvidia worked on the driver, the gaming performance could NOT improve. HOWEVER, it does improve: 356.02 obviously works much better than 346.01. That means the games were not CPU limited from the very beginning.

If Dirt 3 were CPU limited, the 980 should perform the same as the 680, not 10% slower.

We can always blame the old CPU in the cMP; however, that does not fit in this case.

On the other hand, after 356.02 was released, I cannot deny that gaming performance may now have hit a point where it is CPU limited. But the 980 performing ~10% worse than the 680 is still a fact, and that should not happen if the poor performance were purely due to a CPU limit.

It's been widely reported that the 346.02.02 release used in those BareFeats tests contained large CPU improvements in the NVIDIA web driver. GPU limited cases did not improve at all. So, clearly NVIDIA did some work to tune the driver code to improve performance in CPU limited cases. Dirt 3 is clearly an outlier, but maybe they just improved the driver code for Kepler more than for Maxwell.

So again, it all comes down to where the bottlenecks are. NVIDIA tuned their part of the driver code so it is no longer the main bottleneck, hence the large improvements in many of those tests. The remaining CPU overhead limits are most likely due to the Apple driver model that has a large software component from Apple (i.e. the OpenGL framework).

Or, in other words, NVIDIA improved their driver code and you'll get an improvement in CPU limited cases. This results in a 680, 980 and 980 Ti all performing about the same, as you can see in the Tomb Raider test. Once their driver code is no longer the bottleneck, there's little NVIDIA can do to improve those benchmark scores. A more powerful GPU won't help because the test is not limited by GPU performance.
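
To make the bottleneck argument concrete, here's a toy model (all numbers invented purely for illustration): the frame time is set by whichever of the CPU work (game plus driver) or the GPU work takes longer.

[CODE=swift]
// Toy model: a frame can't finish faster than the slower of the CPU
// work (game + driver) and the GPU work. All numbers are made up.
func fps(cpuMs: Double, gpuMs: Double) -> Double {
    return 1000.0 / max(cpuMs, gpuMs)
}

print(fps(cpuMs: 20, gpuMs: 8))  // 50 fps  -- CPU-bound, the GPU idles
print(fps(cpuMs: 11, gpuMs: 8))  // ~91 fps -- driver tuning cuts CPU time: big win
print(fps(cpuMs: 11, gpuMs: 4))  // ~91 fps -- a 2x faster GPU changes nothing
[/CODE]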

People can theorize about Maxwell running in backwards compatibility mode or the NVIDIA driver not supporting color compression or whatever, but it's exactly that -- a theory. The simple explanation is that most game benchmarks are completely CPU limited, which is easily confirmed by actually running the OpenGL Driver Monitor and looking at the GPU utilization. I've done this for many games and have observed my TITAN X running at barely 50% utilization most of the time.
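
If you want to check this yourself without OpenGL Driver Monitor, here's a rough Swift sketch that polls the same counters from the IORegistry. The "PerformanceStatistics" dictionary and the "Device Utilization %" key are assumptions based on what common monitoring tools read; the exact names vary between driver versions.

[CODE=swift]
import Foundation
import IOKit

// Sketch: dump GPU utilization from the IORegistry. The dictionary and
// key names below are assumptions and vary by driver version/vendor.
var iterator: io_iterator_t = 0
guard IOServiceGetMatchingServices(kIOMasterPortDefault,
                                   IOServiceMatching("IOAccelerator"),
                                   &iterator) == KERN_SUCCESS else {
    fatalError("No IOAccelerator services found")
}
var entry = IOIteratorNext(iterator)
while entry != 0 {
    var props: Unmanaged<CFMutableDictionary>?
    if IORegistryEntryCreateCFProperties(entry, &props, kCFAllocatorDefault, 0) == KERN_SUCCESS,
       let dict = props?.takeRetainedValue() as? [String: Any],
       let stats = dict["PerformanceStatistics"] as? [String: Any] {
        print("GPU utilization:", stats["Device Utilization %"] ?? "n/a")
    }
    IOObjectRelease(entry)
    entry = IOIteratorNext(iterator)
}
IOObjectRelease(iterator)
[/CODE]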

Thanks for posting this -- it clearly shows that in many cases the Maxwell cards are indeed performing better than the Kepler cards when the app has a sufficiently high CPU limit (e.g. L4D2). I really wish people would stop spreading crap about the Maxwell driver running backwards compatibility mode when the simple explanation is many/most apps are hitting a CPU limit.
 
Just to prove Asgorath's point about things being CPU bound, here are some benchmarks I did about 3 years ago. You can clearly see how the exact same system benched much better with a CPU upgrade.

[Attached benchmark comparison: 2.40 to 2.93.jpg]

I really wish people would stop spreading crap about the Maxwell driver running backwards compatibility mode when the simple explanation is many/most apps are hitting a CPU limit.

Amen.
Here are the results from a Maxwell card, albeit with a newer version of OS X, newer Nvidia web drivers, and even faster CPUs. The Heaven score went from 40.4 to 52.4 fps. Almost a 30% increase from drivers that have not been optimized?

[Attached screenshots: Screen Shot 2016-04-26 at 6.59.22 AM.png, Screen Shot 2016-04-26 at 7.03.36 AM.png]
 
Be careful about drawing conclusions when using benchmarks across different versions of OS X.

I saw a nearly 50% increase in render speed on the exact same hardware, switching from Yosemite to El Capitan (using the latest web drivers in both cases). Something in ElCap is making a huge difference. I suspect Metal, but like others have mentioned, that's only a theory and I have no idea.
 
Yup. I posted a link to your findings in post #3099 on this thread. I'm just trying to show that there are indeed driver improvements and optimizations coming from Nvidia.
 
I'd still like to know how a 780M is beating a 980. I wish more people were able to contribute results to that thread.
 
Which benchmark has a 780M beating a 980 exactly?

Edit: To be clear, I'd fully expect a 780M paired with a fast Haswell CPU to beat a 980 in a cMP with a slow CPU by modern standards for CPU limited benchmarks.
 
BruceX in FCPX.

Okay, PCIe Gen3 with a fast CPU beats PCIe Gen2 with a slow CPU, no big surprise there. I'd be more interested in a comparison of a 780M with a 980 in a modern Hackintosh. Do you have a link to the exact results you're talking about?

I did some quick searching and found this:

http://barefeats.com/gtx980ti.html

980 Ti with the newer web drivers gets 33.56 seconds. Are you saying a 780M beats that?
 
https://forums.macrumors.com/threads/fcpx-amd-vs-nvidia.1956128/

There are a few tests showing that the same GPU installed in a PCIe Gen 2 x16 slot suffers only a few % penalty (as low as 1-2%) compared to Gen 3. Also, the 780M is not powerful enough to fully saturate a PCIe Gen 2 x16 slot.

The GTX 770 needs 29s to finish the test.

The 980 needs 27s.

And a user reports that his iMac with the 780M needs only 19s.

Also, FCPX is quite multi-core optimised. A quad core iMac can hardly beat the hex core Mac Pro by that much in FCPX through its higher single core performance (even with the same GPU).

Anyway, my dual 7950 setup already proves that the CPU is not the main factor; if the CPU were so limiting in this test, there is no way I could finish the task in just 15s.

And on my Mac Pro, my 2nd GPU is installed in the Gen 2 x4 slot for better cooling. I did quite a few tests to measure the performance penalty; it's only about 2.4%. If a 7950 can maintain >95% performance in a Gen 2 x4 slot, I really don't think Gen 3 x16 offers any significant advantage over Gen 2 x16 for the 780M.

It's like saying that installing an HDD on a SATA 2 port will make it run slower than on SATA 3. The bottleneck simply isn't there; further increasing the bandwidth should not improve anything significantly.
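
To put numbers on that analogy (nominal interface limits, ballpark drive speed):

[CODE=swift]
// Back-of-envelope: nominal SATA limits vs. a typical device speed.
// 8b/10b encoding leaves roughly 80% of the line rate as payload.
let sata2 = 3.0 * 0.8 / 8 * 1000   // ~300 MB/s usable on SATA 2
let sata3 = 6.0 * 0.8 / 8 * 1000   // ~600 MB/s usable on SATA 3
let hdd   = 150.0                  // MB/s, typical 7200 rpm mechanical drive
print(sata2, sata3, hdd)           // 300.0 600.0 150.0
// The HDD saturates neither port, so SATA 2 vs SATA 3 changes nothing;
// a ~500 MB/s SSD is where SATA 2 actually becomes the cap.
[/CODE]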
 
19 sec: Mid-2011 iMac with 4-core 3.4 Sandy Bridge and GTX 780M
27.2 sec: 2010 MacPro with 6-core 3.46 Westmere and GTX 980

From other comments on MR over the years I was under the impression that [1] FCPx performance was heavily dependent on GPU choice and how well that GPU performed with OpenCL, and [2] that video rendering can make good use of multiple cores.

Given that [1] my GPU should be much faster, and [2] I have a slower CPU but 50% more cores, I didn't think my computer's performance would be so significantly worse. So I was surprised (and am still).

But you mention PCIe 3, and I admit I have no idea how well or poorly any of the above can saturate PCIe and/or be limited by PCIe 2.
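
For rough context, here is a back-of-envelope calculation, assuming BruceX's 5120x2700 timeline and uncompressed RGBA frames (the actual FCPX traffic pattern is an unknown):

[CODE=swift]
// Can uncompressed 5K frames saturate the bus? Sizes/rates are nominal.
let frameMB = 5120.0 * 2700.0 * 4 / 1_000_000  // ~55 MB per 8-bit RGBA frame
let gen2x16 = 8_000.0                          // MB/s, PCIe 2.0 x16
let gen3x16 = 15_750.0                         // MB/s, PCIe 3.0 x16
print(gen2x16 / frameMB)   // ~145 frames/s each way on Gen 2
print(gen3x16 / frameMB)   // ~285 frames/s each way on Gen 3
// Neither bus looks like a hard wall for a handful of 5K streams, which
// fits the small Gen 2 x4 vs x16 penalty reported earlier in the thread.
[/CODE]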
 
That guy's results really make no sense. Since the GTX 770 is basically a higher clocked GTX 680, and the GTX 780M is essentially a lower clocked GTX 770, the results should be similar to a GTX 680's. However, he got 19 seconds and the guy with the GTX 680 got 68 seconds.

Something obviously went wrong there.
 
Where are you getting those numbers from exactly? The screenshot in the link above is from 2013, which is years before NVIDIA released their new web drivers.

Keep in mind that the iMacs have QuickSync, which is used to accelerate the video decode/encode in FCP. I'm not an FCP expert but I've never seen it pegging my CPU cores at 100%, so the 4 vs 6 comparison should be meaningless. It certainly is optimized to take advantage of 2 GPUs though, so if you have 2 980s in a Hackintosh with QuickSync and PCIe Gen 3 I'm sure you'd get a better score. YMMV, I've never seen a report from such a system.

Really, at the end of the day, we're just beating around the same bush -- the 2010 Mac Pro is really long in the tooth by today's standards, and it's getting harder and harder to unlock the full potential of a modern GPU when you factor in all the other bottlenecks. That's why I moved to a Hackintosh a few years ago.
 
The biggest upgrade there is probably the RAID0 with SSD drives, that's one advantage that the nMP has over all the other systems (disk speed is more than 2x faster). The actual work done by the GPU on each frame of video is usually dwarfed by the cost of reading it from disk, sending it across the bus, sending it back across the bus, and writing it out to disk again.
Okay, here's my config.

Core i7-4790K @ 4GHz
16GB 1666MHz RAM
GeForce TITAN X
10.11.4 with 346.03.06f01
Samsung 830 SSD

BruceX 5K finished in 14.62 seconds. I guess that proves the Maxwell card is running in compatibility mode, right?
 
The cMP can have a PCIe SSD that's 50% faster than 2 SATA SSDs in RAID 0.

And FCPX is very GPU limited, at least for BruceX. If drive speed mattered, my result would be very bad, because my 840 Evo is only connected to a SATA 2 port.

But the fact is my 840 via SATA 2 only needs 15s to finish the task with dual 7950s. A 2011 iMac with 2x SATA SSDs in RAID 0 can seemingly finish the task in 19s with a 780M, while a Mac Pro with a faster PCIe SSD may need more time to finish the task.

Also, this guy said his performance is the same regardless of whether he uses an SSD or HDD.
https://forums.macrumors.com/threads/fcpx-amd-vs-nvidia.1956128/page-2#post-22575548

Maybe the project is too small, and everything just runs from RAM rather than the SSD/HDD.
 
Whose cMP has that PCIe SSD? It's really hard to keep track of all of these different system configs you're talking about. SATA 2 can still support 300MB/sec, which is more than enough for a mechanical HD. If you have 2 SSDs on 2 different controllers, you could still get 600MB/sec in a RAID0 config. Obviously it'd be better to have 2 SATA 3 drives in RAID0 and then you'd be approaching the nMP throughput of 1GB/sec (I was able to get this with 2 840 PRO drives, since each can do 500MB/sec).

That guy is very likely limited by the fact he only has 2GB of VRAM. Each system will have a different bottleneck, and 2GB of memory to render a 5K video project is likely not enough and will end up causing a ton of thrashing (which is why his scores are more than 4x slower than mine).

Disabling the background rendering should mean that FCP is not caching the movie data in RAM, that's kind of the whole point of this and other FCP benchmarks (and why you need to restart FCP each time and clear out all its caches between runs).
 
My understanding of "disabling background rendering" is that FCPX will not render the video in the background, not that it won't store the original imported video data in its cache.

Anyway, you are the person who suggested that the SSD makes the difference, and my single SATA 2 SSD vs. that dual SATA (possibly SATA 3 after a firmware update) RAID 0 setup suggests that you are wrong. And now you say that a single SSD via SATA 2 is more than enough.

You suggested that a faster CPU makes the difference; we pointed out that FCPX is not that limited by single thread CPU performance, and all of a sudden you say the CPU is not even working hard, so it's not the factor.

You are the person who put forward those theories. That's fine, but when we point out how unlikely a theory is to be correct, you suddenly stand on the other side and say we are considering an irrelevant factor?

Maybe that's because my English is bad and I misunderstood your meaning. If that's the case, I am sorry for my rudeness. I only voice this because it feels strange that every time I explain why something isn't right, you teach me back almost exactly the same thing :(

And now, one more theory you suggest is QuickSync.

I just ran the benchmark again. The GPU is working hard throughout the whole process. AFAIK, BruceX does two things:

1) Rendering
2) Encoding

And the GPU should not do anything in the encoding part, so I am quite sure QuickSync won't help that much. My CPU is exactly the old slow CPU you describe. If my CPU is not limiting my dual GPUs, then QuickSync won't help either, because, as you said, the bottleneck is not there.

Since the video is only 2 seconds long and the output is a MOV file with an Apple QuickTime codec, I highly doubt QuickSync can provide any benefit in this benchmark. If the video were longer, the test clearly divided into two parts (rendering and encoding), and the output an H.264 MP4 file, maybe the iMac could save a lot of time in the encoding part. However, not in this case.

Last but not least: I know QuickSync is not going to help, so please no need to teach me back on the same subject :D
 
I said SATA 2 is more than enough for a mechanical hard drive, not an SSD. What I meant was SATA 2 vs SATA 3 won't matter for a mechanical hard drive. I never said "1 SSD on SATA 2 is more than enough" for BruceX. Final Cut Pro does the ProRes decode/encode on the CPU, my understanding is that this is accelerated via QuickSync on modern CPUs. My understanding is that Final Cut Pro sends raw video frames across the PCIe bus, which means a faster bus will help (i.e. Gen3 vs Gen2). I'm trying to provide reasonable explanations as to why certain systems are running very slow, while others are running much faster.

Let me be honest: I don't really care that much about BruceX scores, because I don't use Final Cut Pro for anything other than running benchmarks. My system with a TITAN X is faster than any score I've seen quoted so far. What I do care about is all the crap floating around that the Maxwell cards are running in backwards compatibility mode, don't have color compression enabled, and so on.

There are obviously several factors that help with BruceX performance:
  • GPU speed
  • GPU driver quality
  • Amount of video memory
  • Hard drive speed
  • PCIe bus speed
There clearly is a point of diminishing returns, e.g. maybe 12GB isn't that much better than 8GB of video memory, maybe two SSDs in RAID0 isn't that much better than 1 SSD, and so on. However, if you only have 2GB of video memory, then basically nothing else matters and your score is going to be terrible. I wouldn't blame the NVIDIA driver for this, I wouldn't blame the Maxwell cards running in backwards compatibility mode, it's just that a 5K movie project needs a lot more video memory than that.
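
To put the 2GB number in perspective, some rough math; the surface format and count here are my guesses, not known FCPX internals:

[CODE=swift]
// Rough VRAM budget for a 5K project, assuming 16-bit-per-channel RGBA
// working surfaces (a guess at what FCPX allocates internally).
let bytesPerSurface = 5120.0 * 2700.0 * 4 * 2              // ~111 MB each
let surfacesInFlight = 8.0                                 // sources + render targets (guess)
print(bytesPerSurface * surfacesInFlight / 1_073_741_824)  // ~0.8 GB
// That's before textures, caches and the OS's own allocations. A 2GB
// card starts thrashing over PCIe; a 6-12GB card never gets close.
[/CODE]
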
Here's a chart of GPU utilization on my TITAN X while rendering the BruceX test. As you can see, it peaked at around 70% at the start, then dropped down to around 40% for most of the test. This indicates that my TITAN X is way more powerful than needed, at least in terms of raw GPU horsepower.
 

[Attachment: Screen Shot 2016-05-11 at 10.30.09 AM.png (GPU utilization chart)]
No worries, I get your point now. Anyway, sorry for my poor English. You are right, you never said that a single SSD on SATA 2 is more than enough; my poor reading caused the misunderstanding.

Anyway, even though it's a bit off topic: does anyone know if QuickSync can encode ProRes? I think it cannot, but I haven't found anything definitive yet.
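
One way to at least see which encoders the OS exposes is VideoToolbox's encoder list; hardware (QuickSync) encoders show up there for H.264. This is only a sketch, and the list won't literally say "QuickSync", so I'd still treat the ProRes question as open:

[CODE=swift]
import VideoToolbox

// Sketch: print every video encoder VideoToolbox reports. If no ProRes
// entry is hardware-backed, QuickSync isn't encoding ProRes on this Mac.
var encoderList: CFArray?
VTCopyVideoEncoderList(nil, &encoderList)
if let encoders = encoderList as? [[String: Any]] {
    for enc in encoders {
        let codec = enc[kVTVideoEncoderList_CodecName as String] ?? "?"
        let name  = enc[kVTVideoEncoderList_DisplayName as String] ?? "?"
        print("\(codec) - \(name)")
    }
}
[/CODE]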

Also, I know 2 SSDs in RAID can improve sequential read/write speed. But is that important for encoding? A movie of a few gigabytes takes a long time to encode but only a few seconds to copy. To me, it seems the maximum sequential read/write speed is not limiting anything.

On the other hand, I agree that VRAM can make a difference, but I don't know how much it affects this test.

Back to a topic related to this thread: do you think we now have enough data to decide whether the web driver is well optimised for the Maxwell cards?

N.B. I mean good enough, not perfect. E.g., if the web driver is 95% optimised for the 680, maybe it is also >80% optimised for the Maxwell cards. Therefore, even though the 680 performs very well on some tasks, or even better than the 980 on a few, the 980 is still able to perform at a reasonable level on most tasks.

TBH, I don't know whether any new function on the Maxwell cards is activated by the web driver. On the Windows side it surely is, but I don't know if the same is true on the OS X side. Is there any way to find out if colour compression is enabled? Maybe it's easier to rule this out with facts: if we can find a single function that is only available on the Maxwell cards, but not on the older cards (in OS X), I think we can safely assume the card is not running in backwards compatibility mode. Otherwise, there would be no new function, and the performance gain would come purely from the much better hardware.
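
For the API-level half of that question, a quick CGL sketch can dump what the driver reports; run it on a 680 and a 980 and diff the output. It can't reveal internal hardware features like colour compression, but it would show whether the web driver exposes anything Maxwell-only:

[CODE=swift]
import OpenGL.GL

// Sketch: create a bare (legacy-profile) GL context and print what the
// driver exposes. glGetString(GL_EXTENSIONS) is only valid in the
// legacy profile, which is what CGL creates by default.
var attribs: [CGLPixelFormatAttribute] = [kCGLPFAAccelerated,
                                          CGLPixelFormatAttribute(rawValue: 0)]
var pix: CGLPixelFormatObj?
var npix: GLint = 0
CGLChoosePixelFormat(&attribs, &pix, &npix)
var ctx: CGLContextObj?
CGLCreateContext(pix!, nil, &ctx)
CGLSetCurrentContext(ctx)
for param in [GL_RENDERER, GL_VERSION, GL_EXTENSIONS] {
    if let cstr = glGetString(GLenum(param)) {
        print(String(cString: cstr))
    }
}
CGLDestroyContext(ctx)
[/CODE]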
 
The chase of the wild goose.

From my experience and what I have read here, I would say that claims of Maxwell running in "backwards compatibility mode" and not using color compression are baloney.

Apple has allowed their OS and systems to fall into disrepair, we know this.

Nvidia can't fix it all with their driver.
 
I've been saying all along that yes, there's no evidence to suggest that key features like color compression aren't enabled on Maxwell. In general, Maxwell performs very well in GPU limited cases. Unfortunately, given the severe CPU limits in many/most games and benchmarks, it's much harder than you'd like to actually get a GPU limited case where a 980, 980 Ti or TITAN X can stretch its legs.

Some people look at the BareFeats numbers and conclude "Maxwell is running in compatibility mode and doesn't have color compression enabled". I've seen no actual evidence to back this up, because you know, how could anyone outside of NVIDIA actually know this? What I've seen is that no game can actually make my TITAN X run at 100% GPU utilization, even with a fast Devil's Canyon CPU (Core i7-4790K, same as the retina iMac).

In terms of GPU features, yes, the design of Apple's software interfaces means NVIDIA cannot expose any of the new functionality through OpenGL extensions and the like. The drivers are stuck implementing the same OpenGL 4.1 interface as every other GPU. Nothing NVIDIA can do about this.

Given the fact that Metal has a much lower CPU overhead than OpenGL, it'd be nice to do a comparison between a 680 and a 980 in a Metal benchmark. I'm not really aware of any apps/benchmarks that use Metal yet, though.
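
Something like this is what I have in mind: a minimal Metal compute micro-benchmark, so the score tracks raw GPU throughput rather than driver CPU overhead. The kernel, buffer size and iteration count are arbitrary choices, not a standard benchmark:

[CODE=swift]
import Foundation
import Metal

// Sketch: time 100 dispatches of a trivial compute kernel. Kernel and
// sizes are arbitrary; this is not a standard benchmark.
let source = """
#include <metal_stdlib>
using namespace metal;
kernel void saxpy(device float *y [[buffer(0)]],
                  const device float *x [[buffer(1)]],
                  uint i [[thread_position_in_grid]]) {
    y[i] = 2.0f * x[i] + y[i];
}
"""
let device = MTLCreateSystemDefaultDevice()!
let library = try! device.makeLibrary(source: source, options: nil)
let pipeline = try! device.makeComputePipelineState(function: library.makeFunction(name: "saxpy")!)
let queue = device.makeCommandQueue()!
let count = 1 << 24                                    // 16M floats per buffer
let x = device.makeBuffer(length: count * 4, options: [])!
let y = device.makeBuffer(length: count * 4, options: [])!

let start = Date()
let cmd = queue.makeCommandBuffer()!
let enc = cmd.makeComputeCommandEncoder()!
enc.setComputePipelineState(pipeline)
enc.setBuffer(y, offset: 0, index: 0)
enc.setBuffer(x, offset: 0, index: 1)
for _ in 0..<100 {
    enc.dispatchThreadgroups(MTLSize(width: count / 256, height: 1, depth: 1),
                             threadsPerThreadgroup: MTLSize(width: 256, height: 1, depth: 1))
}
enc.endEncoding()
cmd.commit()
cmd.waitUntilCompleted()
print("GPU time for 100 dispatches: \(Date().timeIntervalSince(start))s")
[/CODE]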

Edit: And to be clear, when I'm talking about newer/faster CPUs, I'm also including things like improved RAM speeds (1066 MHz in cMP vs 1666 MHz in my Hackintosh etc).
 