GPU Performance Update/Analysis No.1

Over the course of the last few months, I've performed a number of benchmark tests with my CUDA rigs involving their OpenCL performance with Luxmark and their CUDA performance with After Effects ("AE") and Octane Render:

(1) See, e.g., posts #s 33 - 40 [ https://forums.macrumors.com/threads/1587574/ ];
(2) See, e.g., post #8 [ https://forums.macrumors.com/showthread.php?p=17607424#post17607424 ].

Some of the conclusions that I've drawn are as follows:
(1) In at least one OpenCL-only benchmark (Luxmark), your GTX card(s) will perform below comparably priced ATI video cards. The reason for this may be, as 666sheep suggests in post no. 607 in this thread [ https://forums.macrumors.com/threads/1333421/ ], that the code was not optimized for Nvidia cards.
(2) In CUDA benchmarks, your GTX cards may exhibit linear performance gains with additional GPUs of the same type (as in Octane Render), or they may not (as in AE). E.g., one of my Titans completed the AE CUDA test used by Barefeats in 9 min. 16 sec. (faster than the same test run with three GTX 580 Classifieds), and in the After Effects CS6 CUDA Benchmark Test linked to here [ https://forums.macrumors.com/threads/1587574/ ], one Titan card completed the test in 3 min. 43 sec. and two Titan cards completed it in 2 min. 55 sec., but any number of cards >2 took longer to complete the test than one Titan card did.
(3) Currently the king of GTX CUDA performance is the GTX Titan.
(4) Be guided in your GPGPU selection by the software you will run on it.
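A quick way to see how far short of linear the AE scaling falls is to compute parallel efficiency (speedup divided by card count) from the Titan times quoted in (2) above; a minimal Python sketch:

```python
# Parallel efficiency of the AE CS6 CUDA test on GTX Titans,
# using the times quoted above (1 card: 3:43, 2 cards: 2:55).

def seconds(minutes, secs):
    """Convert a min:sec benchmark time to seconds."""
    return minutes * 60 + secs

t1 = seconds(3, 43)   # one Titan: 223 s
t2 = seconds(2, 55)   # two Titans: 175 s

speedup = t1 / t2        # ~1.27x from adding a second card
efficiency = speedup / 2  # ~64% -- far from linear scaling

print(f"speedup: {speedup:.2f}x, efficiency: {efficiency:.0%}")
```

By contrast, a renderer that scales linearly (as Octane does here) would hold roughly 100% efficiency at every card count.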
 
Over the course of the last few months, I've performed a number of benchmark tests with my CUDA rigs involving their OpenGL performance with Luxmark and their CUDA performance with After Effects ("AE") and Octane Render

Are you benchmarking OpenGL or OpenCL performance?
 
Just a follow up - I've secured myself an X5687 on a good deal - should be here in a few days. I'll post my results if I can get 'er working in my 4,1 flashed 5,1.

I know Xavi8tor had tried this route before on an X5687 with no luck...

Well, I'm dismayed to report that the X5687 didn't work. Much like X8viator's experience, the system simply refused to POST. No LEDs, no beeps, no flashing LEDs on the front panel. Just "nothing". I went through the MacPro2009 service manual to see if I could figure out what was wrong based on the DIAG button on the motherboard, but nada - everything is fine according to the diagnostics, but just no boot.

I swapped the chip back to the W3680 and it booted back up again. (My 09 is indeed flashed to the 2010 firmware).

Oh well - I can swap this X5687 with a W3690 from the same Vendor, so looks like I'm going the same route that everyone else has (hexa-core 3.46) as the "max" I can make this baby run.

- CK.
 
I would like to contact Tutor via email/PM.

I have an issue with an evga sr-2 hack that I believe you may be able to help with.

Happy to pay for assistance...
 
I would like to contact Tutor via email/PM.

I have an issue with an evga sr-2 hack that I believe you may be able to help with.

Happy to pay for assistance...

Contact RampageDev, as he is the guy (right now) to approach. I'm sure Tutor would like to help, but he is very busy and is more about contributing his thoughts on here and letting you learn how to do it on your own, which is the very reason why this post exists.

If you don't have the time to learn, again, RampageDev would be the guy to turn to. Here's his direct link to help you have an SR-2 system that is typically up and running in a couple of hours, streamlined to perfection (as he has done with my machine).

http://rampagedev.wordpress.com/premium-technical-support/sr-2-production-system/

RampageDev (as with Tutor) has been an amazing ASSET to the Hackintosh community, so PLEASE contribute what RampageDev asks for, as his time too is just as important. Hope this helps, later... :)
 
Well, I'm dismayed to report that the X5687 didn't work. Much like X8viator's experience, the system simply refused to POST. No LEDs, no beeps, no flashing LEDs on the front panel. Just "nothing". I went through the MacPro2009 service manual to see if I could figure out what was wrong based on the DIAG button on the motherboard, but nada - everything is fine according to the diagnostics, but just no boot.

I swapped the chip back to the W3680 and it booted back up again. (My 09 is indeed flashed to the 2010 firmware).

Oh well - I can swap this X5687 with a W3690 from the same Vendor, so looks like I'm going the same route that everyone else has (hexa-core 3.46) as the "max" I can make this baby run.

- CK.

It is my understanding that any 5-series (X5xxx) Intel CPU will not work in a single CPU Mac. The 3-series (W3xxx) CPUs are for the single CPU Mac Pros; dual CPU systems take the 5-series. I know previous posters have posted otherwise, but I know of no one with a single CPU Mac Pro who has successfully run an X5xxx series CPU.

Lou
 
Overclocking is not a good idea, is it?

Overclocking is only a bad idea if your system is not properly cooled. It's likely to shorten the lifespan of your CPU, so don't expect it to last for decades either.

It's not just a hobby thing; BOXX Technologies sells many professional-grade workstations with heavily overclocked options.

It's not unrealistic to get a 30% or greater performance boost from your machine. Considering that the performance already lies dormant in your machine, it's quite tempting to unlock it.

*Note: this only applies to PCs and Hackintosh machines; you can't overclock a Mac!
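To put that 30% figure in perspective, here's a rough back-of-the-envelope sketch (assuming a perfectly CPU-bound render, which is optimistic; the 10-hour job is hypothetical):

```python
# Rough, optimistic estimate of render-time savings from a clock bump,
# assuming a perfectly CPU-bound workload (real gains will be lower).

def overclocked_time(base_hours, boost):
    """Render time after a fractional clock boost (e.g. 0.30 = +30%)."""
    return base_hours / (1.0 + boost)

base = 10.0                        # hypothetical 10-hour render
oc = overclocked_time(base, 0.30)  # ~7.7 hours at +30% clock

print(f"{base:.1f} h -> {oc:.1f} h ({1 - oc / base:.0%} less wall time)")
```

Note that a 30% clock boost buys about 23% less wall time, not 30%, since time scales with the inverse of clock speed.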
 
Gigabyte 2011 progress?

Hey Tutor, just wondering if you've completed a GA-7PESH3 build yet?

You mentioned a few pages back they were under construction.
 
Hey Tutor, just wondering if you've completed a GA-7PESH3 build yet?

Those boards have literally just become available. I suspect Tutor is waiting for the next round of chips from Intel before he does it. The IVB Xeons are due any ol' day now, but they're not out yet.
 
Hey Tutor, just wondering if you've completed a GA-7PESH3 build yet?

You mentioned a few pages back they were under construction.

Planning has been completed and acquisition of parts, except for the mobo, CPUs, and RAM, has begun. Otherwise, Jasonvp's last post is correct. However, I've recently begun to consider whether my next build should be a quad CPU system with 8 to 12 cores per CPU.
 
Planning has been completed and acquisition of parts, except for the mobo, CPUs, and RAM, has begun. Otherwise, Jasonvp's last post is correct.

Ahh ok cool. Thought you might have started already with existing 8 core CPUs with the plan to upgrade later.

However, I've recently begun to consider whether my next build should be a quad CPU system with 8 to 12 cores per CPU.

What's the objective behind such a CPU heavy build, with emphasis being on GPU lately?
 
For balance: some rendering environments need more fast CPUs and more fast GPU slots filled.

Ahh ok cool. Thought you might have started already with existing 8 core CPUs with the plan to upgrade later.

My two Xeon E5-2687Ws will be used for another dual 8-core build, and the next place they will go (never again in an ASUS mobo) will most likely be a Gigabyte GA-7PESH3, just not immediately. I was originally planning to build more of them, but that's what I'm now leaning against doing. Their additional advantage is that each of them can also house 4 dual-wide GTX GPUs. So I can't completely rule out more Gigabyte dual CPU systems in the future.

What's the objective behind such a CPU heavy build, with emphasis being on GPU lately?

My priority currently is to grow my GPU rendering capacity first and foremost, while not completely ignoring my CPU rendering needs. Some of the rendering jobs that I do are CPU based, and in some of my applications only certain functions take advantage of CUDA and/or OpenCL. The current quad CPU motherboards have fewer PCIe slots, but I'll still be able to add a couple of high-end GTX CUDA cards to that mix. Also, I have at least 5 other systems, each of which can house 3 dual-wide GTX GPUs. The newer top of ATI's Radeon line (like the Radeon HD 7990 6GB) is not out of the question, depending on whether OpenCL really takes off.

So as Nvidia further improves the top of the GTX line, I can push my current GTX cards down into the slots of my other five systems (four 6-core and one 4-core clock-tweaked systems now housing old ATI Radeon cards) as I replace the ones now in the CUDA rigs mentioned in my sig. The net result will be greater flexibility: for many projects I'll get the best of the GPU and CPU worlds, because rendering can be allocated to take advantage of the many networked (and mostly clock-tweaked) CPU cores and CUDA/OpenCL cores simultaneously. Since Octane Render won't be taxing those CPUs at all, they can simultaneously be rendering their assigned chores. Each high-speed system will have its CPUs on one rendering job while, independently, its GPUs render another, further maximizing my investments.

However, the number of systems will continue to grow more slowly as I lean more towards GPU additions/upgrades as time goes on. Of course, the need for fast, open PCIe slots is and will be great for the foreseeable future, so I can't rule out other 8+ dual-wide GPU systems such as my Tyan server. And what about the Xeon Phi co-processor? Its adoption could require even more fast PCIe slots.
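That "CPUs render one job while the GPUs render another" idea amounts to treating each box as two independent workers, one CPU and one GPU; a toy Python sketch of the allocation (the machine names other than WolfPack1, and the job counts, are hypothetical):

```python
# Toy model: each machine exposes two independent workers,
# one for CPU render jobs and one for CUDA/OpenCL GPU jobs.

from collections import deque

machines = ["WolfPack1", "rig2", "rig3"]  # rig2/rig3 are hypothetical
cpu_jobs = deque(f"cpu-job-{i}" for i in range(5))
gpu_jobs = deque(f"gpu-job-{i}" for i in range(4))

assignments = {m: {"cpu": None, "gpu": None} for m in machines}
for m in machines:
    if cpu_jobs:
        assignments[m]["cpu"] = cpu_jobs.popleft()  # CPU cores take one job
    if gpu_jobs:
        assignments[m]["gpu"] = gpu_jobs.popleft()  # GPUs take another

# Every machine ends up rendering two jobs at once.
for m, slots in assignments.items():
    print(m, slots)
```

The point of the sketch: the two queues drain independently, so CPU work never idles the GPUs and vice versa.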
 
My two Xeon E5-2687Ws will be used for another dual 8-core build, and the next place they will go (never again in an ASUS mobo) will most likely be a Gigabyte GA-7PESH3, just not immediately. I was originally planning to build more of them, but that's what I'm now leaning against doing. Their additional advantage is that each of them can also house 4 dual-wide GTX GPUs. So I can't completely rule out more Gigabyte dual CPU systems in the future.



My priority currently is to grow my GPU rendering capacity first and foremost, while not completely ignoring my CPU rendering needs. Some of the rendering jobs that I do are CPU based, and in some of my applications only certain functions take advantage of CUDA and/or OpenCL. The current quad CPU motherboards have fewer PCIe slots, but I'll still be able to add a couple of high-end GTX CUDA cards to that mix. Also, I have at least 5 other systems, each of which can house 3 dual-wide GTX GPUs. The newer top of ATI's Radeon line (like the Radeon HD 7990 6GB) is not out of the question, depending on whether OpenCL really takes off. So as Nvidia further improves the top of the GTX line, I can push my current GTX cards down into the slots of my other five systems as I replace the ones now in the CUDA rigs mentioned in my sig. The net result will be greater flexibility: for many projects I'll get the best of the GPU and CPU worlds, because rendering can be allocated to take advantage of the many networked CPU cores and CUDA/OpenCL cores simultaneously. Since Octane Render won't be taxing those CPUs at all, they can simultaneously be rendering their assigned chores. Each high-speed system will have its CPUs on one rendering job while, independently, its GPUs render another, further maximizing my investments. However, the number of systems will continue to grow more slowly as I lean more towards GPU additions/upgrades as time goes on. Of course, the need for fast, open PCIe slots is and will be great for the foreseeable future, so I can't rule out other 8+ dual-wide GPU systems such as my Tyan server. And what about the Xeon Phi co-processor? Its adoption could require even more fast PCIe slots.

I like the idea of having two jobs going on each machine at the same time!

In planning my next moves I'm of course curious about the hackintosh capability of the GA-7PESH3. To free up a PCIe slot, do you know if the onboard ethernet controller might work natively with OS X? Here's hoping we can eventually get Turbo Boost & SpeedStep etc. functionality with these E5 CPUs :D

It seems such a shame to see current hackintosh GB scores of the 2687W at a little over half of the same machine running Windows. :(
 
I like the idea of having two jobs going on each machine at the same time!

I do too, because that means that my CPU clock-tweaked systems (you can see more details on some of them at the URL in my sig.) and the time that I spent tweaking them won't go to waste.

In planning my next moves I'm of course curious about the hackintosh capability of the GA-7PESH3. To free up a PCIe slot, do you know if the onboard ethernet controller might work natively with OS X? Here's hoping we can eventually get Turbo Boost & SpeedStep etc. functionality with these E5 CPUs :D

Historically, Gigabyte has made the most OS agnostic motherboards on the market. That's why the vast majority of my systems are Gigabyte based. I hope that Gigabyte's tradition continues.

Moreover, you can always opt for a USB 3 ethernet controller to keep all four dual wide PCIe slots free.

It seems such a shame to see current hackintosh GB scores of the 2687W at a little over half of the same machine running Windows. :(

I agree with jasonvp that we should soon have the breakthroughs needed.
 
Why did Intel lockdown clock-tweaking of Xeons? Could it be due to stalling?

This is the main reason why I'm loading up more on GPGPU capacity. The three pics below show 12 cores of Xeon X5680 Westmere (64-bit Geekbench 2) performance vs. 12 cores of Xeon E5-2697 v2 Ivy Bridge (32-bit Geekbench 2) performance. My 2010 WolfPack1's performance smashes any Mac Pro 2013/2014 envy I could conceivably muster, even if you throw in a 15% increase for the 32-bit vs. 64-bit performance delta.
 

Attachments

  • WOLFPACK1-1.jpg (1.4 MB)
  • WOLFPACK1-2.jpg (1.1 MB)
  • WOLFPACK1-3.jpg (1,018.3 KB)
This is the main reason why I'm loading up more on GPGPU capacity. The three pics below show 12 cores of Xeon X5680 Westmere (64-bit Geekbench 2) performance vs. 12 cores of Xeon E5-2697 v2 Ivy Bridge (32-bit Geekbench 2) performance. My 2010 WolfPack1's performance smashes any Mac Pro 2013/2014 envy I could conceivably muster, even if you throw in a 15% increase for the 32-bit vs. 64-bit performance delta.

I'm trying to understand what makes the old X5680 Xeons perform like this?

The clock is lower, and a single CPU has half the cores, yet even if we halve the score, the single 6-core still comes pretty close to the new single 12-core.

Would you suggest going a real macpro5,1 route instead of GA-7PESH3, if one does not care about 4 PCIe slots, but just general performance and a single GPU card?

EDIT: hmm I just realized your WolfPack1 is not a real macpro5,1, but EVGA motherboard based.
 
I'm trying to understand what makes the old X5680 Xeons perform like this?
The clock is lower, and a single CPU has half the cores, yet even if we halve the score, the single 6-core still comes pretty close to the new single 12-core.

The X5680s are underclocked but turbo-boost biased: they run slower and cooler when there is no load, but under load the participating cores tend to turbo boost more frequently and much higher, to around 4.9 GHz each.
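For a rough sense of the headroom (the 3.33 GHz stock base clock of the X5680 is published spec; the 4.9 GHz figure is the tweaked turbo quoted above):

```python
# Per-core clock headroom of a turbo-biased X5680 under load.

stock_base_ghz = 3.33    # X5680 stock base clock (published spec)
tweaked_turbo_ghz = 4.9  # observed per-core turbo under load (quoted above)

headroom = tweaked_turbo_ghz / stock_base_ghz - 1.0
print(f"~{headroom:.0%} above stock base clock while boosting")
```

That roughly 47% per-core clock advantage under load is a big part of why the old 12-core rig stays competitive with the newer 12-core.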

Would you suggest going a real macpro5,1 route instead of GA-7PESH3, if one does not care about 4 PCIe slots, but just general performance and a single GPU card?

If one does not care about having 4 double-wide x16 PCIe slots for future GPGPU performance and does not want to clock tweak Sandy/Ivy Bridge CPUs even to the minimal extent allowed on the CPUs supported by the GA-7PESH3, then yes, I would recommend a dual CPU Mac Pro 5,1 and a CUDA card (or two) if needed. However, for speed and $$$ savings, I'd recommend the EVGA SR-2 for a diehard tweaker. It too has 4 double-wide x16 PCIe slots, but unlike the GA-7PESH3, which has PCIe 3.0 slots, the EVGA board has PCIe 2.0 slots. When it comes to CUDA performance, the EVGA is fine. I have two of them: one has 4 GTX 680s (Geekbench 2 score 40,051) and the other has 4 GTX 580 Classified GPUs (Geekbench 2 score 40,100) for Octane renderings.

EDIT: hmm I just realized your WolfPack1 is not a real macpro5,1, but EVGA motherboard based.
True.
 
Thanks for that Tutor.

I am split between looking for a real macpro5,1 and building custom. I might never have a need for so many GPUs but then again I like to future-proof things and it only makes sense to allow for expansion, if needed.

I found a new EVGA SR-X for sale yesterday for a pretty good price, which is enticing me to custom build because it has LGA2011 Xeon support, plus the benefits of PCIe 3.0. :D
 
Thanks for that Tutor.

I am split between looking for a real macpro5,1 and building custom. I might never have a need for so many GPUs but then again I like to future-proof things and it only makes sense to allow for expansion, if needed.

I found a new EVGA SR-X for sale yesterday for a pretty good price, which is enticing me to custom build because it has LGA2011 Xeon support, plus the benefits of PCIe 3.0. :D

As between the EVGA SR-X, which is no longer in production, and the new Gigabyte GA-7PESH3, I'd recommend the Gigabyte board for better support, quality, and multi-OS compatibility.
 
How about 8 Ivy Bridge Xeons and 12 Titans/Xeon Phis in a GPU/CPU powerhouse?

For those with massive rendering needs, you might want to consider a GPU-accelerated supercomputer. For an extremely competitive price, Supermicro has introduced its line of GPU FatTwins [ http://www.supermicro.com/products/nfo/FatTwin.cfm?show=SELECT&type=GPGPU#solutions ]. For about $5,000, you can purchase the barebones system, giving you four nodes in a 4U rack. That comes to $1,250 for each barebones node. You have to supply the OS, HDDs, GPUs, CPUs, and memory, or you can buy a fully decked-out system from a system integrator such as [ http://www.pcsuperstore.com/products/11866416-SuperMicro-SYSF627G3FT.html ]. Among its other features, the 4U rack enclosure has twelve PCIe 3.0 x16 slots (3 such slots per node), supporting up to twelve double-wide GPU cards. Each node supports two 3.5" or six 2.5" HDDs. There are 16 memory slots per node. Each node also supports dual Intel E5-2600 Xeons, giving you room to load this server with up to eight of the latest Intel Sandy/Ivy Bridge Xeon CPUs (up to 130W TDP) of your choice. The system has four 1,620-watt PSUs. Talk about upgrade potential!

BTW - It should be able to run multiple OSes. Moreover, the nodes can be populated individually as your needs grow and your wallet allows. Although a node's CPUs can have different specs than those in another node, each node itself must be populated with same-spec CPUs, e.g., one node can have a pair of high-cost E5-2697 v2s [ http://www.cpu-world.com/CPUs/Xeon/Intel-Xeon E5-2697 v2.html ] and another can have a pair of lower-cost E5-2650 v2s [ http://www.cpu-world.com/CPUs/Xeon/Intel-Xeon E5-2650 v2.html ].
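The per-node arithmetic above, sketched out in Python (price and per-node counts as quoted):

```python
# Cost and capacity breakdown of the Supermicro GPU FatTwin described above.

barebones_price = 5000  # USD for the 4-node 4U barebones system
nodes = 4

per_node_price = barebones_price / nodes  # $1,250 per barebones node
gpu_slots = nodes * 3                     # 3 double-wide PCIe 3.0 x16 per node
cpu_sockets = nodes * 2                   # dual E5-2600 sockets per node

print(f"${per_node_price:,.0f}/node, {gpu_slots} GPU slots, {cpu_sockets} CPU sockets")
```

So a fully built-out chassis tops out at twelve double-wide GPUs and eight Xeons, before counting the OS, drives, and memory you still have to supply.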
 