Need advice

Hi Tutor

You inspire me to build my own system.

I want to build two systems running Ubuntu server OS with dual and single CPUs for CUDA computing.

My idea is to use one CPU core per GPU.
That's why I want to use dual Xeons with many GPUs and a single CPU with dual GPUs.

I think that I need faster cores.
Maybe there exist even faster LGA 1366 Xeons than the X5680/X5690, but with fewer cores?

  1. Single LGA 1366 Socket

    I think I don't need a Xeon X5680/X5690 here, because there is a cheaper analogue: the i7-980X/i7-990X.
    I want to overclock the CPU in this system.
    Can you recommend a motherboard, CPU, enclosure and cooling system?

  2. Dual LGA 1366 Socket

    I want to use a Tyan server like yours (12 cores and 8 GPUs).
    Will I be able to overclock Xeons on a Tyan board?
    I want to use X5680s and overclock them. By the way, why are you not using X5690 CPUs?

What software are you using to overclock CPUs and GPUs?
 
Let software applications guide your build.


  1. Single LGA 1366 Socket

    I think I don't need a Xeon X5680/X5690 here, because there is a cheaper analogue: the i7-980X/i7-990X.
    I want to overclock the CPU in this system.
    Can you recommend a motherboard, CPU, enclosure and cooling system?

  1. First, what CUDA-supported software applications will you be using? This could significantly affect my recommendations. Also, I'm curious about why you propose to use one CPU [core?] per GPU. The number of CPUs on the motherboard can affect the number of PCIe slots you get, but I have 3 double-wide CUDA cards in my clock-tweaked Gigabyte i7/W3670 systems (described further below).

    Gigabyte, Tyan and Supermicro are the motherboard manufacturers that I recommend because of multi-OS compatibility, price and quality. Of those three manufacturers, only Gigabyte's single-CPU i7 motherboards allow significant clock tweaking. As to modern chips, only LGA 1366 chips (i7 and Xeon Nehalems and Westmeres) and LGA 2011 Sandy/Ivy Bridge and Haswell i7 chips can be clock tweaked significantly. LGA 2011 Xeons are only minimally tweakable, and only on certain boards - not Tyan's or Supermicro's. It's possible to clock tweak them by about 3 to 4% on one dual LGA 2011 CPU motherboard from EVGA [the SR-X (now discontinued)] and on the troublesome Asus dual-socket boards, but that's not worth the possible instability for so little gain. The problem with single LGA 1366 motherboards that support clock tweaking is that new ones are extremely hard to find. I'd advise against purchasing used ones because you wouldn't know whether an owner who may have clock tweaked the board knew what he/she was doing, i.e., the board could be in horrible condition despite looking new. In sum, for single CPU clock tweaking, I would buy only a new Gigabyte LGA 1366 motherboard with as many PCIe slots as possible (for future GPGPU/coprocessor growth), if I could find one.

    My single CPU motherboards are Gigabyte X58A-UD3R and UD5's with i7-980Xs/W3670s cooled by Corsair H80s. Here's a great inexpensive case ($160 - SSI EEB, SSI CEB, Extended ATX, ATX, Micro ATX compatible) that I own [ http://www.newegg.com/Product/Produ...er+Cases+-+ATX+Form)-_-Silverstone-_-11163185 ]. See post #750, above.

  2. Dual LGA 1366 Socket

    I want to use a Tyan server like yours (12 cores and 8 GPUs).
    Will I be able to overclock Xeons on a Tyan board?
    I want to use X5680s and overclock them. By the way, why are you not using X5690 CPUs?
What software are you using to overclock CPUs and GPUs?
You cannot overclock Xeons (or even i7s) on any Tyan or Supermicro motherboard, nor on any Gigabyte LGA 2011 dual Xeon motherboard. CPU clock tweaking should only be done via the system bios, and not all bioses allow clock tweaking. CUDA GPGPU clock tweaking is done with Nvidia Control Panel and EVGA Precision X. In fact, I haven't reached a final decision on whether I'll use my X5675s or X5680s, or buy a new pair of lower-powered/lower-speed Nehalems or Westmeres, for permanent residence on my eight double-wide-PCIe-slotted Tyan server, because I use that system mainly for CUDA-based Octane Render and thus CPU speed/core count doesn't make a difference so long as the chip's QPI [what we used to call the "Front Side Bus" (FSB)] is 3200 MHz.

At Superbiiz, the barebones Tyan Rackmount Server [ https://www.superbiiz.com/detail.php?name=TS-B715V2R ] costs $3,630 (that includes shipping for the whole order, including HDD, SSD and ram). You can turn it into a minimal server by adding a 1.5 T hard drive and a 240 G SSD for under $500, adding dual X5680s for $1,800 [ https://www.eoptionsonline.com/p-2049-594880-001.aspx ] and adding 72 Gigs of ram - 9 packs of Kingston KVR1333D3E9SK2/8G DDR3-1333 8GB (2x4GB) ECC CL9 Memory Kits for a total of $752 [ https://www.superbiiz.com/detail.php?name=W8GE1333K ] - for a total cost of under $6,682.
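To spell out the arithmetic behind that ceiling:

$3,630 (barebones) + under $500 (HDD + SSD) + $1,800 (dual X5680s) + $752 (ram) = under $6,682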

Moreover, further complicating the matter - the LGA 1366 Xeons, such as the X5680/X5690s, will be discontinued at the end of this year - 2013.
 
Moreover, further complicating the matter - the LGA 1366 Xeons, such as the X5680/X5690s, will be discontinued at the end of this year - 2013.

Tutor's most excellent post is worth reading carefully, but the concluding sentence is the most important IMHO. It's time to move beyond 1366 if you're purchasing. Laying out good cash for hardware that is, essentially, two generations old really isn't a wise investment.
 
Let software applications guide your build.

Tutor's most excellent post is worth reading carefully, but the concluding sentence is the most important IMHO. It's time to move beyond 1366 if you're purchasing. Laying out good cash for hardware that is, essentially, two generations old really isn't a wise investment.

Agreed, but with one exception - if you're building a CUDA rig for Octane Render with the maximum no. of double-wide PCIe x16 slots all in one box, Tyan has an LGA 1366 8-slot barebones version for ~$3,600 and an LGA 2011 8-slot barebones version for ~$5,000. For Octane GPU rendering the CPUs don't matter, and the step from PCIe V2 to V3 doesn't make a big difference in CUDA performance, so I consider the 1366 barebones system to still be a good investment for a CUDA rig if $1,400+ matters. But if you are building the system for maximum CPU performance in particular application(s) and money is no object, then my exception is irrelevant. That's why application use should dictate the build. However, each LGA 2011 CPU [ http://www.cpu-world.com/CPUs/Xeon/Intel-Xeon E5-2667.html = E5-2667 or http://www.newegg.com/Product/Product.aspx?Item=N82E16819117264 = E5-2665 ] comparable to the performance of an X5680 (I know the X5680 is a special case, currently a low-priced item due to volume) costs $600+ more than the X5680 currently costs. I'm Mr. Cheapo Penny Pincher - (2 x $600 + $1,400 = $2,600). That's enough cash to add some of the flesh (CPUs, HDDs, SSDs, and/or ram) to the bare bones. The bad man always tries to save himself and you money.

P.S. Although Octane Render is a 3d renderer that relies solely on your CUDA GPU, not all CUDA renderers are 3d renderers, and not all renderers, whether 3d or not, rely solely on the GPU. If your renderer relies only partly on the GPU, then you'll take that into account when selecting your CPUs and thus your motherboard.
 
First, what CUDA-supported software applications will you be using? This could significantly affect my recommendations. Also, I'm curious about why you propose to use one CPU [core?] per GPU. The number of CPUs on the motherboard can affect the number of PCIe slots you get, but I have 3 double-wide CUDA cards in my clock-tweaked Gigabyte i7/W3670 systems (described further below).

I want to write my own software. It will implement inversion of a very big matrix (4 billion by 4 billion) by blocks. Each block of the matrix will be inverted by a GPU (a block is also a matrix).

Here is some information on matrix block form - http://en.wikipedia.org/wiki/Block_matrix

Each GPU will return a matrix-vector product.

For example, the matrix will contain 4 blocks and I will have 4 GPUs.

CPU core_1 will run code on GPU_1
CPU core_2 will run code on GPU_2
CPU core_3 will run code on GPU_3
CPU core_4 will run code on GPU_4

core_1, core_2, core_3, core_4 will run in parallel.

In the real world, the number of blocks (4, 16, 36, 64, ...) will be greater than the number of GPUs. The cores will re-run the code on the next blocks.
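Roughly, in CUDA C++, my plan looks like this - just a sketch with placeholder names (gpu_worker, invert_block_on_device, n), since I haven't written the real code yet:

[code]
// Sketch only: one host thread per GPU; each thread pins itself to a
// device with cudaSetDevice() and processes blocks round-robin.
#include <cuda_runtime.h>
#include <thread>
#include <vector>

// Stand-in for the real per-block work (inversion, matrix-vector product).
void invert_block_on_device(float* d_block, size_t n) {
    // Real code would launch the inversion / matrix-vector kernels here.
    (void)d_block; (void)n;
}

void gpu_worker(int device, std::vector<std::vector<float>>& blocks,
                size_t n, int num_gpus) {
    cudaSetDevice(device);                        // pin this host thread to one GPU
    float* d_block = nullptr;
    cudaMalloc(&d_block, n * n * sizeof(float));
    // GPU g handles blocks g, g + num_gpus, g + 2*num_gpus, ...
    for (size_t b = device; b < blocks.size(); b += num_gpus) {
        cudaMemcpy(d_block, blocks[b].data(), n * n * sizeof(float),
                   cudaMemcpyHostToDevice);
        invert_block_on_device(d_block, n);
        cudaMemcpy(blocks[b].data(), d_block, n * n * sizeof(float),
                   cudaMemcpyDeviceToHost);
    }
    cudaFree(d_block);
}

int main() {
    int num_gpus = 0;
    cudaGetDeviceCount(&num_gpus);                // e.g., 4 GPUs -> 4 worker threads
    const size_t n = 4096;                        // placeholder block dimension
    std::vector<std::vector<float>> blocks(16, std::vector<float>(n * n));

    std::vector<std::thread> workers;             // core_1 -> GPU_1, core_2 -> GPU_2, ...
    for (int g = 0; g < num_gpus; ++g)
        workers.emplace_back(gpu_worker, g, std::ref(blocks), n, num_gpus);
    for (auto& t : workers) t.join();             // all GPUs work on their blocks in parallel
    return 0;
}
[/code]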


What CPU will you recommend for a Gigabyte motherboard?
i7-980X/i7-990X/X5680/X5690?

By the way, why are you not using X5690 CPUs?


CPU speed/core count doesn't make a difference so long as the chip's QPI [what we used to call the "Front Side Bus" (FSB)] is 3200 MHz.
Can you explain it in more detail?
 

What CPU will you recommend for a Gigabyte motherboard?
i7-980X/i7-990X/X5680/X5690?

I'd recommend the i7-990x if you're wed to getting a single CPU socket LGA 1366 motherboard. However, from your description of your intended matrix application and that at wikipedia, it appears to me that you ought to consider an i7-4960X [ http://www.cpu-world.com/CPUs/Core_i7/Intel-Core i7-4960X Extreme Edition.html ] and EVGA X79 DARK - LGA 2011 Intel X79 SATA 6Gb/s USB 3.0 E-ATX Motherboard with Brand New GUI BIOS (150-SE-E789-KR) [ http://www.newegg.com/Product/Product.aspx?Item=N82E16813188131 ].

By the way, why are you not using X5690 CPUs?

They weren't released when I purchased my original X5680s, and when I purchased more X5680s there was a differential of almost $600, which, to me, was too much of a price premium for the small increase in speed.

Can you explain it in more detail?
My Tyan server is used for CUDA-based Octane Render. Octane renders only on the GPU and doesn't rely at all on the speed of either CPU. So the speed of the CPUs, or the number of cores that they each have, is irrelevant, since the renderer doesn't rely on them for anything other than supplying the scene data to be rendered.

The chip's QuickPath Interconnect or QPI [what we used to call the "Front Side Bus" (FSB) http://en.wikipedia.org/wiki/Intel_QuickPath_Interconnect ] determines how fast data moves from point to point, i.e., bandwidth. LGA 1366 has a unidirectional bandwidth of 3200 MHz and a bidirectional bandwidth of 6400 MHz; LGA 2011 has a unidirectional bandwidth of 4000 MHz and, bidirectionally, 8000 MHz. So data moves faster from point to point on LGA 2011. The upshot is that when an application relies on both the GPU and CPU for processing, it's best to have the fastest processor(s) [single CPU - i7-4960X, or dual CPU - Intel Xeon E5-2697 v2 Ivy Bridge-EP 2.7GHz LGA 2011 130W 12-Core Server Processor BX80635E52697V2], the fastest bandwidth [8 GHz], the fastest PCIe slots (V3) and the fastest GPUs (currently Titans if you're coding with CUDA, or the currently faster AMD 7990 [ http://www.newegg.com/Product/Produ...deId=1&name=4096 (2048 x 2) Stream Processors ] if you're coding with OpenCL). But because you are the author of the application, you should have a much clearer understanding than I do of the roles to be played by the CPU and GPU in your application environment. If both the CPU and GPU will be doing a lot of computation, consider going with LGA 2011. One last point: you ought to consider the criticality of memory error detection/correction. In other words, you may need to stick with Xeons (which support ECC memory) [ see, e.g., http://www.cpu-world.com/CPUs/Xeon/Intel-Xeon E5-2697 v2.html ] for both systems, get error-correcting (ECC) memory [ http://en.wikipedia.org/wiki/ECC_memory ] and the higher-priced Tesla GPU cards (which use ECC memory) [ http://www.nvidia.com/object/why-choose-tesla.html ].
 

But according to this http://ark.intel.com/compare/47932,47916,77779,75283 the i7-4960X has a 5 GT/s system bus.
The i7-980X and X5680 have a 6.4 GT/s system bus.
Only the E5-2600 Xeons from this list have an 8 GT/s system bus.

The i7-980X and X5680 have a faster system bus than the i7-4960X.

Am I missing something?

One last point: you ought to consider the criticality of memory error detection/correction. In other words, you may need to stick with Xeons (which support ECC memory) [ see, e.g., http://www.cpu-world.com/CPUs/Xeon/Intel-Xeon E5-2697 v2.html ] for both systems, get error-correcting (ECC) memory [ http://en.wikipedia.org/wiki/ECC_memory ] and the higher-priced Tesla GPU cards (which use ECC memory) [ http://www.nvidia.com/object/why-choose-tesla.html ].
Tesla (Kepler) cards cost too much.
 

You're not missing anything about the bus speed of the i7-4960X. All of the post-Westmere E series i7s have this limit; but don't forget that this was done about the time Intel recalibrated clock frequency intervals at 100 MHz for Sandy Bridge and later (down from 133 MHz intervals for Westmere and Nehalem). I understood that you were interested in overclocking the CPU, and that is why I recommended that CPU - it's one of the fastest i7s out of the box and it allows for overclocking to further enhance its processing power [ but see http://www.anandtech.com/show/7255/intel-core-i7-4960x-ivy-bridge-e-review ]. Its processing potential at factory speed (factory = non-turbo 3.6 GHz / turbo 4 GHz) is greater than that of a single i7-990X (factory = non-turbo 3.46 GHz / turbo 3.73 GHz) or X5690 (factory = non-turbo 3.46 GHz / turbo 3.73 GHz). There are some single CPU Xeon options that you may wish to consider, such as the E5-1660 v2 and the E5-1680 v2, if error correction is critical, but those CPUs don't allow for much overclocking (about 3 or 4%) and are also subject to the 5 GT/s QPI/DMI limit. Like the i7-4960X, you can overclock the QPI/DMI bus by only about 3 or 4%, depending on the particular chip and the motherboard's bios. Otherwise, for comparable processing speed (and with little overclocking potential), you can buy a higher-priced E5 Xeon such as an Intel Xeon E5-2697 v2 for >$2.5K. Ouch! It just depends on your budget, needs (such as how much of the matrix function will be done by the CPU vs. the GPU), potential for cost recovery, etc.
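Rough math, assuming the standard link widths (so treat these as back-of-envelope figures): QPI carries 16 data bits (2 bytes) per direction per transfer, while the i7's 5 GT/s bus is DMI 2.0 - essentially a PCIe 2.0 x4 link with 8b/10b encoding - so the raw GT/s numbers aren't directly comparable:

QPI @ 6.4 GT/s: 6.4 x 2 bytes = 12.8 GB/s per direction
QPI @ 8.0 GT/s: 8.0 x 2 bytes = 16.0 GB/s per direction
DMI 2.0 @ 5 GT/s: 5 x 4 lanes x (8/10) / 8 = 2 GB/s per direction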

Teslas cost more than I want to pay. But I'm only rendering graphics, so the lack of error correction is not critical to my work.
 
The bad man always tries to save himself and you money, just not at the expense of quality.

I've been rethinking my approach to the next Hack, bearing in mind some of what the bad man just posted here about saving some money. Specifically: the Xeons and apropos dual-2011 motherboard vs a fast 6-core Core i7 with a single-2011 motherboard.

The primary work my Mac Pro does today is video rendering with Adobe's Premiere Pro. It's a dual 5690 system with a GTX570 card to handle CUDA. I've always thought Adobe's threading, at least with Premiere Pro, has been, shall we say: suboptimal. There appears to be a point of diminishing returns with their software of about 8 cores or so. Very rarely during processing does the software crank all 12 (24 virtual) cores in my box. Most times, the cores are sitting there twiddling their respective digital thumbs.

Folks in the Premiere Pro forums that are Windows-centric see much better overall performance from the software than us Mac guys do. Part of that is that Adobe focused their attention on the Win API with Premiere Pro, and then moved it to OS X. But I also think part of it is: folks in the Windows world with Core i7 chips can overclock the snot out of them. Then they can throw in a beefy CUDA-capable card (or with Premiere Pro CC: several CUDA-capable cards) and let the machine chew through rendering and exporting.

I'm babbling a bit. My thoughts are, perhaps: the quickest Core i7 I can get, and then bump its clock speed up to >4.5GHz. From there, add 2 or more GTX780s or even Titans. It probably won't win any Geekbench scores, but I suspect it'll rip through Premiere Pro a bit quicker than a massive 20-core IVB Xeon system will, just due to the threading.

I know you've played with Premiere Pro in the past, but I'm not sure how much. Thoughts on the matter? What I don't want to do is save some money now only to have to pay the piper later. But, I'd also like to invest my money carefully: if a 20-core IVB Xeon is going to sit mostly unused, then it makes little sense to throw $6600USD+ at it. I guess ultimately the question is: which sledge hammer should I use to smack the thumb tack known as Premiere Pro? Core count or clock cycles? I'm suspecting it's the latter, but I welcome your most valuable input, as always.
 
Let your software applications guide your build.


The CUDA rig where I currently get the best performance/$ with video rendering is the one loaded with Windows 7, running the Adobe 6 Suite. It has an overclocked i7-3930K (1 processor, 6 cores, 12 threads) on an MSI X79A-GD45 (MS-7735) motherboard, clocked by increasing the multi to run @ 4.09 GHz base, but with TB enabled to run at up to 4.6 GHz, with 16 GB DDR3 SDRAM running at 1708 MHz. If I determine that my 3XGTX295/1.8Gs aren't up to the task, then I just borrow two of my Titans from my 8XTitan/6G rig (to replace the GTX295s), tweak them in Nvidia Control Panel and with EVGA Precision X, and I'm good to go. Unlike Octane Render, Adobe Premiere Pro and After Effects do not currently scale well with more than two or three GTX GPUs. Often with more than two, the render times start to increase.

BTW - You can build such a system for about $1,750 w/o the GPUs. My Geekbench 2.0 performance on that system isn't that bad for a 6-core (28,195). But DJenkins' points, below, are well worth consideration for future purchases.
 
I've been rethinking my approach to the next Hack...

What advantage will a single 6c i7 machine have over your current dual X5690 machine?

Remember that with dual CPU setups there's usually access to more PCIe slots, making greater future GPU upgrades possible.

In regards to use of CPU power, I had a discussion with PunkNugget a while back where I ran some tests but I was mostly comparing Premiere to Avid.

It depends on what tasks WITHIN Premiere you do most as to where you will see a great advantage in more CPU power. Some effects within the app have just not been updated to newer standards, whereas others absolutely thrive on as many cores as they can get.

So many aspects of rendering are still quite reliant on CPU power. Only a few effects e.g. colour correction and specifically CUDA targeted effects will use your GPU resources. It's a matter of plugin and app developers catching up to either the CPU core count and/or GPU driven methods.

Also, choosing your editing codec wisely can increase performance. E.g. DNxHD plays nicer with more cores than ProRes if I recall correctly. The newer codecs have been written to be multi core aware. This is during importing, rendering and exporting footage.

Why don't you go with a higher clock speed, slightly lower core count dual 8c (16 total) machine? The E5-2667 v2 may be a good fit?

Would you consider moving to Windows and working with DNxHD (or whatever the most efficient codec is)? If performance really is the goal then it's worth considering. I however am a difficult person and want a do-all machine that is a mash of everything in one box so I'm personally persevering with OSX for now ;)

----------

Just going to add a VERY VERY simplistic comparison:

1CPU 6c @ 4.5GHz = 27GHz total
2CPU 16c @ 3.3GHz (4GHz turbo) = 52.8GHz total

For the times when your app will use all available cores, the 16 core will FLOG the 6c at almost double the pace.

For the time when you're only using a few cores, the 6c will only be marginally faster, especially when the 16c boosts to 4GHz which is quite impressive for a high core count machine.
 
Unlike Octane Render, Adobe Premiere Pro and After Effects do not currently scale well with more than two or three GTX GPUs. Often with more than two, the render times start to increase.

Have you validated this with Premiere Pro CC? Folks on the Premiere forums are finding excellent success with throwing GPUs at it, with performance literally scaling linearly with the hardware.

----------

What advantage will a single 6c i7 machine have over your current dual X5690 machine?

Power for more GPUs. Internal slots for more drives. Basically: everything a well-built PC can do that a 5,1 Mac Pro can't without adding external PSUs, etc.

Remember with dual CPU setups there's usually access to more PCIe slots. Making greater future GPU upgrades possible.

Very true. But again, I have to bear in mind the ol' diminishing returns.

So many aspects of rendering are still quite reliant on CPU power. Only a few effects e.g. colour correction and specifically CUDA targeted effects will use your GPU resources.

Sorta. Premiere will throw as much as it can to the GPU(s) when it comes time to export. Especially if you're doing any scaling (i.e., taking 1080p down to 720p) or whatnot. Yes, certain effects are GPU-enhanced, and as it turns out: I use a few of them. Horizontal flip and cropping are two examples.

Also, choosing your editing codec wisely can increase performance. E.g. DNxHD plays nicer with more cores than ProRes if I recall correctly. The newer codecs have been written to be multi core aware. This is during importing, rendering and exporting footage.

And this is one thing where I steadfastly will not change. My cameras produce AVCHD footage. My gaming rig produces AVI footage. I will not change codecs (i.e., transcode) before importing just to make things "work faster". The whole idea behind Premiere, and its advantage over other NLEs: it works with almost all codecs natively.

Would you consider moving to Windows and working with DNxHD (or whatever the most efficient codec is)?

Never. For me: Windows has one use and one use only: games.

1CPU 6c @ 4.5GHz = 27GHz total
2CPU 16c @ 3.3GHz (4GHz turbo) = 52.8GHz total

I know this math has been used before, but it doesn't actually work that way if the application can't thread itself well. It'll spend more time context-switching than actually doing real work. I still believe Premiere is one such app. Photoshop is another. But I'd be happy to be proven wrong.
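One way to put numbers on that hunch is Amdahl's law, where p is the fraction of the work that actually parallelizes and n is the core count:

speedup(n) = 1 / ((1 - p) + p/n)

Using DJenkins' two configs and guessing (and it is just a guess) that only half of a Premiere job parallelizes (p = 0.5): the 6c @ 4.5GHz gives 4.5 x 1.71 = ~7.7 "effective GHz" vs 3.3 x 1.88 = ~6.2 for the 16c, so the 6-core wins. At p = 0.95 it flips: ~21.6 vs ~30.2 in the 16-core's favor. Core count only pays off if the app actually threads.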
 
Only with my Titans, GTX580Cs and GTX680s on my Tyan Server, MSI and EVGA motherboards running AE CC trial version, but not with Premiere Pro CC.

If you're game at some point, install the PP CC trial, find and delete the CUDA text file, and see how it reacts to your collection of GPUs. I'd be interested in hearing your results, because I don't think anyone's had the opportunity to throw that many GPUs at it.

----------

Please bear in mind that all of my comments responsive to your most recent inquiry relate solely to my experiences under Windows 7 OS.

Yep, I always bear that in mind. We're trying to accomplish similar things, using slightly different toolsets. :)
 
Power for more GPUs. Internal slots for more drives. Basically: everything a well-built PC can do that a 5,1 Mac Pro can't without adding external PSUs, etc.

Ahh ok I thought you had an SR-2 hackintosh already, didn't realise the 5690's were in a 5,1 machine.

Changes a few things then I guess... e.g. I thought you had 7 PCIe slots to play with already :eek:

Well your choice should be made easier by your fixed constraints of OS, codec and cameras.

Can you confirm if your AVCHD footage makes use of all cores or not when carrying out simple tasks and exporting without effects etc?

If not then a highly clocked single CPU machine will be a much better option - the unused cores won't be sitting there wasted and the higher clock speed can be utilised to pack & unpack the AVCHD codec faster.

If it does use all cores, then what I was saying before (and by my dodgy GHz example) is that the advantage of more cores will clearly outweigh the drop in clock speed.

It can all come down to what you want out of the machine - if you're doing long form work and have 3 hr export times killing you, then a machine effectively firing on 16 cores may help best. If you're doing shorter projects and finding the AVCHD footage sluggish to work and interact with, your 6c build may be the answer :)
 
If you're game at some point, install the PP CC trial, find and delete the CUDA text file, and see how it reacts to your collection of GPUs. I'd be interested in hearing your results, because I don't think anyone's had the opportunity to throw that many GPUs at it.
I'll try to get that info posted here by October 10th. Is there some specific test file that you'd like for me to run on my Tyan?
Yep, I always bear that in mind. We're trying to accomplish similar things, using slightly different toolsets. :)
That (Windows 7) applies only to my video and 3d work; for almost everything else, I'm all OSX.
 
I'll try to get that info posted here by October 10th. Is there some specific test file that you'd like for me to run on my Tyan?

A well-known Premiere Pro benchmark package exists here. It says it's for CS6, but it'll run just fine with CC. It consists of a variety of tests with Premiere Pro, all run under the watchful eye of a VB script (bleah!). So, it can't easily be ported to OS X. If you look through the results page, you'll see they're all basically Windows machines.

It might be interesting to see how your rig(s) stack up...
 
Obviously, I find it hard to throw anything away if I can put it to any use.

Ahh ok I thought you had an SR-2 hackintosh already, didn't realise the 5690's were in a 5,1 machine.

I work with a vast mix of machines, some with Linux, some with OSX, some with various Windows OSes, and some with ancient OSes, with hardware ranging from 2007 and 2009 Mac Pros, and self builds with 4, 6, 8, 12, 16, and 32 cores, as well as Dell, HP and Aspen System workstations (Circa 1996 - 2006) and my trusty more ancients - Tandys, Apples, Ataris and Amigas (Circa 1984 - 1993?). The vast majority of my systems are overclocked, and even more so in the case of my ancient ones. So, in the future I'll try to be as specific as possible about the details of the platform that I'm referencing.

BTW - Has anyone seen my Tandy 1000 or 100 recently - I can't find them.
 
Last edited:
A well-known Premiere Pro benchmark package exists here. It says it's for CS6, but it'll run just fine with CC. It consists of a variety of tests with Premiere Pro, all run under the watchful eye of a VB script (bleah!). So, it can't easily be ported to OS X. If you look through the results page, you'll see they're all basically Windows machines.

It might be interesting to see how your rig(s) stack up...

Thanks Jason, I'll get it done. Is there a rig, other than the Tyan, that you'd like for me to test, bearing in mind that testing other rigs could take more time?
 
Thanks Jason, I'll get it done. Is there a rig, other than the Tyan, that you'd like for me to test, bearing in mind that testing other rigs could take more time?

Not particularly. I'm just mostly curious to see how your multi-GPU test turns out. If you're game and have time, pull the GPUs such that you start with 1. Do the tests, then add the GPUs 1 at a time, and repeat. See if your times drop linearly as you add resources.
 
Not particularly. I'm just mostly curious to see how your multi-GPU test turns out. If you're game and have time, pull the GPUs such that you start with 1. Do the tests, then add the GPUs 1 at a time, and repeat. See if your times drop linearly as you add resources.

OK, I'll do that with the Tyan, going from one, to two ..., to eight Titans.
 
hardware ranging from 2007 and 2009 Mac Pros, and self builds

Hi Tutor I was referring to jasonvp's comments here:

...The primary work my Mac Pro does today is video rendering with Adobe's Premiere Pro. It's a dual 5690 system with a GTX570 card to handle CUDA...

I always thought from previous posts on here that he had a hackintosh already... even though 10 words before saying "dual 5690 system" he clearly says "Mac Pro" :eek: oops!

I'm mostly aware of your array of machines... although it's hard to keep up sometimes :D
 
Have you ever misplaced your nose? I did, but now I've found it.

I. Was It Just My Imagination Running Away With Me?

A. I've always considered Supermicro to consistently make some of the best-designed, best-built and sturdiest motherboards on the market, with excellent components. But given their customer base (mainly the enterprise market), their motherboards had seemed, to me, to lack pizzazz.

B. Imagine running stably two 12-core Ivy Bridge E5-2697 v2 at 1.0625 * 2.7 GHz = 2.87 GHz or better yet at 1.0755 * 2.7 GHz = 2.90 GHz (whereas they normally run at 2.7 GHz) and imagine running those CPUs so that their highest turbo boost level is 1.0625 * 3.5 GHz = 3.72 GHz or better yet 1.0755 * 3.5 GHz = 3.76 GHz (whereas they normally turbo boost up to 3.5 GHz) or

C. Imagine running stably two 8-core Ivy Bridge E5-2687W v2 at 1.0625 * 3.4 GHz = 3.61 GHz or better yet at 1.0755 * 3.4 GHz = 3.65 GHz (whereas they normally run at 3.4 GHz) and imagine running those CPUs so that their highest turbo boost level is 1.0625 * 4.0 GHz = 4.25 GHz or better yet 1.0755 * 4.0 GHz = 4.30 GHz (whereas they normally turbo boost up to 4.0 GHz) and

D. Imagine being able to run faster memory than standard and being able to tweak it to run even faster on a dual CPU Sandy/Ivy Bridge system.
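
(Those percentages are nothing more than the BCLK ratio: on Sandy/Ivy Bridge, every core clock is the multiplier times the BCLK, nominally 100 MHz, so an indicated BCLK of 106.25 or 107.55 MHz scales base and turbo speeds alike. A worked example with the E5-2697 v2's stock 27x multiplier:

27 x 100.00 MHz = 2.70 GHz (stock)
27 x 106.25 MHz = ~2.87 GHz
27 x 107.55 MHz = ~2.90 GHz)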

II. The Problem:

Here, in part, is what I posted above, in post no. 17, on Mar 8, 2012.

SRX announced by EVGA. There's good news and there's bad news. The good news is that the SRX is an overclockable motherboard, which is soon to be released. However, the bad news is very bad - E5 8-cores have locked multipliers and locked - inaccessible - CPU straps, so until Intel releases fully unlocked E5's there'll be no overclocking beyond what could have been done to Sandy Bridge non-K chips in the past, i.e., overclock limited to about 6-8 percent. ... .

What Intel had done, besides locking the Sandy Bridge Xeons, was to put a whole host of functions that had been separately configurable via some bioses on some Nehalem and Westmere motherboards under the control of DMI. So if you overclocked or underclocked the Xeons, it also affected a whole host of things, such as your PCIe frequency, HDDs, SSDs, USB, video, etc. That's not a pretty sight. On Intel's enthusiast CPUs, you could pay a toll by getting a CPU with a "K" in its model name, and for that toll you got "straps" to hold everything else down so you could tweak the CPU and the memory a little more liberally. But Xeons couldn't be strapped down. Ask Sandy Bee Xeon out for a date and you have to take the whole family out too.

Well, some things do change. EVGA discontinued the SR-X earlier this year because of poor sales, no doubt due to the limited clock tweaking allowed by Intel on the Sandy Bridge E5 Xeons. Intel's locking of the E5 Xeons made them no more malleable than the Sandy Bridge non-K chips of the past; very few of them could be overclocked beyond 6%, and most only about 3 to 4%. So when EVGA discontinued the SR-X (a dual CPU motherboard), that left this minuscule clock-tweaking space to Asus and its purportedly pseudo-tweakable dual CPU Z9PE-D8 WS, which, let's just say, I hate (I'm not alone - at Newegg, for example, this board's lack of quality led to 51% of its raters giving it 1 or 2 eggs vs. only 39% giving it 4 or 5 eggs). So I've labored under the impression that there was no reliable solution to get that extra 3 to 4% out of a Sandy Bridge dual CPU setup. But at some point in the past, I was laboring erroneously. Yes, the Bad Man was (or had become) WRONG (again). At many points since June 2013, I failed to do my usual detective work when visiting Geekbench's browser.

III. The Solution:

When I was a 100% devotee of overclocking, I became familiar with a person whose nic is Movieman. His real name is Dave. On May 05, 2013, Movieman posted a Windows 64-bit Geekbench 2 score of 45,043, which was the highest 16-core score at that time. He tested a dual Intel Xeon system housing E5-2687Ws running on a Supermicro X9DAX motherboard. What I failed to note, until tonight, was that his Geekbench details indicate that the chips were running at 3.293 GHz. I have a pair of those chips and I know that at factory speed they run at 3.1 GHz, but that just didn't catch my eye. What helped me to note that discrepancy were Movieman's more recent posts, and in particular this one: Sep 16, 2013 - Supermicro X9DAX - Intel Xeon E5-2687W v2 - 3.656 GHz - 16 cores - Windows 64-bit score = 49,366, which is now the highest 16-core score [ http://browser.primatelabs.com/geekbench3/multicore and http://browser.primatelabs.com/geekbench2/top ]. I had just been viewing another thread where members were discussing what CPUs they expected in the 2013 Mac Pro lineup and comparing what was stated with CPU World's site data, so that 3.656 GHz figure just leaped out before my eyes, because that figure should have been 3.4 GHz. So I asked myself, "Why were Movieman's scores based on speeds that exceed factory speed on a Supermicro motherboard?" Then I went to Supermicro's site and this is what I found:


"Hyper-Speed Solutions

Supermicro delivers the industry's fastest, most powerful server solutions with compute speed and reliability as the primary focus, targeting mission critical applications. Built upon the X9DAX series motherboard, Supermicro is able to enhance the highest performance Intel®Xeon® processors E5-2600/E5-2600 v2 family (up to 150W TDP) with Hyper-Speed, achieving application performance improvements up to 30%. Supermicro' Hyper-Speed Solutions are available in Tower, 4U and 2U rackmount configurations. Applications that benefit from the high-speed processing power of these solutions include HFT, Computational Finance, EDA, HPC, Scientific and Energy Research.

DP Hyper-Speed Motherboards:
(A) X9DAX-iF [base model includes Intel® i350 Dual Port Gigabit Ethernet - Virtual Machine Device Queues reduce I/O overhead - Supports 10BASE-T, 100BASE-TX, and 1000BASE-T, RJ45 output, 1x Realtek RTL8201N PHY (dedicated IPMI) and 2 IEEE 1394a Headers],
(B) X9DAX-iTF [just above base model - adds to base: (i) Serial Port / Header - 1 Fast UART 16550 Header and (ii) Intel® X540 Dual Port 10GBase-T - Virtual Machine Device Queues reduce I/O overhead -Supports 10GBase-T, 100BASE-TX, and 1000BASE-T, RJ45 output]],
(C) X9DAX-7F [just below highest priced model - adds to base: (i) Serial Port / Header -1 Fast UART 16550 Header, (ii) 4 more SATA 2.0 ports (3Gbps), and (iii) 8 SAS2 ports (6Gbps)] , and
(D) X9DAX-7TF [highest priced model - adds to base: (i) Serial Port / Header -1 Fast UART 16550 Header, (ii) 4 more SATA 2.0 ports (3Gbps), (iii) 8 SAS2 ports (6Gbps)] and (iv) Intel® X540 Dual Port 10GBase-T - Virtual Machine Device Queues reduce I/O overhead -Supports 10GBase-T, 100BASE-TX, and 1000BASE-T, RJ45 output]..." [ http://www.supermicro.com/products/nfo/Hyper-Speed.cfm ] The X9DAX was released March 8, 2013.

At this stage, I knew that I had a nose. Then I downloaded the manual for the Supermicro X9DAX, which states, in relevant part:

"4-4 Overclocking
Use this submenu to override selected CPU voltage settings. Warning: Overclocking may cause system instability and is not recommended by Supermicro for standard use of the product.

CPU Voltage
Use this feature to override the CPU Voltage settings specified by the manufacturer. The VID+Offset options range from 0m Volts to 500m Volts.

CPU BCLK
Use this feature to override the CPU BCLK (Base Clock) settings specified by the manufacturer. The options are Auto, BCLK 101, BCLK102, BCLK 104, BCLK105, and BCLK 106.

System Recovery Mode
Use this feature to select the recovery mode setting. When Auto is selected, the BCLK and memory speed will return to default settings after a system reset by the Watchdog Timer. When Disabled is selected, BIOS settings will not change on a system reset by the Watchdog Timer. When Load Defaults is selected, all settings on the Overclocking submenu will return to default values after a system reset by the Watchdog Timer. The options are Auto, Disabled, and Load Defaults.

Memory Voltage
Use this feature to override the Memory Voltage settings specified by the manufacturer. The VID+Offset options range from 0m Volts to 310m Volts.

DDR Speed
Use this feature to force a DDR3 memory module to run at a frequency other than what is specified by the manufacturer. The options are Auto, Force DDR3-800, Force DDR3-1066, Force DDR3-1333, Force DDR3-1600, Force DDR3 1866, and Force SPD.

tCL, tRCD, tRP
Use the above items to set the tCL (Cas Latency), tRCD (Row to Col Delay), and Ras Precharge values. The options are Auto, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, and 18.

tRAS
Use this item to set the Ras Active Time value. The options are Auto, 12, 13, 14, 15, 16, 17, and 18.

tRRD
Use this item to set the minimum tRDD (Row Active to Row Active Delay) value. The options are Auto, 19, 23, 27, 32, 36, and 41.

tRRD
Use this item to set the minimum tRDD (Row Active to Row Active Delay) value. The options are Auto, 4, 5, 6, 7, and 8.

tWR
Use this item to set the minimum tWR (Write Recovery) time. The options are Auto, 3, 4, 5, 6, 7, and 8.

tRTP
Use this item to set the minimum internal tRTP (Read to Precharge) command delay time. The options are Auto, 3, 4, 5, 6, 7, and 8."

After reading this, the solution to my temptations became as clear to me as the nose on my face is now. It was not my imagination. Some things do change even more than I would ever guess: Supermicro makes a motherboard that allows you to tweak the CPU (albeit minimally) and the memory on Sandy and Ivy Bridge Xeon systems.

This power and stability does come at a price (to begin with, the motherboards cost between $519 and $872). See, e.g., [ http://www.costcentral.com/proddetail/SUPERMICRO_X9DAX_7F/MBDX9DAX7FO/11854973/ ], [ http://www.costcentral.com/proddetail/SUPERMICRO_X9DAX_7TF/MBDX9DAX7TFO/11854972/ ], [ http://www.costcentral.com/proddetail/SUPERMICRO_X9DAX_iTF/MBDX9DAXITFO/11847193/ ] and [ for best pic see first following URL http://shopcomputech.com/system-com....html?___store=english&___from_store=canadian or http://www.costcentral.com/proddetail/SUPERMICRO_X9DAX_iF/MBDX9DAXIFO/11726442/ or http://www.acmemicro.com/Product/11...DDR3-SATA3-RAID-GbE-HD-Audio-PCIe-eATX-Retail or http://www.compsource.com/pn/MBDX9D...9daxIf-Motherboard-EAtx-C602-2011-512gb-Ddr3/ or http://www.pcsuperstore.com/products/11726442-SuperMicro-MBDX9DAXIFO.html or, for the best price - $519, including shipping, for the base model as of 9/28/13 - http://www.xpcgear.com/supermicro-mbd-x9dax-if-o.html ].

This is the chassis that you'll need [ http://www.provantage.com/supermicro-cse747tgr1400bsq~7SUPM33Q.htm ]. It costs $774 at Provantage, plus about $50 for shipping, depending on your location. So, the total cost for the case w/ dual 1400W Gold Level redundant power supplies and the base motherboard is about $1,350; plus you'll need to supply:
(1) CPU heat sinks w/ fans [est = $72 for 2 - http://www.wiredzone.com/Supermicro...e-CPU-Heat-Sink-for-X9-UP---DP~10021908~0.htm ],
(2) 128 gigs of ram [est = $1,800 for 2x CORSAIR Dominator Platinum 64GB (8 x 8GB) 240-Pin DDR3 SDRAM DDR3 2133 Desktop Memory Kits - http://www.newegg.com/Product/Product.aspx?Item=N82E16820233363 ],
(3) storage [est = $400 for two Mushkin Enhanced Atlas Series MKNSSDAT240GB-DX mSATA 240GB SATA III MLC Internal Solid State Drives (SSD) - http://www.newegg.com/Product/Product.aspx?Item=N82E16820226321
and
est = $400 for a couple of Western Digital WD SE WD3000F9YZ 3TB 7200 RPM 64MB Cache SATA 6.0Gb/s 3.5" Internal Hard Drives - http://www.newegg.com/Product/Product.aspx?Item=N82E16822236521 ],
(4) video [est = $1,000 for an EVGA 06G-P4-2790-KR GeForce GTX TITAN 6GB 384-bit GDDR5 SLI Support Video Card - http://www.newegg.com/Product/Product.aspx?Item=N82E16814130897 ],
(5) two of those pesky CPUs [est = $4,400 for 2 Intel Xeon E5-2687W v2 Ivy Bridge-EP 3.4GHz 20MB L3 Cache LGA 2011 150W 8-Core Server Processors - http://www.newegg.com/Product/Product.aspx?Item=N82E16819116937 ], and
(6) another $200 for your OS(es) of choice.
EST TOTAL = ~ $9,700, and that includes some extra for shipping.
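(The itemized sum is $1,350 + $72 + $1,800 + $800 + $1,000 + $4,400 + $200 = $9,622; the rest of the ~$9,700 is the shipping cushion.)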

What did you say? I heard, "Will the Bad Man never stop trying to spend my money?" Who'd want to drop $9,700 on a machine, made mostly by Supermicro, that gets lousy Geekbench 2.0 and 3.0 scores in excess of 48,000 points and attains Cinebench 11.5 scores in excess of 29 points w/ 2 fast-tweaked E5-2687W v2s running at about 3.9 GHz on all 16 cores? One that has 128 gigs of fast-tweaked ram, a GTX Titan for CUDA chores, 480 gigs of fast SSD raid 0 storage and 6 T of fast raid 0 HDD storage with room for even more, leaving you, from the above configuration, a single empty x16 full-length PCIe slot, an empty x8 half-length PCIe slot, and an empty 1x UIO PCI-E 3.0 x8 slot (for an SMC UIO HBA) or 1x PCI-E 2.0 x4 (in x8) slot for audio/video, etc., assistive cards - ALL IN ONE CASE? If there is just one of you whom these posts help in any way, then I'm satisfied. And, of course, you're welcome to shave these recommendations where they have whiskers, or to trim the perceived fat from them to stay on your $$ diet; or, if you're into big system building and the recommendations appear to be skinny in places, just pack more muscle on the system where you need to.

If I've recommended to you a 2 CPU solution in the past and you haven't pulled the trigger, you may need to give these Supermicro motherboards consideration. If I haven't recommended to you a 2 CPU solution in the past, then you may want to re-visit this post later.


PS - (1) Dave says in his Sandy Bridge thread about his SM X9DAX-iF motherboard that he got a 6% CPU overclock easily and stable. http://www.xtremesystems.org/forums/showthread.php?286416-Supermicro-X9DAX-iF..Almost-perfection!
(2) Here's what Dave says in his Ivy Bridge thread about his SM X9DAX-iF motherboard: "Solid as a rock with the IB xeons at up to an indicated 107.55 BCLK and that is 3871MHz on all cores with the E5-2687W V2's" [ http://www.xtremesystems.org/forums...iF-and-Ivy-Bridge-xeons&p=5206937#post5206937 ].
(3) SM X9DAX will hold only one double wide x16 PCIe card (and up to 3 single wide half length x8 cards and a single width x16 card if you put your double wide card in the outer x16 PCIe slot).
 
The EVGA X79 DARK is not a Mac-friendly board. I had many issues under Mac when testing. The EVGA SR-X works great and is the most stable dual socket LGA 2011 motherboard that I have tested to date. Too bad they stopped making it...

Ivy Bridge E chips require a modded kernel, as Apple is only using internal builds of 10.8.5 and 10.9 DP for testing the new Mac Pro. Still can't get the kernel to work under 10.9 DP. The performance of the Intel Core i7-4930K @ 4.2 GHz beats out every Geekbench score of the new Mac Pro released to date: 24,500-25,000 points. PM [power management] does not work OOB, but with a patched AICPM.kext it is working fine.

The Asus Rampage 4 Extreme and Gigabyte UP4 motherboards are my favorite boards at this time for Mac. However, all Asus, EVGA, Gigabyte, ASRock, Intel and other boards work great.
 