... . And not really that expensive either... Did I notice the price on one of those as under $2K?

Tyan makes the system and the other two are resellers. The lowest price that I've seen for that system is $4,700, which, while not cheap, is still not a bad price for the expansion allowed, even once you add the prices of the HDs, RAM and CPUs. This is especially so when you consider what Cubix and Magma charge for their always-headless, 8-slot, double-wide chassis: http://www.bhphotovideo.com/c/product/789309-REG/Cubix_XPRM_X16_82_OSV_2x_Compute_Segments_4x.html (Price: $6,974.07); http://www.bhphotovideo.com/c/search?sts=ma&Ns=p_PRICE_2|1&N=0&srtclk=sort&Ntt=magma (Price: $8,274.95). I'd rather spend my money on a full computer that costs less and still gives me the same number of slots and even heftier and more numerous PSUs. I like my Macs, but not so much that it blinds me into paying through the nose just to avoid running another OS.

Man... those would be a kick to hackintosh up! ... .

While Sandy Bridges are faster than Westmeres, for Hackintoshes the Westmeres still rule: native power management works on them, so you can get better performance from them, but no one has YET been able to get it to work satisfactorily with Sandy Bridge CPUs. That means that Hackintoshes using Sandy Bridge CPUs currently don't Turbo Boost correctly and thus are slower than their Westmere counterparts. If anyone is interested in Hackintoshing an 8-PCIe-slot, double-wide server, I'll look for one based on Westmere CPUs. However, whether based on Westmeres or Sandy Bridges, and whether running their intended OS or as Hackintoshes, they're still faster than all of Apple's current offerings.
 

I have helped create several dual-CPU Sandy Bridge-E Xeon based systems and they work fine. However, the newest chips can't be overclocked, while the Westmeres can, which gives the Westmeres the edge.
 
Keplers are now the recommended cards for Octane render.

It now seems that Otoy has rewritten the Octane renderer to take better advantage of the higher single precision floating point peak performance of the Kepler cards, for here is what the user manual now says: "We recommend to use GPUs based on the Kepler architecture as these cards have more memory and consume less power than Fermi GPUs, but are just as fast with OctaneRender™." (Compare to post #512, above.) So the scramble to find top-end Fermi cards for Octane should now end, and those who own GTX 600 series cards and want to use Octane can rejoice.
 
Consolidation is in order.

Here are my current thoughts on reconsolidating my GPGPU compute render farm (see sig. below for current structure).

I. CUDA/Nvidia OCL

a. Pairing Group 1
{(1) 3xGTX580/3G - EVGA SR-2 w/dual x5680s (aka "WolfPack1" GB2 = 40,100);

(2) 3xGTX580/3G - EVGA SR-2 w/dual x5680s (aka "WolfPack2" GB2 = 40,051)};

b. Pairing Group 2
{(3) 4xGTX480/1.5G - Supermicro 4 double-wide PCI-e slot system w/5th stump card slot w/dual x2680s (aka "WolfPack3" - under construction);

(4) 3xGTX680/4G - GA-7PESH3 w/dual x2687Ws (aka "WolfPack4" - under construction)};

c. Pairing Group 3
{(5) 6xTitan/6G + 1xGTX690/4G - Tyan server w/dual x5580s/x5675s or x2680s, depending on the Tyan model chosen (maybe I'll name it "AlphaCanisLupus0"); and

(6) 3xGTXTitans - GA-7PESH3 w/dual x2680s (aka "WolfPack5" - under construction)}.

----------------------------------------------------------------------

(7) 1xGTX680/4G - MacPro 2009->2010->2012 2x5675s;

(8) 3xGTX295/1.8G - Gigabyte UD5 i7-980x (aka "CubPack1" GB2 = 21,824).

----------------------------------------------------------------------
II. ATI OCL

(1) 2xATI5970/2G - Supermicro System-8047R-7RFT 4x4650s (aka "WolfPackPrime0" GB2 = 58,027) and

(2) 1xATI5970/2G - Gigabyte EX58-UD5 i7-980X (aka "CubPack2" GB2 = 21,000+).

{} represents pairing for Infiniband cards.


Early this summer, I intend to perform the personality makeover discussed in posts nos. 609 and 615, above, changing the identities of all of the cards in the second half of each pairing group, as well as that of at least one card in the first half of each pairing group, to their Tesla counterparts. Then I can take advantage of the RDMA for GPU Direct feature of CUDA. Because each system has an empty slot, I can insert an Infiniband card in each one. Theoretically, all twenty-three of the Nvidia cards in systems (I)(1) through (I)(6) will then behave as one system when performing CUDA renders if I connect them all by Infiniband by way of an extremely expensive Infiniband switch box (and do the makeover to all of the GPU cards in the slave render boxes). Or maybe I'll just connect those six systems into three separate pairs, as shown above, to avoid the immediate cost of an Infiniband switch box, since just two systems can be connected by Infiniband without a switch. It's hard to imagine how much CUDA muster 9 Titans [akin to Tesla K20s and K20Xs] plus a GTX690 [akin to a Tesla K10] {or 6 GTX580s [akin to Tesla M2090s]} {or 4 GTX480s [akin to Tesla C2050s, C2070s and C2075s] and 3 GTX680s [akin to Tesla K10s]} can yield by just pairing, not to mention having them all connected by the same Infiniband switch box. That would be the equivalent of having 9 Tesla K20Xs overclocked, 4 Tesla K10s overclocked (but 3 with only half the Tesla-standard number of cores), 6 Tesla M2090s overclocked, and 4 Tesla C2075s overclocked, with all of them, except the Titans, having somewhat less memory, and none having ECC memory's advantage (error correction) or disadvantage (lower speed).
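As a sanity check on that twenty-three figure, here is a quick tally of the cards in pairing groups 1 through 3 above (just Python bookkeeping over the numbers already listed; the group labels are shorthand, not system names):

```python
# Quick tally of the Nvidia cards in systems (I)(1) through (I)(6) above,
# grouped into the three Infiniband pairing groups. Counts come straight
# from the list in this post.
pairing_groups = {
    "Pairing Group 1": {"GTX 580": 3 + 3},               # WolfPack1 + WolfPack2
    "Pairing Group 2": {"GTX 480": 4, "GTX 680": 3},     # WolfPack3 + WolfPack4
    "Pairing Group 3": {"Titan": 6 + 3, "GTX 690": 1},   # AlphaCanisLupus0 + WolfPack5
}

total = 0
for group, cards in pairing_groups.items():
    count = sum(cards.values())
    total += count
    print(f"{group}: {count} cards  {cards}")
print(f"Total Nvidia cards across the three pairs: {total}")   # prints 23
```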
 
Stage One Consolidation - WolfPackAlphaCanisLupus0

Here are pics of the Tyan-based WolfPackAlphaCanisLupus0. This WolfPack has 2 Westmere Xeon X5675s, each running at 3.07 GHz. It will have 72 GB of RAM. It has two hard drives: a 4 Terabyte Barracuda and a 3 Terabyte Barracuda. It will run multiple OSes.
 

Attachments

  • AlphaCanisLupusIntEmpSide1.jpg
  • AlphaCanisLupusIntFulSide2.jpg
  • AlphaCanisLupusExtFront.jpg
More on WolfPackAlphaCanisLupus0

Here are some measures of AlphaCanisLupus0. According to the CUDA-Z utility, AlphaCanisLupus0 has over 16 Tflop/s [2,078 Gflop/s x 8 cards = 16,624 Gflop/s] of double precision peak floating point performance. Since each Titan has over 4.5 Tflop/s of single precision peak floating point performance at stock, and 6.26 Tflop/s as I've clock-tweaked them [4.5 x 1.39 overclock = 6.26], AlphaCanisLupus0 has over 50 Tflop/s [6.26 x 8 = 50.1] of single precision peak floating point performance.
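For anyone who wants to check those figures against the hardware specs rather than CUDA-Z's readout, here's a minimal sketch of the usual peak-FLOPS arithmetic (execution units x 2 FLOPs per clock for FMA x clock rate). It treats all eight cards as clock-tweaked Titans, as the figures above do; the core counts are the published GK110 Titan specs, so take it as an estimate, not a measurement:

```python
# Peak-FLOPS estimate for eight clock-tweaked GTX Titans, using the standard
# formula: execution units x 2 FLOPs per clock (FMA) x clock rate.
# Core counts are published GK110 Titan specs; the 1.39x factor is my tweak.
SP_CORES = 2688          # CUDA cores per GTX Titan
DP_UNITS = 896           # FP64 units per GTX Titan (1/3 of the SP cores)
BASE_CLOCK_GHZ = 0.837   # stock base clock
OVERCLOCK = 1.39         # clock tweak
GPUS = 8

clock = BASE_CLOCK_GHZ * OVERCLOCK                 # ~1.16 GHz
sp_card = SP_CORES * 2 * clock / 1000.0            # single precision, TFLOP/s
dp_card = DP_UNITS * 2 * clock / 1000.0            # double precision, TFLOP/s

print(f"SP: {sp_card:.2f} TFLOP/s per card, {sp_card * GPUS:.1f} for the box")
print(f"DP: {dp_card:.2f} TFLOP/s per card, {dp_card * GPUS:.1f} for the box")
# Roughly 6.3 SP and 2.1 DP TFLOP/s per card -> about 50 and 16.7 for the
# box, in line with the CUDA-Z readings above.
```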

I'm in the process of running ATI-centric OCL tests here: http://www.luxrender.net/luxmark/. Next, I'll run CUDA tests.

Update - see one CUDA test at post no. 13 here - https://forums.macrumors.com/showthread.php?p=17321143#post17321143 .
 

Attachments

  • UnigineBenchMark.jpg
  • AlphaCanisLupusGPU-Z.jpg
  • AlphaCanisLupusCUDA-Z.jpg

Outstanding. This setup is beyond words.
 
Wow, you are moving on up there, congrats!
I can say I knew you when.

Maybe I could make the "Overall Others." There is nobody there.

Thanks. I don't put too much stock in the score differences between Nvidia and ATI cards in Luxmark, since I've been told that the metrics were coded for ATI. However, I did run the tests for 4, 5, 6, 7 and 8 Titan cards to get a better assessment of how linearly OCL performance scales on Titans.
 
Holy smokes...


This one shouldn't cause any smoke. It supplies 3,000 watts through its three 1,200 W PSUs on 120 V lines, each connected to an outlet on a separate circuit breaker.
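For the curious, here's the rough wall-socket arithmetic behind putting each PSU on its own breaker (a sketch assuming standard US 120 V branch circuits; the 90% PSU efficiency figure is my assumption, not a spec):

```python
# Rough wall-socket load check for three 1,200 W PSUs, each on its own
# 120 V circuit. The 90% efficiency figure is an assumption, not a spec.
PSU_WATTS = 1200
VOLTS = 120
EFFICIENCY = 0.90

draw_watts = PSU_WATTS / EFFICIENCY        # wall draw if a PSU runs flat out
amps = draw_watts / VOLTS                  # ~11.1 A per circuit

print(f"Per circuit at full PSU output: {draw_watts:.0f} W ~ {amps:.1f} A")
# ~11 A per circuit sits just under the 80% continuous-load guideline for a
# 15 A breaker (12 A), which is why each PSU gets its own breaker rather
# than all three sharing one.
```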

I'm thinking about painting that grill silver to match the rest of the case and to further enhance the hackintosh motif. You can purchase the barebones system here: http://www.newegg.com/Product/Product.aspx?Item=N82E16816152125 . It was a whole lot less expensive than going the underpowered, headless 8-slot PCIe external chassis route [ http://www.bhphotovideo.com/c/searc...&N=0&InitialSearch=yes&sts=ma&Top+Nav-Search= ], even when adding the cost of getting it configured as a whole additional system.

I included that Unigine benchmark especially because of your thread on bogus benchmarks.
 
WolfPackAlphaCanisLupus0 - Xeon X5675s performance

Although not a primary motivating factor in my building WolfPackAlphaCanisLupus0, its CPUs, though no match for WolfPackPrime0's Cinebench score of 48.5 nor WolfPack1's and WolfPack2's Cinebench scores of 24.7+, perform as I expected and make a nice addition to the CPU side of my render farm.
 

Attachments

  • Cinebench11.5 xCPUCapture.JPG
This has got to be the most powerful GPU rendering solution ever seen in a single box... by a huge margin I'm sure!

Fantastic work, Tutor; it looks like it will be really nice and tidy in that case as well.

How about the cooling? Those cards would generate an enormous amount of heat, and most people with Titans would be reaching straight for a watercooled solution...
 

Those 3 fans shown in the pics rapidly push the heat far out of the case and keep the cards between 55 and 60 degrees C when overclocked about 1.4x and constantly rendering. The eight cards don't generate as much heat as I thought they would, and 60 degrees C is quite cool considering the loads.
 
GTX-480, GTX-680 and GTX-Titan Performance on Octane Demo

Here's how four of each of the following three GPUs (all untweaked) performed, using the PMT setting, on the OctaneBenchmark Scene (see, e.g., pic below) in the OctaneRender_1_0_DemoSuite, running OctaneRender Demo 1.0 [ http://render.otoy.com/downloads.php ] :
1) four Galaxy GTX-680s (54 sec),
2) four EVGA GTX-480s (53 sec), and
3) four EVGA GTX-Titans SC (24 sec).
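Put another way, dividing those times gives the relative speeds (a trivial sketch using only the numbers reported above):

```python
# Relative speed from the OctaneRender demo times above (lower is faster).
times_sec = {"4x GTX 680": 54, "4x GTX 480": 53, "4x GTX Titan SC": 24}
baseline = times_sec["4x GTX 680"]
for cards, secs in times_sec.items():
    print(f"{cards}: {secs} s -> {baseline / secs:.2f}x vs the four 680s")
# The four Titans land at roughly 2.2x the speed of either the quad-680 or
# the quad-480 set on this scene.
```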
 

Attachments

  • 4GTX480sOctaneDemoSceneCapture.JPG
4 GTX680s vs 2 Xeon 5680s Performance in Blender Cycles

Here's how four tweaked GTX-680s vs 2 tweaked Xeon 5680s alone performed on BlenderProjects_demos261-1_Cycles-Testfiles_BMX-MikePan.blend (see pics below) [ http://www.blender.org/development/release-logs/blender-261/blender-261-demo-files/ ], using Blender 2.67 :
1) two Xeon 5680s alone (15:19.19 min) and
2) with four Galaxy GTX-680s (3:11.63 min).
With 4 GTX-680s, the system rendered the scene in almost 1/5th of the time it took relying only on its 2 Xeon 5680 CPUs. Each GTX-680 appears to have cut the render time by almost 3 minutes on average.
The GTX-680s were overclocked about 5% each. The X5680s were tweaked to yield, together, Geekbench 2 scores of 35,000+ and Cinebench 11.5 scores of 22.00+ by underclocking and turbo biasing each CPU (turbo ratios of 13,13,13,13,14,14; idle clock of 2.21 GHz; max turbo of 4.6 GHz).
 

Attachments

  • BlenderDemos261w2X5680s.JPG
  • BlenderDemos261w4GTX680s.JPG
Here's how four tweaked GTX-680s vs 2 tweaked Xeon 5680s alone performed on BlenderProjects

You've certainly lost me here.
How do you get four gpus versus two cpus?
Why did you use the 680s instead of the Titans?
Where do you live so I can borrow your equipment?
 
Some of the "Whys and Hows" of it all.

... .
How do you get four gpus versus two cpus?
By rendering the project in Blender twice - first with all GPUs deselected in Blender as compute engines, so the 2 CPUs carry the full load; then with all four GTX-680s designated as the compute engines.
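For anyone who wants to reproduce that kind of CPU-vs-GPU A/B run from the command line, here's a minimal sketch. It's written against a modern Blender bpy API (2.8x and later) and assumes a CUDA-capable build; the preference paths in the 2.67 build I used were slightly different, and the script and file names are just placeholders:

```python
# Minimal sketch: render the same scene once on the CPUs and once on all CUDA
# GPUs, timing each pass. Run it in background mode, e.g.:
#   blender -b your_scene.blend -P ab_render.py
import time
import bpy

def render_once(device):
    """Render the current scene with Cycles on 'CPU' or 'GPU' and return seconds."""
    scene = bpy.context.scene
    scene.render.engine = 'CYCLES'
    scene.cycles.device = device
    start = time.time()
    bpy.ops.render.render(write_still=False)
    return time.time() - start

# Enable every CUDA device Blender can see; the CPU-only pass ignores this.
prefs = bpy.context.preferences.addons['cycles'].preferences
prefs.compute_device_type = 'CUDA'
prefs.get_devices()                        # refresh the device list
for dev in prefs.devices:
    dev.use = (dev.type == 'CUDA')         # GPUs on, CPU compute entry off

cpu_time = render_once('CPU')
gpu_time = render_once('GPU')
print(f"CPU: {cpu_time:.1f}s  GPUs: {gpu_time:.1f}s  speedup: {cpu_time / gpu_time:.1f}x")
```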

Why did you use the 680s instead of the Titans?
Like mountain climbers often say when asked, "Why'd you do it?", I'd have to respond: because it was there - I used 680s because that is what this system contains. As a more profound afterthought, I might have responded: "Since more users have 680s than Titans, I decided to test the 680s first and the Titans last." But that wasn't the case.

Where do you live so I can borrow your equipment?
Location: Within 5 miles of my hometown in Birmingham - the most highly populated SMSA in the Heart of Dixie (Alabama), between the Honda (Lincoln, Alabama) and Toyota (Huntsville, Alabama - also home of the Redstone Arsenal military facilities) manufacturing plants to the North and the Mercedes-Benz (Tuscaloosa, Alabama) and Hyundai Motor (Montgomery, Alabama) manufacturing plants to the South. Alabama is where eagles wage war (Auburn University's War Eagles: 2011 BCS National Championship Winner) and tides roll red (University of Alabama's Crimson Tide: BCS National Championship Winner 2010, 2012 and 2013). Auburn University is also the alma mater (1982) of Apple CEO Tim Cook, born in Robertsdale, Alabama on 11/01/1960.
 

I'm kinda sorry I asked. :)
p.s. I think I'd keep the Tim Cook part a secret.
 
Forgot NASA

And just south of HSV is MSFC, home of NASA's rocket center, where we "overclock" the shuttle main engines to 108% :D:D.
 

Ticotoo,
Thanks for pointing out that very important tenant. Nope, I didn't forget NASA; I just didn't point out all of Redstone Arsenal's tenants. Redstone Arsenal is a garrison for a number of tenants, including the United States Army Materiel Command, the Army's Aviation and Missile Command, the Missile Defense Agency of the Department of Defense, and NASA's Marshall Space Flight Center.
 