Hey PunkNugs, what codec are you rendering to and what codec is your source media? Final Cut has always had a major preference for ProRes for obvious reasons. If you are using h.264 native Canon 5D material or XDCAM long GOP material maybe there's a bottleneck with the codec?

Also I don't think Final Cut 7 was ever 64bit, not sure what sort of impact this has though.

I have a bunch of different media on my machine at the moment, I might do some tests and see how the different codecs behave.

Hey D,

I just PM'd you...
 
Ok I've done a bit of testing and research.

Final Cut Pro 7 was never built to fully utilise multiple cores, which was a big sore point for the FCP community.
This is why I think they had to completely rebuild the app for Final Cut Pro X. It seems that being a 32-bit app also limited the RAM available to FCP 7 to 4GB.

Now, my second finding, through practical tests, is that multi-core use during rendering depends on the specific task within the app, not on the app overall.

Rendering 5D h.264 media in a ProRes 422HQ sequence only saw FCP use 270-300% in the activity monitor.

GoPro AVC media in the same sequence was far less consistent, using anywhere between 150-290% and jumped around greatly.

Use of non-realtime effects like the Radial Blur (on ProRes media in a matched ProRes timeline) consistently stayed at about 230%.

Note that I barely saw the total user% of CPU power break 10%... :(
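To put those Activity Monitor figures in context: the per-process number counts 100% as one logical core, so it has to be divided by the core count to compare it with the total user%. A rough sketch of that conversion is below. The 24 logical cores is an assumption (dual six-core Xeons with Hyper-Threading), so swap in your own machine's count.

```python
# Activity Monitor reports per-process CPU with 100% = one logical core,
# so a multi-threaded process can show well over 100%.

LOGICAL_CORES = 24  # ASSUMPTION: dual six-core Xeons with Hyper-Threading


def share_of_total_cpu(process_percent, logical_cores=LOGICAL_CORES):
    """Return the process's share of the whole machine, in percent."""
    if logical_cores <= 0:
        raise ValueError("logical_cores must be positive")
    return process_percent / logical_cores


# Per-process readings taken from the tests above.
readings = {
    "FCP 7, 5D h.264 in a ProRes 422HQ sequence": 300,
    "FCP 7, Radial Blur on ProRes": 230,
}
for label, pct in readings.items():
    print(f"{label}: {pct}% -> {share_of_total_cpu(pct):.1f}% of the machine")
```

So a process sitting at a few hundred percent is still only touching a small slice of a machine like this.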

I think the processor use varies depending on the effect used, I'm not going to test all of them though!

I did enable Apple Qmaster, which is designed for multiprocessor distribution in Compressor. I did see process instances appear in the Activity Monitor, however they didn't seem to be doing any work for Final Cut. I think it's built for Compressor only.

Now just for comparison I did some tests in Avid Media Composer 6.

Importing 5D media to DNX HD 185 MXF format still only used about 130%.
Radial Blur only used 120%
3D Ball used ~300%

HOWEVER
Scaling a clip with 3D Warp - 1700%+
Rendering a colour correction with the hue wheels - 1700%+
Safe colour limiter - 600-1300%
These last three tests were using up to 80% of the total user CPU%.

Now when you stack a Radial Blur on top of a Colour correction and Safe Colour Limit effect, it processes the entire clip at the speed of the slowest effect. :(

So now we can see, even in an app which is designed for multi-processors, there are still limitations depending on the tasks!

----------

Hey sorry, was busy writing that reply and didn't see your PM!
 
Hey DJenkins,

I didn't want to go off on a tangent with this (new) topic, when the main focus of this thread really should be what Tutor intended it for: CPU performance. So I compiled some of our quotes and posted a NEW thread here:

https://forums.macrumors.com/showthread.php?p=15986278#post15986278

Please reply there as I would still like to hear a solution to what you're describing. Thanks again for taking the time to help... :cool:
 
WolfPackPrime0 Update

WolfPackPrime0 renders very, very fast under Windows. Moreover, I just rendered 72 frames of an animation (heavy with lighting and atmospheric effects) at film res (4096 x 3112) in 2:40 minutes.
 

Attachment: CB115_41.15.jpg

Is that a special file you used in cine bench?
Not this time; but I have at other times used my own project files in Cinebench just to test out tweaks.
If so, can you share that with others to compare?
If you're interested, I'll explain in a subsequent post how to use your own project files inside of Cinebench.
BTW, any luck with super micro boards yet?
Right now I'm just focusing on how to get the best performance from my Sandy Bridge i7-3930k, E5-2687W and E5-4650 systems by just using the standard BIOS settings under different OSes. After I've gotten the best performance under each OS - Linux, Windows and OSX - using the standard BIOS settings, then I'll focus on the more exotic tweaks.
 
Good for you Tutor, keep crankin' it !!! BTW, as you probably know, seems like I found a great solution to my video editing issue. Check out the link:

https://forums.macrumors.com/threads/1463796/

Later... :cool:
 
Was just figuring we could all get the same file for benchmark comparison to your monster build. Might just find a used SR2 board that's known to work for a hackintosh and sell the Supermicro board. It's a solid board but tweaking isn't its best feature. Thanks.
 
Unless one modifies the Super Micro mobo (a little too drastic for the 99+%) or finds a software solution, BIOS mods are all there are.
 
WolfPackPrime0 - After a little bit more bios tweaking.

After a little bit more BIOS tweaking, I got Cinebench 11.5 scores above 42.5. Example shown below.
 

Attachment: CB115_42.80.jpg

SR-2 X5680's

Hi all, read through all this thread and several others involving the SR-2 and overclocking/underclocking.

Tried a few overclocks and underclocks, but I'm finding the temps shoot up pretty quickly when underclocking compared with overclocking. Is this something you would expect?

Thanks for all the info it's been great reading through it all, thoroughly enjoyed it!!!
 
My temps shoot up from low 30s/high 20s (at idle) (climate dependent) to low to mid 60s/high 50's (when doing 3d/video rendering) (also climate dependent). What temps are you seeing?
 
My temps (when my system is not being stressed) sit at around 27C. But under full load it's right around 57C in a 79F room. That being said, the way UC'ing works (as you already know by now) puts the workload on the CPUs ONLY WHEN NEEDED. That's why I choose it over OC'ing. I mean you can OC; heck, I have the MacHakPro1 that's OC'd to the max (Xeon W3680 @ 4.2GHz) and watercooled with Corsair's H100 unit, and its top temp is 32C at idle and 57C at full load. Again, you can do it either way, but if you're looking to eke the most out of your system, UC'ing is really going to push your existing system further than OC'ing can.

A perfect example: BrainDeadFool has his OC'd and his max GB score is 36,500; Tutor's is UC'd and his max GB score is 40,000. So there you go. Also their temps are about the same at idle and under load, but you see what UC'ing gets you. Even my machine got a max GB score just under 37,000. I probably (with more tweaking) could've got about 39,000 to 40,000+, but I don't need to prove that since Tutor's already made that happen. Hope this info helps. Later... :cool:
 
Hey guys, my SR-2 has been running nice and solid for a couple of months.
As you may remember I had a whole range of raid and firewire issues but got there in the end, thanks to braindead and PunkNugget for getting me going in the right direction!

Now with the potential of this machine sitting idle I think it's time to have a crack at a decent OC.

As PunkNugget also did, over the last several weeks I have been reading this thread start to finish a few times and also the OC guides linked by BDM. Every time it makes more and more sense and I think I'm ready to put it into practice, hopefully without coming back and annoying you all too much haha.

But I do have a few questions first, so that later on I can minimize the need to ask for help every 10 minutes :)

I need to run OSX 10.8 in order for my Raid card to function properly. This card was one of the biggest troubles I had with the build but 10.8 has unexpectedly fixed this and it's been running great. This means unfortunately I will be looking to overclock, NOT underclock. :(

When overclocking in 10.8 do I set Generate C states & P states to YES or NO in my smbios.plist?

My RAM is g.skill ripjaws Z rated at 1600MHz
If I recall correctly your systems were using 2000MHz RAM which obviously has a bit more headroom.
Will my RAM impact the overall limit of my OC? My CPUs are X5679 3.2GHz with a CPU multiplier of 24.

One thing I'm curious about is that I've read the memory speed set on the X5679 CPUs is 1066, or 2:8 as I understand it - http://forums.lenovo.com/t5/ThinkStation/Xeon-x5679/td-p/627741/page/2

However in BIOS I can have it set at 1333 and it shows up in OSX as 1333. Even when I set it to 1066 in BIOS, OSX still shows as 1333. Is this strange behaviour or is it unrelated? So many values and settings to remember!

Now sorry if I'm thinking out loud here but I swear it helps :)
As a rough outline my goal is a maximum 4GHz if my Noctua NHD14 air coolers can keep temps ok.

So if my target CPU Freq is 4008MHz

That means BCLK = 167 and CPU Multiplier = 24

My understanding is that the Memory Frequency shown in BIOS corresponds to the ratios as follows:
6x ratio = 2:6 = 800MHz
8x ratio = 2:8 = 1066MHz
10x ratio = 2:10 = 1333MHz

Please correct me if I'm wrong, I wouldn't be surprised if I have this mixed up! I see it written in so many different ways all the time.

Now if I run my memory at 1333MHz (10x) as recommended earlier in the teachings in this thread, does that mean after OC my memory will be running at 10x BCLK (167) = 1670MHz? I think this would put it over its 1600MHz spec.

Should I lower the BCLK to 160 to match the RAM (giving me a CPU OC of only 3840MHz)?
OR
Lower the RAM ratio to 1066MHz (8x), meaning with OC it will be running at 1336MHz - safely under its rated spec - while I still keep my 4008MHz OC?

My reason for asking is I have the impression that when you OC the CPU, the RAM is also OC by the same value and it's all linked together.

I hope I'm on the right track and thanks for any replies!
 
...my SR-2... I need to run OSX 10.8 in order for my Raid card to function properly. This card was one of the biggest troubles I had with the build but 10.8 has unexpectedly fixed this and it's been running great. This means unfortunately I will be looking to overclock, NOT underclock. :(

When overclocking in 10.8 do I set Generate C states & P states to YES or NO in my smbios.plist?

First try them set to "No."

Here's the dirty little not-so-secret math: on Nehalem/Westmere mobos, if you use BCLK to tweak CPU speeds, whatever percentage BCLK is raised over the factory 133 MHz also gets applied to the CPU interconnect bus (QPI), uncore and RAM. Thus, you must note what the values of all these parameters are at the factory settings.

Generally, on most Sandy Bridge ("SB") and Ivy Bridge ("IB") CPUs, the overclock unfortunately also gets applied to a host of other buses like the PCI-e bus {wreaking havoc on SATA/mouse/keyboard/video connections}, causing big problems, but that's another sad story.

So the math is: 167 (BCLK oc'ed) / 133 (BCLK base) = ~1.26 (rounded up); check: 3.2 GHz x 1.26 = 4.032 GHz (in the ballpark!).

The 1.26 factor also gets applied to QPI [factory high setting is 6.4] (the speed at which CPUs talk to each other and other kids on their blocks): 1.26 x 6.4 = 8.064. That's too high for the SR-2; it's about how fast dual SBs {and in the future, dual IBs will} communicate. So QPI has to be lowered to 5.8 (rounded), yielding 1.26 x 5.8 = 7.308, meaning that you'll have to set more voltage to QPI and tweak the signal strength downward or the OC will fail. Alternatively, lower QPI further to 4.8 [4.8 x 1.26 = 6.048, well within the 6.4 factory setting], so no more voltage will be needed for QPI and you may not have to adjust signal strengths downward.

The 1.26 factor also gets applied to RAM: 1.26 x 1333 = 1679.58; so if you have 1600 MHz rated RAM, it'll be overclocked and will have to be tweaked with more voltage and/or adjustments to the first 3-5 RAM parameter settings, and you may have to adjust signal strengths, lowering all of them. But if you set RAM to 1066, then 1066 x 1.26 = 1343.16, well under 1600 MHz, so you will not have to overclock the RAM, tweak down those first few parameters, or adjust signal strengths.

The same analysis and math applies to your uncore settings.
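For anyone who wants to try other BCLK values, here is a rough Python sketch of the same arithmetic. It uses the unrounded ratio 167/133 rather than 1.26, so the results come out slightly lower than the hand-rounded figures. The function names are just made up for illustration, and the 1600 MHz limit is simply the RAM rating from the question, not a hard cutoff.

```python
BASE_BCLK_MHZ = 133.0  # factory base clock as quoted in the post


def oc_factor(bclk_mhz, base_mhz=BASE_BCLK_MHZ):
    """Ratio by which everything tied to BCLK is scaled."""
    return bclk_mhz / base_mhz


def scaled(stock_value, bclk_mhz, base_mhz=BASE_BCLK_MHZ):
    """Apply the BCLK ratio to a stock CPU, QPI, uncore or RAM setting."""
    return stock_value * oc_factor(bclk_mhz, base_mhz)


bclk = 167          # target BCLK from the question
ram_rating = 1600   # MHz rating of the RAM from the question

print(f"OC factor: {oc_factor(bclk):.3f}")
print(f"CPU: 3.2 GHz -> {scaled(3.2, bclk):.2f} GHz")

# QPI settings discussed above: factory high, first step down, second step down.
for qpi in (6.4, 5.8, 4.8):
    print(f"QPI {qpi} -> {scaled(qpi, bclk):.2f}")

# RAM settings: compare the overclocked result against the rated speed.
for ram in (1333, 1066):
    result = scaled(ram, bclk)
    status = "over" if result > ram_rating else "within"
    print(f"RAM {ram} MHz -> {result:.0f} MHz ({status} the {ram_rating} MHz rating)")
```

Change `bclk` to 160 to check the other option raised in the question.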

Also, many overclockers of the SR-2 begin by lowering all of the signal strength parameters (except for the PCI-e ones) to their lowest negative value settings.

Don't get me wrong - the fastest overclocks will be derived from a modest amount of overclocking of the QPI, RAM and uncore (in addition to the CPUs), but it will require more experimentation on your part to find the sweet spots.
 
Thanks Tutor!

Ok so going past the manufacturer rated specs is possible but you need to tweak voltages? I guess that's the general idea of overclocking anyway.

I think I'll go with the lower values for now just to get started, and see how my temps are.
If it's not getting too hot I'll push on further.

Would be interesting to see how big a difference there is between the two approaches, knowing that cosmetically the CPU speed would stay at the same ~4GHz.

BDM has compiled all the info in this thread into a mini-guide for me so I'll be following that very closely.
 
Well I'm sort of getting there.

I started off following the stage 1 approach of isolating BCLK and VTT but I kept getting FF, even at values close to what you guys had been reporting as final overclock values. If it didn't FF, I was in the death cycle of BIOS resets.

Well that was probably because I had been screwing around with switching my Windows installation over to a RevoDrive PCIe SSD and using the A56 BIOS.

Now after a few hours and starting to get a bit frustrated I switched back to A49 BIOS and thought I'd try the Dummy OC at 3.6GHz that I'd tested out months ago. Lo and behold, BIOS settings stuck, but I now found out I've ruined my Windows boot manager during my earlier messing about.

Unfortunately this is where my methods got a bit un-glamorous. With that minor success in mind I took note of what the dummy OC values were for everything. Then I went a bit gung ho (awaits forum persecution) and just went for a reasonably generous target OC, and upped the VTT and VCore slightly higher than the dummy OC.

Verbose boot into OSX gave me a PCI error, so I went back and upped the PCI frequency to 102.

Hooray I have booted into OSX! First GB scores were quite ok, but I wanted to turn C and P states off. Next round of GB and CB scores were slightly better. Temps only just hit over 50° which was lower than I expected.

Geekbench: 32004
Cinebench: 21.14

So there is definitely a lot more to do. Now I know it's at least possible I'm going to compose myself and start again from the beginning, systematically working my way up to find the lowest stable voltages for my BCLK and take it from there.

What I have achieved is in no way torture tested at the moment, so when I get Windows going again OCCT is going to get a workout!

All these scores were with QPI set to 4.8GHz and Memory Frequency at 1066MHz (8x). So there could be room for improvement there too, as Tutor said if I want to explore applying higher voltages.

I'll report back once I've revised everything... then will come the big thankyou list to you all :)
 

Attachments: GB_4-01GHZ_nostates.png, CB_4-01GHZ_nostates.png

Good for you my man !!! Enjoy TT'ing using OCCT. With the right tweaking you should easily reach a GB score of 35,000+. Later... :)
 
The ascendency of the GPU

As deconstruct60 cogently points out here: https://forums.macrumors.com/threads/1422732/ - "[F]ocusing on the CPU is misguided going forward. Currently and going forward there will likely be much more computational "horsepower" in two 200W PCI-e cards than there will be in the CPU "chamber" of a workstation. It isn't primarily just about the CPUs anymore and that is one of the problems with the current Mac Pro case design." This is especially so because of the CUDA cards. One GTX 690 renders 2.4x faster than 4 E5-4650s.

Currently, my render farm has 13,356+ overclocked CUDA cores (29,088+ GFLOPS) and 17,520+ overclocked stream processing units (26,160+ GFLOPS). Currently, the CUDA cores are the most useful. So my days of purchasing ATI cards are coming to an end.
 
That's quite an impressive array of CUDA power!

I think the world of CUDA is very promising, and the uptake seems to have caught Nvidia off guard a little - with gaming cards being preferred in some pro apps. Not too long ago this was clearly the domain of the Quadro series.

My only hope is that software keeps being developed to take advantage of CUDA... as seen in my discussion with PunkNugget a few posts back, an app claiming to be 'multi-core/multi-processor optimised' does not mean all its features use the machine to its full potential. And multi-cores have been around for quite a while now.

It does seem hard to get my head around hundreds of tiny GPU cores churning through work faster than a massive CPU though... Tutor do you know how CUDA is able to work so efficiently? What tasks is it best suited to and what does it possibly do a terrible job at?
 
Tutor might give you some info, but (as he's shared with me before) he doesn't know much about nVidia's product line as he's a big ATI fan, especially of the 4890 (of which I have 2). If he replies here, he'll explain all the reasons why. In the meantime, this was the initial article that got me going (in my head) in wanting to reinvestigate the CUDA / Mercury (rendering) thing when it comes to using the Adobe CS5 & CS6 apps that utilize this feature:

PLEASE READ THE WHOLE ARTICLE FIRST BEFORE YOU CONTINUE:
http://film-sound-color.tumblr.com/post/26071716910/how-to-enable-gpu-cuda-in-adobe-cs6-for-mac-by

"I was able to export a short but complicated high-definition sequence with Premiere Pro CS5.5 in 2 minutes, 29 seconds, and then in Premiere Pro CS6 in 2 minutes, 18 seconds--a minor difference. However, when I disabled the Mercury Playback Engine in CS6 and exported the same project using only my workstation's dual Xeon CPUs, the job required 14 minutes, 30 seconds. Clearly, if you're going to be using Premiere extensively, your money's best put into GPU power, not CPU power." - Alan Stafford, PCWorld
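Just to put a number on that quote, here is a quick sketch that turns the quoted export times into a speedup figure. The times are copied straight from the PCWorld quote above; the helper name is only for illustration.

```python
def to_seconds(mmss):
    """Convert a 'minutes:seconds' string such as '14:30' into seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


cpu_only = to_seconds("14:30")  # CS6 with the Mercury Playback Engine disabled
gpu_cs6 = to_seconds("2:18")    # CS6 with the Mercury Playback Engine enabled

speedup = cpu_only / gpu_cs6
print(f"CPU-only export: {cpu_only} s, GPU-accelerated export: {gpu_cs6} s")
print(f"Speedup with GPU acceleration: {speedup:.1f}x")
```

That is one reviewer's sequence on one workstation, so treat the ratio as a ballpark rather than something to expect on every project.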

Now I'm actually going to be putting this to the test within the next two days with my nVidia 580, 480 (but I have to get that recognized by my system first, as the 480 is not a compatible card for the CS6 product line), and the ATI 6870. So we'll see if what this guy says is true. Mind you, I use ALL of Adobe's product line and now use CS6 full time, especially Adobe Premiere and After Effects. So if this test holds true, then I'll be going back to using the 480 and 580 in a heartbeat, as I'm noticing very slow rendering times when I'm outputting 10 to 15 min vids. Pre-renders are holding up well using the 6870, but exports are not. So again, we'll see. I have to perform this test within the next 72 hours. I will get back to you on this... But to be fair to this site I will guide you here instead for further GPU communication:

https://forums.macrumors.com/threads/1485401/

and thoroughly look at ALL the links and read through ALL of them... thoroughly, so you get a better understanding of what I'm finding out that may be true. But again, I need to do my own personal tests in that area this week to get the results that I hope will be encouraging news... :cool:
 
For computation I'd go with a Tesla rather than a Quadro for the memory. Otherwise I'd go with Tutor's setup, but that would be difficult once you're looking at loads of texture files which must be cached to video ram. This was handled differently prior to 64 bit rendering applications with things being mapped to virtual memory. Computing is rarely regressive though. If the gpu had better access to main memory, that would be an ideal solution.
 