My game plan for tackling threads
Umbongo,
I believe, like you, that Intel prefers to sell one chip for core count (for servers) and one for clock speed (for workstations). However, as more and more workstation users run apps that are becoming more and more multithreaded, those users need a machine with both a high core count and high clock-speed performance. The two constantly compete against each other because each generates the nemesis of systems containing swirling, somewhat captive electrons: heat. That is why I preferred to roll with the EVGA Classified SR-2 as the basis for my workstation; it currently provides the most flexibility to meld high core count and high clock-speed performance into a single team.
I stretched the turbo range as wide as I could without sacrificing stability, because I need my machine to run with stability 24/7, i.e., weeks at a time rendering long-form projects. My 5680s have factory-standard non-turbo steppings from 12 to 25, and turbo-boost steppings of 26 (for four cores) to 27 (for two cores) [per CPU] times the internal base clock (BCLK). So in theory, if my system were stable at a setting of 12, my turbo ratio would be EEEEFF [26-12=14 (E in hex); 27-12=15 (F in hex)]. However, setting it to 12 causes it to drop, despite all of my efforts to the contrary, all of the C-state native power management settings. So I have to be content with setting the lowest stepping at 13 [26-13=13 (D in hex); 27-13=14 (E in hex)] to yield a ratio of DDDDEE for each CPU.

What this means is that if I set BCLK to 165, the base speed for all 12 cores is 13 x 165, or 2.145 GHz; at low turbo, four cores from each CPU rise to 26 x 165, or 4.290 GHz (a total of 8 cores at that speed at any one instance of trigger time); and at top turbo, two cores from each CPU rise to 27 x 165, or 4.455 GHz (a total of 4 cores at that speed at any one instance of trigger time). On my single-CPU quad systems, the turbo ratio is DDDE for that one CPU. So I presume, as a matter of similar progression, that as I use dual CPUs with more cores, the effect becomes even more exaggerated, yielding a total system that mimics (1) the then-current lower-core-count speed demon and (2) the "maximal game design" (defined below) for highly threaded apps. For example, I suspect that in a dual 10-core Ivy Bridge system, each of the two ten-core Ivy Bridge CPUs will have a comparable ratio of DDDDDDEEEE, such that I would then be able to have teams of 12 core participants (the total number of D's on both CPUs) in low turbo, or teams of 8 core participants (the total number of E's on both CPUs) in high turbo.
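For anyone who wants to check the arithmetic above, here is a small Python sketch of it. The function names and the assumption of two identical six-core CPUs with turbo multipliers of 26 (four active cores) and 27 (two active cores) are mine, just for illustration; the BIOS of course does all of this for real.

```python
# Sketch of the turbo-ratio arithmetic: hex digits are the turbo bins
# (turbo multiplier minus the lowest configured non-turbo stepping),
# and core speed is BCLK (MHz) times multiplier.

def turbo_ratio_string(base_mult, turbo_mults):
    """Encode each turbo bin as a hex digit, e.g. base 13 with
    turbo multipliers (26, 26, 26, 26, 27, 27) -> 'DDDDEE'."""
    return "".join(format(t - base_mult, "X") for t in turbo_mults)

def core_speeds_ghz(bclk_mhz, base_mult, turbo_mults):
    """Return (base speed, list of turbo speeds) in GHz."""
    base = bclk_mhz * base_mult / 1000.0
    turbo = [bclk_mhz * t / 1000.0 for t in turbo_mults]
    return base, turbo

turbo_mults = (26, 26, 26, 26, 27, 27)  # per six-core CPU (assumed)

print(turbo_ratio_string(12, turbo_mults))  # EEEEFF (the unstable setting)
print(turbo_ratio_string(13, turbo_mults))  # DDDDEE (what I run)

base, turbo = core_speeds_ghz(165, 13, turbo_mults)
print(f"base:       {base:.3f} GHz")        # 2.145 GHz, all 12 cores
print(f"low turbo:  {turbo[0]:.3f} GHz")    # 4.290 GHz, 4 cores per CPU
print(f"high turbo: {turbo[-1]:.3f} GHz")   # 4.455 GHz, 2 cores per CPU
```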
Since even Intel has shown consumer Sandies running at or near 5 GHz on air cooling, I suspect that I can get their Xeon descendants to turbo past 5 GHz with no sweat (low voltages, and underclocking them even more) using a hybrid H2O system (either modding two of Intel's new self-contained H2O coolers or modding two Corsair 80s), yielding teams of 8 cores at about 5.4+ GHz at high turbo and teams of 12 cores at about 5+ GHz at low turbo.
I use the phrase "maximal game design" to mean a strategy that takes advantage of the shortcomings/advantages in CPU/programming constructs, design, and execution (sort of like watching film of your opponent's game play to discover their underlying strengths and weaknesses). Core allocation occurs at the trigger point in time by a command from the coach (master core), such as, "You 12 (or 8) perform this task." Within CPU0 lies the core that, at any point in time, calls the plays. Moreover, even the coach may direct himself to enter the field of play. For these reasons, I allocate a slightly higher voltage to CPU0 than to CPU1.

However, not every task is equal, and not every core finishes its assigned task at the exact same moment in time. So when the very next assignment is made, it's crucial to have a pool of capable (not voltage-laden), fresh (cool) team members, so that the environment is such that turbo will trigger in a multithreaded league. The core that just finished executing the previous task and has run back to the sideline might be too tired, hot, and sweaty to go immediately back onto the field of play. I liken it to the structure of a football or soccer team, where there is a need for a capable, fresh bench. Of course, the more capable, fresh players on the bench, the more options the coach has to select the most capable, freshest ones to satisfy the turbo trigger rule in a multithreaded league. And keep in mind that this is occurring continuously and much, much faster than any of us could ever conceive of it occurring (sort of like how the Oregon Ducks football team runs their plays). So the more cool cores at the system's disposal, the more likely it becomes that turbo behaves, in a multithreaded environment, like a continuous phenomenon until what we conceive of as a singular task, e.g., rendering all animation frames, is completed.
This phenomenon is dependent, however, on the particulars of the decision made by the team owner, i.e., the program's entire thread capable design and the execution of that design.
Please excuse my sports analogies. I can't help it. I reside in Alabama, USA, where football, particularly at Alabama and at Auburn, is more than a simple pastime. For me, football has become a paradigm for threaded computer programming and systems analysis. I'm willing to bet you that Intel's chip designers and engineers, as well as most designers of multithreaded programs, use similar paradigms; however, I'll concede that the sports teams they use probably differ.