... along with some other posts musing as to how the 2.0-2.2 GHz clocks of the new E5 2600s might be too slow for single-threaded work....

New Anandtech benchmarks on a server. [ Might not see such good results on code optimized for the older Pentium 3 or 4, but on modern compiled software looking to leverage the floating-point abilities..... ]


http://www.anandtech.com/show/5553/the-xeon-e52600-dual-sandybridge-for-servers/10

If the vast majority of the workload is single-threaded then the E5 1620 is probably the much better option (in a bang-for-the-buck context). But it seems straightforward at this point that if the workload only runs on a single core, the turbo capabilities of the E5 are substantially better than what Turbo was in the X5600 series.

Apple doesn't "have to" go to the 130W bleeding edge to turn in performance greater than the former Mac Pro top of the line, the X5670.
The E5 2665 or 2670 (2.4 or 2.6 GHz base at 115W) would probably put a decent gap between the old top range and the new one on single thread alone, without having to resort to matching base clock rates exactly.

Right, the turbo boost has changed the game with SB-E5. The 6-core 2620 at 2.0 GHz turbos to 2.5, while the 8-core 2650 at 2.0 GHz turbos to 2.8(!). I was cautiously optimistic it would be that good, but I wasn't counting on it.
 
Right, the turbo boost has changed the game with SB-E5. The 6-core 2620 at 2.0 GHz turbos to 2.5, while the 8-core 2650 at 2.0 GHz turbos to 2.8(!). I was cautiously optimistic it would be that good, but I wasn't counting on it.

Magnified turbo boost ranges are the road to be traveled by speed lovers who make heavy use of highly threaded applications. Along that journey, you'll see the old and the new. Turbo biasing rules even on yesteryear's Westmeres, particularly because Westmere's cores can leap 13 or 14 bins in a round robin, from low 2 GHz underclocks to the mid-to-high 4 GHz range, all the while keeping the Vcore within spec VID, i.e., no overvolting sanctioned here. My favorite native power management turbo boost ratio is DDDDEE (13,13,13,13,14,14) per each 6-core 5680 CPU.

I've found that for many things, such as that 2 GHz underclock, less is more. One core can be clocked higher with less voltage, heat and current consumption than two cores, which can be clocked higher with less voltage, heat and current consumption than four cores, which in turn can be clocked higher than six cores, which can be clocked higher than eight cores, and so on. Turbo boost thrives in low-voltage, cool, low-current conditions when a load is encountered. Until a paradigm shift occurs, we'll have to make the best of, and enjoy, what we have in the meantime. So don't be overly focused on the SB-E's lower GHz clockings at idle, if the turbo range is high, unless all that you mainly use are low- or non-threaded applications.
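To put rough numbers on that DDDDEE ratio, here's a small sketch (my own illustration, not anything from Intel's docs; the 133 MHz BCLK, the ~19x base multiplier for the ~2.5 GHz underclock, and the mapping of the two 14-bin entries to the lightest loads are all assumptions):

[CODE]
#include <stdio.h>

int main(void)
{
    const double bclk_mhz = 133.33;  /* assumed Westmere base clock */
    const int base_mult = 19;        /* assumed multiplier for the ~2.5 GHz underclock */
    /* "DDDDEE" read as hex turbo bins (D = 13, E = 14); assuming the
       14-bin entries apply to the lightest loads (1-2 active cores): */
    const int bins[6] = {14, 14, 13, 13, 13, 13};

    for (int active = 1; active <= 6; active++) {
        double ghz = bclk_mhz * (base_mult + bins[active - 1]) / 1000.0;
        printf("%d core(s) active: up to %.2f GHz\n", active, ghz);
    }
    return 0;
}
[/CODE]

Under those assumptions the lightest loads land around 4.4 GHz and the all-core loads around 4.27 GHz, which is roughly the mid-to-high 4 GHz range described above.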
 
Magnified turbo boost ranges are the road to be traveled by speed lovers who make heavy use of highly threaded applications.
... blah ... blah ... blah ....

Effective, highly threaded applications would deactivate Turbo.... so it isn't leveraged, or highly sought after, by those who have such applications.

Turbo is good for apps that primarily activate only one core. If that is the only program the user is actively working with at the moment (and there are no background tasks), then the other cores can be turned off and the remaining subset run at a higher clock rate.

If all the cores are highly active, Turbo will turn off because there is no "excess" power lying around unused elsewhere on the die.

The lone subset of threaded problems that could benefit from Turbo are those that have repetitive and lengthy fork/join phases, where the application drops out of being highly threaded into a bottlenecked scalar session after the "join". In the Anandtech review, the 3DS Max benchmark has that flaw (in contrast, the LS-DYNA one doesn't). That is only because the scalar aspect is the hold-up. It isn't "Turbo" that is necessarily needed, just raw clock, because it is a scalar problem masquerading as a parallel one.
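That fork/join hold-up is essentially Amdahl's law in action. A quick back-of-the-envelope sketch (the 20% serial fraction is a made-up illustration, not a number from the Anandtech review):

[CODE]
#include <stdio.h>

/* Amdahl's law: overall speedup given the fraction of the run
   that stays serial (the scalar "join" phases). */
double speedup(double serial_fraction, int cores)
{
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / cores);
}

int main(void)
{
    const double serial = 0.20;  /* assume 20% of the run is stuck in scalar phases */

    printf("Speedup on  8 cores: %.2fx\n", speedup(serial, 8));   /* ~3.33x */
    printf("Speedup on 16 cores: %.2fx\n", speedup(serial, 16));  /* ~4.00x */
    /* Doubling the cores barely helps; raising the scalar clock
       helps every one of those serial phases directly. */
    return 0;
}
[/CODE]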
 
Effective, highly threaded applications would deactivate Turbo.... so it isn't leveraged, or highly sought after, by those who have such applications.

Turbo is good for apps that primarily activate only one core. If that is the only program the user is actively working with at the moment (and there are no background tasks), then the other cores can be turned off and the remaining subset run at a higher clock rate.

If all the cores are highly active, Turbo will turn off because there is no "excess" power lying around unused elsewhere on the die.

The lone subset of threaded problems that could benefit from Turbo are those that have repetitive and lengthy fork/join phases, where the application drops out of being highly threaded into a bottlenecked scalar session after the "join". In the Anandtech review, the 3DS Max benchmark has that flaw (in contrast, the LS-DYNA one doesn't). That is only because the scalar aspect is the hold-up. It isn't "Turbo" that is necessarily needed, just raw clock, because it is a scalar problem masquerading as a parallel one.

A pair of 2.9 GHz Sandy Bridge E5 2690s, 16 (8x2) cores, match my underclocked/turbo-biased pair of Westmere 5680s, 12 (6x2) cores, in Cinebench and come extremely close to my one-and-a-half-year-old system in Geekbench 2.

What do you classify as "effective, highly threaded applications"? In which category would you assign Cinema 4D, Cinebench, Final Cut X, 3ds Max, Geekbench 2, and AIDA64/Lavalys EVEREST Ultimate Edition 64-bit? At an underclock setting of 2.483 GHz, my turbo-biased WolfPack1 with dual 6-core 5680s scores 40,100 in Geekbench 2 (see URL in sig) and 24.7 in Cinebench 11.5 (the same score as reported here [ http://www.anandtech.com/show/5553/the-xeon-e52600-dual-sandybridge-for-servers/10 ] by Anand for dual 8-core Sandy Bridge E5 2.9 GHz 2690s). Wolfie also renders HD video and animations faster than one could imagine, and AIDA64/EVEREST concurrently shows that there are rapidly changing 6-core, 4-core and 2-core speed groupings (per CPU) of cores that participate at each of three speed levels and in different groups, with corresponding temperature fluctuations throughout the rendering process.

In fact, Anand says, in the article referenced above by me, that the Sandy Bridge E5s have a turbo stage for all cores, then another for a smaller subset, where he states, "The first clockspeed mentioned is the regular clock, the second the turbo clock with all cores active (most realistic one) and the last the maximum turbo clock." [http://www.anandtech.com/show/5553/the-xeon-e52600-dual-sandybridge-for-servers/2 - Emphasis added]

Thus, I disagree with your assertions that highly threaded apps deactivate turbo and that turbo is good primarily for apps that task just one core (unless you're referring only to a pre-Sandy Bridge E dual- or 4-core CPU with turbo boost capability - such low-core-count CPUs activate turbo only for one core at a time, although that one core isn't necessarily the same one on each occasion, because the more active one becomes hotter, and on the next trigger, if not much time has passed, a cooler one gets selected). In fact, the more apps I run simultaneously, the more turbo kicks on, as is shown to me by running AIDA64/EVEREST at the same time that I run the underlying application. And as for there being "no excess power lying around," one of your blahs may have hidden my statement that I set my system's Vcore below spec VID, i.e., I undervolt, and I underclock so that there is some "excess power lying around."

How do you monitor the activity of your cores under various settings/conditions, and how many cores are you managing per system? Any more advice that you can give me on making my system perform better would be greatly appreciated, because I would like to have the fastest two-processor Geekbench-tested system running OS X for the fourth year in a row.

P.S. Just as in the case of my 24.7 Cinebench 11.5 score, I can't be too secure with that 40,100 Geekbench 2 score for my dual 6-core Westmeres (if you browse the top Geekbench 2 scores, you'll see that), because today Patrick_ServeTheHome tested a dual Xeon Sandy Bridge E5 2.9 GHz 2690 16 (8x2) core system running Windows, and it's hot on Wolfie's two-processor 12-core 2.483 GHz tail, scoring 39,239. So please, "HELP!" I knew that the day was coming when a pair of expensive top-of-the-line Sandy Bridge E5s would challenge my older and much cheaper WolfPack1. Luckily, WolfPack1 has two bigger brothers in hiding until EVGA releases that SR-X. Each big brother scores in excess of 24,000 in Geekbench 2 and close to 14 in Cinebench 11.5, without any tweaking; then they'll get together to form another WolfPack and redeem the WolfPack clan, thereby greatly improving the yield from the render forest.
 
Isn't Haswell more for mobile computing? Really low power consumption, that kind of thing...
malch
 
Isn't Haswell more for mobile computing? Really low power consumption, that kind of thing...
malch

Yeah, but so were Sandy Bridge and Ivy Bridge. That's the focus now. I doubt we will see it for servers and workstations until 2014.
 
I hope I'm wrong, but I have this sinking feeling that we may still be in for a long wait just to hear anything about MP directly from Apple. Every week that passes will seem like an eternity now.
 
Isn't Haswell more for mobile computing? Really low power consumption, that kind of thing...
malch

No. Haswell is just the name of a microarchitecture generation.

In short, I think more "non x86 core" stuff will be added in the Haswell Xeon E5's than simply going with "yet another pair of cores and more L3 cache".
There is a substantial set of PCI-e add-in cards that have been common in servers and workstations (RAID, very high speed I/O, GPU). All of these are candidates for moving either into the support chipset or the CPU package itself during the Haswell era.

As to the Haswell buzz that has gotten the most attention ....

There will be a System on a Chip (SoC) Haswell model that fuses the CPU + GPU + I/O chipset into one package, but that is just one implementation of the architecture, not the whole spectrum of products.

The first chips out of the gate for Haswell that cause a big stir, though, probably will be these SoC models. It makes for a great story about Intel's "war" against the ARM incursion, with the ARM Cortex-A15 implementations hitting the market about the same time. (Although whatever the latest Atom is will be taking point along that "war" front.)


However, Haswell is going to also support "Transactional Memory" semantics.

http://arstechnica.com/business/new...emory-going-mainstream-with-intel-haswell.ars


http://www.realworldtech.com/page.cfm?ArticleID=RWT021512050738

I'm pretty hard pressed to come up with a reason why this is a "primarily for mobile" technology. In fact, it wouldn't be all that surprising if it got weaved into the Ivy Bridge Xeon E5's, but it probably won't be. It is going to be far more effective on machines where there already are highly parallel threads banging on a significant number of locks (e.g., SQL Server, Oracle DB, etc.) than on grandma's inexpensive laptop running Mail and Safari. Certainly the E7's of the Haswell era will have it (perhaps tacked onto the Ivy Bridge implementation). More cores (8+), more leverage.
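For the curious, the documented RTM flavor of TSX looks roughly like this in C (a minimal sketch assuming a TSX-capable CPU and a compiler with RTM support; the counter and lock names are just for illustration):

[CODE]
#include <immintrin.h>   /* RTM intrinsics; build with gcc -mrtm on a TSX CPU */
#include <stdatomic.h>
#include <stdio.h>

static atomic_int fallback_lock;  /* 0 = free, 1 = held */
static long counter;

/* Classic lock-elision pattern: try the critical section as a hardware
   transaction; on abort, fall back to a real lock. The transaction reads
   the lock word so a concurrent lock-holder forces an abort. */
static void increment(void)
{
    unsigned status = _xbegin();
    if (status == _XBEGIN_STARTED) {
        if (atomic_load(&fallback_lock))  /* subscribe to the lock */
            _xabort(0xff);
        counter++;                        /* runs transactionally, no lock taken */
        _xend();                          /* commit */
    } else {
        /* Abort path: take the fallback lock the old-fashioned way. */
        while (atomic_exchange(&fallback_lock, 1))
            ;
        counter++;
        atomic_store(&fallback_lock, 0);
    }
}

int main(void)
{
    increment();
    printf("counter = %ld\n", counter);
    return 0;
}
[/CODE]

The win is that many threads banging on the same lock can commit in parallel as long as their data doesn't actually conflict, which is exactly the database-server workload described above.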

Haswell has features for making multiple-core systems more effective. With 4 cores commonplace (except in the smallest/lightest mobile devices), there needs to be a larger number of programs that add substantive value.

Haswell is also going to add a better GPU to the package that many of the "consumer" parts get. That is going to make for more inexpensive desktops, too, where the PCI-e card becomes unnecessary for "good enough" graphics.

On the Xeon front, depending upon what else Intel decides to add to the E5 class, they may decide to add a "small" GPU if OpenCL has gained more traction within Intel over the last year or so. It is far easier to implement Thunderbolt if there is an embedded GPU in the server/workstation. That GPU is hooked to the DisplayPort inputs of the TB controller along with 4x PCI-e lanes. Ta-da... done. Even if there isn't a display hooked to the workstation/server's TB port, the GPU could be used as a GPGPU with OpenCL for additional number crunching.


However, I suspect there is going to be pressure to add SATA III to the CPU package. Moving it out of the I/O hub and into the CPU package takes substantial traffic off the DMI link to the I/O hub support chipset. (The C600 chipset has a kludge now where the extra-deluxe SAS/SATA functionality soaks up PCI-e lanes in addition to the DMI link in order to reach full bandwidth back to the CPU. That kludge can be removed entirely if at least a subset of the SATA controller implementation is put inside the CPU. There is a new I/O bandwidth problem inside, but that's easier to handle.)

Moving high-end SAS/SATA frees up space and bandwidth to put USB 3.0 in the I/O support chipset, or to add 10Gb Ethernet to the chipset. Both of these are in the "mature" and "widely enough deployed" stage, so adding them shouldn't inject many new bugs to work out, and users will see substantive added value.

Haswell Xeon E5's will probably end up like the Haswell Xeon E3's and the other "consumer" implementations: exactly the same x86 core counts as before. E3's and consumer parts capped at 4, but with 4 being more common. E5's capped at 8, but with 8 being more common (more of the 6's disappear, but the 4's stay around for max-GHz lovers).
 
Love the new HP workstations. Can't wait for them to become available so I can say bye to iApple... :cool:

I hear you there. Wish I had back the money I spent on the Mac Pro; I'd get a Z820. While I still love my Macs, just about all of the things I can only do on them work just fine on a C2D MBP and a mini. The mini isn't quite as fast on Lion as it is on SL, but scoots along at an acceptable speed.
 
Bumping again. Too much good info for this thread to slide off the first page. Hopefully with The New iPad™ launch on Friday, we'll get some solid news on the Mac front.
 
Well if Apple thought they might skip Sandy for Ivy, then that isn't likely at all with this. And then again, Intel may realize this market is shrinking and stalling. But it does give a Sandy MP update some legs for over another year.

The Ivy and Haswell crowd should just pony up for the E5, as it may be a long time before we see any high-performance new-architecture chips. Ivy-Mobile will be released well before IB-E and, if SB-E was any indication, add another 6 months for Xeon Ivy.
On the plus side, anything you buy should retain its "bragging rights" for quite a while. Now AMD really needs to get its sheet together and compete.
I hope Intel realizes the market is shrinking as a direct result of their own actions. No one is buying workstations right now because we all know updates are imminent. None of my users have expressed any motivation to move from a Mac Pro to anything else. So in my land the market is either stagnant, or iMac users are waiting for a Mac Pro too, as they've realized they hate the fan noise and lack of expansion. The Mac Pro market may be shrinking in the public sector but nowhere else.
 
Well if Apple thought they might skip Sandy for Ivy,

Unless Apple is "trading down" from E5 to E3, there was and is no "skip Sandy" strategy.

And then again, Intel may realize this market is shrinking and stalling. But it does give a Sandy MP update some legs for over another year.

I don't read that at all. At least not shrinking. Haswell is also sliding out. The E5 (and the derivative SB-E) have PCI-e v3; the more mainstream Ivy Bridge parts are only just going to get it with the new updates.

If Haswell (and its transactional memory) is sliding out, then it would make sense for the Ivy Bridge E5's (and the derivative IB-E) to pick it up so that it hits the market around the same time. It makes zero sense to slide out the IB E5's and then hoard the transactional memory stuff for an even later Haswell E5, when the feature has a much bigger impact with 8 cores than it does with 4 (or 2).

The stuff that Intel has queued up for Haswell (TX memory and better trace-cache execution) are significant microarchitecture jumps that could have some hiccups in correctness. A longer roadmap gives them time to get the bugs out before blowing up partners' roadmaps with a defect.

It also would make more sense to get the Atom line-up to 14nm quicker than the classic PC CPU line-up. That whole line-up should be SoC, not just the minor subset that is in Haswell. Haswell can slide out, and Broadwell doesn't have to be the first onto the 14nm process. Smaller SoC dies on a new process are more likely to yield more working parts if the wafer "baking" isn't uniform. AMD may not be the fierce competitor of yesteryear, but the ARM implementers are.

Certainly, AMD isn't pushing them to be hyper-aggressive, but 14nm and 10nm are going to be harder to do in volume at the same profit levels.
 
I'd prefer seeing a slide of Intel's own roadmap.
They don't have a copy of Intel's latest roadmap, and are making an attempt to figure it out based on the recent announcement from Intel (the delay on Enthusiast IB parts).

The main point I got out of that article is that they're indicating Intel is lengthening the release cycle on this particular segment due to a lack of competition.
 
I hope Intel realizes the market is shrinking as a direct result of their own actions. No one is buying workstations right now because we all know updates are imminent.

If people are just deferring replacements longer, then the market isn't shrinking, just time-shifting. The number of potential buyers isn't going down.

If people were bolting for AMD (or Power, or SPARC, or ARM), then it would be shrinking. In the E5 range of processors it's probably more a case of "just wait a bit longer".
 
Sandy Bridge is fine with Ivy Bridge bringing in lower TDP and a possible 10 cores on the Xeon side. I do not expect many Sandy Bridge owners to be jumping for Ivy Bridge.

I am now set on 2013 or 2014 to replace my aging CPU.
 
If people are just deferring replacements longer, then the market isn't shrinking, just time-shifting. The number of potential buyers isn't going down.

If people were bolting for AMD (or Power, or SPARC, or ARM), then it would be shrinking. In the E5 range of processors it's probably more a case of "just wait a bit longer".
Keep in mind, though, that some traditional workstation users are now able to use Enthusiast parts due to higher core counts (i.e. users that needed core counts previously only possible in DP systems, but who don't need ECC). So the traditional workstation market has shrunk in this particular case.

Intel isn't losing money in this particular case, however, as these customers are still buying Intel parts. But the figures are tallied to a different segment.

Clusters have had an impact as well, particularly as they've started to come down in cost. Again, Intel is still selling CPU's in this case as well, but the figures are tallied to the server market instead of the workstation market.

So I see it as how the workstation figures are determined is all. But technically speaking, they do show a reduction, while other segments are picking up the "losses".
 
Keep in mind, though, that some traditional workstation users are now able to use Enthusiast parts due to higher core counts (i.e. users that needed core counts previously only possible in DP systems, but who don't need ECC). So the traditional workstation market has shrunk in this particular case.

Enthusiast, as Intel uses the term, is effectively the same die as the E5s. If they slow the delivery of the E5, the *-E part comes slower too.

Marketing and sales folks tend to create new labels for the same old stuff as a means of indirection, to create the illusion of markets changing. Buzzword market du jour. When that one wears thin... roll out another one.

The significant loss would be to the 4-core-capped "consumer" version. Losses to the E3's (and their cohorts) are a shrinking loss for the E5's (and their cohorts).


Clusters have had an impact as well, particularly as they've started to come down in cost.

All computers tend to come down in cost over time. That is the core issue. Typically, users have moved down the cost curve over time along with the hardware. Some tread water (computer costs constant), but a substantial number move down over time.

An important source of growth comes from drawing in "new" folks. The more pressing workstation problem is what new apps are 'killer apps' for a new community. Selling newer things to the same fixed population of folks who jumped in 2-5 years ago only leads to stagnant growth.

"Clusters in a box" is where workstations could see some growth. What is missing is accessible, highly value add software to drive that.
 
Enthusiast, as Intel uses the term, is effectively the same die as the E5s. If they slow the delivery of the E5, the *-E part comes slower too.

Marketing and sales folks tend to create new labels for the same old stuff as a means of indirection, to create the illusion of markets changing. Buzzword market du jour. When that one wears thin... roll out another one.
Of course.

Though I didn't explicitly state it, I presumed you realized I was comparing similar parts (same core counts = same die used in the Enthusiast variant as in the Xeon; thus ECC is the primary differentiating factor, if not the only one).

Not comparing E3 vs. E5 (or similar). Though consumer E3 CPUID's have also reduced the workstation figures a tad, for those whose software applications can't benefit from more cores than fit on a single die (consistently/most of the time), which is the same reason they'd be able to go to the Consumer/Enthusiast variant of the equivalent Xeon (same die).

My comment was aimed at how the statistics for traditional workstations have shifted due to the varied CPUID's, even when it's actually the same die (i.e. a result of their software usage). So for those that don't need ECC or can't leverage more cores than fit on a single die, they have the option of buying a different system (one whose sale will be accounted to consumer systems, not as a workstation, given how the current metric is derived).

They could change the metric, but that would be less accurate in determining what system type consumers are buying.

All computers tend to come down in cost over time. That is the core issue. Typically, users have moved down the cost curve over time along with the hardware. Some tread water (computer costs constant), but a substantial number move down over time.
It's for this very reason clusters are picking up steam, as they're more cost effective than using more workstations and employees to operate them.

Not sure what you got out of the post, but it's not in disagreement.

Just that the shift in system types purchased is reflected in the statistics (i.e. sales formerly part of the workstation class are being accounted for in either consumer or server statistics, as those buyers are no longer buying Xeon-equipped workstations).

"Clusters in a box" is where workstations could see some growth. What is missing is accessible, highly value add software to drive that.
Again, I don't disagree at all.

My point was that even though they're being used for workstation work, their sales are attributed to SERVERS, not workstations (in this particular case).

Again, it comes down to the current standards for defining what type a system is, and ultimately, which group the sales are attributed to. That was my primary point.
 