
NeuronBasher

macrumors regular
Jan 17, 2006
188
0
(Besides, the compilers do a good deal of optimization for each platform).

The compilers do a pretty good job of optimization, but I honestly don't know if they do any automatic optimization for the Altivec or SSE engines.

After all, "pros" aren't supposed to rely on notebooks or iMacs, right?

That's where your premise goes awry. The MacBook Pro has "Pro" in its name for a reason. It's meant to be a portable workstation for "pro" users.

With that said, I wasn't aware of the Apple Accelerate framework, since my programming is at a higher level than that. With that extra layer of abstraction built in, developers only have to code against the Accelerate libraries and trust Apple to do the right thing with optimizations. There will be some hand-coded SSE and Altivec code, certainly, but hopefully it will be far less than I had originally assumed.
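To make the "layer of abstraction" idea concrete, here is a rough sketch of what such a layer could look like. This is not Accelerate's actual API; `my_vadd` is a made-up name for illustration. The caller sees one function, and the SSE path or the plain scalar path is picked at compile time, assuming the compiler defines `__SSE__` when SSE is available (GCC and Clang do).

```c
#include <assert.h>
#include <stddef.h>

#if defined(__SSE__)
#include <xmmintrin.h>   /* SSE intrinsics: _mm_loadu_ps, _mm_add_ps, ... */
#endif

/* Hypothetical wrapper (not part of Accelerate): dst[i] = a[i] + b[i].
 * Callers never see which instruction set is used underneath. */
void my_vadd(const float *a, const float *b, float *dst, size_t n)
{
    size_t i = 0;
    assert(a != NULL && b != NULL && dst != NULL);

#if defined(__SSE__)
    /* Hand-coded SSE path: four floats per iteration, unaligned loads. */
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(dst + i, _mm_add_ps(va, vb));
    }
#endif

    /* Scalar path: handles the leftover elements after the SSE loop,
     * or the whole array on a target without SSE. */
    for (; i < n; ++i) {
        dst[i] = a[i] + b[i];
    }
}
```

An Altivec branch would slot into the same `#if` chain. The point is that application code calls `my_vadd` and never touches the intrinsics, which is roughly the job a vendor library takes off your hands.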
 

Catfish_Man

macrumors 68030
Sep 13, 2001
2,579
2
Portland, OR
NeuronBasher said:
The compilers do a pretty good job of optimization, but I honestly don't know if they do any automatic optimization for the Altivec or SSE engines.

ICC, XLC, and GCC 4+ will autovectorize, but the results are pretty primitive compared to what a competent human can do.
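As a rough illustration of where autovectorizers tend to succeed and where they give up, here is a small sketch. The function names are made up for the example, and whether a given loop actually gets vectorized depends on the compiler, version, and flags (with GCC 4.x, something like `-O2 -ftree-vectorize` plus `-msse2` or `-maltivec`), so treat the comments as typical behavior rather than a guarantee.

```c
#include <assert.h>
#include <stddef.h>

/* Usually vectorizable: every iteration is independent, and `restrict`
 * promises the compiler that the three arrays do not overlap. */
void scaled_add(float *restrict dst, const float *restrict a,
                const float *restrict b, float k, size_t n)
{
    assert(dst != NULL && a != NULL && b != NULL);
    for (size_t i = 0; i < n; ++i) {
        dst[i] = a[i] * k + b[i];
    }
}

/* Usually NOT vectorized: each element depends on the one just written
 * (a loop-carried dependency). A human can restructure this as a
 * parallel scan; a simple autovectorizer typically leaves it scalar. */
void prefix_sum(float *x, size_t n)
{
    assert(x != NULL);
    for (size_t i = 1; i < n; ++i) {
        x[i] += x[i - 1];
    }
}
```

The first loop is the easy case the compilers handle. The second is the kind of thing where hand-written vector code, or a rethought algorithm, still wins.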
 

Krevnik

macrumors 601
Sep 8, 2003
4,101
1,312
Catfish_Man said:
The Xbox doesn't use the G5. A triple-core G5 would be far far too power hungry for it. It uses a three core CELL variant called Xenon.

My theory on why IBM has failed to deliver is that Apple has failed to fund (as well as the fact that the 90nm transition was just brutal for the whole industry). A high end chip line is *expensive* to design and produce, and Apple's sales volumes just aren't big enough to justify significant investment. The G5 has done fairly well by piggybacking off the design work for the POWER4, but that doesn't help much when aiming at laptops.

Uh, not quite here... The Xenon is a tri-core 970 variant. It loses OoO execution, and some of the more complex parts of the branch prediction, but gains in pipeline width for the VMX unit (more vector processing units on each core), gains SMT and the hypervisor.

Cell uses a single 970-derived core with a lot of the same tweaks (losing the complex branching and OoO, while gaining the SMT), but seems to lack the hypervisor and the VMX additions. The rest of the die is the SPEs, which are completely custom pieces of work using a FlexIO bus out to, IIRC, an onboard DMA controller which manages each core's communication with the outside world and through their cache.

The only reason Xenon was able to fit three cores is that they removed an awful lot of stuff from the core design which takes a fair amount of space, and they consolidated the entire cache into a single 1MB shared cache. The 970 dual-core design allocates 1MB to each core, which accounts for a chunk of space. By dropping down to 1MB and sharing it, they almost get enough room for a third core from that alone.

I would like to mention that the 360 /is/ very power hungry, drawing peak wattage of about 200W.
 

Catfish_Man

macrumors 68030
Sep 13, 2001
2,579
2
Portland, OR
Krevnik said:
Uh, not quite here... The Xenon is a tri-core 970 variant. It loses OoO execution, and some of the more complex parts of the branch prediction, but gains in pipeline width for the VMX unit (more vector processing units on each core), gains SMT and the hypervisor.

Cell uses a single 970-derived core with a lot of the same tweaks (losing the complex branching and OoO, while gaining the SMT), but seems to lack the hypervisor and the VMX additions. The rest of the die is the SPEs, which are completely custom pieces of work using a FlexIO bus out to, IIRC, an onboard DMA controller which manages each core's communication with the outside world and through their cache.

The only reason Xenon was able to fit three cores is that they removed an awful lot of stuff from the core design which takes a fair amount of space, and they consolidated the entire cache into a single 1MB shared cache. The 970 dual-core design allocates 1MB to each core, which accounts for a chunk of space. By dropping down to 1MB and sharing it, they almost get enough room for a third core from that alone.

I would like to mention that the 360 /is/ very power hungry, drawing peak wattage of about 200W.

The CELL PPE is in no way 970-derived, except possibly the Altivec unit. Going from 5-issue out-of-order to 2-issue in-order with SMT is more than enough to qualify as "completely different" in my book. More different than the 604e and 970, certainly, which most people consider unrelated. The PPE also has a different pipeline length, and makes extensive use of custom-designed logic, while the 970 is built primarily using IBM's automated layout tools. realworldtech.com has a series of excellent articles on the subject; well worth reading for those interested. http://www.research.ibm.com/arl/projects/rivina.html is also likely related.

I admit I'm not 100% sure on how similar Xenon is to the CELL's PPE (the cache structure is quite different, as you noted), but it's definitely quite close. I would dig up some references to back that up, but it'll have to wait until I'm on a more reliable net connection.
 