Thanks for the replies, everyone. The 'brick wall', so to speak, that Intel hit sums it up for me. I wasn't aware that they ran into that issue.
The point which others haven't explained (and I'll try to) is why today's processors are faster at lower clock speeds (discounting the multi-processor thing).
Intel had got itself into two problems with the Pentium 4. Firstly, they'd hit the wall in terms of clock speed (around 4GHz). Secondly, in order to reach this high speed, they'd had to increase CPU pipeline lengths.
A CPU can't simply process a whole complex instruction every clock cycle... these instructions must be broken down into smaller, simpler steps (fetch the instruction, decode it, fetch the first operand, fetch the second operand, etc.)
In order to speed things up, a pipelined processor operates on multiple instructions at once. So while the newest instruction is being fetched, the previous instruction is being decoded, the one before that is having its first operand fetched, and so on. The last generation of the P4 had a pipeline that was 31 stages long - that is, up to 31 instructions were 'in flight' at once.
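To make the overlap concrete, here's a rough sketch in C that prints a timetable for a made-up 5-stage pipeline (the stage names and counts are illustrative only - nothing like the P4's 31 stages):

/* Toy model of a 5-stage pipeline: each clock cycle, every instruction
 * in flight moves forward one stage, so up to 5 instructions are being
 * worked on at once. */
#include <stdio.h>

int main(void) {
    const char *stages[] = { "Fetch", "Decode", "Operands", "Execute", "Writeback" };
    const int num_stages = 5;
    const int num_instructions = 8;

    /* Print which instruction occupies which stage on each clock cycle. */
    for (int cycle = 0; cycle < num_instructions + num_stages - 1; cycle++) {
        printf("Cycle %2d:", cycle + 1);
        for (int s = 0; s < num_stages; s++) {
            int instr = cycle - s;   /* instruction currently in stage s */
            if (instr >= 0 && instr < num_instructions)
                printf("  %-9s I%d", stages[s], instr + 1);
        }
        printf("\n");
    }
    return 0;
}

Run it and you'll see instruction I1 being decoded while I2 is fetched, and so on - that's all pipelining is.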
The problem with a very long pipeline is what happens if the program takes a change of direction - for example at the end of a loop, or when it hits an 'if' comparison. In the worst case, you only discover that you've branched off when the comparison instruction is most of the way through the pipeline. At that point, all the instructions behind it in the pipeline have to be thrown away, because you're branching off elsewhere and those instructions never should have been executed. The resulting gap is called a pipeline 'bubble', and it meant that as the P4 range got higher clock speeds and longer pipelines, the chips became steadily less efficient per clock. Sure, there are ways to lessen the impact of 'bubbles' - branch prediction chief among them - but the P4 approach had run out of steam.
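You can actually measure the cost of those thrown-away instructions from ordinary code. A classic demonstration (my own sketch, nothing Intel-specific - the sizes and the 128 threshold are arbitrary) is to sum the same array twice, once in random order and once sorted, so that the same 'if' goes from unpredictable to predictable:

/* Summing the same data twice: random order defeats the branch predictor,
 * sorted order makes the branch predictable, so the second pass runs
 * noticeably faster on most CPUs even though it does identical work. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N 1000000

static int cmp_int(const void *a, const void *b) {
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

static void time_sum(const int *data, const char *label) {
    clock_t start = clock();
    long long sum = 0;
    for (int pass = 0; pass < 100; pass++)
        for (int i = 0; i < N; i++)
            if (data[i] >= 128)          /* the branch in question */
                sum += data[i];
    double secs = (double)(clock() - start) / CLOCKS_PER_SEC;
    printf("%s: sum=%lld, %.2f s\n", label, sum, secs);
}

int main(void) {
    int *data = malloc(N * sizeof *data);
    for (int i = 0; i < N; i++)
        data[i] = rand() % 256;

    time_sum(data, "random");            /* many mispredictions */
    qsort(data, N, sizeof *data, cmp_int);
    time_sum(data, "sorted");            /* branch is now predictable */

    free(data);
    return 0;
}

The exact numbers depend on your CPU, but the gap between the two runs is essentially the pipeline being flushed over and over in the first one.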
Luckily for Intel, they had a team of engineers in Israel working on a new processor design. This took the old PIII design (which had a short pipeline) and added new improvements to it. In a brave move, Intel decided to completely abandon the P4 architecture and take a step back... and the
Core family was launched.
This new family scaled back the clock speed, halved the pipeline length, but managed to do more on each clock cycle. Instructions were bunched up (fused together), SSE instructions executed faster, more instructions were issued at once and yes, the new architecture also better allowed the integration of multiple cores on a single chip. Other important changes came with '64 bit' processors - which also introduced new registers (short-term storage areas) and instructions, improving the ancient x86 architecture (which frankly sucked).
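For anyone curious what SSE actually looks like to a programmer, here's a minimal C sketch using the standard SSE intrinsics - it adds four floats per instruction instead of one (the function and array names are just made up for the example, and you'd need an x86 compiler with SSE enabled):

/* Add two float arrays using 128-bit SSE registers: four additions per
 * instruction, with a scalar loop for any leftover elements. */
#include <xmmintrin.h>   /* SSE intrinsics */
#include <stdio.h>

void add_arrays(const float *a, const float *b, float *out, int n) {
    int i = 0;
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);   /* load 4 floats at once */
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(out + i, _mm_add_ps(va, vb));
    }
    for (; i < n; i++)                     /* remainder, one at a time */
        out[i] = a[i] + b[i];
}

int main(void) {
    float a[6] = {1, 2, 3, 4, 5, 6}, b[6] = {10, 20, 30, 40, 50, 60}, out[6];
    add_arrays(a, b, out, 6);
    for (int i = 0; i < 6; i++)
        printf("%.0f ", out[i]);
    printf("\n");
    return 0;
}

The Core family made this kind of instruction complete in fewer cycles, which is part of how it did 'more per clock'.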
Do you think there will be a point in time when CPU manufacturers won't be able to 'outpower' the previous generation of machines due to power consumption issues?
I think we're still some way off that.
I think our current problem is more in software than in hardware. While multi-core CPUs have become more prevalent, I think app writers have failed to make their software take advantage of them.
I own a 4-year-old, 8-core Mac Pro. While many of the newer consumer Macs are faster for single-threaded tasks, this computer still holds its own with applications that can use all of those cores. I've actually seen a speed-up in apps over time on my machine, as more use is made of multiple cores (for instance, Adobe Lightroom parallelises more things in its last two releases). Most of the time though, even when running flat out, I'm only seeing perhaps 1/4 of the total available CPU power being used. Only very specialist apps (like video encoders) make full use of that power.
So there's a lot of unused power out there, and if CPU makers continue to add cores rather than GHz as the means of increasing speed, then OS and app writers will need to work a lot harder to make use of it. If they don't, then we're going nowhere.
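To show what that extra work looks like at the code level, here's a minimal sketch of spreading one big loop across several cores using POSIX threads (the thread count, array size and the work itself are invented purely for illustration - compile with something like gcc -pthread):

/* Split a large summation across NUM_THREADS worker threads, each given
 * its own slice of the array, then combine the partial results. */
#include <pthread.h>
#include <stdio.h>

#define N 4000000
#define NUM_THREADS 4

static double data[N];

struct chunk { int start, end; double sum; };

static void *sum_chunk(void *arg) {
    struct chunk *c = arg;
    c->sum = 0.0;
    for (int i = c->start; i < c->end; i++)
        c->sum += data[i];
    return NULL;
}

int main(void) {
    for (int i = 0; i < N; i++)
        data[i] = i * 0.001;

    pthread_t threads[NUM_THREADS];
    struct chunk chunks[NUM_THREADS];

    /* Each thread gets its own slice - no shared writes, so no locks needed. */
    for (int t = 0; t < NUM_THREADS; t++) {
        chunks[t].start = t * (N / NUM_THREADS);
        chunks[t].end   = (t + 1) * (N / NUM_THREADS);
        pthread_create(&threads[t], NULL, sum_chunk, &chunks[t]);
    }

    double total = 0.0;
    for (int t = 0; t < NUM_THREADS; t++) {
        pthread_join(threads[t], NULL);
        total += chunks[t].sum;
    }
    printf("total = %f\n", total);
    return 0;
}

The hard part for real applications isn't this mechanism - it's finding work that splits cleanly like this, which is exactly why so much software still leaves 3/4 of the chip idle.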