There is a Xeon redesign coming in a couple of years with a new serial interconnect to replace the front side bus, much more like the HyperTransport links AMD uses than the current shared bus. That should help alleviate some of the bandwidth issues.
But as we go to massively multicore systems (Intel already has a 16-core chip with Hyper-Threading on its roadmap for late 2008), we'll have to completely rethink how the processor talks to memory. I mean, four cores on one chip can already end up starved for memory bandwidth, even with a 1.33 GHz, 64-bit FSB. It will only get worse with more cores. Even if Intel bumps the current FSB to 2 GHz, it won't be enough.
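To put rough numbers on that, here's a minimal back-of-envelope sketch. It assumes the theoretical peak is just transfer rate times bus width (ignoring all protocol overhead), treats the "1.33 GHz" bus as 1333 MT/s, uses the 2 GHz figure as the hypothetical bump mentioned above, and the core counts beyond four are purely illustrative:

# Back-of-envelope: how the per-core share of a shared FSB shrinks
# as core counts grow. 1333 MT/s is the "1.33 GHz" bus discussed above;
# 2000 MT/s is the hypothetical 2 GHz bump. Core counts are illustrative.

BUS_WIDTH_BYTES = 8  # 64-bit front side bus

def fsb_peak_gb_s(mt_s):
    """Theoretical peak bandwidth of the shared FSB in GB/s."""
    return mt_s * 1e6 * BUS_WIDTH_BYTES / 1e9

for mt_s in (1333, 2000):
    peak = fsb_peak_gb_s(mt_s)
    for cores in (4, 8, 16):
        print(f"{mt_s} MT/s FSB, {cores} cores: "
              f"{peak:.1f} GB/s total, {peak / cores:.2f} GB/s per core")

Even at a hypothetical 2 GHz, sixteen cores would be splitting about 16 GB/s, or roughly 1 GB/s each, which is the point: a single shared bus just doesn't scale with core count.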
That was the real purpose of Intel's "80-core" chip: not to build an ultra-high-core-count product, but to investigate new ways of connecting the cores. The chip had 80 processing cores, but more importantly, each core had its own connection 'upward' out of the die, which could allow for a second layer on top containing memory. We might just end up stacking memory right on top of the processor.
For reference, the theoretical maximum bandwidth from each CPU socket to the northbridge is 10.6 GB/s (21.3 GB/s combined for both sockets). By comparison, the theoretical maximum bandwidth from the memory subsystem to the northbridge is 21.3 GB/s. However, that can only be reached with memory in all four channels, and is best achieved with only ONE module in each of the four channels. And it is only a theoretical maximum: the serial nature of Fully Buffered memory means actual bandwidth can be noticeably lower. Testing on other sites has shown that quad-channel FB-DIMMs at 667 MHz (theoretical 21.3 GB/s) can barely outperform dual-channel conventional DDR2-800 memory (theoretical max 12.8 GB/s), and conventional DDR2-1066 (theoretical max 17 GB/s) seems to reliably outperform FB-DIMMs.
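For anyone who wants to check the arithmetic, here's a small sketch that reproduces those theoretical figures from the usual rule of thumb (effective transfer rate times 8-byte bus width times number of channels). It deliberately says nothing about the serial buffering overhead of FB-DIMMs, which is exactly why the real-world results fall short of the 21.3 GB/s figure:

# Reproduce the theoretical peaks quoted above:
# effective transfer rate (MT/s) x 8-byte bus width x number of channels.

def peak_gb_s(mt_s, channels=1, width_bytes=8):
    return mt_s * 1e6 * width_bytes * channels / 1e9

configs = {
    "FSB per socket (1333 MT/s)":    peak_gb_s(1333),
    "Quad-channel FB-DIMM DDR2-667": peak_gb_s(667, channels=4),
    "Dual-channel DDR2-800":         peak_gb_s(800, channels=2),
    "Dual-channel DDR2-1066":        peak_gb_s(1066, channels=2),
}

for name, gb_s in configs.items():
    print(f"{name}: {gb_s:.1f} GB/s theoretical peak")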