Interesting discussion here...
It is funny; when Macs first started out, you had systems with a CPU (like the 68000, 68020 or 68030) and, in some cases, an FPU (like the 68881 or 68882). There was a lot of confusion when Motorola released the 68040, because the FPU was moved onto the same die as the CPU. And what was even odder was that Motorola also sold a version of the chip with the FPU disabled (the 68LC040).
One consequence of this change was that software that required an FPU would warn you that it didn't see either a 68881 or a 68882 co-processor when run on 68040 systems. Some software would work fine after the warning, while other software wouldn't.
So, where did this whole dual core thing come from? It came from IBM.
IBM's POWER series was designed for their high-end workstations and servers running Unix. By the end of the '90s they were selling workstations with 2, 4 or 8 processors (either the PowerPC 604e or the POWER3 series) to get the performance they wanted. And when they started designing their new 64-bit processor, they decided to put two processor cores on the same die within a single chip. This was the POWER4.
Part of the key to this idea can actually be traced back to Seymour Cray, who realized that to make his supercomputers as fast as possible, he needed to shorten the distance the information had to travel. Many people thought the round design of a Cray was a stylistic choice, when in fact it was there to shorten the distance between components.
This same philosophy has been behind much of what we see in processor design today. By putting both the CPU and the FPU on the same die, the distance between them is nearly eliminated, making for faster systems that take up less space and fewer resources. By moving part of the memory onto the die (the L1 cache), key instructions and data can be kept right at the processor rather than having to be fetched from main memory. In the later 604e systems, the L2 cache was given a faster dedicated bus, and the G3 and G4 improved on that idea even further. Eventually the L2 cache was moved right onto the die itself, eliminating the need for a special bus between the L2 cache and the processor.
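To make the locality point concrete, here's a minimal sketch in C (my own illustration, not something from any of the systems above): it reads the same large array twice, once sequentially and once in big strides, so the only difference between the two timings is how well the accesses fit in cache. The array size and stride are arbitrary round numbers I picked for the example.

```c
/* Minimal sketch (illustration only): time the same number of memory
 * reads done cache-friendly (sequentially) versus cache-hostile
 * (large strides), to show why keeping data close to the CPU matters. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N (1 << 24)          /* 16M ints, far larger than any L2 cache */

static double walk(const int *a, size_t stride)
{
    clock_t start = clock();
    volatile long sum = 0;   /* volatile so the loop isn't optimized away */
    for (size_t s = 0; s < stride; s++)
        for (size_t i = s; i < N; i += stride)
            sum += a[i];
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}

int main(void)
{
    int *a = malloc(N * sizeof *a);
    if (!a) return 1;
    for (size_t i = 0; i < N; i++) a[i] = 1;

    /* Both calls touch all N elements exactly once; only the order differs. */
    printf("sequential:  %.3f s\n", walk(a, 1));
    printf("stride 4096: %.3f s\n", walk(a, 4096));
    free(a);
    return 0;
}
```

On most machines the strided walk comes out several times slower, purely because almost every read has to go all the way out to main memory instead of hitting cache.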
IBM took this whole idea to the next level with its 400-series PowerPC processors, which had things like a USB controller built into the chip itself... they were basically computers on a chip.
When IBM realized they might have problems moving their 32-bit clients (who were using 604e- and POWER3-based systems) to the POWER4 and beyond, they decided to build a transitional chip: one based on a single POWER4 core, but able to execute both 32-bit and 64-bit instructions. The original G5 (the PowerPC 970) was a modified single-core version of the POWER4.
Back to the question of the thread, a lot of this has to do with how processes are executed on a system... a processor is, after all, the thing that executes the processes of applications.
In the days of the 9500MP, the 9600MP and the Daystar multiprocessor systems, System 7 wasn't able to see more than one processor. Daystar wrote some software (which Apple licensed for use in System 7 and later) that let applications address the additional processor (or processors, in the case of some Daystar systems that had up to four). But if apps didn't know the extra processors were there, or weren't written to take advantage of them, you only got the performance of a single-processor system.
Mac OS X changed this... but not by much. Applications still need to be written to take advantage of multiple processors. If they aren't, they will execute all of their work on a single processor. The advantage of Mac OS X is that the system can decide which app executes on which processor: an app that uses a lot of CPU time may get a processor to itself while other (less intensive) apps share another.
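For the "written to take advantage of" part, the usual mechanism on Mac OS X (and other Unix systems) is POSIX threads. Here's a minimal sketch of the idea; the chunk structure and sum_chunk function are just names I made up for the example. The point is that unless an app splits its work into threads like this, the kernel has nothing it can spread across processors.

```c
/* Minimal sketch using POSIX threads: split a sum over an array into
 * two halves, each handled by its own thread. The kernel is then free
 * to run each thread on a different processor (or core). */
#include <pthread.h>
#include <stdio.h>

#define N 1000000
static double data[N];

struct chunk { int lo, hi; double sum; };   /* hypothetical work unit */

static void *sum_chunk(void *arg)
{
    struct chunk *c = arg;
    c->sum = 0.0;
    for (int i = c->lo; i < c->hi; i++)
        c->sum += data[i];
    return NULL;
}

int main(void)
{
    for (int i = 0; i < N; i++) data[i] = 1.0;

    struct chunk halves[2] = { { 0, N / 2, 0.0 }, { N / 2, N, 0.0 } };
    pthread_t t[2];
    for (int i = 0; i < 2; i++)
        pthread_create(&t[i], NULL, sum_chunk, &halves[i]);
    for (int i = 0; i < 2; i++)
        pthread_join(t[i], NULL);

    printf("total: %f\n", halves[0].sum + halves[1].sum);
    return 0;
}
```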
Oddly enough, Adobe's Premiere 6.5 for Mac OS X was only able to use a single processor, whereas Final Cut Pro could use more than one. This was (in my opinion) Adobe's way of crippling the app to support their "PC Preferred" campaign from back then.
Where am I going with all this? Basically, software hasn't changed.
If you have an application that can only execute on one processor, you are effectively no better off with two cores than you would be with one... because the cores are processors. And Mac OS X will allocate both cores in the same way that it would if they were two physically separate chips.
So where is the advantage? We're back to the distance issue... having all the processors in a single chip, along with large amounts of cache, all communicating at the same clock speed as the processors themselves, makes multiprocessor-aware apps significantly faster.
Remember, the primary limitation in computing is the speed of light. There is nothing you can do to go faster than the speed of light, so the obvious solution (as Cray pointed out years ago) is to shorten the distance.
Multi-core chips are multiple processors connected with an incredibly fast bus with almost no distance between them. But from a software perspective, each core is just a processor.
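You can check that last point for yourself. On Mac OS X the hw.ncpu sysctl reports how many processors the kernel sees, and a dual-core chip reports the same count as a machine with two physically separate processors. A quick sketch:

```c
/* Quick check of the point above: the OS counts cores as processors.
 * On Mac OS X the hw.ncpu sysctl reports one "CPU" per core, whether
 * the cores share a die or sit in separate chips. */
#include <stdio.h>
#include <sys/types.h>
#include <sys/sysctl.h>

int main(void)
{
    int ncpu = 0;
    size_t len = sizeof ncpu;
    if (sysctlbyname("hw.ncpu", &ncpu, &len, NULL, 0) == 0)
        printf("processors seen by the kernel: %d\n", ncpu);
    return 0;
}
```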