I guess my answer, from everything else expressed in this thread, is that you should assume that multi-core performance in most apps won't meaningfully improve within the useful life of this computer. For that to happen, a significant number of ordinary Mac and Windows users would have to buy computers with 6-8 core chips. Otherwise, only the companies making (very expensive) software for niche industries like 3D animation/rendering will bother with the optimizations required.
But you don't need much beyond 2-4 cores for office software, web browsing, etc. Not even really for games right now.
So look at the software you actually use and base your decision to buy (or not) on whether more cores will get your current or near-future work done faster than a machine with a higher clock speed would. Or on whether you need a better GPU for WORK, and not just for playing games.
Your general point about multi-core performance is correct, but your conclusion that "optimizing" software for multi-core mostly comes down to the developer's "will" to do the work... the reality is much more complicated than that.
Most software can't and won't be "optimized" for multi-core, simply because the logic it's built on and the development tools available don't work that way.
Data that can easily be broken into separate chunks, processed individually on separate cores, and recomposed at the end into a final result is primed for multi-core. Audio/video encoding was one of the first major areas to benefit from multi-core because you're transcoding from a "known" (i.e. the source file).
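Here's a minimal sketch of that pattern in Python (all the names and numbers are made up for illustration, and the "work" is a trivial stand-in for real transcoding): split the data into independent chunks, process each one on its own core, and stitch the results back together at the end.

```python
from concurrent.futures import ProcessPoolExecutor

def encode_chunk(chunk):
    # Stand-in for real per-chunk work (e.g. transcoding one slice of a file).
    return [b * 2 for b in chunk]

if __name__ == "__main__":
    source = list(range(1_000_000))
    chunk_size = 250_000

    # Break the "known" source into independent chunks...
    chunks = [source[i:i + chunk_size]
              for i in range(0, len(source), chunk_size)]

    with ProcessPoolExecutor() as pool:
        # ...process each chunk on its own core (map() preserves order)...
        results = list(pool.map(encode_chunk, chunks))

    # ...and recompose the pieces into the final result.
    final = [x for part in results for x in part]
```

Nothing in one chunk depends on any other chunk, which is exactly why this kind of workload scales with core count.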
Where multi-core wouldn't be helpful is recording a single track of audio... if one core can easily handle recording that one track in real time, additional cores are of no help - they can't process the "future" of the recording... only the real-time. It's an obvious example, but that's the concept of why everything isn't just magically "optimized" for multi-core.
Most data needs to be processed sequentially. If you're doing a pair of equations: a + b = c, then c + d = e... there's no way to multi-core that... in order to get "e", you have to compute "c + d", and in order to get "c", you have to compute "a + b" first. If you give one core "a + b" and a second core "c + d", then that second core is just going to sit there waiting for "c". There is no getting around this. It's a very simplified picture of the issue, but that's been 90% of software since the beginning.
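To make the waiting concrete, here's a tiny sketch (hypothetical values, a thread pool purely for illustration): even if you hand the two steps to a two-worker pool, the second step can't be submitted until the first one's result exists, so nothing actually runs in parallel.

```python
from concurrent.futures import ThreadPoolExecutor

def add(x, y):
    return x + y

a, b, d = 1, 2, 3

with ThreadPoolExecutor(max_workers=2) as pool:
    future_c = pool.submit(add, a, b)  # worker 1: compute c = a + b
    c = future_c.result()              # we must block here; "c" doesn't exist yet
    future_e = pool.submit(add, c, d)  # worker 2 can only start now
    e = future_e.result()

print(e)  # 6 -- same answer as doing it on one core, and no faster
```

The dependency chain, not the number of cores, sets the speed limit.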
In the larger scheme of things, in most cases, the above equation example is unavoidable. In other cases, the developer can rewrite the software to avoid that "equation" altogether and find a different way to arrive at "e" that doesn't involve computing "c" first. But that's often much easier said than done.
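As one hedged illustration of that kind of rewrite (illustrative names, not a universal recipe): addition happens to be associative, so a long serial chain of additions can be restructured into independent partial sums that separate cores compute at once, with one cheap combining step at the end.

```python
from concurrent.futures import ProcessPoolExecutor

def partial_sum(chunk):
    # Each core sums its own chunk with no knowledge of the others.
    total = 0
    for x in chunk:
        total += x
    return total

if __name__ == "__main__":
    data = list(range(8_000_000))
    n = 4
    size = len(data) // n
    chunks = [data[i * size:(i + 1) * size] for i in range(n)]

    with ProcessPoolExecutor(max_workers=n) as pool:
        # The serial chain "total += x" has been restructured into
        # n independent sub-problems...
        partials = list(pool.map(partial_sum, chunks))

    # ...and one cheap final step arrives at the same answer.
    print(sum(partials))
```

That only works because the operation can be reordered; most real-world logic can't be untangled this neatly, which is the "easier said than done" part.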
TL;DR: In most cases, it's not the developer's fault that software doesn't use multi-core... it's literally just the way the universe works.
