It's likely that Apple has already done what would make you happy.
I'd like to be proven wrong by Apple. I would be happy if the new Mac Pro has some compelling method to direct all 7 teraflops to whatever usage is required and even manages to shift the load seamlessly between OpenCL and OpenGL calls.
A single FirePro W9000 yields 3.994 teraflops of peak computing power. 2 x 3.994 = 7.988, or just shy of 8 teraflops. However, Apple's ad says the nMP yields 7 teraflops of computing power. Thus, about one teraflop of computing power appears to be lost when one (or possibly both) of the cards is also tasked with OpenGL duties. In other words, the peak compute figure that video card manufacturers give is measured when the card is tasked only with compute, whereas the figure a computer system manufacturer/integrator like Apple gives may have been adjusted for a particular use case, e.g., no additional card dedicated to OpenGL only. Thus, if you use a third (TB-connected) video card for display, you might even get close to 8 teraflops on the two cards installed in the nMP. But I'd still prefer a trio of 7990s for 23.5+ teraflops.
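To make that arithmetic explicit, here is a minimal sketch using only the figures quoted above. The constant and function names are my own, and the idea that the gap is due to OpenGL duties is my guess, not something Apple has stated.

```python
# Figures quoted in this post; nothing here is measured.
W9000_PEAK_TFLOPS = 3.994    # single FirePro W9000, vendor peak figure
APPLE_QUOTED_TFLOPS = 7.0    # figure from Apple's ad for the nMP
NUM_CARDS = 2                # cards installed in the nMP


def combined_peak(per_card_tflops, cards):
    """Sum of the vendor peak figures across identical cards."""
    return per_card_tflops * cards


def shortfall(vendor_peak_tflops, system_quote_tflops):
    """Gap between the summed vendor peak and the system maker's quote."""
    return vendor_peak_tflops - system_quote_tflops


peak = combined_peak(W9000_PEAK_TFLOPS, NUM_CARDS)
gap = shortfall(peak, APPLE_QUOTED_TFLOPS)

print(f"combined vendor peak:   {peak:.3f} TFLOPS")
print(f"gap vs. Apple's figure: {gap:.3f} TFLOPS ({gap / peak:.1%} of peak)")
```

So the shortfall is roughly an eighth of the summed vendor peak.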
The FirePro example concerns OpenCL, but the same type of phenomenon, i.e., a drop in compute performance because the display is also using the GPU's resources, occurs with CUDA. Here's what happens when running Nvidia CUDA cards in Octane Render, according to Otoy, the Octane Render seller:
"Single PCI-E Slot
If the computer has a single PCI-E slot, the upgrade options are fewer. One could simply add a more powerful GPU as long as the power supply can provide enough power for the new GPU. Dual GPU, single slot card solutions like the GTX 590 or GTX 690 may also be used in this situation, again assuming that the power supply is sufficient to power the video card.
A second option is to use an external expansion box which contains multiple GPUs. This allows the use of multiple GPU's with a computer that only has a single PCI-E slot. For the smoothest user experience with OctaneRender™, it is recommended to dedicate one GPU for the display and OS to avoid slow and jerky interaction and navigation. The dedicated video card could be a cheap, low powered card since it will not be used for rendering and it should be unticked (off) in CUDA devices in the Device Manager/Preferences.
Two PCI-E Slot Motherboard
If the computer has two PCI-E slots the user is presented with many additional upgrade options. One option is to install a second graphics card along with the currently installed GPU. If the existing GPU is slow, it can be used to power the display only and the second card can be dedicated to OctaneRender™. This will allow the OS to be smooth and the computer will still be responsive while the second GPU is tasked with rendering. Another option would be to add an additional GPU to assist in rendering. In this situation, it is best to have both GPU's match in model and ram content. This allows multi-GPU rendering but the OS interface may still be slowed as all the GPU processing power is dedicated to the rendering process." [http://render.otoy.com/downloads/OctaneRenderUserManual_1_20.pdf]
So, in Octane Render, the bias is towards rendering, so interface interactivity may suffer if all cards are selected in preferences to render. The nMP appears to have the opposite bias: OpenGL performance is favored, consistent with Apple's apparent emphasis on video display.
But generally marketing numbers related to GPU performance like to brag about peak performance and only seldom talk about sustained performance. As Photovore pointed out in his comment, as soon as you split a workload across several OpenCL devices, memory transfer and synchronization limit your peak performance. You'll only get close to peak performance if your task consists of completely independent small work units which are trivial to parallelize. Unfortunately, in most cases this is not applicable.
In CUDA, at least when using Octane Render, each additional Nvidia card yields a virtually linear increase in performance if you dedicate a particular GPU to display and exclude it from CUDA chores, or if you use an ATI card for interactivity. But otherwise, I agree with the point that marketing favors pitching peak over sustained performance.
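For what it's worth, the peak-versus-sustained gap can be roughed out with Amdahl's law. This is only an illustrative model, not anything taken from Apple, AMD, Nvidia, or Otoy: the function name and the sample parallel fractions below are my own assumptions, chosen to show how quickly scaling falls off once part of the work (memory transfer, synchronization) cannot be spread across cards.

```python
def amdahl_speedup(parallel_fraction, num_gpus):
    """Idealized speedup over one GPU under Amdahl's law.

    parallel_fraction: share of the work that splits perfectly across GPUs
                       (hypothetical input, not a measured value).
    num_gpus:          number of GPUs doing compute.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / num_gpus)


# Assumed fractions for illustration: 1.0 is the "completely independent
# work units" case; lower values stand in for transfer and sync overhead.
for fraction in (1.0, 0.95, 0.80):
    speedup = amdahl_speedup(fraction, num_gpus=2)
    print(f"parallel fraction {fraction:.2f}: {speedup:.2f}x with 2 GPUs")
```

A renderer that scales almost linearly per added card is behaving like the first case, which fits a workload made of independent units; most other workloads land somewhere below it.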