The nMP is a small-form-factor WS and typically a 1S machine, although it uses some 2600 CPUs for the higher core counts. Still, you surely don't expect to see an nMP with Omni-Path, 6-channel memory and all that stuff, right?
So what happens to the nMP by that time? Dead? Or will the nMP be a low-end WS while a full-blown WS/server finally gets created with Purley? That would be the way to go, but I don't see it.
"Low end WS" is a bit overblown. It doesn't "die" or go "Dead". On x86 core count alone, being on the Xeon E5 1600 doesn't mean you are stuck in time.
Xeon E5 1600 v2 nominally topped out at 6 cores (yes, there was an E5 1680 v2 at 8, but it was priced just like the tweaked 10-core 2600 that it really was). Base entry count was 4.
http://ark.intel.com/products/series/75771/Intel-Xeon-Processor-E5-1600-v2-Product-Family
Xeon E5 1600 v3 nominally topped out at 8 cores, and the base count was still 4.
http://ark.intel.com/products/series/81064/Intel-Xeon-Processor-E5-1600-v3-Product-Family
Xeon E5 1600 v4 may or may not get 10. If it stays at 8, then it will mostly get clock bumps. I wouldn't be surprised if the base entry count is still 4 (only with non-kneecapped clock rates, so more reasonable).
Xeon E5 1600 v5 can probably get to at least 10. The entry base count probably moves to 6.
It is still on an increasing core count path... just not shooting for the maximum number. If you have a mix of legacy serial code and high-core-count work to run, then clock is about as important as core count. Intel can deliver that better in an independent single-socket lineup.
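To put rough numbers on the clock-versus-cores trade-off, here is a minimal sketch using Amdahl's law scaled by clock speed. The two parts and the parallel fractions are made-up illustrative values, not real SKUs or measurements, and the model ignores turbo, memory bandwidth and everything else that matters in practice.

```python
def relative_throughput(cores, clock_ghz, parallel_fraction):
    """Amdahl's-law speedup scaled by clock; result is in arbitrary units."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    serial_fraction = 1.0 - parallel_fraction
    speedup = 1.0 / (serial_fraction + parallel_fraction / cores)
    return clock_ghz * speedup


# Hypothetical parts (assumed numbers, not actual Xeon SKUs).
FEWER_FASTER = {"cores": 8, "clock_ghz": 3.5}
MORE_SLOWER = {"cores": 14, "clock_ghz": 2.6}

if __name__ == "__main__":
    for p in (0.60, 0.99):
        a = relative_throughput(parallel_fraction=p, **FEWER_FASTER)
        b = relative_throughput(parallel_fraction=p, **MORE_SLOWER)
        winner = "fewer/faster" if a > b else "more/slower"
        print(f"parallel fraction {p:.2f}: {a:.1f} vs {b:.1f} -> {winner}")
```

Under these assumed numbers the 8-core part comes out ahead when 60% of the work is parallel, and the 14-core part only pulls ahead when nearly all of it is.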
However, for very high core counts... you have to get off the "only x86 core count matters" path, which leads to a short-sighted future. With a more robust OpenCL implementation you really aren't stuck with just x86 cores anymore. If you have a workstation that is oriented toward graphics work, then the GPGPUs are probably going to play an increasing computational role over time. If memory access is evened out, what do you actually need that is specific solely to x86 on large, graphically oriented data arrays? Audio data isn't necessarily excluded either, if you need to do some embarrassingly parallel work on it.
Do you think Intel will provide Apple with earlier v5 Xeons?
No.
Maybe the 48 lanes of PCIe 3 will come to the 1600 v5, but by then you'll need even more. The PCH can even be discarded and the CPU will be monstrous.
You get more than the 48 because the PCH/chipset will have more. The PCH is absolutely not going to get discarded any time soon in desktop systems. The Ethernet, WiFi, Bluetooth and audio aren't going anywhere, and that is where they attach. It would be more sensible if the PCI-e SSDs nominally attached there along with any SATA high-capacity storage (the Mac Pro may have dumped SATA, but it is doubtful the majority of the 1S WS market is going that route).
P.S. I think there may be some Purley "WS", but they will probably serve the smaller subset of folks. Some will have Xeon Phi to crank up their x86-fixated binaries, and others will go to ever more expansive RAM caches for larger data sets.
Of course Apple doesn't need the whole Intel PCH; most of it is unused anyway. But will Intel make a special cut-down version for only a "few" units for Apple's nMP? I very much doubt it.
Server blades need 12 on-board SATA lanes why? They need 12 USB sockets why? It is a farce to say that only Apple underuses chipset capabilities. There are 1S servers too that don't need UPI, Omni-Path or lots of legacy I/O.
The current C612 chipset is skewed more toward 2S setups than 1S ones. If you drop the 2S systems, then the set is going to get more streamlined. [Built-in RAID makes SATA HDDs go faster, but top-end SSDs don't really need it if the capacity levels cover mere TBs, not 10s of TB or higher.]
The problem with PCIe 3 in the PCH is the bottleneck, even with DMI3 in Lewisburg. It helps, but it works no miracles.
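For a sense of scale on that bottleneck, here is a back-of-the-envelope sketch. It relies on DMI 3.0 being equivalent to a PCIe 3.0 x4 link (8 GT/s per lane, 128b/130b encoding). The device mixes passed to `oversubscription` are hypothetical examples, and the figures are peak link rates after line encoding, before packet overhead.

```python
PCIE3_GT_PER_S = 8.0        # PCIe 3.0 raw rate per lane, gigatransfers/s
PCIE3_ENCODING = 128 / 130  # 128b/130b line encoding
DMI3_LANES = 4              # DMI 3.0 is equivalent to a PCIe 3.0 x4 link


def link_gb_per_s(lanes):
    """Peak one-direction bandwidth in GB/s for a PCIe 3.0 link."""
    return lanes * PCIE3_GT_PER_S * PCIE3_ENCODING / 8


def oversubscription(device_lanes, uplink_lanes=DMI3_LANES):
    """Total downstream link bandwidth divided by the uplink bandwidth."""
    downstream = sum(link_gb_per_s(n) for n in device_lanes)
    return downstream / link_gb_per_s(uplink_lanes)


if __name__ == "__main__":
    print(f"DMI3 uplink: {link_gb_per_s(DMI3_LANES):.2f} GB/s")
    # Hypothetical PCH load: two x4 v3 SSDs hanging off the chipset.
    print(f"two x4 SSDs: {oversubscription([4, 4]):.1f}x the uplink")
```

A single x4 v3 SSD can already match the uplink on paper, so anything else behind the PCH competes with it for the same roughly 3.9 GB/s.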
You don't really need a miracle if you just add a few more lanes on the CPU package. It is up in the air whether Apple will provision two internal SSDs. The sorely missing piece on the C612 is just the x4 for a current top-of-the-line x4 PCI-e v3 SSD. Minimally, that is all Apple needs. [Could use an x8 v3 -> x4 v3, x4 v3 switch and an x4 controller in an x4 allocation to do 3 TB v3 with two GPUs, if the CPU allocation were bumped up slightly.]
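The bracketed lane arithmetic can be tallied in a quick sketch. It rests on one reading of the layout that is an assumption here: two GPUs at x16 each, an x8 uplink into a switch that fans out to two x4 controllers, and a third x4 controller on its own x4 allocation. The 40-lane figure is what the current E5 1600 v2/v3 parts provide, and 48 is the number speculated earlier in the thread.

```python
def lane_budget(cpu_lanes, allocations):
    """Return (lanes used, lanes left over) for a CPU lane allocation."""
    used = sum(allocations.values())
    return used, cpu_lanes - used


# Assumed layout; labels are illustrative, not an actual Apple design.
LAYOUT = {
    "GPU 1": 16,
    "GPU 2": 16,
    "switch uplink (x8 -> x4 + x4)": 8,
    "direct x4 controller": 4,
}

if __name__ == "__main__":
    for cpu_lanes in (40, 48):
        used, left = lane_budget(cpu_lanes, LAYOUT)
        status = "fits" if left >= 0 else "does not fit"
        print(f"{cpu_lanes} CPU lanes: {used} used, {left} left -> {status}")
```

On this reading the layout needs 44 lanes, so it overshoots a 40-lane CPU, while a 48-lane part would leave exactly an x4 spare for the SSD.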