I understand what you are saying; this has actually been a somewhat significant problem.
Check out this picture of the AMD 390X. Look how many memory modules are around that card.
For comparison, here is the D700. Basically, with GDDR5, if you want to up the memory bandwidth you have to add memory modules. It's not impossible though, as Nvidia gets away with a reasonable number of memory modules on the Titan X with 12 GB of VRAM.
Right, at this point the way things have played out is that (for various reasons) the preferred design is to surround the GPU with a 1-deep "ring" of memory chips, finding a reasonable compromise between having them closer in (better for the electronics/wiring) and further out (makes thermal management easier).
But that's "in hindsight". If you had to guess today's designs from, let's say, 5 years ago, that looks like the probable outcome, but it wasn't guaranteed; back then, you could've at least made an argument for more chips (maybe a "2-deep ring"), on the theory that the right trade-off would be "slower but wider" (and with more on-board memory).
Thus even though today it's obvious most of the board is basically "dead space", it'd be hard to be confident that's how it'll play out as long as video memory remains off-package.
Once you *believe* it's all going on-package in the near future, you can have a much higher confidence level that you won't really *need* most of that board area anymore, which would then make an nMP-esque design look like a lot less of a compromise.
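To put rough numbers on the "slower but wider" idea, here's a back-of-the-envelope sketch. The only formula in it is peak bandwidth = bus width x per-pin data rate / 8. The chip counts, the 32-bit interface per chip, the 4 Gbit density, and the data rates are illustrative assumptions of mine, not the specs of any particular card.

```python
# Back-of-the-envelope GDDR-style memory math.
# All figures below are illustrative assumptions, not real card specs.

def bandwidth_gb_s(chips, bits_per_chip, gbps_per_pin):
    """Peak bandwidth in GB/s: total bus width (bits) * per-pin rate (Gbit/s) / 8."""
    bus_width_bits = chips * bits_per_chip
    return bus_width_bits * gbps_per_pin / 8

def capacity_gb(chips, gbit_per_chip):
    """Total capacity in GB when each chip holds gbit_per_chip gigabits."""
    return chips * gbit_per_chip / 8

# "Fast and narrow": one ring of 12 chips, each 32 bits wide, at 7 Gbit/s per pin.
fast_narrow = bandwidth_gb_s(chips=12, bits_per_chip=32, gbps_per_pin=7.0)

# "Slow and wide": a hypothetical 2-deep ring of 24 chips at half the per-pin rate.
slow_wide = bandwidth_gb_s(chips=24, bits_per_chip=32, gbps_per_pin=3.5)

print(f"fast/narrow: {fast_narrow:.0f} GB/s, {capacity_gb(12, 4):.0f} GB")
print(f"slow/wide:   {slow_wide:.0f} GB/s, {capacity_gb(24, 4):.0f} GB")
```

Under those assumptions both layouts land on the same peak bandwidth, but the wide one carries twice the memory and needs twice the board area for chips, which is exactly the kind of thing a scaled-down board can't accommodate.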
However, I think the biggest constraint on GPUs in the Mac Pro is power/thermals and not PCB area. Reduced-size GPUs have existed for a long time in MXM format, so the only thing different here is that Apple used their own design. Of course, it did restrict them from using obscene amounts of VRAM, but Apple was never going to do that anyway. For instance, Apple did not use AMD's Hawaii GPU not because it wouldn't fit in the Mac Pro, but because it was too hot to get much benefit over Tahiti. (However, now looking at that picture, it may be because they couldn't get 8 GB of memory in there too.)
Again, I don't know how to say it any more simply than this: to adopt the nMP design you have to be able to make a colorable case that it's not going to force too much compromise (on the metrics of interest). So PCB size isn't a direct issue; it's only an issue if there's functionality that fits on a full-size board but not on the scaled-down custom boards. Thus MXM isn't relevant either way: the question isn't "can you buy a small board?" but "do we need a big board to fit all the things we'll want?" For HBM-based systems, no worries; for GDDR, you might have capacity constraints (especially if the slow-and-wide option was what the market had gone with...).
But, yes: the biggest miscalculation is the power envelope on the nMP. That higher-end--but still non-niche--GPUs have stayed at such high power levels was very unexpected and has permanently cramped the nMP's potential.
The expectation was you could fit a top-end GPU into it (HBM eliminates any risk of an nMP case being "too small") and, in a worst-case scenario, down-clock it *a little bit* to, say, shave off 20-40% of the power draw in exchange for a 10-20% performance drop.
With higher-end GPUs stabilizing at current power draws, this trade-off just isn't there, with obvious consequences.
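For anyone wondering why "small down-clock, much bigger power saving" was a reasonable thing to expect, here's a rough sketch using the textbook approximation that dynamic power scales with C * V^2 * f. The big assumption (mine, and an optimistic one) is that core voltage can be lowered in proportion to the clock, which makes power scale with the cube of the clock; it also ignores static leakage and memory power entirely.

```python
# Rough dynamic-power model: P is proportional to C * V^2 * f.
# Assumption: voltage can be scaled down in proportion to clock (optimistic),
# so relative power ~ clock_scale ** 3. Leakage and memory power are ignored.

def relative_power(clock_scale, voltage_tracks_clock=True):
    """Power relative to stock when running at clock_scale times the stock clock."""
    if voltage_tracks_clock:
        return clock_scale ** 3   # V^2 * f with V proportional to f
    return clock_scale            # fixed voltage: power falls only linearly

for drop in (0.10, 0.20):
    scale = 1.0 - drop
    saving = 1.0 - relative_power(scale)
    print(f"{drop:.0%} lower clock -> roughly {saving:.0%} less power")
```

With those assumptions a 10% down-clock saves about 27% of the power and a 20% down-clock about 49%, the same ballpark as the trade-off described above. If the voltage can't come down with the clock, the saving is only linear and the trade stops being attractive.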
Eh, HBM1's biggest failing is that it only appeared on a graphics card that appeals to a very small niche. Fury is only a high-end gaming card, and even 4 GB of VRAM is not that much for a card that sells for >=$500. I think they designed it as a competitor to the Nvidia GTX 980, and then Fury got delayed and trumped by the 980 Ti. HBM1 is not "dead-end tech"; it's just the first generation of a new tech. It will be interesting to see if Fiji and HBM1 live on in AMD's next graphics lineup, given how big and expensive Fiji is.
Again, as a technology HBM1 is fine; it's only "dead-end tech" if you look at it as the product of an "ecosystem", but in that broader context it's a dead man walking for sure.