I know you've talked about horizontal scaling before but I just don't buy it. The Mac Pro isn't a server. Any Apple Silicon Mac Pro will be used to run the same kind of software that a Max or Ultra chip runs, but users will expect it to run much faster or handle much bigger models. If you scale horizontally, software won't run faster unless it's rewritten from the ground up to take advantage of horizontal scaling, and very little software can do that.
There is very little financial incentive for Apple to rearchitect most of macOS and rewrite most of its software to use horizontal scaling just for a niche machine that is already losing relevance.
Hence, I'm still on the side of 4x Max dies glued together for a Mac Pro chip. This chip won't be cheap in R&D cost and it won't be cheap to manufacture. This is why I'm speculating that Apple will create an Apple Silicon Cloud to expand the market of any "Extreme" chip.
Between these options (cancel the Mac Pro, use only the Ultra, glue 4x Max dies together, or horizontally scale Ultra chips), I think horizontal scaling is the least likely.
Gluing together four dies is also horizontal scaling; it’s just more software-friendly because you can have faster die-to-die communication. The largest issue with separate compute boards is the high synchronization overhead. If that can be solved (e.g. with a proprietary low-latency, high-bandwidth bridge), then the distinction between the two approaches disappears. But building such a bridge is far from trivial, if it’s even possible.
Vertical scaling with large SoCs sounds like the most reasonable approach to me.
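To put rough numbers on the synchronization point, here's a quick back-of-the-envelope sketch. It's an Amdahl-style model with an extra penalty that grows linearly with the number of units, which is my own simplifying assumption. The function name, the 95% parallel fraction and both sync costs are made up for illustration, not measurements of any real chip or interconnect.

```python
def speedup(n_units, parallel_fraction, sync_cost):
    """Amdahl-style speedup with a synchronization penalty.

    All times are expressed as fractions of the single-unit runtime.
    sync_cost is a hypothetical overhead paid once per additional unit
    (an assumed linear model, not a measured one).
    """
    serial = 1.0 - parallel_fraction          # part that cannot be split
    parallel = parallel_fraction / n_units    # part divided across units
    overhead = sync_cost * (n_units - 1)      # cost of keeping units in sync
    return 1.0 / (serial + parallel + overhead)


# Illustrative values only: a workload that is 95% parallel on 4 units.
fast_link = speedup(4, 0.95, 0.001)  # assumed cheap die-to-die sync
slow_link = speedup(4, 0.95, 0.05)   # assumed expensive board-to-board sync

print(f"fast link: {fast_link:.2f}x, slow link: {slow_link:.2f}x")
```

With the same four units and the same workload, the only thing that changes between the two cases is the sync cost, and that alone decides how much of the extra hardware you actually get to use. If a bridge could push the board-to-board cost down to the die-to-die level, the two cases would come out the same in this model.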