Just being a killjoy. I thought Apple examined a 4-chip M1 Hydra and determined it was too expensive to produce, and that the market at the necessary price point would be too small to be viable. Yes, there are relevant Apple patents, but Apple (like most high-technology companies) has loads of patents that have never led to an actual product.
So what has changed so drastically that a 4-die [large] M4 Hydra is now any closer to being a real product?
I thought the whole point of chiplets was to use multiple medium-sized dies to get away from the dependence on large dies for high-performance chips.
I consider it unlikely that Apple ever seriously entertained an M1 Extreme as a business proposition, for technical reasons.
The M1 Ultra was impressive in many ways, but also disappointing in its scaling. This was surely expected *within* Apple; it was a learning step.
One aspect of scaling that had to be fixed immediately (and probably was fixed to a substantial extent with M2) was GPU scheduling - you want to schedule kernels that will use common data on the same die's GPU so that (as far as possible) data is not sloshing back and forth between the L2 SRAM blocks of the two Max dies.
There's a similar sort of concern for ANE scheduling. Some of this scheduling is done by the OS (at a very high level) or by the GPU or ANE ARM companion core, but the companion core needs access to an on-going stream of telemetry to make optimal decisions (along with, perhaps, augmented data structures in the GPU or ANE to hold tokens representing those decisions).
So point is, GPU and ANE needed better scheduling to really work well in an Ultra-style design, and that scheduling required hardware assistance.
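To make that concrete, here's a minimal sketch of the kind of greedy, locality-aware assignment I mean - purely illustrative, not Apple's scheduler; the Kernel/LocalityScheduler names, the buffer IDs and the cost model are all made up. The idea is just "run each kernel on the die that already holds most of its input buffers", and only fall back to load balancing on a tie:

```swift
// Hedged sketch (not Apple's actual scheduler): a greedy, locality-aware
// assignment of GPU kernels to one of two dies in an Ultra-style package.
// The goal is to keep kernels that share data on the same die so their
// buffers don't slosh between the two L2 SRAM blocks.

struct Kernel {
    let name: String
    let inputBuffers: Set<Int>   // IDs of buffers this kernel reads
    let cost: Double             // rough execution-time estimate
}

final class LocalityScheduler {
    private var bufferHome: [Int: Int] = [:]   // buffer ID -> die (0 or 1)
    private var dieLoad = [0.0, 0.0]           // accumulated cost per die

    func schedule(_ kernel: Kernel) -> Int {
        // Count how many of the kernel's buffers already live on each die.
        var hits = [0, 0]
        for buffer in kernel.inputBuffers {
            if let die = bufferHome[buffer] { hits[die] += 1 }
        }
        // Prefer the die with more resident buffers; break ties by load.
        let die: Int
        if hits[0] != hits[1] {
            die = hits[0] > hits[1] ? 0 : 1
        } else {
            die = dieLoad[0] <= dieLoad[1] ? 0 : 1
        }
        // Record where the kernel's data now lives and update the load.
        for buffer in kernel.inputBuffers { bufferHome[buffer] = die }
        dieLoad[die] += kernel.cost
        return die
    }
}

// Tiny usage example: two kernel chains sharing data stay on their own die.
let scheduler = LocalityScheduler()
let work = [
    Kernel(name: "convA", inputBuffers: [1, 2], cost: 3),
    Kernel(name: "convB", inputBuffers: [3, 4], cost: 3),
    Kernel(name: "reluA", inputBuffers: [1, 2], cost: 1),
    Kernel(name: "reluB", inputBuffers: [3, 4], cost: 1),
]
for kernel in work {
    print("\(kernel.name) -> die \(scheduler.schedule(kernel))")
}
```

The real thing would obviously weigh telemetry, kernel dependencies and migration costs, but even this toy version shows why the decision wants hardware-maintained hints about where data currently lives.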
On the CPU (and entire SoC) side, the cache coherence protocol also needed to be made more powerful, so that less overhead is spent simply keeping various caches informed about what other caches are doing. This protocol was designed and patented a few years ago but may not be implemented yet. (Cache protocols are HARD. I could believe something like an initial version was put on the M2 to test in the M2 Ultra, various edge cases were found, and elements of the design had to be refined).
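As a rough illustration of why a more powerful (directory-style) protocol cuts that overhead, here's a toy sketch - invented names, nothing to do with Apple's actual design: a directory records which dies hold each line, so a write only invalidates the recorded sharers instead of snooping every cache in the package.

```swift
// Hedged sketch, not Apple's actual protocol: a toy directory-based
// coherence filter. The directory remembers which dies hold a copy of
// each cache line, so a write only needs to invalidate the recorded
// sharers instead of broadcasting a snoop to every cache in the package.

final class CoherenceDirectory {
    private var sharers: [Int: Set<Int>] = [:]   // line address -> dies holding it
    private(set) var messagesSent = 0            // rough proxy for coherence overhead

    // A die reads a line: just record it as a sharer.
    func read(line: Int, die: Int) {
        sharers[line, default: []].insert(die)
    }

    // A die writes a line: invalidate only the *other* recorded sharers.
    func write(line: Int, die: Int) {
        let others = sharers[line, default: []].subtracting([die])
        messagesSent += others.count              // targeted invalidations
        sharers[line] = [die]                     // writer becomes sole owner
    }

    // For comparison: a snoopy broadcast costs (dieCount - 1) messages
    // for every write, whether or not anyone else holds the line.
    func broadcastCost(writes: Int, dieCount: Int) -> Int {
        writes * (dieCount - 1)
    }
}

// Usage: with 4 dies, a line shared by only 2 of them costs 1 targeted
// invalidation per write, versus 3 messages per write under broadcast.
let dir = CoherenceDirectory()
dir.read(line: 0x1000, die: 0)
dir.read(line: 0x1000, die: 1)
dir.write(line: 0x1000, die: 0)
print("directory messages:", dir.messagesSent)                            // 1
print("broadcast messages:", dir.broadcastCost(writes: 1, dieCount: 4))   // 3
```

The hard part isn't the happy path above; it's the edge cases (races between reads and invalidations, eviction of directory entries, and so on), which is exactly where an initial silicon implementation would get burned.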
So point is, technically I don't think anyone at Apple imagined that scaling M1 Max 4 ways was worth doing. More likely the expectation was "We learn what we can from the patches applied to M2 Ultra, and as soon as we have those elements working, we level up to an Extreme and see what the issues in that design are". So, again technically and saying nothing about business decisions, I could see it as plausible that M4 is ready for an Extreme design, maybe with internal testing to see if an octa-SoC design actually works.
The other thing that starts to kick in is that the obvious layout of an Extreme is a pretty hefty block of silicon - I think I calculated about 3x3 inches (so larger than two hands side by side). You start running into questions of how the geometry is best laid out, where the connections go, and whether you use double-sided DRAM (eg ranks, as we discussed a few weeks ago). Again, the time seems about right to deal with these questions in the M4 generation?
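For what it's worth, here's roughly how a figure in that ballpark can be rebuilt. Every input below is my own assumption (die area, DRAM package area, overhead factor), not an Apple number - it's only meant to show that four Max-class dies plus their LPDDR easily lands you in multi-inch package territory:

```swift
// Back-of-envelope sketch of the package-size claim, with loudly assumed
// numbers: roughly 450 mm² per Max-class die (the M1 Max was reported
// around 430 mm²), four dies, LPDDR packages flanking each die, plus
// inter-die bridges and routing keep-out. None of these are Apple figures.
import Foundation

let dieArea_mm2 = 450.0          // assumed Max-class die
let dieCount = 4.0
let dramPerDie_mm2 = 2 * 180.0   // assumed two LPDDR packages per die
let overheadFactor = 1.3         // assumed bridges, routing, keep-out

let total_mm2 = (dieArea_mm2 + dramPerDie_mm2) * dieCount * overheadFactor
let side_mm = total_mm2.squareRoot()
let side_inches = side_mm / 25.4

print(String(format: "~%.0f mm² total, ~%.0f mm (~%.1f in) on a side",
             total_mm2, side_mm, side_inches))
// Prints something like "~4212 mm² total, ~65 mm (~2.6 in) on a side",
// i.e. the same ballpark as the 3x3-inch guess above.
```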
But for an Octa, maybe other ideas come into play. Could you stack two of these things with aggressive cooling?
On the business side, I don't think we can guess.
Nvidia sells DGXs in various sizes. If LLM upscaling doesn't all end in tears soon, there may be a market for similar Apple products (at similar prices). I've seen some ML researchers praising the Studio Ultra as a surprisingly nice training machine on a budget, and if word of that spreads, people might want to bump up their existing Metal/CoreML code to an Extreme at, say, a $15K budget, or even an Octa at, say, $40K.
And of course how many of these things could Apple use internally if they want to build OpenAI-sized training centers and Meta-sized data warehouses?
Apple PCC (Private Cloud Compute) doesn't have the words AI or ML in its name...
There's no obvious reason Apple couldn't expand this functionality, as I've said before, to any sort of use case (by developer or user) where it makes sense to shunt some large computation into the cloud for a few seconds or minutes.