I see no evidence that they block 3rd-party NAND. While Apple requires specific kinds of NAND, they don't make it themselves, and it is buyable from 3rd parties. The whole point of the other thread is how to buy and install your own even on soldered systems, and the Mac Pro should theoretically be easier. However, even if that weren't the case and Apple blocked anything other than Apple NAND for internal storage, the rest of my argument still holds.

Actually, perhaps I could try one more round, by simplifying things:
It seems that if you took your argument (that if Apple offered upgradeable RAM on the MP, they would allow 3rd-party options) and applied it to the upgradeable NAND on the MP, you would conclude that Apple would allow 3rd-party options there too. Yet they don't. I assert that invalidates your argument.
Can you explain precisely why upgradeable RAM on the MP represents a qualitatively different business case from upgradeable NAND on the MP, such that Apple would allow the former when they don't allow the latter? That's what I've not been able to find in your arguments.
I apologize for not being clearer then: the upgradeable DRAM on the MP would be much more akin to PCIe storage (and in fact with CXL it really would be over PCIe; it would effectively be PCIe DRAM). The unified memory is the built-in memory, akin to the internal storage NAND.
All the arguments you made for why PCIe storage is "allowed" apply to CXL/DRAM (if you've got DRAM sticks, and those are on a PCIe board, people will have expectations), and on top of that it isn't the main RAM of the system, since we're positing that it's basically used by the OS as overflow/faster swap. Thus, if you want the high performance necessary to feed the CPU/GPU, you'd still have to buy a substantial UMA memory pool from Apple. This stands in contrast with PCIe storage, which is still internal to the device and can indeed basically replace buying internal storage from Apple. So if anything, 3rd-party CXL/DRAM is even less of a threat to Apple's bottom line on expensive UMA LPDDR RAM than PCIe storage is to their sales of expensive internal SSD storage.
Where I feel you err is in thinking that the upgradeable CXL/DDR RAM would be a replacement for buying UMA RAM from Apple. It wouldn't be. It would be what you optionally buy on top of buying UMA RAM from Apple. If Apple were to go down this route (and this is all hypothetical, since we have no idea what their plans are), then Apple would still be selling built-in, non-upgradeable UMA RAM. Thus the analogy is:
UMA RAM <---> internal storage
CXL/DRAM <---> PCIe storage
Therefore upgradeable CXL/DRAM would far more likely be something you could get 3rd party DRAM/PCIe cards for than not.
I don't think that would change things that much. Apple Silicon Macs already feel like NUMA systems, even if it's hidden from the user. For example, look at these memory latency measurements I've made:
Working set   iMac (i9-10910, 128 GiB)   MBP (M2 Max, 96 GiB)
1 GiB         94 ns                      117 ns
2 GiB         96 ns                      123 ns
4 GiB         108 ns                     123 ns
8 GiB         134 ns                     129 ns
16 GiB        164 ns                     143 ns
32 GiB        182 ns                     201 ns
64 GiB        191 ns                     365 ns
The exact numbers don't matter, as they are noisy. However, there is a huge increase in latency on M2 Max when the working set increases from 16 GiB (below 1/4 capacity) to 32 GiB (above 1/4), and again from 32 GiB (below 1/2) to 64 GiB (above 1/2). I'd expect that the effect would be even more significant with Ultra.
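For anyone curious how numbers like these can be obtained: below is a minimal pointer-chasing sketch in C. To be clear, this is a generic illustration, not necessarily the exact tool behind the table above; the 128-byte stride, iteration count, and GiB-sized argument are just assumptions for the example. The idea is to build a random cycle of pointers so every load depends on the previous one and the prefetchers can't help, then divide the total time by the number of loads.

```c
/* Hypothetical pointer-chasing latency probe (illustrative only). */
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <time.h>

int main(int argc, char **argv) {
    size_t bytes  = (argc > 1 ? strtoull(argv[1], NULL, 0) : 1) << 30; /* working set in GiB */
    size_t stride = 128 / sizeof(void *);      /* one slot per 128-byte line */
    size_t n      = bytes / sizeof(void *);
    void **buf    = malloc(n * sizeof(void *));
    if (!buf) return 1;

    /* Shuffle the line indices (Fisher-Yates) so the chase order is random. */
    size_t lines = n / stride;
    size_t *order = malloc(lines * sizeof(size_t));
    if (!order) return 1;
    for (size_t i = 0; i < lines; i++) order[i] = i;
    for (size_t i = lines - 1; i > 0; i--) {
        size_t j = (size_t)rand() % (i + 1);
        size_t t = order[i]; order[i] = order[j]; order[j] = t;
    }
    /* Link the lines into one cycle: each slot stores the address of the next. */
    for (size_t i = 0; i < lines; i++)
        buf[order[i] * stride] = &buf[order[(i + 1) % lines] * stride];

    /* Chase the chain; every load depends on the result of the previous one. */
    size_t iters = 100 * 1000 * 1000;
    void **p = &buf[order[0] * stride];
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (size_t i = 0; i < iters; i++) p = (void **)*p;
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
    /* Print p so the compiler can't optimise the chase loop away. */
    printf("%zu GiB: %.1f ns/access (%p)\n", bytes >> 30, ns / iters, (void *)p);
    free(order); free(buf);
    return 0;
}
```

Run with e.g. a working-set size in GiB as the argument (./latency 64); as the set grows past each cache and DRAM-channel boundary, the ns/access figure steps up in the way the table shows.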
I'm not sure Apple wants to add NUMA support to macOS without much financial benefit. I would think latency would be an order of magnitude worse or more with a multi-socket CPU, as synchronisation will have to be done by the OS.
Indeed, @JouniS. While those latency numbers are really interesting, the increase at larger working sets isn't really the pattern expected from non-NUMA-aware multi-chip systems, especially since, as you know, the Max is all one die. (In fact, even when Apple shipped multi-chip systems they never implemented NUMA.) In multi-chip systems, the issue is the latency/bandwidth from core A on chip 0 to RAM stick 4 on chip 1: the latency is generally another factor of X when that happens, on top of the latency of accessing large data sets (i.e. you could multiply all those numbers by something like 4, such that accessing a large data set on the opposite chip could be an order of magnitude more time-consuming than accessing the 1 GiB set on the same chip). Building in NUMA-awareness, which again macOS does not have, is meant to avoid that scenario as much as possible and keep data localized to the right processor.
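To make "NUMA-awareness" concrete: macOS has no public NUMA placement API, but on Linux with libnuma, keeping memory local to the processor that will use it looks roughly like the sketch below. The node number and allocation size are just placeholders for illustration.

```c
/* Hypothetical illustration: pin a thread and its memory to one NUMA node.
 * Linux only; build with: gcc numa_local.c -lnuma */
#include <stdio.h>
#include <numa.h>

int main(void) {
    if (numa_available() < 0) {
        fprintf(stderr, "no NUMA support on this system\n");
        return 1;
    }
    int node = 0;                                   /* example node */
    numa_run_on_node(node);                         /* run this thread on node 0's cores */
    size_t bytes = 1ull << 30;
    double *buf = numa_alloc_onnode(bytes, node);   /* memory physically placed on node 0 */
    if (!buf) return 1;
    for (size_t i = 0; i < bytes / sizeof(double); i++)
        buf[i] = 0.0;                               /* touch pages so placement actually happens */
    printf("1 GiB allocated and touched on node %d\n", node);
    numa_free(buf, bytes);
    return 0;
}
```

That kind of explicit placement (and the scheduler logic behind it) is exactly what macOS would have to grow to make a multi-socket Apple Silicon machine behave well.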
EDIT: And the real concern is the GPU and other accelerators on the SoC; again, you'd basically break UMA by adding a second socket with an interconnect. Even though the Ultra is two dies with an interconnect, and I rather suspect you're right that on the Ultra latency from die 0 to die 1 is worse than anything we see here, UltraFusion is still fast enough *and high enough bandwidth*. Trying to do multi-socket with a GPU on board ... well, there's a reason multi-GPU gaming systems largely went by the wayside (even with special connectors like the original NVLink), and why no one built multi-die GPUs before modern packaging techniques like UltraFusion and the M1 Ultra. For those you've got to have a fat enough pipe between the chips, and the only way to get that is to package them together. Building multi-socket SoCs with GPUs and accelerators: I'm not saying it's impossible, but it would be a mess.