Thank you for the excellent clarification. Technically, you could make all accesses "far" for every one of the chiplets; then memory is uniformly slower.
It could probably be argued that no memory is 100% uniform: with two DIMMs on the same channel, the second DIMM is slightly slower because its motherboard traces are longer and the signals take longer to propagate.
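A rough back-of-the-envelope estimate shows how small that effect is. Assume, approximately, that signals on an FR4 board travel at about half the speed of light (roughly 15 cm/ns), and take a hypothetical 5 cm of extra trace to the second DIMM. The extra round trip is then

$$\Delta t \approx \frac{2 \times 5\,\text{cm}}{15\,\text{cm/ns}} \approx 0.67\,\text{ns},$$

which is only about 1% of a 64 ns memory access.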
So the real issue is the difference in latency/bandwidth between "near" and "far" memory. If the difference is small, then NUMA causes no real problems.
It would be interesting to know what the near/far differences are on these various designs, and how they compare to the 64 ns (near) vs. 105 ns (far) latencies on the Threadripper.
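To put the Threadripper numbers in perspective, here is a minimal Python sketch of a simple weighted-average model. It assumes latency is the only cost and ignores bandwidth, queuing and cache effects. The function names are hypothetical. The 0% local case corresponds to the "everything is far" setup, and 50% roughly approximates pages spread evenly across two nodes.

```python
def average_latency(near_ns: float, far_ns: float, local_fraction: float) -> float:
    """Expected latency when `local_fraction` of accesses hit near memory.

    Simplified model (an assumption): latency is a plain weighted average,
    with no bandwidth limits, queuing, or cache effects.
    """
    if not 0.0 <= local_fraction <= 1.0:
        raise ValueError("local_fraction must be in [0, 1]")
    return local_fraction * near_ns + (1.0 - local_fraction) * far_ns


def numa_ratio(near_ns: float, far_ns: float) -> float:
    """How many times slower a far access is than a near one."""
    return far_ns / near_ns


if __name__ == "__main__":
    near, far = 64.0, 105.0  # Threadripper figures quoted in the post
    print(f"far/near ratio: {numa_ratio(near, far):.2f}")
    for frac in (1.0, 0.75, 0.5, 0.0):
        print(f"{frac:.0%} local -> {average_latency(near, far, frac):.1f} ns")
```

With these figures a far access costs about 1.64x a near one, and under this model the average penalty shrinks in proportion to the share of accesses that stay local.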