I read this as each of the dGPUs having its own kernel processes running, with each of those processes potentially able to read data from the other dGPUs' VRAM. If that is the case, the dGPU kernels probably have to be written with this in mind, which makes the GPU code more complicated. However, the GPUs in the DGX are connected by NVLink, so I thought that question was answered (in the negative) by your quote from the NVIDIA developer, who said "NVLink provides a fast interconnect between GPUs, but does not aggregate those GPUs into a single logical device."
Apple's UMA with UltraFusion makes this a lot more elegant IMHO.
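
To make the "kernels have to know about it" point concrete, here is a minimal sketch of what explicit peer access looks like with the CUDA runtime API. It assumes a machine with at least two GPUs that report peer access to each other. The kernel name `scale_remote` and the buffer names are made up for illustration, and I'm not claiming this is how any particular DGX workload is written.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: runs on GPU 0 but reads from a buffer that lives
// in GPU 1's VRAM. The pointer is an ordinary device pointer; what makes
// it usable here is the explicit peer-access setup done on the host.
__global__ void scale_remote(const float* remote_in, float* local_out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        local_out[i] = remote_in[i] * 2.0f;
    }
}

int main() {
    int device_count = 0;
    cudaGetDeviceCount(&device_count);
    if (device_count < 2) {
        printf("This sketch needs at least two GPUs.\n");
        return 0;
    }

    // Ask whether device 0 can directly address device 1's memory.
    int can_access = 0;
    cudaDeviceCanAccessPeer(&can_access, 0, 1);
    if (!can_access) {
        printf("GPU 0 cannot access GPU 1's memory directly.\n");
        return 0;
    }

    const int n = 1 << 20;
    float* buf_on_gpu1 = nullptr;
    float* out_on_gpu0 = nullptr;

    // Allocate and zero the input buffer on GPU 1.
    cudaSetDevice(1);
    cudaMalloc(&buf_on_gpu1, n * sizeof(float));
    cudaMemset(buf_on_gpu1, 0, n * sizeof(float));
    cudaDeviceSynchronize();

    // Switch to GPU 0 and opt in to peer access. Nothing is aggregated
    // automatically: each direction has to be enabled explicitly.
    cudaSetDevice(0);
    cudaDeviceEnablePeerAccess(1, 0);  // second argument (flags) must be 0
    cudaMalloc(&out_on_gpu0, n * sizeof(float));

    // Launch on GPU 0; the loads from remote_in go over the interconnect.
    scale_remote<<<(n + 255) / 256, 256>>>(buf_on_gpu1, out_on_gpu0, n);
    cudaError_t err = cudaDeviceSynchronize();
    printf("kernel status: %s\n", cudaGetErrorString(err));

    cudaFree(out_on_gpu0);
    cudaSetDevice(1);
    cudaFree(buf_on_gpu1);
    return 0;
}
```

The host code still has to pick a device, place each allocation on a specific GPU, and enable peer access per device pair, which is the extra complexity I meant. That also fits the quote: the interconnect makes remote reads fast, but the programmer still sees separate devices.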