As the video shows, even a single Mac Studio is a decent machine for running LLMs locally, because it has a relatively large amount of unified RAM shared directly between CPU/GPU/NPU, vs. the limited VRAM (which has to be loaded via main RAM) on a dGPU card.
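To put rough numbers on the capacity argument, here is a back-of-envelope sketch. Everything in it is an illustrative assumption rather than a measurement: the 70B parameter count, the 4-bit quantization, the 20% allowance for KV cache and runtime overhead, and the 24 GB / 64 GB memory sizes are just example figures, and the function names are made up for this post.

```python
def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights in GB (10^9 bytes)."""
    return params_billion * bits_per_weight / 8


def fits(params_billion: float, bits_per_weight: float,
         memory_gb: float, overhead_fraction: float = 0.2) -> bool:
    """True if weights plus an assumed overhead allowance fit in memory_gb.

    overhead_fraction is a guess covering KV cache, activations, etc.
    """
    needed = weights_gb(params_billion, bits_per_weight) * (1 + overhead_fraction)
    return needed <= memory_gb


# Hypothetical example: a 70B-parameter model quantized to 4 bits per weight.
print(weights_gb(70, 4))   # weights alone, in GB
print(fits(70, 4, 24))     # a 24 GB dGPU card
print(fits(70, 4, 64))     # a machine with 64 GB of unified RAM
```

On those assumptions the weights alone exceed the 24 GB card, so the model would have to be split across cards or partly offloaded to main RAM, while it fits entirely in the larger unified pool.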
An Apple Silicon Mac Pro with a competitive level of bandwidth and lanes for PCIe-based GPUs or NPUs doesn't exist, and isn't possible without Apple designing a new SoC die just for the Mac Pro. Even if it did exist, it would be reliant on the same NVIDIA or AMD dGPUs that can be used in any generic Xeon/Ryzen box, and those lack the unified RAM advantage which lets an Mx processor with an integrated GPU punch above its weight.
The Mac Pro may look like it has plenty of PCIe slots, but the M2/M3 Ultra only has 32 lanes of PCIe4, of which only 22 are available for the PCIe slots (with various constraints on how they can be allocated to slots). Current Xeons and Threadrippers have 128+ lanes of PCIe5. AFAIK many LLM tools run on Linux just as well as on macOS/Unix, and CPU power consumption is irrelevant on a personal workstation stuffed with NVIDIA space heaters. So if you want a GPU-based LLM platform, a Xeon or Threadripper box is probably the tool for the job.
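For a sense of scale, the lane counts above can be turned into approximate aggregate slot bandwidth. This is a rough sketch: it uses the standard per-lane transfer rates (16 GT/s for PCIe4, 32 GT/s for PCIe5) with 128b/130b encoding, ignores protocol overhead and the slot allocation constraints, and takes "128" as the workstation lane count even though the real figure varies by CPU model. The function names are just for illustration.

```python
# Raw transfer rate per lane in GT/s for each PCIe generation.
TRANSFER_RATE_GT_S = {4: 16.0, 5: 32.0}


def lane_bandwidth_gb_s(gen: int) -> float:
    """Approximate bandwidth of one lane, one direction, in GB/s.

    PCIe 3.0 and later use 128b/130b encoding, so 128 of every 130
    bits on the wire are payload; divide by 8 to convert bits to bytes.
    """
    return TRANSFER_RATE_GT_S[gen] * (128 / 130) / 8


def aggregate_bandwidth_gb_s(gen: int, lanes: int) -> float:
    """Approximate one-direction bandwidth across all lanes, in GB/s."""
    return lane_bandwidth_gb_s(gen) * lanes


mac_pro = aggregate_bandwidth_gb_s(4, 22)       # 22 lanes of PCIe4 for slots
workstation = aggregate_bandwidth_gb_s(5, 128)  # assumed 128 lanes of PCIe5

print(f"Mac Pro slots:   ~{mac_pro:.0f} GB/s")
print(f"Xeon/TR (128):   ~{workstation:.0f} GB/s")
print(f"Ratio:           ~{workstation / mac_pro:.1f}x")
```

That works out to roughly 43 GB/s against roughly 504 GB/s per direction, i.e. more than a tenfold gap before you even get to the question of which GPUs you could plug in.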
Want a Mac Pro cluster? You'd have to use Thunderbolt, same as the Studio.