…for LLMs… you can put up to 512GB of unified memory in a Mac Studio Ultra and destroy the Nvidia 5090. I watched a few YT comparison videos: even when running local LLMs small enough to fit on the 5090, it got crushed by the M3 Ultra. Anything that requires a lot of VRAM gets crushed, except video games. The power draw was also almost comically different: you spend at least 4x as much on power running the 5090 under load, and at idle it draws even more than the M3 Ultra. Almost everything was faster for those two purposes, as well as for advanced maths, and anything science or work would be done on would use a Mac Studio…
Not intending to nitpick, but I think the YouTuber was probably running some specific workloads that favor Apple Silicon.
For large models, and to some extent fine-tuning (when linking multiple Mac Studios), Apple Silicon is in a league of its own because it can fit everything into memory. But for inference speed it isn't; it needs much better matrix hardware on the GPU side, which hopefully we'll get with the M5. A rough sketch of why "fits in memory" is the deciding factor is below.
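For a back-of-envelope sense of the memory argument, here's a minimal sketch. The formula (weights ≈ parameters × bytes per weight, plus some KV-cache and overhead) and all the numbers are illustrative assumptions, not benchmarks of any specific model or card:

```python
# Rough memory estimate for running an LLM locally.
# Assumption: weights take params * bytes_per_weight, plus a KV cache that
# grows with context length, plus some runtime overhead.

def model_memory_gb(params_b: float, bits_per_weight: float = 4.0,
                    kv_cache_gb: float = 8.0, overhead: float = 1.1) -> float:
    """Approximate memory (GB) needed to hold the weights plus a KV cache."""
    weights_gb = params_b * 1e9 * (bits_per_weight / 8) / 1e9
    return (weights_gb + kv_cache_gb) * overhead

# A 70B model at 4-bit quantization: ~47 GB, already over the 5090's 32 GB
# of VRAM, but easily within a high-memory Mac Studio's unified memory.
print(f"70B @ 4-bit: ~{model_memory_gb(70):.0f} GB")

# A 405B model at 4-bit: ~230 GB, which only fits on very large unified memory.
print(f"405B @ 4-bit: ~{model_memory_gb(405):.0f} GB")
```

The point of the sketch is just that once a model spills out of VRAM, the 5090 has to offload to system RAM and throughput collapses, whereas a 512GB unified-memory machine keeps everything resident; inference speed on models that do fit is a separate question, which is where the GPU matrix hardware matters.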