If Apple are doing what I think they're doing then I'm incredibly excited.
- Earlier in this thread I posted a WWDC video where Apple daisy-chained together four Studios with Thunderbolt to perform machine learning.
- Apple's Swift language is building up towards version 6 which is aiming to be a radically thread-safe language, and they've also just come out with a feature called Swift Distributed Actors which makes it "easy" to run code across distributed systems.
- There's also the fact that Raspberry Pi has something that is also called a "Compute Module" which is a tiny standalone ARM SoC that plugs into a main board.
- Also the fact that Apple seemingly cancelled the 2xUltra variant makes me think that the future of the Mac Pro is distributed.
1. The "four studio via Thunderbolt" is primarily leveraging an existing TBv2+ feature of point-to-point 10GbE for 'free' over TB. The major 'feature' there is cost savings via no 10GbE switch to manage the connectivity. As far as 'distribute over Ethernet (virtual one or not)' that would work just as well with a 10GbE (or better) switch as without one.
2. Swift Distributed Actors
"... This abstraction does not intend to completely hide away the fact that distributed calls are crossing the network, though. In a way, we are doing the opposite and programming assuming that calls
may be remote. This small yet crucial observation allows us to build systems primarily intended for distribution and testable in local test clusters that may even efficiently simulate various error scenarios. ..."
"We're thrilled to announce a new open-source package for the Swift on Server ecosystem, Swift Distributed Actors, a complete server-oriented cluster library for the upcoming distributed actor language feature!" (www.swift.org)
It does not hide the distribution (meaning code changes are needed). The hurdles are lower if the app already uses Swift's concurrency model, but the application has to be written presuming that every actor call may be remote (and if a call just happens to be local, that is the 'transparent' part). That is substantially different from spinning this as making assumed-local calls transparently distributed. Applications would have to be restructured around this.
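A minimal sketch of what that looks like (the Worker actor and its method are invented for illustration; the Distributed module and LocalTestingDistributedActorSystem are the real standard-library pieces). The point is that every call site is forced to be written as if it might cross the network:

```swift
import Distributed

// Methods marked `distributed` may be invoked from another process/node.
distributed actor Worker {
    typealias ActorSystem = LocalTestingDistributedActorSystem

    distributed func render(frame: Int) -> String {
        "frame \(frame) rendered by \(id)"
    }
}

let system = LocalTestingDistributedActorSystem()
let worker = Worker(actorSystem: system)

// Even though this particular instance is local, the compiler requires
// `try await`: the call is assumed to be remote and able to fail like a
// network call. That assumption is what apps must be built around.
let result = try await worker.render(frame: 42)
print(result)
```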
At the top of the blog post there is also
" ... a complete server-oriented cluster library for the upcoming distributed actor language feature! ... "
Apple has been pushing Swift as a viable non-macOS (i.e., Linux) server backend language. There isn't much macOS in that story: Apple killed their "macOS Server" product, and Apple's cloud services largely run on Linux, not macOS. If Apple is 'eating their own dog food' here, it is far more likely on Linux than on macOS.
A server and a single-user, GUI-heavy workstation are two different markets.
3. The Raspberry Pi 4 Compute Module does not plug into the regular Pi main board.
The plain Compute Module is a Pi board stripped of some ports, with some embedded/industrial features added.
https://www.makeuseof.com/raspberry-pi-4-vs-raspberry-pi-compute-module-4-key-differences/
The 4S Compute Module looks like a DIMM but is electrically not a DIMM. (The main Pi 4 board doesn't have a DIMM slot either.)
"Raspberry Pi Compute Module 4S is a SODIMM with the same processor as the Raspberry Pi 4 and CM4" (liliputing.com)
[ Note: the I/O ports provisioned over these DIMM-like connections are nowhere near the bandwidth levels an M-series would need to hit. ]
4. The rumor is that Apple cancelled the 4-die solution relatively late in the process. A major pivot at this point isn't particularly likely; they'll just ship without that one of the two SoC options they had planned.
Stuffing 2-4 independent Mac instances onto cards inside a Mac Pro really isn't going to cover the same space that one much larger SoC would with the same applications. Applications that presume one unified address space are not really built to be distributed.
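To illustrate the gap (hypothetical types; the constraint itself is real, since the standard distributed actor systems require arguments to be Codable so they can cross the wire):

```swift
import Distributed
import Foundation

// On one big SoC with unified memory, code can hand a buffer to another
// thread by reference -- zero copies, one address space:
func blurInPlace(_ pixels: UnsafeMutableBufferPointer<UInt8>) { /* ... */ }

distributed actor RenderNode {
    typealias ActorSystem = LocalTestingDistributedActorSystem

    // A distributed method can't take that pointer: an address in one
    // machine's memory is meaningless on another. Arguments must be
    // serializable (Codable), so the data gets copied and the algorithm
    // has to be restructured around explicit tiles/messages:
    distributed func blur(tile: Data) -> Data {
        // ... operate on the local copy and return it ...
        return tile
    }
}
```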
To me all these things point to the Mac Pro being a cluster of Apple Silicon SoCs working in parallel like the nVidia DGX A100.
Not like a DGX at all. The DGX has very elaborate mechanisms for doing NUMA shared memory. Nothing in the above really points to that, and iOS as a key/critical foundational element certainly does not.
If Apple were doing a Mac M.o.C. ("Mac on a Card"), there would be some synergy with their support of the "rent a Mac in the cloud for at least a day" marketplace. Put 3-4 Macs on a 75W PCIe card, use the host CPU/GPU/etc. infrastructure as the core file/compute server, and a single business could serve a whole group of folks from one box. A rack version of the Mac Pro is around 18" wide; call it 16" and divide by 2" per card, and you could fit 8 Mini/Mini Pro class nodes in roughly the same rack space. That is denser CPU compute than racking standalone M2/M2 Pro boxes, it is much neater, and if you layer virtual Ethernet over the PCIe switch backplane it is possibly cheaper as well (since you skip the 10GbE (or better) switch).
Other folks could also use it as a "cluster inside the box" in a tower/deskside form. And when the M3 Pro, M4 Pro, etc. come out, you could upgrade nodes in the cluster and keep the same central chassis.
Apple would still need a GPGPU compute accelerator for other workloads though (e.g., MI210, MI310). "Macs on a Card" still doesn't get you ECC for data-integrity-sensitive HPC workloads, and the ALU density isn't going to keep up either.
I think, contrary to the pessimism of this thread, the 8.1 might end up being an absolute monster (for certain workloads)
If they are doing a "Mac MoC" on a regular 75W add-in card, it wouldn't necessarily be limited to the Mac Pro. You could put the cluster module in a Thunderbolt PCIe card enclosure and still get a virtual 10GbE connection back to a host MBA, Mini, or Studio.
Doing something hyper-coupled to a very minor variant of the MPX slot would limit sales to just the Mac Pro. If Apple is going to do some kind of compute module, it would probably work better to sell more rather than fewer. It could even be a card inside a Windows box if it just connects back to the host via a virtual Ethernet-over-PCIe connection.