Nvidia's Grace CPU for servers? There's likely only limited overlap with that SoC.

"Discover the key features and benefits of NVIDIA Grace CPU, the first data center CPU developed by NVIDIA. It has been built from the ground up to create the world's first superchips." (developer.nvidia.com)
First, you likely have that completely backwards in terms of PCI-e. Grace has
four x16 PCI-e v5 link bundles. Apple hasn't touched v5 and hasn't cracked double-digit lane counts (4 on the Max, 8 on the Ultra). I don't expect Apple to do better than two x16 v4 link bundles. The die-edge space priority is likely "more UltraFusion" links (to their own Apple stuff) rather than PCI-e links to other vendors' stuff. All of that memory-controller bandwidth on Grace is delivered to the CPU cores; Apple's solution is likely only going to deliver a bounded subset to the CPU cores (in part because the CPU core count likely won't be anywhere close to Grace's).
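For scale, a rough sketch of what those link counts imply, using approximate post-encoding per-lane figures; the "two x16 v4" Apple configuration is my guess above, not a spec:

```python
# Back-of-the-envelope PCIe bandwidth comparison (illustrative figures only).
# PCIe 5.0 runs ~32 GT/s per lane and PCIe 4.0 ~16 GT/s; with encoding overhead
# that is roughly 3.94 and 1.97 GB/s per lane per direction.
def pcie_bw_gbs(gen: int, lanes: int) -> float:
    """Approximate unidirectional bandwidth in GB/s for one PCIe link."""
    per_lane = {4: 1.97, 5: 3.94}  # GB/s per lane per direction, post-encoding
    return per_lane[gen] * lanes

grace_pcie = 4 * pcie_bw_gbs(5, 16)    # four x16 PCIe 5.0 bundles on Grace
apple_guess = 2 * pcie_bw_gbs(4, 16)   # hypothetical two x16 PCIe 4.0 bundles
print(f"Grace: ~{grace_pcie:.0f} GB/s per direction")        # ~252 GB/s
print(f"Hypothetical Apple: ~{apple_guess:.0f} GB/s")         # ~63 GB/s
```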
There is some similarity in that Grace has 16 memory controllers with LPDDR stacks sitting off each of them. Nvidia is provisioning ECC support (probably extra data the MC sends to the LPDDR5 package: one channel for the data, one channel for the checksum). I wouldn't hold my breath that Apple is going to cover that. Apple is wider in that they will have four channels per MC, and probably lower capacity (this round).
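Aggregate LPDDR bandwidth just falls out of channel count, channel width, and data rate; here's a sketch with purely illustrative configurations (the channel widths and MT/s values are my assumptions, not vendor specs):

```python
# Aggregate LPDDR bandwidth from controller/channel counts (illustrative only).
def lpddr_bw_gbs(channels: int, bits_per_channel: int, mts: int) -> float:
    """Aggregate bandwidth in GB/s: channels * channel width (bits) * MT/s / 8000."""
    return channels * bits_per_channel * mts / 8000

# e.g. 16 controllers, one 32-bit LPDDR5X channel each, at 8533 MT/s:
print(lpddr_bw_gbs(16, 32, 8533))      # ~546 GB/s
# e.g. 16 controllers with four 16-bit LPDDR5 channels each, at 6400 MT/s:
print(lpddr_bw_gbs(16 * 4, 16, 6400))  # ~819 GB/s
```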
Apple has UltraFusion and Nvidia has NVLink-C2C, so there is some functional overlap there. UltraFusion is far wider and higher bandwidth, but also likely far shorter reach (i.e., only useful for intra-package communication via 2.5D/3D chip interposers).
Same article.
" ....
..with lower latency. NVLink-C2C also requires just 1.3 picojoules/bit transferred, which is more than 5x the energy efficiency of PCIe Gen 5. ..
....
... and, finally, the bandwidth between the Hopper GPU and the Grace CPU is critical to maximizing the performance of the Grace Hopper Superchip. GPU-to-CPU memory reads and writes are expected to be 429 GB/s and 407 GB/s, respectively, representing more than 95% and more than 90% of the peak theoretical unidirectional transfer rates of NVLink-C2C. ..."
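Working backward from those quoted figures gives the implied peak unidirectional rate; this is just arithmetic on the numbers in the article, nothing more:

```python
# If 429 GB/s reads are "more than 95%" of peak and 407 GB/s writes are
# "more than 90%", the peak unidirectional rate must be below both bounds,
# i.e. on the order of ~450 GB/s per direction.
read_gbs, write_gbs = 429, 407
print(read_gbs / 0.95)   # ~451.6 GB/s -> peak must be below this
print(write_gbs / 0.90)  # ~452.2 GB/s -> peak must be below this
```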
Nvidia is slower than UltraFusion in bandwidth, but it scales out over multiple cards and nodes (via NVSwitch). It's the opposite trade-off from the PCI-e case, where Apple likely swaps die-edge space away from PCI-e for more/wider UltraFusion. Nvidia has a narrower C2C, but can roll out enough PCI-e v5 bandwidth to deal with 100+ GbE network cards.
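A rough check on the "100+ GbE" point, reusing the same approximate per-lane PCIe figure as above (illustrative, not a spec):

```python
# One 100 GbE port is 100 Gbit/s, i.e. ~12.5 GB/s per direction; one x16
# PCIe 5.0 bundle is roughly 63 GB/s per direction after encoding overhead.
nic_gbs = 100 / 8           # one 100 GbE port in GB/s (~12.5)
x16_gen5_gbs = 16 * 3.94    # one x16 PCIe 5.0 bundle, per direction (~63)
print(f"~{x16_gen5_gbs / nic_gbs:.0f} such NIC ports per x16 bundle")  # ~5
```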
Apple's SoC is going to be a personal (single-user) oriented SoC, not a server SoC.
Some limited similarity there, in that a Grace-Hopper superchip is probably not going to directly drive multiple monitors. (Remote virtualized screens, perhaps. But directly coupled point-to-point wires? No.)
MP 2010 - 2012
MP 2013
MP 2019
All were 'dead end' CPU sockets when the systems launched. That isn't really a 'new' thing for Apple. [ In 2012 they actually skipped a new socket that could have spanned a tick-tock cycle, and instead rolled out 'warmed-over leftovers' in the old socket. ]
If the Apple GPU cores were the only ones on the die, that would be more significant.