When I asked him about the Apple Silicon Mac Pro, he replied with the following:
... It’s possible that apple allows slotted ram and puts its own gpu on a separate die, sure. But if it does that it will still be a shared memory architecture. ..... An independent GPU is more likely; the technical issues with that are not very big, but the economics don’t make much sense given apple’s strategy of leveraging its silicon across all products. Still, I’d give that a 33 percent chance. And it wouldn’t be a plug in card or anything - just a separate GPU die in the package using something like fusion interconnect ...
So, he believes there is a 1% chance of DIMMs and a 33% chance of a discrete GPU, but one that is still on-package, just not integrated into the SoC.
To be a discrete GPU (dGPU) it would have to meet two criteria. First, it would have to be a separate die. Second, it would have to have its own independent memory. If it doesn't have both, then it isn't really discrete in any substantive sense. (There might be a corner case if it sat on a physically separate board that you could pull off the main logic board.)
Only the die criterion fits there. Apple shipping a "GPU cores only" die while keeping all of the other aspects of the two-die M1 Ultra package would still be an integrated GPU (iGPU). Two, three, or four dies inside the same package is just a manufacturing disaggregation mechanism (it can be cheaper to make several smaller chips and 'glue' them together), but that doesn't have to have any impact on the integrated/discrete status of the RAM. If it is all 100% shared memory, then it is integrated.
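To make the distinction concrete, here is that two-criteria test as a trivial sketch; the type and property names are purely my own illustration, not anything Apple-specific:

```swift
// Minimal sketch of the dGPU-vs-iGPU test described above.
struct GPUPackaging {
    var separateDie: Bool         // criterion 1: its own die
    var independentMemory: Bool   // criterion 2: its own memory pool, not shared UMA

    // Discrete only if both criteria hold; a separate die on shared memory
    // is still an integrated GPU in any substantive sense.
    var isDiscrete: Bool { separateDie && independentMemory }
}

// Hypothetical "GPU cores only" chiplet inside an Ultra-style package,
// still hanging off the same unified LPDDR pool:
let gpuChiplet = GPUPackaging(separateDie: true, independentMemory: false)
print(gpuChiplet.isDiscrete)   // false -- still an iGPU
```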
Replying to my follow-up question about the GPU being third-party or Apple-designed, his response was:
... Given how parallelizable GPU stuff is, it’s quite possible that they simply put together a die that is just made up of a ton of the same GPU cores they have on their SoCs. You could imagine that, for modular high end machines, instead of partitioning die like: [CPU cores+GPU cores][CPU cores+GPU cores]… it may make more economic sense to do [CPU cores][CPU cores]…[GPU cores][GPU cores]…. (Or, even, [CPU cores+GPU cores][CPU cores+GPU cores]…[GPU cores]…
As far as the economics are concerned:
What that covers is a possibly more economical way of making a very large iGPU, not a discrete one at all.
If Apple built two CPU-focused chiplets (10 and 20 cores) and two GPU-focused chiplets (32 and 64 cores), they could mix and match a wider set of products:
(C = CPU die, G = GPU die)
10C + 32G
10C + 64G
10C + 10C + 10C + 32G
10C + 32G + 32G + 32G
20C + 32G
20C + 20C + 10C + 32G
20C + 64G
20C + 64G + 64G + 64G
20C + 10C + 32G + 32G
Instead of having to make nine different die masks and produce just the right number of dies in each category, they could create just four dies and combine them into different packages from a shared pool of finished dies. (If one package starts to outsell another, just build more of it from the pool of dies they have stockpiled.)
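Here is a rough sketch of that mix-and-match pooling. The four die names and core counts come from the example above; the packaging rules (up to four dies per package, at least one CPU die and one GPU die) are my own assumptions for illustration:

```swift
// Enumerate the packages you can build from a shared pool of four die designs.
enum Die: String, CaseIterable {
    case c10 = "10C", c20 = "20C", g32 = "32G", g64 = "64G"
    var isCPU: Bool { self == .c10 || self == .c20 }
}

// All multisets of up to maxDies dies containing at least one CPU and one GPU die.
func packages(maxDies: Int = 4) -> [[Die]] {
    var result: [[Die]] = []
    func extend(_ combo: [Die], from index: Int) {
        if combo.contains(where: { $0.isCPU }) && combo.contains(where: { !$0.isCPU }) {
            result.append(combo)
        }
        guard combo.count < maxDies else { return }
        for i in index..<Die.allCases.count {
            extend(combo + [Die.allCases[i]], from: i)
        }
    }
    extend([], from: 0)
    return result
}

for p in packages() {
    print(p.map(\.rawValue).joined(separator: " + "))
}
```

Four masks cover the whole spread, and which package a given die ends up in can be decided late, after demand is known.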
All of that would be packaged up as dual or quad dies in a single SoC. If all of these dies are mounted inside the SoC package then nothing is 'slotted' in, especially if Apple is trying to hit industry-leading Perf/Watt. The "modularity" is in the package-construction context, not in the end-user deployment sense.
If they hang the I/O ports off a different die than the CPU, then they don't pick up redundant elements (multiple Secure Enclaves, multiple SSD controllers, multiple 4x1 PCIe v4 complexes, etc.). That is where some end-user modularity might come from, especially if Apple bumped the common I/O subset up to 2-4 x8/x16 PCIe v4 complexes to provision far more generally useful lanes.
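A rough sketch of that redundancy argument: gluing together N laptop-style dies duplicates the per-die uncore blocks, while hanging the compute chiplets off one shared I/O die does not. The block names are from the post above; the counts are illustrative, not die-shot accurate:

```swift
// Uncore blocks that every laptop-class die currently carries with it.
struct UncoreBlocks {
    var secureEnclaves: Int
    var ssdControllers: Int
    var pcieV4Lanes: Int   // e.g. 4 x1 complexes = 4 lanes per laptop die
}

// Gluing N copies of the laptop die together drags N copies of the uncore along.
func glued(dies n: Int) -> UncoreBlocks {
    UncoreBlocks(secureEnclaves: n, ssdControllers: n, pcieV4Lanes: n * 4)
}

// One shared I/O die, however many compute chiplets sit next to it.
func withIODie(lanes: Int) -> UncoreBlocks {
    UncoreBlocks(secureEnclaves: 1, ssdControllers: 1, pcieV4Lanes: lanes)
}

print(glued(dies: 4))             // 4 enclaves, 4 SSD controllers, 16 stranded lanes
print(withIODie(lanes: 2 * 16))   // 1 of each, 32 lanes from two x16 complexes
```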
So, according to this veteran CPU architect, if Apple does include a GPU alongside the SoC, it's going to be their own design,
When he says "a separate die inside the package", that isn't "alongside" at all in any significant sense. You are conflating disaggregation with disintegration; those are two different things.
not AMD or Nvidia, won't be available with add-on boards, and Apple will only implement it if they can leverage it in multiple products.
There is not a lot of good evidence for an Apple "add-on" board either. There is lots of evidence that Apple likes their own GPU more, and they are spending tons of effort on new tools and developer education to push folks to optimize for iGPUs. All the inertia of Apple's GPU is as an iGPU (the optimizations are about leveraging the shared and tile memory). Tile memory is the only sub-area where things get somewhat akin to discrete GPU optimizations, since that GPU-local area of memory is mostly separate, but it is more of a cache than a large independent working pool, and it isn't really shared across all the GPU cores either.
Most of Apple's upper-half desktop lineup isn't going to take the very top end of what AMD or Nvidia are going to offer going forward either. There is no room in a Studio, and a pretty good chance there would be no room in an "iMac Pro" even if one returns in a 'slimmed down' chassis in a year or two.
The problem at the moment is that they are dependent upon the upper-end laptops. Doubling a laptop-optimized Max-class die happens to work OK for an Ultra that only goes inside a Mac Studio. But if Apple is going to get to a broader upper-half desktop lineup, they need something that is modularly configurable over a wider range than what the laptop die is targeted at. However, it would still have to cover much more than just the Mac Pro; the Mac Studio would have to be inside the scope to get enough volume, and perhaps some other desktops too.
If they can make the disaggregated dies and packaging very Perf/Watt effective, then perhaps they can take out the monolithic "Max" class die also. That would boost the chiplet volume produced even more, which would be a more stable economic foundation over the long term. But it isn't going to create Threadripper or Xeon W 3xxx 'killer' SoCs, or some deep commitment to AMD dGPUs for GUI workloads.
Where Apple's approach has bigger issues is compute workloads that can easily be distributed over multiple GPGPU cards; not just a single-GPU workload, but ones that partition and farm out work in something closer to "embarrassingly parallel" fashion. Apple's GPU can't scale past a package. Apple can make a very big package with TSMC's latest CoWoS-LSI packaging, but it is going to remain limited.
Compare that to 2-3 very high-end AMD/Nvidia GPUs lashed together with Infinity Fabric/NVLink; there is no good indication that Apple's UltraFusion links can scale the way IF/NVLink can. Metal is also somewhat of a problem because it tends to intertwine GUI with "Compute". (Deprecating OpenCL is a limitation; having no portable, compute-focused API is a problem.)
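The distribution model itself is simple; the sketch below just splits a hypothetical work range across every Metal device macOS can see (MTLCopyAllDevices() is the real API; the actual kernel dispatch is omitted and the workload size is made up). The point is that on an Apple Silicon box this loop only ever sees one package-bound device, which is exactly the scaling wall:

```swift
import Metal

// Partition an "embarrassingly parallel" work range across every visible GPU,
// one independent command queue per device.
let devices = MTLCopyAllDevices()        // all GPUs visible to macOS
let totalWorkItems = 1_000_000           // hypothetical workload size

guard !devices.isEmpty else { fatalError("no Metal device found") }

let slice = totalWorkItems / devices.count
for (i, device) in devices.enumerated() {
    let start = i * slice
    let end = (i == devices.count - 1) ? totalWorkItems : start + slice
    let queue = device.makeCommandQueue()   // each GPU grinds its slice independently
    print("\(device.name): items \(start)..<\(end), queue ready: \(queue != nil)")
}
```

On a 2019 Mac Pro with multiple MPX GPUs that loop fans out; on any Apple Silicon machine it collapses to a single entry.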