For anyone who hasn't read the recent rumor from Bloomberg: Apple is preparing new chips for its pro lineups.
Later in 2021, for the higher-end desktop:
- 32+? core CPU
- 64, 128 core GPU, ‘several times faster than the current graphics modules Apple uses from Nvidia and AMD in its Intel-powered hardware’
Now there are several questions that need to be answered.
Will it be an SoC?
Probably. However, I suspect that we will not see the combination of "max" big CPU cores paired up with "max" GPU core count at all.
128 GPU cores is more of a MBP 16" or iMac 27" thing. Unless Apple has chased away all of the 3rd party GPU options, a count that large won't come integrated in the Mac Pro.
The current M1 is about 120mm^2. If about 45% of that is the 4 big cores, System Level Cache (SLC), 8 GPU cores and RAM controllers, then that subset is about 54mm^2.
5 * 54mm^2 is 270mm^2. 270 + 120 = 390mm^2. That would be tractable in 12 months or so at 5nm.
6 * 54mm^2 is 324mm^2. 324 + 120 = 444mm^2. I suspect that is probably more in the zone where Apple would 'quit' with a monolithic die.
If they start to punt on more GPU cores once they get to the 32 GPU core mark, then they can double the number of big CPU cores in that 54mm^2 block. For example:
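The area arithmetic above can be rerun with different guesses using a small sketch like the one below. The 120mm^2 and 45% figures are the estimates from this post, not measured values, and the function names are just made up for illustration.

```python
# Back-of-the-envelope die area, using the guessed numbers from the post.
M1_AREA_MM2 = 120.0      # approximate M1 die area (estimate)
CHUNK_FRACTION = 0.45    # guessed share for 4 big + SLC + 8 GPU + RAM controllers


def chunk_area(base_area=M1_AREA_MM2, fraction=CHUNK_FRACTION):
    """Area of one repeatable 'chunk' carved out of the baseline die."""
    return base_area * fraction


def die_area(extra_chunks, base_area=M1_AREA_MM2, fraction=CHUNK_FRACTION):
    """Baseline die plus some number of extra chunks bolted on."""
    return base_area + extra_chunks * chunk_area(base_area, fraction)


for n in (5, 6):
    print(f"{n} extra chunks: about {die_area(n):.0f} mm^2")
```

Changing CHUNK_FRACTION is the quickest way to see how sensitive the 390 vs 444 split is to that 45% guess.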
chunk1 : 4 big , 8 GPU , more SLC , another set of RAM controllers.
chunk2 : 4 big , 8 GPU , more SLC , another set of RAM controllers.
chunk3 : 4 big , 8 GPU , more SLC , another set of RAM controllers.
chunk4 : 8 big , more SLC , another set of RAM controllers.
chunk5 : 8 big , more SLC , another set of RAM controllers
Coupled to the baseline's 4 big, 8 GPU and the rest, that nets to 32 big, 32 GPU, 6 SLC blocks and 6 memory controllers.
For the > 32 GPU SoC I think Apple will make the opposite trade off: put a cap on big CPU cores and soak up that CPU core allocation by doubling up on GPU core blocks. So if the big CPU core count caps at 12, then they could get the following:
chunk1 : 4 big , 8 GPU , more SLC , another set of RAM controllers
chunk2 : 4 big , 8 GPU , more SLC , another set of RAM controllers
chunk3 : 16 GPU , more SLC , another set of RAM controllers
chunk4 : 16 GPU , more SLC , another set of RAM controllers
chunk5 : 8 GPU
Coupled to the baseline's 4 big, 8 GPU and the rest, that nets to 12 big (+ 4 small), 64 GPU and 5 memory controllers.
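To double check the totals for both hypothetical layouts, here is a quick tally sketch. The chunk definitions are just the guesses from this post written down as counters; the names (CPU_GPU, GPU_ONLY and so on) are made up here and do not correspond to anything Apple has announced.

```python
from collections import Counter

# Baseline M1-like block: 4 big + 4 small CPU, 8 GPU, one SLC block, one memory controller.
BASELINE = Counter(big=4, small=4, gpu=8, slc=1, mem_ctrl=1)

# Hypothetical add-on chunks, all sized to fit the same ~54mm^2 slot.
CPU_GPU = Counter(big=4, gpu=8, slc=1, mem_ctrl=1)
CPU_ONLY = Counter(big=8, slc=1, mem_ctrl=1)
GPU_ONLY = Counter(gpu=16, slc=1, mem_ctrl=1)
GPU_TAIL = Counter(gpu=8)  # the last GPU-only chunk with nothing else attached


def tally(chunks):
    """Sum the baseline plus a list of add-on chunks."""
    total = Counter(BASELINE)
    for chunk in chunks:
        total += chunk
    return dict(total)


cpu_heavy = tally([CPU_GPU] * 3 + [CPU_ONLY] * 2)
gpu_heavy = tally([CPU_GPU] * 2 + [GPU_ONLY] * 2 + [GPU_TAIL])

print("CPU-heavy:", cpu_heavy)
print("GPU-heavy:", gpu_heavy)
```

Swapping chunks in and out of the two lists makes it easy to try other mixes that stay within the same five-chunk budget.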
For the 128 GPU cores... I doubt that would come in a SoC package. At least not at 5nm.
Putting 32 CPU and 128 GPU cores into one large "mega" package would probably force them into chiplets. But I also don't see the point of that unless Apple is totally wrapped up in an "Apple-only GPUs that are only iGPUs" dogma for the future. Apple's design, with no SMT and an extra wide instruction dispatch, leans heavily on bigger and deeper caches. If they go to chiplets and one unified System Level Cache, that will introduce substantive latency. That in turn will put a drag on the performance that the more unified, monolithic implementations achieve. I suspect Apple is going to try to avoid that. Even with a monolithic die, as the SLC gets much bigger, keeping uniform, extra low latency access is going to be tricky (more snooping, farther distances, more chatter/traffic, etc.).
Additionally, for 32-128 GPU cores, at some point it is probably not going to make sense to couple them to LPDDR4 (or LPDDR5). LPDDR5 is better than previous generations of GDDR implementations, but better than HBM and GDDR6X (or better)? Not really. Unless they keep those GPU cores at relatively low clocks, the bandwidth contention is going to get quite high once multiple concurrent workloads are put on different subsections.
Apple is out to buy nobody's iGPUs for the Macs that have only those.
Apple is probably out to cut the number of dGPUs they buy way down (minimally remove them from the MBP 16" and iMac 21-24", and possibly also from the iMac 27"). That actually gets them into the slippery slope zone with 3rd parties. The sales of iMac Pro and Mac Pro class systems are relatively small (probably in the sub 100K run rate per year zone). If they bring back eGPU and add those sales, is there something big enough to keep a 3rd party interested in doing driver and software support work or not? Weave in some BTO iMac 27" and then it is probably not an issue (not super happy, but a decent enough market for AMD to put the time in).
If Apple is killing off 3rd party GPU kernel access and 3rd party options, the 64 and 128 may be a dGPU deviation off the more tightly coupled Unified Memory iGPU driver model. Once operating as a dGPU it doesn't have to be a "chiplet". They can dump all the SoC baseline logic/fixed function cores and CPU cores. There is a pretty good chance of Apple both making a grab and also having to backfill where vendors all just 'quit' (Apple has to fill the role because they have painted themselves into another corner).
If yes then it will be huge & its thermal envelope will be huge as well. Will it be feasible?
If they keep the LPDDR4 (LPDDR5) memory and the 6 chunks were all 20W each, then it lands in the 120W range. That too is tractable. Core counts are probably not as much of an issue as increasing the I/O bandwidth out to extremely high levels (shifting to PCI-e v4 or v5). There are some substantive things that Apple punted on to get to lower power (which makes sense for an iPad Pro and not so much for a higher end desktop system).
We don't know the yields of TSMC's 5nm. Now Apple may bear the cost but eventually they would pass the cost onto customers. So, in the end it will just increase the price of future Mac Pro. Monolithic die will just increase the cost.
There is an opportunity cost for Apple. Pragmatically there is an upper limit to how many 5nm wafers they can get. If they can get 4-5x as many iPhone/iPad SoCs out of a wafer and sell 4x as many devices then that might get the allocation.
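For a rough feel of that wafer trade-off, the sketch below uses the common gross dies-per-wafer approximation on a 300mm wafer. Everything here is an assumption: it ignores yield, scribe lines and edge exclusion, and since no phone SoC area is given above, the ~120mm^2 M1 estimate stands in as the "small die" against the 390mm^2 and 444mm^2 guesses from earlier.

```python
import math

WAFER_DIAMETER_MM = 300.0  # assumed standard wafer size


def gross_dies_per_wafer(die_area_mm2, wafer_diameter_mm=WAFER_DIAMETER_MM):
    """Common approximation: wafer area / die area, minus an edge-loss term.

    Ignores yield, scribe lines and edge exclusion, so treat it as an upper bound.
    """
    radius = wafer_diameter_mm / 2
    by_area = math.pi * radius ** 2 / die_area_mm2
    edge_loss = math.pi * wafer_diameter_mm / math.sqrt(2 * die_area_mm2)
    return math.floor(by_area - edge_loss)


small = gross_dies_per_wafer(120)  # M1-sized die (estimate)
for big_area in (390, 444):
    big = gross_dies_per_wafer(big_area)
    print(f"{big_area} mm^2: {big} dies/wafer, small die gives {small / big:.1f}x as many")
```

A real iPhone SoC is smaller than the M1, so the actual ratio would tilt further toward the small die, and yield on the big die would push it further still.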
Also, another thing that needs to be addressed is whether the RAM & GPU will be included in the package or not. Much of the ASIC functionality in the M1 SoC can be transferred to the GPU (things like exporting, encoding, decoding etc.), but we don't know if Apple would do that or not.
As the package gets bigger, mounting the RAM gets more tractable (at least for GPU sizes).
RAM on the package does become an issue if Apple is going to try to track the max RAM capacities of the higher end desktops. Getting to triple digit GB RAM on package is going to be problematic even for DDR5. Quad digit (> 1TB) even more so. But Apple may just choose to backslide there (let some customers 'go' to cover the average user capacity at even higher margins).