Damn, it's still 8GB of RAM, but with a 512GB SSD.
Something odd about the memory bandwidth of the M3 Pro: 150GB/s and 18GB of RAM, hmm. Apple is reducing the memory bus from 256-bit to 192-bit. Damn, Apple.
I'm seeing a lot of these sorts of claims. I'm not convinced.
My guess is that
- M3 Pro (and Max) use LPDDR5X
- 150GB/s is "enough" for the target uses of this chip. Note that Qualcomm, targeting the same sort of price segment with their chip, also goes with only 150GB/s. (The bus-width arithmetic is sketched below.)
- Look at the Max. If we assume the obvious repeated blocks scattered around the edges of the GPU are memory controllers plus PHYs, then two things stand out
+ Pro and M3 both have eight of them (though the layout geometry is slightly different between the two)
+ Max has something different: 8+8, and each is double-wide.
The obvious assumption is that the Max has four times as many external pins to DRAM, but only twice as many internal pins. That would mean it has quadruple the RAM capacity, but only twice the RAM bandwidth, which seems to be the case.
This suggests that in the M1 and M2 generations, Apple was more or less forced to tie RAM capacity and bandwidth together in a way that was perhaps not optimal. (One always likes more RAM bandwidth, of course, but not at the expense of area and power.) With the new design (and the higher intrinsic speed of LPDDR5X) they can recalibrate, so that DRAM capacity can double in the Max without requiring the full array of DRAM paraphernalia of the M1 and M2 Max. Relative to the M1 or M2 generation, the M3 Max's memory machinery looks the same, while the M3's and Pro's is half as wide. So in a sense Apple has substantially shrunk the area (and power?) required to communicate with memory in the M3 and Pro, and while the area has not shrunk on the M3 Max, its memory capacity has doubled.
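To make the bus-width arithmetic concrete, here's a minimal Python sketch of the standard peak-bandwidth formula, bytes per transfer times transfers per second. The 6400MT/s rate is my assumption (LPDDR5-6400-class; a faster LPDDR5X grade would imply a correspondingly narrower bus for the same total), and the 384-bit Max width just follows the doubled-Pro reading above; none of these are confirmed specs.

```python
# Peak DRAM bandwidth = (bus width in bytes) x (transfers per second).
# The 6400 MT/s rate and the 384-bit Max width are assumptions, not specs.

def peak_bandwidth_gbps(bus_bits: int, mtps: int) -> float:
    """Theoretical peak bandwidth in GB/s for a `bus_bits`-wide bus
    running at `mtps` mega-transfers per second."""
    return bus_bits / 8 * mtps / 1000  # bytes/transfer x MT/s -> GB/s

configs = {
    "M1/M2 Pro (256-bit)": (256, 6400),  # marketed as 200GB/s
    "M3 Pro    (192-bit)": (192, 6400),  # marketed as 150GB/s
    "M3 Max    (384-bit)": (384, 6400),  # 2x the Pro, per the pin-counting
}
for name, (bits, rate) in configs.items():
    print(f"{name}: {peak_bandwidth_gbps(bits, rate):.1f} GB/s")
```

The external pins then buy capacity rather than width: more DRAM dies behind the same channels, not more bytes per transfer.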
The especially interesting cases will be the Ultra and Extreme, presumably at 600 and 1200GB/s. That seems low compared to nVidia's high end, but Apple may believe (and may be correct?) that
- their large available memory (presumably 512GB for a maxed-out Extreme) supports customers trying models that simply won't fit even on the largest nVidia system
- their various technologies (e.g. the SLC, and tagging of GPU streams to give them much more intelligent memory behavior) allow them to match much higher raw bandwidth at lower power?
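On that second point, here's a toy model of how a big SLC changes the raw-bandwidth comparison; the hit rates and the 600GB/s figure are purely illustrative, not measurements. If a fraction of GPU traffic hits in the SLC, DRAM only has to service the remainder, so the chip can sustain more total traffic than its raw DRAM number suggests.

```python
# Toy model: total traffic a chip can sustain if a system-level cache
# (SLC) absorbs a fraction `hit_rate` of it. All numbers are invented.

def sustainable_traffic_gbps(dram_gbps: float, hit_rate: float) -> float:
    """Traffic (GB/s) servable when only (1 - hit_rate) reaches DRAM."""
    return dram_gbps / (1.0 - hit_rate)

for hit in (0.0, 0.25, 0.5):
    print(f"SLC hit rate {hit:.0%}: 600 GB/s of DRAM supports "
          f"{sustainable_traffic_gbps(600, hit):.0f} GB/s of GPU traffic")
```

At a 50% hit rate, a 600GB/s part services 1200GB/s of traffic, and SLC hits cost far less power than DRAM accesses.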
Other cases of interest:
- It seems, as I suggested, that the new cluster size is (up to) 6 cores. In principle all 6 could share the same infrastructure (same L2, same L2 TLB, same page de/compression engine, same AMX). Or you could sub-cluster so that, e.g., three cores share an AMX. By eye it's not clear to me: the 6-core P-clusters are very clear, but the AMX could be duplicated or not.
- Does anyone know what the "Dynamic Caching" stuff is about? My best guess is that
+ GPUs do not allow dynamic allocation of many resources (like Scratchpad or ray-tracing temporaries), so apps are forced to allocate the maximum size they might require. This in turn means you often can't pack as many threadblocks onto a core as you would like, because they all claim a lot of (then unused) Scratchpad.
+ Apple works around this, perhaps via the second-level paging that I discovered in the patents but did not understand.
+ Apps allocate lots of space in the Scratchpad or Ray address space, but that's a "virtual" allocation. Attempts to touch the address space trigger a physical allocation in the Scratchpad or Ray cache, but if you never touch the address space, nothing physical is ever allocated. Basically like standard VM and its various magics (page faults for demand allocation and so on), only handled by the GPU (presumably the GPU companion core) rather than the OS proper.
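A minimal sketch of that demand-allocation guess, with invented sizes and a made-up allocator (none of this is Apple's actual mechanism): reservations are free, physical Scratchpad pages are bound only on first touch, so occupancy is limited by what blocks actually use rather than what they declare.

```python
# Sketch of demand-paged Scratchpad: virtual reservations are free;
# physical pages are bound on first touch. All sizes are invented.

PAGE = 1024              # assumed physical page granularity (bytes)
SCRATCHPAD = 64 * 1024   # assumed physical Scratchpad per core (bytes)

class DemandScratchpad:
    def __init__(self):
        self.pages = {}   # (block_id, virtual page) -> physical offset
        self.next_free = 0

    def reserve(self, block_id: int, nbytes: int) -> None:
        """'Virtual' allocation: reserve address space. Nothing physical
        happens here, however large nbytes is."""

    def touch(self, block_id: int, offset: int) -> int:
        """First touch of a page triggers a physical allocation, like a
        page fault handled by the GPU rather than the OS."""
        key = (block_id, offset // PAGE)
        if key not in self.pages:
            if self.next_free + PAGE > SCRATCHPAD:
                raise MemoryError("physical Scratchpad exhausted")
            self.pages[key] = self.next_free
            self.next_free += PAGE
        return self.pages[key] + offset % PAGE

# Occupancy consequence: a block that declares 32KB but touches only 4KB
# costs 4KB. Statically, 64KB / 32KB = 2 blocks per core;
# on demand, 64KB / 4KB = 16 blocks per core.
sp = DemandScratchpad()
for block in range(16):
    sp.reserve(block, 32 * 1024)      # declared maximum
    for page in range(4):             # actually touches only 4KB
        sp.touch(block, page * PAGE)
print(f"{len(sp.pages) * PAGE // 1024}KB physically backs 16 blocks")
```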
An alternative possibility is that they copy what one of the recent nVidia chips (Ampere?) started to do, where one core can use the local storage of a neighboring core. So rather than separate Scratchpads per core, it's more like there is one large pool of Scratchpad, and any threadblock can allocate within that large pool.
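Under that pooled reading, the sketch changes shape: one backing store shared across a cluster of cores, from which a block on any core can claim more than its own core's slice. Again, the sizes and the allocator are my inventions for illustration.

```python
# Sketch of the pooled alternative: one Scratchpad pool shared by a
# cluster of cores, rather than a fixed slice per core. Sizes invented.

POOL = 4 * 64 * 1024  # assumed: four cores' worth of Scratchpad, pooled

class PooledScratchpad:
    def __init__(self):
        self.used = 0

    def alloc(self, nbytes: int) -> int:
        # A threadblock on any core allocates from the common pool, so a
        # Scratchpad-hungry block can borrow a neighbor's unused space.
        if self.used + nbytes > POOL:
            raise MemoryError("pool exhausted")
        base, self.used = self.used, self.used + nbytes
        return base

pool = PooledScratchpad()
big = pool.alloc(96 * 1024)  # larger than any single core's 64KB slice
```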
Yet a third option (again copying nVidia) is to have common storage for the GPU L1D and what Apple calls Tile memory (i.e. basically Scratchpad), so instead of, say, 8K of L1D and 64K of Tile, you get 128K combined: use as much Tile/Scratchpad as you need, and the rest is "dynamically" used as L1D.
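And a sketch of that third option, using the same illustrative 128K figure: one physical array, with the kernel's declared Tile requirement carved out and the remainder falling through to cache. The split function is hypothetical, not Apple's mechanism.

```python
# Sketch of the unified-storage option: one physical SRAM array serves
# both Tile memory (Scratchpad) and L1D; whatever Tile space a kernel
# doesn't claim becomes extra cache. The 128K figure is illustrative.

UNIFIED = 128 * 1024  # assumed combined capacity (bytes)

def split(tile_request: int) -> tuple[int, int]:
    """Return (tile_bytes, l1d_bytes) for a kernel's Tile request."""
    if tile_request > UNIFIED:
        raise ValueError("request exceeds the unified array")
    return tile_request, UNIFIED - tile_request

print(split(64 * 1024))  # (65536, 65536): heavy Tile use, 64K of L1D
print(split(8 * 1024))   # (8192, 122880): light Tile use, big L1D
```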
Other non-obvious points:
- Why no Pro or Max in an iMac? Apple knows their market better than I do, but my guess would be that lots of people want that!
The obvious rejoinder is that (at some point...) an iMac Pro is coming, maybe 32", and at that point we'll see the full range.
But even so, to my eyes the obvious configs are an iMac with M3 or M3 Pro [cf. the mini], and an iMac Pro with Max, Ultra, or even Extreme.
- Why not announce the minis at the same time? What's the point of delaying them? Is it purely a business decision, in the sense that different products get announced in different quarters to smooth revenue? That's my best guess.