Ada is significantly faster still — the 4090 mobile features a whopping 76 SMs running at 1.4-1.7GHz — that's around 2x-2.4x more theoretical compute than the M3 Max! Ada also features larger caches and faster RT — all of this lets it achieve a commanding 2x lead over the M3 Max and the 3080 Ti mobile in Blender.
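As a quick sanity check on that 2x-2.4x figure, here is a back-of-the-envelope sketch. It assumes 128 FP32 lanes per Ada SM and per Apple GPU core, a 40-core M3 Max at roughly 1.4GHz, and 2 FLOPs per lane per clock (FMA); the Apple core count and clock here are illustrative assumptions, not measured specs.

# Rough peak FP32 throughput: units * lanes * 2 FLOPs (FMA) * clock
def tflops(units, lanes_per_unit, clock_ghz):
    return units * lanes_per_unit * 2 * clock_ghz / 1000.0

ada_low  = tflops(76, 128, 1.4)   # ~27 TFLOPS, 4090 mobile at 1.4GHz
ada_high = tflops(76, 128, 1.7)   # ~33 TFLOPS, 4090 mobile at 1.7GHz
m3_max   = tflops(40, 128, 1.4)   # ~14 TFLOPS, assumed 40-core M3 Max

print(round(ada_low / m3_max, 1), round(ada_high / m3_max, 1))   # ~1.9 to ~2.3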
This is the situation we have today. What about tomorrow though?
Well, the obvious next step for Apple is to "pull an Ampere" and add FP32 capability to either the FP16 or the INT pipe...
Actually, the obvious 'next step' is a 'two times' M3 Max, which would fundamentally close the gap on any '2x commanding lead'. Apple doesn't really 'have to' wait another whole silicon generation.
The bar is going to move for both Apple and Nvidia when both sides get to take a 'whack' at new GPU cores built on incrementally newer fab processes.
(I think the FP16 pipe is a more likely candidate, as this would retain useful concurrent FP+INT execution.) If Apple goes this route, each of their cores would be capable of 256 FP32 + 128 INT per clock, making them 2x more capable than Nvidia's SMs. This should instantly boost Apple's performance on FP-heavy code by 30-60%, without increasing clocks. And this should be fairly easy for Apple to do, as their register files and caches already support the operand pressure.
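For illustration, a minimal sketch of that per-core arithmetic, assuming each Apple GPU core currently has separate 128-wide FP32, FP16, and INT pipes (an assumption implied by the post, not a confirmed spec):

# Hypothetical per-core, per-clock issue rates (illustrative, based on the
# 128-wide pipe assumption above, not confirmed Apple specs).
fp32_today      = 128          # dedicated FP32 pipe only
fp32_if_widened = 128 + 128    # FP16 pipe also accepts FP32 work
int_concurrent  = 128          # INT pipe untouched, still issues alongside

# An Ada SM peaks at 128 FP32 per clock, so 256 per core would be 2x per unit.
# The real-world gain depends on how much FP32 work the FP16 pipe's idle slots
# can absorb, which is why the estimate is a hedged 30-60% rather than a clean 2x.
print(fp32_if_widened / fp32_today)   # 2.0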
Support the operand pressure at the magnitude of the current operand flow? Yeah. Latent support for operand flow that is 2x what is deployed now? Probably not. Is there really a substantial extra 1x of 'horsepower' buffer sitting around doing nothing 98% of the time? I suspect that a substantial part of that 'extra' data-pressure handling is assigned to something other than FP32, and that this "other stuff" will still be around in a future configuration to claim what it concurrently uses today.
In theory, they might go even further: make all pipes symmetrical and do 3x dispatch per cycle, but that would likely be very expensive.
As long as the GPU is sharing transistor budget with P, E, NPU, A/V, etc. cores (and with larger-than-average caches to offset the bandwidth constraints of "poor man's HBM") on the same die, that makes it relatively even more expensive.
At any rate, if we look at Apple's progress with their GPUs, I think one can see a long-term plan. Each generation incrementally adds new features that are used to unlock further capabilities in the next generation. This is a well-executed multi-year plan that has been delivering consistent performance and capability improvements every single release.
In the general sense, that 'plan' is about as applicable to Intel as it is to Apple (never mind Nvidia and AMD).
There is relatively little in Apple's moves so far to indicate that they are ramping toward an x090 (Nvidia) / x900 (AMD) 'killer' kind of path. They are incrementally making progress, but it is at best the upper-mid to lower-high range they are targeting. And AMD and Nvidia are steadily folding their bigger stuff back down into that range (with shrinks and bandwidth upgrades as the costs for those fall). The 4090 + 4080 Ti/Super/etc. are not the bulk of the cards that Nvidia will sell this generation. Many of these threads about Apple getting 'killed' on the desktop amount to pointing at some of the lowest-volume cards that Nvidia sells and saying Apple isn't there. Apple doesn't 'have to' be there to be generally competitive.
I doubt that the current Apple GPU architecture is close to its plateau, simply because there are still obvious things they can do to get healthy performance boosts. The same isn't really true for Nvidia or AMD. I just don't see how Nvidia can further improve the Ada SM design to make it significantly faster —
1. Faster at what? Faster in RT and Tensor? Yes, there probably is headroom there, once you get out of the FP32-myopic focus. (Apple uses Metal on iOS/iPadOS to drag uplift into the macOS context... Nvidia doing the same thing with their AI/ML foothold, pulling more mainstream Windows/Linux users into larger, faster workloads, wouldn't be shocking. If Microsoft pragmatically mandates that Windows 11+ has an NPU present, the growth factor is pretty much a given.)
2. Apple is boat-anchored on LPDDR while competitors are on GDDR, and cache sizes are plateauing because SRAM isn't scaling on new fab processes, which means Apple's monolithic dies are going to face headwinds in performance gains. Apple has already engaged workarounds for those limitations. Those are 'cards already played', just like Nvidia has lots of 'played cards'.
they can either continue increasing the SM count and raising the clocks, or they have to design a fundamentally new SM architecture that boosts compute density. Definitely looking forward to seeing what Blackwell will bring — does Nvidia intend to continue pushing their successful SM model with bigger and bigger designs, or will they do something new? For Nvidia's sake, I hope it is the latter.
Apple is on N3 already. Nvidia is on N4. Apple has already used the shrink, while Nvidia can still cram 'more' into the same-size dies they are using now. And if Nvidia waits until N3P to jump into the 'just as big' game, then all the more so. (Nvidia doing just midrange on N3E wouldn't be surprising. It also wouldn't be surprising for them to basically walk away from the low end.) There is also a decent chance of an even bigger gap between the architecture of B100 and the 'mainstream raster, high frame rate' oriented stuff.
There is a more than decent chance that the 5090 will be more expensive than the 4090 is. Reports are that AMD cancelled their "as expensive as possible" 5090 competitor. For AMD, even higher prices probably wouldn't work well market-wise, and their limited CoWoS throughput is likely devoted solely to MI300 (and up), which has better margins to go with the lower relative volume. If AMD puts a hefty focus on providing performance value in the midrange, then that is likely a bigger 'desktop' problem for Apple in the immediate future than Nvidia's next moves are.
There is an even higher likelihood that Nvidia is going to attack MI300-like competition with chiplets as well (not just a bigger monolithic die). There is a pretty good chance the cache ratio isn't going to be the same if they go to 3D stacking. (And Apple won't in the near term; the expense and performance/watt wouldn't be laptop-optimized.)
Apple sells its GPUs at 'workstation GPU card' price levels, so Nvidia's higher prices really aren't going to hurt Nvidia much in a head-to-head with Apple on that front.