So naive. Maybe you should give the clearly incompetent engineers at AMD and Nvidia a call to help them improve their performance and crush their competition; they are sure to offer you an amazing job.
Not sure why you're quoting my post; I am not a CPU/GPU engineer. On a more general note, it's fairly clear that Apple's designs outclass the competition in performance per watt by a large margin. In particular, they only need 5 watts to deliver a level of single-core CPU performance that Intel and AMD need 20 watts for.
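Just to spell out the arithmetic behind that claim (the 5 W and 20 W figures are the ones quoted above, not new measurements), a quick sketch in Python:

```python
# Perf-per-watt ratio at equal single-core performance.
# Power figures are the ones quoted above, not independent measurements.
score = 1.0                      # same normalized single-core score for both
apple_w, x86_w = 5.0, 20.0       # package power needed to reach that score
print((score / apple_w) / (score / x86_w))  # -> 4.0, i.e. roughly 4x perf/W
```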
I mean, it has totally never happened (except in literally every single GPU generation) that scaling up the number of processing units in a GPU led to diminishing or even negative gains in performance. GPU architectures have definitely never been memory-bandwidth starved or had other inefficiencies that only became overwhelming when scaled up.
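To make the bandwidth-starvation point concrete, here is a minimal roofline-style sketch. All numbers are made up for illustration and are not specs of any real GPU: attainable throughput is the minimum of the compute ceiling and the bandwidth ceiling, so adding units past the bandwidth ceiling buys nothing.

```python
# Roofline-style toy model: attainable throughput is capped by whichever
# runs out first, raw compute or memory bandwidth. Numbers are illustrative.

def attainable_gflops(units, gflops_per_unit, bandwidth_gbs, flops_per_byte):
    compute_ceiling = units * gflops_per_unit
    bandwidth_ceiling = bandwidth_gbs * flops_per_byte
    return min(compute_ceiling, bandwidth_ceiling)

# Fixed 400 GB/s of bandwidth, growing unit count: scaling is linear
# until the bandwidth ceiling is hit, then completely flat.
for units in (8, 16, 32, 64, 128):
    print(units, attainable_gflops(units, 100, 400, 16))
```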
Why, someone seems rather angry
I assumed it was obvious that Apple would have to scale up memory bandwidth if they are to scale up their processing clusters. Their communication on the matter has been fairly transparent: they are investing in wide-memory architectures to achieve high memory bandwidth with low latency.
And talking about scaling, it's pretty much linear, especially in GPU land, as long as you can control all the other factors (bandwidth, ROPs etc.). It's fairly clear when you look at the performance of GPUs within a single generation. Comparing across generations is much trickier, especially given the misleading marketing of GPU makers (like Nvidia's "CUDA core"). Here, we are talking about increasing the number of Apple GPU clusters (each of which comprises four 32-wide ALUs, local memory and a dispatch/control unit). If you can ensure that the work is evenly distributed across these clusters (which is not that complicated in a tiled architecture), you get pretty much guaranteed linear scaling. All you need is enough bandwidth, which is a solvable problem.
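As a back-of-the-envelope illustration of that per-cluster scaling, using the cluster layout described above (the clock frequency is an assumption for the sake of the example, not a real spec):

```python
# Peak FP32 throughput for a tiled GPU built from identical clusters,
# each with 4 x 32-wide ALUs as described above. Clock is assumed, not a spec.

def peak_fp32_tflops(clusters, clock_ghz=1.3):
    lanes = 4 * 32           # 4 ALUs per cluster, 32 lanes each
    flops_per_cycle = 2      # a fused multiply-add counts as 2 FLOPs
    return clusters * lanes * flops_per_cycle * clock_ghz / 1000.0

for clusters in (8, 16, 32, 64):
    print(clusters, "clusters ->", round(peak_fp32_tflops(clusters), 1), "TFLOPS")
```

Doubling the cluster count doubles the compute ceiling; whether the delivered performance follows is exactly the bandwidth question.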
Just to put things into perspective: an entire M1 SoC has 68.25 GB/s of memory bandwidth, while a single RTX 3090 has 936.2 GB/s of bandwidth to its memory.
M1 has bandwidth appropriate to its processor configuration. Future Apple Silicon chips will have larger caches and more bandwidth. We should also keep in mind that for graphical applications, Apple GPUs need significantly less bandwidth than Nvidia or AMD GPUs, because they use bandwidth more efficiently. Compute work can be a different matter, but there Apple Silicon's main selling point is low CPU/GPU communication latency.
And while we are at it, let's have @iPadified school the CPU designers by explaining how easily Apple could kill the whole CPU industry and how performance scales linearly with the number of cores. It's not like 32-core Threadrippers already get memory-starved with their quad RAM channels, and the same goes for the high-core-count Epyc CPUs with their 8 memory channels.
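For what it's worth, the per-core bandwidth math behind that is easy to run. A rough sketch, assuming DDR4-3200 (25.6 GB/s theoretical peak per 64-bit channel) and taking a 64-core Epyc as the high-core-count example; sustained numbers are lower in practice:

```python
# Theoretical per-core memory bandwidth for the configs mentioned above,
# assuming DDR4-3200 (25.6 GB/s per 64-bit channel). Peak, not sustained.

CHANNEL_GBS = 25.6

configs = {
    "32-core Threadripper, 4 channels": (32, 4),
    "64-core Epyc, 8 channels": (64, 8),
}

for name, (cores, channels) in configs.items():
    total = channels * CHANNEL_GBS
    print(f"{name}: {total:.1f} GB/s total, {total / cores:.2f} GB/s per core")
```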
Which is why I believe that Apple will be using something like 16 × 64-bit memory channels in their pro-desktop chips.
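If that guess is right, the aggregate bandwidth is straightforward to estimate. Taking LPDDR5-6400 purely as an example memory technology (every number here is speculative):

```python
# Aggregate bandwidth of 16 x 64-bit channels, assuming LPDDR5-6400 as an
# example (6400 MT/s, 8 bytes per transfer per channel). Speculative numbers.
channels = 16
mega_transfers_per_s = 6400
bytes_per_transfer = 8           # 64-bit channel
gbs = channels * mega_transfers_per_s * bytes_per_transfer / 1000
print(f"{gbs:.1f} GB/s")         # ~819 GB/s aggregate
```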
LOL, just the part of the chip dealing with routing data in and out of an Epyc CPU has 5 times the area of a whole M1 SoC.
... and it is built on an inferior process. I don't think anyone has any estimate of how big these things will end up when fabricated on TSMC's 5 nm or 3 nm processes.
High-end stuff is a whole different world. And Apple might do well there too, but you know, other companies are doing well there as well, and it's not trivial.
Of course it's not trivial. But you seem to be ignoring a crucial bit: Apple does not need to sell these chips to anyone. The chip itself doesn't have to be market-competitive; the resulting PC does. Intel and AMD need to make all these huge, complex chips and be able to sell them at a profit. For Apple, R&D plus manufacturing just needs to cost around the same as what they pay Intel and AMD today.