The M1/M2's claim to fame is that they're very power efficient and offer the same performance on battery as on AC. That's great for laptops, but there's really no advantage for desktops
Constrained to the narrow, lightly threaded benchmarks the tech press sensationalizes, maybe so.
For heavyweight, grunt-work multithreaded workloads? Yes, there can be. Top-end single-thread performance across the M1, Pro, Max, and Ultra is basically the same: neither base clock speeds nor single-threaded (ST) performance is lost all the way up to 20 cores. If Apple strapped four dies into a monster "Extreme" package, there is a very good chance it could do the exact same thing all the way up to 40 CPU cores.
Go from 8 cores to 40+ cores in the x86_64 lineups and see if base clocks and ST performance don't drop, or if power consumption doesn't veer off into a substantially higher-cost zone.
So it largely depends upon your world view: is the baseline metric maximum speed on narrow ST work, with a liquid-nitrogen heat sink and massive memory overclocking, or throughput on broad, heavyweight loads?
It is working for Amazon (Graviton 2, 3). The fastest-growing CPU package deployment in AWS is Graviton. That isn't primarily because Amazon makes it; it is because customers select it: lower operating costs, no service-level-agreement misses, etc. It is working for Ampere Computing in deployments at other major cloud services.
Does that mean Apple is going to drift into the > 64-core zone and enter the "core count" wars? No. But in the 16 < c < 64 SoC space, Apple will probably compete quite effectively. For those whose stuff runs on 1-4 cores 80+% of the time, with heavily ST-bound apps where Turbo speed is 'everything'? Yes, Apple probably will lose some traction there.
But is that where modern architectures and actively maintained applications are going over the next 1-4 years? That old-school pool of apps probably isn't going to grow bigger. [*** see P.S. below ***]
Apple's solution to compete against PCs and GPUs is to make bigger dies.
Actually, that isn't really true. The dies in the Ultra are just as big as those in the Max.
Apple isn't making relatively small chiplets. Apple will likely use TSMC N3 to pull the Max die size back toward more of a midrange chip size, but it is doubtful Apple will pivot to relatively small chiplets/tiles. UltraFusion is extremely wide, so it takes up lots of die edge space; make the die area too small and you run the risk of not being able to fit the necessary I/O around the edges.
They can also more easily use large chiplets because their perf/watt is very good. If power consumption were very high, there would be a greater need either for spacing to manage the thermals, or for chopping clocks as core counts climbed and limiting chips to more bursty Turbo runs.
There is a ton of money being put into more advanced 2.5D/3D packaging, where future SoCs will not be limited to a single die for large-volume production. (An evolutionary step past the relatively simple "print a mini PCB" solution the AMD Ryzen chiplets have leveraged for a while.)
At some point you can't just keep making the physical chip larger and larger to compete; there need to be other architectural improvements. Again, I'm not saying Apple cannot do that, just pointing out that making things larger isn't a long-term answer.
There is a class of users with an insatiable appetite for horsepower. But there is also a substantial number of workloads where today's systems are fast enough to get profitable work done for several years. Mainframes gave way to minicomputers (actually not so mini physically); desktop PCs ate away at their bigger brethren for many years; laptops have eaten away at desktops; etc.
Should Apple open a window for specialized compute accelerator drivers? Yeah. But that is more a problem of having deprecated OpenCL without delivering a robust replacement for it. It is software at least as much as it is hardware.
But the architectural revolution trend line you are getting at there, with Moore's Law running out of steam, is likely to follow a path where "general" workstations get more specialized over the long term. Part of that is because foundational SoC building blocks (Arm, GPU, and RISC-V licenses) let vendors put together specific-enough SoCs without being totally dependent upon a single chip maker that builds everything for everybody, generically general purpose.
This isn't necessarily going to end up with cheaper commodity stuff on the retail shelves at the upper high end.
P.S. After finishing this, I saw an article about faster AV1 encoding.
"..
Google could do this by adding Frame Parallel Encoding to heavily multi-threaded configurations. ...
In other words, CPU utilization in programs such as OBS has been reduced, primarily for systems packing 16 CPU threads. As a result, they are allowing users to use those CPU resources for other tasks or increase video quality even higher without any additional performance cost .. "
AV1 content creation will become very important, now with RTX 40-series GPUs supporting the new standard.
www.tomshardware.com
Or you can just add hardware AV1 encoding and completely blow past that uplift. But changing the viewpoint from "16 cores is extremely rare" to "16 cores is a reasonable requirement" can actually change performance, without adding yet more cores to run algorithm-limited code faster or breaking out the extra-large Turbo flogging whip to make the old, algorithm-limited code run faster.
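To make the two paths concrete, here is a minimal sketch driving ffmpeg from Python. It assumes an ffmpeg build with libaom-av1 and, for the hardware path, NVIDIA's av1_nvenc encoder (RTX 40-series); the file names and exact flag values are illustrative assumptions, not a recommendation.

```python
import subprocess

SRC = "input.mp4"  # hypothetical source clip

# Software path: the uplift in the quoted article comes from threading,
# so enable row-based multithreading and tiling to keep 16 threads busy.
subprocess.run([
    "ffmpeg", "-i", SRC,
    "-c:v", "libaom-av1",
    "-cpu-used", "6",     # speed/quality trade-off (0 = slowest/best)
    "-row-mt", "1",       # row-based multithreading within a frame
    "-tiles", "2x2",      # tile the frame so threads can work in parallel
    "-threads", "16",
    "av1_software.mkv",
], check=True)

# Hardware path: hand the whole encode to the GPU's fixed-function block,
# which blows past the software uplift by barely touching the CPU at all.
subprocess.run([
    "ffmpeg", "-i", SRC,
    "-c:v", "av1_nvenc",
    "-preset", "p5",      # NVENC presets run p1 (fastest) to p7 (best)
    "av1_hardware.mkv",
], check=True)
```

Either way, the point stands: treating 16 threads as a reasonable baseline lets the software path scale, while the hardware path removes the CPU from the equation entirely.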