Honestly, Apple's designs are currently very well balanced. They may not be the best in every intended use case, but they are good enough to offer a solid value proposition to their target markets.
Apple has very good competitive-analysis and performance/use-case dev teams, so they keep making designs that seem to hit most of the sweet spots.
I don't think Apple spends a lot of time looking at other folks' sweet spots. The analysis seems to be mostly confined to the 'rear view mirror' of Mac products and areas where Apple already has lots of traction. Things largely outside the iPhone/iPad/Mac market that take off like a rocket... Apple doesn't have a good track record of tracking those well. (Siri *cough*.)
That can help or hurt. If Apple had mindlessly played "monkey see, monkey do" with the Quest/Facebook headset, they'd be in the same ditch.
The main deficiencies are that their NPU is not the most efficient (in terms of area and power) and that they tend to lag behind in connectivity. But those tend to be IP blocks licensed from elsewhere.
Nvidia spends lots more effort folding those units into the mix of the GPU cores. (A CUDA core relates to a Tensor core roughly the way Apple's P/E cores relate to its AMX: the core cluster shares resources between the 'non-matrix' cores and the 'matrix' core. However, Tensor cores have a more open interface; AMX is more of a 'magic box'.) Apple's approach should allow them to iterate and attack different problems. Apple is far more keen on inference than training; Apple has ~1B deployed devices, and Nvidia kind of does not. Apple also has AMX, so that's about three different ways of 'slicing' the problem, with varying degrees of energy efficiency, all leading to different die-space allocations.
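To make the 'magic box' point concrete: you don't program AMX directly; you reach it through Accelerate. A minimal sketch, assuming an Apple silicon Mac where Accelerate's BLAS dispatches matrix work to the AMX units (widely reported behavior, not officially documented by Apple):

```swift
import Accelerate

// Multiply two 512x512 Float matrices through Accelerate's BLAS.
// On Apple silicon this path is widely reported to land on the AMX
// matrix units -- there is no public instruction-level interface.
let n = 512
let a = [Float](repeating: 1.0, count: n * n)
let b = [Float](repeating: 2.0, count: n * n)
var c = [Float](repeating: 0.0, count: n * n)

cblas_sgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
            Int32(n), Int32(n), Int32(n),   // M, N, K
            1.0, a, Int32(n),               // alpha, A, lda
            b, Int32(n),                    // B, ldb
            0.0, &c, Int32(n))              // beta, C, ldc
```

Contrast that with CUDA, where Tensor core operations are exposed to the programmer through the wmma/mma interfaces; that's the 'more open interface' part.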
The NPU can do inferencing with the GPU mostly turned off. In that context, is it still 'not the most efficient'? If you bury the NPU cores inside something else, then to do work that something else also has to be mostly awake and running. (On the lock screen or in screen-saver mode, the GPU is mostly hitting the 'snooze bar', yet the system still needs to do Face/Touch ID, or track sensors and infer.)
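That separation is visible in Core ML's public API: an app can tell the framework to keep inference off the GPU entirely. A minimal sketch (the model file name here is a made-up placeholder):

```swift
import CoreML

// Restrict Core ML to the CPU and the Neural Engine (the NPU),
// leaving the GPU out of the inference path entirely.
// .cpuAndNeuralEngine requires macOS 13 / iOS 16 or later.
let config = MLModelConfiguration()
config.computeUnits = .cpuAndNeuralEngine

// "FaceGate.mlmodelc" is a hypothetical compiled model, a stand-in
// for whatever low-power always-on model would actually ship.
let url = URL(fileURLWithPath: "FaceGate.mlmodelc")
let model = try MLModel(contentsOf: url, configuration: config)
```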
Their approach to power limits is also not the most adaptive, so they end up leaving lots of performance on the table in their desktop offerings (they don't scale clocks as aggressively as they could when moving from a laptop to a desktop cooling solution).
If the power/clocking targets were higher, Apple would have even more of an area problem. AMD got to Zen 4c by tossing out the maximum 'top fuel dragster', single-threaded pissing match. High clocks waste area. On CPU+GPU dies, very high-clocking CPU cores basically take area away from the GPU, which is one substantive reason iGPUs built up a pretty bad reputation.
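Back-of-envelope on why the 'dragster' chase is expensive: dynamic power goes roughly as C·V²·f, and the top of the clock range also needs higher voltage. A sketch with illustrative numbers (the voltage-vs-clock slope here is an assumption, not a measured curve from any shipping core):

```swift
// Relative dynamic power, P ~ C * V^2 * f, normalized to a base
// operating point. The linear voltage-vs-clock slope is assumed
// purely for illustration.
func relativePower(clockGHz f: Double,
                   baseGHz: Double = 3.2, baseVolts: Double = 0.9,
                   voltsPerGHz: Double = 0.15) -> Double {
    let v = baseVolts + voltsPerGHz * (f - baseGHz)
    return (v * v * f) / (baseVolts * baseVolts * baseGHz)
}

// ~19% more clock costs ~44% more power at these made-up numbers:
print(relativePower(clockGHz: 3.8))  // ≈ 1.44
```

Paying for that superlinear power in a CPU+GPU die means beefier power delivery and thermal headroom per CPU core, which is area the GPU doesn't get.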
There is not much of a win for Apple in making the CPU cores bigger and then not being able to sell them in high enough numbers to pay for the 'alternative' rollout (masks, certification/validation, etc.).
Both AMD and Intel are detaching decent chunks of their server offerings from that 'top fuel dragster' race as well (Zen 4c and 'Sierra Forest'). Once you are past 100 CPU cores, it makes less and less sense to waste space per core. If you have 100 cores, letting 99 of them sleep so one can clock higher is a huge workload mismatch: an enormous amount of expensive die space doing nothing.
The biggest flaw, IMO, is multi-die scalability. The Ultra parts show very poor NUMA behavior when migrating threads between clusters on different dies, and the GPU is even worse in that regard. They also need huge L1 caches for their cores to be performant, which may be an issue when moving to 3nm processes: SRAM has scaled poorly, so those cores may take up an even larger share of die space within the SoC.
Was this thoroughly measured on the M2 Ultra already? The M1 had issues even on a single die. That is why the M2 got a substantively upgraded internal mesh network that delivers a major uptick in performance with no backhaul memory upgrade. And with the ~10,000 pads of UltraFusion... I have doubts that UltraFusion really needed many adjustments for the 'new' M2 internal mesh. (For two dies in the M1 era it seemed to be overkill. Even with the PCIe additions they made in M2, it's still a bit of overkill.)
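For what it's worth, this is measurable from user space, at least crudely. A minimal sketch of the producer/consumer style microbenchmark people use to probe cross-cluster (and, on an Ultra, potentially cross-die) memory behavior; note that macOS offers no hard core pinning, so which cluster the scheduler places each thread on is not under your control:

```swift
import Foundation

// One thread warms a 64 MB buffer; a second thread then streams
// through it and times the pass. Across repeated runs the reader
// lands on different clusters (and, on an Ultra, possibly the other
// die), so the spread in timings hints at the NUMA-ish behavior.
// This is a sketch of the method, not a controlled experiment.
let count = 8 * 1024 * 1024   // 8M Ints = 64 MB
let buffer = UnsafeMutableBufferPointer<Int>.allocate(capacity: count)
for i in 0..<count { buffer[i] = i }   // writer pass (main thread)

let reader = Thread {
    var sum = 0
    let start = DispatchTime.now()
    for i in stride(from: 0, to: count, by: 16) {  // one 128 B line per step
        sum += buffer[i]
    }
    let elapsed = DispatchTime.now().uptimeNanoseconds - start.uptimeNanoseconds
    print("read pass: \(elapsed / 1_000_000) ms (checksum \(sum))")
}
reader.start()
while !reader.isFinished { usleep(1_000) }   // crude join
buffer.deallocate()
```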
Can Apple scale that 4-way? Yeah, there I'm skeptical. Can they do 3-way? Probably, and that's probably enough.
I think that may be the reason why they haven't been able to release a proper replacement for the high end of their product line until now.
They have to release something that enough people will pay for. There are lots of opinions on these forums that Apple should chase up into very narrow niches: make a desktop single-threaded 'killer' SoC, make an xx9xx GPU 'killer', things that make about zero economic sense. The parts can't sell in too few numbers at way, way too high a cost. Nor does Apple really have the resources to chase after them (there's barely any Apple 'product' out in those swamps either).