If I’m just spitballing, render times are the Achilles’ heel of the current M series (which is why, at every opportunity, Apple shows off viewport performance, the strength of the M series). And render times are bound by lighting calculations, where anything with dedicated hardware has an advantage.
In pure compute, Apple is fairly competitive. In fact, it's outperforming GPUs with comparable peak advertised FLOPs. The weaknesses are the lack of RT acceleration (as you mention) and the very low operating clocks.
In theory, Apple COULD go a route similar to Nvidia’s and have the RT hardware work in tandem with neural networks to do the lighting work. But that’s easier said than done.
Neural networks are involved in denoising, and AFAIK Apple already uses them for this purpose. As to hardware RT, there are around a dozen relevant Apple patents that paint a very clear picture of the implementation Apple is pursuing. It looks very interesting and more advanced than any other current RT tech. If it works out, Apple might be the first to deliver real-time, ultra-energy-efficient RT.
The secret sauce? The RT unit operates at reduced precision, which allows it to quickly test (and reject) a large number of nodes in parallel. The ray hits are then compacted, reordered, and handed off to a freshly launched shader that does the final full-precision hit testing and shading. What's important here is that general-purpose shaders are only invoked for rays that are suspected hits, which should in theory dramatically improve hardware utilisation efficiency.
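To make that flow concrete, here's a minimal CPU-side sketch of the two-phase idea, assuming only what's described above: a conservative reduced-precision box test filters candidates, the survivors are compacted and reordered, and only then does a full-precision pass (standing in for the freshly launched shader) run. All names and the epsilon are mine for illustration, not Apple's actual implementation.

```cpp
// Two-phase traversal sketch: cheap conservative test first, precise test on
// the compacted survivors only. Hypothetical code, not Apple's design.
#include <algorithm>
#include <cstdint>
#include <cstdio>
#include <vector>

struct Box { float lo[3], hi[3]; };
struct Ray { float org[3], invDir[3]; }; // 1e9 stands in for 1/0 on unused axes

// Phase 1: reduced-precision slab test. Widening the box by an epsilon models
// the rounding slack of a low-precision unit: false positives are allowed,
// false negatives are not.
bool coarseHit(const Ray& r, const Box& b) {
    const float eps = 0.05f;
    float t0 = 0.0f, t1 = 1e30f;
    for (int a = 0; a < 3; ++a) {
        float lo = (b.lo[a] - eps - r.org[a]) * r.invDir[a];
        float hi = (b.hi[a] + eps - r.org[a]) * r.invDir[a];
        if (lo > hi) std::swap(lo, hi);
        t0 = std::max(t0, lo);
        t1 = std::min(t1, hi);
    }
    return t0 <= t1;
}

// Phase 2: the same test at full precision (stands in for exact hit testing).
bool preciseHit(const Ray& r, const Box& b) {
    float t0 = 0.0f, t1 = 1e30f;
    for (int a = 0; a < 3; ++a) {
        float lo = (b.lo[a] - r.org[a]) * r.invDir[a];
        float hi = (b.hi[a] - r.org[a]) * r.invDir[a];
        if (lo > hi) std::swap(lo, hi);
        t0 = std::max(t0, lo);
        t1 = std::min(t1, hi);
    }
    return t0 <= t1;
}

int main() {
    std::vector<Box> leaves = {{{0, 0, 5}, {1, 1, 6}}, {{3, 3, 5}, {4, 4, 6}}};
    std::vector<Ray> rays   = {{{0.5f, 0.5f, 0}, {1e9f, 1e9f, 1}},
                               {{2.0f, 2.0f, 0}, {1e9f, 1e9f, 1}}};

    // Phase 1: cheap rejection; keep only the (ray, leaf) pairs that survive.
    struct Cand { uint32_t ray, leaf; };
    std::vector<Cand> cands;
    for (uint32_t r = 0; r < rays.size(); ++r)
        for (uint32_t l = 0; l < leaves.size(); ++l)
            if (coarseHit(rays[r], leaves[l])) cands.push_back({r, l});

    // Compaction + reordering: group suspected hits by leaf so the follow-up
    // pass works on coherent data instead of a sparse, divergent ray set.
    std::sort(cands.begin(), cands.end(),
              [](const Cand& a, const Cand& b) { return a.leaf < b.leaf; });

    // Phase 2: general-purpose code touches only the suspected hits.
    for (const Cand& c : cands)
        if (preciseHit(rays[c.ray], leaves[c.leaf]))
            std::printf("ray %u hits leaf %u\n", (unsigned)c.ray, (unsigned)c.leaf);
}
```

The key property is that the coarse test can over-report but never miss real geometry, which is what lets it run at low precision and high parallelism while the expensive shading work stays confined to likely hits.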
I suppose Apple could “just” clock their processors higher, but that assumes that they’re not going to run into stability issues or other problems.
It should be fairly clear at this point that the A14/A15 u-arch is not designed to scale to higher clocks. We will see whether the next u-arch will be more flexible here. Given Apple's ludicrous lead in the smartphone market, it would make some sense to sacrifice a bit of energy efficiency for a higher clock ceiling and just clock the smartphone cores lower to compensate. But of course, there are a lot of variables.
I'd say the CPU is mostly fine (and Apple can always add multicore throughput by increasing the number of area-efficient E-cores, like Intel), but where the low clock really hurts is the GPU. Even the Mac Studio should be able to handle 1.7-1.8 GHz, which would result in a flat 30% compute improvement.
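To put numbers on that: peak FP32 throughput is just ALU count x 2 FLOPs per FMA x clock, so a clock bump passes straight through to the advertised TFLOPS. A quick back-of-envelope, assuming a 76-core Ultra-class GPU with 128 FP32 ALUs per core and a roughly 1.4 GHz current clock (approximate figures):

```cpp
// Back-of-envelope peak-FLOPs check. Core/ALU counts and clocks are rough.
#include <cstdio>

// Peak FP32 = cores * ALUs per core * 2 FLOPs per FMA * clock (GHz) -> TFLOPS.
double peakTflops(int gpuCores, int alusPerCore, double clockGHz) {
    return gpuCores * alusPerCore * 2.0 * clockGHz / 1000.0;
}

int main() {
    std::printf("at 1.4 GHz: %.1f TFLOPS\n", peakTflops(76, 128, 1.4)); // ~27
    std::printf("at 1.8 GHz: %.1f TFLOPS\n", peakTflops(76, 128, 1.8)); // ~35
    // 1.8 / 1.4 ~= 1.29, i.e. roughly that flat 30% from clocks alone.
}
```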
Maybe Apple could take a lesson from AMD and go the chiplet route, with different dies talking over an interposer (which, iirc, the M2 Ultra already does). This is the way I believe they’ll go.
Chiplets as AMD does them are primarily a cost-saving measure. I doubt this would be a good solution for Apple with their focus on mobile SoCs, as they want to keep their energy efficiency (AMD uses monolithic dies on mobile too, as their desktop chiplet designs waste too much power...).
However, there is another avenue Apple is exploring, and that's 3D die stacking. They have advanced patents here going back to 2020, so they must have been experimenting with it for some time now. From what I understand, the idea is to put the compute logic (CPU/GPU) on a die manufactured with a smaller, more expensive node, while putting the memory controllers and cache on another die manufactured on an older process. These dies are then stacked on top of each other and connected directly. This gives you the best of both worlds while retaining the energy efficiency of the monolithic approach. The resulting package is also supposed to be smaller than a monolithic die. I wouldn't be surprised if we see this technology in the 3nm products, simply because 3nm is so damn expensive and doesn't scale for cache memory. So it might make sense to spend a bit more on advanced packaging that uses "cheap" 5nm for the cache; the resulting cost would be lower than a purely 3nm chip, with comparable performance and energy efficiency. If not on the M3, maybe the M3 Pro/Max.
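A toy cost split shows why that could pencil out. Every number below is hypothetical and only there to illustrate the argument (SRAM barely shrinks on the newer node, so paying leading-edge prices for cache area buys you little):

```cpp
// Purely illustrative cost comparison with made-up numbers: monolithic
// leading-edge die vs. compute die on the new node stacked on a cache/IO die
// on the older node.
#include <cstdio>

int main() {
    // Hypothetical relative cost per mm^2 (older node = 1.0).
    const double costNewNode = 2.0, costOldNode = 1.0;

    // Hypothetical die: 60 mm^2 logic (shrinks well), 40 mm^2 cache/IO
    // (barely shrinks), areas given on the older node.
    const double logicArea = 60.0, cacheArea = 40.0;
    const double logicShrink = 0.6, cacheShrink = 0.95; // area scaling on new node
    const double packaging = 15.0;                      // extra cost of 3D stacking

    double monolithic = (logicArea * logicShrink + cacheArea * cacheShrink) * costNewNode;
    double stacked    = logicArea * logicShrink * costNewNode   // compute die, new node
                      + cacheArea * costOldNode                 // cache die, old node
                      + packaging;

    std::printf("monolithic new-node die: %.0f cost units\n", monolithic); // ~148
    std::printf("stacked compute+cache:   %.0f cost units\n", stacked);    // ~127
}
```

Under these made-up assumptions the stacked package comes out ahead even after paying extra for the advanced packaging, which is exactly the trade-off described above.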
Where we will certainly continue seeing "chiplets" is the larger systems, aka the Ultra/Extreme, as energy efficiency is less of a concern there. The M1/M2 Ultra seems to be a fairly straightforward interconnect, at least from a logical standpoint: they "simply" join the two on-chip networks into one big network. But a very recent Apple patent depicts a four-chip configuration using a new routing network interconnect, similar to what Intel uses in its new Xeons, only Apple's solution will likely have much higher bandwidth.
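One way to see why two chips can simply fuse their on-chip networks while four push you towards a routed fabric: the number of direct die-to-die links grows quadratically. The arithmetic below is generic, not taken from the patent:

```cpp
// Fully direct topology needs N*(N-1)/2 die-to-die links; fine at N=2,
// awkward at N=4, which is where a routed interconnect starts to make sense.
#include <cstdio>

int main() {
    for (int dies = 2; dies <= 4; ++dies)
        std::printf("%d dies -> %d direct die-to-die links\n",
                    dies, dies * (dies - 1) / 2);
    // 2 dies -> 1 link (M1/M2 Ultra style), 4 dies -> 6 links.
}
```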
Of course, all this is speculation, albeit informed speculation. It is also very possible that the M3 family will continue to be "boring" monolithic dies with conservative clocks. It's just that I hope, for Apple's sake, that they have some exciting new tech in the pipeline, because otherwise they might run out of momentum.