It's not about raytracing, it's about SIMD-heavy code. Both Apple and modern x86 CPUs can do around 512 bits worth of SIMD operations per clock (give or take). But x86 CPUs run at higher clocks and have more L1D cache bandwidth to support these operations. It's all about tradeoffs, really. Apple is more flexible (their 4x smaller SIMD units are better suited for more complex algorithms and scalar computation) and more efficient (not paying for higher bandwidth and higher clocks), but this also means it cannot win when it comes to a raw SIMD slugfest. We see it across all kinds of SIMD-oriented workflows btw, not just CB.
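To put rough numbers on the per-clock argument, here is a back-of-the-envelope sketch. The unit counts and clocks are illustrative assumptions, not measurements: four 128-bit units for Apple, two 256-bit units standing in for "a modern x86 core", and hypothetical clocks of 3.2 GHz and 5.0 GHz. The function names are made up for this example.

```cpp
#include <cassert>

// Bits of SIMD work a core can issue per clock:
// number of SIMD units times the width of each unit.
constexpr int simd_bits_per_clock(int units, int width_bits) {
    return units * width_bits;
}

// Peak SIMD throughput in Gbit/s at a given clock (in GHz).
// This ignores cache bandwidth, which in practice also limits throughput.
constexpr double peak_simd_gbits(int units, int width_bits, double clock_ghz) {
    return simd_bits_per_clock(units, width_bits) * clock_ghz;
}
```

With these assumed figures both designs land on 512 bits per clock, so the peak difference comes entirely from the clock: `peak_simd_gbits(2, 256, 5.0)` versus `peak_simd_gbits(4, 128, 3.2)`.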
Of course, if you take clock frequency into consideration, then the higher-clocked design does have a throughput advantage. But that is not an architectural weakness. However, for mobile x86 designs running at clocks similar to AS, the performance discrepancy is only related to how Embree handles NEON.
That is unless you compare to a desktop CPU, where Apple does not have a match.
Did you look at the sources? Float[4] is the most common data structure - which is perfect for NEON. Are you familiar with the Embree implementation?
Is it possible to write a CPU raytracer that would perform better on Apple Silicon than Embree? I am sure it is. Embree isn't really written with a CPU in mind that has four (albeit smaller) independent SIMD units, and I wouldn't be surprised if certain operations (like ray-box intersection) can be implemented more efficiently on ARM SIMD. But no matter how efficient your code is, this doesn't change the fact that Apple Silicon is at a disadvantage in SIMD throughput compared to x86 CPUs on the hardware level.
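For anyone following along, this is roughly what a ray-box intersection over a float[4]-style type looks like. It is a plain scalar sketch of the standard slab test, not Embree's code; `Float4`, `make4` and `ray_hits_box` are hypothetical names, and the caller is assumed to pass a precomputed reciprocal direction with non-zero components.

```cpp
#include <algorithm>
#include <cassert>

// Four floats: x, y, z plus one padding lane, the layout that maps
// directly onto a single 128-bit SIMD register.
struct Float4 { float v[4]; };

// Hypothetical helper to build a Float4 from three components.
inline Float4 make4(float x, float y, float z) {
    return Float4{{x, y, z, 0.0f}};
}

// Slab test: intersect the ray with the pair of planes on each axis and
// keep the overlap of the three [entry, exit] intervals.
// inv_dir holds 1/direction per component (assumed non-zero here).
inline bool ray_hits_box(const Float4& origin, const Float4& inv_dir,
                         const Float4& box_min, const Float4& box_max,
                         float t_near, float t_far) {
    for (int axis = 0; axis < 3; ++axis) {
        float t0 = (box_min.v[axis] - origin.v[axis]) * inv_dir.v[axis];
        float t1 = (box_max.v[axis] - origin.v[axis]) * inv_dir.v[axis];
        t_near = std::max(t_near, std::min(t0, t1));  // latest entry
        t_far  = std::min(t_far,  std::max(t0, t1));  // earliest exit
    }
    return t_near <= t_far;
}
```

The subtract, multiply, min and max steps are each a single 128-bit operation per box, which is why the data layout itself suits NEON; whether a particular implementation schedules well across four independent units is a separate question.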
You again fail to understand the issue: Embree is written for AVX/SSE and only statically wrapped to NEON.
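To illustrate what a static wrapper means in general, here is a minimal sketch. It is not Embree's actual wrapper: `vfloat4`, `load4`, `store4`, `add4`, `mul4` and `dot4` are hypothetical names, and the point is only that each 128-bit operation gets mapped one-to-one onto the target's intrinsic at compile time, so the calling code keeps whatever shape it was originally written in.

```cpp
#include <array>
#include <cassert>

#if defined(__ARM_NEON)
  #include <arm_neon.h>
  using vfloat4 = float32x4_t;
  inline vfloat4 load4(const float* p)        { return vld1q_f32(p); }
  inline void    store4(float* p, vfloat4 a)  { vst1q_f32(p, a); }
  inline vfloat4 add4(vfloat4 a, vfloat4 b)   { return vaddq_f32(a, b); }
  inline vfloat4 mul4(vfloat4 a, vfloat4 b)   { return vmulq_f32(a, b); }
#elif defined(__SSE__)
  #include <xmmintrin.h>
  using vfloat4 = __m128;
  inline vfloat4 load4(const float* p)        { return _mm_loadu_ps(p); }
  inline void    store4(float* p, vfloat4 a)  { _mm_storeu_ps(p, a); }
  inline vfloat4 add4(vfloat4 a, vfloat4 b)   { return _mm_add_ps(a, b); }
  inline vfloat4 mul4(vfloat4 a, vfloat4 b)   { return _mm_mul_ps(a, b); }
#else
  // Scalar fallback so the sketch builds on any target.
  struct vfloat4 { float v[4]; };
  inline vfloat4 load4(const float* p) { return vfloat4{{p[0], p[1], p[2], p[3]}}; }
  inline void store4(float* p, vfloat4 a) { for (int i = 0; i < 4; ++i) p[i] = a.v[i]; }
  inline vfloat4 add4(vfloat4 a, vfloat4 b) {
      vfloat4 r; for (int i = 0; i < 4; ++i) r.v[i] = a.v[i] + b.v[i]; return r;
  }
  inline vfloat4 mul4(vfloat4 a, vfloat4 b) {
      vfloat4 r; for (int i = 0; i < 4; ++i) r.v[i] = a.v[i] * b.v[i]; return r;
  }
#endif

// Caller code is written once against the wrapper and never sees
// which instruction set is underneath.
inline float dot4(std::array<float, 4> a, std::array<float, 4> b) {
    float lanes[4];
    store4(lanes, mul4(load4(a.data()), load4(b.data())));
    return lanes[0] + lanes[1] + lanes[2] + lanes[3];
}
```

A wrapper like this gets the code running on NEON, but it does not restructure anything for the target: an algorithm laid out around SSE/AVX assumptions stays laid out that way.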