Now, we have all the information to focus on the main topic of the thread.
- AMD's Phoenix SoC
AMD’s mobile and small form factor journey has been arduous.
chipsandcheese.com
- Apple Silicon
Is this infographic the best explanation of Apple Silicon microarchitecture?
[Attached image: Apple Silicon microarchitecture infographic]
Unfortunately, I can't find the link to the original source. Is it yours?
@name99
- Geekbench results
Some of the results don't make sense. The 7840U wins all but three of the multicore tests. Of those three, the Object Detection result is the strangest: the M2 wins by 10% on multicore but loses by 10% on single core. How is this possible?
This is by Dougall Johnson and comes from
https://dougallj.github.io/applecpu/firestorm.html
It is about the best you can find, but Dougall and I differ on a few points:
- We have different theories of how the ROB is laid out. I think we agree on the basic point, but I find his language of "coalesced" retirement incomprehensible (and maybe he thinks the same regarding my language!)
- I believe (based on multiple patents) that at least some of the scheduler queues are paired, so that if one queue can find no runnable instructions, it will issue the second-choice runnable instruction from the paired queue.
- And I don't think the LS scheduling queue is a single large queue.
But we have been doing very different work, using very different investigative techniques, so there's no reason to believe either of us has the absolute truth! Both of us have had to be content with quick scans of the territory, rather than the sort of careful, detailed investigation of just one subsystem that you get in the x86 world, because there is so much new territory.
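To make the paired-queue idea from the second bullet concrete, here is a toy sketch in Python. It is only my reading of the behaviour described above, not Apple's actual implementation: the function names, the queue representation (a list of `(name, is_ready)` tuples, oldest first) and the one-issue-per-port-per-cycle rule are all assumptions made for illustration.

```python
def ready(queue):
    """Names of the runnable instructions in a queue, oldest first."""
    return [name for name, is_ready in queue if is_ready]


def issue_pair(queue_a, queue_b):
    """Pick one instruction per port for a single cycle (hypothetical model).

    Each port prefers the oldest runnable instruction in its own queue.
    If its own queue has nothing runnable, it borrows the SECOND-choice
    runnable instruction from the paired queue, leaving the first choice
    for the paired queue's own port.
    """
    ra, rb = ready(queue_a), ready(queue_b)
    pick_a = ra[0] if ra else (rb[1] if len(rb) > 1 else None)
    pick_b = rb[0] if rb else (ra[1] if len(ra) > 1 else None)
    return pick_a, pick_b


# Queue A is stalled (nothing runnable); queue B has two runnable entries.
queue_a = [("a0", False), ("a1", False)]
queue_b = [("b0", True), ("b1", False), ("b2", True)]
print(issue_pair(queue_a, queue_b))
```

In this example port A would otherwise sit idle for the cycle; with pairing it picks up `b2` while port B still issues `b0`.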
AMD uses 8 P-cores (with SMT, i.e. hyperthreading, as their kinda-sorta equivalent of E-cores), so the fact that they lose ANYTHING in multicore is noteworthy.
Background Blur presumably reflects AVX-512 (or whatever version this particular Zen uses); IF Apple opened up AMX, a compiler might be able to route the instructions to AMX and get similar performance, but who knows when that will happen.
I don't know if Ray Tracer (single core) is written to be vectorized (with predicates). If it is, that likewise explains the great Zen performance in that case; predicates as part of "NEON" are the one thing it would be nice if Apple picked up from SVE...
As for Object Detection, well, unfortunately, unlike SPEC, we don't know which subsystems GB6 stresses. Maybe Object Detection is PRIMARILY a memory bandwidth test, and running one copy is fine for both memory systems, but multiple copies overload the memory systems (of both of them, but more so HP)? You'll note that even on M2 its multicore version scales worse than the other benchmarks.
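The bandwidth hypothesis can be sketched with a very crude model: total throughput is capped either by compute (cores times per-core speed) or by shared memory bandwidth, whichever is lower. Everything below is an assumption for illustration. The function name, the parameters and especially the numbers are invented; they are NOT measurements of the 7840U or the M2, and "chip A" and "chip B" are hypothetical.

```python
def throughput(cores, per_core_rate, mb_per_item, mem_bandwidth_mb_s):
    """Items per second for `cores` copies of a workload (toy model).

    per_core_rate:      items/s one core can compute when memory is not a limit
    mb_per_item:        memory traffic generated per item, in MB
    mem_bandwidth_mb_s: shared memory bandwidth, in MB/s
    """
    compute_bound = cores * per_core_rate
    bandwidth_bound = mem_bandwidth_mb_s / mb_per_item
    return min(compute_bound, bandwidth_bound)


# Invented numbers: chip A has the faster core, chip B has more bandwidth.
chip_a = dict(per_core_rate=110, mb_per_item=100, mem_bandwidth_mb_s=60_000)
chip_b = dict(per_core_rate=100, mb_per_item=100, mem_bandwidth_mb_s=66_000)

for cores in (1, 8):
    a = throughput(cores, **chip_a)
    b = throughput(cores, **chip_b)
    print(f"{cores} core(s): A={a:.0f}  B={b:.0f}  B/A={b / a:.2f}")
```

With one copy running, neither chip is anywhere near its bandwidth cap, so the faster core wins. With eight copies both chips hit the cap, and the one with more bandwidth comes out ahead even though its cores are slower. That is the shape of the Object Detection result, though whether it is the actual cause is a guess.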