"What makes ARM superior?" is begging the question. I.e., it's assuming something that hasn't been established.
Indeed, to the extent papers have been presented in professional journals or conference proceedings on this subject, the overall conclusion has been that one ISA isn't inherently superior to another, and what really matters is instead the implementation, i.e., the microarchitecture (and, in particular, how well-optimized the microarchitecture is for the use case).
Let's start with one of the best-known papers on this subject, by Blem et al., and then check through all the papers that cited it:
"We find that ARM and x86 processors are simply engineering design points optimized for different levels of performance, and there is nothing fundamentally more energy efficient in one ISA class or the other.
The ISA being RISC or CISC seems irrelevant." [emphasis mine]
FROM: E. Blem, J. Menon and K. Sankaralingam, "Power struggles: Revisiting the RISC vs. CISC debate on contemporary ARM and x86 architectures," 2013 IEEE 19th International Symposium on High Performance Computer Architecture (HPCA), Shenzhen, 2013, pp. 1-12, doi: 10.1109/HPCA.2013.6522302.
[https://ieeexplore.ieee.org/abstract/document/6522302]
I then did a quick scan through all 176 articles that had cited this one (https://scholar.google.com/scholar?cluster=14820675711934164696&hl=en&as_sdt=0,5&sciodt=0,5) to see whether any of the citing articles directly addressed this question (and, in particular, whether any disagreed). I found only three, all of which broadly supported Blem et al.'s conclusion:
1) "Our simulation results suggest that although ARM ISA outperforms RISC-V and X86 ISAs in performance and energy consumption, the differences between ARM and RISC-V are very subtle, while the performance gaps between ARM and X86 are possibly caused by the relatively low hardware configurations used in this paper and could be narrowed or even reversed by more aggressive hardware approaches.
Our study confirms that one ISA is not fundamentally more efficient." [emphasis mine]
FROM: M. Ling, X. Xu, Y. Gu and Z. Pan, "Does the ISA Really Matter? A Simulation Based Investigation," 2019 IEEE Pacific Rim Conference on Communications, Computers and Signal Processing (PACRIM), Victoria, BC, Canada, 2019, pp. 1-6, doi: 10.1109/PACRIM47961.2019.8985059.
[https://ieeexplore.ieee.org/abstract/document/8985059]
2) "The difference in performance and power consumption between the studied processors seems to be determined by the intended application rather than by the choice of ISA. In other words, in modern processors, the way the ISA is implemented, that is, the microarchitecture, plays a more significant role in determining performance and power characteristics than ISA." [emphasis mine]
FROM: S. F. Chevtchenko and R. F. Vale, "A Comparison of RISC and CISC Architectures," resource 2: 4. [No year given.]
[https://pdfs.semanticscholar.org/8977/18e3387690736f132e812d097dc40379ea2c.pdf]
3) "In this paper, we presented a survey of existing hardware performance benchmark suites that range from evaluation of heterogeneous systems to distributed ML workloads for clusters of servers. From the survey, we selected BigDataBench in order to compare the performance of server-grade ARM and x86 processors for a diverse set of workloads and applications, using real-world datasets that are scalable. We benchmarked a state-of-the-art dual socket Cavium ThunderX CN8890 ARM processor against a dual socket Intel® Xeon® processor E5-2620 v4 x86-64 processor.
Initial results demonstrated that ARM generally had slightly worse performance compared to x86 processors for Spark Offline Analytics workloads, and on par or superior performance for Hive workloads. We determined that the ARM server excels over x86 for write heavy workloads. It is worth noting the apparent disk I/O bottleneck of the ARM server when comparing performance results to the x86 server. There are many other BigDataBench workloads that have yet to be tested on ARM, many of which may lead to promising results when provided with larger amounts of disk and network I/O. Moreover, recording the CPU temperatures and power consumptions of these servers may yield even more fruitful results, further promoting the use of ARM in server-grade processing for ML and Big Data applications." [emphasis mine]
FROM: S. Kmiec, J. Wong and H. A. Jacobsen, "A Comparison of ARM Against x86 for Distributed Machine Learning Workloads," in Technology Conference on Performance Evaluation and Benchmarking, 2017 Aug 28, pp. 164-184. Springer, Cham.
[https://link.springer.com/chapter/10.1007/978-3-319-72401-0_12]