Technically "among the first" ARM desktop was Acorn Archimedes, circa 1987.
The more modern ones would be Microsoft Surface RT, circa 2012.
You left off the Galaxy Book S (2018), the Lenovo Yoga C630 (2019), and the Surface Pro X (2020...
There are some ARM machines out there but most are power-hungry servers.
Actually, ARM is used in servers (e.g. Amazon Graviton/Neoverse) and in supercomputers due to its efficiency, because it offers the same (or more, depending on the design) performance for fewer watts. In supercomputers, power consumption and heat dissipation & cooling are as important as, if not more important than, the hardware itself; it's a massive cost to cool all of those machines, so a big part of the budget goes there.
Now, if you meant that server consumption > consumer computer consumption, then yes, that's true indeed.
Yes, definitely, that is the most important topic on which we have no data. Anandtech's tests of the A12 show that power consumption rises rapidly towards the end of the frequency range; extrapolating the curve further suggests that the chip doesn't have much room to grow. Of course, a lot could have changed in two generations and one node shrink. Let's see what the A14 can do.
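(For a bit of intuition on why that curve steepens: dynamic CPU power roughly follows P ≈ C·V²·f, and the supply voltage itself has to rise with frequency near the top of the range, so the last few hundred MHz are disproportionately expensive. A minimal sketch with made-up operating points, not Anandtech's actual measurements:)

```c
/* Illustrative only: dynamic power scales roughly as P = C * V^2 * f,
 * and V has to rise with f near the top of the curve, so power grows
 * much faster than linearly. Operating points below are made up. */
#include <stdio.h>

int main(void) {
    const double freq_ghz[] = {1.0, 2.0, 2.5, 3.0};     /* hypothetical clocks    */
    const double volt[]     = {0.60, 0.75, 0.90, 1.10}; /* hypothetical voltages  */
    const double C = 2.0;                                /* arbitrary scale factor */

    for (int i = 0; i < 4; i++) {
        double watts = C * volt[i] * volt[i] * freq_ghz[i];
        printf("%.1f GHz @ %.2f V -> ~%.2f W\n", freq_ghz[i], volt[i], watts);
    }
    return 0;   /* 3x the clock ends up costing ~10x the power here */
}
```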
Doesn't matter. As old as I am, I'll never be old enough to need one, I'll be dead before then. Given the world and the younger generations, I'm glad I've selected my own use-by date.
Sure, I was just responding to the general idea that ARM efficiency is inherently >> x86 efficiency at higher power scales. As I acknowledged in my subsequent post to Lehman, we don't know what the scaling will be for Apple's design. And as I acknowledged in my earlier post, industry has less experience with ARM ISAs at higher power levels, so there's probably more room for optimization there.

Now notice that the ARM being mentioned is not Cortex but rather a custom core design by Fujitsu.
TOP500 regularly publishes rankings of the 500 fastest supercomputers. In 2013, they added a new list—the Green 500—which ranks the 500 most energy-efficient supercomputers, based on GFlops/watt.
If you look at who's in the top 10 of the most recent Green 500 list (published June 2020), you'll see that it includes machines built with all the major CPU architectures: Intel Xeon, AMD EPYC, IBM Power 9, and Fujitsu ARM (A64FX). This suggests to me that, at least thus far, at the higher power scale, and in this application, no one architecture is particularly more efficient than the others.
Thanks for pointing that out. So it seems we are left with two possibilities:

That performance per watt is not directly related to CPU efficiency. Most if not all of them also have custom accelerators or GPUs that contribute to their performance results.
It says at the top of the link you posted:
The most energy-efficient system on the Green500 is the MN-3, based on a new server from Preferred Networks. It achieved a record 21.1 gigaflops/watt during its 1.62 petaflops performance run. The system derives its superior power efficiency from the MN-Core chip, an accelerator optimized for matrix arithmetic. It is ranked number 395 in the TOP500 list.
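(Just to connect those two figures: 1.62 petaflops at 21.1 gigaflops/watt implies a draw of roughly 77 kW during the run. A quick back-of-the-envelope check using only the numbers quoted above:)

```c
/* Back-of-the-envelope check on the quoted MN-3 figures:
 * implied power = sustained FLOPS / (GFLOPS per watt). */
#include <stdio.h>

int main(void) {
    const double petaflops       = 1.62;   /* quoted Linpack run         */
    const double gflops_per_watt = 21.1;   /* quoted Green500 efficiency */

    double gflops = petaflops * 1.0e6;          /* 1 PFLOPS = 1e6 GFLOPS */
    double watts  = gflops / gflops_per_watt;   /* implied power draw    */

    printf("Implied draw: ~%.1f kW\n", watts / 1000.0);   /* ~76.8 kW */
    return 0;
}
```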
I don't think you are doing justice to Intel and AMD here. Intel's CPUs also contain a fast GPU, a built-in AI accelerator, a wide vector unit, matrix operations, encryption, a video encoder, I/O controllers, an integrated WiFi controller, etc... Apple SoCs might contain more specialized processors and their ML accelerators are much faster, but fundamentally, the principal differences between these systems are minor.
And all these custom systems have little to do with the CPU itself. A system might have the best ML acceleration in the world, but its utility is going to be limited if it struggles with basic tasks. Apple CPUs are custom-designed, sophisticated devices that offer very high performance at very low power draw. This is their key advantage for an average user (be it a home user or a professional).
As you can see from the reply I just posted to Woochoo, thus far ARM ISAs seem to lose a lot of their power efficiency advantage to x86 at the higher end of the power scale. Maybe Apple can do a better job.
Though it should also be noted that the clocks needed to compete with the Intel Core i9's are even higher than those needed to compete with the Intel Xeons.
It's because performance has little to do with the ISA; it's up to the CPU architecture.
It certainly makes sense that some ISAs would be naturally better suited to certain tasks than others. And that could certainly explain the active interest in exploring ISAs. But as to the question at hand—namely whether one ISA is inherently superior [in efficiency] to another (x86 vs ARM)—my take-away at this point (based on the difference between what you wrote, and what's written in those papers) is that this is not yet a settled question in the field.
Interestingly, as you'll probably recall (since you gave me a thumbs-up for it), this is precisely the position I took earlier in this thread (https://forums.macrumors.com/thread...or.2242787/page-3?post=28689244#post-28689244), as well as in my subsequent discussion about this with cmaier (https://forums.macrumors.com/thread...rm.2248115/page-5?post=28733170#post-28733170), who worked on chip design for AMD. cmaier argued strongly that the ARM ISA is inherently more efficient than x86.
Well, cmaier certainly has more expertise; he actually works with the stuff.
My take on this is that it's probably much easier to make a low-power ARM CPU, since the basic building blocks can be made much simpler. But again, once you get into the high-performance domain, you need to design a superscalar CPU with instruction reordering, dependency tracking, register renaming, write coalescing, wide vector units, complex cache hierarchies... and that is an entirely different thing. It is still a mystery to me how Apple manages to reach these performance levels with only 5 watts of actively consumed power...
On another hand, I also believe that ARMv8 is a better ISA design than Intel's monstrosity of x86... It's more symmetrical, more logical, and has less irregularity.
I'm also skeptical that you really need wide vector units. They presumably reduce decoding overhead, but Intel is the only one that went really wide, and they introduce a lot of weirdness in the process.
They still use 128 bit lanes, so shuffles which move data across a 128 bit boundary have different latencies. They also keep incrementally adding instruction versions with memory addresses as the third operand, so that shuffle and arithmetic ops can be issued from a port that normally handles loads.
I have no idea how Intel hasn't driven their compiler team mad. I just imagine some lord of the flies scenario there, partly because it amuses me.
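(To make the 128-bit-lane point concrete, here's a minimal AVX2 sketch. The in-lane shuffle reorders data within each 128-bit half, while the permute moves data across the halves; on the Intel cores I'm aware of, the lane-crossing form has noticeably higher latency, roughly 3 cycles vs 1, though the exact numbers vary by microarchitecture:)

```c
/* Sketch of the 128-bit lane issue in AVX/AVX2 (illustrative, not a benchmark).
 * Build with something like: gcc -mavx2 lanes.c */
#include <immintrin.h>
#include <stdio.h>

int main(void) {
    __m256 v = _mm256_setr_ps(0, 1, 2, 3, 4, 5, 6, 7);

    /* In-lane shuffle: reverses elements within each 128-bit half
     * independently; typically 1-cycle latency. */
    __m256 in_lane = _mm256_shuffle_ps(v, v, _MM_SHUFFLE(0, 1, 2, 3));

    /* Lane-crossing permute: swaps the two 128-bit halves; typically
     * ~3-cycle latency because data crosses the lane boundary. */
    __m256 cross_lane = _mm256_permute2f128_ps(v, v, 0x01);

    float a[8], b[8];
    _mm256_storeu_ps(a, in_lane);
    _mm256_storeu_ps(b, cross_lane);

    printf("in-lane:    %g %g %g %g | %g %g %g %g\n",
           a[0], a[1], a[2], a[3], a[4], a[5], a[6], a[7]);
    printf("cross-lane: %g %g %g %g | %g %g %g %g\n",
           b[0], b[1], b[2], b[3], b[4], b[5], b[6], b[7]);
    return 0;
}
```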
Wide units can be beneficial for HPC, if your workloads are inherently SIMD-compatible. Fujitsu also went with 512-bit vector units for their A64FX, for example (which is where most of the chip's raw FLOPS performance comes from).
I completely agree however that wide vector units are a waste on a general-purpose machine, especially when you consider the ISA extension fragmentation. Four separate 128-bit ALUs will almost always outperform a single 512-bit one. Apple currently has three 128-bit ALUs. Personally, I would much prefer if they just added an additional one and implemented SVE/SVE2 to allow the ALUs to be pooled together as needed. Although I can imagine that scheduling will be a challenge at this point.
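(For what it's worth, the appeal of SVE/SVE2 here is that the code is vector-length agnostic: the same loop runs unchanged whether the hardware exposes 128-, 256-, or 512-bit vectors, so a design is free to gang narrower ALUs together behind the scenes. A minimal sketch with the Arm C Language Extensions intrinsics, assuming an SVE-capable toolchain; the function name is just for illustration:)

```c
/* Vector-length-agnostic add using SVE intrinsics (ACLE). The same code
 * works whether the hardware implements 128-, 256-, or 512-bit vectors.
 * Build with something like: gcc -march=armv8-a+sve sve_add.c */
#include <arm_sve.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper, just for illustration. */
static void add_arrays(const float *a, const float *b, float *out, int64_t n) {
    for (int64_t i = 0; i < n; i += svcntw()) {  /* step = 32-bit lanes per vector */
        svbool_t pg = svwhilelt_b32(i, n);       /* predicate masks the tail       */
        svfloat32_t va = svld1_f32(pg, a + i);
        svfloat32_t vb = svld1_f32(pg, b + i);
        svst1_f32(pg, out + i, svadd_f32_x(pg, va, vb));
    }
}

int main(void) {
    float a[5] = {1, 2, 3, 4, 5}, b[5] = {10, 20, 30, 40, 50}, out[5];
    add_arrays(a, b, out, 5);
    for (int i = 0; i < 5; i++) printf("%g ", out[i]);   /* 11 22 33 44 55 */
    printf("\n(this machine's SVE vector width: %d bits)\n", (int)(svcntw() * 32));
    return 0;
}
```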
Yeah, it's a mess. Another problem is that using different SIMD extensions has a non-trivial impact on frequency and power consumption. There are also the warmup and transition delays... Intel's wide SIMD is still completely unsuitable for quick, bursty work. It is only worth it if you do thousands of cycles' worth of SIMD operations in a row. If you just need to add a bit of SIMD here and there in your code, AVX2 and above might actually end up slowing you down. I ran into this phenomenon while working on my game engine. That's a tremendous amount of silicon and performance potential wasted.
I know a little bit about processor technology and have read up on the x86 and ARM architectures, but I still do not really understand what makes ARM so superior to Intel or x86 technology in general that has people believing the new ARM Macs will be much better and faster than Intel-based Macs.
I understand that the advantages of ARM are power efficiency and the ability to have many more cores, but isn't Intel still better in raw power in multi-threaded operations?
Will ARM at first be a replacement for Intel's mobile processors, which are arguably already worse in many ways than an A12Z or A13, or will Apple also be able to create a processor that can beat i9 and even Xeon processors?
Can we really expect a "night and day" difference?
By the way - just yesterday, it was announced that the fastest supercomputer in the world is now ARM based. It uses ARM processors made by Fujitsu: https://www.arm.com/company/news/2020/06/powering-the-fastest-supercomputer
Fits perfectly with Apple's announcement.
Performance per watt is huge. They'll be able to produce a computer that's twice as powerful while consuming a fraction of the power of your average Intel processor.
Bigger than performance though is a roadmap that's actually going somewhere. Intel's 7th through 10th generation processors are just retreads of its 6th generation processors from 2016. Apple's Silicon, on the other hand, has advanced tons since the A9, A9X, and A10 processors of 2016.
AWS does claim higher efficiencies for its ARM-based Graviton2 in the server space vs. Xeon designs, but here they're comparing 2.5 GHz Graviton2's with 2.9-3.2 GHz Xeons, and you'd expect higher-clocked chips to be less efficient, so this is not quite an apples-to-apples comparison.
Hi!
Something to remember is that Apple Silicon is only "ARM" in that it uses the ISA (with a lot of Apple add-ons). The microarchitecture is 100% Apple designed, and microarchitecture is a FAR more important determinant of performance. Apple Silicon is already more performant than anything else out there on a per-core basis. This is actually covered in other posts in threads on this site - Apple Silicon cores are much more closely related to Intel Conroe than Cortex or Neoverse. They are big, wide, short pipes with super accurate branch prediction and VERY advanced memory management and cache design.