Intel Alder Lake vs. Apple M1

JimmyjamesEU · Mar 8, 2022

mi7chy said:
Same also applies to CPU to utilize 20 threads.

No it doesn’t.

JimmyjamesEU · Mar 8, 2022

mi7chy said:
All the news sources I follow don't include Geekbench garbage so just a habit seeing lesser garbage Cinebench R23. Actually prefer the industry standardize on Blender.

Probably more to do with you following terrible news sources than anything to do with geekbench. After all you tried to pass off hashcat as a valid metric, and also a video where someone excluded hardware acceleration on a Mac vs the pc with hardware acceleration.

oz_rkie · Mar 8, 2022

MayaUser said:
remember how much the mac pro costs....with afterburner and after look at the mac studio
Amen

Well, I come from the Windows world and only recently bought into macs (have 2 m1s at the moment for work and personal use), so I was mainly comparing to the cost of building a high end workstation where I can pick a high end thread-ripper and an entry level GPU if I'd like, to suit my use case. Would be amazing if I had similar options on the mac side as well (powerful cpu and entry level gpu).

Mac Pro was a ridiculously priced machine anyway, I mean apart from workplaces/studios that had a very specific need to buy a macOS based machine with those specs, no one in their right mind would buy it unless money was literally not a concern for them. I would put the Mac pro very much in the professional space in terms of pricing where as the just announced mac studio would fall into a prosumer space (atleast imo).

januarydrive7 · Mar 8, 2022

oz_rkie said:
Mac Pro was a ridiculously priced machine anyway, I mean apart from workplaces/studios that had a very specific need to buy a macOS based machine with those specs, no one in their right mind would buy it unless money was literally not a concern for them. I would put the Mac pro very much in the professional space in terms of pricing where as the just announced mac studio would fall into a prosumer space (atleast imo).

On the contrary, those who purchase these high-prices machines do so specifically because money is a concern for them -- they make a lot more money with better machines, it makes perfect sense to spend the money to make more.

Gnattu · Mar 8, 2022

Rigby said:
There is still no evidence that this causes the rather big discrepancy. Apple is active in the Embree project, so nothing prevents them from "hand-crafting" optimizations for the M1.

Oh, come on, the embree library is open-source, there is no native neon implementation inside it.

Rigby said:
I think it's more likely that Apple's SIMD instructions (which are only half as wide as Intel's AVX 2) are simply not as fast, perhaps because Apple prefered to spend the chip real estate on other things (such as their highly specialized video encoders).

Wrong and wrong. Apple's SIMD is not narrower than Intel's in the hardware, where Intel has 2 AVX2 and Apple as 4 neon 128, each adds up to 512 bit wide SIMD.

oz_rkie · Mar 8, 2022

januarydrive7 said:
On the contrary, those who purchase these high-prices machines do so specifically because money is a concern for them -- they make a lot more money with better machines, it makes perfect sense to spend the money to make more.

I already mentioned that, if you read the part before the other part you highlighted

I mean apart from workplaces/studios that had a very specific need to buy a macOS based machine with those specs

I was responding to MayaUser to say that I would not compare the pricing model of the Mac Pro to the Mac Studio, simply because Mac Pro was a professional device where only the people who were making a serious ROI from the machine would be buying it where as with the mac studio, it is (relatively) cheaper that consumers/prosumers who might not make a real ROI from the device might still want to get it. My point was that in such cases, it is a bit of a shame that if you were looking to buy the highest end CPU mac has to offer, you also NEED to buy the GPU even if you don't need it.

cmaier · Mar 8, 2022

mi7chy said:
Same also applies to CPU to utilize 20 threads.

? What? No, it doesn’t.

januarydrive7 · Mar 8, 2022

oz_rkie said:
I already mentioned that, if you read the part before the other part you highlighted

I mean apart from workplaces/studios that had a very specific need to buy a macOS based machine with those specs

I was responding to MayaUser to say that I would not compare the pricing model of the Mac Pro to the Mac Studio, simply because Mac Pro was a professional device where only the people who were making a serious ROI from the machine would be buying it where as with the mac studio, it is (relatively) cheaper that consumers/prosumers who might not make a real ROI from the device might still want to get it. My point was that in such cases, it is a bit of a shame that if you were looking to buy the highest end CPU mac has to offer, you also NEED to buy the GPU even if you don't need it.

Good point, although the base M1 does provide the highest end CPU (ST) mac has to offer. If you're getting pro/max/ultra, then presumably you're looking for a machine either with higher end GPU or better MT due to workflows that would benefit your bottom line.

Your point stands, for enthusiasts who want a lot of MT for the sake of a lot of MT.

Gnattu · Mar 8, 2022

Rigby said:
There is still no evidence that this causes the rather big discrepancy.

If you want evidence, let me give you the evidence.

Below is from the patch that implements Apple silicon support. This is only a CMake config and not the implementation, so that it is readable enough for ones without programming background. When building for arm64 on Mac, Embree emulates avx2 on top of neon instead of implementing a native neon code path.

leman · Mar 8, 2022

Rigby said:
Presumably for the same reason that Apple fans simply discount it.

There are good reasons to discount it, or at least not taking it too seriously. The fact that it doesn’t properly take advantage of M1 SIMD hardware is one big red flag.

Rigby said:
There is still no evidence that this causes the rather big discrepancy. Apple is active in the Embree project, so nothing prevents them from "hand-crafting" optimizations for the M1. I think it's more likely that Apple's SIMD instructions (which are only half as wide as Intel's AVX 2) are simply not as fast, perhaps because Apple prefered to spend the chip real estate on other things (such as their highly specialized video encoders).

Apples SIMD units are only half as wide, but Apple has twice as many of them. Apple runs on a lower clock, so on small problem sizes and very data-parallel code Intel will have a slight edge due to higher clocks, but Apple has much larger caches so they will perform better in more complex problems.

The big issue with embree is that M1 path is currently coded using x86 SSE on top of an SSE-to-NEON emulation layer. Apple can do little tweaks here and there, but they are still limited by the overall approach. And rewriting everything in NEON on a project owned by a competitor is an unreasonable waste of resources.

Andropov · Mar 8, 2022

Rigby said:
Apple is active in the Embree project, so nothing prevents them from "hand-crafting" optimizations for the M1.

I have a pile of issues of a certain journal on my desk that I intend to read. I actively read them from time to time. Certainly nothing prevents me from reading all of them. And yet, the pile grows larger every week. I wonder how's that possible.

Gnattu · Mar 8, 2022

leman said:
And rewriting everything in NEON on a project owned by a competitor is an unreasonable waste of resources.

This is not necessarily the case though. Embree is an open-source project licensed under Apache License 2.0, Apple can also use this library if they want, they can even make a folk if Intel really don't want to cooperate. If it has real-world benefit other than make the benchmark numbers look prettier, they can, and in my opinion, should provide an optimized neon implementation. Vendors contribute their code to many math and crypto libraries, where platform dependent code paths are all over the place and the general implementation is usually too slow.

JimmyjamesEU · Mar 8, 2022

Mac13,2 - Geekbench

Benchmark results for a Mac13,2 with an Apple M1 Ultra processor.

browser.geekbench.com

januarydrive7 · Mar 8, 2022

Gnattu said:
This is not necessarily the case though. Embree is an open-source project licensed under Apache License 2.0, Apple can also use this library if they want, they can even make a folk if Intel really don't want to cooperate. If it has real-world benefit other than make the benchmark numbers look prettier, they can, and in my opinion, should provide an optimized neon implementation. Vendors contribute their code to many math and crypto libraries, where platform dependent code paths are all over the place and the general implementation is usually too slow.

The "gotcha" here is that the anti-Apple crowd would then say that Apple is somehow cheating in benchmarks by optimizing for their platform. As usual, it's a lose-lose situation.

Gnattu · Mar 8, 2022

januarydrive7 said:
The "gotcha" here is that the anti-Apple crowd would then say that Apple is somehow cheating in benchmarks by optimizing for their platform. As usual, it's a lose-lose situation.

I don't think "optimization" is "cheating" if the workload can indeed be optimized this way. I have a faster method to solve a math problem and get the right answer, would my math teacher call me cheating? Not at all.

dgdosen · Mar 8, 2022

JimmyjamesEU said:
Mac13,2 - Geekbench

Benchmark results for a Mac13,2 with an Apple M1 Ultra processor.

browser.geekbench.com

If true, all Alder Lake benchmark wars will switch to single core perf

Gnattu · Mar 8, 2022

dgdosen said:
If true, all Alder Lake benchmark wars will switch to single core perf

Which is expected as the Firestorm core design does not allow it to be clocked significantly higher.

januarydrive7 · Mar 8, 2022

Gnattu said:
I don't think "optimization" is "cheating" if the workload can indeed be optimized this way. I have a faster method to solve a math problem and get the right answer, would my math teacher call me cheating? Not at all.

I don't disagree with you, I'm just extrapolating the arguments I've seen in the past; e.g., arguments along the lines of: "M1-based machines are only better at ProRes workflows because they're cheating."

Also, your math teacher would likely be reasonable, unlike a lot of the rhetoric we see here.

Gnattu · Mar 8, 2022

januarydrive7 said:
I don't disagree with you, I'm just extrapolating the arguments I've seen in the past; e.g., arguments along the lines of: "M1-based machines are only better at ProRes workflows because they're cheating."

Also, your math teacher would likely be reasonable, unlike a lot of the rhetoric we see here.

Yes, for all workloads that they perform well, they are cheating. Only the ones they suck at can reflect the real performance. M1 series are simply that bad😆.

Rigby · Mar 8, 2022

Gnattu said:
Oh, come on, the embree library is open-source, there is no native neon implementation inside it.

Embree uses the SSE2NEON macro library, which converts SSE intrinsics to Neon intrinsics at compile time. There is no emulation as long as you don't use wider vectors than 128-bit (which Neon does not support).

Gnattu said:
Wrong and wrong. Apple's SIMD is not narrower than Intel's in the hardware, where Intel has 2 AVX2 and Apple as 4 neon 128, each adds up to 512 bit wide SIMD.

Recent Intel CPUs (including Alder Lake) actually have 3 ports with vector ALUs. But even if what you wrote was accurate, you can't just equate 2x256-wide with 4x128-wide since there are many other factors that determine the SIMD performance (such as load througput).

leman · Mar 8, 2022

Gnattu said:
This is not necessarily the case though. Embree is an open-source project licensed under Apache License 2.0, Apple can also use this library if they want, they can even make a folk if Intel really don't want to cooperate. If it has real-world benefit other than make the benchmark numbers look prettier, they can, and in my opinion, should provide an optimized neon implementation. Vendors contribute their code to many math and crypto libraries, where platform dependent code paths are all over the place and the general implementation is usually too slow.

Forking is out of the question, as nobody would use Apples fork. And while I am sure that Apple could submit a NEON backend to the official emreee… what would be the point, really? Apple already has an official ray tracing framework on their platforms - Metal. And that’s where their focus is going forward.

Rigby · Mar 8, 2022

Gnattu said:
If you want evidence, let me give you the evidence.

Below is from the patch that implements Apple silicon support. This is only a CMake config and not the implementation, so that it is readable enough for ones without programming background. When building for arm64 on Mac, Embree emulates avx2 on top of neon instead of implementing a native neon code path.
View attachment 1969913

All that shows is that the Neon SIMD instructions are more limited. Obviously AVX2 (256-bit) intrinsics need to be emulated because Neon is limited to 128-bit vectors. But you can also choose to use SSE (128-bit) instructions as target during compilation, in which case they will be converted at compile time and no emulation takes place.

leman · Mar 8, 2022

Rigby said:
Embree uses the SSE2NEON macro library, which converts SSE intrinsics to Neon intrinsics at compile time. There is no emulation as long as you don't use wider vectors than 128-bit (which Neon does not support).

Of course there is emulation. SSE2NEON emulates ISA semantics. Not all things have a one to one mapping, and many algorithms would be developed differently for SSE and NEON to achieve optimal results. For example, my SSE-optimized geometry primitive code heavily relies on PMOVMSKB, but that approach doesn’t work at all for ARM which lacks this instruction family. But NEON has very fast horizontal operations one can use instead.

Rigby said:
Recent Intel CPUs (including Alder Lake) actually have 3 ports with vector ALUs.

Not all of which have the same capabilities.

Rigby · Mar 8, 2022

leman said:
Of course there is emulation. SSE2NEON emulates ISA semantics. Not all things have a one to one mapping, and many algorithms would be developed differently for SSE and NEON to achieve optimal results. For example, my SSE-optimized geometry primitive code heavily relies on PMOVMSKB, but that approach doesn’t work at all for ARM which lacks this instruction family. But NEON has very fast horizontal operations one can use instead.

The vast majority of SSE intrinsics are converted 1:1:

https://github.com/embree/embree/blob/master/common/simd/arm/sse2neon.h

leman said:
Not all of which have the same capabilities.

Do the M1's SIMD units all have the same capabilities?

leman · Mar 8, 2022

Rigby said:
The vast majority of SSE intrinsics are converted 1:1:

https://github.com/embree/embree/blob/master/common/simd/arm/sse2neon.h

Dubious statement at best. Can you quantify how many intrinsic have a one to one correspondence? Because just browsing through things I definitely see a lot of complex mappings. Also, what about things SSE does not have and therefore not use but NEON does?

Rigby said:
Do the M1's SIMD units all have the same capabilities?

Yes, with the exception of division and special function which are limited to a single port. But for mul, add, fma Firestorm has double throughput of any x86 CPU, which compensates the narrower SIMD width. Of course, the situation is complicated by the fact that Intel can do addition and multiplication on different ports, so you can do two 256bit adds and one 256bit mul or via versa in parallel under certain circumstances, and there are differences in clocks, latencies and cache performance, but yeah… in general, I would expect that on a straightforward, well optimized SIMD workload (AVX2 vs. NEON) the difference in performance will more or less mirror the difference in clock rate, as throughout per clock is very similar between Firestorm and modern x86 architectures.

Intel Alder Lake vs. Apple M1

Suspended

Suspended

macrumors regular

macrumors 6502a

macrumors 65816

macrumors regular

Suspended

macrumors 6502a

macrumors 65816

macrumors Core

macrumors 6502a

macrumors 65816

Suspended

macrumors 6502a

macrumors 65816

macrumors 68030

macrumors 65816

macrumors 6502a

macrumors 65816

macrumors 603

macrumors Core

macrumors 603

macrumors Core

macrumors 603

macrumors Core

Our Staff