Intel Alder Lake vs. Apple M1

JMacHack · Mar 8, 2022

Gnattu said:
Why do you like Cinebench so much? The rendering library Cinebench uses is called Embree, a library developed by Intel that only has a native AVX implementation. To use it on arm64, you have to use an AVX-to-NEON translation layer, which hurts performance and results in a lower score. The "arm64 native" build is not native at the same level as on x86, which has hand-crafted SIMD optimizations.

Because it’s the place where the M1 series typically scores the lowest and he’d get gaped like ****** if he used any other benchmarks.

crazy dave · Mar 8, 2022

JouniS said:
I expect something like $6k for a 10-core Mac Pro, $10k for a 20-core model, and $15k for a 40-core model, assuming reasonable RAM/SSD options and including a decent monitor and other peripherals. The entry-level model will probably have an M1 Max or equivalent, just like the current entry-level model is comparable to a much cheaper iMac.

I kept trying to tell you that they would offer a Mac with a high core count CPU for significantly less than you were assuming.

Now a full Apple Silicon Mac Pro, whenever it gets here, will probably be more expensive than the Studio is - naturally, but Apple did introduce a desktop with lots of CPU cores for significantly less than your prediction.

As for my own miscalculations: I felt sure that the new mini-Pro, the Studio, would still have one slot of internal expansion, even if it was reduced to a single x8 PCIe slot. I also thought they might offer a binned 16-core CPU between the Max and the Ultra for less than $4000. And I have to admit the base 20-core Ultra is slightly cheaper than even I expected, probably because it lacks some of the internals I thought it would have.

jeanlain · Mar 8, 2022

mi7chy said:
All the news sources I follow don't include Geekbench garbage so just a habit seeing lesser garbage Cinebench R23. Actually prefer the industry standardize on Blender. What dinosaur still renders with CPU instead of GPU?

I'd like to know why Geekbench is garbage.
As opposed to Cinebench, it's designed to be cross-platform (while Cinebench is more optimised for x86) and it comprises a large set of algorithms, while Cinebench just does one thing: ray tracing. Cinebench is not more "real life" than Geekbench, and in fact is much less representative of overall performance.

The only argument against Geekbench appears to be that... its runtime is too short. Which is probably better for x86 CPUs as they overheat more. (For longer tests, there's Geekbench Pro.)
I've heard claims that Geekbench favours OSX over Windows. It doesn't. I've checked myself.

Rigby · Mar 8, 2022

leman said:
Dubious statement at best. Can you quantify how many intrinsics have a one-to-one correspondence? Because just browsing through things I definitely see a lot of complex mappings.

LOL. Do you want me to count them? I don't see any "complex mappings". Most of the lines in the source are just type casts which don't produce any code.

leman said:
Also, what about things SSE does not have and therefore does not use, but NEON does?

Such as? I have done quite a bit of optimization using various versions of SSE and AVX in my time, but don't have any experience in ARM coding on that level.

Anyway, Alder Lake is more than twice as fast as the M1 Max in Cinebench multi-core. It would require some pretty sensational intrinsics in NEON to achieve that much of a boost. If that were possible, I suspect Apple would have merged that into Embree long ago, given that it is used by some very popular projects such as Blender. But on the Embree GitHub page you can see them contributing optimizations that gain something like 8% ...

leman said:
I would expect that on a straightforward, well optimized SIMD workload (AVX2 vs. NEON) the difference in performance will more or less mirror the difference in clock rate, as throughput per clock is very similar between Firestorm and modern x86 architectures.

I doubt that very much. There was a significant performance improvement from AVX to AVX2, the only big difference being the longer vectors (well, and FMA for FP calculations). There are a lot of factors playing into SIMD performance besides just the execution units. For example, Alder Lake can load two 256-bit vectors per cycle from L1D (strictly speaking even two 512-bit vectors, but the consumer Alder Lakes can't make use of that since AVX512 is disabled). If you have shorter vectors, the throughput will obviously be that much lower, everything else being equal. Up to some point there is no replacement for displacement.
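To put rough numbers on the load-bandwidth point, here is a minimal sketch. The helper name `l1d_load_bytes_per_cycle` is made up for illustration, and the 128-bit case simply keeps the same two loads per cycle to model "everything else being equal"; it is not a measured figure for any particular core.

```c
#include <assert.h>

/* Peak L1D load bandwidth in bytes per cycle for a core that can issue
 * `loads_per_cycle` vector loads of `vector_bits` bits each.
 * Hypothetical helper, for illustration only. */
unsigned l1d_load_bytes_per_cycle(unsigned loads_per_cycle,
                                  unsigned vector_bits)
{
    assert(vector_bits % 8u == 0u); /* vector widths are whole bytes */
    return loads_per_cycle * (vector_bits / 8u);
}
```

Under these assumptions, `l1d_load_bytes_per_cycle(2, 256)` gives 64 bytes per cycle and `l1d_load_bytes_per_cycle(2, 128)` gives 32, so halving the vector width halves the peak load bandwidth if nothing else changes.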

On the upcoming workstation Alder Lakes the difference may be even starker, since they have AVX512 enabled.

Gnattu · Mar 8, 2022

Rigby said:
Alder Lake is more than twice as fast as the M1 Max in Cinebench multi-core. It would require some pretty sensational intrinsics in NEON to achieve that much of a boost.

I recently worked on a project of mine where the crypto backend got more than twice as fast by using a native NEON implementation of ChaCha20, and that only happens on the M1 (an A72 got 50% faster). Embree's situation is more complex, and it is hard to say how far it could go. By the way, the 8+8 Alder Lake has more E-cores, so I think comparing single-core performance could be a better metric.

januarydrive7 · Mar 8, 2022

Gnattu said:
I recently worked on a project of mine where the crypto backend got more than twice as fast by using a native NEON implementation of ChaCha20, and that only happens on the M1 (an A72 got 50% faster). Embree's situation is more complex, and it is hard to say how far it could go. By the way, the 8+8 Alder Lake has more E-cores, so I think comparing single-core performance could be a better metric.

The crypto section of the sse2neon header is particularly nasty, from a brief glance, with comments even mentioning that the implementations are known to be slow.

crazy dave · Mar 8, 2022

Rigby said:
LOL. Do you want me to count them? I don't see any "complex mappings". Most of the lines in the source are just type casts which don't produce any code.

Such as? I have done quite a bit of optimization using various versions of SSE and AVX in my time, but don't have any experience in ARM coding on that level.

Anyway, Alder Lake is more than twice as fast as the M1 Max in Cinebench multi-core. It would require some pretty sensational intrinsics in NEON to achieve that much of a boost. If that were possible, I suspect Apple would have merged that into Embree long ago, given that it is used by some very popular projects such as Blender. But on the Embree GitHub page you can see them contributing optimizations that gain something like 8% ...

I doubt that very much. There was a significant performance improvement from AVX to AVX2, the only big difference being the longer vectors (well, and FMA for FP calculations). There are a lot of factors playing into SIMD performance besides just the execution units. For example, Alder Lake can load two 256-bit vectors per cycle from L1D (strictly speaking even two 512-bit vectors, but the consumer Alder Lakes can't make use of that since AVX512 is disabled). If you have shorter vectors, the throughput will obviously be that much lower, everything else being equal. Up to some point there is no replacement for displacement. On the upcoming workstation Alder Lakes the difference may be even starker, since they have AVX512 enabled.

Personally I actually don't think the problems with CB23 stop at Intel Embree. That's part of it, but not the whole story. Other projects use Intel Embree and don't see the poor utilization of Apple cores that CB23 does. Other renderers with CPU ray tracing don't either; the problem is specific to CB23. Again, for what it's worth, Andrei has said he's talked to the developers and they had a discussion about why CB23 so underutilizes the M1 hardware (and actually non-Intel hardware seems to be an issue). What those discussions were, and why, remain between them. It's a closed-source program and those were private discussions, so we may never know.

But the problem is unique to CB23, which makes it almost guaranteed to be a coding or compiling issue on their end, given that we can see the power utilization for CB23 on M1 cores relative to Blender or POV-Ray. We've had this discussion before; I'm not sure why you're still disputing this when the numbers are up in black and white. We may not know the full picture of what's going on beneath the hood in CB23, but something is not right, and the problem lies with CB23 itself. We know Apple's CB23 raw scores are an outlier, and we know that CB23 weirdly doesn't draw as many watts as other programs, or as much as it does on Intel machines, which pushes Apple's perf/W scores *higher* relative to Intel.

Gerdi · Mar 8, 2022

leman said:
Dubious statement at best. Can you quantify how many intrinsics have a one-to-one correspondence? Because just browsing through things I definitely see a lot of complex mappings.

I think that "dubious" is the wrong word. Just browsing quickly, you see almost no instruction with a 1:1 mapping. 1:2 is very common, and there are a few very expensive operations.
Here is just one of the many "highlights", a simple blend operation:

#define _mm_blend_epi16(a, b, imm) \
    __extension__({ \
        const uint16_t _mask[8] = {((imm) & (1 << 0)) ? 0xFFFF : 0x0000, \
                                   ((imm) & (1 << 1)) ? 0xFFFF : 0x0000, \
                                   ((imm) & (1 << 2)) ? 0xFFFF : 0x0000, \
                                   ((imm) & (1 << 3)) ? 0xFFFF : 0x0000, \
                                   ((imm) & (1 << 4)) ? 0xFFFF : 0x0000, \
                                   ((imm) & (1 << 5)) ? 0xFFFF : 0x0000, \
                                   ((imm) & (1 << 6)) ? 0xFFFF : 0x0000, \
                                   ((imm) & (1 << 7)) ? 0xFFFF : 0x0000}; \
        uint16x8_t _mask_vec = vld1q_u16(_mask); \
        uint16x8_t _a = vreinterpretq_u16_m128i(a); \
        uint16x8_t _b = vreinterpretq_u16_m128i(b); \
        vreinterpretq_m128i_u16(vbslq_u16(_mask_vec, _b, _a)); \
    })

And that's just because the immediate is in a different format than you would usually employ when using NEON. In one of my previous posts I already pointed out the big issues caused by the different constant formats between SSE and NEON, and this is a perfect example.
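For readers who don't work with NEON intrinsics, the following is a scalar sketch of the two steps that macro performs. It is plain C written for illustration, not the sse2neon source; the names `expand_blend_imm` and `blend_u16x8` are invented here.

```c
#include <assert.h>
#include <stdint.h>

/* Step 1: expand an SSE-style 8-bit blend immediate into eight 16-bit
 * lane masks, the form a NEON bit-select such as vbslq_u16 expects.
 * Bit i of imm set means "lane i comes from b". */
void expand_blend_imm(uint8_t imm, uint16_t mask[8])
{
    assert(mask != 0);
    for (int lane = 0; lane < 8; ++lane)
        mask[lane] = (imm & (1u << lane)) ? 0xFFFFu : 0x0000u;
}

/* Step 2: scalar model of a bit-select over eight 16-bit lanes:
 * take bits from b where the mask is set, from a elsewhere. */
void blend_u16x8(const uint16_t a[8], const uint16_t b[8],
                 const uint16_t mask[8], uint16_t out[8])
{
    for (int lane = 0; lane < 8; ++lane)
        out[lane] = (uint16_t)((b[lane] & mask[lane]) |
                               (a[lane] & (uint16_t)~mask[lane]));
}
```

Code written against NEON from the start would presumably hold the lane mask in the second form already, so only step 2 would remain. Whether a compiler can fold step 1 away in the translated version depends on the immediate being a compile-time constant and on the optimizer, which this sketch does not try to show.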

oz_rkie · Mar 8, 2022

leman said:
Yeah, it’s a shame that all the folks known for in-depth reviews have quit. We are left with barely competent folks like Linus Tech Tips and barely incompetent folks like Max Tech 😑 Oh well…

There is AnandTech, which still does pretty good in-depth reviews, and they cover Macs as well. There's an initial deep-dive commentary on today's launch by Dr. Ian Cutress

Gerdi · Mar 8, 2022

crazy dave said:
Personally I actually don't think the problems with CB23 stop at Intel Embree. That's part of it, but not the whole story. Other projects use Intel Embree and don't see the poor utilization of Apple cores that CB23 does.

I cannot speak for the M1, but I do see the same bad relative performance on my Surface Pro X wherever Embree is used. This includes, for instance, Blender. Besides, POV-Ray does not use Embree. The issue with POV-Ray is that it lacks a reasonably good NEON implementation at all (while it does have an AVX/SSE code path).

crazy dave · Mar 8, 2022

oz_rkie said:
There is AnandTech, which still does pretty good in-depth reviews, and they cover Macs as well. There's an initial deep-dive commentary on today's launch by Dr. Ian Cutress

Sadly he’s not at Anandtech anymore

Andrei left as well

oz_rkie · Mar 8, 2022

crazy dave said:
Sadly he’s not at Anandtech anymore

Andrei left as well

Are you sure? I see an article as recent as 18th Feb by him on AnandTech - https://www.anandtech.com/show/17243/anandtech-interview-with-dr-ann-kelleher

A bit of a shame for Anandtech if he's really left though.

In any case the above video by him is a pretty good watch.

crazy dave · Mar 8, 2022

oz_rkie said:
Are you sure? I see an article as recent as 18th Feb by him on AnandTech - https://www.anandtech.com/show/17243/anandtech-interview-with-dr-ann-kelleher

A bit of a shame for Anandtech if he's really left though.

In any case the above video by him is a pretty good watch.

Yup

From There to Here, and Beyond

www.anandtech.com

crazy dave · Mar 8, 2022

Gerdi said:
I cannot speak for the M1, but I do see the same bad relative performance on my Surface Pro X wherever Embree is used. This includes, for instance, Blender. Besides, POV-Ray does not use Embree. The issue with POV-Ray is that it lacks a reasonably good NEON implementation at all (while it does have an AVX/SSE code path).

Yeah I think Embree is part of it but I saw another (toy) project using Embree that didn’t suffer as badly as CB23 on the M1 vs Intel but the post writer emphasized that they had a much simpler ray tracing framework than most professional renderers.

Gerdi · Mar 8, 2022

crazy dave said:
Yeah I think Embree is part of it but I saw another (toy) project using Embree that didn’t suffer as badly as CB23 on the M1 vs Intel but the post writer emphasized that they had a much simpler ray tracing framework than most professional renderers.

Surely the instruction distribution plays a role. I did update my previous post with a particular example of a simple blend operation you might want to take a look at.
And there are many such examples, which can be explained simply by the different immediate formats between SSE and NEON.
The thing is, the caller (from C++ code) provides an SSE-compliant immediate and the emulation converts it into a NEON-compliant immediate. If the caller called NEON directly, it would surely pass a NEON-compliant immediate.

JimmyjamesEU · Mar 8, 2022

48-core Geekbench OpenCL result: 77000. Pretty disappointing, but not that surprising given how AnandTech mentioned the test doesn’t run long enough to wake up all the GPU cores.

Mac13,2 - Geekbench

Benchmark results for a Mac13,2 with an Apple M1 Ultra processor.

browser.geekbench.com

crazy dave · Mar 8, 2022

Gerdi said:
Surely the instruction distribution plays a role. I did update my previous post with a particular example of a simple blend operation you might want to take a look at.
And there are many such examples, which can be explained simply by the different immediate formats between SSE and NEON.

Absolutely. Again, I think Embree is definitely part of it - I just suspect CB23’s problems extend to more than just the ray tracing portions of their code. Blender’s BMW test uses Intel Embree too and the M1/Pro/Max CPU does okay on that test (though Apple’s engineers note its performance can still be improved - as of 4 days ago they submitted a pull request to Embree that they say resolves the outstanding issues to get an 8% performance increase: https://github.com/embree/embree/pull/330). It’s a suspicion, I have no proof.

Gerdi · Mar 8, 2022

crazy dave said:
Blender’s BMW test uses Intel Embree too and the M1/Pro/Max CPU does okay on that test (though Apple’s engineers note its performance can still be improved - as of 4 days ago they submitted a pull request to Embree that they say resolves the outstanding issues to get an 8% performance increase: https://github.com/embree/embree/pull/330). It’s a suspicion, I have no proof.

If I read this correctly, the original pull request is from mid-2021 and it is still not resolved due to bugs

dmr727 · Mar 8, 2022

crazy dave said:
Yup

From There to Here, and Beyond

www.anandtech.com

Huh - I didn't realize that. Bummer.

crazy dave · Mar 8, 2022

Gerdi said:
If I read this correctly, the original pull request is from mid-2021 and it is still not resolved due to bugs

In the final comment from 4 days ago, Apple submitted a new patch that they say fixes the bugs, so hopefully, if true, it’ll be accepted soon

JouniS · Mar 8, 2022

crazy dave said:
I kept trying to tell you that they would offer a Mac with a high core count CPU for significantly less than you were assuming. Now a full Apple Silicon Mac Pro, whenever it gets here, will probably be more expensive than the Studio is - naturally, but Apple did introduce a desktop with lots of CPU cores for significantly less than your prediction.

It's certainly cheaper than I expected, partly because we got a consumer desktop instead of a workstation. And because it's also the replacement for the normal-sized iMac.

A base model (M1 Max / 32 GB / 2 TB), one Studio Display, and basic peripherals is about $4500. The same system with M1 Ultra and 64 GB RAM is ~$6500. That's roughly $1100 and $800 per M1 equivalent, which is at the lower end of what Apple is losing by not being able to sell more M1 devices. I used 2 TB as the minimum reasonable SSD size, because external connectivity is limited and you would need 2 Thunderbolt ports to achieve similar performance with an external drive.
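The per-M1 figures can be reproduced with a small sketch. The equivalence counts used here (M1 Max = 4 M1 equivalents, M1 Ultra = 8) are an assumption inferred from the rounded numbers in the post, not something the post states, and `price_per_m1_equivalent` is a hypothetical helper name.

```c
#include <assert.h>

/* System price divided by the number of "M1 equivalents" in the SoC.
 * The caller supplies the equivalence count; 4 for the M1 Max and 8 for
 * the M1 Ultra are assumptions, not figures given in the thread. */
double price_per_m1_equivalent(double system_price_usd, int m1_equivalents)
{
    assert(m1_equivalents > 0);
    return system_price_usd / m1_equivalents;
}
```

With those assumptions, `price_per_m1_equivalent(4500.0, 4)` is 1125 and `price_per_m1_equivalent(6500.0, 8)` is 812.5, which matches the "roughly $1100 and $800" quoted above.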

The prices for GPU upgrades are interesting, because you pay them only to prevent Apple from crippling your GPU deliberately. The upgrade from 24 cores to 32 cores is reasonable at $200, while the comparable upgrade from 48 cores to 64 cores costs you $1000.

I'm kind of disappointed with the Mac Studio, but I've also been expecting this from the moment I first saw the specs of the M1. It's a nice machine on paper, but it's also more expensive than I'm willing to pay and less capable than I need in the aspects I care about. The configuration I would choose costs $5800 without a monitor or peripherals, and you don't even get 256 GB RAM for that price.

ArkSingularity · Mar 8, 2022

JouniS said:
It's certainly cheaper than I expected, partly because we got a consumer desktop instead of a workstation. And because it's also the replacement for the normal-sized iMac.

A base model (M1 Max / 32 GB / 2 TB), one Studio Display, and basic peripherals is about $4500. The same system with M1 Ultra and 64 GB RAM is ~$6500. That's roughly $1100 and $800 per M1 equivalent, which is at the lower end of what Apple is losing by not being able to sell more M1 devices. I used 2 TB as the minimum reasonable SSD size, because external connectivity is limited and you would need 2 Thunderbolt ports to achieve similar performance with an external drive.

The prices for GPU upgrades are interesting, because you pay them only to prevent Apple from crippling your GPU deliberately. The upgrade from 24 cores to 32 cores is reasonable at $200, while the comparable upgrade from 48 cores to 64 cores costs you $1000.

I'm kind of disappointed with the Mac Studio, but I've also been expecting this from the moment I first saw the specs of the M1. It's a nice machine on paper, but it's also more expensive than I'm willing to pay and less capable than I need in the aspects I care about. The configuration I would choose costs $5800 without a monitor or peripherals, and you don't even get 256 GB RAM for that price.

The very highest end from Apple almost always costs a premium. They'll give you something mid-high tier at a very reasonable price though. The 48 GPU configuration is already a smoke show in terms of performance.

The CPU binning is understandable, but I have a feeling 56 or 60 GPU cores would have been very achievable. The marketing department probably took over on this one and said "no, 48 is good" - and now getting the full 64 costs a premium.

Overall, the binned versions are still one heck of a bargain though. You can't find this kind of performance at this price point in the x86 world.

huge_apple_fangirl · Mar 8, 2022

JouniS said:
The prices for GPU upgrades are interesting, because you pay them only to prevent Apple from crippling your GPU deliberately. The upgrade from 24 cores to 32 cores is reasonable at $200, while the comparable upgrade from 48 cores to 64 cores costs you $1000.

How are they crippling the GPU deliberately? It's called binning.

JouniS · Mar 8, 2022

huge_apple_fangirl said:
How are they crippling the GPU deliberately? It's called binning.

Binning is usually done by deliberately crippling perfectly functional chips. You can't produce lower-end chips in sufficient quantities if you only wait for dies that have flaws in the right parts. Especially not with chips such as the M1 Max, where GPU cores only take a small fraction of total die area.

crazy dave · Mar 8, 2022

JouniS said:
It's certainly cheaper than I expected, partly because we got a consumer desktop instead of a workstation. And because it's also the replacement for the normal-sized iMac.

A base model (M1 Max / 32 GB / 2 TB), one Studio Display, and basic peripherals is about $4500. The same system with M1 Ultra and 64 GB RAM is ~$6500. That's roughly $1100 and $800 per M1 equivalent, which is at the lower end of what Apple is losing by not being able to sell more M1 devices. I used 2 TB as the minimum reasonable SSD size, because external connectivity is limited and you would need 2 Thunderbolt ports to achieve similar performance with an external drive.

The prices for GPU upgrades are interesting, because you pay them only to prevent Apple from crippling your GPU deliberately. The upgrade from 24 cores to 32 cores is reasonable at $200, while the comparable upgrade from 48 cores to 64 cores costs you $1000.

I'm kind of disappointed with the Mac Studio, but I've also been expecting this from the moment I first saw the specs of the M1. It's a nice machine on paper, but it's also more expensive than I'm willing to pay and less capable than I need in the aspects I care about. The configuration I would choose costs $5800 without a monitor or peripherals, and you don't even get 256 GB RAM for that price.

True, though you don't have to get a Studio Display if that's overkill - there are plenty of other displays with decent specs, depending on what you want. I agree that the price increase from 48 to 64 cores is high, but I also have to admit that the base 20-core price is lower than even I thought it would be. So the absolute price of the 64-core is still roughly what I expected, and this is pretty much the machine I thought they were going to make, though, as mentioned, I did think there would be a small PCIe expansion slot or M.2 slot.

Just curious, did you want more GPU with less CPU? My guess is that kind of modularity won't come until the M2 or, more likely, the M3, based on the rumors, when a greater proportion of the SoCs are actually made from chiplets that can be put together like Legos. Then there might be more variations in the SoC product stack.
