
neinjohn

macrumors regular
Nov 9, 2020
107
70
Reading the AnandTech article, one doubt crept in. They compare the E-cores (with DDR5) to the old Skylake Core i7-6700K, and performance is in about the same ballpark with roughly a 50% power reduction.

Now imagine a hypothetical Alder Lake SKU with 2 P-cores and 6/8 E-cores dropped directly into a system with DDR4 and whatever cooling solution was adequate for the Skylake chip. You would just swap the CPUs, keeping everything else equal. How much higher would multi-threaded performance be than the Skylake chip without DDR5, and would the cooler struggle with the default P-core turbo behaviour on single-threaded tasks? Or is a P-core more or less the same as a Skylake core on wattage?
 

crazy dave

macrumors 65816
Sep 9, 2010
1,453
1,229
Embree is a set of open source libraries, not a compiler. And I doubt that the ARM port is maintained by Intel.

Only if CB uses Embree and it outputs ARM and x86 binaries?


The Embree open source library, including the non-Intel code paths, is in fact maintained by Intel. They are working on improving performance on Apple’s cores in AVX2-equivalent workloads. There’s an open pull request from what looks like Apple’s team, and they are discussing the best way forward. So far there’s a proposal to speed it up by 8%, but that broke a bunch of other stuff, and the proposer said they had found other issues that need addressing wrt performance but didn’t specify.

However, this may not be the only issue, as Embree performance on some simple ray tracing scenes is about the same on the M1 as on Intel:


He stresses that complicated ray tracing geometries may perform differently, so he’s not testing Embree’s performance on the M1 across the full range of possible scenes (which is probably why Apple was able to get speed-ups in AVX2-equivalent workloads). And of course a production renderer has other facets that may speed up or slow down depending on how they’re implemented. None of the renderers, including his own, seems to behave like CB though. It is an outlier among renderers.
 

leman

macrumors Core
Original poster
Oct 14, 2008
19,521
19,678
Maybe. But it could also be an intrinsic weakness of the M1 SIMD instructions compared to AVX2. For example, Neon is only 128-bit, while AVX2 is 256.

The M1 can do 4x 128-bit operations per cycle; x86 implementations can do 2x 256-bit operations per cycle (most of the time at least). Total throughput is comparable in the end (x86 CPUs might have a slight advantage on long SIMD sequences because they run at higher clocks), but Apple's implementation is more flexible, as it usually allows more data to be processed on average. This is also illustrated by how well the M1 does in various fp-focused benchmarks (in SPEC2017fp the M1 Pro/Max are as fast as the full i9-12900K).

AVX2 can be better in some throughput-oriented streaming-style tasks where you know that you will be doing a lot of sequential SIMD processing. Raytracing is not one of them though.
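As a quick sanity check, the peak-lane arithmetic behind that comparison can be sketched like this (the unit counts are the ones from the post, and real sustained throughput depends on clocks, ports and instruction mix):

```python
# Peak fp32 SIMD lanes per cycle for the two designs discussed above.
# Unit counts from the post: 4x 128-bit NEON pipes vs 2x 256-bit AVX2 pipes.

def fp32_lanes_per_cycle(units: int, vector_bits: int) -> int:
    """32-bit float lanes a core can process per cycle at peak."""
    return units * (vector_bits // 32)

m1_neon = fp32_lanes_per_cycle(units=4, vector_bits=128)   # M1: four 128-bit pipes
x86_avx2 = fp32_lanes_per_cycle(units=2, vector_bits=256)  # x86: two 256-bit pipes

print(m1_neon, x86_avx2)  # 16 16 -- identical peak lane count per cycle
```

Same peak lanes per cycle either way, which is why the difference shows up in flexibility and clocks rather than raw width.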
 

crazy dave

macrumors 65816
Sep 9, 2010
1,453
1,229
The M1 can do 4x 128-bit operations per cycle; x86 implementations can do 2x 256-bit operations per cycle (most of the time at least). Total throughput is comparable in the end (x86 CPUs might have a slight advantage on long SIMD sequences because they run at higher clocks), but Apple's implementation is more flexible, as it usually allows more data to be processed on average. This is also illustrated by how well the M1 does in various fp-focused benchmarks (in SPEC2017fp the M1 Pro/Max are as fast as the full i9-12900K).

AVX2 can be better in some throughput-oriented streaming-style tasks where you know that you will be doing a lot of sequential SIMD processing. Raytracing is not one of them though.

Funnily enough Apple’s pull request on Embree is to treat their 4-wide as 8-wide AVX2 which Intel assumed would be slower. Intel now agrees that Apple’s solution is faster but unfortunately it broke a lot of code paths, including… ray tracing. So maybe less clear on its effect on performance there? Anyway Apple says they found other issues and that they’d have to work on it some more. That was the last post.
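The "treat 4-wide as 8-wide" idea can be illustrated with a toy sketch (Python, hypothetical helpers; this is nothing like Embree's actual SIMD layer, just the double-pumping concept):

```python
# Toy illustration (not Embree's actual code): an "8-wide" SIMD operation
# emulated as two 4-wide halves, the way a 4-wide NEON machine can run
# an 8-wide AVX2-style code path.

def add4(a, b):
    """Stands in for one native 4-wide vector add."""
    return [x + y for x, y in zip(a, b)]

def add8_double_pumped(a, b):
    """An 8-wide add built from two 4-wide ops ("double pumping")."""
    return add4(a[:4], b[:4]) + add4(a[4:], b[4:])

print(add8_double_pumped(list(range(8)), [1] * 8))
# [1, 2, 3, 4, 5, 6, 7, 8]
```

The interesting part of Apple's proposal is that issuing the two halves back-to-back can keep more of the M1's four SIMD pipes busy than a straight 4-wide code path does.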


It should be noted that, despite the performance issues identified above, other renderers and ray tracers use Embree too, and of those, the worst performer on the M1 is still CB for some reason.
 

crazy dave

macrumors 65816
Sep 9, 2010
1,453
1,229
This conversation got balls deep into bench discussion, I’m gonna wait until people have alder lake laptops to make any opinions. I think this is the best course of action.

Yeah we’re pretty deep in the weeds on this one. 🙃
 

aevan

macrumors 601
Feb 5, 2015
4,539
7,236
Serbia
What I’m interested in is why some people are so hoping that Alder Lake is better. Apple is not going back to Intel CPUs, ever - so if you want to work on a Mac, you won’t benefit from anything Intel does. And if you are a PC user, why are you on a Mac forum? I get that competition is good, so I don’t mind if Intel or AMD catch up - but some of you seem to be actively cheering for them, which - again - makes me wonder what you are doing on a Mac enthusiast forum?

With that said, while I doubt Alder Lake will be better in many workflows, even if it is, we won’t see it in laptops before Q1 2022 - and remember, M1 Pro and Max are still based on a year-old chip. So, basically, some 2022 chip may be faster than an Apple chip based on an architecture launched in 2020. This doesn’t sound like “chipzilla is awake” to me.

And we’re yet to see how fast and responsive the system is in practice - let me tell you, I’ve tried a lot of computers over the years, from i9s to 32-core Threadrippers, I’ve never seen such responsive computers as Apple M1 Macs. Just in everyday stuff. There’s more to actual experience than benchmarks - and even in benchmark land, you can find specific ones where one side has the edge and vice versa. We’ll have to wait and see actual products.

But this was never a race. Mac users care about Macs. If you prefer a Mac, you’ll work on a Mac even if the PC has better performance - as was the case before. My guess is that PC users are the ones having an identity crisis, because if they don’t have the performance edge - what do they have? Windows 11?

I am personally glad that new MacBook Pros are much faster than old MacBook Pros, that they barely turn on their fans, and that their battery lasts a long time. If some future Dell or Lenovo is faster, I don’t care - because I‘m not going to get them anyway. I guess it will be good to push Apple to work even harder, though, honestly, Apple has always been in their own bubble anyway.
 

ozthegweat

macrumors regular
Feb 20, 2007
247
229
Switzerland
What I’m interested in is why some people are so hoping that Alder Lake is better. Apple is not going back to Intel CPUs, ever - so if you want to work on a Mac, you won’t benefit from anything Intel does. And if you are a PC user, why are you on a Mac forum? I get that competition is good, so I don’t mind if Intel or AMD catch up - but some of you seem to be actively cheering for them, which - again - makes me wonder what you are doing on a Mac enthusiast forum?
Re: the bolded part: you said it yourself two sentences later – competition is good. Apple will be making better products if they have fierce competition. So as someone who loves his three Macs and who would probably decline any future job offers if working with a Mac wouldn't be possible, I think it's fantastic if Intel produces great products again.

Also, a lot of people are Mac enthusiasts but have no choice but to work with Windows machines. They can be ecstatic that Intel steps up their game, but still love their Macs. The concept of "you're either with us or against us" or, more aptly, "either you're a fanboy or else GTFO" has never benefited anyone.
 

crazy dave

macrumors 65816
Sep 9, 2010
1,453
1,229
I haven't followed the project, but it looks like the M1 support was contributed by an external developer (cbenthin), possibly based on a previous independent ARM port. But good for them.

Cbenthin is also Intel ;)

Developer-Ecosystem-Engineering in the pull request is Apple’s open source support team.
 

crazy dave

macrumors 65816
Sep 9, 2010
1,453
1,229
Oops. I thought it was the other way around. But if Apple itself is active in the project, all the better.

Yeah it looks like they’re getting involved at least as of a month ago.

I should state it might be unfair to put this all on Embree. Other projects that use it don’t seem to suffer the same issues, at least not to that extent. For what it’s worth, Andrei @ AnandTech says he’s been in contact with the CB devs and knows what is going on, but I’m guessing that due to the private nature of the conversations he probably can’t disclose it, since it isn’t an open source program.
 

tomO2013

macrumors member
Feb 11, 2020
67
102
Canada
I have been quietly sitting back and listening to (reading) lots of different opinions and viewpoints, and it‘s really great to hear all of the different perspectives here. :)

If I may, I’d like to add a viewpoint that I hope contributes to the friendly discussion that is taking place :)

The conversation around the ‘lack of optimized software‘ for M1 I think warrants further discussion - particularly around what that actually means.

There is definitely room to grow for some of the larger apps like Adobe, AutoCAD, Maxon Cinema 4D etc. However, there is also a wealth of apps today that have at the very least been ported from x86 (under Rosetta) to ARM64, and in the best cases have been fully optimized for Apple Silicon to use the accelerators and co-processors that only become available when you go through Apple‘s API and compiler stack. I think that last level of optimization is an important distinction: ARM64-optimized versus Apple Silicon-optimized.
Of course, not all problems lend themselves to Apple’s co-processors and accelerators.
Yet, when you consider all the background jobs taking place even when running menial tasks on Windows or macOS, farming off instructions to dedicated, fast, highly energy-efficient co-processors also has the side benefit of freeing up the ALU/FPU for other, more traditional workloads. Everybody wins!

A case in point is utilization of the AMX co-processor in Apple Silicon. Apple’s own native APIs compile to instructions targeting co-processors such as the AMX matrix co-processor, which I believe is still not publicly acknowledged by Apple (likely for ARM licensing reasons) but nevertheless speeds up matrix workloads (as distinct from the Neural Engine that is advertised by Apple).
Here is a really lovely essay from Erik Engheim that presents this far better than I could regurgitate here:
Link to M1, Co-Processors and Accelerators discussion

Apple’s SoC architecture and overall ‘own the entire widget’ approach to design uniquely positions them to take a path towards long-term scalable performance beyond the traditional ‘throw more wattage, increase clock speed, use longer pipelines, shrink to a smaller die, throw more cores’ approach.

Intel on the other hand (at least from a business perspective) doesn’t have the same ‘ease’ with which a similar SoC approach could be taken, because adding co-processors and accelerators to your silicon design means that you also need tighter integration and industry alignment (this one in particular should not be underestimated from a business perspective). Because Intel and AMD need to partner with large vendors such as Microsoft, they need to ensure partnership, agreement and alignment with their silicon vision so that the entire ecosystem (dev tools, operating system, right down to the silicon) is aware of these co-processors/accelerators and can take advantage of them. This takes time. Getting alignment in a single organization is challenging; to do so across companies is incredibly difficult. In that sense, one could argue that this is more a business ‘people’ problem than a technical problem.

Apple can not only drive more efficient (and more powerful designs per wattage) from their approach, but can unilaterally dictate the rollout timeframes (at least to native 1st and 3rd party software) for solutions. They still need to convince 3rd party developers to develop for what is effectively a niche Mac platform as judged by market share. However at least today when a developer builds for Mac, he/she/they are also in a position to port to iPad or iPhone where Apple commands a sizeable market share and in turn a larger revenue stream worth pursuing.

That being said, Apple is making big efforts to contribute to open source projects in order to drive Apple Silicon optimization where possible.
Regarding Cinebench and Maxon Cinema 4D, I fully expect to see significant performance improvements as and when Cinema 4D optimizes more for the Apple Silicon stack.
These numbers that we are seeing today are IMHO a worst-case scenario for the M1, M1 Pro and M1 Max - and yet we are comparing a laptop chip (very favourably on raw performance) with the absolute latest and greatest desktop/workstation-class Core i9 chip.

Finally, I’m not sure if anybody checked out Apple’s videos on ray tracing and ray tracing acceleration during WWDC this year - but there is some nice documentation on how to accelerate ray tracing on Apple Silicon: https://developer.apple.com/documentation/metal/accelerating_ray_tracing_using_metal/
Obviously the level of precision may not be sufficient for some of the fine folks here, where a fallback to more traditional CPU core execution would be required. Nevertheless, Apple had a lovely demo during WWDC on accelerating ray tracing and how to optimize for a TBDR versus an immediate-mode renderer. Again, I expect to see optimization improvements in 3rd party renderers over time :)

Thanks for humouring a long diatribe! Hope everybody is having a really great Sunday and enjoying their MacBooks and Alder Lakes.
 

Colstan

macrumors 6502
Jul 30, 2020
330
711
What I’m interested in is why some people are so hoping that Alder Lake is better. Apple is not going back to Intel CPUs, ever - so if you want to work on a Mac, you won’t benefit from anything Intel does. And if you are a PC user, why are you on a Mac forum? I get that competition is good, so I don’t mind if Intel or AMD catch up - but some of you seem to be actively cheering for them, which - again - makes me wonder what you are doing on a Mac enthusiast forum?
I fully understand this sentiment; to a certain extent, I share it. I personally have an interest in Alder Lake because I am interested in the latest processor technology. It's the same reason I follow what AMD and Nvidia are up to. Also, some of it is simply out of habit. It's been well over a decade since I was a Windows user and built my own PCs, but Apple still used a lot of PC parts. Even though Apple now uses their own components, I still have that interest and habit, even though I'm not going PC anytime soon.

I realize not everyone thinks this way, but for me, the primary question is "does it run macOS"? If not, then no purchase. That doesn't mean it isn't of interest. I'm not operating a telescope on a mountaintop or a space probe, but I'm still curious about Titan and Enceladus.

The one thing I don't get is people who are adamant that Intel is handily beating Apple because a 240W chip is beating a 40W Apple design. They are going to trade blows in various workloads, but the Intel tribalism on a Mac forum strikes me as odd. All tribalism is bad, but it seems even more out of place at MacRumors.
 

Zdigital2015

macrumors 601
Jul 14, 2015
4,143
5,622
East Coast, United States
Re: the bolded part: you said it yourself two sentences later – competition is good. Apple will be making better products if they have fierce competition. So as someone who loves his three Macs and who would probably decline any future job offers if working with a Mac wouldn't be possible, I think it's fantastic if Intel produces great products again.

Also, a lot of people are Mac enthusiasts but have no choice but to work with Windows machines. They can be ecstatic that Intel steps up their game, but still love their Macs. The concept of "you're either with us or against us" or, more aptly, "either you're a fanboy or else GTFO" has never benefited anyone.
Honestly, ever since the MacBook Pros and the M1 Pro/Max were released, it seems like a floodgate of people who are determined to talk both down at any cost. That tells me they're either paid actors or just a**holes who need to get a life. I don't give two s***s about Alder Lake at this point. It's the same old formula from Intel... modest IPC improvements and some speed gains that are mostly purchased by throwing additional voltage at the problem, the way Intel has always sought to increase performance. Like it or not, x86 is at the end of the line architecture-wise, and the only things keeping it going are the back catalog of software and cheap companies who will never rewrite for modern hardware. Intel has nothing I want anymore and frankly hasn't since Ivy Bridge. Everything after that is just hash being rehashed. Believe me, it's not like Apple is "innovating" the personal computer itself, but they have finally thrown Intel in the bin where they belong. Good riddance to bad rubbish. As for wanting to spend time listening to or reading the "idiots rule" ravings of simpletons who tell me that Intel is back and Alder Lake is better than the M1: I couldn't care less if they have a voice here. They should go back to TweakTown, WCCFTECH or whatever other Hellmouth they sprang from. I couldn't care less about their opinions; they're irrelevant to this conversation and really need to be treated as such.
 

ADGrant

macrumors 68000
Mar 26, 2018
1,689
1,059
12900k set to 30w to match M1 Max and it was still able to beat it despite using an inferior process node. Alder Lake is clearly superior, and Meteor Lake will completely destroy everything in the market. Chipzilla is fully awake.

A 12900k set to 30w (whatever that means) is not the same as the mobile part. I will wait for a laptop to be released with the mobile Alder Lake i9, and an article from AnandTech, before drawing any conclusions.

The desktop Alder Lake SKU has 8 performance and 8 efficiency cores; a mobile i9 may have a reduced core count. The i9-10900K in the current 2020 27" Intel Mac has 10 cores, while the equivalent i9-10980HK mobile CPU has 8 cores.
 

Rigby

macrumors 603
Aug 5, 2008
6,257
10,215
San Jose, CA
What I’m interested in is why some people are so hoping that Alder Lake is better. Apple is not going back to Intel CPUs, ever - so if you want to work on a Mac, you won’t benefit from anything Intel does. And if you are a PC user, why are you on a Mac forum?
Am I allowed on the forum when I use both Macs and other platforms? :p

Besides, you should direct your ire at Macrumors. They keep posting Intel-related articles on the front page, no doubt because they generate clicks.
 

Technerd108

macrumors 68040
Oct 24, 2021
3,062
4,313
Well, I have an HP Spectre with an 1165G7 i7 and 16GB RAM. Its Geekbench scores are around 1400 single-core, around 4500 multi-core, and around 15000 compute. Now, this is not the fastest 11th-gen Intel processor, but it is one of the best mobile processors from Intel you can buy, so I think it is a good reference for a decent mobile device. Oh, and that is plugged in with the fans running on high.

Now, with Alder Lake we will possibly add 4 lightweight cores and 4 new, faster high-power cores for a total of 12 threads? I would figure maybe a 10-15% improvement in single-core and maybe 20-25% in multi-core. That would put single-core around 1600 and multi-core, say, around 6000 at peak frequency and power use.

This is a very unscientific hypothesis but I think my numbers will hold up over time and may even be worse for Intel. So if Intel releases a very high power mobile cpu like the 1185G7 those numbers might be a little better but under sustained load and without being plugged into a wall socket I don't think Intel is catching up this year.

Geekbench single core score on M1 Pro 16 core gpu with 16gb ram is 1760. Multi core is 12600 and compute open cl is around 38000. Intel may be able to get close on single core but I don't see them coming close in multi core or compute?
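For what it's worth, the uplift guesses above work out as follows (a sketch; the percentages are the post's assumptions, not measurements, and the gap to the rounder "1600 / 6000" figures is just the poster's rounding up):

```python
# Rough check of the uplift guesses above (assumed percentages, not data).
single_core = 1400   # 1165G7 Geekbench single-core, per the post
multi_core = 4500    # 1165G7 Geekbench multi-core, per the post

projected_single = single_core * 1.15  # ~10-15% single-core uplift (upper bound)
projected_multi = multi_core * 1.25    # ~20-25% multi-core uplift (upper bound)

print(round(projected_single), round(projected_multi))  # 1610 5625
```

Even the optimistic end of those guesses leaves multi-core well short of the M1 Pro's ~12600, which is the point being made.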
 

crazy dave

macrumors 65816
Sep 9, 2010
1,453
1,229
Well, I have an HP Spectre with an 1165G7 i7 and 16GB RAM. Its Geekbench scores are around 1400 single-core, around 4500 multi-core, and around 15000 compute. Now, this is not the fastest 11th-gen Intel processor, but it is one of the best mobile processors from Intel you can buy, so I think it is a good reference for a decent mobile device. Oh, and that is plugged in with the fans running on high.

Now, with Alder Lake we will possibly add 4 lightweight cores and 4 new, faster high-power cores for a total of 12 threads? I would figure maybe a 10-15% improvement in single-core and maybe 20-25% in multi-core. That would put single-core around 1600 and multi-core, say, around 6000 at peak frequency and power use.

This is a very unscientific hypothesis but I think my numbers will hold up over time and may even be worse for Intel. So if Intel releases a very high power mobile cpu like the 1185G7 those numbers might be a little better but under sustained load and without being plugged into a wall socket I don't think Intel is catching up this year.

Geekbench single core score on M1 Pro 16 core gpu with 16gb ram is 1760. Multi core is 12600 and compute open cl is around 38000. Intel may be able to get close on single core but I don't see them coming close in multi core or compute?

The top ADL mobile chip is said to be 6 P-cores and 8 E-cores, 20 threads. Wattage unknown.
 

Technerd108

macrumors 68040
Oct 24, 2021
3,062
4,313
The top ADL mobile chip is said to be 6 P-cores and 8 E-cores. Wattage unknown.
Okay then they should possibly match on single core again while plugged in. I still don't see them matching multi core or compute.

Since this is a very similar architecture to Tiger Lake, I don't see how these chips are going to handle the heat. Six performance cores together are going to run hot, and adding 8 efficiency cores adds even more heat. So no matter what, these chips just aren't going to match up.

That being said, as long as Intel continues to advance its process node and architecture with big/little heterogeneous scheduling and more cores, they have a lot of room to work with and can still improve. A few years from now should be very interesting!
 

Colstan

macrumors 6502
Jul 30, 2020
330
711
Am I allowed on the forum when I use both Macs and other platforms? :p

Besides, you should direct your ire at Macrumors. They keep posting Intel-related articles on the front page, no doubt because they generate clicks.
I never claimed that you shouldn't use other platforms. I use Windows via Boot Camp to play computer games, and try out Linux distros from time to time. Nor am I upset, so I am not directing ire toward anyone. My only point was that I am confused by Intel partisans who visit a Mac forum to promote their particular technological tribe. It's a strange form of concern trolling, which we unfortunately see here quite often. It's the reason one of our most valuable members got banned: he didn't suffer such individuals gladly. Most folks here are already quite versed in the current technological landscape, so I'm not sure what purpose is served, or who they think they can convince. Regardless, I thank you for your considered comment.
 

theorist9

macrumors 68040
May 28, 2015
3,881
3,060
@mr_roboto has a few good posts.



In short, Apple *might* be able to run more power through the core to push the frequency up a lot, but even if possible, the cost of doing so likely isn’t worth it. This is especially true in bigger chips, where what you want is to maximize throughput. Right now Apple has a sweet spot where they can basically just keep adding CPU cores, run them all at pretty close to peak frequency, and not have to be as concerned with power draw or heat. An interesting follow-up question though is why Apple lets their cores run at absolute peak frequency only when one core is active, regardless of the thermal capacity of the system? 🤷‍♂️
Thanks for the links. To be sure, Apple is beautifully situated for expanding to large core counts. And I understand increasing single-core performance is a challenge for all chip makers. But since most programs remain single-threaded, until chips get fast enough for these programs to respond with no noticeable CPU-bound delay, increased single-core performance will continue to improve the user experience.* And this would probably improve the experience for more users than those who need high core counts.

*I'd say there are two cutoffs for the maximum amount of time you'd want a program to take to complete routine, repeated tasks. One is the cutoff for "no perceptible delay". The other is "perceptible, but not irritating". I don't know what either of these would be, but I'd guess the former is maybe <4 ms and the latter <100 ms, depending on the type of work you're doing (obviously you'd want much less latency for gaming, but I'm thinking about productivity work).

So it's a conundrum. Chip makers can no longer generationally improve single-core performance the way they could during the sub-GHz days, so most of the improvement has been in multi-core performance, through scaling to higher core counts. Yet, with a few key exceptions (audio/photography/video apps), software makers seem unable or unwilling to parallelize their programs to take advantage of this.
An interesting follow-up question though is why Apple lets their cores run at absolute peak frequency only when one core is active, regardless of the thermal capacity of the system? 🤷‍♂️
I did not know this. What are the frequencies it uses when running software that can make use of all cores?
 

crazy dave

macrumors 65816
Sep 9, 2010
1,453
1,229
I did not know this. What are the frequencies it uses when running software that can make use of all cores?

Here it is for the iPhone:


The regular M1 is similar:


I’m not sure about the Pro/Max, but I think it’s the same. If so, it’s 3GHz when running all P-cores (obviously less for the E-cores) and about a 7% boost to 3.2GHz when running a single thread.
 

theorist9

macrumors 68040
May 28, 2015
3,881
3,060
Here it is for the iPhone:


The regular M1 is similar:


I’m not sure about the Pro/Max, but I think it’s the same. If so, it’s 3GHz when running all P-cores (obviously less for the E-cores) and about a 7% boost to 3.2GHz when running a single thread.
Found a reference for the Pro/Max. As you thought, they show the same behavior:

"The CPU cores clock up to 3228 MHz peak, however vary in frequency depending on how many cores are active within a cluster, clocking down to 3132 at 2, and 3036 MHz at 3 and 4 cores active. I say “per cluster”, because the 8 performance cores in the M1 Pro and M1 Max are indeed consisting of two 4-core clusters, both with their own 12MB L2 caches, and each being able to clock their CPUs independently from each other, so it’s actually possible to have four active cores in one cluster at 3036MHz and one active core in the other cluster running at 3.23GHz."

Source: https://www.anandtech.com/show/17024/apple-m1-max-performance-review
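Those per-cluster figures can be tabulated in a quick sketch (Python; the MHz values are straight from the AnandTech quote above):

```python
# Per-cluster P-core frequency vs. active cores in that cluster (MHz),
# from the AnandTech numbers for the M1 Pro/Max quoted above.
FREQ_MHZ = {1: 3228, 2: 3132, 3: 3036, 4: 3036}

def cluster_freq_mhz(active_in_cluster: int) -> int:
    """Frequency every active P-core in a 4-core cluster runs at."""
    return FREQ_MHZ[active_in_cluster]

# Clusters clock independently: one cluster can run 4 cores at 3036 MHz
# while the other runs a single core at the 3228 MHz peak.
print(cluster_freq_mhz(1), cluster_freq_mhz(4))  # 3228 3036
```

Note the whole spread is under 200 MHz (~6%), which is why the "why not boost a lone core harder?" question is interesting.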

Perhaps this is for a reason other than thermals. I dunno, maybe the inter-core communication lanes get jammed when they're all operating at max frequency.
 

crazy dave

macrumors 65816
Sep 9, 2010
1,453
1,229
Found a reference for the Pro/Max. As you thought, they show the same behavior:

"The CPU cores clock up to 3228 MHz peak, however vary in frequency depending on how many cores are active within a cluster, clocking down to 3132 at 2, and 3036 MHz at 3 and 4 cores active. I say “per cluster”, because the 8 performance cores in the M1 Pro and M1 Max are indeed consisting of two 4-core clusters, both with their own 12MB L2 caches, and each being able to clock their CPUs independently from each other, so it’s actually possible to have four active cores in one cluster at 3036MHz and one active core in the other cluster running at 3.23GHz."

Source: https://www.anandtech.com/show/17024/apple-m1-max-performance-review

Perhaps this is for a reason other than thermals. I dunno, maybe the inter-core communication lanes get jammed when they're all operating at max frequency.

Ah yeah they did report it in the Anandtech article for the Pro/Max. As for the why … 🤷‍♂️
 