Intel Alder Lake vs. Apple M1

mr_roboto · Feb 3, 2022

BigPotatoLobbyist said:
Funny enough Intel actually use RISC-V now for the scheduling microcontroller that aids Thread Director. Nvidia are known (I mean that they state as much, etc) to utilize RISC-V for the little CPU/management GPU node in use for their GPGPU wares I think, and for security in some weird respect. Google does the same with Tensor's M2 chip being built off of RISC-V for privacy-related compute.

I suspect this is what Apple's "high-performance RISC-V engineer" job post was actually about. Sure, Apple always have failsafes and a plan B I imagine should ARM's ISA become limiting in the Very Long Run, but realistically they probably want to drop licensing fees and have greater avenues for customization in future microcontrollers or security chips, or IOT SOC's.

On dropping licensing fees - Apple has an architectural license. My understanding (which could be wrong) is that this means a large sum is paid up front to license a range of ISA spec versions indefinitely. The license holder doesn't need to pay royalties per die or wafer, since what's being licensed is the right to design an implementation of the ISA, rather than licensing Arm-owned IP like a Neoverse N1 core.

So I don't think they'd save any money by shifting just the microcontrollers to RISC-V, they'd have to move away from Arm altogether. Which doesn't seem likely. My bet is that the job position is more about keeping management informed on what a high performance RISC-V core design effort might look like if they ever decided to do it. If they were truly designing a high performance RISC-V core right now, they'd need to hire a lot more than one person, even if poaching internally from Arm design teams.

huge_apple_fangirl · Feb 3, 2022

mr_roboto said:
On dropping licensing fees - Apple has an architectural license. My understanding (which could be wrong) is that this means a large sum is paid up front to license a range of ISA spec versions indefinitely. The license holder doesn't need to pay royalties per die or wafer, since what's being licensed is the right to design an implementation of the ISA, rather than licensing Arm-owned IP like a Neoverse N1 core.

So I don't think they'd save any money by shifting just the microcontrollers to RISC-V, they'd have to move away from Arm altogether. Which doesn't seem likely. My bet is that the job position is more about keeping management informed on what a high performance RISC-V core design effort might look like if they ever decided to do it. If they were truly designing a high performance RISC-V core right now, they'd need to hire a lot more than one person, even if poaching internally from Arm design teams.

Apple's main cores are fully custom, but it's very likely that some of the various microcontrollers they use contain Arm IP. They could be looking to replace that with RISC-V and save on licensing costs.

crazy dave · Feb 3, 2022

BigPotatoLobbyist said:
Am busy at the moment but I will say in quick reply that to be fair current X2 core power consumption is much lower than the A15 Avalanche, like 3-4W vs 5.5.

Also tbf I fully realize Apple's big cores are still miles ahead of Intel's E cores in efficiency, but for peak aggregate throughout controlling on area, depending on how things go it may be nicer if Apple adopted a different strategy at least for the entry-level stuff. Today I think they have nothing to worry about or even in the next 2-4 years minimum (realistically, even with other improvements, they will have the lead in efficient ST which is one of the most important things here).

I agree with most of this, it could be an interesting design point.

The only other caveat I’d add though there is the consideration of how much does the user, especially an entry-level user, make use of that throughput. Having lots of threads per silicon area can look great in multithreaded benchmarks but if users are running mostly single or lightly threaded tasks most of that will go to waste and they would’ve been better off with more P-cores and fewer overall cores/threads even if it scores lower in benchmarks. Many of the most high thread count, parallel jobs where this design would shine would be most beneficial for the power user level chips but there you’ve got the margins to support high numbers of P-cores. Even lower in the stack Apple seems willing to spend on die area to get performance as the economics of their vertical integration are different than say a B2B chipmaker like Intel, Qualcomm, or AMD. That’s one reason why Ice storm and blizzard cores have the characteristics they have which is not really seen on the ARM ltd offerings.

But that’s a point, ARM is adopting midrange cores in their design and previously a lot of people didn’t think Apple would adopt little cores at all given how far ahead they were (and in some ways still are) in big core design. But here we are, they did and so maybe they’ll see the advantage in adding midrange cores too. Then again I don’t *think* any ARM chipmaker has yet applied big.Medium.Little outside of smartphones … yet.

Gerdi · Feb 3, 2022

huge_apple_fangirl said:
Apple's main cores are fully custom, but it's very likely that some of the various microcontrollers they use contain Arm IP. They could be looking to replace that with RISC-V and save on licensing costs.

Why do you think, that licensing an RISC-V design is cheaper than licensing an Cortex M4/M7 design? Besides Apple already has licensed the Cortex M-cores.

crazy dave · Feb 3, 2022

huge_apple_fangirl said:
Apple's main cores are fully custom, but it's very likely that some of the various microcontrollers they use contain Arm IP. They could be looking to replace that with RISC-V and save on licensing costs.

BigPotatoLobbyist said:
I suspect this is what Apple's "high-performance RISC-V engineer" job post was actually about. Sure, Apple always have failsafes and a plan B I imagine should ARM's ISA become limiting in the Very Long Run, but realistically they probably want to drop licensing fees and have greater avenues for customization in future microcontrollers or security chips, or IOT SOC's.

They do. M3 cores are definitely used by Apple (I think M0 too) but Apple also has (I believe fully custom) “chinook” ARM microcontrollers and haven’t (yet) replaced all the standard ARM Cortex-M cores with it. I don't know if the chinook cores are covered by the same license ISA fee as their main cores. If they are, then designing them is effectively free.

https://twitter.com/x/status/1458678394388762624

mr_roboto · Feb 3, 2022

Sydde said:
ARM's AArch64 specifies three page sizes of 4K, 16K or 64K for its MMU, so a strictly implemented ARMv8/9 CPU or SoC should already have this capability. I suspect that Apple's implementation uses only one size, since it is not a general market device, and most likely it is one of the larger granularities, probably 16K since it offers the most flexibility (though 64K is a shallower tree).

Implementations don't need to support all three:

"VMSAv8-64 supports translation granule sizes of 4KB, 16KB, and 64KB. Support for each granule size is optional."

Apple's modern cores only support 16K. From what I've heard, most/all Arm Holdings cores support 4K and 64K, but not 16K.

(Apple also has some odd nonstandard extensions to help Rosetta emulate 4K page size for x86 even though the native hardware page size is 16K. I don't remember the details.)

mr_roboto · Feb 3, 2022

Sydde said:
Who cares about page size? The Pager and to a lesser extent the memory allocation process. These are comparatively modular components of any OS. No other parts of the system or any utility or application processes are even aware of it. It happens transparently to everyone else in the system, so implementing 16K (or even 64K) paging on an ARM SoC is, while not exactly trivial, not all that big an issue.

It's not really true that page size is invisible to applications. I've had to deal with it several times in my career. If you care about high performance file read/write, you're going to end up thinking about page size: it's been my experience that in any OS, you always want to do I/O in integer multiples of the page size, with buffers aligned to page boundaries.

This is important enough that there are standard APIs in UNIX-derived operating systems (including macOS) which let applications ask what the page size is, rather than hard-coding a constant number which might be wrong.

Gerdi · Feb 3, 2022

mr_roboto said:
Implementations don't need to support all three:

"VMSAv8-64 supports translation granule sizes of 4KB, 16KB, and 64KB. Support for each granule size is optional."

Apple's modern cores only support 16K. From what I've heard, most/all Arm Holdings cores support 4K and 64K, but not 16K.

(Apple also has some odd nonstandard extensions to help Rosetta emulate 4K page size for x86 even though the native hardware page size is 16K. I don't remember the details.)

Almost all ARMv8-A cores support 4KB, 16KB and 64KB translation granule! Read the TRMs before guessing!
I am also sure, that M1 supports 4KB translation granule as well.

cmaier · Feb 3, 2022

huge_apple_fangirl said:
Apple's main cores are fully custom, but it's very likely that some of the various microcontrollers they use contain Arm IP. They could be looking to replace that with RISC-V and save on licensing costs.

What licensing costs? Apple provided all of the funding for the creation of Arm. Why would they have agreed to an agreement that results in them paying to use it?

And what happens when you infringe a patent by using RISC-V? You think Hennessy or Patterson is going to indemnify you like Arm will?

crazy dave · Feb 3, 2022

mr_roboto said:
On dropping licensing fees - Apple has an architectural license. My understanding (which could be wrong) is that this means a large sum is paid up front to license a range of ISA spec versions indefinitely. The license holder doesn't need to pay royalties per die or wafer, since what's being licensed is the right to design an implementation of the ISA, rather than licensing Arm-owned IP like a Neoverse N1 core.

So I don't think they'd save any money by shifting just the microcontrollers to RISC-V, they'd have to move away from Arm altogether. Which doesn't seem likely. My bet is that the job position is more about keeping management informed on what a high performance RISC-V core design effort might look like if they ever decided to do it. If they were truly designing a high performance RISC-V core right now, they'd need to hire a lot more than one person, even if poaching internally from Arm design teams.

huge_apple_fangirl said:
Apple's main cores are fully custom, but it's very likely that some of the various microcontrollers they use contain Arm IP. They could be looking to replace that with RISC-V and save on licensing costs.

BigPotatoLobbyist said:
I suspect this is what Apple's "high-performance RISC-V engineer" job post was actually about. Sure, Apple always have failsafes and a plan B I imagine should ARM's ISA become limiting in the Very Long Run, but realistically they probably want to drop licensing fees and have greater avenues for customization in future microcontrollers or security chips, or IOT SOC's.

According to someone on hacker news the job posting is specifically looking for someone with RV and vector extensions experience with machine learning focus:

Apple Exploring RISC-V, Hiring RISC-V ‘High Performance’ Programmers | Hacker News

news.ycombinator.com

So it sound like it’s naught to do with high performance RV cores or microcontrollers in general but high performance computing where RV cores are used - i.e. as a controller for a specific kind of accelerator. Why RV cores might be attractive here vs their own custom ARM-based chinook cores, I dunno. Maybe it isn’t but they’re still exploring it just in case.

mr_roboto · Feb 3, 2022

Gerdi said:
Almost all ARMv8-A cores support 4KB, 16KB and 64KB translation granule! Read the TRMs before guessing!
I am also sure, that M1 supports 4KB translation granule as well.

I wasn't guessing, just passing along what I remembered.

Which might not have been accurate, because memory is fickle. You are right, M1 does support 4KB. I said it didn't because I did accurately remember that some Linux distros have problems on M1 due to missing support for all granule sizes. After some googling, it's that RHEL for Arm assumes the CPU supports the 64KB granule, and thus can't run on M1 without recompiling everything. I conflated that problem into a lack of support for 4KB, as I know that in practice Apple uses 16KB.

Krevnik · Feb 3, 2022

BigPotatoLobbyist said:
Maybe another way to put it, given Apple's margin concern, I would like it if Apple's Icestorms or Blizzards were closer to the A710 and A78 than they currently are, and they threw more into laptops or desktops at the expense of the bigs, and even axed big cores on the phones, which are currently just too much for the utterly trash-tier thermal design of the modern iPhone, even keeping race to idle in mind. Maybe not currently RE: M-X SOC's, but in a few years I could see this being beneficial.

As a bit of a nit, if you can stress an iPhone’s thermals, you aren’t exactly in a “race to idle” situation.

And it is definitely a trade off. Based on the comments in the XNU scheduler code, it seems Apple likes the low latency that their setup provides on the very bursty (and time sensitive) work that is user interaction. Which makes a kind of sense. But I do agree that the large cores are at the point where they aren’t really ideal for running full tilt for very long on the iPhone. That said, I would be surprised if Apple fully ditches them on the iPhone unless they start producing “medium” cores that can take over and still have the performance profile they want. Partly because mainstream apps are still very single or dual threaded these days, with bursts where they might use more.

leman · Feb 3, 2022

Sydde said:
Who cares about page size? The Pager and to a lesser extent the memory allocation process. These are comparatively modular components of any OS. No other parts of the system or any utility or application processes are even aware of it. It happens transparently to everyone else in the system, so implementing 16K (or even 64K) paging on an ARM SoC is, while not exactly trivial, not all that big an issue.

You’d be surprised how many applications rely on page size assumptions… custom high-performance memory allocators, databases etc. - actually, hard coded 4K pages were one of the pain points when compiling for native Apple Silicon. Also, some APIs (e.g. some GPU functionality) require memory allocations to be page-aligned.

P.S. Didn't it take a while for Chrome to get a native version because they hardcoded 4K pages?

crazy dave · Feb 4, 2022

mr_roboto said:
I wasn't guessing, just passing along what I remembered.

Which might not have been accurate, because memory is fickle. You are right, M1 does support 4KB. I said it didn't because I did accurately remember that some Linux distros have problems on M1 due to missing support for all granule sizes. After some googling, it's that RHEL for Arm assumes the CPU supports the 64KB granule, and thus can't run on M1 without recompiling everything. I conflated that problem into a lack of support for 4KB, as I know that in practice Apple uses 16KB.

More than just RHEL/64KB, it’s also the case that the IOMMUs only support 16K. So while the CPU may support 4K, in practice Hector has written that the Linux distros would still have to ship a 16K kernel to meaningfully use Linux on the M1. Most don't.

mi7chy · Feb 4, 2022

One picture is worth a thousand words.

Stratus Fear · Feb 4, 2022

mi7chy said:
One picture is worth a thousand words.

View attachment 1954603

Oh no!

Anyway...

leman · Feb 4, 2022

BigPotatoLobbyist said:
Also tbf I fully realize Apple's big cores are still miles ahead of Intel's E cores in efficiency, but for peak aggregate throughout controlling on area, depending on how things go it may be nicer if Apple adopted a different strategy at least for the entry-level stuff. Today I think they have nothing to worry about or even in the next 2-4 years minimum (realistically, even with other improvements, they will have the lead in efficient ST which is one of the most important things here).

Apple's goal so far is overall usability. IMO, peak aggregate throughput is a good thing in HPC market, but not that useful for general-purpose everyday computing. How many applications can scale well off the large amount of slower cores?

crazy dave · Feb 4, 2022

leman said:
Apple's goal so far is overall usability. IMO, peak aggregate throughput is a good thing in HPC market, but not that useful for general-purpose everyday computing. How many applications can scale well off the large amount of slower cores?

Yeah I’ll admit that I was basically reiterating points you had made earlier in a different context when I wrote the above in reply to @BigPotatoLobbyist:

crazy dave said:
The only other caveat I’d add though there is the consideration of how much does the user, especially an entry-level user, make use of that throughput. Having lots of threads per silicon area can look great in multithreaded benchmarks but if users are running mostly single or lightly threaded tasks most of that will go to waste and they would’ve been better off with more P-cores and fewer overall cores/threads even if it scores lower in benchmarks. Many of the most high thread count, parallel jobs where this design would shine would be most beneficial for the power user level chips but there you’ve got the margins to support high numbers of P-cores

leman · Feb 4, 2022

mi7chy said:
One picture is worth a thousand words.

View attachment 1954603

Which is exactly as predicted. Note that the graph is not correct in respect to the Apple CPU. M1 Max CPU cluster only draws 30W (35W package power including RAM) to reach 12K in R23 multicore.

So what we have here is a benchmark that is pretty much optimal case for Intel (sales very well with multiple cores and SMT), is know to underuse Apple hardware (runs an Intel SIMD optimised library with a SSE2-to-NEON software layer, using a suboptimal SIMD width on M1), where Intel's 6+8 SKU at 40W shows the same performance as Apple's 8+2 SKU at 30W. That's a real-world difference in perf-per-watt of around 40%. Intel still has a long way to go until they can catch up with Apple in the mobile space.

Where Intel of course has an obvious edge is performance scaling with power. Which is again not surprising given their architecture and the focus on power-hungry desktop applications. If you are a desktop HPC user that benefits from multicore scaling, ADL is very good product. But we already knew that. It will also undoubtedly be very popular in laptops, just don't expect those level of performance inside the usual multimedia chassis.

Anyway, all of this confirms that Apple's decision to move on was the correct one. Personally, I am happy to have my 15+ hour battery life and 2x better performance than Intel while working away from my desk, and I have no problems conceding the R23 or stockfish scores to the x86 folks if that makes them happy

mi7chy · Feb 4, 2022

Intel next year with node shrink will be killer.

Now to see what AMD has to offer this year with Ryzen 6000 mobile. They hinted at 1.3x multithreaded performance so it'll be close.

And, what Apple will do going forward. Wouldn't be surprised to see a refreshed 2019/2020 16" Macbook Pro with Alder Lake and something better than 5600m unless they want to halve their <10% marketshare.

MayaUser · Feb 4, 2022

Intel from now on will offer again for the next decade from gen to gen an increase of 10%-15%
Expect again this kind of gains like Alder L, in 2030
No intel in that 2019 16" Mbp enclosure for sure it would be a toaster...Apple will not update with Intel any other macs besides maybe the Mac Pro.

mi7chy · Feb 5, 2022

45W in a 16" MBP chassis is easy and make it the default which is still faster than M1 Pro/Max 10CPU with an optional 55W performance mode. So, two lines of laptops, 16" Macbook Pro(fessional) with Alder Lake for people that aren't ready to give up native x64 compatibility and bootcamp. And, separate 16" Macbook Pro(sumer) with upsized iPad SoC.

jeanlain · Feb 5, 2022

leman said:
Which is exactly as predicted. Note that the graph is not correct in respect to the Apple CPU. M1 Max CPU cluster only draws 30W (35W package power including RAM) to reach 12K in R23 multicore.

So they haven’t measured the power themselves?

crazy dave · Feb 5, 2022

leman said:
Where Intel of course has an obvious edge is performance scaling with power.

I’m not sure that’s actually true (unless you mean pumping additional power through the M1 Pro/Max CPU which Apple doesn’t allow). Given the differences in architecture, I would say that a higher core count M-series chip would scale far better on perf/W than Alder Lake on this graph. Admittedly it’s different from taking a single configuration and running more or less power through it vs changing core configurations but that is the benefit of Apple’s design. Individual P-cores are small and efficient with great single threaded performance. Want more multicore performance, just add more. Obviously not *that* simple since Apple scales other parts of the SOC up as well. Even so it provides a more granular but efficient method of adding multicore performance that Intel can only sort of copy with middle cores.

crazy dave · Feb 5, 2022

jeanlain said:
So they haven’t measured the power themselves?

If they did or didn’t, they screwed it up and reported wall power.

Intel Alder Lake vs. Apple M1

macrumors 6502a

macrumors 6502a

macrumors 68000

macrumors 6502

macrumors 68000

macrumors 6502a

macrumors 6502a

macrumors 6502

Suspended

macrumors 68000

macrumors 6502a

macrumors 601

macrumors Core

macrumors 68000

Suspended

macrumors 6502a

macrumors Core

macrumors 68000

macrumors Core

Suspended

macrumors 68040

Suspended

macrumors 68020

macrumors 68000

macrumors 68000

Our Staff