
cmaier

Suspended
Jul 25, 2007
25,405
33,474
California
@cmaier could Intel see a performance increase going from 4/6-wide to 8-wide? What would be the downsides of doing so?
How would they accomplish that? Seems like you’d have to add pipestages and a lot of decode hardware to maybe be able to do that. Then whenever you have a mispredicted branch or a context switch you’d have a higher branch penalty. You would probably want to add even more branch prediction hardware to compensate. So power consumption would be bad, and the die area would be bad.
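A toy model of the tradeoff described above (all numbers are illustrative assumptions of mine, not figures from the post): wider issue raises peak throughput, but the extra decode stages deepen the front end, so every mispredicted branch costs more, eating much of the gain.

```python
# Idealized IPC model: wider issue helps, but each extra decode stage
# lengthens the pipeline flush paid on every mispredicted branch.
# branch_freq, mispredict_rate, and the base depth of 10 are made-up
# illustrative values.

def effective_ipc(width, decode_stages, branch_freq=0.2, mispredict_rate=0.05):
    flush_penalty = 10 + decode_stages          # front-end depth sets refill time
    stalls_per_insn = branch_freq * mispredict_rate * flush_penalty
    return width / (1 + width * stalls_per_insn)

narrow = effective_ipc(width=6, decode_stages=2)   # ~3.49
wide   = effective_ipc(width=8, decode_stages=5)   # ~3.64
```

With these numbers, going from 6-wide to 8-wide buys only about 4% more IPC instead of the naive 33%, before even counting the extra power and die area.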
 

Xiao_Xi

macrumors 68000
Oct 27, 2021
1,628
1,101
Seems like you’d have to add pipestages and a lot of decode hardware to maybe be able to do that. Then whenever you have a mispredicted branch or a context switch you’d have a higher branch penalty. You would probably want to add even more branch prediction hardware to compensate. So power consumption would be bad, and the die area would be bad.
Is a mispredicted branch a less significant problem on ARM than on x64? Do Intel/AMD CPUs have deeper pipelines than Apple SoCs?
 

cmaier

Suspended
Jul 25, 2007
25,405
33,474
California
Is a mispredicted branch a less significant problem on ARM than on x64? Do Intel/AMD CPUs have deeper pipelines than Apple SoCs?

Generally you would have more pipeline stages in x64 than in ARM, but the difference is probably not too big with modern ARM implementations. But if Intel wanted to issue, say, 8 instructions at a time, it would need more pipeline stages to do that than ARM would. The decoding on x86 is a real problem, because the instructions can vary in length (by non-power-of-2 multiples, even), and when you fetch a chunk of instruction memory you don’t even know where each instruction starts. You need to figure out where instruction 1 ends before you can really decode instruction 2. You get around that in x86 decoders by speculation or parallelism. You can guess where instruction 2 starts and decode it while decoding instruction 1, paying a penalty when you guess wrong, or you can try every possible instruction 2 starting point in parallel and throw away the ones that aren’t needed (needlessly burning power and taking up die area). Or you add pipeline stages - figure out starting and ending places in one stage, then finish the decoding in stage 2 (or 3, or 4 - you may have to read a microcode ROM).
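The boundary-finding problem above can be sketched in a few lines. The opcode-to-length table below is entirely made up (real x86 length decode involves prefixes, ModRM, SIB, etc.), but it shows the serial dependence: you can't know where instruction N+1 starts until you've sized instruction N, whereas the brute-force trick sizes every byte offset up front and discards the unreachable ones.

```python
# Hypothetical "first byte -> total instruction length" table, standing
# in for real x86 length decoding.
LENGTHS = {0x01: 1, 0x02: 2, 0x03: 3, 0x05: 5, 0x07: 7}

def decode_serial(buf):
    """Find instruction boundaries the slow way: one after another."""
    starts, pc = [], 0
    while pc < len(buf):
        starts.append(pc)
        pc += LENGTHS[buf[pc]]
    return starts

def decode_parallel(buf):
    """Hardware trick: size *every* byte offset up front (done in
    parallel in silicon), then just chain through the valid ones.
    Most of the per-offset work is thrown away (burned power/area)."""
    length_at = {pc: LENGTHS.get(buf[pc]) for pc in range(len(buf))}
    starts, pc = [], 0
    while pc < len(buf):
        starts.append(pc)
        pc += length_at[pc]
    return starts

code = bytes([0x03, 0, 0, 0x02, 0, 0x05, 0, 0, 0, 0])
# Only 3 of the 10 offsets are real instruction starts: [0, 3, 5].
```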
 

JMacHack

Suspended
Mar 16, 2017
1,965
2,424
Yes. The latter in particular are vehemently opposed to ARM in general, I've found, be it custom cores like Apple's or reference designs from ARM Ltd. Weird conservative impulses all around with that lot.
They’re opposed because it’s Apple, simply put. Look through any comment section with a cross section of Apple news and gamers and you’ll see anti-Apple commentary everywhere. ARM is secondary to that.

It doesn't offer the ability to Overclock, or use phase change cooling for bragging rights, etc.
Like any of those idiots overclock, lol. PCMR types are like car enthusiasts in that 98% of them are bench racing high performance stuff while daily driving econoboxes.

FYI, the wall that Intel is hitting might be near Apple Silicon too. A15 increased its efficiency cores much more so than its high-performance cores and we're no longer seeing the drastic increases in raw CPU performance that we previously saw with the A-series. I think Apple will likely have to increase the number of smaller cores to compensate for the slow down in high-performance cores in the near future.

Even still, Intel's Alder Lake is delivering 15-20% faster ST over last-gen. That's nothing to scoff at and they're the king of the consumer x86 world again.
I’m just an armchair engineer, but that doesn’t necessarily mean they improved the A15 as much as they could. I.e., they have the fastest mobile chip, and the second fastest is their gen before that, so why would you prioritize speed again over battery life?

The case is different with the M series, with much larger batteries and tougher competition in performance.

And, from my viewpoint, most of the detractors of the M series have been critical of GPU performance, where the A15 saw a bigger leap, IIRC.

I think Apple has more room to grow than Intel here. I can’t imagine that they’re holding anything back.
 
  • Like
Reactions: Juraj22 and souko

JMacHack

Suspended
Mar 16, 2017
1,965
2,424
How would they accomplish that? Seems like you’d have to add pipestages and a lot of decode hardware to maybe be able to do that. Then whenever you have a mispredicted branch or a context switch you’d have a higher branch penalty. You would probably want to add even more branch prediction hardware to compensate. So power consumption would be bad, and the die area would be bad.
Honestly this seems like a likely direction for Intel. They’re probably willing to sacrifice die space and heat/power to retain the performance crown.
 

Krevnik

macrumors 601
Sep 8, 2003
4,101
1,312
I'm not sure why it's hard for you and @leman to understand.

On desktop, yes, ADL's efficiency cores are mainly used to boost MT performance. And they accomplish that goal well because they're matching and beating AMD's best but at a cheaper price.

And then on mainstream laptops, where efficiency matters more than having the most multithreaded performance, Intel configures ADL to have more efficiency cores than power cores. This is exactly how all big.LITTLE designs are laid out (the exception being the M1 Pro/Max). All Qualcomm and Apple A-series big.LITTLE chips have more efficiency cores than high-power cores. In fact, most Android SoCs have just 1 high-power core, a few mid-tier cores, and more low-power cores.

Keep in mind the scheduler is important here too. That is informing the different designs and how cores are assigned. Ironically, Apple’s design is pretty bare bones compared to others, but it works for their core designs and platforms.

Apple’s scheduler is pretty braindead. Low priority? Get an E core. Not low priority? Get a P core. iOS can shunt background processes to E cores by forcing background priority on the whole process when it backgrounds, while macOS currently doesn’t do anything similar. This is why the M1 Pro is skewed towards P cores, BTW.

Intel’s Thread Director is trying to offload work to the E cores based on heuristics, so it can take advantage of more efficiency cores. Windows 11 adds the trick of pushing background windows to the E cores and elevating the foreground window to P cores, on top of the Thread Director recommendations. In many ways it’s more advanced, but behavior relies on the Thread Director heuristics and on making sure there aren’t any false positives. Thread Director also tends to want to put threads classified as “mainstream apps” on the E cores when possible, which would hurt latency if it weren’t for Windows 11’s additional trick.

Honestly, Apple’s approach has the benefit of a simpler scheduler and lower latency for bursty workloads of all kinds. User interactive/initiated work always gets a P core out of the gate. This works for Apple because the P cores themselves are already efficient. However, Intel’s taking a high-power CPU and trying to extract more efficiency from an existing microarch, which means it has to shunt work to the E cores for this efficiency to happen, and will want to keep the P cores unpowered as much as possible. While the heterogeneous layout is generally similar on the surface, the details make them different in interesting ways, and their actual behaviors are quite different. Apple wouldn’t benefit from a 2+8 design in their laptops. Intel wouldn’t benefit from an 8+2 design in a laptop chip.
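The two policies described above, caricatured in a few lines (this is my simplification for illustration, not Apple's or Intel's actual scheduler logic, and the class names are invented):

```python
def apple_style(task_qos):
    """Simple static rule: background QoS -> E core, everything else -> P."""
    return "E" if task_qos == "background" else "P"

def intel_style(thread_class, owns_foreground_window):
    """Heuristic sketch: Thread Director classifies the thread, and
    Windows 11 additionally boosts whatever owns the foreground window."""
    if owns_foreground_window:
        return "P"
    return "E" if thread_class in ("background", "mainstream_app") else "P"

# The interesting divergence: a "mainstream app" thread lands on an E
# core under the heuristic policy unless the foreground boost saves it.
```

Note how the first policy needs nothing but the developer-supplied tag, while the second depends on classification being right and on the foreground-window patch-up.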

I can’t speak to the Android scheduler in an AMP setup, since I simply don’t have experience with it, nor have I read up enough on it.
 

cmaier

Suspended
Jul 25, 2007
25,405
33,474
California
Honestly this seems like a likely direction for Intel. They’re probably willing to sacrifice die space and heat/power to retain the performance crown.

They would be better off if they would provide Microsoft with a static translator that can be run on binaries and convert x86 code to fixed length instructions, and then just fix the damned ISA. Rosetta this b****.

Edited: thought the forum software would blur out the naughty word, but it didn’t.
 

JMacHack

Suspended
Mar 16, 2017
1,965
2,424
They would be better off if they would provide Microsoft with a static translator that can be run on binaries and convert x86 code to fixed length instructions, and then just fix the damned ISA. Rosetta this b****.

Edited: thought the forum software would blur out the naughty word, but it didn’t.
With the cross-licensing agreement wouldn’t they have to share that with AMD?
 

leman

macrumors Core
Original poster
Oct 14, 2008
19,522
19,679
Apple’s scheduler is pretty braindead. Low priority? Get an E core. Not low priority? Get a P core. iOS can shunt background processes to E cores by forcing background priority on the whole process when it backgrounds, while macOS currently doesn’t do anything similar. This is why the M1 Pro is skewed towards P cores, BTW.

Intel’s Thread Director is trying to offload work to the E cores based on heuristics, so it can take advantage of more efficiency cores. Windows 11 adds the trick of pushing background windows to the E cores and elevating the foreground window to P cores, on top of the Thread Director recommendations. In many ways it’s more advanced, but behavior relies on the Thread Director heuristics and on making sure there aren’t any false positives. Thread Director also tends to want to put threads classified as “mainstream apps” on the E cores when possible, which would hurt latency if it weren’t for Windows 11’s additional trick.

Honestly, Apple’s approach has the benefit of a simpler scheduler and lower latency for bursty workloads of all kinds. User interactive/initiated work always gets a P core out of the gate. This works for Apple because the P cores themselves are already efficient. However, Intel’s taking a high-power CPU and trying to extract more efficiency from an existing microarch, which means it has to shunt work to the E cores for this efficiency to happen, and will want to keep the P cores unpowered as much as possible. While the heterogeneous layout is generally similar on the surface, the details make them different in interesting ways, and their actual behaviors are quite different. Apple wouldn’t benefit from a 2+8 design in their laptops. Intel wouldn’t benefit from an 8+2 design in a laptop

Apple has also the benefit of having a functional thread priority API that is actually being used by software, thanks to their long history of mobile computing and overall richer APIs. Intel and Microsoft have to do a lot of guesswork, where Apple has more data to rely on.
 
  • Like
Reactions: Andropov

BigPotatoLobbyist

macrumors 6502
Dec 25, 2020
301
155
They’re opposed because it’s Apple, simply put. Look through any comment section with a cross section of Apple news and gamers and you’ll see anti-Apple commentary everywhere. ARM is secondary to that.


Like any of those idiots overclock, lol. PCMR types are like car enthusiasts in that 98% of them are bench racing high performance stuff while daily driving econoboxes.


I’m just an armchair engineer, but that doesn’t necessarily mean they improved the A15 as much as they could. I.e., they have the fastest mobile chip, and the second fastest is their gen before that, so why would you prioritize speed again over battery life?

The case is different with the M series, with much larger batteries and tougher competition in performance.

And, from my viewpoint, most of the detractors of the M series have been critical of GPU performance, where the A15 saw a bigger leap, IIRC.

I think Apple has more room to grow than Intel here. I can’t imagine that they’re holding anything back.
Eh, no, it's beyond that. It's not just Apple, IME.
 

BigPotatoLobbyist

macrumors 6502
Dec 25, 2020
301
155
They would be better off if they would provide Microsoft with a static translator that can be run on binaries and convert x86 code to fixed length instructions, and then just fix the damned ISA. Rosetta this b****.

Edited: thought the forum software would blur out the naughty word, but it didn’
Yes, lol. Couldn't they even just literally build a custom ARM core and implement something similar to the TSO memory compatibility the M1 has(?) in order to better emulate x64 binaries? A funny implication of the M1's success in this regard, and in terms of IPC and as a microarchitecture, is arguably that Intel and AMD could burn the gains from a fixed-length, modernized ISA on emulating x86 binaries and still come out even relative to natively running old code.

It's not as if Apple's ability to emulate x64/x86 binaries is any more privileged, and yet they did a phenomenal job. I imagine Intel and AMD could go even further in the kind of hardware they could add for enhancing backwards compatibility, not that it would be necessary, as Apple showed with proper translation plus a minimal hardware configuration.

Cliff, at some point, what with ARM vendors going wider and deeper (Nuvia/Qualcomm presumably) and Nvidia's likely foray into beefing up custom cores (if you saw their hiring news in Israel) — is it plausible Intel and AMD will just hit a wall even for the most right-tailed cases where they've built processors for extreme clock rates? Surely this legacy debt will become increasingly obvious — more so than it is today — provided competitors do what they cannot?


Edit: On this note, is there any truth to the idea that width imposes a limitation on clock rates? Could Apple (if they wanted to, for some strange reason) implement the M1 microarchitecture with a greater share of TSMC/Synopsys high-performance cell libraries in order to reach higher clock rates, albeit to the detriment of power consumption, but with their industry-leading IPC remaining constant? Say, an MX chip with greater leakage all around but the ability to hit 5GHz and pretty much destroy Intel/AMD chips on performance without any doubt.
 
  • Like
Reactions: jdb8167

Krevnik

macrumors 601
Sep 8, 2003
4,101
1,312
Apple has also the benefit of having a functional thread priority API that is actually being used by software, thanks to their long history of mobile computing and overall richer APIs. Intel and Microsoft have to do a lot of guesswork, where Apple has more data to rely on.

Intel brute forced it with an ML-trained heuristic run against actual software. So in a sense they have plenty of data. :p

But I know what you mean. Apple doesn't use a ton of data to make scheduling decisions, but they don't need to, because they've convinced developers to help out by tagging their threads and tasks. And as a developer, I appreciate the consistent behavior, having some control over scheduling my work and being able to make reasoned decisions about priority.

Swift 5.5's concurrency features, including QoS levels for tasks, are a good example here. Being able to spin off a concurrent bit of work, give it low priority, and avoid CPU contention as a result with a single line of code? And it works on services I write on Linux too? Yes please. C# can kinda/sorta do something similar, but it's not nearly as elegant.
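The "tag the work, let the runtime order it" idea, sketched as a rough Python analogue (the stdlib has no direct equivalent of Swift Task priorities; this hand-rolled queue and its QoS names just illustrate the concept, not any real API):

```python
import heapq

# QoS classes modeled loosely on Apple's naming; lower number = more urgent.
QOS = {"userInteractive": 0, "userInitiated": 1, "utility": 2, "background": 3}

class QoSQueue:
    """Runs submitted callables in QoS order, FIFO within a class."""
    def __init__(self):
        self._heap, self._seq = [], 0

    def submit(self, qos, fn):
        heapq.heappush(self._heap, (QOS[qos], self._seq, fn))
        self._seq += 1                      # tie-breaker keeps FIFO order

    def drain(self):
        results = []
        while self._heap:
            _, _, fn = heapq.heappop(self._heap)
            results.append(fn())
        return results

q = QoSQueue()
q.submit("background", lambda: "cleanup")      # submitted first...
q.submit("userInitiated", lambda: "render")    # ...but this runs first
```

The point is the same one made above: the developer states intent once at submission time, and the scheduler needs no heuristics to order the work.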
 

BigPotatoLobbyist

macrumors 6502
Dec 25, 2020
301
155
Intel brute forced it with an ML-trained heuristic run against actual software. So in a sense they have plenty of data. :p
But I know what you mean. Apple doesn't use a ton of data to make scheduling decisions, but they don't need to, because they've convinced developers to help out by tagging their threads and tasks. And as a developer, I appreciate the consistent behavior, having some control over scheduling my work and being able to make reasoned decisions about priority.

Swift 5.5's concurrency features, including QoS levels for tasks, are a good example here. Being able to spin off a concurrent bit of work, give it low priority, and avoid CPU contention as a result with a single line of code? And it works on services I write on Linux too? Yes please. C# can kinda/sorta do something similar, but it's not nearly as elegant.
Honestly, I think Intel's and MS's scheduling will turn out fine; even Google has made various improvements. People exaggerate how beneficial Apple's vertical integration is in this regard. Don't think profiling is something unheard of to the guys at Intel, Qualcomm, MS, and Google.
 

jdb8167

macrumors 601
Nov 17, 2008
4,859
4,599
They would be better off if they would provide Microsoft with a static translator that can be run on binaries and convert x86 code to fixed length instructions, and then just fix the damned ISA. Rosetta this b****.

Edited: thought the forum software would blur out the naughty word, but it didn’t.
Intel won't do this but I've been thinking about it. Memory is cheap. You could convert the standard x86_64 ISA to fixed instruction lengths, with minor modifications for anything that is currently a fused micro-op. From what I understand the current micro-op is over 100 bits, but that doesn't sound particularly bad, so standardizing on 128-bit wide instructions seems doable. It doesn't compare favorably with AArch64's 32-bit wide instructions, but with an in-memory cache of the translated binary it wouldn't be that noticeable given current hardware.
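Back-of-envelope math on the code-size cost of that 128-bit idea (a sketch; the ~4-byte average x86 instruction length is my assumption for illustration, not a figure from the post):

```python
AVG_X86_BITS = 8 * 4        # assume ~4 bytes/instruction on average (rough)
FIXED_BITS = 128            # the proposed fixed-width encoding

def code_kib(n_insns, bits_per_insn):
    """Code footprint in KiB for a given instruction count and width."""
    return n_insns * bits_per_insn / 8 / 1024

# Footprint ratio of the translated binary vs. the original x86 encoding.
bloat = code_kib(100_000, FIXED_BITS) / code_kib(100_000, AVG_X86_BITS)
# -> 4.0x under these assumptions
```

A 4x larger translated image mostly costs RAM and instruction-cache footprint; with memory as cheap as the post says, caching the translation in memory is the part that makes the tradeoff plausible.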
 

Gerdi

macrumors 6502
Apr 25, 2020
449
301
Could Apple (if they wanted to, for some strange reason) implement the M1 microarchitecture with a greater share of TSMC/Synopsys high-performance cell libraries in order to reach higher clock rates, albeit to the detriment of power consumption, but with their industry-leading IPC remaining constant? Say, an MX chip with greater leakage all around but the ability to hit 5GHz and pretty much destroy Intel/AMD chips on performance without any doubt.

Not sure why this is even a question. Of course they could. If you are willing to sacrifice power efficiency, there is significant clock frequency headroom. As TSMC demonstrated some years ago, you can clock an old Cortex-A72 at 4.2GHz, when it was maxing out at around 2GHz in contemporary mobile designs. And it is not just leakage but also dynamic power, as you would need to increase the voltage significantly.
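The dynamic-power point can be made concrete with the standard relation P ∝ C·V²·f: reaching a higher frequency generally requires raising the voltage too, so power grows much faster than clock. The specific frequency and voltage numbers below are illustrative assumptions, not measured M1 figures.

```python
def rel_dynamic_power(rel_f, rel_v):
    """Dynamic power relative to a baseline, from P = C * V^2 * f
    with switched capacitance C held fixed."""
    return rel_v ** 2 * rel_f

base = rel_dynamic_power(rel_f=1.0, rel_v=1.0)
# Hypothetical: ~56% more clock (3.2 GHz -> 5 GHz) needing, say, 25% more voltage.
hot = rel_dynamic_power(rel_f=5.0 / 3.2, rel_v=1.25)
# -> ~2.44x the dynamic power for ~1.56x the clock, before leakage.
```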
 
  • Like
Reactions: BigPotatoLobbyist

cmaier

Suspended
Jul 25, 2007
25,405
33,474
California
Intel won't do this but I've been thinking about it. Memory is cheap. You could convert the standard x86_64 ISA to fixed instruction lengths, with minor modifications for anything that is currently a fused micro-op. From what I understand the current micro-op is over 100 bits, but that doesn't sound particularly bad, so standardizing on 128-bit wide instructions seems doable. It doesn't compare favorably with AArch64's 32-bit wide instructions, but with an in-memory cache of the translated binary it wouldn't be that noticeable given current hardware.

You could easily get away with 64-bit ops, and just convert to multiples of those as needed. I’d stay away from micro-ops, because micro-ops change from chip to chip.
 

Krevnik

macrumors 601
Sep 8, 2003
4,101
1,312
Honestly, I think Intel's and MS's scheduling will turn out fine; even Google has made various improvements. People exaggerate how beneficial Apple's vertical integration is in this regard. Don't think profiling is something unheard of to the guys at Intel, Qualcomm, MS, and Google.
I’m not even really trying to trash it. But it is more complicated and more black box to developers who are working on the platform. It also benefits from different P/E core ratios.
 
  • Like
Reactions: BigPotatoLobbyist

UBS28

macrumors 68030
Oct 2, 2012
2,893
2,340
Looks like gaming laptops with Alder Lake are faster than the M1 Max. So you are basically paying for energy efficiency and battery life with the M1 Max, rather than for the best performance.

It is quite interesting how people were saying that nobody needs the power of the M1 Max, while gaming laptops are even more powerful.

But the 16" M1 Max MBP has nothing to worry about, as it is much more portable and has much better battery life, so it has a different target audience.
 

cmaier

Suspended
Jul 25, 2007
25,405
33,474
California
Looks like gaming laptops with Alder Lake are faster than the M1 Max. So you are basically paying for energy efficiency and battery life with the M1 Max, rather than for the best performance.

It is quite interesting how people were saying that nobody needs the power of the M1 Max, while gaming laptops are even more powerful.

But the 16" M1 Max MBP has nothing to worry about, as it is much more portable and has much better battery life, so it has a different target audience.

So far Alder Lake beats M1 Max by 4% on one benchmark. Let’s not get ahead of ourselves.
 

jinnyman

macrumors 6502a
Sep 2, 2011
762
671
Lincolnshire, IL
I'm actually impressed with Intel's comeback, or rather the lack of a breakthrough by AMD at the same time. AMD's still great, they have the performance-per-watt lead in x64, and I still believe Zen 4 will be great. But it's been only a few years since AMD gained superiority over Intel, and I'm already starting to see AMD trying to exploit that lead rather than trying to widen the gap.

Intel's 12th gen is really a slap at AMD, and the results are pretty shocking. Perhaps 12th gen is not enough for many, but I'm confident 13th and 14th will be great too. They got their **** together pretty fast, so to speak.

I also believe many people in here don't give enough credit to Intel. I know, at the end of the day, what matters is what you use, and Intel's mobile chips have sucked for the last 5 years. But being x64- and Windows-capable is another important attribute of a chip, and I believe even Apple would have a hard time designing an M1 Max-like chip in x64. Apple owns its entire ecosystem from top to bottom, and they can easily switch architectures, OSes, hardware, everything. That also provides advantages in designing their chips. Of course, I still like Apple's M1 MacBook Pro better as a laptop than any Windows counterpart, but I won't be giving up on Windows anytime soon (unless everything I require is ported over to Mac).

An MBP M1 Max vs. any Windows laptop with 12th gen, I will vote for the MBP wholeheartedly. But for the small segment of sectors where raw performance in a laptop casing matters? 12th gen looks pretty nice. I wouldn't mind using a 12th gen laptop as my work laptop, where it sits on the desk most of the time except several times a year.

For most Mac users, where the Windows ecosystem doesn't matter, I agree: Intel and AMD have a lot more to go.
I'm eager to see how it plays out on the desktop, as Apple has so far only shown the low-powered/efficient M1 in its desktops.
Even for the Mac, I want Apple to push its performance to the max while keeping it quiet enough; I don't want them to put top priority on efficiency.
 

Technerd108

macrumors 68040
Oct 24, 2021
3,062
4,313
I actually think it is pretty amazing how much performance Intel got out of the Skylake architecture: Kaby Lake, Coffee Lake, and Coffee Lake Refresh, then the mix of new architecture and old in Ice Lake and Comet Lake, then the first all-10nm line of chips, Tiger Lake, and Rocket Lake.

Intel had 4 gens on 14nm Skylake derivatives until Ice Lake. They were able to get 10-15% each gen from optimizing the architecture alone, and then they also started adding more cores and getting even higher jumps, like 30-40% in one generation. I think all of that is pretty impressive considering they were stuck on one manufacturing node for so long. What pissed me off more than anything was the misleading marketing and the fact that they were never straight about being stuck on 14nm, or why. I still don't really know why Intel was stuck on 14nm for so long and is now stuck on 10nm.

I really think it would be a great idea for Intel if TSMC and Intel partnered on a new fab in the US, so Intel could start figuring out how TSMC does it and then extend that technology to all Intel fabs, which would leapfrog them back in terms of process technology. Considering the security issues in Taiwan and the threats by China, I am sure TSMC would like to have an outside factory to diversify their assets. I also think America should wake up. If we can't protect Taiwan, we should help TSMC move their operations to the USA and provide financial assistance: if they share IP, they don't have to pay back any loans the US government would give to TSMC. They could fast-track their workers into a path to citizenship here. Offer them a prosperous way out if worse comes to worst. It would benefit us both and offer TSMC protection against incursion by China, because if China took Taiwan it would take over TSMC, at least at the very top, and all of TSMC's IP would become China's. I have said for years that Taiwan would be a flash point for the next major war because of TSMC.

I really hope Taiwan can stay a free Democratic Nation that it is. Taiwan is a wonderful country and they have a lot of high tech infrastructure that makes the US look like a third world country. I am really pulling for them after seeing what has happened to poor Hong Kong. I don't want to lose US troops but as a mission keeping Taiwan free has several tactical advantages which is why China wants Taiwan back.

Back to Intel I really hate a lot of their business practices but I still want them to succeed and I would love to see a TSMC/Intel collaboration as it would benefit both countries and push both forward.

Alder Lake seems to be very impressive for a 10nm chip. Again, they are optimizing the architecture and adding cores, and the results are impressive. For gaming on the desktop, Alder Lake will be an excellent chip as long as you can keep it cool and have a big enough power supply. I love to see Intel putting so many cores on a single chip! The big/little scheduling should also be interesting; if it is done right it could actually be a benefit to performance.

Unfortunately, Intel is not doing as well in the mobile space. I know the most powerful HK Alder Lake chip is 4% faster in multicore than the M1 Max and also significantly faster in single core, which is most important for the average user or gamer, but at a much higher wattage and heat dissipation. If cooled properly and plugged in, these are very powerful chips.

The thing is, these are the wrong metrics to compare the performance of a device on their own. The CPU is one of the more important parts of a laptop, but there are a lot of other factors, such as SSD speed, RAM, RAM speed, build quality, software optimization for specific hardware, etc. The M1 Max has many advantages over Alder Lake due to its unified design versus socketed RAM and SSD and an off-die GPU. So it is possible that in very specific situations Alder Lake is faster than the M1 Max, but as others have noted, on battery the benchmarks are MUCH more in favor of the M1 Max, and that matters a lot on a mobile device. The ability to use a high-powered device the same way on battery as plugged in, and have it last a long time on battery, is a complete game changer for many reasons. Intel is still going to run hot and use a lot of energy on battery, and even more plugged in. Still, in actual use I think the performance of Alder Lake will be significantly better than Rocket Lake/Tiger Lake, which is a big win for everyone!

I can't wait to get a device with Alder Lake. I know I have made a lot of comments about Scalderlake, but it is because of all these cherry-picked benchmarks and people trashing the M1 because of them. They completely ignore the huge advantages of Apple Silicon.

As I see it, a 16" MacBook Pro base model is pretty hard to beat in terms of cost. What other Windows device offers similar performance and specs? It is not just the CPU but so much more. When you start looking at high-powered Windows laptops or workstations, they start around $3k and go up. You will find a few for $2k and up, but they will have an old processor or something else missing. I have seen workstations from Lenovo starting at $4k with a plastic non-touch screen and otherwise similar specs. Dell was $5k. I am sure HP would be the best value. My point is these MBP 14s and 16s offer a lot of value when you consider all of the specs; similar Windows laptops just do not offer anything close. But of course, all that matters is "Intel beat M1 Max, Intel beat M1 Max," on and on.
 
Last edited:

crazy dave

macrumors 65816
Sep 9, 2010
1,454
1,231
I'm actually impressed with Intel's comeback, or rather the lack of a breakthrough by AMD at the same time. AMD's still great, they have the performance-per-watt lead in x64, and I still believe Zen 4 will be great. But it's been only a few years since AMD gained superiority over Intel, and I'm already starting to see AMD trying to exploit that lead rather than trying to widen the gap.

Intel's 12th gen is really a slap at AMD, and the results are pretty shocking. Perhaps 12th gen is not enough for many, but I'm confident 13th and 14th will be great too. They got their **** together pretty fast, so to speak.

I also believe many people in here don't give enough credit to Intel. I know, at the end of the day, what matters is what you use, and Intel's mobile chips have sucked for the last 5 years. But being x64- and Windows-capable is another important attribute of a chip, and I believe even Apple would have a hard time designing an M1 Max-like chip in x64. Apple owns its entire ecosystem from top to bottom, and they can easily switch architectures, OSes, hardware, everything. That also provides advantages in designing their chips. Of course, I still like Apple's M1 MacBook Pro better as a laptop than any Windows counterpart, but I won't be giving up on Windows anytime soon (unless everything I require is ported over to Mac).

An MBP M1 Max vs. any Windows laptop with 12th gen, I will vote for the MBP wholeheartedly. But for the small segment of sectors where raw performance in a laptop casing matters? 12th gen looks pretty nice. I wouldn't mind using a 12th gen laptop as my work laptop, where it sits on the desk most of the time except several times a year.

For most Mac users, where the Windows ecosystem doesn't matter, I agree: Intel and AMD have a lot more to go.
I'm eager to see how it plays out on the desktop, as Apple has so far only shown the low-powered/efficient M1 in its desktops.
Even for the Mac, I want Apple to push its performance to the max while keeping it quiet enough; I don't want them to put top priority on efficiency.
Zen 3 is a year old and Zen 4, barring delays, will be here at the end of the year. That’s a reasonable cadence. Intel’s next chip, Raptor Lake, will be a reorganized Alder Lake, similar to the A15 relative to the A14. It’s important to remember how much ground AMD had to make up against Intel; that Zen 3 surpassed them by as much as it did surprised everyone, including AMD. Further, AMD does have Zen 3+ coming to the desktop soon. For laptops they’re still competitive in terms of perf/W. Finally, it’s important to remember that Intel’s total solution is not quite as cheap as it first appears: DDR5 is expensive and necessary if Alder Lake’s cores are to actually stretch their legs, and motherboard prices have gone up.

The issue is that Alder Lake’s Golden Cove cores are still too big and power hungry to match AMD in core count. AMD’s performance cores are still too far ahead of Intel’s here. Thus Intel introduced midrange Gracemont cores to up the number of cores that could fit on a single die without blowing up power or die size. Now these midrange cores are actually quite nifty, but they should not be confused with traditional little or E-cores. This is what @Andropov and @leman were trying to explain earlier in the thread with @senttoschool. Yes, Alder Lake is heterogeneous in core size (and unfortunately in ISA; that’s one indicator that Alder Lake’s design was a bit rushed), but Intel’s heterogeneity’s raison d’être is different from, say, Apple’s. In some ways it is more similar to ARM’s tri-level designs … just without the little cores. The focus of such midrange cores is on multithreaded throughput perf/W, while the focus of little cores is to keep housekeeping threads off the main cores as efficiently as possible. Midrange Gracemont and A7x cores *can* do that housekeeping, just as A5x and Icestorm cores can be used for multithreaded throughput. But in neither case is it their primary function. (Icestorm is a weird case because it actually sits somewhere between A5x and A7x cores. But that’s a whole ‘nother topic.) Bottom line though: while AMD may adopt heterogeneous CPUs, they don’t face quite the same problems as Intel. They might decide that it also makes sense to go with midrange cores themselves, but they might not.

Overall I wouldn’t put the relationship between Intel 12 Gen and AMD Zen 3 as a lack of progress from AMD but Intel finally unf***ing themselves and moving to counter AMD’s surprise resurrection.
 

mr_roboto

macrumors 6502a
Sep 30, 2020
856
1,867
You could easily get away with 64-bit ops, and just convert to multiples of those as needed. I’d stay away from micro-ops, because micro-ops change from chip to chip.
Yup. To give folks an example, any fast modern CPU has a register renamer. An instruction which wants to read from architectural register R5 has to be transformed to a uop which reads from physical register PRx. The value of 'x' depends on factors like the CPU's microarchitecture, the history of decisions made by the renamer, and the size of the physical register file. (Note that the value of 'x' for an instruction contained in a loop can, and often will, change from one iteration of the loop to the next! Also note that the physical register file is usually much larger than the architectural register file, so you need more bits to address a register in the uop than you do in the original instruction.)

This kind of implementation-dependent, dynamic, runtime-determined information is why uops should never be thought of as something like the programmer-visible machine instructions. (It's also part of why it's a mistake to think of the uops in x86 CPUs as a pseudo-RISC ISA.)
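To make the renaming idea concrete, here's a toy sketch in Python. All names and sizes here are made up for illustration (16 architectural registers, a 128-entry physical register file, simple free-list allocation); a real renamer also has to free physical registers at retirement and recover the map on a mispredict, which this ignores entirely.

```python
# Toy register renamer sketch (illustrative only, not any real design).
# Every instruction that WRITES an architectural register gets a fresh
# physical register from a free list; reads use whatever physical
# register currently holds that architectural value.

class Renamer:
    def __init__(self, num_arch=16, num_phys=128):
        # Initial mapping: Rn -> PRn; the rest of the pool starts free.
        self.map = {f"R{i}": f"PR{i}" for i in range(num_arch)}
        self.free = [f"PR{i}" for i in range(num_arch, num_phys)]

    def rename(self, dest, srcs):
        # Sources are looked up in the current map...
        renamed_srcs = [self.map[s] for s in srcs]
        # ...then the destination is allocated a brand-new physical
        # register, so the 'x' in PRx depends on renamer history.
        new_phys = self.free.pop(0)
        self.map[dest] = new_phys
        return new_phys, renamed_srcs

r = Renamer()
# The same instruction "ADD R5 <- R5, R6" renamed twice, as in a loop:
print(r.rename("R5", ["R5", "R6"]))  # ('PR16', ['PR5', 'PR6'])
print(r.rename("R5", ["R5", "R6"]))  # ('PR17', ['PR16', 'PR6'])
```

Note how the identical instruction gets different physical registers on each pass through the "loop": that runtime-dependent mapping is exactly why uops can't be treated as a stable, programmer-visible ISA.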
 

senttoschool

macrumors 68030
Nov 2, 2017
2,626
5,482
Zen 3 is a year old and Zen 4, barring delays, will be here at the end of the year. That’s a reasonable cadence. Intel’s next chip, Raptor Lake, will be a reorganized Alder Lake, similar to the A15 relative to the A14. It’s important to remember how much ground AMD had to make up against Intel; that Zen 3 surpassed them by as much as it did surprised everyone, including AMD. Further, AMD does have Zen 3+ coming to the desktop soon. For laptops they’re still competitive in terms of perf/W. Finally, it’s important to remember that Intel’s total solution is not quite as cheap as it first appears: DDR5 is expensive and necessary if Alder Lake’s cores are to actually stretch their legs, and motherboard prices have gone up.
Zen4 will be 2 years after Zen3. I don't think that's a reasonable cadence. That's slow compared to Intel's current roadmap.

I wouldn't use the word "reorganized" to describe A14 to A15. That would imply that they use the same cores but just organized differently. We see from Anandtech's breakdown of the A15 that the cores did in fact change, especially the efficiency cores which received a massive upgrade.

Raptor Lake is expected to have upgrades to ADL cores.

Zen 3+ desktop isn't a refresh. It's a single SKU (5800X) that has 3D cache glued on. It's targeted at gaming only as its clock speeds needed to decrease in order to accommodate extra heat. Perhaps you're confusing it with AMD's 6nm mobile Zen3 refresh?

You don't need DDR5 for Alder Lake. It can work with DDR4. In some applications, DDR4 was faster than DDR5 and vice versa.

The issue is that Alder Lake’s Golden Cove cores are still too big and power hungry to match AMD in core count.
This isn't an issue. This is a design choice. Golden Cove beats Zen3 in ST by nearly 20%. That's a 1-2 generation difference. This is why it's big.

Thus Intel introduced midrange Gracemont cores to up the number of cores that could fit on a single die without blowing up power or die size.
Again, this is a design decision. ADL is primarily aimed at laptops but works well on desktop too. On desktop, the little cores do indeed massively boost MT in a smart way. I don't see anything wrong with the design vs AMD when the results speak for themselves.

Now these midrange cores are actually quite nifty but should not be confused with traditional little or E-cores. This is what @Andropov and @leman were trying to explain earlier in thread with @senttoschool.
I don't think they were trying to explain it to me. I'm well aware of ADL's power ratings since I invest in semiconductor companies and follow every product closely. I'm also typing on an M1 Pro laptop right now with an A15 iPhone 13 next to me.

@Andropov and @leman were a bit confused. They were trying to say that ADL's little cores aren't designed to be as low power as little cores inside Apple Silicon. I was merely trying to point out that ADL was designed to compete against AMD, and to improve efficiency on laptops. I think ADL accomplishes both. It does not matter if ADL's little cores aren't "traditional" little cores.

Bottom line is though: while AMD may adopt heterogeneous CPUs they don’t face quite the same problems as Intel. They might decide that it also makes sense to go with midrange cores for themselves but they might not.
The computer world has moved to big.LITTLE in virtually every category except servers. big.LITTLE makes too much sense for phones, laptops, and desktops. It's not an advantage for AMD that they don't have a big.LITTLE design right now.

Overall I wouldn’t put the relationship between Intel 12 Gen and AMD Zen 3 as a lack of progress from AMD but Intel finally unf***ing themselves and moving to counter AMD’s surprise resurrection.
I don't think people are saying AMD is lacking progress. I think people are saying that ADL is hugely impressive in the x86 world and comfortably beats AMD's products on desktop and laptops at the moment.

I expect Zen4 to beat Raptor Lake in perf/watt in Q4 of this year but I expect Meteor Lake to surpass Zen4 two quarters later. If Zen5 takes two years to come out like Zen3 to Zen4, I think AMD will be in huge trouble.
 