3nm Intel chips

leman · Jul 5, 2021

pshufd said:
The x86 decoder penalty is insurmountable, which is why everyone is looking at ARM or RISC.

That’s what people like to say, and still x86 CPUs are significantly faster than the others, with Apple being the singular exception. It takes more than a trivial decoder to make a fast CPU.

pshufd · Jul 5, 2021

leman said:
That’s what people like to say, and still x86 CPUs are significantly faster than the others, with Apple being the singular exception. It takes more than a trivial decoder to make a fast CPU.

It does.

It also takes $$$$.

And a strong reason to do so. There are lots of companies that make chips that aren't the fastest around because they don't need to be. A lot of people use old hardware and are perfectly fine doing so.

Apple has done fine with Macs without having had to design or make their own chips, which means that they can do fine without owning the fastest chip.

leman · Jul 5, 2021

pshufd said:
Apple has done fine with Macs without having had to design or make their own chips, which means that they can do fine without owning the fastest chip.

I don’t understand this part. Yes, they used Intel chips, but those Intel chips were always the fastest consumer chips available on the market (in their respective category). In fact, a fast CPU was one of the main selling points of the prosumer MacBook Pros, and the main criticism of these machines has always been the high price and lackluster GPU performance compared to some other brands.

If the next-generation MacBook Pros actually end up being less performant than their PC competitors, it’s going to be a failure on all fronts. The point of Apple Silicon is that it offers much better value than an x86 PC in the same category: better CPU, better GPU, better battery. If Apple can’t maintain this advantage, it will open the door to criticisms along the lines of “Apple doesn’t care about the pros”, “Apple just wants to seize control by offering you a phone CPU”, “Apple thinks that an oversized iPad is sufficient for pros”. That’s exactly what they don’t want. They want Apple Silicon machines to be the best products, without competition, in the premium market.

pshufd · Jul 5, 2021

leman said:
I don’t understand this part. Yes, they used Intel chips, but those Intel chips were always the fastest consumer chips available on the market (in their respective category). In fact, a fast CPU was one of the main selling points of the prosumer MacBook Pros, and the main criticism of these machines has always been the high price and lackluster GPU performance compared to some other brands.

If the next-generation MacBook Pros actually end up being less performant than their PC competitors, it’s going to be a failure on all fronts. The point of Apple Silicon is that it offers much better value than an x86 PC in the same category: better CPU, better GPU, better battery. If Apple can’t maintain this advantage, it will open the door to criticisms along the lines of “Apple doesn’t care about the pros”, “Apple just wants to seize control by offering you a phone CPU”, “Apple thinks that an oversized iPad is sufficient for pros”. That’s exactly what they don’t want. They want Apple Silicon machines to be the best products, without competition, in the premium market.

Apple may have decided to work a lot harder on the volume market. It's clear that the Apple Silicon rollout has taken this approach.

Fastest consumer chips on the market? I don't think that any of the iMacs can beat a Ryzen 5950X.

leman · Jul 5, 2021

pshufd said:
Apple may have decided to work a lot harder on the volume market. It's clear that the Apple Silicon rollout has taken this approach.

I see no indication of this. To me, it seems that their basic strategy has remained the same: deliver the best possible all-round premium device at a given price level. It's just that with M1 they can make better machines than what was possible with x86 chips, so it's a purchase decision that's easier to make.

pshufd said:
Fastest consumer chips on the market? I don't think that any of the iMacs can beat a Ryzen 5950X.

Now you are just being polemical for the sake of it. The iMac was refreshed in mid-2020, and it did offer the fastest (sub-100W) Intel CPU available at the time. Zen3 didn't make an appearance until several months later. Besides, AMD only started catching up about two years ago, and Zen3 is their first design that is truly competitive with Intel at the high end. I am instead talking about what was happening over the last decade.

There were only two instances I know of where Apple did not use the latest and greatest Intel CPUs, and both were to avoid a significant GPU degradation (one was when Intel banned Nvidia from making chipsets, so Apple stuck with the older CPU and a faster Nvidia iGPU; the other was when Intel dropped Iris Pro graphics).

senttoschool · Jul 5, 2021

pshufd said:
Apple may have decided to work a lot harder on the volume market. It's clear that the Apple Silicon rollout has taken this approach.

What does this mean and why is it so clear?

pshufd said:
Fastest consumer chips on the market? I don't think that any of the iMacs can beat a Ryzen 5950X.

Not completely, right now. M1 can't compete with the Ryzen 5950X in raw CPU performance. But it beats it in machine learning, some video editing, and some video encoding/decoding. But the M1 is not meant to compete with the 5950X. Wait for the iMac Pro/Mac Pro.

pshufd · Jul 5, 2021

leman said:
I see no indication of this. To me, it seems that their basic strategy has remained the same: deliver the best possible all-round premium device at a given price level. It's just that with M1 they can make better machines than what was possible with x86 chips, so it's a purchase decision that's easier to make.

Now you are just being polemical for the sake of it. The iMac was refreshed in mid-2020, and it did offer the fastest (sub-100W) Intel CPU available at the time. Zen3 didn't make an appearance until several months later. Besides, AMD only started catching up about two years ago, and Zen3 is their first design that is truly competitive with Intel at the high end. I am instead talking about what was happening over the last decade.

They could have picked 5, 6, 7, 8, 9 performance cores on the M1.

They could have had access to AMD's plans far in advance of AMD's launch.

pshufd · Jul 5, 2021

senttoschool said:
What does this mean and why is it so clear?

Not completely, right now. M1 can't compete with the Ryzen 5950X in raw CPU performance. But it beats it in machine learning, some video editing, and some video encoding/decoding. But the M1 is not meant to compete with the 5950X. Wait for the iMac Pro/Mac Pro.

The M1 is a volume chip.

Wait for Zen 4.

leman · Jul 5, 2021

pshufd said:
They could have picked 5, 6, 7, 8, 9 performance cores on the M1.

M1 is perfectly fine in its current configuration for its intended use. It’s an entry-level chip and, as such, it is superior to any comparable offering. It was never intended to compete with performance-oriented machines in the first place.

Apple decided to start the Apple Silicon rollout from the low end instead of the high end. It’s a reasonable business strategy that reduces risk, gives them time to prepare more advanced technology and helps generate a lot of hype. Just because they employ this particular transition strategy does not mean that their long-term strategy is going to sacrifice performance.

pshufd said:
They could have had access to AMD's plans far in advance of AMD's launch.

I am sure they did. There is still very little reason to switch vendors for a one-time limited product configuration. Just the cost alone would have been prohibitive.

pshufd · Jul 5, 2021

leman said:
M1 is perfectly fine in its current configuration for its intended use. It’s an entry-level chip and, as such, it is superior to any comparable offering. It was never intended to compete with performance-oriented machines in the first place.

Apple decided to start the Apple Silicon rollout from the low end instead of the high end. It’s a reasonable business strategy that reduces risk, gives them time to prepare more advanced technology and helps generate a lot of hype. Just because they employ this particular transition strategy does not mean that their long-term strategy is going to sacrifice performance.

That's basically what I was saying.

leman · Jul 5, 2021

pshufd said:
That's basically what I was saying.

Then it seems we are in agreement 🙂

dmccloud · Jul 5, 2021

pshufd said:
The x86 decoder penalty is insurmountable, which is why everyone is looking at ARM or RISC.

This is a big advantage of an ARM-based architecture over x86. Because x86 has variable-length instructions, the decoder has to examine each byte to determine whether it is the beginning or end of an instruction. This is why AMD has stated that the practical limit is four-wide in the decoding pipeline. In contrast, the ARM instruction set is fixed-length, so going wider with decoders (the M1 has 8) means the SoC can process more instructions simultaneously. The other piece of the puzzle is the reorder buffer (ROB), which keeps track of decoded instructions that are in flight until they retire. From AnandTech:

A ~630-deep ROB is an immensely huge out-of-order window for Apple’s new core, as it vastly outclasses any other design in the industry. Intel’s Sunny Cove and Willow Cove cores are the second-most “deep” OOO designs out there with a 352-entry ROB structure, while AMD’s newest Zen3 core makes do with 256 entries, and recent Arm designs such as the Cortex-X1 feature a 224-entry structure.
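To make the decode argument concrete, here is a toy sketch. The variable-length encoding is entirely hypothetical (the low two bits of an instruction's first byte give a length of 1 to 4 bytes); real x86 length decoding has to deal with prefixes, opcode maps, ModRM/SIB bytes, displacements and immediates. The point is only the dependency structure: with fixed 4-byte instructions every start offset is known in advance, while with variable-length instructions each start depends on the previous instruction's length.

```python
# Toy model: why variable-length decode is harder to parallelize than fixed-length.
# HYPOTHETICAL encoding for the variable-length case: the low two bits of an
# instruction's first byte give its length minus one (so lengths are 1-4 bytes).

FIXED_WIDTH = 4  # bytes per instruction in a fixed-length ISA such as AArch64


def fixed_length_boundaries(code: bytes) -> list[int]:
    # Every start offset is known up front, so N decoders can each take
    # offset i * FIXED_WIDTH in the same cycle, independently of each other.
    return list(range(0, len(code), FIXED_WIDTH))


def toy_length(first_byte: int) -> int:
    # Hypothetical length rule used only for this illustration.
    return (first_byte & 0b11) + 1


def variable_length_boundaries(code: bytes) -> list[int]:
    # Each start offset depends on the previous instruction's length,
    # so a naive decoder has to walk the byte stream serially.
    starts, pos = [], 0
    while pos < len(code):
        starts.append(pos)
        pos += toy_length(code[pos])
    return starts


def speculative_boundaries(code: bytes) -> list[int]:
    # Workaround: compute a candidate length at EVERY byte offset (conceptually
    # in parallel, one length decoder per byte), then follow the chain from 0
    # to keep only the offsets that are real instruction starts.
    lengths = [toy_length(b) for b in code]
    starts, pos = [], 0
    while pos < len(code):
        starts.append(pos)
        pos += lengths[pos]
    return starts


sample = bytes([0x03, 0, 0, 0, 0x01, 0, 0x00, 0x02, 0, 0])
print(variable_length_boundaries(sample))  # [0, 4, 6, 7]
print(speculative_boundaries(sample))      # [0, 4, 6, 7]
print(fixed_length_boundaries(bytes(16)))  # [0, 4, 8, 12]
```

The third function shows one general way around the serial chain: compute a candidate length at every byte offset and then pick out the real boundaries. That costs a length decoder per byte position, which is part of why wide x86 decode is expensive in hardware.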

JMacHack · Jul 5, 2021

pshufd said:
Apple is running at 3.2 GHz while Intel and AMD are running at 5 GHz. It's pretty clear that Apple cares more about power efficiency than absolute single-core performance.

There’s no guarantee that the M1 can run at higher frequencies.

pshufd · Jul 5, 2021

JMacHack said:
There’s no guarantee that the M1 can run at higher frequencies.

Do you think that it could run at 3.3 GHz? 3.4? 3.5? The Intel lead is razor-thin. On the other hand, what would Intel's performance look like at 3.2 GHz?
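The question can be framed with the usual rule of thumb: performance ≈ work done per clock × clock frequency. This is a minimal sketch with hypothetical scores (not benchmark results), assuming performance scales linearly with clock. That assumption is an optimistic upper bound, because memory latency does not shrink as the core clock rises.

```python
def per_clock(score: float, ghz: float) -> float:
    # Score per GHz: a crude proxy for how much work a core does each clock.
    return score / ghz


def scaled_score(score: float, from_ghz: float, to_ghz: float) -> float:
    # Assumes performance scales linearly with clock speed. That is an
    # optimistic bound: memory latency does not shrink as the core clock rises.
    return score * to_ghz / from_ghz


# HYPOTHETICAL: two chips posting the same single-core score of 1000.
apple_score, apple_ghz = 1000.0, 3.2
intel_score, intel_ghz = 1000.0, 5.0

print(per_clock(apple_score, apple_ghz) / per_clock(intel_score, intel_ghz))  # 1.5625
print(scaled_score(intel_score, intel_ghz, apple_ghz))  # 640.0
```

If two chips post the same score at 3.2 GHz and 5.0 GHz, the lower-clocked one is doing about 1.56× as much work per clock, and under linear scaling the 5.0 GHz chip would score about 64% as much if run at 3.2 GHz.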

JMacHack · Jul 5, 2021

pshufd said:
Do you think that it could run at 3.3 GHz? 3.4? 3.5? The Intel lead is razor-thin. On the other hand, what would Intel's performance look like at 3.2 GHz?

Oh, I’m not questioning the performance of anything. I just felt that I should point out that CPUs can’t always be clocked higher reliably.

pshufd · Jul 5, 2021

JMacHack said:
Oh, I’m not questioning the performance of anything. I just felt that I should point out that CPUs can’t always be clocked higher reliably.

Yup. I'm aware of that.

There's also binning, because there are variances in how chips perform. And I'd expect some of the chips that Apple makes to be able to run faster than 3.2 GHz.

cmaier · Jul 5, 2021

dmccloud said:
This is a big advantage of an ARM-based architecture over x86. Because x86 has variable-length instructions, the decoder has to examine each byte to determine whether it is the beginning or end of an instruction. This is why AMD has stated that the practical limit is four-wide in the decoding pipeline. In contrast, the ARM instruction set is fixed-length, so going wider with decoders (the M1 has 8) means the SoC can process more instructions simultaneously. The other piece of the puzzle is the reorder buffer (ROB), which keeps track of decoded instructions that are in flight until they retire. From AnandTech:

Hard to say whether a deep ROB is, in itself, a good thing; it’s a consequence of the number of pipelines and the pipeline depth. Once you’ve decided to have N pipelines with X execution stages, you need a ROB big enough to keep track of them all. Bigger than that does no good. Smaller than that means you can’t issue more instructions even though you have available pipelines.
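The sizing argument can be checked with a toy simulation. This sketch assumes an endless supply of independent instructions, no branches or cache misses, and one uniform latency, all hypothetical simplifications; each instruction holds a ROB entry from issue until it retires `latency` cycles later, so `width` plays the role of N pipelines and `latency` the role of X stages.

```python
from collections import deque


def sustained_ipc(width: int, latency: int, rob_size: int, cycles: int = 10_000) -> float:
    """Instructions issued per cycle in a toy out-of-order core.

    Hypothetical simplifications: an endless supply of independent instructions,
    no branches or cache misses, and every instruction holding its ROB entry for
    exactly `latency` cycles before retiring.
    """
    rob = deque()  # retire cycle of each in-flight instruction, oldest first
    issued = 0
    for cycle in range(cycles):
        # Retire finished instructions from the head, in program order.
        while rob and rob[0] <= cycle:
            rob.popleft()
        # Issue up to `width` new instructions, limited by free ROB entries.
        n = min(width, rob_size - len(rob))
        rob.extend([cycle + latency] * n)
        issued += n
    return issued / cycles


# Width 8 and latency 10 need 8 * 10 = 80 entries to keep every slot busy.
for size in (40, 80, 160):
    print(size, sustained_ipc(width=8, latency=10, rob_size=size))
```

In this model, any ROB of 80 entries or more sustains the full 8 instructions per cycle, while a 40-entry ROB halves throughput to 4: bigger than width × latency does no good, and smaller throttles issue even though execution slots sit idle.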

leman · Jul 5, 2021

cmaier said:
Hard to say whether a deep ROB is, in itself, a good thing; it’s a consequence of the number of pipelines and the pipeline depth. Once you’ve decided to have N pipelines with X execution stages, you need a ROB big enough to keep track of them all. Bigger than that does no good. Smaller than that means you can’t issue more instructions even though you have available pipelines.

Do you know of any research into how much ILP can be extracted from typical code? I have always wondered whether there is a practical limit to going wide. Intuitively, I'd think that data dependencies (as well as branches) will lead to diminishing returns fairly soon.
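One standard way to frame the data-dependency intuition is the dataflow limit: with unlimited execution units and perfect branch prediction, ILP is bounded by the number of instructions divided by the length of the longest dependency chain. A minimal sketch, assuming unit latency for every instruction (a hypothetical simplification); the two example programs are made up for illustration.

```python
def dataflow_ilp(deps: list[list[int]]) -> float:
    """Dataflow upper bound on ILP for a straight-line block of instructions.

    deps[i] lists the earlier instructions (indices < i) whose results
    instruction i reads. Assumes unit latency, unlimited execution units and
    perfect branch prediction: hypothetical simplifications for illustration.
    """
    depth: list[int] = []  # depth[i]: earliest cycle (1-based) instruction i can run
    for srcs in deps:
        depth.append(1 + max((depth[s] for s in srcs), default=0))
    critical_path = max(depth, default=0)
    return len(deps) / critical_path if critical_path else 0.0


loads = [[] for _ in range(8)]  # eight independent loads, indices 0-7

# Serial sum: each add waits for the previous add (indices 8-14).
chain = loads + [[0, 1]] + [[8 + k, k + 2] for k in range(6)]

# Reduction tree: pairwise adds, then pairs of pairs (indices 8-14).
tree = loads + [[0, 1], [2, 3], [4, 5], [6, 7], [8, 9], [10, 11], [12, 13]]

print(dataflow_ilp(chain))  # 1.875 (15 instructions / 8-cycle critical path)
print(dataflow_ilp(tree))   # 3.75  (15 instructions / 4-cycle critical path)
```

Both examples sum eight loaded values with the same 15 instructions, yet in this model the serial chain tops out at an ILP of 1.875 while the reduction tree reaches 3.75, so the shape of the dependencies caps how much a wider core can help.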

cmaier · Jul 5, 2021

leman said:
Do you know of any research into how much ILP can be extracted from typical code? I have always wondered whether there is a practical limit to going wide. Intuitively, I'd think that data dependencies (as well as branches) will lead to diminishing returns fairly soon.

We always do that research 🙂. The answer varies widely depending on workload and changes over the years.

Even if the number were infinite, there would be practical limitations: if you mispredict a branch, you have to clear all those in-flight instructions and rewind.
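The misprediction point can be put into rough numbers. This back-of-the-envelope sketch assumes one branch every `insts_per_branch` instructions and independent predictions with a fixed accuracy; real mispredictions cluster, so treat the output as illustrative only. The 630-instruction window is the ROB size quoted from AnandTech in this thread; the figures of one branch per 5 instructions and 99%/99.9% accuracy are hypothetical.

```python
def p_clean_window(window: int, insts_per_branch: float, accuracy: float) -> float:
    """Probability that `window` in-flight instructions contain no mispredicted branch.

    Hypothetical model: one branch every `insts_per_branch` instructions, each
    predicted correctly with probability `accuracy`, independently of the others.
    """
    branches = window / insts_per_branch
    return accuracy ** branches


def mean_insts_between_mispredicts(insts_per_branch: float, accuracy: float) -> float:
    # Mispredictions form a geometric process: on average one every
    # 1 / (1 - accuracy) branches, i.e. insts_per_branch / (1 - accuracy) instructions.
    return insts_per_branch / (1.0 - accuracy)


for acc in (0.99, 0.999):
    print(acc,
          round(p_clean_window(630, 5, acc), 3),
          round(mean_insts_between_mispredicts(5, acc)))
```

Under these assumptions, a full 630-instruction window is free of a misprediction only about 28% of the time at 99% accuracy, rising to about 88% at 99.9%, which suggests why a very deep out-of-order window only pays off alongside a very accurate predictor.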

dgdosen · Jul 5, 2021

Have any of you chip designers ever read a book called "Principles of Product Development Flow"? It's one of my favorite books about making decisions on "how" to do work (from a product developer's point of view). It's math-heavy and focuses on queues, variability, WIP, etc. I think chip designers face similar (but different) problems, and that the problems in both environments provide helpful illustrations and best practices for addressing them.

leman · Jul 5, 2021

cmaier said:
We always do that research 🙂. The answer varies widely depending on workload and changes over the years.

Even if the number were infinite, there would be practical limitations: if you mispredict a branch, you have to clear all those in-flight instructions and rewind.

Any papers you would recommend?

Sydde · Jul 5, 2021

leman said:
That’s what people like to say, and still x86 CPUs are significantly faster than the others, with Apple being the singular exception. It takes more than a trivial decoder to make a fast CPU.

Apple has customized the A/M series processors for iOS and macOS, which is why they run so well. They most likely left a bunch of stuff out (32-bit, obviously, and options that off-the-rack CPUs offer that they do not need to implement) and probably added in a feature or two that makes the OSes run better (like object/method management shortcuts). Mostly tiny gains when viewed closely but that add up over gigacycles. I suspect that A- and M-series SoCs run software written for Apple systems much more efficiently than they would other systems/software.

pshufd · Jul 5, 2021

Sydde said:
Apple has customized the A/M series processors for iOS and macOS, which is why they run so well. They most likely left a bunch of stuff out (32-bit, obviously, and options that off-the-rack CPUs offer that they do not need to implement) and probably added in a feature or two that makes the OSes run better (like object/method management shortcuts). Mostly tiny gains when viewed closely but that add up over gigacycles. I suspect that A- and M-series SoCs run software written for Apple systems much more efficiently than they would other systems/software.

They seem to run Windows on ARM better than anything else out there. I wonder if Microsoft has bought a bunch of M1 systems for their WARM development team.

pshufd · Jul 5, 2021

Sydde said:
Apple has customized the A/M series processors for iOS and macOS, which is why they run so well. They most likely left a bunch of stuff out (32-bit, obviously, and options that off-the-rack CPUs offer that they do not need to implement) and probably added in a feature or two that makes the OSes run better (like object/method management shortcuts). Mostly tiny gains when viewed closely but that add up over gigacycles. I suspect that A- and M-series SoCs run software written for Apple systems much more efficiently than they would other systems/software.

Do Intel and AMD provide a neural engine in their chips? I thought I saw something that they had neural instructions. Neural engine sounds more like a big set of CISC instructions.

leman · Jul 5, 2021

Sydde said:
Apple has customized the A/M series processors for iOS and macOS, which is why they run so well. They most likely left a bunch of stuff out (32-bit, obviously, and options that off-the-rack CPUs offer that they do not need to implement) and probably added in a feature or two that makes the OSes run better (like object/method management shortcuts). Mostly tiny gains when viewed closely but that add up over gigacycles. I suspect that A- and M-series SoCs run software written for Apple systems much more efficiently than they would other systems/software.

Apple Silicon is not fast because it runs an optimized OS and optimized tooling; it's fast because it has twice as many low-latency execution units as any x86 CPU, an insanely huge out-of-order execution buffer that can track over 600 scheduled instructions, aggressive move elimination, better branch predictors and huge caches, among other things. These are simply silly fast chips. They will excel at most workloads, no matter which programming language or tooling you use (it doesn't have to be Apple stuff). I mean, my M1 runs legacy Fortran code better than my Intel i9, despite using an unsupported version of the Apple-hostile GCC compiler.

The few Apple-specific hardware optimizations (like the Objective-C method dispatch predictor you mention) are just the icing on the cake.
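For context on the branch predictor point, here is the classic textbook starting point: a table of 2-bit saturating counters (a bimodal predictor). This is a minimal sketch of that textbook scheme, not Apple's or anyone else's shipping design; modern high-end cores use far more elaborate history-based predictors. The table size, the `pc % size` indexing and the branch address in the example are illustrative assumptions.

```python
class TwoBitPredictor:
    """Textbook bimodal predictor: a table of 2-bit saturating counters.

    Counter values 0-1 predict not taken, 2-3 predict taken. Table size and
    indexing by `pc % size` are illustrative choices, not any real core's.
    """

    def __init__(self, size: int = 1024) -> None:
        self.size = size
        self.counters = [1] * size  # every entry starts "weakly not taken"

    def predict(self, pc: int) -> bool:
        return self.counters[pc % self.size] >= 2

    def update(self, pc: int, taken: bool) -> None:
        i = pc % self.size
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


def prediction_accuracy(predictor: TwoBitPredictor, trace: list[tuple[int, bool]]) -> float:
    correct = 0
    for pc, taken in trace:
        correct += predictor.predict(pc) == taken  # predict first, then learn the outcome
        predictor.update(pc, taken)
    return correct / len(trace)


# A loop branch at a made-up address: taken 9 times, then falls through; run 10 times.
loop_trace = ([(0x40, True)] * 9 + [(0x40, False)]) * 10
print(prediction_accuracy(TwoBitPredictor(), loop_trace))  # 0.89
```

On this loop trace the predictor scores 89 out of 100: after warm-up it misses only the loop exit, because a single not-taken outcome moves the counter from 3 to 2 without flipping the next prediction.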
