Note that the Intel Alder Lake performance core has a 6-wide decoder. So x86-64 is no longer limited to a 4-wide decoder.
Interesting, AMD was wrong apparently. Unless the decoders aren’t able to be utilized.
According to the AnandTech article, when they asked Intel whether some of the decoders were simplified, Intel declined to answer.
Here is the relevant part of the AnandTech article:

Starting off with the most obvious change: Intel is moving from being a 4-wide decode machine to being a 6-wide microarchitecture, a first amongst x86 designs and a major design focus point. Over the last few years there has been an ongoing discussion about decoder widths and the nature of x86’s variable-length instruction set, which makes it difficult to design wider decoders compared to, say, a fixed-length ISA like Arm’s, where adding decoders is relatively easy. Notably, last year AMD’s Mike Clark noted that while it’s not a fundamental limitation, going wider than 4 decoders creates practical drawbacks, such as added complexity and, most importantly, added pipeline stages. For Golden Cove, Intel has decided to push forward with these changes, and the compromise that had to be made is that the design now adds an additional stage to the mispredict penalty of the microarchitecture, so the best case goes up from 16 cycles to 17 cycles. We asked if there is still a special-case decoder layout as in previous generations (such as the 1 complex + 3 simple decoder setup), but the company wouldn’t delve deeper into the details at this point in time. To feed the decoder, the fetch bandwidth going into it has been doubled from 16 bytes per cycle to 32 bytes per cycle.
Intel states that the decoder is clock-gated 80% of the time, instead relying on the µOP cache. The µOP cache has also seen extremely large changes this generation: the structure has almost doubled, from 2.25K entries to 4K entries, mimicking the similarly large increase we saw in the move from AMD’s Zen to Zen 2, increasing the hit rate and further avoiding the more costly legacy decode path.
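To put very rough numbers on that 80% clock-gating claim, here is a back-of-envelope sketch. The µOP-cache delivery width and the reading of "clock-gated 80% of the time" as "80% of cycles are served from the µOP cache" are my assumptions for illustration, not figures from Intel or the article:

```python
# Rough front-end supply model. Only the 80% clock-gating figure and the
# 6-wide decoder come from the article; the uop-cache width and the mapping
# of "clock-gated 80% of the time" to "80% of cycles hit the uop cache"
# are assumptions made here for illustration.

UOP_CACHE_WIDTH = 8        # assumed uops/cycle delivered on a uop-cache hit
DECODER_WIDTH = 6          # legacy decode width (per the article)
UOP_CACHE_FRACTION = 0.80  # fraction of cycles the decoders sit clock-gated

avg_supply = (UOP_CACHE_FRACTION * UOP_CACHE_WIDTH
              + (1 - UOP_CACHE_FRACTION) * DECODER_WIDTH)
print(f"average front-end supply: ~{avg_supply:.1f} uops/cycle")
# -> average front-end supply: ~7.6 uops/cycle, i.e. most of the throughput
#    comes from the uop cache; the wide decoders mainly cover the miss cases.
```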
Also, read the comments (I can't believe I just said that). There are some very knowledgeable posters on the article. Maynard Handley (goes by @name99) is particularly knowledgeable about the M1 architecture and compares what Intel is doing vs. Apple Silicon.
Sounds like a halfway house between typical x86 and what Apple has done.
Very interesting stuff, but also very technical, and probably too technical for the vast majority of people.
Edit: Apparently Maynard Handley has an account here as well.
It’s not that AMD was wrong, it’s just that x86 CPUs have been limiting the decode to windows of 16 bytes. Intel is now doubling that, which makes having more decoders feasible.
I wonder how much die area the Alder Lake decode logic takes, and what the implications are for power consumption. The leaked PL limits for Alder Lake look intimidating. It’s possible that Intel is trading power efficiency for performance.
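To make the 16-byte versus 32-byte window point above concrete, here is a trivial sketch; the roughly 4-byte average instruction length is an assumed figure for typical x86-64 code, not something from the article:

```python
# Why the fetch window bounds usable decode width on a variable-length ISA.
# x86-64 instructions are 1-15 bytes long; the average below is an assumption.

AVG_INSN_BYTES = 4                 # assumed average x86-64 instruction length

for window_bytes in (16, 32):      # old vs. new fetch window per cycle
    insns = window_bytes / AVG_INSN_BYTES
    print(f"{window_bytes}-byte window: ~{insns:.0f} instructions available per cycle")
# 16-byte window: ~4 instructions -> a 6-wide decoder would often starve
# 32-byte window: ~8 instructions -> enough slack to keep 6 decoders fed
```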
Did they have to add a decode stage to the pipeline? Do they just do pre-decode and hide some decoding in the scheduling pipeline stages?
Back in the day when I was doing it, the decoder took about 15-20% of the area of a core (ignoring cache, if you consider the L1 cache part of the core). To get to 32 bytes, they’d probably need to come close to doubling that area (unless they’re cheating and not fully decoding).
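Taking those ballpark figures at face value, the implied hit to total core area is easy to work out; both inputs below are the rough numbers from the post above (15-20% of core area, decode area roughly doubling), not measurements:

```python
# Implied core-area growth if the decode block really has to double.
# Inputs are the ballpark figures from the post above, not measurements.

decoder_share = 0.175    # decode logic as a fraction of core area (15-20% midpoint)
decoder_scale = 2.0      # assume decode area roughly doubles for 32B / 6-wide

core_growth = decoder_share * (decoder_scale - 1)
print(f"core area grows by roughly {core_growth * 100:.1f}%")
# -> roughly 17.5%: not fatal, but a real cost for logic that sits idle
#    80% of the time whenever the uop cache is hitting.
```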
The article indicates that they did have to add stages to the pipeline.
It seemed a little vague to me, because the article implied that the extra pipeline stage only comes into play when there is an exception (which doesn’t make a lot of sense to me). By the way, 17 or 18 pipeline stages is a big problem. x86 cruft is crufty.

Me as well. The funny part is that Intel seems to be going back to longer pipes, which is what got them into severe trouble, performance- and thermal-wise, back in the old NetBurst days.
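For a sense of what that one extra stage costs in practice, here is a quick calculation; the branch mispredict rate is an assumed figure for illustration, since real rates vary a lot by workload:

```python
# Cost of growing the best-case mispredict penalty from 16 to 17 cycles.
# The mispredict rate below is an assumption; real workloads vary widely.

MISPREDICTS_PER_1K_INSNS = 5   # assumed branch mispredicts per 1000 instructions

for penalty_cycles in (16, 17):
    lost = penalty_cycles * MISPREDICTS_PER_1K_INSNS / 1000
    print(f"{penalty_cycles}-cycle penalty: ~{lost:.3f} cycles lost per instruction")
# 16 -> ~0.080, 17 -> ~0.085 cycles/instruction: about half a percent of extra
# CPI on a CPI-1.0 workload, which is the trade the article says Intel made
# for the wider decode.
```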