
dmccloud

macrumors 68040
Sep 7, 2009
3,142
1,899
Anchorage, AK
I wish that Apple broke out Mac sales by model. I'd love to know how many MacBook Air, Pro 13 and Mini M1 models they sold in Q4 compared to Q4 of 2019.

The closest we'll get to that is the standard year-over-year comparison. Personally, I think the real point of comparison will be Q3 2020 vs. Q4 2020, since Mac sales have outpaced 2019 since the pandemic began. The only way to really draw any sort of inference about the new Macs' impact on sales is to compare the growth between Q3 and Q4 2020 to the growth over the same timeframe in 2019. Since Apple does not break sales numbers down by specific models or product lines, any further analysis will be (un)educated guessing.
 

pshufd

macrumors G4
Oct 24, 2013
10,149
14,574
New Hampshire
I didn't get any feedback on my writeup, which leaves me wondering whether the explanation is understandable to non-specialists. It seems really obvious to me, but I used to write in machine code and assembler.

There's another explanation at
- this one is spoken instead of written, and it has a few diagrams which might make it a bit easier to understand, but it may still be difficult for the non-specialist.
 

pshufd

macrumors G4
Oct 24, 2013
10,149
14,574
New Hampshire
For those who prefer research papers:

Appears in the 19th IEEE International Symposium on High Performance Computer Architecture (HPCA 2013)

Power Struggles: Revisiting the RISC vs. CISC Debate on Contemporary ARM and x86 Architectures
Emily Blem, Jaikrishnan Menon, and Karthikeyan Sankaralingam
University of Wisconsin - Madison, {blem,menon,karu}@cs.wisc.edu

RISC vs. CISC wars raged in the 1980s when chip area and processor design complexity were the primary constraints and desktops and servers exclusively dominated the computing landscape. Today, energy and power are the primary design constraints and the computing landscape is significantly different: growth in tablets and smartphones running ARM (a RISC ISA) is surpassing that of desktops and laptops running x86 (a CISC ISA). Further, the traditionally low-power ARM ISA is entering the high-performance server market, while the traditionally high-performance x86 ISA is entering the mobile low-power device market. Thus, the question of whether ISA plays an intrinsic role in performance or energy efficiency is becoming important, and we seek to answer this question through a detailed measurement based study on real hardware running real applications. We analyze measurements on the ARM Cortex-A8 and Cortex-A9 and Intel Atom and Sandybridge i7 microprocessors over workloads spanning mobile, desktop, and server computing. Our methodical investigation demonstrates the role of ISA in modern microprocessors' performance and energy efficiency. We find that ARM and x86 processors are simply engineering design points optimized for different levels of performance, and there is nothing fundamentally more energy efficient in one ISA class or the other. The ISA being RISC or CISC seems irrelevant.




This paper concludes, from the testing they did on processors of roughly a decade ago, that there is no difference in power efficiency between CISC and RISC. The paper does, though, identify the advantages and disadvantages of the two approaches.

Serious kudos to Apple for understanding that you could really press the advantages of RISC over CISC, and for actually taking advantage of them, when the research community didn't see this. I think that a good chunk of the world believed that x86 had inherent advantages which couldn't be overcome by RISC, or that there was no difference. I was certainly in that camp until the M1.

All of the other parts are easy to understand - Apple adding special-purpose IP to their chips to perform specific and common tasks well. But the RISC vs CISC aspect is just so Edison.
 
  • Like
Reactions: Captain Trips

jdb8167

macrumors 601
Nov 17, 2008
4,859
4,599
I didn't get any feedback on my writeup, which leaves me wondering whether the explanation is understandable to non-specialists. It seems really obvious to me, but I used to write in machine code and assembler.

There's another explanation at
- this one is spoken instead of written, and it has a few diagrams which might make it a bit easier to understand, but it may still be difficult for the non-specialist.
This video is excellent. Very cogent explanation for a very complex subject. Great overview.
 
  • Like
Reactions: pshufd

crashnburn

macrumors 6502
Nov 18, 2009
468
28
I didn't get any feedback on my writeup, which leaves me wondering whether the explanation is understandable to non-specialists. It seems really obvious to me, but I used to write in machine code and assembler.

There's another explanation at
- this one is spoken instead of written, and it has a few diagrams which might make it a bit easier to understand, but it may still be difficult for the non-specialist.
That video was a great refresher on my computer architecture / CPU & co-processor fundamentals from my college, grad school & ASM / device-driver programming days; ALU :D

If I were Intel or AMD, and had the money to throw at fabrication + architecture pipeline boosting, would it be accurate to say they're still primarily handicapped by the inability or inefficiency of sequencing and fulfilling (longer) instructions / opcodes?

The question is: what could the chip-world folks "copy" from Apple, the way Microsoft used to copy from Apple?

Or do they have to start fresh from the drawing board to be competitive in the long run?

Would I be off to make the analogy to Y2K? The choice of 2 digits instead of 4 to represent the year caused the Y2K problem decades down the line - but at the software, non-chip level.

Now, having chosen CISC / longer opcodes instead of fixed, shorter, easier-to-sequence ones is causing pain decades down the line - but at the chip / hardware level.

Opposite design "choice / compromise", but painful after decades.

Would I be wrong to say that simplified, optimized, atomic hardware code, with complexity abstracted to the layers above and/or delegated to specialized co-processors, is what made the M1, and might be the design approach / maxim going forward?
For those who prefer research papers:

Appears in the 19th IEEE International Symposium on High Performance Computer Architecture (HPCA 2013)

Power Struggles: Revisiting the RISC vs. CISC Debate on Contemporary ARM and x86 Architectures
Emily Blem, Jaikrishnan Menon, and Karthikeyan Sankaralingam
University of Wisconsin - Madison, {blem,menon,karu}@cs.wisc.edu

RISC vs. CISC wars raged in the 1980s when chip area and processor design complexity were the primary constraints and desktops and servers exclusively dominated the computing landscape. Today, energy and power are the primary design constraints and the computing landscape is significantly different: growth in tablets and smartphones running ARM (a RISC ISA) is surpassing that of desktops and laptops running x86 (a CISC ISA). Further, the traditionally low-power ARM ISA is entering the high-performance server market, while the traditionally high-performance x86 ISA is entering the mobile low-power device market. Thus, the question of whether ISA plays an intrinsic role in performance or energy efficiency is becoming important, and we seek to answer this question through a detailed measurement based study on real hardware running real applications. We analyze measurements on the ARM Cortex-A8 and Cortex-A9 and Intel Atom and Sandybridge i7 microprocessors over workloads spanning mobile, desktop, and server computing. Our methodical investigation demonstrates the role of ISA in modern microprocessors' performance and energy efficiency. We find that ARM and x86 processors are simply engineering design points optimized for different levels of performance, and there is nothing fundamentally more energy efficient in one ISA class or the other. The ISA being RISC or CISC seems irrelevant.



This paper concludes, from the testing they did on processors of roughly a decade ago, that there is no difference in power efficiency between CISC and RISC. The paper does, though, identify the advantages and disadvantages of the two approaches.

Serious kudos to Apple for understanding that you could really press the advantages of RISC over CISC, and for actually taking advantage of them, when the research community didn't see this. I think that a good chunk of the world believed that x86 had inherent advantages which couldn't be overcome by RISC, or that there was no difference. I was certainly in that camp until the M1.

All of the other parts are easy to understand - Apple adding special-purpose IP to their chips to perform specific and common tasks well. But the RISC vs CISC aspect is just so Edison.
 
Last edited:
  • Like
Reactions: Captain Trips

crashnburn

macrumors 6502
Nov 18, 2009
468
28
Did someone explain to the OP that if the evolution of your product is constrained and handicapped by your supplier, and you wish to make leaps ahead, you have no choice but to jump - either to another supplier, or upstream to make your own?

Apparently Apple / Jobs had approached Intel for mobile chips for the iPhone - Intel, well, acted like a pricey... well... and they were probably in a position to, at the time.

So Apple got back into the CPU game it had gotten out of when it dropped the PowerPC for Intel.

There is no ONE WAY to do this; it's a lot of variables and context at a GIVEN TIME that cause shifts.
 
  • Like
Reactions: Captain Trips

Brian1230

macrumors member
Jan 7, 2021
74
36
The majority of the sales revenue Apple has made over the years is from iOS devices, namely the iPhone, and they all use Apple chips. So it becomes obvious that the end result is a transition from Intel to Apple silicon, where they can integrate the Macs into their full line of A-series chips. Mac sales have been stagnant for years, so by putting the ARM chip inside future Macs, they are hoping the people who have no issue spending $1000 and up on an iPhone 11 will buy a MacBook with an ARM chip that lets them run iOS apps natively on the Mac. There are many amazing and impressive apps on iOS (music and video related) that would benefit from the bigger screen and connectivity of a normal Mac, but that is not feasible with Intel Macs due to thermal issues and battery life. The 2018 Mac mini and the MacBooks have all faced similar thermal issues when pushed, so it is clear Apple wants to release Macs that are not impeded by thermal issues and consume fewer watts. They believe the time is now for Macs. Apple wants to sell more Macs to customers that are using their iOS devices. It is the way it is. The reason people like myself use a Mac is not only because they use Intel chips, but because of the tight integration of all their devices (phone, tablet, watch, wireless earbuds, input devices, and now Macs) into one system.
There are 2 main reasons I only use Apple products: security, and the fact that everything just works the way it's expected to, with the iPhone, iPad, and Watch all integrating together as they should.

One reason that isn't a main reason is the beautiful and awesome designs of their products.

Another reason is the resale value of the devices. I just purchased my 2015 MacBook Air for $250.00. If I wanted to sell it to get a newer MacBook Air, or the brand-new one like my boyfriend just purchased, I could sell this one for at least the $250 I paid for it, and likely closer to $400. Try that with a Windows computer or anything Android.
 
  • Love
Reactions: Captain Trips

pshufd

macrumors G4
Oct 24, 2013
10,149
14,574
New Hampshire
That video was a great refresher on my computer architecture / CPU & co-processor fundamentals from my college, grad school & ASM / device-driver programming days; ALU :D

If I were Intel or AMD, and had the money to throw at fabrication + architecture pipeline boosting, would it be accurate to say they're still primarily handicapped by the inability or inefficiency of sequencing and fulfilling (longer) instructions / opcodes?

The question is: what could the chip-world folks "copy" from Apple, the way Microsoft used to copy from Apple?

Or do they have to start fresh from the drawing board to be competitive in the long run?

Would I be off to make the analogy to Y2K? The choice of 2 digits instead of 4 to represent the year caused the Y2K problem decades down the line - but at the software, non-chip level.

Now, having chosen CISC / longer opcodes instead of fixed, shorter, easier-to-sequence ones is causing pain decades down the line - but at the chip / hardware level.

Opposite design "choice / compromise", but painful after decades.

Would I be wrong to say that simplified, optimized, atomic hardware code, with complexity abstracted to the layers above and/or delegated to specialized co-processors, is what made the M1, and might be the design approach / maxim going forward?

The articles that I've read attribute Apple's performance per watt to these two things:

Fixed- vs. variable-length instructions and greater decoder efficiency, and
Special-purpose transistors to perform frequent tasks done by Apple customers.

I suppose that they have a process lead on AMD and Intel and can pack in more transistors, but the M1 is efficient with transistors. One of the big motivations for CISC was that it packs more instructions into less memory. That was a big issue in the 60s, 70s, 80s, and 90s, but in the 00s memory prices kept going down while capacities kept going up.

So Apple Silicon has an inherent advantage over x86 in IPC which I don't see a way of AMD or Intel overcoming. Sure, Intel's new Rocket Lake can run at 7 GHz with liquid nitrogen, but that's not practical for most people and certainly not useful for laptops.

Intel and AMD make general purpose CPUs and adding special purpose transistors would maybe help some customers and not others. I think that AMD and Intel would prefer to use transistors to make more cores whereas Apple is using them for special purposes that Apple knows will benefit their customers more than more cores. The special purpose stuff should be super-efficient in terms of performance per watt.

Intel and AMD are fortunate that Apple's marketshare is low and that it's a niche market. The question is will it grow to more than niche? We have one and my daughter loves it and I'm jealous as my old MacBook Pro spends a lot of time on the charger.

Intel has been doing a lot of band-aids lately. AMD is making great x86 CPUs and I wouldn't mind having one while waiting for M1X. Unfortunately you can't get them or you need to pay $300 to $500 above MSRP.
 

pshufd

macrumors G4
Oct 24, 2013
10,149
14,574
New Hampshire
There are 2 main reasons I only use Apple products: security, and the fact that everything just works the way it's expected to, with the iPhone, iPad, and Watch all integrating together as they should.

One reason that isn't a main reason is the beautiful and awesome designs of their products.

Another reason is the resale value of the devices. I just purchased my 2015 MacBook Air for $250.00. If I wanted to sell it to get a newer MacBook Air, or the brand-new one like my boyfriend just purchased, I could sell this one for at least the $250 I paid for it, and likely closer to $400. Try that with a Windows computer or anything Android.

Apple hasn't even really been trying on Mac hardware. Their refresh cycles are laughable as are the hardware finishes. Apple could have done far more on the Mac this past decade but they focused on phones and tablets. The absolute right thing to do for shareholders but we poor Mac customers have missed out on a lot of things that would have been nice. Maybe higher margins on Macs (that they won't be paying to Intel) will translate to bigger R&D budgets.
 

leman

macrumors Core
Oct 14, 2008
19,521
19,678
The articles that I've read attribute Apple's performance per watt to these two things:

Fixed- vs. variable-length instructions and greater decoder efficiency, and
Special-purpose transistors to perform frequent tasks done by Apple customers.

This doesn't make much sense though. Other ARM CPUs implement the same ISA, yet they are considerably slower. Similarly, modern Intel Atom cores can match the power efficiency of some ARM CPUs, even though they have to decode the same variable-width ISA. And finally, attributing performance increase in general-purpose software to "special-purpose" transistors doesn't make much sense.

Clearly, there must be something deeper at the architectural level that Apple is doing.
 

pshufd

macrumors G4
Oct 24, 2013
10,149
14,574
New Hampshire
This doesn't make much sense though. Other ARM CPUs implement the same ISA, yet they are considerably slower. Similarly, modern Intel Atom cores can match the power efficiency of some ARM CPUs, even though they have to decode the same variable-width ISA. And finally, attributing performance increase in general-purpose software to "special-purpose" transistors doesn't make much sense.

Clearly, there must be something deeper at the architectural level that Apple is doing.

They are using a high-power process too. ARM chips have generally been built on low-power processes.

Do other ARM chips provide 8 decoders? It is possible that nobody thought of the idea of using so many of them.
 

leman

macrumors Core
Oct 14, 2008
19,521
19,678
They are using a high-power process too. ARM chips have generally been built on low-power processes.

Do other ARM chips provide 8 decoders? It is possible that nobody thought of the idea of using so many of them.

No, they do not, because it's probably not as easy as "build 8 decoders". You need a way to deal with the data dependencies between individual instructions.

Apple makes it possible by using a very large out-of-order execution window, excellent branch prediction that allows it to execute code well ahead of time, and advanced memory prefetching which makes sure that little CPU time is spent waiting for data. It's these properties of its architecture that constitute the secret sauce, not the width itself. So far, Apple is the only one who has successfully built a very wide OOO architecture. If it were easy, others would probably have done it as well already.
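
To make the dependency point a bit more concrete, here's a toy C sketch (illustrative only, not tied to any particular CPU): the first loop is one long dependency chain, so extra decode width or a bigger reorder window can't help much, while the second gives an out-of-order core several independent chains to keep in flight.

```c
/* Toy illustration (not any vendor's actual design): why a wide out-of-order
 * core needs independent work to fill its execution window. */
#include <stddef.h>

/* One long dependency chain: every add waits on the previous result, so a
 * bigger OOO window or more decoders can't extract extra parallelism here. */
double sum_serial(const double *a, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Four independent chains: the scheduler can keep several adds in flight,
 * and a larger window lets it look further ahead to find them. */
double sum_four_chains(const double *a, size_t n)
{
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += a[i];
        s1 += a[i + 1];
        s2 += a[i + 2];
        s3 += a[i + 3];
    }
    for (; i < n; i++)   /* leftover elements */
        s0 += a[i];
    return (s0 + s1) + (s2 + s3);
}
```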
 

pshufd

macrumors G4
Oct 24, 2013
10,149
14,574
New Hampshire
No, they do not, because it's probably not as easy as "build 8 decoders". You need a way to deal with the data dependencies between individual instructions.

Apple makes it possible by using a very large out-of-order execution window, excellent branch prediction that allows it to execute code well ahead of time, and advanced memory prefetching which makes sure that little CPU time is spent waiting for data. It's these properties of its architecture that constitute the secret sauce, not the width itself. So far, Apple is the only one who has successfully built a very wide OOO architecture. If it were easy, others would probably have done it as well already.

Yes, that's what the reorder buffer is for.

Yes, this is described in the video.

Prefetching is available in other CPUs as well.
 

thejadedmonkey

macrumors G3
May 28, 2005
9,240
3,499
Pennsylvania
Fair enough, but you still didn’t answer my question as to why they don’t at least use Intel’s current yearly lineup? What would be the harm? It would at least be minor spec bumps and look much better overall.
Different CPUs mean keeping more inventory in stock. Instead of 1 model, they need 3. Newer CPUs have a higher price, reducing margins for Apple. Finally, since Intel CPUs haven't really changed in years, there's no real reason to upgrade them.
 
  • Like
Reactions: Captain Trips

darngooddesign

macrumors P6
Jul 4, 2007
18,366
10,122
Atlanta, GA
I was at SCAD in the early 90s doing graphic design and Apple had someone from the local dealer come to campus and demo the first three RISC Macs. Even the low-end 6100 blew us away.
 

Krevnik

macrumors 601
Sep 8, 2003
4,101
1,312
Do other ARM chips provide 8 decoders? It is possible that nobody thought of the idea of using so many of them.

So, Apple's ARM implementation isn't quite like what you see in other ARM cores, since they design it in-house. Qualcomm only uses reference designs from ARM themselves for the CPU cores, so they are at the mercy of ARM itself as to what they get.

The M1 has no 32-bit support. The CPU cores only understand ARM64, so there's no dealing with the older 32-bit ISA, or Thumb. So the decoders are simpler, making room for more of them. ARM64 is also supposedly purpose-built for some of the OOO stuff Apple is now doing, but ARM32/Thumb isn't. So reference designs from ARM that include 32-bit and Thumb aren't going to be able to take full advantage the way Apple is doing, since any OOO for micro-ops will still need to be correct for these other modes, or be able to switch ordering modes on the fly, which eats space.

In effect, ARM64 was designed for a future where ARM32 was deprecated and support removed, opening up opportunities when that happens. Apple, being Apple, pushed to deprecate ARM32 on iOS as quickly as possible, and now they are reaping the benefits ahead of everyone else in the A14/M1.

There's still also the fact that Apple has been putting together a very good engineering team that has been able to chart this course and make it happen. One thing I see there is that Apple had to be thinking at least 5 years ahead in their planning. A lot of the planning I see in the companies I've worked for is quarter-focused, to the point that they could never invest in such "long-pole" efforts as this. But a lack of those long-term investments is ultimately what leads to stagnation when short-term projects stop producing results.
 

pshufd

macrumors G4
Oct 24, 2013
10,149
14,574
New Hampshire
So, Apple's ARM implementation isn't quite like what you see in other ARM cores, since they design it in-house. Qualcomm only uses reference designs from ARM themselves for the CPU cores, so they are at the mercy of ARM itself as to what they get.

The M1 has no 32-bit support. The CPU cores only understand ARM64, so there's no dealing with the older 32-bit ISA, or Thumb. So the decoders are simpler, making room for more of them. ARM64 is also supposedly purpose-built for some of the OOO stuff Apple is now doing, but ARM32/Thumb isn't. So reference designs from ARM that include 32-bit and Thumb aren't going to be able to take full advantage the way Apple is doing, since any OOO for micro-ops will still need to be correct for these other modes, or be able to switch ordering modes on the fly, which eats space.

In effect, ARM64 was designed for a future where ARM32 was deprecated and support removed, opening up opportunities when that happens. Apple, being Apple, pushed to deprecate ARM32 on iOS as quickly as possible, and now they are reaping the benefits ahead of everyone else in the A14/M1.

There's still also the fact that Apple has been putting together a very good engineering team that has been able to chart this course and make it happen. One thing I see there is that Apple had to be thinking at least 5 years ahead in their planning. A lot of the planning I see in the companies I've worked for is quarter-focused, to the point that they could never invest in such "long-pole" efforts as this. But a lack of those long-term investments is ultimately what leads to stagnation when short-term projects stop producing results.

AMD has said that having more than four decoders would provide no practical benefit on x86. Intel also has four so that looks like the practical limit for x86. Apple Silicon also has a reorder buffer roughly three times larger than that of the x86 chips, implying a big bump in parallel micro-op execution.

I was told by a chip designer that you're working on designs for chips that will be implemented several years down the road. Yes, M1 is A14 or A15 - Apple is not the new kid on the block.

So yes, all of this stuff is known and most of it is in the video and article that I linked.
 
  • Like
Reactions: jdb8167

leman

macrumors Core
Oct 14, 2008
19,521
19,678
Yes, that's what the reorder buffer is for.

Yes, this is described in the video.

Prefetching is available in other CPUs as well.

True, but all modern CPUs are out-of-order designs with reorder buffers, prefetching, branch prediction, etc... what stops other chip designers from "pulling an Apple" and making a very wide core? Apple CPU designs are nothing new by now. But even the brand-new ARM Cortex-X1, which is targeting "high performance", only has 5 instruction decoders and a 220-something-entry ROB... and it still can't match the A13. Why didn't ARM decide to make the X1 wider?

All I am saying is that it's more complex than "give it more instruction decoders and a larger OOO window". There has to be an additional trick to how Apple managed to do it when nobody else can.

AMD has said that having more than four decoders would provide no practical benefit on x86. Intel also has four so that looks like the practical limit for x86.

Did AMD say that exactly, or did someone from AMD say that they didn't notice any practical benefit going wider for their specific CPU design? I still haven't seen any good reason why a wider x86 design would be impossible if one approaches the topic from a different angle.
 

leman

macrumors Core
Oct 14, 2008
19,521
19,678
ARM64 is also supposedly purpose-built for some of the OOO stuff Apple is now doing, but ARM32/Thumb isn't. [..] There's still also the fact that Apple has been putting together a very good engineering team that has been able to chart this course and make it happen. One thing I see there is that Apple had to be thinking at least 5 years ahead in their planning. A lot of the planning I see in the companies I've worked for is quarter-focused, to the point that they could never invest in such "long-pole" efforts as this. But a lack of those long-term investments is ultimately what leads to stagnation when short-term projects stop producing results.

An interesting opinion I keep hearing here and there (unfortunately, with no factual proof) is that ARM64 was designed with quite a bit of Apple's input. Allegedly, folks at Apple had an idea of how to make a fast and energy-efficient CPU, and they approached ARM with the task of cleaning up the ISA to make this CPU possible. Basically, according to this story, Apple started designing their architecture before the ARM64 ISA was finalized. I don't think this is very far-fetched either, because they managed to deliver the first mobile 64-bit device almost two years before anyone else. The A7 caught everybody quite by surprise.
 

pshufd

macrumors G4
Oct 24, 2013
10,149
14,574
New Hampshire
True, but all modern CPUs are out-of-order designs with reorder buffers, prefetching, branch prediction, etc... what stops other chip designers from "pulling an Apple" and making a very wide core? Apple CPU designs are nothing new by now. But even the brand-new ARM Cortex-X1, which is targeting "high performance", only has 5 instruction decoders and a 220-something-entry ROB... and it still can't match the A13. Why didn't ARM decide to make the X1 wider?

All I am saying is that it's more complex than "give it more instruction decoders and a larger OOO window". There has to be an additional trick to how Apple managed to do it when nobody else can.

Did AMD say that exactly, or did someone from AMD say that they didn't notice any practical benefit going wider for their specific CPU design? I still haven't seen any good reason why a wider x86 design would be impossible if one approaches the topic from a different angle.
And again, I'm aware of this.

Why couldn't they pull an Apple? Maybe nobody thought of it. Until Apple did. There are technical breakthroughs all the time.

Back in 2004, I was building Firefox images and playing around to see where I could speed it up. I found that using SSE2 on the JPEG IDCT resulted in a considerable performance improvement. Why hadn't anyone else done this? Vector acceleration was pretty new back then, and nobody had tried it.
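
For anyone curious what that kind of vectorization looks like, here's a minimal SSE2 sketch in the same spirit (not the actual Firefox/JPEG IDCT code; the function name is made up for this example): it processes eight 16-bit values per instruction instead of one at a time.

```c
/* Minimal SSE2 sketch, illustrative only: add a constant bias to an array of
 * 16-bit values, eight at a time, with a scalar loop for the leftovers. */
#include <emmintrin.h>   /* SSE2 intrinsics */
#include <stdint.h>
#include <stddef.h>

static void add_bias_sse2(int16_t *coeffs, size_t n, int16_t bias)
{
    const __m128i vbias = _mm_set1_epi16(bias);
    size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        __m128i v = _mm_loadu_si128((const __m128i *)(coeffs + i));
        v = _mm_add_epi16(v, vbias);            /* 8 adds in one instruction */
        _mm_storeu_si128((__m128i *)(coeffs + i), v);
    }
    for (; i < n; i++)                          /* scalar tail */
        coeffs[i] = (int16_t)(coeffs[i] + bias);
}
```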

How did Apple build the iPhone and leave the entire phone industry flatfooted? How did they build the MacBook Air and leave the industry flatfooted?

Sometimes you get really lucky. Sometimes you make your own luck.
 

thadoggfather

macrumors P6
Oct 1, 2007
16,125
17,042
Years of battery life being at a standstill, thermals, and performance -- I think it's safe to say Intel has held Apple back for years.

Now Apple can only blame Apple for delays in their silicon. But they set their own roadmap, which is exciting.
 

leman

macrumors Core
Oct 14, 2008
19,521
19,678
Why couldn't they pull an Apple? Maybe nobody thought of it. Until Apple did. There are technical breakthroughs all the time.

Apple's basic recipe (very wide cores) was already apparent in 2014, as was their advantage over the usual ARM designs. But somehow, 6 years later, ARM's newest high-performance design, the Cortex-X1, is narrower than the now-ancient Apple A7. So they either deliberately decided against using a wider design, or simply couldn't figure out how to do it. I tend to believe the latter.

Anyway, just to make it clear — I am not trying to argue for the sake of arguing. I am genuinely curious about how Apple pulled it off. I don't understand anything about practical hardware design, but I think I have a fairly good grasp of algorithmic complexity of the task, and it does seem like Apple is the only company so far that has managed to break through some sort of OOO scalability plateau. They must be doing something differently from the others and I don't believe that it's a simple trick.
 

pshufd

macrumors G4
Oct 24, 2013
10,149
14,574
New Hampshire
Apple's basic recipe (very wide cores) was already apparent in 2014, as was their advantage over the usual ARM designs. But somehow, 6 years later, ARM's newest high-performance design, the Cortex-X1, is narrower than the now-ancient Apple A7. So they either deliberately decided against using a wider design, or simply couldn't figure out how to do it. I tend to believe the latter.

Anyway, just to make it clear — I am not trying to argue for the sake of arguing. I am genuinely curious about how Apple pulled it off. I don't understand anything about practical hardware design, but I think I have a fairly good grasp of algorithmic complexity of the task, and it does seem like Apple is the only company so far that has managed to break through some sort of OOO scalability plateau. They must be doing something differently from the others and I don't believe that it's a simple trick.

I don't think that it's simple either. I posted a research paper from about 2012 or 2013 that did a lot of testing with x86 and ARM chips on the market and they concluded that there isn't a real difference. And then six years later Apple releases the M1.

Think about Materialized Views - not a simple technology, actually very complex. But someone thought it up (I know the guy) and implemented it, and now it's a standard technology in databases. I would also like to know more, but I'm fine if Apple chooses not to reveal everything that they did or how they did it. You can't keep anything a secret over the long run these days. I don't know if there are any hints in patents either. I am holding 3,600 shares of AAPL, though, so I own a piece of the technology. And I'm sure that Apple isn't standing still.
 

TynH

macrumors newbie
Dec 27, 2020
14
5
There's another explanation at
- this one is spoken instead of written and it has a few diagrams which might make it a bit easier to understand but it may be that it's still difficult for the non-specialist to understand.

Only just read your post, great video indeed! It did, among other things, remind me of what Hermann Hauser said ten years ago:

"The reason why ARM is going to kill the microprocessor is not because Intel will not eventually produce an Atom [Intel's low-power microprocessor] that might be as good as an ARM, but because Intel has the wrong business model," said Dr. Hauser. "People in the mobile phone architecture do not buy microprocessors. So if you sell microprocessors you have the wrong model. They license them. So it's not Intel vs. ARM, it is Intel vs. every single semiconductor company in the world."
WSJ
 
  • Like
Reactions: crashnburn

jdb8167

macrumors 601
Nov 17, 2008
4,859
4,599
True, but all modern CPUs are out-of-order designs with reorder buffers, prefetching, branch prediction, etc... what stops other chip designers from "pulling an Apple" and making a very wide core? Apple CPU designs are nothing new by now. But even the brand-new ARM Cortex-X1, which is targeting "high performance", only has 5 instruction decoders and a 220-something-entry ROB... and it still can't match the A13. Why didn't ARM decide to make the X1 wider?

All I am saying is that it's more complex than "give it more instruction decoders and a larger OOO window". There has to be an additional trick to how Apple managed to do it when nobody else can.



Did AMD say that exactly, or did someone from AMD say that they didn't notice any practical benefit going wider for their specific CPU design? I still haven't seen any good reason why a wider x86 design would be impossible if one approaches the topic from a different angle.
I doubt either Intel or AMD is ever going to be explicit about their decision-making process, but I think the assumption is that Intel and AMD have very smart engineers making these decisions. If a wider decode were worth the transistor budget, I expect that they would do it. The fact that they haven't is a pretty good indication that 4-wide is about as good as it is going to get.

The problem is probably a basic computer science issue of cascading complexity. An x86_64 instruction can be anywhere from 1 byte to 15 bytes wide. When you start the decode stream, you have no idea how long the instruction will be; you have to finish the decode to find out. So each decoder has to start somewhere in the middle of another decoder's stream. You start to see the problem: as you add decoders, it gets more and more complex. There is no easy way to know which decoder will reach the end of an instruction, so interrupting a decoder and restarting at the next probable location has to happen across all available decoders. The combinations start to get really ugly. And that is just part of the problem. Now add in branch prediction, register renaming, instruction reordering, and other architectural issues, and the complexity probably quickly gets out of control.

Now compare what the M1 has to do. Each instruction is exactly 4 bytes wide, so each decoder can just start at a fixed offset with no ambiguity. This has to be tremendously simplifying. We've reached the limit of my technical knowledge, but I think it isn't hard to see that the ISA makes a pretty significant difference in the complexity of decoding instructions.
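
A rough sketch of the difference, in C rather than hardware (illustrative only; the length-decoding callback is hypothetical, and real decoders speculate on instruction boundaries rather than working strictly serially):

```c
/* Toy sketch of the decode-width problem, not real decoder logic. */
#include <stdint.h>
#include <stddef.h>

/* Fixed-width ISA (ARM64 style): every instruction is 4 bytes, so decoder
 * lane i can fetch at pc + 4*i without knowing anything about the other
 * lanes -- the start offsets are independent. */
static void find_starts_fixed(size_t pc, size_t n_lanes, size_t *starts)
{
    for (size_t i = 0; i < n_lanes; i++)
        starts[i] = pc + 4 * i;
}

/* Variable-width ISA (x86-64 style): an instruction is 1..15 bytes, and you
 * only learn its length by (partially) decoding it, so each lane's start
 * offset depends on the previous lane's result -- a serial chain unless the
 * hardware speculates on where instructions begin. */
typedef size_t (*length_fn)(const uint8_t *insn);   /* hypothetical length decoder */

static void find_starts_variable(const uint8_t *bytes, size_t pc,
                                 size_t n_lanes, size_t *starts,
                                 length_fn insn_length)
{
    size_t off = pc;
    for (size_t i = 0; i < n_lanes; i++) {
        starts[i] = off;
        off += insn_length(bytes + off);   /* must finish lane i before i+1 */
    }
}
```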
 
  • Like
Reactions: pshufd