
pasamio

macrumors 6502
Jan 22, 2020
356
297
Apple has some key advantages that will be a challenge to best:
  1. Focused ecosystem. Neither macOS nor Apple's processor architectures carry a long backward-compatibility tail. Windows and Intel are both saddled with that complexity, which costs them performance.
  2. Vertical integration. Tight coupling of SoC architecture and OS requirements. Apple defines the iOS & macOS capabilities and builds the hardware specifically for those requirements. Processors, systems, compilers, and OS can trade off optimization and functionality.

Beyond the operating system and its push to drop backwards compatibility, Apple also has highly integrated developer tools that make platform and architecture transitions easier. Their desktop strategy leverages the existing ARM tooling that already works on their mobile platforms. That integration, plus the work on smoothing transitions with technology like Rosetta and Universal binaries, gives them an advantage over Intel on its own (see other posts in this thread on Itanium's failure) or Microsoft with its attempts to move the desktop to ARM (originally no ability to run earlier apps, then 32-bit only with poor performance, and now 64-bit). Microsoft, however, will be able to copy Apple's strategy and adapt; I don't think Intel has that ability. The question is what Microsoft ultimately decides to do.

However, the Intel Windows desktop ecosystem is still vastly bigger than Apple's macOS business and will stay that way. Apple will just harvest the high-margin part of the business.

I believe there were something like 300 million Windows PC shipments worldwide, which obviously also includes PCs running AMD (around 20%, I think). Best I can find, there were about 200 million iPhone shipments, 60 million iPad shipments, and around 20 million Mac shipments. Apple's device shipments total around 280 million, likely eclipsing Intel's volume. The desktop market might be vastly bigger (20 million vs 300 million), but Apple leverages the same core operating system functionality between the desktop and mobile devices, and soon the same CPU designs. Intel's world looks increasingly small when compared to Apple's combined ecosystem.

BTW, I would not be surprised if Apple does not introduce ANY new Intel based devices this year. Their stance of a slow transition was simply a marketing decision to encourage the purchase of the systems still in the pipe.

Of course they won't; they're following the same playbook they used for the move from PowerPC to Intel. They released the low-end devices last year, this year is the mid-tier devices, and next year will be the Mac Pro move, maybe announced at WWDC 2022 or perhaps earlier.

Though it wasn't really a two-year transition; it's closer to a three-year transition when you consider the entire set of changes. The release of the Mac Pro in late 2019 gave them a final Intel platform to ship at the Mac Pro level, which in turn gave them the space to release a new Mac Pro on Apple Silicon in around three years. The release of Catalina, which removed support for 32-bit applications, cleared out the deadwood and associated the compatibility negativity with Catalina. It meant that anyone who had been lagging on updating from 32-bit but was actively maintaining an application was forced to move to more modern APIs. Big Sur introduced M1 support, with initial hardware for developers in the DTK, followed by updates to the lower-tier SKUs (MacBook, Mac Mini, low-end MBP), updates to the mid-tier a year later (iMac, larger-screen MBP), and finally an updated Mac Pro, rounding out the three years since the Mac Pro last shipped while still hitting the two-year transition.
 

Gigjobs32

macrumors newbie
Apr 5, 2021
2
1
I plan to buy the 16-inch MacBook Pro later on, but I would buy the Mac that comes with an Intel chip. I'm not completely sold on the Apple Silicon Macs.
 

cmaier

Suspended
Jul 25, 2007
25,405
33,474
California
I was under the impression that ADD can have at most one memory operand? Regardless, I don't think that changes your argument (we still have a dependent load, op, and store). Still, I can't say I get it - you have dependent operations, sure, but modern OOE CPUs will reorder them anyway. If anything, spreading the load and store operations too far apart without knowing the microarchitectural details can be detrimental to performance, as the operation might be done earlier than the compiler anticipates. I think it's almost always better to just leave things in natural order and let the reordering machinery figure things out - unless you really know the properties of your CPU and are going for some very specific optimizations.
My main point is that there is a locality problem. The CPU may or may not be able to reorder within each ISA op, but even if it can, it's always better to first break down all ISA ops into micro-ops and reorder them. Even if an x86 CPU does that, the window it can see at once is small. It can't look at the entire instruction stream at once and reorder it. A compiler can. Think of it this way - how many incoming ISA instructions do you think an x86 CPU even knows about at once? How many can it look at simultaneously to figure out if there's an instruction in the queue that does not have any dependencies on prior instructions? Not a lot. It looks at however many fit in its buffer, which is tiny compared to the instruction cache. And then consider that you could have conditional branches, etc. And the scheduling works at the level of micro-ops, not ISA ops. And you have to decode an instruction before you can determine its dependencies. So you are looking at, say, a handful of ISA instructions that the scheduling unit can choose from (corresponding to, say, 16 micro-ops). So the instruction that could be issued to parallel ALU pipeline #5 or #6 is quite likely to be just out of reach. It's far better to, at compile time, rearrange the micro-ops, where you can put ops back to back that would have come from far-separated x86 ISA instructions, move conditional branches to the most efficient place, etc.
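A minimal C sketch of that point, with illustrative function names: in the first loop every add depends on the previous one, so even a perfect out-of-order window finds little to run in parallel; in the second, the work has been rearranged ahead of time into independent chains that sit right next to each other, which is exactly the kind of scheduling a compiler can do across a whole function.

#include <stddef.h>

/* One long dependency chain: each add needs the previous sum, so
 * nearby instructions depend on each other and a small scheduling
 * window has nothing independent to issue. */
double sum_chained(const double *a, size_t n) {
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* The same reduction split into four independent chains.  Arranging
 * this at compile time puts independent adds right next to each
 * other, so even a modest window can issue them in parallel.
 * (Results can differ in the last bits because FP addition is not
 * associative.) */
double sum_unrolled(const double *a, size_t n) {
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += a[i];
        s1 += a[i + 1];
        s2 += a[i + 2];
        s3 += a[i + 3];
    }
    for (; i < n; i++)
        s0 += a[i];
    return (s0 + s1) + (s2 + s3);
}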

It’s the principle of locality run amok. Instructions that are near each other in an instruction stream are more likely to depend on each other. This is DOUBLY so in a CISC architecture, where instructions that are near each other (as seen by the scheduler) actually all likely derived from a single ISA instruction (and it’s very common for microops corresponding to a single ISA instruction to have dependencies).
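A rough way to picture that (the exact cracking is microarchitecture-specific, so treat this as a sketch rather than any particular CPU's behavior): a single memory-destination x86 add behaves like three dependent steps once broken into micro-ops.

#include <stdint.h>

/* A single ISA instruction such as
 *     add dword ptr [rdi], eax
 * behaves roughly like the three steps below after cracking.  Each
 * step needs the result of the one before it, so the scheduler sees
 * a tight dependency chain that all came from one "instruction". */
static inline void add_to_memory(int32_t *p, int32_t x) {
    int32_t tmp = *p;  /* load  micro-op                          */
    tmp += x;          /* ALU   micro-op: needs the loaded value  */
    *p = tmp;          /* store micro-op: needs the ALU result    */
}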

It’s the same problem with hardware. We had a rule for one of our designs - no positive logic. In other words, you could do NAND gates and NOR gates, but not AND gates or OR gates. Some designers hated that. But it was smart. Because an AND gate is a NAND gate plus an inverter. And by disallowing AND gates, human designers, who acted as “logic compilers,” were more likely to get rid of needless inverters that way. Instead of AND followed by OR, maybe it’s NAND followed by NOR plus one inverter someplace, instead of two inverters. (Of course it depends on whether your inputs are available as negative polarity signals, which is usually the case, since most flip-flops can output both polarities for free). And instead of putting the inverter right next to the NAND, you may put an inverter half way in between, to act as a signal buffer and improve speed. These are all optimizations that can happen when you get rid of “complex” gates and only allow “reduced” gates.
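A small bit-level check of the same identity, written in C just to show the logic rather than a netlist: a two-level AND-OR structure is exactly equivalent to NAND feeding NAND, so banning the positive gates loses nothing.

#include <assert.h>
#include <stdint.h>

/* AND-OR:     out = (a AND b) OR (c AND d)
 * NAND-NAND:  out = NAND( NAND(a,b), NAND(c,d) )
 * By De Morgan's law these compute the same function. */
static uint8_t and_or(uint8_t a, uint8_t b, uint8_t c, uint8_t d) {
    return (a & b) | (c & d);
}

static uint8_t nand_nand(uint8_t a, uint8_t b, uint8_t c, uint8_t d) {
    uint8_t n1 = (uint8_t)~(a & b); /* NAND(a, b)   */
    uint8_t n2 = (uint8_t)~(c & d); /* NAND(c, d)   */
    return (uint8_t)~(n1 & n2);     /* NAND(n1, n2) */
}

int main(void) {
    /* Exhaustively check all 1-bit input combinations. */
    for (int a = 0; a < 2; a++)
        for (int b = 0; b < 2; b++)
            for (int c = 0; c < 2; c++)
                for (int d = 0; d < 2; d++)
                    assert((and_or(a, b, c, d) & 1) ==
                           (nand_nand(a, b, c, d) & 1));
    return 0;
}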

CISC instructions are prepackaged subroutines. They are convenient. But they may do work that doesn’t need to be done, the compiler may use them because it is optimizing the wrong thing, and they need to be cracked open before they can be dealt with by the scheduling hardware in the CPU, which is expensive. And the scheduling hardware in *any* CPU is hyper-local, with a small window of visibility into the instruction stream.
 

Gerdi

macrumors 6502
Apr 25, 2020
449
301
Nope, that passage makes no sense. First, it is confused about what the decoder does. The decoder has nothing to do with parallelism. The decoder's job (in an x86 machine) is to, among other things, convert the incoming variable-length instructions into fixed length micro ops. It does this by finding the beginning and end of each instruction, and using a state machine and microcode ROM to determine a sequence of simple, fixed-length, instructions.

The passage makes a lot of sense. Not sure if you ever looked into the decoders of a fixed-length instruction set architecture like ARM. The instructions are all 4 bytes wide and 4-byte aligned, making it possible to fetch a large memory block per cycle and feed it literally into parallel decoders. This is just not possible with x64, as you correctly pointed out, because you need to figure out where each instruction starts and ends - which is largely sequential in nature.
Not sure why you claim decode has nothing to do with parallelism either - you should know better, despite having mostly worked on the backend side.
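A minimal sketch of why the fixed-length case parallelizes, with decode_one as a hypothetical stand-in for a single hardware decode slot: every slot knows where its instruction starts without looking at any other instruction in the block.

#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one hardware decode slot. */
static void decode_one(const uint8_t *bytes) { (void)bytes; }

/* Fixed 4-byte instructions (as on 64-bit ARM): the instruction for
 * slot i starts at fetch_block + 4*i, which is known up front, so
 * all the decoders can work on the fetched block simultaneously
 * with no cross-slot dependency. */
void decode_block_fixed(const uint8_t *fetch_block, size_t width) {
    for (size_t i = 0; i < width; i++)
        decode_one(fetch_block + 4 * i);  /* each slot is independent */
}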
 

cmaier

Suspended
Jul 25, 2007
25,405
33,474
California
The passage makes a lot of sense. Not sure if you ever looked into the decoders of a fixed-length instruction set architecture like ARM. The instructions are all 4 bytes wide and 4-byte aligned, making it possible to fetch a large memory block per cycle and feed it literally into parallel decoders. This is just not possible with x64, as you correctly pointed out, because you need to figure out where each instruction starts and ends - which is largely sequential in nature.

I designed SPARC, PowerPC, MIPS and x86, so I'm familiar with this. The reason the passage makes no sense is that the instructions, as seen by the instruction issue logic, are aligned and constant-width. The job of the instruction decoder is to convert the variable-length instructions into fixed-length instructions. And the decoder width is not the same as the ALU's effective width. You can decode more or fewer instructions than you issue - in fact, you usually set it up that way, because not every instruction is an ALU op. So it is more efficient to decode more instructions than you can simultaneously issue to the ALUs. Yes, on x86 it is more expensive, because the decode logic is more expensive. But the idea that it can't be done is what I'm pushing back against.
 
  • Like
Reactions: BigMcGuire

cocoua

macrumors 65816
May 19, 2014
1,011
628
madrid, spain
I would like ARM to displace the x86 instruction set. Early in my career I found myself writing x86 assembler. It was a pretty ugly architecture compared to 68000 (in the original Mac & Amiga) or the 6502 (Apple II, Commodore 64 & BBC Micro).

Hopefully Apple inspires Microsoft and PC manufacturers to get serious about Windows on ARM. Linux on ARM is already in wide use on Chromebooks, Android phones, and the Raspberry Pi, and is available on cloud providers like AWS. I think once ARM Macs are common and Docker for the ARM Macs is production ready, more companies will deploy their cloud workloads on ARM VMs and Docker clusters.
Bet your glasses that Microsoft is fueling labs to get Windows competitive on ARM.

I keep reading in these forums how good the Ryzens are and how great x86 still is, but the PC industry has been shaken for good. x86 is dead.

ARM single-core performance is doubling roughly every 2 years. x86 takes about 8 years to do the same.

Just watch the A-series chips' performance graphs over the years. They are rocketing while x86 keeps struggling for improvements and keeps burning power like a toaster.

Just wait one year. By then Apple will have released high-end Macs and Microsoft will come out with all the work they are putting into the ARM transition.

In mid 2020 tech insiders would say Intel wasn't scary because their server business was more profitable than consumer, but servers are migrating to ARM too, obviously. ARM is cheaper in every sense and offers the same performance even in its early stages, plus it is fully customizable - that last point is something nobody talks about, but it's the key.

Intel knows; they are scared to **** and that's why they are investing tons of money in new facilities for ARM manufacturing.

Just face reality. Unless Windows makes universal binaries too, the x86 era will end in a few years.

For Apple it will be dead as soon as they remove the last Intel Mac from the store, as no developer will be interested in checking Intel compatibility beyond some legacy support.
 

Gerdi

macrumors 6502
Apr 25, 2020
449
301
I designed SPARC, PowerPC, MIPS and x86, so I'm familiar with this. The reason the passage makes no sense is that the instructions, as seen by the instruction issue logic, are aligned and constant-width. The job of the instruction decoder is to convert the variable-length instructions into fixed-length instructions. And the decoder width is not the same as the ALU's effective width.

This is not the point at all. The point is that if your backend can issue instructions in parallel to the execution units, the decoder needs to keep up with the throughput of the backend.
Therefore it is absolutely required for wide superscalar architectures that you can decode several instructions in a single cycle (possibly pipelined, not literally) - hence the notion of parallel decoding.

That said, it is very problematic, if you have a variable-length instruction set, to decode several instructions in parallel, because the underlying algorithm is just not parallel. It is a very fundamental issue.
 
Last edited:

Bandaman

Cancelled
Aug 28, 2019
2,005
4,091
This is not the point at all. The point is that if your backend can issue instructions in parallel to the execution units, the decoder needs to keep up with the throughput of the backend.
Therefore it is absolutely required for wide superscalar architectures that you can decode several instructions in a single cycle - hence the notion of parallel decoding.

That said, it is very problematic, if you have a variable-length instruction set, to decode several instructions in parallel, because the underlying algorithm is just not parallel. It is a very fundamental issue.
I feel like I'm reading Chinese between you guys, but it's very interesting!
 

cmaier

Suspended
Jul 25, 2007
25,405
33,474
California
This is not the point at all. The point is that if your backend can issue instructions in parallel to the execution units, the decoder needs to keep up with the throughput of the backend.
Therefore it is absolutely required for wide superscalar architectures that you can decode several instructions in a single cycle - hence the notion of parallel decoding.

That said, it is very problematic, if you have a variable-length instruction set, to decode several instructions in parallel, because the underlying algorithm is just not parallel. It is a very fundamental issue.

Sure, but keep in mind that each ISA instruction that is fetched can (and does, more often than not) result in multiple instructions fed to the issue logic. So even if you can only decode two instructions at a time, you probably average at least 4 microops. Yes, increasing that requires a wider bus from the instruction cache and a larger buffer, but it’s certainly nowhere near impossible.
 

Gerdi

macrumors 6502
Apr 25, 2020
449
301
Sure, but keep in mind that each ISA instruction that is fetched can (and does, more often than not) result in multiple instructions fed to the issue logic. So even if you can only decode two instructions at a time, you probably average at least 4 microops.

I do not think this is the case in x64 land anymore - the trend is toward simpler instructions with just register operands. This is true both from the viewpoint of modern extensions like AVX and from the viewpoint of which instructions modern compilers choose. Or are you saying that something like "ADD EAX, EBX" results in 4 µops?

Yes, increasing that requires a wider bus from the instruction cache and a larger buffer, but it’s certainly nowhere near impossible.

With my argument I assume the wide bus is a given. Typically you read from the L1 I$ with cache-line granularity. My argument is more about an algorithmic property - with a fixed-length instruction set you can decode with O(1) circuit depth with respect to the number of instructions you want to decode; with a variable-length instruction set it is rather O(n), or O(log n) perhaps*, but certainly not O(1) depth.

That having been said, it gets increasingly harder to reach the required throughput with variable-length instruction sets. I don't think there is a hard limit of 4, as AnandTech assumes. Still, I am not disregarding the argument.

*Update: When thinking about it, it looks to be O(n) depth complexity - unless someone has an idea to get this faster.
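To make the depth argument concrete, a minimal sketch in which insn_length is a hypothetical stand-in for the x86 length-decoding logic: each start address depends on the previous instruction's start and length, so the naive boundary-finding recurrence is a serial chain whose depth grows with the number of instructions decoded.

#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the length decoder; real x86-64 lengths
 * (1..15 bytes) come from parsing prefixes, opcode, ModRM, etc.
 * The values returned here are fake and only serve the sketch. */
static size_t insn_length(const uint8_t *p) {
    return 1 + (p[0] & 0x3);
}

/* Variable-length instructions: start[i+1] depends on start[i], so
 * the recurrence below is a chain of depth n, unlike the fixed-length
 * case where start[i] = base + 4*i is known immediately. */
void find_boundaries(const uint8_t *code, size_t n, size_t *start) {
    size_t offset = 0;
    for (size_t i = 0; i < n; i++) {
        start[i] = offset;
        offset += insn_length(code + offset);  /* serial dependency */
    }
}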
 
Last edited:

9927036

Cancelled
Nov 12, 2020
472
460
apple doesn’t make you pee in a cup to get a job. apple doesn’t have a ceo who walks around the parking lot taking down license plates of cars that arrive late. apple doesn’t introduce you to the guy who was responsible for the FDIV bug when you go there for a job interview, and then once he passes out of hearing range start bad mouthing him to you. Apple doesn’t have coworkers yelling at each other in the hallway in front of the conference room where the job interview is occurring.

so, yeah, intel is full of jackasses and is a terrible place to work.
LOL Steve Jobs did just that...walked around the parking lot checking license plates. And Apple does have coworkers yelling at each other.
 

9927036

Cancelled
Nov 12, 2020
472
460
1) Early on, Intel was too focused on raw speed, at the expense of power management. This meant that they were never a viable contender to provide CPUs for the iPhone, or subsequent android smartphones as well. Because of this, Intel was effectively shut out of the growing smartphone market, at a time when PC sales were starting to stagnate. They basically locked themselves out of the next big thing.

2) Because of their early financial success in providing cheap x86 processors for servers and data centres, Intel never felt the need to innovate and move beyond x86 instruction. It's classic disruption theory - a company doubles down on that which made it successful in the first place, at the expense of missing the next big thing.

Intel now faces threats on multiple fronts.

1) They have already lost Apple's business.
2) AMD is proving superior to Intel, performance-wise.
3) Cloud providers like Amazon are starting to design and manufacture their own ARM chips, threatening their dominance in this aforementioned area.
4) People are holding on to PCs for longer, which means stagnating or even declining computer sales, which means fewer processors sold. Which is a double whammy because processor design is an extremely capital-intensive process, which is typically offset by huge volume sales. This means that Intel has even less incentive to work on improving their processor designs, if they didn't think the market was there for one.
5) TSMC is eventually opening a fab in the US. And guess what (1) and (3) have in common? They all rely on TSMC to manufacture their ARM chip designs.

The TL;DR is that what started out as Intel's greatest strength, that they integrate both design and manufacturing, is now becoming Intel's biggest weakness, because nobody cares about Intel's chip designs, since companies are now increasingly moving towards designing their own processors that are optimised for very specific tasks (which Intel will never be able to do for any one specific company). And to add insult to injury, they are contracting TSMC to mass-produce these chips for them.

This is basically what Intel has sown.
Thank you. All good points.
 

9927036

Cancelled
Nov 12, 2020
472
460
Unlikely to ever be their business model.

However, Qualcomm just bought Nuvia, filled with ex-Apple chip designers, and is set to release a laptop chip late next year designed by them (current and near-future releases are standard ARM cores). While I would imagine they'll be selling to OEMs, that's probably your best bet for the future of building your own performance ARM system. I don't know if Qualcomm has any interest in doing so, but they might if they want to take on AMD and Intel in all sectors.
Actually, Nuvia designs datacenter processors. And the CPU architect is, ironically, originally from Intel and TI, then ARM, and lastly Apple.
 

theorist9

macrumors 68040
May 28, 2015
3,881
3,060
I agree with what you (the OP) are saying: Competition is healthy, and benefits consumers.

But, as an Apple user, I would selfishly prefer that Apple continue to stay ahead of Intel.

Let me explain why:

Historically, as a general rule, if you wanted the best performance/$ for hardware (as opposed to software), you would need to buy a PC. Apple's new entry-level M1 models have begun to change that.

It's my hope that, with that change, some who would otherwise buy a PC might now instead buy an Apple. That increases Apple's user base, which benefits me (and all Apple users) through likely future increases in software availability. It also means that if you take a job with a company, there's a larger likelihood they'll be an Apple shop instead of a PC shop.
 

crazy dave

macrumors 65816
Sep 9, 2010
1,453
1,229
Actually, Nuvia designs datacenter processors. And the CPU architect is, ironically, originally from Intel and TI, then ARM, and lastly Apple.

That was their original goal. Qualcomm bought them and now they will be releasing laptop chips based on their cores instead and future server chips are a “maybe”.
 

EntropyQ3

macrumors 6502a
Mar 20, 2009
718
824
It's my hope that, with that change, some who would otherwise buy a PC might now instead buy an Apple. That increases Apple's user base, which benefits me (and all Apple users) through likely future increases in software availability. It also means that if you take a job with a company, there's a larger likelihood they'll be an Apple shop instead of a PC shop.
The question is how many Windows customers transition, and how fast.

Unfortunately, with the pricing policies Apple has, I can’t see the needle moving a lot when it comes to the overall market. Apple were already strong in the pricing segment they have chosen to stay in, and don’t compete at all in the segments with higher volume.
Also corporations and administrations both prioritize bulk pricing and seamless software compatibility.

That said, by Apple's macOS standards, there is still room for growth of course.
I’ve unsuccessfully tried to figure out just how large (small?) the market for computer sales to private individuals is. It’s not data easily found unfortunately.

It will be interesting to see where Apple goes with the ARM transition. Putting their iPad chip in their low-power (and high-volume) laptops was picking up money lying on the table. Other segments require more effort and are much lower volume. It remains to be seen if Apple prioritises being competitive in these segments
a) at all, (we’ll see what they eventually release)
b) over time.

In consumer space the greater threat to x86 Windows PCs remains iOS/Android, not MacOS-on-AS being somewhat more competitive in the niches where Apple offer anything at all.
 

cmaier

Suspended
Jul 25, 2007
25,405
33,474
California
Do you remember the alpha processor?
Alpha was a beast. At Exponential I always thought of the Alpha as our benchmark - let's see how close we can get (benchmarking by running Windows NT, which back then was available on PowerPC, Intel, and Alpha, so it made a nice comparison test).
 
  • Like
Reactions: johnsc3

jerryk

macrumors 604
Nov 3, 2011
7,421
4,208
SF Bay Area
I plan to buy the 16-inch MacBook Pro later on, but I would buy the Mac that comes with an Intel chip. I'm not completely sold on the Apple Silicon Macs.
Not a bad machine, but even the M1 Air can beat it at some tasks. It depends on your needs. I sold my 32 GB, 1 TB 16" MBP and moved to the M1 Air. It's nice not to listen to fans all the time and to have a small, light notebook. It's not as fast with graphics, but that's not something I need all the time. Plus I have a Windows deskside system with dual 27" monitors, RTX 2070s, 64 GB of memory, and 4 TB of SSD for graphics and machine learning tasks.
 
  • Like
Reactions: BigMcGuire

iHorseHead

macrumors 68000
Jan 1, 2021
1,594
2,003
Yes, competition is a very healthy thing. AMD's Ryzen might be the M1's stiffest competition at the moment.
My Ryzen Windows laptop is faster than the M1 MacBook Air. No joke.
What's so "haha" about it? I don't get it. I tested everything from boot times to opening Chrome and other apps. Windows is faster. I have a feeling that people here don't use Windows daily and are biased.
 
Last edited:
  • Haha
Reactions: Maconplasma

Joelist

macrumors 6502
Jan 28, 2014
463
373
Illinois
I was a designer on some of AMDs CPUs, and at one point i owned the integer execution units and dispatch. Not much reason x86 can’t go wider, other than the fact that code wouldn’t probably benefit too much from it, due to too much instruction interdependency, I suppose. Microcode is a disadvantage - when you send a complex instruction to the instruction decoder and it replaces it with a sequence of N microops, those microops will tend to have interdependencies which require them to be at least partially sequenced. If, instead, you have Arm, you can let the compiler do some of the work of ordering the instruction stream to take advantage of multiple pipelines, and the instruction stream that reaches the instruction decoder will tend to have fewer clumps of interdependent instructions.
Hi Cmaier!

I found the issue - I typed decoder but meant the 8-wide decode block. Apple Silicon cores appear to have both giant caches and an 8-wide decode. AnandTech has written about it, as did a developer. Seeing as AnandTech is the brainchild of Anand Lal Shimpi, now on the Apple Silicon team, I think they have a good knowledge source.

 