In my experience, memory latency is a much bigger bottleneck than bandwidth when working with large amounts of data. Apple has made two architectural choices that make Apple Silicon worse for this kind of work than traditional x86 hardware.
I would say that if your workload is constrained by memory latency, you are pretty much screwed regardless. Even the fastest RAM has latency in the ballpark of 60-70ns; at 4-5 GHz and several instructions per cycle, that is hundreds of instruction slots for any modern performance-oriented CPU. You just can’t afford stalls like that without performance tanking to zero.
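To make that concrete, here is a minimal C sketch (sizes and names are made up for illustration) of the worst case: a pointer chase through a ~128 MB buffer arranged as one random cycle. Every load depends on the previous one, so each iteration pays roughly the full DRAM latency no matter how wide the core is:

```c
#include <stdio.h>
#include <stdlib.h>

#define N (16 * 1024 * 1024)   /* 16M size_t entries = 128 MB on a 64-bit machine */

int main(void) {
    size_t *next = malloc(N * sizeof *next);
    if (!next) return 1;
    for (size_t i = 0; i < N; i++) next[i] = i;
    /* Sattolo-style shuffle: always swaps with a strictly smaller index,
       which yields a single-cycle permutation, so the chase visits every
       slot and the prefetcher gets no usable pattern. (rand() is good
       enough for a sketch; a biased j does not break the single cycle.) */
    for (size_t i = N - 1; i > 0; i--) {
        size_t j = (size_t)rand() % i;   /* j in [0, i-1] */
        size_t tmp = next[i]; next[i] = next[j]; next[j] = tmp;
    }
    /* The chase itself: each load address depends on the previous load,
       so out-of-order execution cannot overlap the misses. */
    size_t p = 0;
    for (size_t i = 0; i < N; i++) p = next[p];
    printf("%zu\n", p);   /* keep the compiler from discarding the loop */
    free(next);
    return 0;
}
```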
For inherently parallel problems, latency is usually not an issue. GPUs, for example, can be understood as devices that trade latency for compute throughput, and they have been very successful. So I would first look for bottlenecks elsewhere before blaming RAM latency.
Then again, maybe I am misunderstanding what you mean by “latency”? You seem to refer to RAM latency, but your post appears to be more concerned with the latency of dependent operations.
First, once working set size is larger than about 100 MB, memory latency is higher with Apple Silicon than with the Intel hardware that preceded it.
The latency is higher simply because LPDDR has higher latency. Also, Apple Silicon uses more memory controllers, which adds latency as well. As mentioned above, I doubt that this is really your problem, unless of course the nature of your memory accesses is truly unpredictable and your working set is larger than the cache. But again, if that were the case you’d see very bad performance on any hardware.
As a consequence, while my M2 Max MBP is much faster at compiling code than my Intel iMac, there is often no noticeable performance difference in running the code.
I would guess your code has long data dependency chains and does not play well with the massively wide Apple cores (they need high ILP to reach high performance)? Or maybe it’s bound by some other constraint, e.g. SIMD throughput (Apple’s weakness compared to x86)?
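To illustrate what I mean by dependency chains vs. ILP, here is a tiny C sketch (purely illustrative, not your code): the same sum written as one serial chain and as four independent chains. A wide OOO core can only bring its width to bear on the second form (the results can differ slightly because FP addition gets reassociated):

```c
#include <stddef.h>

/* One long dependency chain: each add must wait for the previous one,
   so the loop runs at the latency of a single FP add per element. */
double sum_serial(const double *a, size_t n) {
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Four independent chains: the OOO core can execute them in parallel,
   so throughput rather than add latency becomes the limit. */
double sum_ilp(const double *a, size_t n) {
    double s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += a[i];
        s1 += a[i + 1];
        s2 += a[i + 2];
        s3 += a[i + 3];
    }
    for (; i < n; i++) s0 += a[i];   /* leftover elements */
    return (s0 + s1) + (s2 + s3);
}
```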
Second, Apple uses a few large and fast CPU cores instead of many smaller and slower ones, which is the wrong choice for workloads constrained by memory latency. There is also no support for SMT, which can improve CPU utilization with workloads like this. As a consequence, performance per die area is lower with Apple Silicon than with x86 for these workloads.
I don’t really see how using more cores would address the latency problem; wouldn’t the bottleneck be the memory controller/RAM interface anyway? Maybe you mean something like a barrel processor, where threads are switched out to hide latency? GPUs work that way (kind of, at least). But I am not aware of any SMT implementation that can do that: if a thread is stalled on a memory access, it stays stalled. From what I understand, SMT usually helps hide the latency of dependent operations, since the execution resources are shared between multiple threads, so if one thread can’t fully saturate the OOO machinery, two hopefully can.
Or do you mean that with more, smaller cores, memory stalls become “cheaper” and you still make more forward progress than when stalling fewer large cores? I’d need to think more about it. The whole thing is made more complicated by the fact that a single OOO core (especially one as large as Apple’s) can track hundreds of computation chains at once, hiding latency as it makes forward progress.
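As a sketch of that latency-hiding effect (reusing the hypothetical single-cycle 'next' permutation from the chase example above): interleaving several independent chases lets one core keep multiple misses in flight at once, so throughput improves even though each individual load is just as slow:

```c
#include <stddef.h>

/* 'next' is assumed to be a single-cycle permutation as built earlier;
   p0..p3 are distinct starting indices, 'steps' is the number of
   dependent loads per chain. */
size_t chase4(const size_t *next, size_t steps,
              size_t p0, size_t p1, size_t p2, size_t p3) {
    for (size_t i = 0; i < steps; i++) {
        /* The four loads are independent of each other, so the OOO core
           can overlap their miss latencies instead of adding them up. */
        p0 = next[p0];
        p1 = next[p1];
        p2 = next[p2];
        p3 = next[p3];
    }
    return p0 ^ p1 ^ p2 ^ p3;   /* combine results so nothing is optimised away */
}
```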
What’s also interesting is that Intel apparently plans to drop SMT from their future hardware, reportedly because of issues with performance scaling and power consumption. There is an in-depth discussion about SMT happening right now on RWT (most of it is way beyond me, but still interesting to read):
https://www.realworldtech.com/forum/?threadid=212944&curpostid=213086