
Krevnik

macrumors 601
Sep 8, 2003
4,101
1,312
You don't need a literal monopoly but you need a big moat. And being one of many ARM chipmakers just ain't it.

It depends a lot on the margins you can command. If everyone in the market is playing a game of chicken with their margins to the point that you can't invest in any major moves, then sure. But that's not required to be the case (e.g. Apple). And you can fund integration via debt if you think it will give you a proper advantage in the long run, but it is risky.

But I still think you are putting the cart before the horse here. The integration is part of the “big moat”. You do it to make your moat bigger. It’s one of the plays used to claw out larger and larger chunks of a competitive market on its way to a duopoly/monopoly, along with consolidation. But it is certainly easier to do as a conglomerate with many revenue streams to pull on (Samsung and Apple both doing in-house chip work for mobile devices for example), or if you are willing to simply fund everything with debt like Amazon.

Honestly, one of the better arguments for Intel staying out of the mobile ARM space is that the biggest potential customers in the space are either already vertically integrated with their own chip designs (Apple, Samsung) or prepping to do so (Xiaomi).

Because TSMC is pure-play, they have a symbiotic relationship with their customers. Price-gouging them would be a terrible idea; it would end up harming TSMC itself. They are just as dependent on their customers as the inverse.

I didn't suggest price gouging, per se, but we do see it across industries where folks down the chain don't have good alternatives; Qualcomm's business terms are a good example.

TSMC doesn't have to rent-seek, but so long as they are aware that customers can't simply walk to a competitor with a comparable node, they can certainly charge a premium for those nodes and, as long as demand remains high, pocket the economies of scale for themselves rather than passing the savings along to their customers. You don't get a war chest to fend off competitors by being kind to your business partners.

Creating a monopoly in your supply chain to get economies of scale is not in the long-term interests of anyone, really. Unless you happen to own a large enough share of that monopoly to be able to steer it in some way.

Intel may not have a choice. But they aren't guaranteed any success just for recognizing reality.

I'm not trying to suggest that they will have success. But the simple reality is that Intel doesn't have to fab only their own stuff, and with such a pivot it matters less whether Intel is the primary party in a monopoly/duopoly or just one of many in a market. The fabs get fed so long as the chips are being made for someone. And with demand for fabs being high right now, the opportunity is there.

Pivoting could also be a good way for Intel to maintain their position in a shifting server landscape. Laptops are a bit more up in the air.
 
  • Like
Reactions: Melbourne Park

JouniS

macrumors 6502a
Nov 22, 2020
638
399
While that's an extreme example, I've heard of similar things happening at my university's cluster due to causes way easier to predict than the compiler sneaking multithreading in. People who use it aren't necessarily expected to know the vicissitudes of the compiler they're using. Arguably they should, but it's unrealistic in practice as they have other priorities.
Shared systems crash once in a while, but it's not a big deal. System administrators restart the system, fix any issues, and figure out what went wrong. And then someone may learn something new.

People who use HPC clusters usually use them for work. When you do something for a living, you are expected to know the tools you use. Software tools are full of hidden traps, and you can often learn to use them properly only by making mistakes first, but you are still supposed to learn them. People who are not interested in learning the tools because their priorities are elsewhere are simply doing their jobs badly.
 

mr_roboto

macrumors 6502a
Sep 30, 2020
856
1,867
EDIT: Found it!



So 1 microcontroller, but not all sorts. The rest are indeed off-the-shelf/custom ARM as far as I can tell. But my larger point still stands: sometimes these idiosyncratic decisions are made for "reasons" that you just have to guess at. It made sense to someone: cost, familiarity, some piece of code that only works with this chip. Hard to know sometimes.
That converter is someone else's chip, not Apple's:


And I misspoke in saying that only Intel would use x86 that way, though I do maintain only Intel (and maybe AMD) are likely to use x86 microcontrollers in a high performance SoC.

See this for the likely history of that V186 core in the HDMI converter chip:


Essentially, the 186 was designed as an embedded microcontroller, got designed into a lot of stuff in the 1980s, and then had an afterlife starting in the 1990s when VAutomation designed a synthesizable Verilog RTL clone of the 186, signed a deal with Intel to license the necessary patents, and licensed their core to others as IP.

It makes sense to see something like this in a DP-to-HDMI converter chip, which might have its own design lineage dating back to when a 186 looked OK. All kinds of weird cores show up in miscellaneous "glue" chips which need a microcontroller. But you wouldn't expect a complex high performance Arm SoC like Apple's designs to be done that way.
 

crazy dave

macrumors 65816
Sep 9, 2010
1,454
1,230
That converter is someone else's chip, not Apple's:


And I misspoke in saying that only Intel would use x86 that way, though I do maintain only Intel (and maybe AMD) are likely to use x86 microcontrollers in a high performance SoC.

See this for the likely history of that V186 core in the HDMI converter chip:


Essentially, the 186 was designed as an embedded microcontroller, got designed into a lot of stuff in the 1980s, and then had an afterlife starting in the 1990s when VAutomation designed a synthesizable Verilog RTL clone of the 186, signed a deal with Intel to license the necessary patents, and licensed their core to others as IP.

It makes sense to see something like this in a DP-to-HDMI converter chip, which might have its own design lineage dating back to when a 186 looked OK. All kinds of weird cores show up in miscellaneous "glue" chips which need a microcontroller. But you wouldn't expect a complex high performance Arm SoC like Apple's designs to be done that way.

I didn't mean in the SoC itself, just that the ARM-based products still have one ancillary controller running x86. Your last paragraph is exactly what I'm talking about. All the weird **** that's in some product is sometimes there because, well, it's always been there in previous products. That's what I was trying to get across to the other poster regarding "why is something the way it is, when it doesn't seem optimal?" Sometimes it's for historical reasons, and indeed it isn't optimal.

In this case, could Apple have swapped it out for something with an ARM-based M0 or M3 and maybe saved a tiny bit of power? Probably; I would imagine there are such converters on the market. Actually worth swapping out? Evidently not.
 
Last edited:

mr_roboto

macrumors 6502a
Sep 30, 2020
856
1,867
I didn't mean in the SoC itself, just that the ARM-based products still have one ancillary controller running x86. Your last paragraph is exactly what I'm talking about. All the weird **** that's in some product is sometimes there because, well, it's always been there in previous products. That's what I was trying to get across to the other poster regarding "why is something the way it is, when it doesn't seem optimal?" Sometimes it's for historical reasons, and indeed it isn't optimal.

In this case, could Apple have swapped it out for something with an ARM-based M0 or M3 and maybe saved a tiny bit of power? Probably; I would imagine there are such converters on the market. Actually worth swapping out? Evidently not.
Yeah, I misread you, sorry for the confusion!

I have seen vendors switch from their legacy microcontrollers in this kind of chip. ASMedia USB3-to-SATA bridge ICs used to be based on Intel 8051 clone IP cores, but recent ones are now Arm-based.

If I had to guess at what drives these kinds of transitions, it's software development, not power or area. Dinosaur 1970s microcontroller cores should be fine on power and area - they're incredibly low gate count designs. Even 80186 qualifies as simple, since it was 8086 plus a couple new instructions and embedded peripherals. Not the best ISA design ever, but still a simple 1970s style 8/16-bit ISA. It wasn't until 286 that x86 began to be what we know it as today.

The software thing is a real problem, though. 8051 is a weird ISA designed around the assumption that users would mostly be programming it in assembly language. It's not easy to find a fully modern C compiler for it. Toolchain choices are probably drying up for 80186 and other popular choices like Z80 clones too. Switching to Arm or RISC-V unlocks the ability to use off-the-shelf open source toolchains for a number of different high level languages - C, C++, Rust, whatever.
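To make that toolchain point concrete, here's a minimal C99 sketch (the register name and the host-runnable stand-in are made up for illustration): this kind of code builds unchanged with off-the-shelf GCC or Clang targeting Arm or RISC-V, whereas an 8051 port under something like SDCC typically needs compiler-specific memory-space keywords (__xdata and friends).

```c
/* Minimal sketch: a register-write helper in plain C99, no vendor extensions.
 * Builds as-is with off-the-shelf arm-none-eabi-gcc or a RISC-V GCC/Clang;
 * an 8051 build under SDCC would typically need memory-space keywords.
 * The "register" here is a stand-in local so the sketch also runs on a host;
 * real firmware would point at a fixed MMIO address instead. */
#include <stdint.h>
#include <stdio.h>

static inline void reg_write32(volatile uint32_t *reg, uint32_t value)
{
    *reg = value;   /* volatile access, so the write is not optimized away */
}

int main(void)
{
    volatile uint32_t fake_ctrl_reg = 0;   /* hypothetical peripheral register */

    reg_write32(&fake_ctrl_reg, 0x1u);     /* e.g. set an "enable" bit */
    printf("ctrl = 0x%08x\n", (unsigned)fake_ctrl_reg);
    return 0;
}
```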
 

BigPotatoLobbyist

macrumors 6502
Dec 25, 2020
301
155
For the "SoC" packages that keep the iGPU that won't help them.


[Image: Intel-Alder-Lake-Dies-768x410.jpg]




The 6+0 saves space ( 53) but it isn't really saving a competitive iGPU addition's worth of space. If Intel isn't bringing up the iGPU on the mobile products then they are building a "dinosaur". Even in the bulk of mainstream desktop sales it is a shrinking pie. But the primary purpose of the E cores is to hold down the die size (the 8-core, backported Gen 11 part has some of the same problems as Gen 12; pretty likely Intel initially imagined having something smaller available but had to 'settle').

If Intel wanted to do a competitor to the mainstream Ryzen desktop (and low-end Threadripper) that was either completely GPU-less (or with a "stuck in time" EU core allocation ... smaller and smaller with each process shrink iteration), then that is a path, but probably detached from the primary markets that Apple is going after.

Intel is moving into being a discrete GPU seller. Yeah... probably right that they will want some "CPU" SKU product that sells more dGPUs.

Versus what Apple is doing with die space allocation:

[Image: Die-Sizes.jpg]



The 8+8+iGPU is in a similar ballpark to a Pro in size, but the graphics performance is not even close.

In the mid-to-upper range, "box with slots" workstation space, the iGPU-less allocation of max mid-size "E" cores would get traction. That actually makes some sense, as some of the commentary from Intel's Hot Chips session on Xeon SP Gen 4 (Sapphire Rapids) said they are going to move away from some desktop benchmarks as guides for where the server product is going.




If they are trying to keep up with Apple then it is probably going to be a 'fail'. That's because it isn't just P cores that Apple is focusing on. Going with a relatively high multiple of E cores likely means losing on the GPU front. QuickSync used to be the leading video de/encoder; that's also slipping through their fingers with "more E cores".


70% of Intel's 'client computing' business is selling laptop processors. More E cores isn't going to save their laptop business.
I didn't say this would help save them, and importantly, part of the speculated products I mentioned (after Raptor Lake) will realize density gains via new process nodes starting with Meteor Lake. I do tend to think the E-cores reach a point of diminishing returns due to the limitations of inter-core software thread parallelism (a rough sketch of that is below), whereas more die area devoted to larger iGPUs would be of use to many people, and in a much more linear manner.
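As a back-of-envelope illustration of that diminishing-returns point, here's a tiny Amdahl's-law sketch; the 80% parallel fraction is just an assumed example figure, not a measurement of any real workload.

```c
/* Amdahl's-law sketch: speedup from N cores when a fraction p of the work
 * parallelizes. The p value is illustrative, not a measured workload. */
#include <stdio.h>

static double amdahl(double p, int n)
{
    return 1.0 / ((1.0 - p) + p / n);
}

int main(void)
{
    const double p = 0.80;                     /* assume 80% parallel work */
    const int cores[] = { 2, 4, 8, 16, 32 };

    for (int i = 0; i < 5; i++)
        printf("%2d cores -> %.2fx speedup\n", cores[i], amdahl(p, cores[i]));
    return 0;
}
```

With p = 0.8, going from 8 to 32 cores only moves the speedup from about 3.3x to about 4.4x, which is the intuition behind preferring GPU/accelerator area over piling on E-cores indefinitely.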

Luckily, they are doing both. Look up Meteor Lake and Arrow Lake and their Foveros packaging tech. In 2023 and 2024, with both of those (14th and 15th gen), they will have SoCs where:
I) The IO/main base fabric is fabricated on TSMC N3
II) The GPU on TSMC N3, with up to 192 EU's in 2023 with Meteor Lake and 384 with Arrow Lake
III) The CPU is fabricated on Intel 4 and Intel 3, with increased use of high-density libraries reportedly, probably for smaller E cores or relatively smaller P-cores depending on the IP block.

It's not clear what your point really was in this reply other than to point out that, yes, adding E-cores indefinitely will have diminishing returns for most general-purpose compute, which is rather obvious in these corners. Also, sure, it's true their GPU area is rather large given the number of EUs/cores standard even today, and yet the GPUs are impeded by poor software and bad architectures (Vega 8 is frankly superior to 96EU Xe for many games, and that's Vega), but both of those deficiencies will change in the future, and they may widen the memory bus in 2023/2024.


None of this solves the real problem, which is just x86/x64. Still, I've no issue with kewl Intel packaging tech, denser libraries, or new Intel processes making use of EUV patterning vs the DUV of today (which will improve yield, especially in addition to ASML-distributed pellicles), presumably wider memory buses, or humongous iGPUs, of course.

Edited, just for clarity and grammar's sake at 7/8:47 AM.
 
Last edited:

BigPotatoLobbyist

macrumors 6502
Dec 25, 2020
301
155
Citation very much needed. Essentially nobody but Intel uses x86 for deeply embedded microcontrollers (and not even Intel always does it, they've used other architectures too).

"Known familiar solution that works" doesn't solve the real problems here. There's major patent issues, and unlike ARM or RV, there's no good standards document to refer to when implementing x86 - the whole x86 ecosystem is very ad hoc.

And really, who inside Apple would find x86 so compellingly familiar that they'd want to use it in Apple Silicon despite all the downsides? The low-friction low-cost high-familiarity option for Apple would be a Cortex-M0, or an in-house equivalent.
Funnily enough, Intel actually uses RISC-V now for the scheduling microcontroller that aids Thread Director. Nvidia are known (I mean that they state as much) to utilize RISC-V for the little management CPU node in their GPGPU wares, I think, and for security in some weird respect. Google does something similar with the Titan M2 security chip that ships alongside Tensor, which is built on RISC-V for privacy-related compute.

I suspect this is what Apple's "high-performance RISC-V engineer" job post was actually about. Sure, I imagine Apple always has failsafes and a plan B should ARM's ISA become limiting in the Very Long Run, but realistically they probably want to drop licensing fees and have greater avenues for customization in future microcontrollers, security chips, or IoT SoCs.
 
  • Like
Reactions: Krevnik and Xiao_Xi

BigPotatoLobbyist

macrumors 6502
Dec 25, 2020
301
155
It is an advantage when you have a monopoly on the dominant ISA and have the wafer volumes to sustain those fabs. If there is a competitive chip market (like ARM) it’s better for everyone to pool their volumes together for one company to fab (eg TSMC) for better economies of scale. This becomes even more important as new nodes get more and more expensive.

As for Intel becoming a foundry, we shall see. They've talked about doing this before and screwed all their foundry customers over with the 10nm disaster. Although this time they may have the advantage of Uncle Sam "suggesting" that government contracts include chips fabbed by a trusted foundry, AKA Intel. Plus CHIPS Act $$.
Intel fabs will probably end up more successful than Samsung's. Intel 10nm and 10nm SF were garbage, but they've had a pretty notable improvement even just getting to "7nm" (density has stayed the same, but that's not all that matters) with ADL, which brought a 10-15% performance-per-watt improvement.

One thing that's important to realize is they didn't use EUV and made poor materials-science bets, but it's not as if they haven't corrected for this with their future nodes and recent equipment purchases, or as if Intel 7 is all that bad for what it is. People also need to realize that Intel's design sloppiness, and x86 having hamstrung things further, is totally distinct from the fab failures. If anything, Intel's fabs were the main sell for their CPUs; the design side has always been average or roughly on par with AMD IMO, especially discounting for power.

What IFS could and will do for a third party interested in building an ARM SoC with high-density/lower-voltage libraries (or a greater share of those libraries for each IP block relative to what Intel's design division or, e.g., AMD does for CPUs) will leave us with an entirely different picture of power on Intel 3/4, not to mention that those future nodes will actually utilize EUV patterning with ASML-provided pellicles, which is a huge boon to yields and to power characteristics and densities.

The fact that even the Gracemont cores show (in that example from Chips and Cheese, with 4 of them at 3.1GHz and 13-14 watts of total power, L3 and all, on 7-Zip) that lower operating voltages and efficient performance in the 2-3GHz range are in principle possible on Intel 7, versus the Golden Cove cores, is huge.

While still somewhat mediocre by ARM or Apple performance-per-watt standards at those power and frequency levels (and yes, losing a bit to Zen 2), that chart did show that people ought to be more careful about confusing Intel CPU design with Intel fab capabilities. Golden Cove and Gracemont are fabricated very differently, because IFS has the ability to meet differing demands. That their previous work has been exceptionally leakage-prone and minimized the use of denser libraries (evidently, save for Cannon Lake or some older Atom ****) is irrelevant to the hypothetical capabilities of IFS.

So while they still need a node shrink or two, once they move up to Intel 4 and Intel 3 I would not be surprised at all if Qualcomm or Nvidia are interested, even if just for midrange SKUs or Amazon Graviton-type stuff.
 
Last edited:

BigPotatoLobbyist

macrumors 6502
Dec 25, 2020
301
155
How much could the PC world improve from a "soft" transition to only x64 (no more 32-bit app support) rather than a "hard" transition to ARM/RISC-V?
Unclear. x64 is not like the ARMv7-to-ARMv8 transition, where the ISAs were totally dissimilar, so ARM (and Apple too, for a time) had to implement area-costly compatibility IP, as they still do on the A710. That was an obvious cost for ARM, and it suggests that with, say, an A720 that dropped 32-bit compatibility, we might see a bit more throughput or a slightly wider, deeper core while retaining the A7x's wonderful PPA (performance per area) and performance per watt.

There's not really an obvious cost, per se, to keeping software support via a separate binary for 32-bit users that 64-bit OS users can ignore. It's not that a 64-bit binary or package won't, ceteris paribus, be drastically more performant, especially for memory-intensive operation or swap; it will. It's just not clear to me where the actual, direct drag on performance is for someone with a 64-bit x64 CPU running a 64-bit OS and a package compiled for 64-bit systems, merely because 32-bit-compatible binaries or stacks exist elsewhere. I may not be the best person to ask on this, though, and it's 6/7AM my time so I'm not really firing on all cylinders here.
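To put a toy example on that (hedged: the compile flags are GCC/Clang on Linux, and the register/ABI notes are the usual SysV conventions, not anything product-specific), the same source can be built as a 32-bit or a 64-bit binary and both run side by side on a 64-bit OS; the penalty of 32-bit mode (8 GPRs, stack-passed arguments, 32-bit pointers) lands on the 32-bit binary itself rather than taxing the 64-bit code next to it.

```c
/* Sketch: one source, two binaries, coexisting on the same 64-bit OS.
 *   gcc -m64 -O2 sum.c -o sum64   # 16 GPRs, integer args in registers (SysV x86-64)
 *   gcc -m32 -O2 sum.c -o sum32   # 8 GPRs, args on the stack, needs multilib installed
 * Any register-pressure/pointer-size cost is paid only by the 32-bit build. */
#include <stdio.h>

static long sum3(long a, long b, long c)
{
    return a + b + c;
}

int main(void)
{
    /* sizeof(void *) prints 8 for the -m64 build and 4 for the -m32 build. */
    printf("sum = %ld, sizeof(void *) = %zu\n", sum3(1, 2, 3), sizeof(void *));
    return 0;
}
```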


What would be pretty nice is shifting from 4K to 16K page sizes for Windows on x64 or on AArch64. It would take some effort, of course, and coordination with hardware developers too (Intel, AMD, ARM's reference cores), but there are real TLB/cache benefits here IMO.
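And as a back-of-envelope look at the TLB side of that: with a fixed number of TLB entries, reach scales directly with page size. The 128-entry figure below is a made-up example, not any specific CPU's DTLB size.

```c
/* TLB "reach" illustration: coverage = entries * page size.
 * The entry count is a hypothetical example, not a real part's spec. */
#include <stdio.h>

int main(void)
{
    const unsigned long tlb_entries = 128;                 /* assumed DTLB size */
    const unsigned long page_sizes[] = { 4096, 16384, 65536 };

    for (int i = 0; i < 3; i++) {
        unsigned long reach = tlb_entries * page_sizes[i];
        printf("%2lu KiB pages: %lu entries cover %5lu KiB of working set\n",
               page_sizes[i] / 1024, tlb_entries, reach / 1024);
    }
    return 0;
}
```

Same entry count, four times the reach at 16K pages; that's the fewer-TLB-misses argument in a nutshell.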
 
  • Like
Reactions: Xiao_Xi

BigPotatoLobbyist

macrumors 6502
Dec 25, 2020
301
155
For the "SoC" packages that keep the iGPU that won't help them.


[Image: Intel-Alder-Lake-Dies-768x410.jpg]




The 6+0 saves space ( 53) but it isn't really saving a competitive iGPU addition's worth of space. If Intel isn't bringing up the iGPU on the mobile products then they are building a "dinosaur". Even in the bulk of mainstream desktop sales it is a shrinking pie. But the primary purpose of the E cores is to hold down the die size (the 8-core, backported Gen 11 part has some of the same problems as Gen 12; pretty likely Intel initially imagined having something smaller available but had to 'settle').

If Intel wanted to do a competitor to the mainstream Ryzen desktop (and low-end Threadripper) that was either completely GPU-less (or with a "stuck in time" EU core allocation ... smaller and smaller with each process shrink iteration), then that is a path, but probably detached from the primary markets that Apple is going after.

Intel is moving into being a discrete GPU seller. Yeah... probably right that they will want some "CPU" SKU product that sells more dGPUs.

Versus what Apple is doing with die space allocation:

[Image: Die-Sizes.jpg]



The 8+8+iGPU is in a similar ballpark to a Pro in size, but the graphics performance is not even close.

In the mid-to-upper range, "box with slots" workstation space, the iGPU-less allocation of max mid-size "E" cores would get traction. That actually makes some sense, as some of the commentary from Intel's Hot Chips session on Xeon SP Gen 4 (Sapphire Rapids) said they are going to move away from some desktop benchmarks as guides for where the server product is going.




If they are trying to keep up with Apple then it is probably going to be a 'fail'. That's because it isn't just P cores that Apple is focusing on. Going with a relatively high multiple of E cores likely means losing on the GPU front. QuickSync used to be the leading video de/encoder; that's also slipping through their fingers with "more E cores".


70% of Intel's 'client computing' business is selling laptop processors. More E cores isn't going to save their laptop business.
I concur, of course, that Apple's decision to devote area not only to a larger standard (or upgraded) iGPU but also to things like the wonderfully efficient AMX co-processors (usable via Apple's linear algebra library), the massive ANE, and the encoding blocks is, at least with the M1 Pro and M1 Max, *probably* the smarter choice for now versus things like 8 P cores and 16/24 E cores, sure.

But I think I may end up actually preferring Intel's 2-or-4 P-core plus 8 E-core approach, or even AMD's Zen 4 Mobile with 6, 8, or 16 medium-sized "big" cores on TSMC N5 (albeit "optimized", i.e. suffering from worse power characteristics due to being designed for higher clock rates to compensate for lower IPC), to Apple's 4/4 or 8/2 approach, given the likely future power and performance trajectories.

Moreover, taking price into account, a Zen 4 8-core system with a 15-25% IPC gain and similar performance per watt, if not a bit more, would be a huge win; similarly for Intel Meteor Lake CPUs on Intel 4 with a new uarch, a 15-20% performance-per-watt improvement, and 2+8 configurations that will be found in abundance on sub-$1K SKUs.

Maybe another way to put it: given Apple's margin concerns, I would like it if Apple's Icestorms or Blizzards were closer to the A710 and A78 than they currently are, and they threw more of them into laptops or desktops at the expense of the bigs, and even axed big cores on the phones, which are currently just too much for the utterly trash-tier thermal design of the modern iPhone, even keeping race-to-idle in mind. Maybe not for the current M-series SoCs, but in a few years I could see this being beneficial.

Currently the littles are at about A76 or A77 level of performance, *maybe* (I don't recall exactly and will have to check, but they're at least A76-tier for basic integer workloads), and they apparently suck at some peculiar Swift code for whatever reason.
 

cmaier

Suspended
Jul 25, 2007
25,405
33,474
California
How much could the PC world improve from a "soft" transition to only x64 (no more 32-bit app support) rather than a "hard" transition to ARM/RISC-V?

Supporting 32 and 64 would even be fine, as long as all instructions were a multiple of 32 bits wide (or even 16 bits). The biggest problem in designing an x86 CPU is the variable instruction lengths. (The second biggest problem is the crazy addressing modes.) Getting rid of instructions that access memory (other than load/store) would help some, too.
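To illustrate the boundary problem (with a toy encoding invented for the example, not any real ISA): with fixed-width instructions the decoder knows where instruction k starts without looking at the bytes, while with variable lengths it has to walk the stream serially just to find the boundaries, which is exactly what makes wide parallel decode painful.

```c
/* Toy comparison of finding the start of instruction k in a fixed-width
 * stream vs a variable-length one. The length rule here is invented purely
 * for illustration (low 2 bits of the first byte select 1/2/4/8 bytes). */
#include <stdint.h>
#include <stdio.h>

/* Fixed 32-bit instructions: instruction k starts at byte 4*k, no decoding needed. */
static size_t fixed_boundary(size_t k)
{
    return 4 * k;
}

/* Variable-length instructions: must decode each length to find the next start. */
static size_t variable_boundary(const uint8_t *code, size_t k)
{
    static const size_t len_tab[4] = { 1, 2, 4, 8 };
    size_t off = 0;

    while (k--)
        off += len_tab[code[off] & 0x3];   /* serial dependency on prior bytes */
    return off;
}

int main(void)
{
    const uint8_t code[16] = { 0x02, 0x00, 0x00, 0x00, 0x01, 0x03 };  /* toy stream */

    printf("fixed width:    insn 3 starts at byte %zu\n", fixed_boundary(3));
    printf("variable width: insn 3 starts at byte %zu\n", variable_boundary(code, 3));
    return 0;
}
```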
 

Sydde

macrumors 68030
Aug 17, 2009
2,563
7,061
IOKWARDI
What would be pretty nice is shifting from 4K to 16K page sizes for Windows on x64 or on AArch64. It would take some effort, of course, and coordination with hardware developers too (Intel, AMD, ARM's reference cores), but there are real TLB/cache benefits here IMO.
ARM's AArch64 specifies three page sizes of 4K, 16K or 64K for its MMU, so a strictly implemented ARMv8/9 CPU or SoC should already have this capability. I suspect that Apple's implementation uses only one size, since it is not a general market device, and most likely it is one of the larger granularities, probably 16K since it offers the most flexibility (though 64K is a shallower tree). And, with the ARM architecture, the tree traversal supports block mapping, where a branch node can instead be a block map descriptor: one level 2 entry with 16K paging can indicate a single memory block mapping of 32 MB, which would make OS "wired" memory all that much more efficient to access.
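For anyone who wants to check the arithmetic, assuming the usual 8-byte descriptors (so a table holds granule/8 entries, and a level-2 block maps granule/8 times granule bytes):

```c
/* Back-of-envelope check of AArch64 level-2 block sizes per granule:
 * a table holds (granule / 8) eight-byte descriptors, and a level-2
 * block descriptor therefore maps (granule / 8) * granule bytes. */
#include <stdio.h>

int main(void)
{
    const unsigned long granules[] = { 4096, 16384, 65536 };

    for (int i = 0; i < 3; i++) {
        unsigned long entries  = granules[i] / 8;            /* descriptors per table */
        unsigned long l2_block = entries * granules[i];      /* bytes mapped by one L2 block */
        printf("%2lu KiB granule: %4lu entries/table, L2 block = %3lu MiB\n",
               granules[i] / 1024, entries, l2_block / (1024 * 1024));
    }
    return 0;
}
```

That works out to 2 MiB, 32 MiB and 512 MiB blocks for the 4K, 16K and 64K granules respectively.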
 
Last edited:
  • Like
Reactions: BigPotatoLobbyist

Krevnik

macrumors 601
Sep 8, 2003
4,101
1,312
ARM's AArch64 specifies three page sizes of 4K, 16K or 64K for its MMU, so a strictly implemented ARMv8/9 CPU or SoC should already have this capability. I suspect that Apple's implementation uses only one size, since it is not a general market device, and most likely it is one of the larger granularities, probably 16K since it offers the most flexibility (though 64K is a shallower tree). And, with the ARM architecture, the tree traversal supports block mapping, where a branch node can instead be a block map descriptor: one level 2 entry with 16K paging can indicate a single memory block mapping of 32 MB, which would make OS "wired" memory all that much more efficient to access.

It has been confirmed that Apple uses 16K pages.
 

Sydde

macrumors 68030
Aug 17, 2009
2,563
7,061
IOKWARDI
Supporting 32 and 64 would even be fine, as long as all instructions were a multiple of 32 bits wide (or even 16 bits). The biggest problem in designing an x86 CPU is the variable instruction lengths. (The second biggest problem is the crazy addressing modes.) Getting rid of instructions that access memory (other than load/store) would help some, too.
If Intel had not been doing rectal self-examination in the mid '80s, they might have had the good sense to create a new ISA for 32-bit CPUs. Even a greatly constrained coding scheme of variable-width 16/32-bit instruction codes would have been vastly more efficient for the decoder than the mess they have now, with only a minimal reduction in flexibility. 32-bit programs would have called for different coding (with a separate onboard decoder for 16-bit code), but losing the byte-oriented coding of x86 early on would have made their current job much easier.
 

crazy dave

macrumors 65816
Sep 9, 2010
1,454
1,230
But I think I may end up actually preferring Intel's 2-or-4 P-core plus 8 E-core approach, or even AMD's Zen 4 Mobile with 6, 8, or 16 medium-sized "big" cores on TSMC N5 (albeit "optimized", i.e. suffering from worse power characteristics due to being designed for higher clock rates to compensate for lower IPC), to Apple's 4/4 or 8/2 approach, given the likely future power and performance trajectories.

Moreover, taking price into account, a Zen 4 8-core system with a 15-25% IPC gain and similar performance per watt, if not a bit more, would be a huge win; similarly for Intel Meteor Lake CPUs on Intel 4 with a new uarch, a 15-20% performance-per-watt improvement, and 2+8 configurations that will be found in abundance on sub-$1K SKUs.

Maybe another way to put it: given Apple's margin concerns, I would like it if Apple's Icestorms or Blizzards were closer to the A710 and A78 than they currently are, and they threw more of them into laptops or desktops at the expense of the bigs, and even axed big cores on the phones, which are currently just too much for the utterly trash-tier thermal design of the modern iPhone, even keeping race-to-idle in mind. Maybe not for the current M-series SoCs, but in a few years I could see this being beneficial.

Currently the littles are at about A76 or A77 level of performance, *maybe* (I don't recall exactly and will have to check, but they're at least A76-tier for basic integer workloads), and they apparently suck at some peculiar Swift code for whatever reason.

According to AnandTech, an A15 Blizzard core has roughly the SPECint perf of an A78 mid-core in a Dimensity (a little lower), with lower floating-point performance. Obviously different A76-A78 designs will be clocked and configured in different ways by different manufacturers, so it depends on which particular design you look at. But overall Apple is getting to your desired performance levels. And of course perf/W is insane for Blizzard; it uses fewer joules than an A5x for the same task. (As an aside, ARM complains that running SPECint, i.e. an integer-heavy workload, is still too heavy for what the A5x cores are meant to handle, since they are supposed to only be used on light integer threads where their tiny footprint and power usage shine.)

However, I'm not sure I agree with your thesis wrt what Apple should do. Arm does, sort of, for itself, hence why it went big.Medium.Little (don't remember the actual name, that's not it). But when looking at Apple wrt AMD and Intel for laptops/desktops, the big cores are still quite small and power-efficient in comparison. The Chips and Cheese article you cite in the previous post also mentions that Intel has the Gracemont cores running at over twice their peak-work-efficiency power levels, but that peak (about 4-5W) is basically where Firestorm already sits, at far greater performance than Gracemont even in its current boosted configuration. And if I remember correctly the die area of Firestorm isn't anywhere near as big as Golden Cove's, even accounting for differences in fabrication density. So Apple just doesn't have the same impetus to add a large number of mid-tier cores to its design in the near future. Its big cores are already there in terms of performance, power, and die area. Intel added such mid-tier cores to solve design problems that are specific to them. Maybe they'd help in general and AMD and Apple will adopt a design with large numbers of mid-cores like Intel, but they aren't as necessary.

I believe @cmaier floated the possibility of adding true little cores a la the A5x to the Apple design, cores that really were just focused on the light, integer-heavy workloads which constitute the standard background task. This would probably have the knock-on effect of making the current E-cores more of a mid-tier. However, I don't think the end state of his musings was to then populate the laptop/desktop die with lots of upgraded Icestorm/Blizzard cores and drop the number of big Firestorm/Avalanche cores.

I also should say I don't necessarily agree with others (I think @mr_roboto ?) that dropping E-cores entirely in desktop parts is necessarily the best idea either. Apple's approach in their higher-tier chips so far seems to be to keep the cluster small, halving the number of cores and doubling the clock speed as needed (when the number of background tasks begins to overwhelm the E-cores or they work-steal from the larger cores). This basically says they are primarily there for housekeeping, keeping the big cores free of context switches to background tasks, and that's not a terrible reason to keep a few around even in desktops, since they are pretty small and efficient. The fact that they can also give a small bump to multithreaded throughput is a nice-to-have addition that just cements that keeping them around is only a benefit.

WRT dropping big cores in phones, even many ARM chip designers see the benefit of keeping one or two big cores around; things like a fast, fluid user interface do better when you have such a thing, and occasionally you just need a fast core available to keep things running smoothly. I agree that Apple's thermal designs have gotten a bit ... throttle-y, but that has more to do with GPUs than the CPU.

You did state that all your prescriptions are something to consider beyond the current Mx designs, maybe for something a few years down the road; I'd additionally caveat that it may be a bit beyond that, depending on your definition of "a few". :)
 
Last edited:

BigPotatoLobbyist

macrumors 6502
Dec 25, 2020
301
155
According to AnandTech, an A15 Blizzard core has roughly the SPECint perf of an A78 mid-core in a Dimensity (a little lower), with lower floating-point performance. Obviously different A76-A78 designs will be clocked and configured in different ways by different manufacturers, so it depends on which design you look at. So they are getting there. Perf/W though is insane for Blizzard; it uses fewer joules than an A5x for the same task. (ARM complains that running SPECint, i.e. an integer-heavy workload, is too heavy for what the A5x cores are meant to handle, since they are supposed to only be used on light integer threads where their tiny footprint and power usage shine.)

However I'm not sure I agree with your thesis wrt what Apple should do. Arm does, sort of, for itself, hence why it went big.Medium.Little (don't remember the actual name, that's not it). But when looking at Apple wrt AMD and Intel for laptops/desktops, the big cores are still quite small and power-efficient in comparison. The Chips and Cheese article you cite in the previous post also mentions that Intel has the Gracemont cores running at over twice their peak-work-efficiency power levels, but that peak (about 4-5W) is basically where Firestorm already is, at far greater performance. And if I remember correctly the die area of Firestorm isn't anywhere near as big as Golden Cove's, even accounting for differences in fabrication density. So Apple just doesn't have the same impetus to add a large number of mid-tier cores to its design in the near future. Its big cores are already there. Intel added them to solve design problems that are specific to them. Maybe they'd help in general and AMD and Apple will adopt a design with large numbers of mid-cores like Intel, but they aren't as necessary. I believe @cmaier floated the possibility of adding true little cores a la A5x cores to the Apple design that really were just focused on light integer-heavy workloads which constitute the standard background task. This would have the knock-on effect of making the current E-cores more of a mid-tier. However, I don't think the end state of his musings was to then populate the laptop/desktop die with lots of upgraded Icestorm/Blizzard cores and drop the number of big Firestorm/Avalanche cores.

I also should say I don't necessarily agree with others (I think @mr_roboto ?) that dropping E-cores entirely in desktop parts is necessarily the best idea either. Apple's approach in their higher-tier chips so far seems to be to keep the cluster small, halving the number of cores and doubling the clock speed as needed (when the number of background tasks begins to overwhelm the E-cores or they work-steal from the larger cores). This basically says they are primarily there for housekeeping, keeping the big cores free of context switches to background tasks, and that's not a terrible reason to keep a few around even in desktops, since they are pretty small and efficient. The fact that they can also give a small bump to multithreaded throughput is a nice-to-have addition that just cements that keeping them around is only a benefit.

WRT dropping big cores in phones, even many ARM chip designers see the benefit of keeping one or two big cores around; things like a fast, fluid user interface do better when you have such a thing, and occasionally you just need a fast core available to keep things running smoothly. I agree that Apple's thermal designs have gotten a bit ... throttle-y, but that has more to do with GPUs than the CPU.

You did state that all your prescriptions are something to consider beyond the current Mx designs, maybe for something a few years down the road; I'd additionally caveat that it may be a bit beyond that, depending on your definition of "a few". :)
I'm busy at the moment, but I will say in quick reply that, to be fair, current X2 core power consumption is much lower than the A15 Avalanche's, like 3-4W vs 5.5W.

Also, tbf, I fully realize Apple's big cores are still miles ahead of Intel's E cores in efficiency, but for peak aggregate throughput, controlling for area, depending on how things go it may be nicer if Apple adopted a different strategy, at least for the entry-level stuff. Today I think they have nothing to worry about, nor for the next 2-4 years minimum (realistically, even with other improvements, they will have the lead in efficient ST, which is one of the most important things here).
 

mr_roboto

macrumors 6502a
Sep 30, 2020
856
1,867
One thing that's important to realize is they didn't use EUV and made poor materials-science bets, but it's not as if they haven't corrected for this with their future nodes and recent equipment purchases, or as if Intel 7 is all that bad for what it is. People also need to realize that Intel's design sloppiness, and x86 having hamstrung things further, is totally distinct from the fab failures. If anything, Intel's fabs were the main sell for their CPUs; the design side has always been average or roughly on par with AMD IMO, especially discounting for power.
I dunno if I'd go that far. Intel has done some good design work; I'd cite the stretch consisting of Core 2, Nehalem, and finally Sandy Bridge (with boring die shrinks between each new uarch).

If you go further back, they weren't always owners of the best process technology. In the 1990s, their designers shocked all the RISC CPU vendors with the P6 uarch (Pentium Pro / Pentium II). x86 wasn't supposed to be that fast.

The one thing Intel always seemed to lack was long-term focus on low power platforms. That's arguably more on management than the engineers.
 

BigPotatoLobbyist

macrumors 6502
Dec 25, 2020
301
155
ARM's AArch64 specifies three page sizes of 4K, 16K or 64K for its MMU, so a strictly implemented ARMv8/9 CPU or SoC should already have this capability. I suspect that Apple's implementation uses only one size, since it is not a general market device, and most likely it is one of the larger granularities, probably 16K since it offers the most flexibility (though 64K is a shallower tree). And, with the ARM architecture, the tree traversal supports block mapping, where a branch node can instead be a block map descriptor: one level 2 entry with 16K paging can indicate a single memory block mapping of 32 MB, which would make OS "wired" memory all that much more efficient to access.
TIL on the last part. Interesting. Looking forward to more tests with performant WoA CPUs/SoCs, but not optimistic that they will implement 16K granules/page sizes in the OS.
 

BigPotatoLobbyist

macrumors 6502
Dec 25, 2020
301
155
I dunno if I'd go that far. Intel has done some good design work; I'd cite the stretch consisting of Core 2, Nehalem, and finally Sandy Bridge (with boring die shrinks between each new uarch).

If you go further back, they weren't always owners of the best process technology. In the 1990s, their designers shocked all the RISC CPU vendors with the P6 uarch (Pentium Pro / Pentium II). x86 wasn't supposed to be that fast.

The one thing Intel always seemed to lack was long-term focus on low power platforms. That's arguably more on management than the engineers.
Okay, that's fair tbh, and they have still done impressive work for their relative area on Golden Cove; I can't say I don't think they'd have one of the more impressive IPC showings were they using ARM or RISC-V.

Maybe with Glenn Hinton, among others, returning out of retirement for a "high-performance CPU project" we will see them genuinely leapfrog everyone again in this regard, but the last 5-8 years have not been overly impressive IMO, and they've sat and watched their dominance erode and ARM grow. Naturally there is a lag effect; e.g. a laptop with 8 X2 ARM cores clocked to 3.3GHz with extra cache would still be preferable to anything Intel ADL in my book, owing to power constraints and the relatively acceptable ST throughput from such a device. Yet ARM (by proxy anyway, via Qualcomm and MediaTek using newer SoCs, as ARM themselves have been lobbying for it for years) haven't made a big non-Apple play for client desktops and laptops, but in principle it's pathetic that Intel doesn't have products competitive with things like the A710 or X2.

Thinking back, 2015 strikes me as a fine inflection point where Apple's chips really came into their own; not to take anything away from the 64-bit A7 in 2013, which was extremely impressive in its microarchitectural characteristics for its time, of course (I had a 5S and was amazed by the speed. Still miss that phone in a weird way.)
 
  • Like
Reactions: psychicist

leman

macrumors Core
Original poster
Oct 14, 2008
19,522
19,679
I dunno if I'd go that far. Intel has done some good design work; I'd cite the stretch consisting of Core 2, Nehalem, and finally Sandy Bridge (with boring die shrinks between each new uarch).

If you go further back, they weren't always owners of the best process technology. In the 1990s, their designers shocked all the RISC CPU vendors with the P6 uarch (Pentium Pro / Pentium II). x86 wasn't supposed to be that fast.

I mean, if one looks at it that way, P6 has been their golden cow for over 20 years now. Their current archs are still based on P6 if I understand correctly.
 

Sydde

macrumors 68030
Aug 17, 2009
2,563
7,061
IOKWARDI
TIL on the last part. Interesting. Looking forward to more tests with performant WOA CPU's/SOC's but not optimistic that they will implement 16K granules/page file sizes in OS.
Who cares about page size? The pager and, to a lesser extent, the memory allocation process. These are comparatively modular components of any OS. No other parts of the system, nor any utility or application processes, are even aware of it. It happens transparently to everyone else in the system, so implementing 16K (or even 64K) paging on an ARM SoC is, while not exactly trivial, not all that big an issue.
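As a small illustration of that transparency (a POSIX-only sketch; on Apple Silicon macOS this reports 16384 for native arm64 processes, on most x86 Linux boxes 4096): well-behaved user code just asks the OS for the page size instead of hardcoding 4 KiB, so a granule change never touches it.

```c
/* Portable page-size query: user code that does this is indifferent to
 * whether the kernel uses 4 KiB, 16 KiB or 64 KiB pages underneath. */
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    long page = sysconf(_SC_PAGESIZE);   /* POSIX; never hardcode 4096 */
    if (page < 0) {
        perror("sysconf");
        return 1;
    }
    printf("system page size: %ld bytes\n", page);
    return 0;
}
```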
 
  • Like
Reactions: Krevnik

BigPotatoLobbyist

macrumors 6502
Dec 25, 2020
301
155
I mean, if one looks at it that way, P6 has been their golden cow for over 20 years now. Their current archs are still based on P6 if I understand correctly.
lol yeah (unless I, too, am missing something, but this was the one where they fully adopted internal translation to micro-ops, yes?)
 