Agreed, it seems that's all those posts are about. Or some sort of buyer's remorse or something. At the end of the day they all sound ridiculous and are irrelevant.
Considering Intel is no longer in most Macs now, it's irrelevant. It is all about sowing discontent.
I believe it's astroturfing. Try asking them to refer to Intel execs as clowns or morons, or Intel as ******* or incompetent. Usually they're barred from trash-talking Intel.
 
A few words regarding Cinebench. It uses the Embree rendering library - a library developed by Intel. There are only AVX code paths available. If you want to compile Embree for ARM64 you have 2 options:
1) Use the default C implementation without any SIMD instructions
2) Use the AVX-to-NEON wrapper - this is a static wrapper, which maps AVX intrinsics to an implementation using NEON intrinsics

Both options are apparently not optimal for ARM64, as the code still is inherently written as AVX code.
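
Roughly what option 2 looks like in practice (a minimal sketch in the spirit of AVX/SSE-to-NEON shims like sse2neon/simde, not Embree's actual wrapper; names are illustrative):

```cpp
// Minimal sketch of how an AVX-to-NEON shim can work: a 256-bit AVX vector has
// no native NEON equivalent, so it is emulated with two 128-bit halves.
// Illustrative only -- not Embree's actual wrapper.
#include <arm_neon.h>

struct avx256_ps {            // stand-in for __m256
    float32x4_t lo, hi;       // two NEON registers emulate one AVX register
};

static inline avx256_ps emu_mm256_add_ps(avx256_ps a, avx256_ps b) {
    // One AVX instruction becomes two NEON instructions. A simple add maps
    // cleanly, but wider AVX idioms (shuffles, cross-lane ops, masking)
    // translate far less efficiently, which is why the result stays
    // "inherently AVX-shaped" code running on NEON.
    return { vaddq_f32(a.lo, b.lo), vaddq_f32(a.hi, b.hi) };
}
```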

Back to topic: realistically, Intel's ADL is in the best case (for Intel) at least a factor of 2 away from M1 efficiency - so not even close. Which technically is an achievement in the sense that Intel managed to do this with an architecture based on the ancient x86-64 ISA.
 
Back to topic: realistically, Intel's ADL is in the best case (for Intel) at least a factor of 2 away from M1 efficiency - so not even close. Which technically is an achievement in the sense that Intel managed to do this with an architecture based on the ancient x86-64 ISA.
We'll see what efficiency they really have in a few weeks. I think it would be more accurate to say it's an achievement given that they are using a manufacturing process that is somewhere between 1-2 generations behind TSMC's 5nm process (which the M1 Pro/Max are the first computer CPUs on the market to use). The ISA probably doesn't matter much for performance and efficiency. Here's an interesting collection of quotes (with references) from studies and experts (including Jim Keller, who had a hand in designing Apple's ARM CPUs):

 

It's just an emulated result, but according to a Chinese leak, a 12900K with 6 big cores (3.0 GHz) and 8 small cores (2.4 GHz) drew only 35W, and the Cinebench result was around 14300, which is similar to the M1 Max's result.

If this is true, then Intel is able to make a very efficient CPU with the x86 architecture on 7nm, which might be a very threatening result for Apple Silicon. It's still just an emulated result, and Intel has more small cores, but if it holds up, I don't think x86 is the main bottleneck keeping it from ARM-like power efficiency anymore.

Yes, we need to wait and see how the Alder Lake mobile CPUs perform, but it seems Intel might be able to make an M1 and M1 Pro/Max grade CPU with similar power consumption.



Thoughts?
I mean, even though the perf per watt is incredible on these Apple Silicon chips, I'm not sure it's right to say "whoa, these Macs are so much more powerful than the previous models". After all, the MBP16 2019 was launched 2 years ago and its processor is now 3 years old (9th gen vs 12th now).

Don't get me wrong, these Macs are good, but I don't think it's fair to compare them with their old predecessors.
 
I mean, even though the perf per watt is incredible on these Apple Silicon chips, I'm not sure it's right to say "whoa, these Macs are so much more powerful than the previous models". After all, the MBP16 2019 was launched 2 years ago and its processor is now 3 years old (9th gen vs 12th now).

Don't get me wrong, these Macs are good, but I don't think it's fair to compare them with their old predecessors.
12th gen Intel CPUs just released this month. I don't know what you are saying.
 
We'll see what efficiency they really have in a few weeks. I think it would be more accurate to say it's an achievement given that they are using a manufacturing process that is somewhere between 1-2 generations behind TSMC's 5nm process (which the M1 Pro/Max are the first computer CPUs on the market to use). The ISA probably doesn't matter much for performance and efficiency. Here's an interesting collection of quotes (with references) from studies and experts (including Jim Keller, who had a hand in designing Apple's ARM CPUs):


Their foundry is at most 1 full generation/node behind in perf/W - maybe less with Intel 7 (née 10nm+) being a 15% improvement on Intel 10nm. That's the good news/bad news for Intel. Their foundries aren't as behind as people think, but their uarch (especially the P-core) still is.

There are three things that the chipsandcheese post misses or misunderstands:

1) Almost all of those old papers are on in-order or the first out-of-order Arm v7 cores. There is a reason beyond 64-bit why Arm v8 exists; it was a complete and total departure from Arm v7's design in many respects. While no one has exactly replicated those studies to my knowledge, in Anandtech's graphs we can see that Apple's and even Arm's PPW and PPA advantages are substantial right now. ADL's E-cores are an attempt to address Intel's deficiency, especially the second, even with respect to AMD, never mind ARM.

2) They mention legacy bloat but misunderstand what Keller was saying. He was saying that Arm will eventually accrue such bloat, but one reason it currently outperforms x86-64 on perf/W is that it has substantially less *right now* because it is a much younger design. Obviously Risc V, being the youngest ISA, has the least legacy of all of them. Moving forwards, Arm will actually have even less, as standard Arm cores will follow Apple and effectively drop Arm v7 support (technically you could still build a custom Arm v9 core with Arm v7 support, but so far no one has said they will). When Keller and others talk about the intrinsic performance gap between Arm and x86-64 being far smaller than the current gap would lead you to believe (~10% better for Arm v8 seems to be the generally agreed napkin math), they are talking about a hypothetical clean sheet x86-64 design removing all the decades of cruft holding it back ... including 32-bit x86. In addition, x86 still boots into 16-bit mode during its boot-up sequence and has 8-bit support for addressing and registers. Dropping that legacy support has obvious implications for Wintel and there's a reason it hasn't been done. I believe these decades of cruft separating the two designs are primarily what @Gerdi is referencing here. Even Arm v7 doesn't have that much, and overall Arm and especially Apple can afford to be more ruthless here and have been so in their designs.

3) Micro-ops and decode. ARM and Intel micro-ops vs microcode are *very* different. https://talkedabout.com/threads/x86-vs-arm.2182/page-2#post-72118. And while decode may or may not be as expensive right now, Intel's decode wasn't moved wider until ADL, whereas Apple moved wider years ago and is still wider today. It is easier to go wide in ARM v8 than in x86-64. It is possible to go wider than 4-wide decode in the latter (obviously, because ADL just did it), but such designs require more tricks to do so in a way that doesn't cause power to skyrocket. We see this in ADL with both the P-core and E-core deploying novel decode strategies and/or huge uop-/I-caches relative to previous Intel processors.
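
To illustrate the decode point, here's a toy sketch (not how any real decoder is implemented): with AArch64's fixed 4-byte instructions, the starting offsets of the next N instructions are known up front, so they can be decoded in parallel; with x86's variable-length encoding, each instruction's start depends on the length of the one before it, which is why wide x86 decoders need extra tricks (length predecode, big uop caches, etc.). The `length_of` decoder below is a hypothetical stand-in:

```cpp
// Toy sketch only -- not a real decoder. It just shows where the serial
// dependency in x86 decode comes from versus a fixed-width ISA.
#include <cstddef>
#include <cstdint>
#include <vector>

// AArch64: every instruction is 4 bytes, so the Nth instruction starts at N*4.
// All starting offsets are known immediately, which makes wide, parallel decode easy.
std::vector<std::size_t> starts_fixed_width(std::size_t num_bytes) {
    std::vector<std::size_t> starts;
    for (std::size_t off = 0; off + 4 <= num_bytes; off += 4)
        starts.push_back(off);
    return starts;
}

// x86: instructions are 1-15 bytes (prefixes, opcode, ModRM, SIB, displacement,
// immediate), so you can't know where instruction N+1 begins until you've at
// least length-decoded instruction N. `length_of` is a hypothetical stand-in
// for that length decoder.
std::vector<std::size_t> starts_variable_width(
        const std::uint8_t* code, std::size_t num_bytes,
        std::size_t (*length_of)(const std::uint8_t*, std::size_t)) {
    std::vector<std::size_t> starts;
    std::size_t off = 0;
    while (off < num_bytes) {
        starts.push_back(off);
        off += length_of(code + off, num_bytes - off);  // serial dependency
    }
    return starts;
}
```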

Having said that, obviously microarchitecture implementation and fabrication node are still huge factors. That part is true. It's just that their thesis of "ISA doesn't matter in practice" needs some additional qualification as the "in practice" part is "in practice ... in theory". ;)
 
  • Like
Reactions: cmaier
I mean, even though the perf per watt is incredible on these Apple Silicon chips, I'm not sure it's right to say "whoa, these Macs are so much more powerful than the previous models". After all, the MBP16 2019 was launched 2 years ago and its processor is now 3 years old (9th gen vs 12th now).

Don't get me wrong, these Macs are good, but I don't think it's fair to compare them with their old predecessors.
They've been compared favorably to Tiger Lake and Rocket Lake and Zen 3 chips as well. Not just previous models of Macs. Desktop ADL released just a week ago is a huge step up for Intel, though I have my doubts that the emulated results presented in the OP will hold up at 35W for mobile.
 
They've been compared favorably to Tiger Lake and Rocket Lake and Zen 3 chips as well. Not just previous models of Macs. Desktop ADL released just a week ago is a huge step up for Intel, though I have my doubts that the emulated results presented in the OP will hold up at 35W for mobile.
Favorably, but not necessarily destroying them.
 
Favorably, but not necessarily destroying them.

I dunno wrt Tiger Lake ... if that wasn't destroying it, I don't know what else it would take for TGL to be classified as "destroyed" ... I mean, better raw performance on almost every benchmark and 4-7x better performance per watt is just ... huge. We'll see ADL's mobile response soon. Looking at desktop results, I suspect the top i7/i9 chips will have equal or better raw performance in full power mode, but at 2.5-5x the PPW and thus 2-4x the joules. Mid-tier mobile i5s will have less performance than the M1 Pro, but better efficiency than their i7/i9 brethren, and won't suffer as much throttling or default performance loss on battery.
 
Just an FYI if this is still unclear: Apple is no longer using Intel chips. It is irrelevant what Intel releases in this space. At best it's "interesting" to compare, even though they are comparable. Nothing will come of it. This is the Apple Silicon board as well, so why are we talking about a chip we will never use, that is not in any Mac, and never will be?

Anyway, just saying. For what it's worth.
 
  • Like
Reactions: tmoerel
Their foundry is at most 1 full generation/node behind in perf/W - maybe less with Intel 7 (née 10nm+) being a 15% improvement on Intel 10nm.
I think it's currently at least one behind, probably a bit more. The "Intel 7" name notwithstanding, I doubt that the process is entirely up to par to the now mature TSMC 7nm node. What happens next depends on Intel's ability to execute on their very ambitious roadmap, and on whether TSMC will continue executing without stumbling as they have in the last few years.

ADL's E cores are an attempt to address Intel's deficiency, especially the second, even with respect to AMD never mind ARM.
My guess is that there are a number of reasons why they went this route, some of them forward looking. Power efficiency is one (taking a page from mobile devices), defending against AMD's former multi-threaded performance advantage is another. And arguably the hybrid architecture (and the developing software support for it in Windows and Linux) is an asset for them that has the potential to help them leapfrog AMD in some segments.

2) They mention legacy bloat but misunderstand what Keller was saying. He was saying that Arm will eventually accrue such bloat, but one reason it currently outperforms x86-64 on perf/W is that it has substantially less *right now* because it is a much younger design.
ARM's ISA is actually quite old too, and x86-64 isn't your grandfather's x86 (but actually a rather modern instruction set). Anyway, one key point also made in the article is that legacy support doesn't really hold you back much, because there is no need to optimize legacy instructions. You just implement them in micro-code without adding much custom logic in silicon. Software that has been compiled less than a decade ago doesn't use the legacy instructions much anyway.

In other words, the "cruft" doesn't cost much but has a lot of value for many customers because they can continue to run old software where needed. That's a big part of why x86 has prevailed against various other contenders over the years.

Obviously Risc V being the youngest ISA has the least legacy of all of them. Moving forwards, Arm will actually have even less as standard Arm cores will follow Apple and effectively drop Arm v7 support (technically you could still build a custom Arm v9 core with Arm v7 support, but so far no one has said they will).
Dropping mandatory support for an older version of an instruction set doesn't make it a new instruction set. It still has decades of legacy design in it that is no longer state of the art.

3) Microops and decode. ARM and Intel micro-ops vs microcode are *very* different. https://talkedabout.com/threads/x86-vs-arm.2182/page-2#post-72118. And while decode may or may not be as expensive right now, not until ADL was decode moved wider than it was, whereas Apple moved wider years ago and is still wider today.
"Wider" isn't automatically the best choice. There are always tradeoffs.
 
Obviously Risc V being the youngest ISA has the least legacy of all of them.
Welllllllllll... you'd think that, but then you go and look at the mess they've already made of RISC-V and you realize they decided to cut to the chase. I'm not very impressed by it, arm64 looks much cleaner.
 
I think it's currently at least one behind, probably a bit more. The "Intel 7" name notwithstanding, I doubt that the process is entirely up to par to the now mature TSMC 7nm node. What happens next depends on Intel's ability to execute on their very ambitious roadmap, and on whether TSMC will continue executing without stumbling as they have in the last few years.

While it’s entirely possible that the analysis done so far is wrong as there are no chips produced in common (so far), the industry analysts I’ve read pegged Intel’s first 10nm process at roughly equivalent to TSMC’s 7nm in terms of PPA and PPW, maybe better, with slightly lower or equivalent transistor density depending on design and feature. If Intel 7 does indeed represent a 15% increase in PPW that would put it at 7nm+ or better. Again, these are of course all estimates since we don’t have the same processors being produced on each. If you have info that says otherwise I’d be happy to read it.

My guess is that there are a number of reasons why they went this route, some of them forward looking. Power efficiency is one (taking a page from mobile devices), defending against AMD's former multi-threaded performance advantage is another. And arguably the hybrid architecture (and the developing software support for it in Windows and Linux) is an asset for them that has the potential to help them leapfrog AMD in some segments.

The main reason is your second. The E-cores are not truly E-cores as we’ve seen them from ARM. This is huge.medium not big.little.

ARM's ISA is actually quite old too, and x86-64 isn't your grandfather's x86 (but actually a rather modern instruction set). Anyway, one key point also made in the article is that legacy support doesn't really hold you back much, because there is no need to optimize legacy instructions. You just implement them in micro-code without adding much custom logic in silicon. Software that has been compiled less than a decade ago doesn't use the legacy instructions much anyway.

In other words, the "cruft" doesn't cost much but has a lot of value for many customers because they can continue to run old software where needed. That's a big part of why x86 has prevailed against various other contenders over the years.

Dropping mandatory support for an older version of an instruction set doesn't make it a new instruction set. It still has decades of legacy design in it that is no longer state of the art.

ARM v8 just turned 10 years old. x86-64 is over 20. The original x86 puts them both to shame. Every chip designer I've read has cited x86 cruft as a major reason why modern x86-64 chips use too much power and too much area, and has said that a clean sheet x86-64-only design would vastly improve current cores. Btw, the person you responded to originally, @Gerdi, if I remember right, is an ex-Intel chip designer. The guy I linked to in my response to you, who wrote basically what I've said above, @cmaier, helped design x86-64 at AMD. The link explains why the microcoded x86 instructions are not actually that elegant and cost both die area and power. Everyone else I've read, including Keller, concurs: legacy cruft is something that holds back chip design, and x86-64 has it in a way that ARM doesn't. This is, as far as I can tell, a unanimous assertion amongst chip designers who are at liberty to opine on the subject. Now of course you can't just "drop support", you have to redesign - hence the term "clean sheet" design.

Obviously compatibility with legacy software is why they don’t do it - that and too many different groups to coordinate the transition quickly - but that doesn’t change that significant die area and power are being expended to maintain that compatibility.

"Wider" isn't automatically the best choice. There are always tradeoffs.

True, every design is a trade-off, in this case ILP for clock speed/power, but Intel obviously felt there was benefit to going wider, as they did so in ADL. And my post was not about whether it was better. I was responding to the article saying that Intel/AMD could go wider than 4-wide decode if they wanted to, just like Apple, and that the ISA is no hindrance to such a design. And that's true … but with a catch. It's far trickier for them to do it than it is for an arm64 chip. It can be done, has been done, and this is what Keller was referencing, since he was undoubtedly part of the team that helped do it. However, it was much more trouble to pull off than Apple's design, which can, depending on your point of view, more elegantly/easily brute-force going wider.
 
  • Like
Reactions: Andropov and cmaier
Welllllllllll... you'd think that, but then you go and look at the mess they've already made of RISC-V and you realize they decided to cut to the chase. I'm not very impressed by it, arm64 looks much cleaner.

Fair enough :) that’s another can of worms though!
 
ARM's ISA is actually quite old too, and x86-64 isn't your grandfather's x86 (but actually a rather modern instruction set). Anyway, one key point also made in the article is that legacy support doesn't really hold you back much, because there is no need to optimize legacy instructions. You just implement them in micro-code without adding much custom logic in silicon. Software that has been compiled less than a decade ago doesn't use the legacy instructions much anyway.

In other words, the "cruft" doesn't cost much but has a lot of value for many customers because they can continue to run old software where needed. That's a big part of why x86 has prevailed against various other contenders over the years.
In x86 land, you can't get rid of all the legacy because x86-64 is the same encoding scheme as i386, extended with more prefix bytes. 32-bit instructions are still legal in 64-bit mode (*), so the most that can be hoped for is to finally ditch 8086 and 80286 compatibility someday.

ARM v8, on the other hand, treats 32-bit Arm support as a separate and optional mode, and the 64-bit ISA is a very clean break with Arm's past. The only concession to compatibility is that AArch64 registers are a superset of AArch32 registers. The ISA which operates on those registers? No legacy at all, right down to the instruction set encoding. To an Arm v8 CPU in AArch64 mode, 32-bit Arm instructions are gibberish in an alien language.

Since Arm v8 makes it legal to build a core which can only run in AArch64 mode, it's possible to design a legacy-free Arm CPU with zero transistors spent on the old stuff. And that is exactly what Apple has done. Their first few AArch64 CPUs were dual-mode since lots of iOS apps were still 32-bit, but the nature of the iOS App Store helped them quickly get rid of legacy 32-bit binaries. Only took a few years before they were able to begin shipping 64-bit-only CPUs.


* - It's often claimed that Rosetta 2 only supports 64-bit. This isn't true, it turns out it's perfectly capable of running 32-bit code. Has to be! And my understanding is that this is how CrossOver manages to support 32-bit Windows apps inside Rosetta. What's impossible is running 32-bit x86 Mac apps, and it's for the same reason that macOS Catalina and later won't run them even on x86 CPUs: Apple stopped shipping 32-bit x86 libraries.
 
ARM's ISA is actually quite old too, and x86-64 isn't your grandfather's x86 (but actually a rather modern instruction set).

While I’d like to think that‘s true, when I was asked to sketch out the 64-bit integer ALU instructions for what became AMD64 (now x86-64), it must have been around the year 2000. 20 or 21 years ago is a long time - my daughter tells me I’m not very modern.

(Also, we were pretty hamstrung by having to keep compatibility and making the instruction set something that Microsoft would buy into, so it’s not like we started with a clean sheet of paper the way Arm did for aarch64.)
 
  • Like
Reactions: crazy dave
In x86 land, you can't get rid of all the legacy because x86-64 is the same encoding scheme as i386, extended with more prefix bytes. 32-bit instructions are still legal in 64-bit mode (*), so the most that can be hoped for is to finally ditch 8086 and 80286 compatibility someday.

Could you explain further? I was under the impression that, while it's true that x86-64 is a prefix-byte-extended version of x86, you could clean up a lot of the x86-64 core design by not supporting x86 itself. Because I've definitely read this before:

Good points. I’ve suggested that an x86-64 chip that threw away compatibility with 32-bit and below software would be a much more interesting chip. Apple made a similar choice with M and A chips.


* - It's often claimed that Rosetta 2 only supports 64-bit. This isn't true, it turns out it's perfectly capable of running 32-bit code. Has to be! And my understanding is that this is how CrossOver manages to support 32-bit Windows apps inside Rosetta. What's impossible is running 32-bit x86 Mac apps, and it's for the same reason that macOS Catalina and later won't run them even on x86 CPUs: Apple stopped shipping 32-bit x86 libraries.

It’s not quite this simple as standard Wine does not support running 32bit Windows Apps on macOS. Much of the work on wine32on64 that was done for Catalina still seems to be necessary (I believe for 32 to 64bit pointer thunking). That work has not been upstreamed into standard Wine because it involves a compiler hack. However I’ll admit that I don’t know if other parts of it are simplified by the presence of Rosetta 2.



Scroll down to Gcenx’s post.
 
A few words regarding Cinebench. It uses the Embree rendering library - a library developed by Intel. There are only AVX code paths available. If you want to compile Embree for ARM64 you have 2 options:
1) Use the default C implementation without any SIMD instructions
2) Use the AVX-to-NEON wrapper - this is a static wrapper, which maps AVX intrinsics to an implementation using NEON intrinsics

Both options are apparently not optimal for ARM64, as the code still is inherently written as AVX code.

Back to topic: realistically, Intel's ADL is in the best case (for Intel) at least a factor of 2 away from M1 efficiency - so not even close. Which technically is an achievement in the sense that Intel managed to do this with an architecture based on the ancient x86-64 ISA.
This somewhat reminds me of what's going on with Wolfram's native AS build of Mathematica. When run on Intel chips, it can make use of Intel's highly optimized MKL (for other readers: Math Kernel Library). An equally-optimized full replacement for MKL is not yet available for its AS (Apple Silicon) build. Hence (at least partly for this reason—there may be other things involved) it's not yet as well-optimized for AS.

The link below is mostly about what MKL functions are available on Intel but not AS; but it also mentions that, even where they have created/obtained AS replacements, they are often not as well optimized as MKL's:

"I think for everything else we have a replacement, though it may not be as optimized. MKL is very highly optimized and even under Rosetta 2 can outperform native libraries in some situations."

 
While it’s entirely possible that the analysis done so far is wrong as there are no chips produced in common (so far), the industry analysts I’ve read pegged Intel’s first 10nm process at roughly equivalent to TSMC’s 7nm in terms of PPA and PPW, maybe better, with slightly lower or equivalent transistor density depending on design and feature. If Intel 7 does indeed represent a 15% increase in PPW that would put it at 7nm+ or better. Again, these are of course all estimates since we don’t have the same processors being produced on each. If you have info that says otherwise I’d be happy to read it.



The E-cores are not truly E-cores as we’ve seen them from ARM. This is huge.medium not big.little.
The important thing is that the groundwork for their big.little designs is done, and initial OS support is available (if not yet fully mature). Scaling the E-cores up or down (along with appropriate adaptation of the parameters provided to the OS CPU scheduler by the "thread director") shouldn't be a major issue going forward.

The guy I linked to in my response to you, who wrote basically what I've said above, @cmaier, helped design x86-64 at AMD. The link explains why the microcoded x86 instructions are not actually that elegant and cost both die area and power.
Yeah well, that posting also claims that ARM has nothing like micro-code, which is not true for modern ARM CPUs according to the Chips and Cheese article. But admittedly I'm not an ARM expert.

Everyone else I’ve read, including Keller, concurs: legacy cruft is something that holds back chip design and x86-64 has it in a way that ARM doesn’t.
This doesn't really jibe with what Keller said in that Anandtech interview.

Obviously compatibility with legacy software is why they don’t do it - that and too many different groups to coordinate the transition quickly - but that doesn’t change that significant die area and power are being expended to maintain that compatibility.
The question is how "significant" that really is in a modern CPU.
 
ARM v8, on the other hand, treats 32-bit Arm support as a separate and optional mode, and the 64-bit ISA is a very clean break with Arm's past. The only concession to compatibility is that AArch64 registers are a superset of AArch32 registers. The ISA which operates on those registers? No legacy at all, right down to the instruction set encoding. To an Arm v8 CPU in AArch64 mode, 32-bit Arm instructions are gibberish in an alien language.
There's more to an ISA than instruction encoding. Again I'm not claiming to be an ARM expert (never even wrote a single line of assembly code for one), but this is what Keller had to say about it:

"So when Arm first came out, it was a clean 32-bit computer. Compared to x86, it just looked way simpler and easier to build. Then they added a 16-bit mode and the IT (if then) instruction, which is awful. Then [they added] a weird floating-point vector extension set with overlays in a register file, and then 64-bit, which partly cleaned it up. There was some special stuff for security and booting, and so it has only got more complicated."

And:

"x86-64 was a fairly clean slate, but obviously it had to carry all the old baggage for this and that. They deprecated a lot of the old 16-bit modes. There's a whole bunch of gunk that disappeared, and sometimes if you're careful, you can say ‘I need to support this legacy, but it doesn't have to be performant, and I can isolate it from the rest’. You either emulate it or support it."
 
Yeah well, that posting also claims that ARM has nothing like micro-code, which is not true for modern ARM CPUs according to the Chips and Cheese article. But admittedly I'm not an ARM expert.

No the post says it has microcode, just nothing like what x86-64 microcode can be like. x86-64 microcode *can* be and often is as simple as Arm’s, but not in all cases and when it isn’t, it gets nasty. Arm’s never gets anywhere near as complicated. So saying Arm also has microcode as though they are equivalent is wrong.

This doesn't really jibe with what Keller said in that Anandtech interview.


The question is how "significant" that really is in a modern CPU.


There's more to an ISA than instruction encoding. Again I'm not claiming to be an ARM expert (never even wrote a single line of assembly code for one), but this is what Keller had to say about it:

"So when Arm first came out, it was a clean 32-bit computer. Compared to x86, it just looked way simpler and easier to build. Then they added a 16-bit mode and the IT (if then) instruction, which is awful. Then [they added] a weird floating-point vector extension set with overlays in a register file, and then 64-bit, which partly cleaned it up. There was some special stuff for security and booting, and so it has only got more complicated."

And:

"x86-64 was a fairly clean slate, but obviously it had to carry all the old baggage for this and that. They deprecated a lot of the old 16-bit modes. There's a whole bunch of gunk that disappeared, and sometimes if you're careful, you can say ‘I need to support this legacy, but it doesn't have to be performant, and I can isolate it from the rest’. You either emulate it or support it."

Yes, but this is in the context of starting from clean and working your way up vs starting from dirty and trying to clean your way down. Keller is stating that each side has done some of both, but that doesn't mean x86-64 doesn't carry extra baggage that ARM doesn't. Remember, I said that an x86-64 design has been calculated to be a 10% performance loss vs arm64 …



That’s not as much as the apparent advantage is today. Is 10% insurmountable? Well no, but it ain’t 0 either. Btw, this is what I meant when I say even Keller is in agreement. Everyone seems to agree on this number as the intrinsic ISA advantage of arm64.

But neither, and especially not x86-64 cores, are clean-sheet designs with no legacy baggage. Yes, Apple hired and bought very well, and those designers and engineers implemented their CPU uarch well with an ISA that dovetailed very well with that design, but the engineers at AMD and Intel are hardly incompetent. They're doing great work. Zen 3 in particular was remarkable. I'd also argue that Gracemont is a potentially really cool design. But eventually you have to ask why ARM and especially Apple are so far ahead in PPW and PPA, beyond what manufacturing explains. The rest comes down to design priorities and philosophies and the ISA. Low-power-priority designs vs desktop-power-priority designs probably make up the biggest part of it. But part of that ISA issue, and maybe on top of that 10%, is the legacy support designed into silicon, using extra silicon area and power. How much is indeed difficult to quantify, and the estimate will range from small to large depending on who you talk to, but it ain't ever zero. Now maybe Intel and AMD can clean that up further to the point where it is almost 0 without dropping legacy ISA support entirely (well, 0 on top of the 10%).

This is why I say that chipsandcheese article needs additional qualifications. It's not outright wrong. It's largely correct, but the nuances matter.

Also, in these quotes Keller is underplaying for rhetorical effect how much 64-bit ARM was cleaned up (and overplaying how cleaned up x86-64 was). 64-bit ARM was a complete do-over and rework in a way that x86-64 was not. Maybe he thinks that the 10% arm64 advantage is all that it enjoys right now and no further gains for x86-64 can be drawn from dropping legacy modes. Certainly @mr_roboto stated that there is a limit to what can be dropped from x86-64, which stands in contrast to @cmaier's assertion that x86-64 core designs could be much more efficient than they are without the legacy cruft. The latter would suggest to me the gap is 10%+ right now, but x86-64 could be brought closer to that optimum efficiency. I've seen this suggested elsewhere as well.

Think of it this way: what are the factors that influence why one chip performs better for less power and silicon area than another?

1) Manufacturing
2) ISA (including legacy)
3) uarch design
4) SOC/uncore design
5) software optimization

Right now, in software that is optimized equally, Apple enjoys a massive perf/W advantage over Intel and a smaller but still significant one over AMD. The issue at hand is how important #2 is. According to estimates, it's about 10%, but maybe more depending on how you include legacy design and its impact on #3.
 
  • Like
Reactions: cmaier
Yeah well, that posting also claims that ARM has nothing like micro-code, which is not true for modern ARM CPUs according to the Chips and Cheese article. But admittedly I'm not an ARM expert.

That mischaracterizes what I wrote. What I actually said was that Arm does not require a microcode ROM or a microcode sequencer. At most you have fused instructions where you can immediately decode into two microOps, but most instructions require no microcode at all (not in the sense that computer designers mean. Obviously the internal representation of an instruction is different than the ISA representation, but there’s a 1:1 mapping in almost all cases in Arm). And not having a sequencer is a huge difference.
 
Could you explain further? I was under the impression that, while it's true that x86-64 is a prefix-byte-extended version of x86, you could clean up a lot of the x86-64 core design by not supporting x86 itself.
[...]
It’s not quite this simple as standard Wine does not support running 32bit Windows Apps on macOS. Much of the work on wine32on64 that was done for Catalina still seems to be necessary (I believe for 32 to 64bit pointer thunking). That work has not been upstreamed into standard Wine because it involves a compiler hack. However I’ll admit that I don’t know if other parts of it are simplified by the presence of Rosetta 2.
x86-64 still includes IA32 opcodes. They don't behave exactly like standard IA32 opcodes any more: the default size of addresses gets promoted to 64 bits, while the default size of data remains 32 bits. x86-64 also introduces operand size prefix bytes which may be used to override either default.
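
For concreteness, here's roughly what that looks like at the byte level (encodings per the x86 manuals; REX.W is the prefix x86-64 added, while the 66h operand-size prefix predates it). The same base opcode bytes mean different operand widths depending on prefixes, which is part of what the decoders have to untangle:

```cpp
// Byte encodings of the same ADD operation at three operand widths.
// The trailing two bytes (opcode 01, ModRM D8) are identical in all three.
#include <cstdint>

const std::uint8_t add_eax_ebx[] = {0x01, 0xD8};        // add eax, ebx (default 32-bit data)
const std::uint8_t add_ax_bx[]   = {0x66, 0x01, 0xD8};  // 66h operand-size prefix -> 16-bit
const std::uint8_t add_rax_rbx[] = {0x48, 0x01, 0xD8};  // REX.W prefix -> 64-bit
```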

Because x86-64 is so closely built on top of IA32, you can't get away from its legacy, especially in the place where you most want to - the decoders. I suspect a x86-64-only core wouldn't be much less complex than a x86-64 + IA32 core. As far as I know, when @cmaier has talked about how it'd be nice to see modern x86-64 drop some legacy, he's been discussing the latter option, not the former - feel free to correct me if I'm wrong, Cliff!

Crossover's support for 32-bit Windows apps is based on the recognition that for the most part, 32-bit IA32 code can run fine on an x86-64 in 64-bit mode as long as there's enough thunking supporting it. A 'thunk' is an ABI translation layer between the system (or a library) and userspace code. This kind of thing would have to be very careful about what it lets the IA32 code see, and where it's allowed to live - code compiled for IA32 is incredibly likely to break on x86-64 if it's ever allowed to see an address outside the low 4GB of the 64-bit address space, since it has no idea how to preserve the upper 32 bits of any pointer.
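
For what it's worth, a very rough sketch of the kind of pointer thunking that implies (purely illustrative, not CrossOver's or Wine's actual code; the names are made up): the 64-bit side has to hand the 32-bit code only memory that lives below 4 GB, and widen/narrow pointers at the boundary.

```cpp
// Illustrative only -- not CrossOver/wine32on64 code. Shows why a thunk layer
// must keep anything visible to 32-bit code below the 4 GB boundary and
// zero-extend pointers when crossing from the 32-bit side to the 64-bit side.
#include <cassert>
#include <cstdint>

// What the 32-bit guest hands us: a 32-bit "pointer" (really just an offset
// into the low 4 GB of the address space).
using guest_ptr32 = std::uint32_t;

// Widening thunk: 64-bit host code recovers a real pointer from the guest value.
template <typename T>
T* from_guest(guest_ptr32 p) {
    return reinterpret_cast<T*>(static_cast<std::uintptr_t>(p));  // zero-extend
}

// Narrowing thunk: a host pointer may only be exposed to the guest if it
// actually fits in 32 bits, i.e. the allocation was made below 4 GB.
template <typename T>
guest_ptr32 to_guest(T* p) {
    std::uintptr_t addr = reinterpret_cast<std::uintptr_t>(p);
    assert(addr <= UINT32_MAX && "object visible to 32-bit code must live in the low 4 GB");
    return static_cast<guest_ptr32>(addr);
}
```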

As far as I know, that's what Crossover has done. It's hard to find definitive statements, but things like this blog post make it clear they're currently reliant on Rosetta 2 and will have to come up with an alternative if Apple ever drops it.

 
  • Like
Reactions: crazy dave
x86-64 still includes IA32 opcodes. They don't behave exactly like standard IA32 opcodes any more: the default size of addresses gets promoted to 64 bits, while the default size of data remains 32 bits. x86-64 also introduces operand size prefix bytes which may be used to override either default.

Because x86-64 is so closely built on top of IA32, you can't get away from its legacy, especially in the place where you most want to - the decoders. I suspect a x86-64-only core wouldn't be much less complex than a x86-64 + IA32 core. As far as I know, when @cmaier has talked about how it'd be nice to see modern x86-64 drop some legacy, he's been discussing the latter option, not the former - feel free to correct me if I'm wrong, Cliff!

Crossover's support for 32-bit Windows apps is based on the recognition that for the most part, 32-bit IA32 code can run fine on an x86-64 in 64-bit mode as long as there's enough thunking supporting it. A 'thunk' is an ABI translation layer between the system (or a library) and userspace code. This kind of thing would have to be very careful about what it lets the IA32 code see, and where it's allowed to live - code compiled for IA32 is incredibly likely to break on x86-64 if it's ever allowed to see an address outside the low 4GB of the 64-bit address space, since it has no idea how to preserve the upper 32 bits of any pointer.

As far as I know, that's what Crossover has done. It's hard to find definitive statements, but things like this blog post make it clear they're currently reliant on Rosetta 2 and will have to come up with an alternative if Apple ever drops it.


I‘d have to go through the instruction set instruction-by-instruction and try and figure out what could go and which “32-bit” instructions would need to stay, but I’m pretty sure you could simplify the decoder substantially by getting rid of most of the variable-length stuff. And most of the 64-bit stuff doesn’t require complex sequences of microcode, so that also simplifies things. This Frankenprocessor might not run existing software without modification, but you’d have to change things up a little bit just to get an OS to boot since existing x86-64 starts up in 16-bit mode anyway.
 
  • Like
Reactions: crazy dave