I will give a few pointers below.
That's just wrong - of course it depends on what you mean by "scanning". In a general computation model you compute your output from the input - there is no notion of scanning. First of all, a parallel machine model or a uniform circuit has essentially parallel access to the input - there is no notion of going sequentially through it. Of course the algorithm (or function) itself might have dependencies, which can force you to sequentialize some computations - and that is precisely what we are talking about here.
Coming back to what you were trying to say above, there are some theorems about lower limits once we remove the notion of scanning. One of them states that a function whose output depends on all n inputs cannot be in NC^0 and needs at least logarithmic depth, i.e. NC^1 (informally, NC^x means O(log^x n) depth on a circuit with bounded fan-in using standard gates) - see the small sketch below. As reading material I suggest looking at the classes NC, AC and TC - but I guess Nick's Class (NC) is the most relevant one for this discussion.
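To make the depth argument concrete, here is a minimal sketch of my own (not from the discussion), assuming the usual bounded fan-in of two: a gate tree of depth d can only look at 2^d inputs, so any function that really depends on all n inputs needs depth of at least log2(n) - which is exactly where NC^0 ends and NC^1 begins.

import math

def and_tree_depth(n: int) -> int:
    # Depth of a balanced binary tree of 2-input AND gates over n inputs.
    # With fan-in 2, depth d reaches at most 2**d inputs, so ceil(log2 n) is forced.
    return math.ceil(math.log2(n)) if n > 1 else 0

for n in (2, 8, 64, 1024):
    print(n, "inputs ->", and_tree_depth(n), "gate levels")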
You should convince yourself that when talking about NC, we are already talking about the best-case parallel implementation - indeed, I am not subscribing to such mundane arguments.
As you correctly noted, the O(n) nature of finding the instruction boundaries is something you cannot escape with any clever implementation - and I am sure the current 4-wide decode in x86 is already extremely clever; it just gets increasingly harder to "hide" the O(n) nature of the problem as you go wider. This is opposed to ARM (or any other fixed-length ISA), where the problem is O(1) by nature. A tiny sketch below makes the difference explicit.
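The sketch is purely illustrative - instr_length() below is a made-up stand-in for real x86 length decoding, not actual x86 semantics. The point is the dependency structure: in a variable-length encoding the start offset of instruction k is a running sum over all previous instruction lengths (a sequential chain of length n), while in a fixed-length encoding every decoder slot can compute its offset as 4*k independently of the others.

def instr_length(first_byte: int) -> int:
    # Made-up length rule standing in for real x86 length decoding (1..15 bytes).
    return 1 + (first_byte % 15)

def variable_length_starts(code: bytes, count: int) -> list[int]:
    # Variable-length ISA: each start offset depends on the previous one -> O(n) chain.
    starts, offset = [], 0
    for _ in range(count):
        starts.append(offset)
        offset += instr_length(code[offset])
    return starts

def fixed_length_starts(count: int) -> list[int]:
    # Fixed 4-byte ISA: every start offset is independent -> trivially parallel.
    return [4 * k for k in range(count)]

variable_length_starts cannot produce offset k before offset k-1 is known - that chain is exactly the part a wide x86 decoder has to speculate or brute-force around.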
Right, ILP is limited, but on the other hand it is very application-dependent. We should separate concerns here, though: even if you predict branches 100% correctly, you still hit the same ILP limit, because that limit comes from the true data dependencies of the program.
Beyond the theoretical ILP limit, you also have limits on how the ISA allows you to express the parallelism. All the parallelism a backend based on dynamic scheduling (aka out-of-order execution) is able to extract is based on observing register dependencies. Therefore register renaming plays a huge role, as it removes anti-dependencies (WAR/WAW) between instructions. However, the size of your architectural register file also limits how parallelism can be expressed. If the compiler ever gets into a situation where it needs to emit spill code because it ran out of architectural registers, register renaming does not help you anymore - the potential parallelism is gone. The conclusion here would be that with ARM you can potentially extract more ILP due to the larger GP register file compared to x64 (31 general-purpose registers vs 16). And while you can never exceed the inherent ILP limit, there still is an ISA dependency here - namely, architectures with larger GP register files do have an advantage. A small sketch of the renaming effect follows below.
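Here is a toy model of my own (nothing from the post): it measures the critical path of a short instruction trace once with all register dependencies counted (no renaming) and once with only true RAW dependencies (ideal renaming); the ILP limit is then simply instruction count divided by critical path length.

# Each instruction: (destination register, [source registers]).
trace = [
    ("r1", ["r0"]),   # r1 = f(r0)
    ("r2", ["r1"]),   # r2 = f(r1)   true dependency on the previous instruction
    ("r1", ["r3"]),   # r1 = f(r3)   WAR on instr 1 / WAW on instr 0 (reuses r1)
    ("r4", ["r1"]),   # r4 = f(r1)   true dependency on instr 2
]

def critical_path(trace, with_renaming: bool) -> int:
    # Deliberately crude: any earlier write to a source register counts as RAW.
    depth = [0] * len(trace)
    for i, (dst, srcs) in enumerate(trace):
        for j in range(i):
            prev_dst, prev_srcs = trace[j]
            raw = prev_dst in srcs          # true (read-after-write) dependency
            war = dst in prev_srcs          # anti-dependency
            waw = dst == prev_dst           # output dependency
            if raw or (not with_renaming and (war or waw)):
                depth[i] = max(depth[i], depth[j] + 1)
    return max(depth) + 1

for renaming in (False, True):
    cp = critical_path(trace, renaming)
    print("renaming" if renaming else "no renaming",
          "-> critical path", cp, " ILP limit", len(trace) / cp)

In this toy trace the reuse of r1 serializes everything without renaming (critical path 4, ILP 1.0), while with ideal renaming only the true dependencies remain (critical path 2, ILP 2.0) - and the spill-code argument above is about the cases where the renamer can no longer give you that back.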
That having been said, I truly believe that if ARM decided to make their cores the same size as Firestorm, they would achieve very similar IPC - contrary to the x64 competition. In some cases ARM is very explicit about this; they say: had we increased feature xyz by this amount, we would have gained uvw amount of performance - "so we are not doing this". The X1 is the first ARM core where ARM is somewhat deviating from these considerations.