i hope intel comes back to beat the m1

Gerdi · Apr 18, 2021

bobcomer said:
That's really low balling it compared to my experience with Windows VM's on Windows, more like 80-90%. And maybe higher on some things. As long as you allocate enough RAM and have enough cores. I really don't see any difference on x86 vs x64 on Windows.

I was not comparing x86 vs x64 on an x64 machine, but x86 emulation vs x64 emulation vs. native on an ARM64 Windows machine.
I did not comment on virtualization penalty, because i do not know - because i only can do measurements on a Surface Pro X.
My Mac is an x64 machine.

bobcomer · Apr 18, 2021

bobcomer said:
That's really low balling it compared to my experience with Windows VM's on Windows, more like 80-90%.

One thing I'd add, I wasn't thinking of video performance, that, of course would be a lot worse. Only the parent partition would get full advantage of the hardware. Though remoting into a VM is pretty much the same as remoting into the host.

For the best Hyper-V performance for non server OS's, turn off Hyper-V's virtual memory, it's targeted for server loads and would only slow things down for anything else. Allocate as much RAM to the VM as you would if it were a physical machine. Play with the core settings, as that doesn't always follow logically what's going to be best.

And the most important thing, don't starve the parent partition for cores or RAM, since that's where all the I/O happens with the real hardware.

bobcomer · Apr 18, 2021

Gerdi said:
I was not comparing x86 vs x64 on an x64 machine, but x86 emulation vs x64 emulation vs. native on an ARM64 Windows machine.
I did not comment on virtualization penalty, because i do not know - because i only can do measurements on a Surface Pro X.
My Mac is an x64 machine.

That's fair. I was comparing it to running Windows on Windows, so my expectations and experience really don't fit, but 60-75% wouldn't be good enough for me. (except for very occasional work, which is all I really expected on my own usage on my MBA for Windows stuff.)

The things that don't work at all and the fact that there's no licensing for WOA yet, that's more important and we really shouldn't have a performance discussion until that happens. And to see what the M1's next version might do.

Gerdi · Apr 18, 2021

bobcomer said:
but 60-75% wouldn't be good enough for me. (except for very occasional work, which is all I really expected on my own usage on my MBA for Windows stuff.)

This does not make much sense at all. An M1 even running at only 70% of native speed is faster then many Intel machines. If this is not enough for you i could conclude, that a few year ago there was not even a machine existing, which would fit your requirements. I could also conclude, that you should stay away from Rosetta 2 and wait for native versions of the programs you use - because less is not acceptable.
To be honest, i have a hard time even imagining a case, where 100% speed would be fine and 70% is a showstopper - and that the magic 100% is by chance precisely the native speed of the M1 - what a coincidence!!!

bobcomer · Apr 18, 2021

Gerdi said:
This does not make much sense at all. An M1 even running at only 70% of native speed is faster then many Intel machines. If this is not enough for you i could conclude, that a few year ago there was not even a machine existing, which would fit your requirements. I could also conclude, that you should stay away from Rosetta 2 and wait for native versions of the programs you use - because less is not acceptable.
To be honest, i have a hard time even imagining a case, where 100% speed would be fine and 70% is a showstopper - and that the magic 100% is by chance precisely the native speed of the M1 - what a coincidence!!!

I use VM's all the time in my work, and the bar for performance is moving all the time with each hardware upgrade. And I can tell you the performance of the M1 isn't faster than the machines I am using.

As for Rosetta, nah, I don't use MacOS for work, so my needs aren't the same for it. Yes, I want to be able to do everything I need to do on my MBA, but I'm more likely not to use it at all than to just ignore Rosetta and put up with WOA.

Maconplasma · Apr 18, 2021

Ratsaremyfreinds said:
i dont understand why anyone would not want intel to compete with apple. well if you had applle stock id understand. but other wise i want intel and amd to keeep apple on there toes

LMAO ! It's Intel and AMD that need to be kept on THEIR toes. Very strange post. You sound like Apple is the one that is slouching when they are the one that's got Intel crying like child.

thedocbwarren · Apr 18, 2021

Ratsaremyfreinds said:
i dont understand why anyone would not want intel to compete with apple. well if you had applle stock id understand. but other wise i want intel and amd to keeep apple on there toes

Not the same market and product. Apple doesn't sell Apple Silicon chips for use by third party.

cmaier · Apr 18, 2021

thedocbwarren said:
Not the same market and product. Apple doesn't sell Apple Silicon chips for use by third party.

I guess his point is that since Apple competes with Intel’s customers, it’s good for Intel (and AMD) to do well, because then Apple, in order to compete with those customers, will have to continue to excel in its own efforts at CPU design.

Of course, Apple has not traditionally been the sort of company that needs external pressure in order to drive its own roadmap forward, but I guess it couldn’t hurt.

quarkysg · Apr 18, 2021

cmaier said:
Of course, Apple has not traditionally been the sort of company that needs external pressure in order to drive its own roadmap forward, but I guess it couldn’t hurt.

Completely agree.

And if Intel/AMD or which ever company comes up with a killer CPU/architecture that allows Apple to deliver their next product vision that their own solution could not, I'm pretty sure Apple will plan for the transition. I suppose the more you do it, the easier it gets? Heh heh.

thedocbwarren · Apr 18, 2021

cmaier said:
I guess his point is that since Apple competes with Intel’s customers, it’s good for Intel (and AMD) to do well, because then Apple, in order to compete with those customers, will have to continue to excel in its own efforts at CPU design.

Of course, Apple has not traditionally been the sort of company that needs external pressure in order to drive its own roadmap forward, but I guess it couldn’t hurt.

Agreed.

leman · Apr 18, 2021

Gerdi said:
This is an assumption not a fact. And it is not even true either.
The short answer as to why this is the case is, that a user mode multi-threaded application typically does not have any particular requirements on memory ordering outside of synchronization.

It's a potentially upsetting bug waiting to happen. Sure, trivial programs won't rely on specific CPU behavior here, but the compiler makes specific assumptions and if the emulation environment does not honor these assumptions, you are looking at crashing applications or corrupted data where you least suspect.

"Close your eyes and hope it doesn't happen" is not a robust software development principle. I hope that Microsoft will make use of switchable memory ordering modes on Apple Silicon.

Gerdi said:
You also need to distinguish x64 and x86 emulation. Since x86 emulation being inherently harder than x64 emulation, the emulation performance of x64 hence is significantly faster.

Can you elaborate a bit on this? Why is x86 emulation faster (beyond the need of emulating system DLLs, which will definitely get a significant perf hit)? I am not doubting what you are saying, just want to understand it better.

cmaier · Apr 19, 2021

leman said:
It's a potentially upsetting bug waiting to happen. Sure, trivial programs won't rely on specific CPU behavior here, but the compiler makes specific assumptions and if the emulation environment does not honor these assumptions, you are looking at crashing applications or corrupted data where you least suspect.

"Close your eyes and hope it doesn't happen" is not a robust software development principle. I hope that Microsoft will make use of switchable memory ordering modes on Apple Silicon.

Can you elaborate a bit on this? Why is x86 emulation faster (beyond the need of emulating system DLLs, which will definitely get a significant perf hit)? I am not doubting what you are saying, just want to understand it better.

By design, we got rid of a lot of the weird corner cases when extending to 64-bit, and threw away a bunch of complicated instruction patterns. We also don’t have to deal with things like instructions that mix and match between 8, 16 and 32 bit operands, etc. And there is no modifying code pages in 64-bit (no self-modifying code) So it’s definitely true that, if we are talking emulation, limiting to 64 bit mode would be much easier. As for translation, while it would take longer to perform the initial translation when you are translating x86 (as opposed to x86-64), I’m not sure the generated code would run more slowly (putting aside situations with self-modifying code).

fwilers · Apr 19, 2021

crevalic said:
Wow, multiple massive inaccuracies in only 2 points, congrats.

1) Intel invested a massive amount of resources into the Atom development starting in 2004, years before smartphones ever appeared. They developed 1-3W cores, while standard low power laptop chips were in the 35W+ range and were initially very competitive on the market. However, internal struggles resulted in an effective abandonment of this segment. Firstly, the Atom was competing in a lower margin, low unit cost market with more competition than the x86 market (at Atom's release in 2008, Intel already regained leadership ahead of AMD). At that point, Intel sold everything they could make and it made sense to use fab capacity on the expensive parts where they were free to set the price. Secondly, since Intel's approach was so profitable (and, to be honest, still is), many people inside the company were against "rocking the boat" and risking their high margin, high-performance cash cows, especially when netbooks exploded in popularity.

2) This is just so wrong, like, you really need to know absolutely nothing about this field to be able to come up with this. In real life, Intel wanted to move on from x86 and invested absolutely monstrous resources, including the largest design team in history, to create a completely new architecture called IA-64 (also known as Intel Itanium). IA-64 was not compatible with x86/32-bit instructions, breaking compatibility with all older software, and would get rid of pesky x86 competition like AMD, if successful. Instead, AMD went the opposite way and extended the x86 instructions set with 64-bit support, enabling backwards compatibility. In the end, Intel was forced to effectively write off the investment in IA-64 and license x86-64 from AMD.

He wasn't wrong though. You just reiterated what he said with examples.
Just because Intel tried something, failed, and gave up, doesn't change what he wrote as true.

They could have had huge success with IA-64, Atom, and numerous other products they no longer sell. I think the latest dead tech is their Optane ssd's.
In the case of IA-64, you needed killer software that could perform. Not just perform, but also had support. Intel didn't provide any of that and just hoped people would learn.
Atom could have evolved into a killer product too. But it was too slow and consumed way too much power for it's instruction set. They also limited connections, memory, bus speed, etc way too much for what customers wanted.
Optane, fantastic technology. Way too expensive. People will wait 6 months to get something 1/2 as cheap with 95% of the performance.

The problem is, they want huge corporations to buy this stuff at ridiculous prices to compensate for r&d. That would be fine, except as a business buying Intel products, you need ROI. Intel couldn't provide that. And retail corporations building for customers knew better what their customers wanted than Intel.

So in the mean time, yes, they doubled down on x86 chips, because that's what they are known for. But 14+, then 14++, 14+++, 14++++, isn't exactly exciting or innovative. It's all they had.

Gerdi · Apr 19, 2021

leman said:
It's a potentially upsetting bug waiting to happen. Sure, trivial programs won't rely on specific CPU behavior here, but the compiler makes specific assumptions and if the emulation environment does not honor these assumptions, you are looking at crashing applications or corrupted data where you least suspect.

I was afraid, that i would have to give a longer answer

First, it is absolutely not about a triviality of the program. I mean programs of any complexity. And it is not a compiler issue either, as (ARM64 C/C++) compilers do not inject barriers in the code. The key observation here is, that if your program do not synchronize data access with other threads, you will corrupt you data as well - so memory ordering alone without synchronization would not get you very far. The other key observation is, that ordering is sequentially consistent for the same core even on ARM.
And when this is understood, it suddenly gets clear, why you can just compile C/C++ code for ARM64 without any additional barriers - it is the very same reason a x86-to-ARM64 JIT engine for user mode code would normally not have to add additional barriers. The required barriers are all implemented in the kernel - and since the kernel is ARM64 it is supposed to have the barriers. (We would have a totally different discussion, if we were talking kernel emulation)

It is possible to write programs which are misbehaving. I am talking about programs, which are trying to write their own synchronization routines without relying on synchronization primitives of the kernel. These programs would fail for both, recompilation for ARM64 as well as MS JIT.

That having said, i have yet to encounter a SW project, which is misbehaving in this regard. So i am sure, thats an extremely rare issue. If you ever encounter such a program, Windows allows you to change the emulation settings for this particular program.

Can you elaborate a bit on this? Why is x86 emulation faster (beyond the need of emulating system DLLs, which will definitely get a significant perf hit)? I am not doubting what you are saying, just want to understand it better.

I did say x64 emulation is faster. There are several reasons - x64 ISA is in many regards cleaner and has less corner cases you need to cope with compared to x86 ISA ... i assume @cmaier can elaborate more on this. And then as important, x64 code uses much more registers and relies much less on memory operands - as consequence you can much more trivially generate good ARM64 code out of x64 code.

theotherphil · Apr 19, 2021

UBS28 said:
AMD is the one that is going to beat M1, because they can easily switch to 5nm too once TSMC has enough capacity to produce those chips. And they have much greater software compatiblity.

I honestly would have wished that Apple would have gone to AMD, to ensure maximum software compatability

Apple’s priority is not performance - it is efficiency and compatibility with its iOS line. An extra 200 points on Geekbench will hardly be noticed by your average joe but 20hrs battery life most definitely will.

What the M1 has been showing is consistent performance plugged in or not. Pretty much every Ryzen/ Intel laptop has a significant performance drop when unplugged. This means the M1 actually ends up ahead of everything when it’s used in its mobile configuration (unplugged).

leman · Apr 20, 2021

Gerdi said:
I was afraid, that i would have to give a longer answer
First, it is absolutely not about a triviality of the program. I mean programs of any complexity. And it is not a compiler issue either, as (ARM64 C/C++) compilers do not inject barriers in the code. The key observation here is, that if your program do not synchronize data access with other threads, you will corrupt you data as well - so memory ordering alone without synchronization would not get you very far. The other key observation is, that ordering is sequentially consistent for the same core even on ARM.
And when this is understood, it suddenly gets clear, why you can just compile C/C++ code for ARM64 without any additional barriers - it is the very same reason a x86-to-ARM64 JIT engine for user mode code would normally not have to add additional barriers. The required barriers are all implemented in the kernel - and since the kernel is ARM64 it is supposed to have the barriers. (We would have a totally different discussion, if we were talking kernel emulation)

I was thinking about multi-threaded programs that rely on lock-free data structures. Translating such code directly from x86 to ARM will almost inevitably introduce race condition bugs. The end result is the same as misusing atomic operations in you code (e.g. specific case for C++ that will be commonly used to write such code: semantics, implementation for various CPUs). I mean, you don't really need to go that far for an example — if you have concurrent programs that rely on reference counting to release resources (a fairly common technique), naive x86-to-ARM translation will lose the acquire-release semantics necessary for correct execution.

But you are right that most applications probably won't be affected by this and even those that are will probably work fine most of the time. It's a pragmatic solution, albeit technically incorrect one. But then again, MS kind of has a history of cutting corners when it comes to correctness.

Gerdi said:
I did say x64 emulation is faster. There are several reasons - x64 ISA is in many regards cleaner and has less corner cases you need to cope with compared to x86 ISA ... i assume @cmaier can elaborate more on this. And then as important, x64 code uses much more registers and relies much less on memory operands - as consequence you can much more trivially generate good ARM64 code out of x64 code.

This and what @cmaier wrote makes a lot of sense to me.

UBS28 · Apr 20, 2021

theotherphil said:
Apple’s priority is not performance - it is efficiency and compatibility with its iOS line. An extra 200 points on Geekbench will hardly be noticed by your average joe but 20hrs battery life most definitely will.

What the M1 has been showing is consistent performance plugged in or not. Pretty much every Ryzen/ Intel laptop has a significant performance drop when unplugged. This means the M1 actually ends up ahead of everything when it’s used in its mobile configuration (unplugged).

My priority is having software that works. With AMD, we still would allow us to use bootcamp and 32-bit OSX.

If I need 20 hour of battery life, it means it is very light tasks and I can do that on my iPhone 11 Pro Max on the go.

The thing that worries me about the future of Mac is, the software that is available when Rosetta 2 will be removed from OS X (it is going to happen in a few years).

leman · Apr 20, 2021

UBS28 said:
My priority is having software that works. With AMD, we still would allow us to use bootcamp and 32-bit OSX.

Wait, is your priority software that works or backwards compatibility? These are not the same thing. My M1 is perfectly capable of running any software I want and almost everything I use is already native. If you care about running "old" programs, then I am surprised you are even using a Mac — Apple's record was never particularly stellar in this regard.

BigPotatoLobbyist · Apr 20, 2021

Bruninho said:
I would never go to games through streaming. I like to own the games I buy and play. I do not like subscription based games and applications. I've stopped buying them around 2014.

The only subscription I pay for is Apple Music. But for a good reason - I can quickly have a carefully curated music archive across all my devices synced, and the songs can still be played when I am offline. They day they remove this feature, it will be the day I will cancel my subscription and go back to a carefully curated archive made from the good old MP3 music collection. The headache of having to sync manually across all my devices is worth the money I can save cancelling the subscription if this happens one day.

Streaming is cancer. If songs were 50 cents I’d probably just buy the 700 missing outright ones I want and be done, ****ing loathe the digital serfdom economy

BigPotatoLobbyist · Apr 20, 2021

UBS28 said:
My priority is having software that works. With AMD, we still would allow us to use bootcamp and 32-bit OSX.

If I need 20 hour of battery life, it means it is very light tasks and I can do that on my iPhone 11 Pro Max on the go.

The thing that worries me about the future of Mac is, the software that is available when Rosetta 2 will be removed from OS X (it is going to happen in a few years).

I’m more worried about Apple not maintaining the damn quality of mac os. Since 2013, I recall Windows being faster on similar hardware, even after fresh installs on both or programs closed. IMO people underestimate the benefit of having a wide use base - if anyone’s “optimizing” (nebulous term but I digress) it’ll be companies like MS with Windas or Google and Chrome OS, due to shittops. Not Apple. Mac OS = a pig

Kung gu · Apr 20, 2021

BigPotatoLobbyist said:
I’m more worried about Apple not maintaining the damn quality of mac os. Since 2013, I recall Windows being faster on similar hardware, even after fresh installs on both or programs closed. IMO people underestimate the benefit of having a wide use base - if anyone’s “optimizing” (nebulous term but I digress) it’ll be companies like MS with Windas or Google and Chrome OS, due to shittops. Not Apple. Mac OS = a pig

On M1 Macs, Mac OS feels like a totally different OS. It's smooth and fast and snappy, which is completely different to what I have experienced on my 16" intel MacBook Pro. On my 16" Mac OS is slow, animations lag and overall its not as smooth as M1 Macs. I went my local retailer to check the M1 out and to see if they are better and oh boy night and day difference between the M1 and my 16" MBP.

UBS28 · Apr 20, 2021

leman said:
Wait, is your priority software that works or backwards compatibility? These are not the same thing. My M1 is perfectly capable of running any software I want and almost everything I use is already native. If you care about running "old" programs, then I am surprised you are even using a Mac — Apple's record was never particularly stellar in this regard.

“old programs”?

My access virus TI is a professional tool that has been used in countless top 40 hits and it is still one of the best sythesizers money can buy.

I basically now have to use Windows for it to work. Apple should remove “Pro” from their computers as they have been killing professional equipment support.

I will still get a M2X 16” MBP, but it will not be my main computer.

leman · Apr 20, 2021

UBS28 said:
“old programs”?

My access virus TI is a professional tool that has been used in countless top 40 hits and it is still one of the best sythesizers money can buy.

I basically now have to use Windows for it to work. Apple should remove “Pro” from their computers as they have been killing professional equipment support.

I will still get a M2X 16” MBP, but it will not be my main computer.

That is a very strange perspective. Why don't you complain to the developer of this tool that they are not keeping up with the advances in computing and not updating their software to work with the new technology?

Again, if you are relying on legacy software that makes certain hardware assumptions, then you have to use appropriate hardware. I don't think that a hardware manufacturer can be made responsible for policies of a individual software maintainer.

Talking about pro software, I do believe that I classify as a "pro" user and M1 Macs offer tremendous value for money in areas I care about. When comes to data analysis or compiling software, they are extremely performant and responsive.

Gerdi · Apr 20, 2021

leman said:
I was thinking about multi-threaded programs that rely on lock-free data structures. Translating such code directly from x86 to ARM will almost inevitably introduce race condition bugs. The end result is the same as misusing atomic operations in you code (e.g. specific case for C++ that will be commonly used to write such code: semantics, implementation for various CPUs). I mean, you don't really need to go that far for an example — if you have concurrent programs that rely on reference counting to release resources (a fairly common technique), naive x86-to-ARM translation will lose the acquire-release semantics necessary for correct execution.

Not really. lock-free data structure implementations including your reference counter example using atomic operations, which are translated with required acquire-release semantics by the Microsoft emulator. I am going to be more concrete.

A reference counter is incremented by following atomic (C++ 14)

std::atomic<int> ref_cnt = { 0 };
ref_cnt.fetch_add(1);

This is translated into:

(x64)
lock xadd DWORD PTR ?ref_cnt@@3U?$atomic@H@std@@A, eax ; ref_cnt

(arm64)
|label|
ldaxr w9,[x11]
add w9,w9,#1
stlxr w8,w9,[x11]
cbnz w8,|label|
dmb ish

Indeed the Microsoft emulator translating this correctly with necessary acquire and release semantics.

But you are right that most applications probably won't be affected by this and even those that are will probably work fine most of the time. It's a pragmatic solution, albeit technically incorrect one. But then again, MS kind of has a history of cutting corners when it comes to correctness.

As i said, they did not cut corners at all - otherwise they would not even have bothered implementing different translation strategies, when it comes to memory ordering.

pasamio · Apr 20, 2021

leman said:
That is a very strange perspective. Why don't you complain to the developer of this tool that they are not keeping up with the advances in computing and not updating their software to work with the new technology?

Too many developers take the Microsoft approach.

i hope intel comes back to beat the m1

macrumors 6502

macrumors 601

macrumors 601

macrumors 6502

macrumors 601

Cancelled

macrumors 6502

Suspended

macrumors 65816

macrumors 6502

macrumors Core

Suspended

macrumors member

macrumors 6502

macrumors 6502a

macrumors Core

macrumors 68030

macrumors Core

macrumors 6502

macrumors 6502

Suspended

macrumors 68030

macrumors Core

macrumors 6502

macrumors 6502

Our Staff