
jdb8167

macrumors 601
Nov 17, 2008
4,859
4,599
First, it very likely isn't a primary 'feature'. Microsoft isn't going to tweak their compatibility solution just for one vendor's CPU implementation. If Arm specified it as an optional feature then maybe. But as a proprietary fork from the Arm standard it doesn't do much to create a competitive Arm implementation environment for Windows.

It should be a pretty easy tweak. If Microsoft detects a TSO mode, set the memory compatibility setting as low as it can go and turn it on.

The Arm Neoverse implementations that Amazon, Ampere Computing, etc. have released have zero TSO and yet have taken substantive share away from x86 servers. Amazon is closing in on 20+% of deployed servers with no TSO at all. The hype that they have to have this feature to win is mostly just Apple 'fan boy' talk. There is already concrete evidence that it isn't necessary. It is a 'nice to have', but it isn't necessary in the general market.

Server products have little reason to do x86 binary translations. This Qualcomm SoC is meant for consumer Windows PCs. Microsoft OEMs have a hard time selling them because x86 compatibility isn't that great.

Apple's policy of 'all or nothing' coupled with 'burn the bridge after marching the army over the river' makes it a more necessary feature in the macOS space. That is largely driven by Apple killing x86 off as an option for macOS in the intermediate future. Arm is not going to kill off x86 in Windows (or Linux). The hard-core 'legacy x86 code at maximum speed' folks in the Windows space are just going to buy x86 Windows systems. Apple just doesn't even give that choice as an option. (The Mac space is about having at least an order of magnitude fewer system choice options.)

I find it hard to believe that Microsoft doesn't consider the highest x86 compatibility a top priority. This is Microsoft, who always values backwards compatibility more than Apple.

Second, the 'from scratch' is likely not entirely true. Part of the Arm lawsuit against Nuvia/Qualcomm is about the assistance, data, and IP that Arm provided to Nuvia to jumpstart their efforts. That additionally grounds the reality here that these cores started off in a different target market. Being 'repurposed' into an alternative target market leaves even less time/effort/resources available for non-critical features. If they end up slower than x86 options on general code, then TSO isn't going to 'buy' Qualcomm/Nuvia anything. And deploying something slower (or less competitive) than Neoverse V3 would be even worse. They need to get something out the door with as few distractions as possible.

Third, since they were going to leave with a decent-size block of Apple engineers, it was about 100% certain that Nuvia was going to get sued. If they tried to copy a feature that only Apple implemented, then Apple would be digging around in their company for far, far, far longer than they budgeted for (Apple piling on lawyers means you also need lawyers). Apple's TSO came out of lots of data they collected on x86 processors. If Nuvia takes the data then they are shooting themselves in the foot. If they spend gobs of time recreating the data... they are burning money that they have relatively little of to spare.

This is a good point. Nuvia was building a server chip where I already acknowledged that x86 compatibility isn't particularly important. I guess I don't know enough about the process that Nuvia and then Qualcomm went through to get a final product. If the lawsuit continues then we may find out in the future.

In the Linux market (i.e., high overlap with servers) lots of the applications are 'open source', which means it isn't all that hard to just recompile them and deploy on a new platform. (E.g., that is why Amazon and other hyperscalers deployed relatively rapidly once the foundational Linux work was done in the server space: drivers, kernel, boot, common utilities, etc.) The Android market is a minor variation on Linux. Same issue, different day.

Linux is really not something that is relevant to this. The Qualcomm SoC is mostly targeted at Windows.

Windows on Arm's primary problem for the last 3-5 years has been native Arm performance, not 'emulation'.

A substantial amount of information about where fencing is/is not necessary is in the source code. Leverage that and you don't need to rely on rewriting binaries where most of that information has been thrown away. More performance will pull more software developers into compiling to Arm themselves instead of having a re-compile done behind their back. (Emulation has the wrong connotation here. These are both running Arm binaries. It only varies in when and how much effort is put into the recompile time/resources.)


And Apple is likely also 'spraying' memory fences ... just behind the instruction decoder. That is why Rosetta cannot mix-and-match code.

Again, the issue is x86 compatibility not porting to ARM. (And yes emulation is the wrong word but I was using it as a shortcut instead of saying binary translation.) I doubt Microsoft is going to be any more successful at getting ARM ports than they have been in the past. I guess we'll see.

As for Apple spraying memory fences, I doubt it. Most acknowledge that Rosetta 2 is faster and more reliable than Microsoft's solution. Apple added TSO specifically so that no changes to the memory model of the x86 binaries were necessary.
 

Xiao_Xi

macrumors 68000
Oct 27, 2021
1,627
1,101
It should be a pretty easy tweak. If Microsoft detects a TSO mode, set the memory compatibility setting as low as it can go and turn it on.
RISC-V can switch from TSO mode to WMO mode and vice versa by changing one bit in some registers.
The dynamic-RVTSO behaviour is controlled by bit 8 (DTSO) of menvcfg, senvcfg, and henvcfg for all lower privilege levels.

What is the difference between running a multithreaded application in WMO mode and in TSO mode? Is instruction reordering in TSO mode more restricted than in WMO mode? What exactly has Apple done to improve the translation of x64 applications?

By the way, there is a paper (behind a paywall) that says that M1 Ultra on Asahi Linux using TSO is on average 9% slower than using WMO in the SPECspeed 2017 Floating Point suite.

[Attached chart: SPECspeed 2017 Floating Point scores, TSO vs. WMO]

Faster execution results in a higher score

 
Last edited:

leman

macrumors Core
Oct 14, 2008
19,521
19,674
What is the difference between running a multithreaded application in WMO mode and in TSO mode? Is instruction reordering in TSO mode more restricted than in WMO mode? What exactly has Apple done to improve the translation of x64 applications?

It’s not about restricting reordering, but about how different instructions can observe state changes. For example, some processors define that if you saw the result of instruction i, you will also see the results of all previously executed instructions. Other processors don’t make any guarantees, or make different kinds of guarantees.

The point is that x86 and ARM have different behavior. Code written assuming x86 behavior might perform incorrectly if executing under the ARM model. Apple simply implements x86 model in hardware.
 
  • Like
Reactions: jdb8167 and Xiao_Xi

Sydde

macrumors 68030
Aug 17, 2009
2,563
7,061
IOKWARDI
Apple simply implements x86 model in hardware.

AIUI, Apple only implements TSO and some corner-case flag states, but not in-order execution. Most likely, Rosetta 2 is able to do more than simply translate the code; it also ensures that the effects of OoOE are accounted for. Which is not really a problem, because the completion queues are all in-order, so instructions do see the results of other instructions in a manner that is not practically inconsistent with x86. Include TSO and you would basically not be able to tell the difference.
 

MayaUser

macrumors 68040
Nov 22, 2021
3,177
7,196
I love how enthusiastic you all are about this. I can't wait for an actual device, so we can see whether these benchmarks match the real product and not see a repeat of past Arm Windows devices, where the scores from presentations and the web differed from real work on the actual device.
I really hope this is at least somewhat true, because then in mid-2024 we would finally have a capable Arm Windows device, which will push Windows even further toward this architecture.
 

MayaUser

macrumors 68040
Nov 22, 2021
3,177
7,196
The future of this world is a silent one... with electric cars, and devices that don't blast their fans like jets. Now electric cars just need to float, so there will be no tire noise anymore, and devices will be almost silent. Graveyard vibes :)
 
  • Love
Reactions: Cape Dave

Xiao_Xi

macrumors 68000
Oct 27, 2021
1,627
1,101
This blog post talks about Apple's secret sauce to accelerate x86 binary translation. It's a good read.
Apple's secret sauce must be somewhere else. I may be missing something, but setting those flags on the hardware seems easy.

The point is that x86 and ARM have different behavior. Code written assuming x86 behavior might perform incorrectly if executing under the ARM model. Apple simply implements x86 model in hardware.
The head of the RISC-V memory model explained the differences between SC, TSO and WMO some time ago.

He didn't have enough time to explain all the details, but fortunately, there is an open book on memory consistency.
 

leman

macrumors Core
Oct 14, 2008
19,521
19,674
Apple's secret sauce must be somewhere else. I may be missing something, but setting those flags on the hardware seems easy.

Extra flags and memory model emulation are pretty much the extent of the secret sauce. And yes, supporting these extra flags is not that hard. But you have to design your CPU around it. It does add to the cost.

The head of the RISC-V memory model explained the differences between SC, TSO and WMO some time ago.

Just a word of caution: these terms refer to fairly abstract ideas, and while it is important to learn them and talk about them, don’t forget that the only real authority is what the actual CPU does. The memory model of x86 CPUs is a specific thing, not just something you can completely describe with one of these labels. The main thing is that existing software relies on certain behavior, so newer hardware has to implement it. The x86 memory model was not formally specified initially; it kind of developed with whatever Intel shipped, and later it was very important to keep backwards compatibility. The actual details can be rather complex.
 

Xiao_Xi

macrumors 68000
Oct 27, 2021
1,627
1,101
They didn't advertise it during the event, so I assume they don't have it. It's a big feature, and I can't imagine them not mentioning it.
Qualcomm appears to be investing in x86 game porting.

There is a video of a laptop playing Baldur's Gate 3 at 1080p at around 30 FPS. Game settings and power settings are unknown.
 
Last edited:

MayaUser

macrumors 68040
Nov 22, 2021
3,177
7,196
If these compete with at least the M1 and M1 Pro (if not the Max/Ultra), I still think it's a nice addition to the competition. If not, then it will be another Surface Pro X fiasco.
 
  • Like
Reactions: Digitalguy

honcho

macrumors member
Apr 19, 2011
84
30
I’m hoping this Snapdragon chip really is a beast. With any luck, it will hold Apple’s feet to the fire. Apple Silicon hasn’t had any competition for the best part of four years. A competitive landscape will lead to innovation, and as someone looking to upgrade a 2019 16” MBP, hopefully Apple’s M4 will rise to the challenge.

I'm hoping that the widespread adoption of ARM will make an ARM version of Boot Camp a reality. Also, I'd really like to see more games ported to ARM so that one day I can dispense with a separate gaming PC. I can dream, I guess.
 
  • Like
Reactions: eltoslightfoot

MayaUser

macrumors 68040
Nov 22, 2021
3,177
7,196
Rise to the challenge? Snapdragon is already no challenge to Apple’s Pro/Max/Ultra chips, especially when it comes to GPU performance.
And that's an issue, from my point of view, because it means this will only challenge the ultrabook segment: the MacBook Air and all Windows laptops without a dGPU or with an entry-level dGPU, and maybe the Mac mini-style mini-tower PCs.
 
  • Like
Reactions: Homy

MRMSFC

macrumors 6502
Jul 6, 2023
371
381
Even if it doesn’t meet touted performance figures, I’m hoping it at least makes just enough of an impact that other manufacturers start producing competitors.

I don’t see x86 improving meaningfully in efficiency to compete with Apple Silicon, and I believe having Windows more architecture agnostic will open the door to other CPU manufacturers, hopefully increasing competition outside the Apple ecosystem.
 

MayaUser

macrumors 68040
Nov 22, 2021
3,177
7,196
Even if it doesn’t meet touted performance figures, I’m hoping it at least makes just enough of an impact that other manufacturers start producing competitors.

I don’t see x86 improving meaningfully in efficiency to compete with Apple Silicon, and I believe having Windows more architecture agnostic will open the door to other CPU manufacturers, hopefully increasing competition outside the Apple ecosystem.
That will also be good from the developers' point of view, encouraging more and more apps to be made not only for x86.
 

honcho

macrumors member
Apr 19, 2011
84
30
Rise to the challenge? Snapdragon is already no challenge to Apple’s Pro/Max/Ultra chips, especially when it comes to GPU performance.
I made no mention of Pro/Max/Ultra chips. All the same, competition at entry level will spur architectural innovation. This also benefits the higher end Apple chips. Apple has enjoyed years of clear blue water between itself and any rival. It is a powerful narrative that Apple will wish to maintain.
 

Homy

macrumors 68030
Jan 14, 2006
2,506
2,458
Sweden
I made no mention of Pro/Max/Ultra chips.

That was the problem and my point. Although competition is good and we hear the argument all the time Apple is not always driven by its competitors but by their own strategy and what they do best. It’s not as if Apple’s been chilling in their boat on those clear blue waters with no progress because they’re alone with no competition. Despite not having competitors to Apple Silicon they have developed and released new chips with new features. The competitors like Intel/AMD have chosen to release power hungry chips to stay ahead. Qualcomm has done the same to be able to claim victory over Apple. They compare their 12-core X Elite using 80W with M2/M3 with 8 cores using around 23W. A ”fair” comparison would be to use 12-core M3 Pro. So again no, Apple doesn’t need M4 to ”rise to the challenge”.

X-Elite-geekbench-6.jpg

(Source: Just Josh on YouTube)

X-Elite-cinebench-24.jpg

(Source: Just Josh on YouTube)

X-Elite-efficiency.jpg

(Source: Just Josh on YouTube)
 
Last edited:
  • Like
Reactions: MayaUser

MayaUser

macrumors 68040
Nov 22, 2021
3,177
7,196
Let's not forget that the M3 scores are real, from real testing... I'm very worried that the end-consumer device will be slightly slower... so an SD X at 23W will be around 34 compared to 68 for the M3, and soon the M4 will be over 70.
I still think in the Windows world these are 3 years behind.
 
  • Like
Reactions: Homy

deconstruct60

macrumors G5
Mar 10, 2009
12,493
4,053
.... They compare their 12-core X Elite using 80W with M2/M3 with 8 cores using around 23W. A ”fair” comparison would be to use 12-core M3 Pro using only 24W. So again no, Apple doesn’t need M4 to ”rise to the challenge”.

View attachment 2367598
(Source: Just Josh on YouTube)

Do you even read the graphs you are posting? Line one above is a 23W X Elite and the closest M3 variant to that is the "23W M3". So yes, they are comparing those two. There is actually a horizontal line separating the MBP 14"s from the 80W-mode X Elite. So they are trying to compare across the barrier line they put in the graph? Errr, probably not.



Apple built TWO different dies: the plain M3 and the M3 Pro. Qualcomm built one that sits in between those two. They are not trying to do exactly what Apple did. Nor should they (on the first iteration). Apple built the A__X variants for YEARS before essentially slapping an Mn name on them (and moving on to additional die sizes).

There are two major modes they can set the die/package to. One plays in roughly the same zone as the plain Mn SoC from Apple, and competes very handily with the actual real competitors from AMD/Intel under a 23W constraint. The other Windows/Linux laptop competitive threat is the "repurposed desktop SoC" in a laptop. Those chassis typically have thermal capacity that handily accepts 80W. Getting the SoC placed into those chassis won't be hard. And they absolutely do not have to compete for placement in Mac chassis at all.

The 80W X Elite is up-clocked into a zone that really starts to push past the fab process's efficiency sweet spot. Apple tends to just stop before the efficiency curve goes bad (Perf/Watt being the primary target metric) and simply makes a bigger die to walk up the multicore performance ladder. Qualcomm is trying to 'kill two birds with one stone'. Since their primary competitors push their chips into the diminishing-returns zone (especially Intel), it isn't going to 'hurt' Qualcomm much to do the same exact thing to gather some incremental additional system placements. There are at least one, if not two, orders of magnitude more possible laptop chassis to be placed into than what exists in the Mac lineup (as if that were the entire possible area of competition). The X Elite has to be selected and bought by system vendors for the lineup to survive to additional generations. Qualcomm needs some positive cash flow here before they go to multiple dies in the SoC lineup. (Buying Nuvia just gives them a $1B hole to at least partially crawl out of first.)

Is Qualcomm going to dominate the 90+W laptop market? Probably not, since they don't appear to do dGPUs, but there is a decent chance they can get a few placements there as an 'alternative' SoC configuration. Nvidia isn't going to get banned from Windows/Linux laptops any time soon. So it isn't the Mac chassis restrictions that Qualcomm really has to compete with over the long term. In the second, or third, iteration we will probably see Qualcomm 'adapt' to that issue (probably only on 'bigger' die variants; Qualcomm is probably scaling down toward phones/tablets too, so dGPU support is very likely not going to be ubiquitous in their lineup). However, Qualcomm probably is not going to feel the same pressure that Apple created by banning dGPUs, so we are unlikely to see a "Max-like" die area commitment for GPU subsystem resources from Qualcomm. Scaling down to phone-SoC-like allocations is likely going to have higher priority than going 'max area'.


There is nothing suggestive in those charts indicating that Qualcomm cannot keep this efficiency if it makes more appropriately sized dies to walk deeper into the Pro/Max CPU compute zone. There is no big showstopper present.
Qualcomm can continue to make the entry X Elite a bit bigger than a plain M3, because the Android tablet market (vs. iPad Pro) isn't going to pull the same volume. The bigger die could end up in between the Pro and Max targets also. It makes little sense to try to land exactly on top of the Apple SoCs when the Qualcomm SoCs are not going into the exact same set of chassis constraints.
 

MayaUser

macrumors 68040
Nov 22, 2021
3,177
7,196
@deconstruct60 do you think they can keep the performance and efficiency at M3 and M3 Pro levels?
Since all of these will be in mobile devices, it means a lot to customers to have a laptop that can keep up with demanding tasks for 2 hours on the go.
Bottom line: do you think they will compete with the M1 MacBook Air, or even the M3 MacBook Air, and also with Intel i7 laptops with an entry-level dGPU? Asking whether the top of the line can compete with the M3 Max in Maya projects that also need a lot of GPU power on the go is futile, right?
 

deconstruct60

macrumors G5
Mar 10, 2009
12,493
4,053
lets not forgot that M3 scores are real, from real testing...im very worried that the end consumer device will be slithy slower...so an SD X 23W will be around 34 compared to 68 M3 and soon M4 that will be over 70

Pretty good chance these are close to "real testing". Windows 11 version 24H1 had effectively gone "release ready" by the time they held these "hands on" labs at Qualcomm. Additionally, there have been complaints from system vendors that Qualcomm is requiring that they use Qualcomm's power management chips. That will lead to fewer vendor variations from the reference designs these tests were run on. The SoCs themselves are going to be the same.

Perhaps bad thermal management in a chassis could lead to throttling, but given that the more thermally challenging AMD/Intel SoCs are likely being placed into the same chassis (maybe with some largely cosmetic variation), that isn't likely either.

The X Elite is arriving relatively 'late' given the initial, overly optimistic estimates by Qualcomm. There has been lots of time to spend getting things prepped so that system vendors' laptops match the reference ones.


I still think in windows world these are 3 years behind

This "Nuvia core" is a server-focused core that was retargeted on the fly to laptop duty. They also got to skip N3B (reports indicate that they are using some TSMC N4 variant). This first iteration is likely to be far more of a "Frankenstein" project (mashing together Nuvia CPU cores with Qualcomm GPU/NPU-DSP/memory/etc.) than the second iteration will be.

We would need to see what the next iteration's improvements are before making any sort of real judgment about "how far behind" they are. Whether they are '3 years behind' or not depends upon how much pipelined, parallel next-gen development work they have managed to get done. Right now it is more like an iceberg where most of it is out of view.

Qualcomm has phone SoCs, VR SoCs, etc. to align as well. It is unclear if they are going to pursue almost everything at the same time or be more deliberate/prudent in expanding the lineup. If they dilute themselves too thinly, then they will likely fall closer to '3 years behind'. Haphazard management seems to be a bigger threat than some technological barrier. There are also lots more 'cats to herd', given that they can't just kill off dGPUs or Windows feature XYZ unilaterally.
 
  • Like
Reactions: MiniApple