
jdb8167

macrumors 601
Nov 17, 2008
4,859
4,599
First, it very likely isn't a primary 'feature'. Microsoft isn't going to tweak their compatibility solution just for one vendor's CPU implementation. If Arm specified it as an optional feature then maybe. But as a proprietary fork from the Arm standard it doesn't do much to create a competitive Arm implementation environment for Windows.

It should be a pretty easy tweak. If Microsoft detects a TSO mode, set the memory compatibility setting as low as it can go and turn it on.

The Arm Neoverse implementations that Amazon, Ampere Computing, etc. have released have zero TSO and yet have taken substantive share away from x86 servers. Amazon is closing in on 20+% of deployed servers with no TSO at all. The hype that they have to have this feature to win is mostly just Apple 'fan boy' talk. There is already concrete evidence that it isn't necessary. It is a 'nice to have', but it isn't necessary in the general market.

Server products have little reason to do x86 binary translations. This Qualcomm SoC is meant for consumer Windows PCs. Microsoft OEMs have a hard time selling them because x86 compatibility isn't that great.

Apple's policy of 'all or nothing' coupled with 'burn the bridge after marching the army over the river' makes it a more necessary feature in the macOS space. That is largely driven by Apple killing x86 off as an option for macOS in the intermediate future. Arm is not going to kill off x86 in Windows (or Linux). The hard-core 'legacy x86 code at maximum speed' folks in the Windows space are just going to buy x86 Windows systems. Apple just doesn't even give that choice as an option. (The Mac space is about having at least an order of magnitude fewer system choice options.)

I find it hard to believe that Microsoft doesn't consider the highest x86 compatibility a top priority. This is Microsoft, who always values backwards compatibility more than Apple.

Second, the 'from scratch' is likely not entirely true. Part of the Arm lawsuit against Nuvia/Qualcomm is about the assistance, data, and IP that Arm provided to Nuvia to jumpstart their efforts. That additionally grounds the reality here that these cores started off in a different target market. Being 'repurposed' into an alternative target market leaves even less time/effort/resources available for non-critical features. If they end up slower than x86 options on general code, then TSO isn't going to 'buy' Qualcomm/Nuvia anything. And deploying something slower (or less competitive) than Neoverse V3 would be even worse. They need to get something out the door with as few distractions as possible.

Third, since they were going to leave with a decent-size block of Apple engineers, it was about 100% certain that Nuvia was going to get sued. If they tried to copy a feature that only Apple implemented, then Apple would be digging around in their company for far, far, far longer than they budgeted for (Apple piling on lawyers means you also need lawyers). Apple's TSO came out of lots of data they collected on x86 processors. If Nuvia takes the data then they are shooting themselves in the foot. If they spend gobs of time recreating the data... they are burning money that they have relatively little of to spare.

This is a good point. Nuvia was building a server chip where I already acknowledged that x86 compatibility isn't particularly important. I guess I don't know enough about the process that Nuvia and then Qualcomm went through to get a final product. If the lawsuit continues then we may find out in the future.

In the Linux market (i.e., high overlap with servers) lots of the applications are 'open source', which means it isn't all that hard to just recompile them and deploy on a new platform. (E.g., that is why Amazon and other hyperscalers deployed relatively rapidly once the foundational Linux work was done in the server space: drivers, kernel, boot, common utilities, etc.) The Android market is a minor variation on Linux. Same issue, different day.

Linux is really not something that is relevant to this. The Qualcomm SoC is mostly targeted at Windows.

Windows on Arm's primary problem for the last 3-5 years has been native Arm performance, not 'emulation'.

A substantial amount of information about where fencing is/is not necessary is in the source code. Leverage that and you don't need to rely on rewriting binaries where most of that information has been thrown away. More performance will pull more software developers into compiling to Arm themselves instead of having a re-compile done behind their back. (Emulation has the wrong connotation here. These are both running Arm binaries. It only varies in when and how much effort is put into the recompile time/resources.)


And Apple is likely also 'spraying' memory fences ... just behind the instruction decoder. That is why Rosetta cannot mix-and-match code.

Again, the issue is x86 compatibility not porting to ARM. (And yes emulation is the wrong word but I was using it as a shortcut instead of saying binary translation.) I doubt Microsoft is going to be any more successful at getting ARM ports than they have been in the past. I guess we'll see.

As for Apple spraying memory fences, I doubt it. Most acknowledge that Rosetta 2 is faster and more reliable than Microsoft's solution. Apple added TSO specifically so that no changes to the memory model of the x86 binaries were necessary.
 

Xiao_Xi

macrumors 68000
Oct 27, 2021
1,627
1,101
It should be a pretty easy tweak. If Microsoft detects a TSO mode, set the memory compatibility setting as low as it can go and turn it on.
RISC-V can switch from TSO mode to WMO mode and vice versa by changing one bit in some registers.
The dynamic-RVTSO behaviour is controlled by bit 8 (DTSO) of menvcfg, senvcfg, and henvcfg for all lower privilege levels.

What is the difference between running a multithreaded application in WMO mode and in TSO mode? Is instruction reordering in TSO mode more restricted than in WMO mode? What exactly has Apple done to improve the translation of x64 applications?

By the way, there is a paper (behind a paywall) that says that M1 Ultra on Asahi Linux using TSO is on average 9% slower than using WMO in the SPECspeed 2017 Floating Point suite.

[Attached chart: SPECspeed 2017 Floating Point scores, TSO vs. WMO]

Faster execution results in a higher score

 
Last edited:

leman

macrumors Core
Oct 14, 2008
19,521
19,674
What is the difference between running a multithreaded application in WMO mode and in TSO mode? Is instruction reordering in TSO mode more restricted than in WMO mode? What exactly has Apple done to improve the translation of x64 applications?

It’s not about restricting reordering, but about how different instructions can observe state changes. For example, some processors define that if you saw the result of instruction i, you will also see the results of all previously executed instructions. Other processors don’t make any guarantees, or make different kinds of guarantees.

The point is that x86 and ARM have different behavior. Code written assuming x86 behavior might perform incorrectly if executing under the ARM model. Apple simply implements x86 model in hardware.
 
  • Like
Reactions: jdb8167 and Xiao_Xi

Sydde

macrumors 68030
Aug 17, 2009
2,563
7,061
IOKWARDI
Apple simply implements x86 model in hardware.

AIUI, Apple only implements TSO and some corner-case flag states, but not in-order execution. Most likely, Rosetta 2 is able to do more than simply translate the code; it also ensures that the effects of OoOE are accounted for. Which is not really a problem, because the completion queues are all in-order, so instructions do see the results of other instructions in a manner that is not practically inconsistent with x86. Include TSO and you would basically not be able to tell the difference.
 

MayaUser

macrumors 68040
Nov 22, 2021
3,177
7,196
I love how enthusiastic you all are about this. I can't wait for an actual device, so we can see whether these benchmarks match the real product and not see a repeat of past Arm Windows devices, where the scores from presentations and the web differed from real work on the actual device.
I really hope this is at least somewhat true, because then in mid-2024 we would finally have a capable Arm Windows device, which will push Windows even further toward this architecture.
 

MayaUser

macrumors 68040
Nov 22, 2021
3,177
7,196
The future of this world is a silent one... with electric cars, and devices that don't blast their fans like jets. Now electric cars just need to float, so there will be no tire noise anymore, and devices will be almost silent. Graveyard vibes :)
 
  • Love
Reactions: Cape Dave

Xiao_Xi

macrumors 68000
Oct 27, 2021
1,627
1,101
This blog post talks about Apple's secret sauce to accelerate x86 binary translation. It's a good read.
Apple's secret sauce must be somewhere else. I may be missing something, but setting those flags on the hardware seems easy.

The point is that x86 and ARM have different behavior. Code written assuming x86 behavior might perform incorrectly if executing under the ARM model. Apple simply implements x86 model in hardware.
The head of the RISC-V memory model explained the differences between SC, TSO and WMO some time ago.

He didn't have enough time to explain all the details, but fortunately, there is an open book on memory consistency.
 

leman

macrumors Core
Oct 14, 2008
19,521
19,674
Apple's secret sauce must be somewhere else. I may be missing something, but setting those flags on the hardware seems easy.

Extra flags and memory model emulation are pretty much the extent of the secret sauce. And yes, supporting these extra flags is not that hard. But you have to design your CPU around it. It does add to the cost.

The head of the RISC-V memory model explained the differences between SC, TSO and WMO some time ago.

Just a word of caution: these terms refer to fairly abstract ideas, and while it is important to learn them and talk about them, don’t forget that the only real authority is what the actual CPU does. The memory model of x86 CPUs is a specific thing, not just something you can completely describe with one of these labels. The main thing is that existing software relies on certain behavior, so newer hardware has to implement it. The x86 memory model was not formally specified initially; it kind of developed with whatever Intel shipped, and later it was very important to keep backwards compatibility. The actual details can be rather complex.
 

Xiao_Xi

macrumors 68000
Oct 27, 2021
1,627
1,101
They didn't advertise it during the event, so I assume they don't have it. It's a big feature, and I can't imagine them not mentioning it.
Qualcomm appears to be investing in x86 game porting.

There is a video of a laptop playing Baldur's Gate 3 at 1080p at around 30 FPS. Game settings and power settings are unknown.
 
Last edited:

MayaUser

macrumors 68040
Nov 22, 2021
3,177
7,196
If these compete with at least the M1 and M1 Pro (if not the Max/Ultra), I still think it's a nice addition to the competition. If not, then it will be another Surface Pro X fiasco.
 
  • Like
Reactions: Digitalguy

honcho

macrumors member
Apr 19, 2011
84
30
I’m hoping this Snapdragon chip really is a beast. With any luck, it will hold Apple’s feet to the fire. Apple Silicon hasn’t had any competition for the best part of four years. A competitive landscape will lead to innovation, and as someone looking to upgrade a 2019 16” MBP, hopefully Apple’s M4 will rise to the challenge.

I'm hoping that the widespread adoption of ARM will make an ARM version of Boot Camp a reality. Also, I'd really like to see more games ported to ARM so that one day I can dispense with a separate gaming PC. I can dream, I guess.
 
  • Like
Reactions: eltoslightfoot

MayaUser

macrumors 68040
Nov 22, 2021
3,177
7,196
Rise to the challenge? Snapdragon is already no challenge to Apple’s Pro/Max/Ultra chips, especially when it comes to GPU performance.
And that's an issue, from my point of view, because it means this will only challenge the ultrabook segment: the MacBook Air and all Windows laptops without a dGPU or with an entry-level dGPU, and maybe the Mac mini-style mini-tower PCs.
 
  • Like
Reactions: Homy

MRMSFC

macrumors 6502
Jul 6, 2023
371
381
Even if it doesn’t meet touted performance figures, I’m hoping it at least makes just enough of an impact that other manufacturers start producing competitors.

I don’t see x86 improving meaningfully in efficiency to compete with Apple Silicon, and I believe having Windows more architecture agnostic will open the door to other CPU manufacturers, hopefully increasing competition outside the Apple ecosystem.
 

MayaUser

macrumors 68040
Nov 22, 2021
3,177
7,196
Even if it doesn’t meet touted performance figures, I’m hoping it at least makes just enough of an impact that other manufacturers start producing competitors.

I don’t see x86 improving meaningfully in efficiency to compete with Apple Silicon, and I believe having Windows more architecture agnostic will open the door to other CPU manufacturers, hopefully increasing competition outside the Apple ecosystem.
That will also be good from the developers' point of view, encouraging more and more apps to be made not only for x86.
 

honcho

macrumors member
Apr 19, 2011
84
30
Rise to the challenge? Snapdragon is already no challenge to Apple’s Pro/Max/Ultra chips, especially when it comes to GPU performance.
I made no mention of Pro/Max/Ultra chips. All the same, competition at entry level will spur architectural innovation. This also benefits the higher end Apple chips. Apple has enjoyed years of clear blue water between itself and any rival. It is a powerful narrative that Apple will wish to maintain.
 

Homy

macrumors 68030
Jan 14, 2006
2,506
2,458
Sweden
I made no mention of Pro/Max/Ultra chips.

That was the problem and my point. Although competition is good and we hear the argument all the time Apple is not always driven by its competitors but by their own strategy and what they do best. It’s not as if Apple’s been chilling in their boat on those clear blue waters with no progress because they’re alone with no competition. Despite not having competitors to Apple Silicon they have developed and released new chips with new features. The competitors like Intel/AMD have chosen to release power hungry chips to stay ahead. Qualcomm has done the same to be able to claim victory over Apple. They compare their 12-core X Elite using 80W with M2/M3 with 8 cores using around 23W. A ”fair” comparison would be to use 12-core M3 Pro. So again no, Apple doesn’t need M4 to ”rise to the challenge”.

X-Elite-geekbench-6.jpg

(Source: Just Josh on YouTube)

X-Elite-cinebench-24.jpg

(Source: Just Josh on YouTube)

X-Elite-efficiency.jpg

(Source: Just Josh on YouTube)
 
Last edited:
  • Like
Reactions: MayaUser

MayaUser

macrumors 68040
Nov 22, 2021
3,177
7,196
Let's not forget that the M3 scores are real, from real testing... I'm very worried that the end-consumer device will be slightly slower... so an SD X at 23W will be around 34 compared to 68 for the M3, and soon the M4 will be over 70.
I still think in the Windows world these are 3 years behind.
 
  • Like
Reactions: Homy

deconstruct60

macrumors G5
Mar 10, 2009
12,493
4,053
.... They compare their 12-core X Elite using 80W with M2/M3 with 8 cores using around 23W. A ”fair” comparison would be to use 12-core M3 Pro using only 24W. So again no, Apple doesn’t need M4 to ”rise to the challenge”.

View attachment 2367598
(Source: Just Josh on YouTube)

Do you even read the graphs you are posting? Line one above is a 23W X Elite and the closest M3 variant to that is the "23W M3". So yes, they are comparing those two. There is actually a horizontal line separating the MBP 14"s from the 80W-mode X Elite. So they are trying to compare across the barrier line they put in the graph? Errr, probably not.



Apple built TWO different dies: the plain M3 and the M3 Pro. Qualcomm built one that sits in between those two. They are not trying to do exactly what Apple did. Nor should they (on the first iteration). Apple built the A__X variants for YEARS before essentially slapping an Mn name on them (and moving on to additional die sizes).

There are two major modes they can set the die/package to. One plays in roughly the same zone as the plain Mn SoC from Apple, and competes very handily with the actual real competitors from AMD/Intel under a 23W constraint. The other Windows/Linux laptop competitive threat is the "repurposed desktop SoC" in a laptop. Those chassis typically have thermal capacity that handily accepts 80W. Getting the SoC placed into those chassis won't be hard. And they absolutely do not have to compete for placement in Mac chassis at all.

The 80W X Elite is up-clocked into a zone that really starts to push past the fab process's efficiency sweet spot. Apple tends to just stop before the efficiency curve goes bad (Perf/Watt being the primary target metric) and simply makes a bigger die to walk up the multicore performance ladder. Qualcomm is trying to 'kill two birds with one stone'. Since their primary competitors push their chips into the diminishing-returns zone (especially Intel), it isn't going to 'hurt' Qualcomm much to do the same exact thing to gather some incremental additional system placements. There are at least one, if not two, orders of magnitude more possible laptop chassis to be placed into than what exists in the Mac lineup (as if that were the entire possible area of competition). The X Elite has to be selected and bought by system vendors for the lineup to survive to additional generations. Qualcomm needs some positive cash flow here before they go to multiple dies in the SoC lineup. (Buying Nuvia just gives them a $1B hole to at least partially crawl out of first.)

Is Qualcomm going to dominate the 90+W laptop market? Probably not, since they don't appear to do dGPUs, but there is a decent chance they can get a few placements there as an 'alternative' SoC configuration. Nvidia isn't going to get banned from Windows/Linux laptops any time soon. So it isn't the Mac chassis restrictions that Qualcomm really has to compete with over the long term. In the second, or third, iteration we will probably see Qualcomm 'adapt' to that issue (probably only on 'bigger' die variants; Qualcomm is probably scaling down toward phones/tablets too, so dGPU support is very likely not going to be ubiquitous in their lineup). However, Qualcomm probably is not going to feel the same pressure that Apple created by banning dGPUs, so we are unlikely to see a "Max-like" die area commitment for GPU subsystem resources from Qualcomm. Scaling down to phone-SoC-like allocations is likely going to have higher priority than going 'max area'.


There is nothing suggestive in those charts indicating that Qualcomm cannot keep this efficiency if it makes more appropriately sized dies to walk deeper into the Pro/Max CPU compute zone. There is no big showstopper present.
Qualcomm can continue to make the entry X Elite a bit bigger than a plain M3, because the Android tablet market (vs. iPad Pro) isn't going to pull the same volume. The bigger die could end up in between the Pro and Max targets also. It makes little sense to try to land exactly on top of the Apple SoCs when the Qualcomm SoCs are not going into the exact same set of chassis constraints.
 

MayaUser

macrumors 68040
Nov 22, 2021
3,177
7,196
@deconstruct60 do you think they can keep the performance and efficiency at M3 and M3 Pro levels?
Since all of these will be in mobile devices, it means a lot to customers to have a laptop that can keep up with demanding tasks for 2 hours on the go.
Bottom line: do you think they will compete with the M1 MacBook Air, or even the M3 MacBook Air, and also with Intel i7 laptops with an entry-level dGPU? Asking whether the top of the line can compete with the M3 Max in Maya projects that also need a lot of GPU power on the go is futile, right?
 

deconstruct60

macrumors G5
Mar 10, 2009
12,493
4,053
lets not forgot that M3 scores are real, from real testing...im very worried that the end consumer device will be slithy slower...so an SD X 23W will be around 34 compared to 68 M3 and soon M4 that will be over 70

Pretty good chance these are close to "real testing". Windows 11 version 24H1 had effectively gone "release ready" by the time they held these "hands on" labs at Qualcomm. Additionally, there have been complaints from system vendors that Qualcomm is requiring that they use Qualcomm's power management chips. That will lead to fewer vendor variations from the reference designs these tests were run on. The SoCs themselves are going to be the same.

Perhaps bad thermal management in a chassis could lead to throttling, but given that the more thermally challenging AMD/Intel SoCs are likely being placed into the same chassis (maybe with some largely cosmetic variation), that isn't likely either.

The X Elite is arriving relatively 'late' given the initial, overly optimistic estimates by Qualcomm. There has been lots of time to spend getting things prepped so that system vendors' laptops match the reference ones.


I still think in windows world these are 3 years behind

This "Nuvia core" is a server-focused core that was retargeted on the fly to laptop duty. They also got to skip N3B (reports indicate that they are using some TSMC N4 variant). This first iteration is likely to be far more of a "Frankenstein" project (mashing together Nuvia CPU cores with Qualcomm GPU/NPU-DSP/memory/etc.) than the second iteration will be.

We would need to see what the next iteration's improvements are before making any sort of real judgment about "how far behind" they are. Whether they are '3 years behind' or not depends upon how much pipelined, parallel next-gen development work they have managed to get done. Right now it is more like an iceberg where most of it is out of view.

Qualcomm has phone SoCs, VR SoCs, etc. to align as well. It is unclear if they are going to pursue almost everything at the same time or be more deliberate/prudent in expanding the lineup. If they dilute themselves too thinly, then they will likely fall closer to '3 years behind'. Haphazard management seems to be a bigger threat than some technological barrier. There are also lots more 'cats to herd', given that they can't just kill off dGPUs or Windows feature XYZ unilaterally.
 
  • Like
Reactions: MiniApple