
leman

macrumors Core
Oct 14, 2008
19,521
19,674
It's amazing to me that the CPU architects that came from Apple and are designing a brand new processor from scratch didn't bother implementing something like Apple's TSO mode. Are we sure they didn't? If not it calls into question how expert these ex-Apple engineers really are. To not have reliable x86 and x86-64 emulation without heavy use of memory fences that slows down execution is borderline incompetent.

They didn't advertise it during the event, so I assume they don't have it. It's a big feature, and I can't imagine them not mentioning it.

There is no doubt about the ability of the CPU architects who founded Nuvia; at the same time, I do suspect that they were responsible for the P-core architecture, and only that.
 
  • Like
Reactions: jdb8167

leman

macrumors Core
Oct 14, 2008
19,521
19,674
Why do you assume this? Has Apple mentioned it in any of its presentations or marketing materials?

No, but Apple's event and Qualcomm's event were very different. Qualcomm went quite in depth with technical details, as they wanted to showcase their superiority. Apple was more along the lines of "we make the most advanced CPUs and we can also run x86 code seamlessly".
 

Xiao_Xi

macrumors 68000
Oct 27, 2021
1,627
1,101
Qualcomm went quite in depth with technical details
I may have a bad memory because I don't remember Qualcomm going into detail. What details did Qualcomm mention in their presentation that Apple doesn't mention in theirs?
 

eltoslightfoot

macrumors 68030
Feb 25, 2011
2,545
3,094
It's amazing to me that the CPU architects that came from Apple and are designing a brand new processor from scratch didn't bother implementing something like Apple's TSO mode. Are we sure they didn't? If not it calls into question how expert these ex-Apple engineers really are. To not have reliable x86 and x86-64 emulation without heavy use of memory fences that slows down execution is borderline incompetent.
Let's see what they actually put out before we get too hyperbolic. :D
 
  • Like
Reactions: MRMSFC and name99

MrGunny94

macrumors 65816
Dec 3, 2016
1,148
675
Malaga, Spain
A visit from Galaxy Book:


View attachment 2356953


I really cannot wait to see this running under Linux...

Couple comparisons between the M3 line up:

View attachment 2356956


One thing to have in mind is that all of these are 16GB variants, I'm quite curious to see RAM prices on these laptops when they start coming out.

I'm definitely picking one up on release week to give it a go with Arch
 
Last edited:

dgdosen

macrumors 68030
Dec 13, 2003
2,817
1,463
Seattle
A visit from Galaxy Book:


View attachment 2356953

I really cannot wait to see this running under Linux...

Couple comparisons between the M3 line up:

View attachment 2356956

One thing to have in mind is that all of these are 16GB variants, I'm quite curious to see RAM prices on these laptops when they start coming out.

I'm definitely picking one up on release week to give it a go with Arch

What is a "GalaxyBook4 Edge"? I hope there's a 12" version...
 

deconstruct60

macrumors G5
Mar 10, 2009
12,493
4,053
What is a "GalaxyBook4 Edge"? I hope there's a 12" version...

The current Galaxy Book lineup consists of '360', 'Pro', and 'Ultra' (with Core Ultra SoCs).
It wouldn't be surprising if 'Edge' is an additional suffix for the Qualcomm Elite/Plus SoCs.



If they use both Qualcomm variants, then 'Elite' would not be a good suffix. But 'Edge' does lead with an 'E'. They may be going for an 'edgy' notion (of being non-mainstream).

Probably no 12" version. Most likely, the Qualcomm SoC will be thrown into some subset of the same chassis set that the Intel (and maybe AMD) ones leverage, just with different battery and performance outcomes. If 12" doesn't work for an Intel-powered Book 4, it likely isn't going to work in the same target market for a Qualcomm-powered one.


If anything, Qualcomm's lower power consumption has better synergy with larger, more power-hungry screens (as a battery-consumption offset).
 

deconstruct60

macrumors G5
Mar 10, 2009
12,493
4,053
A visit from Galaxy Book:


View attachment 2356953

I really cannot wait to see this running under Linux...

Couple comparisons between the M3 line up:

View attachment 2356956


The M3 is built on TSMC N3 (N3B), whereas the X Elite/Plus's real competitors are also on N4 (and Intel 4).
The M3 doesn't natively run Windows, so it isn't directly competing (just a point of comparison).




One thing to have in mind is that all of these are 16GB variants, I'm quite curious to see RAM prices on these laptops when they start coming out.

Decent chance the minimum RAM is 16GB. The RAM is likely all BTO options. But even on the Intel SoC Book 4 variants, it is mostly just 16GB.
 

MrGunny94

macrumors 65816
Dec 3, 2016
1,148
675
Malaga, Spain
We previously got the old Lenovo/Dell ARM variants to test when they initially came out a few years ago. If I get any of these new models in the upcoming months, I'll do a couple of tests with Windows and Linux.

I'm very excited about these models, as I have been saying for years that ARM should be the way to go for mobility (outside of gaming laptops).

The M3 is built on TSMC N3 (N3B), whereas the X Elite/Plus's real competitors are also on N4 (and Intel 4).
The M3 doesn't natively run Windows, so it isn't directly competing (just a point of comparison).






Decent chance the minimum RAM is 16GB. The RAM is likely all BTO options. But even on the Intel SoC Book 4 variants, it is mostly just 16GB.
I understand your view but if Qualcomm wants to push these for Enterprise they have to give a 32GB RAM option.
 

deconstruct60

macrumors G5
Mar 10, 2009
12,493
4,053
I understand your view but if Qualcomm wants to push these for Enterprise they have to give a 32GB RAM option.

'Enterprise'? Why would managers, traveling salespeople, the legal department, finance, and regular office workers deeply need 32GB of RAM?

If 'Enterprise' means buying one specific system for everyone and making the deployed systems 100% uniform in SoC and system vendor, then they would. But that is a rather unnecessary requirement. It isn't going to be a "do everything for everybody" kind of system in the first generation. The very rigid, conservative 'Enterprise' folks will also likely have some high-threshold x86 performance criteria ("gotta run old Enterprise stuff as fast as possible"). It isn't going to be high synergy there either on the first generation.


If Qualcomm has had to push to bleeding-edge LPDDR5X modules to get to their performance, there is substantive pricing pressure that comes along as a boat anchor, which will also blow back into pushing these machines into relatively fringe niches that legacy x86 systems will be keen to exploit. It also won't be surprising if there are no dGPUs, so the subset fringe that needs 32GB to pump data through a dGPU won't have the 'other' base component either. If it is bleeding-edge LPDDR5X, then just getting uniform, high quantities of the modules for 16GB is a mild issue.

There may be a corner-case BTO option on the most expensive chassis for something like 24GB, like the MBA 13/15". Apple has managed to sell only a 'measly' tens of millions of those through the M1-M3 iterations.
 

Sydde

macrumors 68030
Aug 17, 2009
2,563
7,061
IOKWARDI
the M3 is layered on TSMC N3 ( N3B) whereas X Elite/plus real competitors are also on N4 ( and Intel 4 ).

How about this, then:
QC Oryon: 2574sc / 12562mc
Mac M2 Pro: 2650sc / 14281mc

Both on N4
Both have 12 cores
– except, the M2 Pro is 8P/4E while Oryon is a 12-core P fest.

(and the M3 iMac has 93% of the Oryon's mc score with only 4P/4E)

And, yes, it does matter that Apple Silicon and macOS are tightly integrated with each other.
 
Last edited:

leman

macrumors Core
Oct 14, 2008
19,521
19,674
What could explain the 12-core M2 Pro losing in Ray Tracing by a wide margin (in single and multi)

I'd say clock difference. The RT test is essentially an in-cache SIMD FP benchmark for these CPUs, and their FP pipelines are probably pretty much identical (since Oryon seems to be mostly a reverse-engineered Firestorm). Oryon runs at a higher clock, though.
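
As a rough sketch of what clock scaling alone would predict here (the clock figures below are illustrative assumptions, not confirmed specifications):

```cpp
// Back-of-the-envelope check of the clock argument: if the FP pipelines are
// essentially identical, an in-cache SIMD test should scale roughly with
// clock frequency. The clock values below are illustrative assumptions only.
#include <cstdio>

int main() {
    const double m2_pro_clock_ghz = 3.5;  // assumed M2 Pro P-core clock
    const double oryon_clock_ghz  = 4.2;  // assumed Oryon single-core boost clock

    // Expected single-core ratio if performance-per-clock were identical
    std::printf("Expected Oryon/M2 Pro ratio from clock alone: %.2fx\n",
                oryon_clock_ghz / m2_pro_clock_ghz);
    return 0;
}
```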

and winning in File Compression, Photo Filtering and HDR by an even larger margin in multi against the Qualcomm SoC?

That is odd, right? Oryon has more cores and runs at a higher frequency. It should easily win in all tests.
 
  • Like
Reactions: Xiao_Xi

Xiao_Xi

macrumors 68000
Oct 27, 2021
1,627
1,101
That is odd, right? Oryon has more cores and runs at a higher frequency. It should easily win in all tests.
Geekbench results never make sense to me. For example, the 12-core M2 Pro and the Qualcomm SoC are tied in single core in HTML5 Browse and PDF Render, but the M2 Pro wins in multi core, especially in HTML5 Browse by a wide margin.

For reference:
The HTML5 Browser workload opens various web pages using a web browser. It models the use case of a user browsing the web with a browser (such as Chrome and Safari). This workload uses a headless browser and opens, parses, lays out, and renders text and images from 8 web pages from popular websites (such as Ars Technica, Instagram, and Wikipedia).
The PDF Render workload opens complex PDF (Portable Document Format) documents using PDFium, Google Chrome’s PDF renderer. It models the use case of a user opening PDFs in a browser. This workload renders PDFs of park maps from the American National Park Service (sizes from 897 KB to 1.5 MB) that contain large vector images, lines and text.
 
  • Like
Reactions: Chuckeee

MrGunny94

macrumors 65816
Dec 3, 2016
1,148
675
Malaga, Spain
'Enterprise'? Why would managers, traveling salespeople, the legal department, finance, and regular office workers deeply need 32GB of RAM?

If 'Enterprise' means buying one specific system for everyone and making the deployed systems 100% uniform in SoC and system vendor, then they would. But that is a rather unnecessary requirement. It isn't going to be a "do everything for everybody" kind of system in the first generation. The very rigid, conservative 'Enterprise' folks will also likely have some high-threshold x86 performance criteria ("gotta run old Enterprise stuff as fast as possible"). It isn't going to be high synergy there either on the first generation.


If Qualcomm has had to push to bleeding-edge LPDDR5X modules to get to their performance, there is substantive pricing pressure that comes along as a boat anchor, which will also blow back into pushing these machines into relatively fringe niches that legacy x86 systems will be keen to exploit. It also won't be surprising if there are no dGPUs, so the subset fringe that needs 32GB to pump data through a dGPU won't have the 'other' base component either. If it is bleeding-edge LPDDR5X, then just getting uniform, high quantities of the modules for 16GB is a mild issue.

There may be a corner-case BTO option on the most expensive chassis for something like 24GB, like the MBA 13/15". Apple has managed to sell only a 'measly' tens of millions of those through the M1-M3 iterations.
There are way more people than those you are mentioning.

You are not taking into consideration developers, DevOps, cloud folks, on-premises engineers, and many more sectors.

For example, at my company any engineer/architect is specced with 32GB.

Not everyone just uses 16GB. Remember, if they want to steal AMD and Intel market share, they have to hit the folks who buy in bulk and order machines valued around $1,500-3,000, like ThinkPads, Latitudes, etc.
 
  • Like
Reactions: MRMSFC

Sydde

macrumors 68030
Aug 17, 2009
2,563
7,061
IOKWARDI
but the M2 Pro wins in multi core, especially in HTML5 Browse by a wide margin
I think I have an answer for that. A lot of browser work is pretty low-load, so it goes onto the E-cores, with only a fraction of browser work happening on P-cores. The E-cores feed quite a lot less waste heat onto the chip, giving the P-cores more headroom to burn through the heavy-load work at a higher speed. Put all the work on P-cores and you have all the cores pushing on each other with their heat output, so the whole array has to dial it back.

Apple's E-cores appear to be very good, so they can take on a lot of work without heating the chip up as much.
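
As a rough sketch of that headroom argument (every power figure below is invented purely for illustration, not a measurement of either chip):

```cpp
// Toy model of a shared package power budget. All numbers are made up to
// illustrate the headroom argument; they are not measurements.
#include <cstdio>

int main() {
    const double package_budget_w = 30.0;  // assumed sustained package power budget
    const double e_core_w         = 1.0;   // assumed power per busy E-core
    const double p_core_max_w     = 6.0;   // assumed power per P-core at full clock

    // Case 1: low-load work parked on 4 E-cores, 8 P-cores share the remainder.
    const double p_budget_mixed = package_budget_w - 4 * e_core_w;
    std::printf("Per-P-core budget with E-cores helping: %.2f W (core cap %.1f W)\n",
                p_budget_mixed / 8, p_core_max_w);

    // Case 2: the same work spread across 12 P-cores, no E-cores to offload to.
    std::printf("Per-P-core budget with 12 P-cores only:  %.2f W (core cap %.1f W)\n",
                package_budget_w / 12, p_core_max_w);
    return 0;
}
```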
 

Xiao_Xi

macrumors 68000
Oct 27, 2021
1,627
1,101
I think I have an answer for that. A lot of browser work is pretty low-load, so it goes onto the E-cores, with only a fraction of browser work happening on P-cores. The E-cores feed quite a lot less waste heat onto the chip, giving the P-cores more headroom to burn through the heavy-load work at a higher speed. Put all the work on P-cores and you have all the cores pushing on each other with their heat output, so the whole array has to dial it back.
I don't know how the GB6 works, so I'm probably misunderstanding what you're saying. Are you suggesting that the M2 Pro gets better results because the Qualcomm SoC throttles?

The M2 Pro scales much better than the Qualcomm SoC. In single-core, both SoCs render about 55 pages/second, but in multi-core, the M2 Pro renders about 360 pages/second, while the Qualcomm SoC, about 290 pages/second.

Is it possible that macOS handles multithreaded applications better than Windows? Could the application that the benchmark uses explain the differences?
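
To put those scaling numbers side by side (using the approximate pages/second figures above and the core counts mentioned earlier in the thread):

```cpp
// Multi-core scaling factors from the approximate HTML5 Browse figures above.
#include <cstdio>

int main() {
    const double sc_pages = 55.0;    // approx. single-core pages/s, both SoCs
    const double m2pro_mc = 360.0;   // approx. M2 Pro multi-core pages/s
    const double oryon_mc = 290.0;   // approx. Qualcomm SoC multi-core pages/s

    std::printf("M2 Pro scaling (8P+4E): %.1fx over single core\n", m2pro_mc / sc_pages);
    std::printf("Oryon scaling (12P):    %.1fx over single core\n", oryon_mc / sc_pages);
    return 0;
}
```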
 

MRMSFC

macrumors 6502
Jul 6, 2023
371
381
'Enterprise'? Why would managers, traveling salespeople, the legal department, finance, and regular office workers deeply need 32GB of RAM?
Chrome, Outlook, Teams, Excel, and specific engineering software are RAM hogs.

At least in my experience with enterprise, having all of those apps open at once is expected, and they’re not getting lighter.
The very rigid, conservative 'Enterprise' folks will also likely have some high-threshold x86 performance criteria ("gotta run old Enterprise stuff as fast as possible").
I think that they’re relying on Microsoft to do the heavy lifting there, and Windows 10 is nearing EoL.

We're already switching to Windows 11 and porting our tools over to it. I would imagine that MS will move heaven and earth to make sure compatibility is kept, since that's their M.O. Anything else likely either gets ported or run on a proprietary emulation layer (or the old machine just gets kept around).
If Qualcomm has had to push to bleeding-edge LPDDR5X modules to get to their performance, there is substantive pricing pressure that comes along as a boat anchor, which will also blow back into pushing these machines into relatively fringe niches that legacy x86 systems will be keen to exploit.
It is true that this will put upward pressure on price, but that can be remedied by cutting fat elsewhere.

Going back to your assertion about RAM capacity, hypothetically QC could just cut corners on the chassis material, screen quality, etc to make up for the cost.

I find that, at least with enterprise, having a mediocre screen or chassis is acceptable to the company so long as the device can deliver.
It also won't be surprising if there are no dGPUs, so the subset fringe that needs 32GB to pump data through a dGPU won't have the 'other' base component either.
The most common use for dGPU is gaming anyway, and in enterprise anything that requires dGPU power is run on servers.
 

Basic75

macrumors 68020
May 17, 2011
2,101
2,447
Europe
Oryon has more cores and runs at a higher frequency. It should easily win in all tests.
And it has a faster memory interface, at least in theory. Perhaps the caches or the interfaces between the caches and the memory are rubbish?
 

deconstruct60

macrumors G5
Mar 10, 2009
12,493
4,053
It's amazing to me that the CPU architects that came from Apple and are designing a brand new processor from scratch didn't bother implementing something like Apple's TSO mode. Are we sure they didn't?

Several odd assumptions built into the above.

First, it very likely isn't a primary 'feature'. Microsoft isn't going to tweak their compatibility solution just for one vendor's CPU implementation. If Arm specified it as an optional feature, then maybe. But as a proprietary fork from the Arm standard, it doesn't do much to create a competitive Arm implementation environment for Windows.

The Arm Neoverse implementations that Amazon, Ampere Computing, etc. have released have zero TSO and yet have taken substantive share away from x86 servers. Amazon is closing in on 20+% of deployed servers with no TSO at all. The hype that they have to have this feature to win is mostly just Apple 'fan boy' talk. There is already concrete evidence that it isn't necessary. It is a 'nice to have', but it isn't required in the general market.

Apple's policy of "all or nothing", coupled with "burn the bridge after marching the army over the river", makes it a more necessary feature in the macOS space. That is largely driven by Apple killing off x86 as an option for macOS in the intermediate future. Arm is not going to kill off x86 in Windows (or Linux). The hard-core "legacy x86 code at maximum speed" folks in the Windows space are just going to buy x86 Windows systems. Apple simply doesn't give that choice as an option. (The Mac space is about having at least an order of magnitude fewer system choices.)


Second, the 'from scratch' is likely not entirely true. Part of the Arm lawsuit against Nuvia/Qualcomm is about the assistance of data and IP that Arm provided to Nuvia to jumpstart their efforts. That additionally grounds the reality that these cores started off in a different target market. Being 'repurposed' into an alternative target market leaves even less time/effort/resources available for non-critical features. If they end up slower than the x86 options on general code, then TSO isn't going to 'buy' Qualcomm/Nuvia anything. And deploying something slower (or less competitive) than Neoverse V3 would be even worse. They need to get something out the door with as few distractions as possible.

Third, since they were going to leave with a decent-sized block of Apple engineers, it was about 100% certain that Nuvia was going to get sued. If they tried to copy a feature that only Apple implemented, then Apple would be digging around in their company for far, far, far longer than they budgeted for (Apple piling on lawyers means you also need lawyers). Apple's TSO came out of lots of data they collected on x86 processors. If Nuvia takes the data, then they are shooting themselves in the foot. If they spend gobs of time recreating the data, they are burning money that they have relatively little of to spare.


If not it calls into question how expert these ex-Apple engineers really are.
No, it doesn't at all. An 'expert' is a person who picks the correct solution to a specifically posed problem. Expertise is not indicated by someone who simply pulls out the same hammer every time and mainly just regurgitates what they have already done over and over again.

In the Linux market (i.e., high overlap with servers), lots of the applications are open source, which means it isn't all that hard to just recompile them and deploy on a new platform. (That's why Amazon and the other hyperscalers deployed relatively rapidly once the Linux foundational work was done in the server space: drivers, kernel, boot, common utilities, etc.) The Android market is a minor variation on Linux. Same issue, different day.

The extremely relevant core issue here is that Nuvia wasn't going to build a CPU for running a macOS/iOS clone. Windows already had a conversion tool long before Nuvia even sat down in their brand new, 'from scratch' offices. Running the Microsoft solution faster is the primary point. It is not about solving the problem they might 'wish they had'; it is about solving the issue they did have.

Same thing with UEFI boot (because Windows is grounded in it). They had better know something about Pluton too, because Windows is heading in that direction. Apple can burn all that up by burning bridges behind them (they have the iPhone/iPad ecosystem to lean on). Nuvia/Qualcomm cannot.

To not have reliable x86 and x86-64 emulation without heavy use of memory fences that slows down execution is borderline incompetent.

Windows on Arm's primary problem for the last 3-5 years has been native Arm performance, not 'emulation'.

A substantial amount of information about where fencing is and is not necessary lives in the source code. Leverage that and you don't need to rely on rewriting binaries where most of that information has been thrown away. More performance will pull more software developers into compiling to Arm themselves instead of having a re-compile done behind their back. (Emulation has the wrong connotation here: both paths end up running Arm binaries; what varies is when and how much effort is put into the recompile.)
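
As a minimal sketch of that "the ordering information is in the source" point, standard C++ atomics illustrate the general idea (this is generic illustrative code, not Microsoft's translator or Apple's Rosetta):

```cpp
// A native Arm compile of this code only needs ordering where the source asks
// for it: the release store / acquire load pair below. The plain int accesses
// compile to ordinary loads and stores with no extra barriers.
// A translator working from the x86-64 binary no longer sees which accesses
// were atomic, so without hardware TSO it must conservatively preserve x86's
// stronger load/store ordering on far more of them (e.g., via barriers or
// acquire/release forms).
#include <atomic>
#include <cstdio>
#include <thread>

int payload = 0;                   // plain data; no ordering needed by itself
std::atomic<bool> ready{false};    // the one place the source demands ordering

void producer() {
    payload = 42;                                   // ordinary store
    ready.store(true, std::memory_order_release);   // ordering requested here only
}

void consumer() {
    while (!ready.load(std::memory_order_acquire)) { }  // pairs with the release
    std::printf("payload = %d\n", payload);              // guaranteed to print 42
}

int main() {
    std::thread t1(producer), t2(consumer);
    t1.join();
    t2.join();
    return 0;
}
```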


And Apple is likely also 'spraying' memory fences ... just behind the instruction decoder. That is why Rosetta cannot mix-and-match code.
 
  • Like
Reactions: name99

deconstruct60

macrumors G5
Mar 10, 2009
12,493
4,053
How about this, then:
QC Oryon: 2574sc / 12562mc
Mac M2 Pro: 2650sc / 14281mc

Both on N4
Both have 12 cores
– except, the M2 Pro is 8P/4E while Oryon is a 12-core P fest.

(and the M3 iMac has 93% of the Oryon's mc score with only 4P/4E)

So that is a gap of about 3% on single-threaded, which for two different implementations shouldn't be all that surprising. That isn't a substantive gap.

The multithread gap is the bigger one, roughly 13-14%. The M-series P-core clusters don't get all of the memory bandwidth in their SoC. It wouldn't be all that surprising if the Nuvia cores (which initially were not targeted at having to compete in a heterogeneous-core setup) are restricted by a tighter QoS leash so that they don't disrupt their die cohorts. I suspect this is not so much "Windows is leaving P-core performance on the floor" as it is the Nuvia cores outstripping their memory bandwidth allocation when all of them are used at once.
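
For reference, a quick sanity check of those gaps from the scores quoted above:

```cpp
// Percentage gaps from the Geekbench figures quoted above, relative to Oryon.
#include <cstdio>

int main() {
    const double oryon_sc = 2574, oryon_mc = 12562;
    const double m2pro_sc = 2650, m2pro_mc = 14281;

    std::printf("Single-core gap: %.1f%%\n", (m2pro_sc - oryon_sc) / oryon_sc * 100);  // ~3.0%
    std::printf("Multi-core gap:  %.1f%%\n", (m2pro_mc - oryon_mc) / oryon_mc * 100);  // ~13.7%
    return 0;
}
```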


(Geekbench largely treats these dies as if they consist only of CPU cores. They do not. And that other stuff brings into play design trade-offs that Geekbench doesn't directly measure at all.)



And, yes, it does matter that Apple Silicon and macOS are tightly integrated with each other.

Probably not as much in this context. There is likely a bigger gap in the two memory subsystems and/or the die network bandwidth.

Nuvia did the CPU cores, but Qualcomm did the GPU cores, the NPU/DSP cores, and likely a decent chunk of the network and memory subsystem work. [The GPU cores likely have a higher memory bandwidth coupling, so some "Nuvia memory system" slapped onto the GPU cores would probably have more issues than vice versa.] Most of Windows' lifetime has been on homogeneous cores, so P vs E core balancing probably isn't a major issue here.


Version 2.0 is where it would be more expected that they get the balancing issues worked out on the die, with a bit less of a cobbled-together "Frankenstein" assembly. If they are still hamstrung there, then I'd suspect the OS has more to do with the "scale up on die" problem.
 