OK, so we have the last two SoCs they launched, the 8 Gen 2 and 8 Gen 3, so can you elaborate?

These are the preliminary results for the Xiaomi 15, for example: excellent performance, thermals and power efficiency, way better than the previous generation.

Also, the Chinese reviewers are the ones making the best mobile SoC tech reviews now, not Ars, and things are clear: they have already tested the reference 8 Elite device and an actual retail unit. What exactly do you think you will see from Ars?

Wait a moment, so you consider these results you just quoted to be authoritative? It mentions 3K/9.4K GB6.3 for X8E. Typical A18 Pro scores in GB6.3 are 3.5K/8.8K. So you are saying that X8E is 15% slower in single-core and 7% faster in multi-core, right?
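For what it's worth, a quick sanity check of that arithmetic using the round numbers quoted above (illustrative only, nothing measured here):

```c
#include <stdio.h>

int main(void)
{
    /* Round GB6.3 figures from the post above. */
    double x8e_st = 3000.0, x8e_mt = 9400.0;   /* 8 Elite: 3K / 9.4K   */
    double a18_st = 3500.0, a18_mt = 8800.0;   /* A18 Pro: 3.5K / 8.8K */

    printf("single-core: ~%.0f%% slower\n", (1.0 - x8e_st / a18_st) * 100.0); /* ~14% */
    printf("multi-core:  ~%.0f%% faster\n", (x8e_mt / a18_mt - 1.0) * 100.0); /* ~7%  */
    return 0;
}
```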
 
He clarified that it also gets over 3k and 10k scores.
I get it: now that the numbers are so close to Apple's chips, there's an obsession here with turning them around at every chance. The purpose of the link was to show that the data coming from another 8 Elite device is also positive, so there's no point in waiting to see how bad the chip is; you will be disappointed.
 
I can't speak to the strangeness of Apple's designs. Apple uses ARM instructions, but the designs are all their own. I would argue A/M series chips are better described as being ARM-compatible rather than ARM-based.
And, I would wholeheartedly agree. The only thing Apple’s solution has to do is execute the ARM instruction set.
 
If I remember correctly, didn’t this have something to do with the drivers expecting a certain device-memory caching behavior that Apple does not support? “GPU commands” are whatever the driver makes them; they are not somehow limited by the platform. But if you can’t make the driver run due to a low-level incompatibility, that’s a problem.
I’ll have to go find it again, but it read more like: while PCIe is absolutely supported, there are no system-level requests for drawing graphics that leave the SoC, and there’s no way for anyone to force them externally. A person can buy a Mac Pro and put a GPU in a slot, but nothing’s going to populate the GPU with data. Which also means there are no signals for eGPUs to use.

EDIT: https://social.treehouse.systems/@AsahiLinux/110501435070102543
No more info since this. And I think if they had been able to get it to work, it would have been touted as loudly as their recent success in getting Vulkan working. :)
 
He clarified that it also gets over 3k and 10k scores.
I get it: now that the numbers are so close to Apple's chips, there's an obsession here with turning them around at every chance.

Weren't you the one who complained "it's not 10%, it's 5-7% most of the time"? Now others are "obsessed"?
 
Not yet. They've shown intent, but it's not a done deal. There is zero chance they will follow through before QC tries for a restraining order, or some other legal maneuver. This will not end quickly, and it will probably end in some sort of negotiated settlement. ARM doesn't really want to take those chips off the market, after all. They just want more revenue.
Yeah, this feels more like a “we’re done discussing this, you have our ‘closed door’ contract, come back with a realistic proposal within the next two months.” I believe ARM would not want to be without Qualcomm’s money, but it’s better to stick to their contracts or all of their partners may try to do the same, so if they have to cut Qualcomm, they will. (They can create IP and be a functional company from the money they’re receiving from their other partners.)

Qualcomm on the other hand only has ARM. They had an opportunity before the Nuvia acquisition to deal with the core issues but I think they’ve overplayed their hand.
 
I’ll have to go find it again, but it read more like: while PCIe is absolutely supported, there are no system-level requests for drawing graphics that leave the SoC, and there’s no way for anyone to force them externally. A person can buy a Mac Pro and put a GPU in a slot, but nothing’s going to populate the GPU with data. Which also means there are no signals for eGPUs to use.
PCIe is much lower level than "system-level requests for drawing graphics". It's a memory-mapped I/O bus, so higher-level concepts like "ask the GPU to draw a triangle" are implemented in terms of PCIe memory reads and writes, not by extending PCIe to understand graphics.

The actual issue is that not all MMIO is equal. On x86 PC platforms, there is support for low-overhead CPU access to PCIe MMIO regions, treating them as essentially "normal" memory. On Apple's Arm platforms, you cannot use "normal" memory mapping for PCIe devices; you must use "device" mapping instead.

This isn't a problem for PCIe peripherals other than GPUs - they're fine with the reduced performance and flexibility because their MMIO regions are pretty much always exclusively for control and status registers, where you actually want the restrictions of "device" mapping. However, Nvidia and AMD GPUs have both evolved around the assumption that drivers and even userspace applications should be able to map GPU memory as "normal" memory. This causes both performance and correctness problems on Apple Silicon.

Workarounds are possible, but there's a heavy performance cost. That's why Asahi Linux devs haven't attempted to actually implement a workaround; they know it would be a waste of time that would disappoint everyone.
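To make the "normal" vs "device" mapping distinction concrete, here's a rough Linux-driver-style sketch. The device, BAR layout, IDs and all the demo_* names are hypothetical placeholders; only the two mapping calls matter. Control registers get a Device-type ioremap(), which is the pattern Apple's PCIe implementation is happy with, while desktop GPU drivers also expect to map their VRAM aperture write-combined via ioremap_wc(), treating it closer to "normal" memory, and that's the assumption that breaks down on Apple Silicon.

```c
/*
 * Hypothetical sketch only: a minimal PCI driver mapping two BARs, to show
 * the Device vs write-combined ("normal"-ish) distinction discussed above.
 * Device IDs, BAR numbers and all names here are placeholders.
 */
#include <linux/module.h>
#include <linux/pci.h>
#include <linux/io.h>

static int demo_probe(struct pci_dev *pdev, const struct pci_device_id *id)
{
	void __iomem *regs, *vram;
	int ret;

	ret = pcim_enable_device(pdev);
	if (ret)
		return ret;

	/* BAR 0: control/status registers. Strongly-ordered Device mapping;
	 * works on every platform, Apple Silicon included. */
	regs = ioremap(pci_resource_start(pdev, 0), pci_resource_len(pdev, 0));
	if (!regs)
		return -ENOMEM;

	/* BAR 2: VRAM aperture. Desktop GPU drivers expect to map this
	 * write-combined so the CPU can stream into it efficiently; this
	 * is the mapping that misbehaves behind Apple's PCIe. */
	vram = ioremap_wc(pci_resource_start(pdev, 2), pci_resource_len(pdev, 2));
	if (!vram) {
		iounmap(regs);
		return -ENOMEM;
	}

	/* A real driver would keep these mappings and start uploading work;
	 * this sketch just tears them down again. */
	iounmap(vram);
	iounmap(regs);
	return 0;
}

static const struct pci_device_id demo_ids[] = {
	{ PCI_DEVICE(0x1234, 0x5678) },	/* placeholder vendor/device IDs */
	{ }
};
MODULE_DEVICE_TABLE(pci, demo_ids);

static struct pci_driver demo_driver = {
	.name     = "demo-mmio-map",
	.id_table = demo_ids,
	.probe    = demo_probe,
};
module_pci_driver(demo_driver);
MODULE_LICENSE("GPL");
```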
 
@crazy dave Why are you speculating, when the facts are literally out there?

Geekerwan did a microarchitectural analysis and found that the Oryon Prime core and Oryon Performance core are entirely different microarchitectures. It's not like a Zen5/Zen5C situation.
[Screenshot: Geekerwan's microarchitecture comparison]

If that isn't enough, die shots confirm that Oryon-L and Oryon-M aren't the same microarchitecture;
[Die shot comparing Oryon-L and Oryon-M]

Oryon-L = 2.1 mm²
Oryon-M = 0.9 mm²
 
@crazy dave Why are you speculating, when the facts are literally out there?

Geekerwan did a microarchitectural analysis and found that the Oryon Prime core and Oryon Performance core are entirely different microarchitectures. It's not like a Zen5/Zen5C situation.
View attachment 2441998
If that isn't enough, die shots confirm that Oryon-L and Oryon-M aren't the same microarchitecture;
Finally, some useful info, instead of nonsense from shills. Thank you.

So the M core is clearly new; good to see that they actually have built a smaller core. Is the L core the same as the core in the SXE, or is it a new generation?

Is the SD8G4 2L+6M (using their nomenclature)?
 
So the M core is clearly new; good to see that they actually have built a smaller core. Is the L core the same as the core in the SXE, or is it a new generation?
Qualcomm is advertising the Oryon CPU in 8 Elite as 2nd gen. (X Elite has 1st gen Oryon CPU).

During the announcement event, they said that "The 2nd generation Oryon CPU is redesigned from the ground-up for mobile".

[Slide from Qualcomm's 8 Elite announcement]

There is roughly 2x efficiency uplift from 1st Gen Oryon CPU (X Elite) -> 2nd gen Oryon CPU (8 Elite).

The process node upgrade N4P -> N3E alone cannot explain this huge uplift. It suggests that they made design changes too.
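A rough back-of-the-envelope check of that claim, using TSMC's commonly cited marketing figures for power reduction vs N5 at iso-performance (those percentages are treated as assumptions here, not measured data):

```c
#include <stdio.h>

int main(void)
{
    /* Commonly cited TSMC figures: power reduction vs N5 at the same
     * performance. Assumptions for a rough estimate only. */
    double n4p_power_cut = 0.22;   /* N4P: ~22% lower power than N5 */
    double n3e_power_cut = 0.34;   /* N3E: ~34% lower power than N5 */

    /* Relative power of N3E vs N4P at iso-performance. */
    double rel_power = (1.0 - n3e_power_cut) / (1.0 - n4p_power_cut);

    printf("N3E vs N4P power at iso-perf: ~%.2f\n", rel_power);        /* ~0.85 */
    printf("node-only efficiency gain:   ~%.2fx\n", 1.0 / rel_power);  /* ~1.18x, well short of 2x */
    return 0;
}
```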

Indeed, Geekerwan found exactly that;


Compared to Oryon (X Elite), Oryon-L (8 Elite) has several microarchitectural differences;

SoC                      X Elite    8 Elite
P-core                   Oryon      Oryon-L
L1i                      192 KB     128 KB
ROB entries              650        679
INT scheduler entries    120        157
FP scheduler entries     192        209

These numbers are taken from Geekerwan's video. Highly recommended to watch (it has English subtitles).
Is the SD8G4 2L+6M (using their nomenclature)?
So far, none of their marketing materials mention the L/M nomenclature. But evidently, this nomenclature is being used internally by engineers.
 
Qualcomm is advertising the Oryon CPU in 8 Elite as 2nd gen. (X Elite has 1st gen Oryon CPU).

During the announcement event, they said that "The 2nd generation Oryon CPU is redesigned from the ground-up for mobile".

There is roughly 2x efficiency uplift from 1st Gen Oryon CPU (X Elite) -> 2nd gen Oryon CPU (8 Elite).

The process node upgrade N4P -> N3E alone cannot explain this huge uplift. It suggests that they made design changes too.

Indeed, Geekerwan found exactly that;

Compared to Oryon (X Elite), Oryon-L (8 Elite) has several microarchitectural differences;

These numbers are taken from Geekerwan's video. Highly recommended to watch (it has English subtitles).

So far, none of their marketing materials mention the L/M nomenclature. But evidently, this nomenclature is being used internally by engineers.
Fascinating, and thanks for the add'l info in the later edits.

Indeed, if they're getting anything remotely resembling 2x efficiency, that's a huge generational uplift, as you're right, N4P->N3E doesn't come anywhere near that level of improvement.

This is even weirder than Apple pushing out the M4. Something must have gone really wrong with the first-gen Oryon for it to have shipped so late that the second gen is nipping at its heels with such a huge improvement. But clearly, something went just as right with the second gen. I wonder if we'll ever know that story?

They're still not caught up with the M4 - getting that last 10% of performance is a very heavy lift - but if they manage to add a good SME implementation on the next gen, they may get there! At least, for single-core, which I did not expect to see happen any time soon. Of course by then the M5 should be out, but still... that would be a *huge* achievement.

I wonder how soon we'll see an SXE2? Some people would be quite upset to see their first gen crushed quite so badly, but if they want to keep any momentum against the latest Intel and AMD chips, they need to move VERY fast.

If anyone has specINT for the SD8G4 L-core from a reputable source like Geekerwan (not some QC marketing shill) please post it. I'm especially interested to se how that sub-bench stacks up against the M4.
 
If anyone has specINT for the SD8G4 L-core from a reputable source like Geekerwan (not some QC marketing shill) please post it. I'm especially interested to se how that sub-bench stacks up against the M4.
From the Geekerwan video I referenced;

P-core SPEC INT/FP comparison
[Chart: Geekerwan P-core SPEC INT/FP efficiency curves]

Efficiency-wise in SPEC2017 INT, the 8 Elite is between the Apple A15 and A16. In FP, the 8 Elite is between the A17 Pro and A18 Pro.

E-core SPEC INT/FP comparison;
[Charts: Geekerwan E-core SPEC INT and FP efficiency curves]

It is clear why Qualcomm is advertising the Oryon-M cores as "performance cores" in the marketing materials. Unlike the efficiency cores of the A18 Pro (A18-E, 2.2 GHz) or Dimensity 9400 (2.4 GHz Cortex-A720), the Oryon-M core scales up to higher clock speeds, higher performance, and higher power consumption levels.

At iso-power vs the A18-E, Oryon-M is quite close in terms of efficiency, particularly in FP.

Geekbench 6 Multi-core;
[Chart: Geekerwan Geekbench 6 multi-core efficiency curves]

The 8 Elite's efficiency is a bit better than the A18 Pro's at the low power point, and equal at the high power point.

GPU efficiency (3D Mark Steel Nomad Light);
[Chart: Geekerwan Steel Nomad Light efficiency]
 
@crazy dave Why are you speculating, when the facts are literally out there?

Geekerwan did a microarchitectural analysis and found that the Oryon Prime core and Oryon Performance core are entirely different microarchitectures. It's not like a Zen5/Zen5C situation.
View attachment 2441998
If that isn't enough, die shots confirm that Oryon-L and Oryon-M aren't the same microarchitecture;
View attachment 2441999
Oryon-L = 2.1 mm²
Oryon-M = 0.9 mm²
Hi @Mahua, thanks for the information! The reason I didn't find this is that I thought there was only one Geekerwan video, from 5 days ago, which I thought was odd since it was shorter, with less information than their usual videos, and had a different host. I hadn't realized they'd already uploaded a second video 4 days ago. I even said "hopefully Geekerwan will cover it" (wrt the GPU, I think, but still). Now that I know there was a second video, I realize someone had even linked me that second video, but I mistakenly thought it was the same one and didn't really pay attention to it. I'll chalk it up to a head cold and sleep deprivation making me more confused than usual. :) Also, all the written material from the typical news aggregator sites I was reading was frustratingly vague and didn't make this clear.

BTW did you mean to reply to me here? The speculation you refer to was stuff I'd written in a different forum, not here. Just to let you know, I don't check into this account as much and wouldn't have even known you had replied for a few days more unless @Confused-User had told me on the other site. Again though, I appreciate you grabbing all the screenshots and laying all this out, truly.
 
Efficiency-wise in SPEC2017 INT, the 8 Elite is between the Apple A15 and A16. In FP, the 8 Elite is between the A17 Pro and A18 Pro.
Right. Not caught up yet. Though if they implement a good SME unit they will make up a lot of ground.

It is clear why Qualcomm is advertising the Oryon-M cores as "performance cores" in the marketing materials. Unlike the efficiency cores of the A18 Pro (A18-E, 2.2 GHz) or Dimensity 9400 (2.4 GHz Cortex-A720), the Oryon-M core scales up to higher clock speeds, higher performance, and higher power consumption levels.

At iso-power vs the A18-E, Oryon-M is quite close in terms of efficiency, particularly in FP.
That's impressive, compared to everything non-Apple.

I don't think I would characterize the M as "close" to the A18 E-core. However, the O-M core can apparently ~double performance at a cost of quadrupling power. I don't think anyone knows what Apple's E core can do if you feed it 4x more power, but it's academic as we don't have that option, AFAIK.

These results are entirely in line with reports that the SD can beat Apple MC performance at iso-power; by running 6 cores at 2/3 the power of each of Apple's 4 E cores, QC gets a favorable position on the P/E curve, getting a bit more performance at the same power as Apple.
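The budget math behind that, as a toy illustration (the normalised power figures below are assumptions for the sake of the argument, not measurements):

```c
#include <stdio.h>

int main(void)
{
    /* Normalised, illustrative numbers only. */
    double apple_e_core_power = 1.0;        /* per-core power of an Apple E-core */
    double oryon_m_core_power = 2.0 / 3.0;  /* each Oryon-M run at ~2/3 of that  */

    printf("Apple E-cluster: 4 x %.2f = %.2f\n", apple_e_core_power, 4 * apple_e_core_power);
    printf("Oryon-M cluster: 6 x %.2f = %.2f\n", oryon_m_core_power, 6 * oryon_m_core_power);
    /* Same cluster power budget, but six cores each sitting lower on the
     * perf/W curve can deliver a bit more aggregate throughput. */
    return 0;
}
```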

Geekbench 6 Multi-core;
The 8 Elite's efficiency is a bit better than the A18 Pro's at the low power point, and equal at the high power point.
This lines up perfectly with the other data.

GPU efficiency (3D Mark Steel Nomad Light);
Unfortunately, this doesn't really tell us much - we already know the QC GPU has more basic crunch than the A18. What would be really interesting is a cross-platform benchmark that is optimized for both Android and iOS, that displays the same quality/level of detail on both platforms (unlike most major games, which do more work on iOS), that uses the GPU like a major game title would. I'd also like to see some GPU compute benchmarks.
 
If anyone has specINT for the SD8G4 L-core from a reputable source like Geekerwan (not some QC marketing shill) please post it. I'm especially interested to se how that sub-bench stacks up against the M4.
Oops. Crap. Aside from the minor spelling error, I left out what subtest I was interested in. It's the compiler test. Very curious to know how that compares to the A18 Pro.

@Mahua, thanks very much for pulling out all this info.
 
Right. Not caught up yet. Though if they implement a good SME unit they will make up a lot of ground.
SPEC2017 doesn't use SME. It's Geekbench 6.3 that uses SME.

Benchmark           8 Elite    A18 Pro    A18 Pro advantage
SPEC2017 INT        8.9        10.8       ~20%
SPEC2017 FP         14.0       16.0       ~15%
Geekbench 6.3 ST    3200       3600       ~12%

There is a disparity - the gap between the 8 Elite and A18 Pro is larger in SPEC2017 than in Geekbench 6.3, when it should be the other way around! After all, the A18 Pro has SME whereas the 8 Elite does not, and Geekbench 6.3 does utilise SME. So shouldn't the A18 Pro beat the 8 Elite by a larger margin in Geekbench 6.3?

The following information comes from a Qualcomm engineer:

"Geekerwan's using NDK binaries on SPEC for Android which will have an inherent handicap vs iOS shared runtime libraries
it's fine given that this represents the userspace experiences between the OS', however if you would really want to look at just µarch you'd deploy glibc+jemalloc binaries on Android to get somewhat of a similar allocator behavior to what iOS does, in which case the competitive performance differences here are going to be fundamentally smaller

Geekbench difference is smaller because the way it's built counteracts some of these differences at the moment, and actually if you run 6.2 (non-SME) that's probably as close as you can reasonably get for a 1:1 µarch comparison"

Unfortunately, this doesn't really tell us much - we already know the QC GPU has more basic crunch than the A18.

This is Steel Nomad Light, a more modern, more complex and compute-heavy benchmark than Wildlife Extreme or GFXBench Aztec Ruins.

Also the Snapdragon 8 Elite upgrades to a new GPU architecture: Adreno 8 series. It should be an improvement over previous generations in that regard.
 
SPEC2017 doesn't use SME. It's Geekbench 6.3 that uses SME.
Of course. I didn't say spec would go faster. GB is a significant figure of merit, for better and for worse, and they'll do better on that with SME.

There is a disparity - the gap between 8 Elite and A18 Pro is larger in SPEC2017 than in Geekbench 6.3[...]
Yes, it was well known that SXE (and now SD8) underperform on spec. The explanation you provided is quite interesting.

This is Steel Nomad Light, a more modern, more complex and compute-heavy benchmark than Wildlife Extreme or GFXBench Aztec Ruins.

Also the Snapdragon 8 Elite upgrades to a new GPU architecture: Adreno 8 series. It should be an improvement over previous generations in that regard.
My understanding, which is not nearly as good here as for CPU, is that that is still relatively low-complexity compared to what high-end games may do, much less GPGPU stuff.
 
Of course. I didn't say spec would go faster. GB is a significant figure of merit, for better and for worse, and they'll do better on that with SME.


Yes, it was well known that SXE (and now SD8) underperform on spec. The explanation you provided is quite interesting.


My understanding, which is not nearly as good here as for CPU, is that that is still relatively low-complexity compared to what high-end games may do, much less GPGPU stuff.
As far as I can tell, Steel Nomad Light is equivalent to Steel Nomad, just rendered at 1440p instead of 4K. They designed the test not to be CPU-limited (where Apple would have a clear advantage). If anything, I think the test is sensitive to GPU memory bandwidth (I get better scores upping my VRAM speeds on Windows than upping my GPU clock).
 