Intel Alder Lake vs. Apple M1

thenewperson · Jan 6, 2022

JMacHack said:
Refusal to open source their Linux drivers, forcing game devs to use unoptimized code for AMD gpus to get partnered, (by not culling out of frame polys specifically on AMD hardware), gimping performance on previous gen hardware by removing optimizations that already existed, forcing PhysX to use the cpu on AMD cards, and then pushing partnered devs to use it more. GPP (which thankfully died), Gsync (which required an NVidia chip of course), as opposed to freesync which didn’t.

Sources say (being fair of course) all this can be said about Apple’s efforts with Metal.

jdb8167 · Jan 6, 2022

thenewperson said:
Sources say (being fair of course) all this can be said about Apple’s efforts with Metal.

Well it is obvious that Apple isn't not releasing open source drivers for Apple silicon GPUs for lock-in. They already have lock-in since no one else will be using their SoC. Apple is keeping their GPU drivers private so that they have more flexibility to make under the covers changes as their hardware changes.

JMacHack · Jan 6, 2022

thenewperson said:
Sources say (being fair of course) all this can be said about Apple’s efforts with Metal.

Difference: Apple sells the complete machine. They have no reason to gimp other hardware because there is no other hardware. Your choice for an Apple machine is what they give you, or you go PC. For better or worse.

Likewise, they cannot intentionally gimp PCs in a similar way NVidia leverages against AMD.

In fairness, they could document their own processors better to make Open source OSes easier to port I agree.

Apple has control over Apple, NVidia can stranglehold more than just NVidia.

leman · Jan 6, 2022

jeanlain said:
To be fair, the same could be said of Apple and all its proprietary APIs, namely Metal.

I think there is an important difference: Apple has its own software platform with its unique APIs and characteristics. Nvidia does not - they sell hardware for existing platforms yet actively manipulate the ecosystem and the market to put their competitors at the disadvantage. Furthermore, Apples hardware is sufficiently different to justify new API patterns (Nvidia‘s is not), and Metal is sufficiently different from other existing approaches to justify its existence as a separate API with a unique forward-thinking design philosophy (in comparison, DX12 and Vulkan are mostly cosmetic variations of the same design).

Finally, let’s not forget that Apple tried hard to play the open standards games and got burned - multiple times. Apple was the one who developed OpenCL and donated it to Khronos, and it was Nvidia who essentially killed OpenCL by refusing to support it properly and pushing their own proprietary CUDA instead. Apple was initially on board with Vulkan working group but after other committee members refused to support its effort of creating an ergonomic API they backed out to focus on Metal.
More recently, Apple spearheaded the development of the next-gen web GPU API standard, ”donating” Metal as the basis of the upcoming WebGPU standard (you can read more about it here: https://gpuweb.github.io/gpuweb/)

senttoschool said:
Examples of this?

Well, CUDA? There was a perfectly fine GPGPU API around - OpenCL, developed by Apple, managed by Kronos and fully embraced by AMD. Instead of helping to nurture this vision of unified GPU compute API, Nvidia used their market leader position to sabotage its adoption, pushing their own CUDA instead and denying the HPC GPU market to their competitors. Of course, it was a move that made perfect sense from business perspective for them, but it did end up making things worse for everyone else - especially the users - because now tons of useful software is locked behind the parasitic CUDA.

thenewperson said:
Sources say (being fair of course) all this can be said about Apple’s efforts with Metal.

How that? Where are the examples of Apple forcing developers to sabotage their apps from running properly on other platforms? How do your statements relate to Metal specifically? For example, where would Apple withhold specific features from third-party GPUs capable of implementing these features? Metal runs great on AMD and Intel hardware btw, not because Apple has been “sabotaging” OpenGL somehow but simple because OpenGL is a terrible API by modern standards that makes is infinitely more difficult to write bug-free drivers and high-performance software.

Andropov · Jan 6, 2022

crazy dave said:
I'm not sure how @Andropov in his talkedabout post measured the difference in performance the graph but assuming his 8% faster for 11980HK is right that 45% swing almost exactly in line with the boost Intel claims from using ICC.

Just counted the pixels from the bottom of the graph to the highest mark on each SoC series. 433 pixels high for the M1 Max vs 468 pixels high for the i9 11980HK, so (468 - 433) / 433 ≈ 8% faster than the M1 Max, according to the obviously cherry-picked graph.

diamond.g · Jan 6, 2022

leman said:
How that? Where are the examples of Apple forcing developers to sabotage their apps from running properly on other platforms? How do your statements relate to Metal specifically? For example, where would Apple withhold specific features from third-party GPUs capable of implementing these features? Metal runs great on AMD and Intel hardware btw, not because Apple has been “sabotaging” OpenGL somehow but simple because OpenGL is a terrible API by modern standards that makes is infinitely more difficult to write bug-free drivers and high-performance software.

None of the W6000 series GPUs can use the RT hardware with Metal even though it is present (unless something has changed with that).

leman · Jan 6, 2022

diamond.g said:
None of the W6000 series GPUs can use the RT hardware with Metal even though it is present (unless something has changed with that).

That’s an excellent example, I didn’t think about it! Yes, true, makes one wonder. Are they simply using Navi 1 drivers? Is RT acceleration for AMD forthcoming? Did AMD simply not bother to implement hardware RT? Is Apple actively prohibiting AMD to implement these features? Or is it possible that the Metal driver model yet lacks support for HW-accelerated RT?

Andropov · Jan 6, 2022

Also, a note on Intel using the ICC compiler. By default, Xcode uses Clang with the -Os flag set for release builds. This applies all known compiler optimizations that don't typically increase code size. This makes sense, since most apps are not performance critical and having a small binary is actually a benefit (less time to download when distributing the app, for instance). You can change the compiler flags to use -O3, for example (which instructs it to apply basically all known compiler optimizations, even those that result in a higher final binary size).

This are the default settings in Xcode 13.1, so it stands to reason that those are the settings that Intel used, since they didn't say otherwise. But you could get faster code compiling with the -O3 flag, for example. So using a different compiler, one that is specifically designed for HPC, and comparing it to a 'general purpose' compiler setup, which is typically used to run regular apps, not HPC apps, is comparing two different things.

(There are a thousand other things you can change in the compiler settings here and there, this was just an example, I don't know how big the effect of using a different set of -O flags for Clang would be, my point is that comparing a specific subset of tests that favour your architecture from a bigger benchmark while also using a custom compiler tailored for your own hardware that is not even used outside HPC to claim CPU superiority is stupid).

leman · Jan 6, 2022

Andropov said:
Also, a note on Intel using the ICC compiler. By default, Xcode uses GCC as a backend with the -Os flag set for release builds. This applies all known compiler optimizations that don't typically increase code size. This makes sense, since most apps are not performance critical and having a small binary is actually a benefit (less time to download when distributing the app, for instance). You can change the compiler flags to use -O3, for example (which instructs it to apply basically all known compiler optimizations, even those that result in a higher final binary size).

This are the default settings in Xcode 13.1, so it stands to reason that those are the settings that Intel used, since they didn't say otherwise. But you could get faster code compiling with the -O3 flag, for example. So using a different compiler, one that is specifically designed for HPC, and comparing it to a 'general purpose' compiler setup, which is typically used to run regular apps, not HPC apps, is comparing two different things.

(There are a thousand other things you can change in the compiler settings here and there, this was just an example, I don't know how big the effect of using a different set of -O flags for GCC would be, my point is that comparing a specific subset of tests that favour your architecture from a bigger benchmark while also using a custom compiler tailored for your own hardware that is not even used outside HPC to claim CPU superiority is stupid).

Yep, and icc optimizes much more aggressively by default than Apple’s clang. Also, check out slide 29 here: https://crc.pitt.edu/sites/default/files/Intel Compilers Overview.pdf

They are twiddling with compiler parameters to get best performance for each benchmark! It makes sense of course, but did they go through the same rigorous parameter selection procedure when compiling the tests on M1?

diamond.g · Jan 6, 2022

leman said:
That’s an excellent example, I didn’t think about it! Yes, true, makes one wonder. Are they simply using Navi 1 drivers? Is RT acceleration for AMD forthcoming? Did AMD simply not bother to implement hardware RT? Is Apple actively prohibiting AMD to implement these features? Or is it possible that the Metal driver model yet lacks support for HW-accelerated RT?

Unless Apple asked for the TMU to only be a TMU (and not a split TMU/RT unit, which would be weird) all RDNA 2 cards support hardware RT. Now maybe Apple is treating the W6000 line as RDNA 1 in drivers but that wouldn't explain why they don't support the 6700XT (which is "just a tarted up" 5700XT).

JMacHack · Jan 6, 2022

leman said:
That’s an excellent example, I didn’t think about it! Yes, true, makes one wonder. Are they simply using Navi 1 drivers? Is RT acceleration for AMD forthcoming? Did AMD simply not bother to implement hardware RT? Is Apple actively prohibiting AMD to implement these features? Or is it possible that the Metal driver model yet lacks support for HW-accelerated RT?

If I had to hazard a guess, all hardware optimizations for non-apple hardware have halted or drastically slowed to focus on Apple Silicon. Apple probably knows that they could implement the hardware rt in the Mac Pro but are not allocating reasources.

We’ll know for sure if there’s performance degredation after a couple years whether it’s malicious or not.

diamond.g · Jan 6, 2022

diamond.g said:
Unless Apple asked for the TMU to only be a TMU (and not a split TMU/RT unit, which would be weird) all RDNA 2 cards support hardware RT. Now maybe Apple is treating the W6000 line as RDNA 1 in drivers but that wouldn't explain why they don't support the 6700XT (which is "just a tarted up" 5700XT).

Oooh I thought of another one, Primitives IIRC Metal API doesn't support them right? And Smart Access Memory (something Intel added support for).

leman · Jan 6, 2022

JMacHack said:
If I had to hazard a guess, all hardware optimizations for non-apple hardware have halted or drastically slowed to focus on Apple Silicon. Apple probably knows that they could implement the hardware rt in the Mac Pro but are not allocating reasources.

From what I understand it’s still AMD who writes Metal drivers (in cooperation with Apple of course, but still). Frankly, I think the most likely explanation is the driver model one. Since the bulk of the Mac hardware does not support hardware RT, the RT functionality is probably implemented in the common OS layer - before reaching the hardware-specific driver. This is probably the most economical approach as it would allow Apple to reuse much of the previously existing drivers (only adding new features like dynamic function linking and function pointers).

Andropov · Jan 6, 2022

leman said:
Yep, and icc optimizes much more aggressively by default than Apple’s clang. Also, check out slide 29 here: https://crc.pitt.edu/sites/default/files/Intel Compilers Overview.pdf

They are twiddling with compiler parameters to get best performance for each benchmark! It makes sense of course, but did they go through the same rigorous parameter selection procedure when compiling the tests on M1?

Just realised I said GCC 😂

Yep, it makes no sense to tweak parameters individually for each sub-test and then claim it's representative of the general performance of the CPU vs another CPUs compiled with different, non-optimized settings.

leman · Jan 6, 2022

diamond.g said:
Oooh I thought of another one, Primitives IIRC Metal API doesn't support them right?

Whats “Primitives”?

diamond.g said:
And Smart Access Memory (something Intel added support for).

It’s far from obvious whether current Intel-based Macs are in principle capable of supporting resizable BAR. And frankly, even if they were I don’t really see how one can blame Apple for not investing resources into implementatino these features for practically obsoleted machines, especially given the fact that resizeable BAR mostly benefits high-end games (and even then, marginally), which are simply not available on macOS. Anyway, Apple Silicon makes it all moot as it has real unified RAM with shared cache.

diamond.g · Jan 6, 2022

leman said:
Whats “Primitives”?

It’s far from obvious whether current Intel-based Macs are in principle capable of supporting resizable BAR. And frankly, even if they were I don’t really see how one can blame Apple for not investing resources into implementatino these features for practically obsoleted machines, especially given the fact that resizeable BAR mostly benefits high-end games (and even then, marginally), which are simply not available on macOS. Anyway, Apple Silicon makes it all moot as it has real unified RAM with shared cache.

from my understanding primitives are the AMD precursor to Mesh Shaders (which the W6000 also supports, but Apple doesn't in Metal).

That is fair on the SAM support (or lack thereof).

JMacHack · Jan 6, 2022

leman said:
Whats “Primitives”?

It’s far from obvious whether current Intel-based Macs are in principle capable of supporting resizable BAR. And frankly, even if they were I don’t really see how one can blame Apple for not investing resources into implementatino these features for practically obsoleted machines, especially given the fact that resizeable BAR mostly benefits high-end games (and even then, marginally), which are simply not available on macOS. Anyway, Apple Silicon makes it all moot as it has real unified RAM with shared cache.

I think the key caveat to “blame” is whether or not effort was spent to hamper competition. I think it’s evident that Apple allocates resources to improve their devices only, ignoring say, Linux. NVidia has, through partnership leverage, actively put forth efforts to hamper their competition.

leman · Jan 6, 2022

diamond.g said:
from my understanding primitives are the AMD precursor to Mesh Shaders (which the W6000 also supports, but Apple doesn't in Metal).

How many APIs actually support that feature? Never heard about it before. Metal also has to cater to certain common denominator, like any other API. It’s clear that Apple prioritizes features of their own hardware, after all, they are the platform’s future. My point was that it generally does not happen at the expense of other hardware or software manufacturers.

JMacHack said:
I think the key caveat to “blame” is whether or not effort was spent to hamper competition. I think it’s evident that Apple allocates resources to improve their devices only, ignoring say, Linux. NVidia has, through partnership leverage, actively put forth efforts to hamper their competition.

Great summary!

diamond.g · Jan 6, 2022

leman said:
How many APIs actually support that feature? Never heard about it before. Metal also has to cater to certain common denominator, like any other API. It’s clear that Apple prioritizes features of their own hardware, after all, they are the platform’s future. My point was that it generally does not happen at the expense of other hardware or software manufacturers.

Great summary!

Mesh Shaders? 2 (Vulkan and DX12). Primitives? Technically 2, Vulkan and PS5 API, but functionally 1.

leman · Jan 6, 2022

diamond.g said:
Mesh Shaders? 2 (Vulkan and DX12).

Metal does not support mesh shaders, probably because they are poor fit to Apple’s hardware - and virtually no Intel Mac software would support them anyway (kind of weird to implement a heavy feature for a single extremely niche available GPU option). But who knows, maybe next Metal version will offer something similar. It will most likely be Apple GPU exclusive though.

diamond.g said:
Primitives? Technically 2, Vulkan and PS5 API, but functionally 1.

What Vulkan extension is that?

diamond.g · Jan 6, 2022

leman said:
Metal does not support mesh shaders, probably because they are poor fit to Apple’s hardware - and virtually no Intel Mac software would support them anyway (kind of weird to implement a heavy feature for a single extremely niche available GPU option). But who knows, maybe next Metal version will offer something similar. It will most likely be Apple GPU exclusive though.

What Vulkan extension is that?

Actually looking at Mesh Shaders for Vulkan (there isn’t a Primitive Shader extension from what I could find, I think I was under the impression that it was AMD implementation of Mesh Shaders which existed in RDNA [and was broken in Vega]) it looks like the Vulkan implementation uses nvidia’s call. Which apparently isn’t 100% supported on AMD Hardware because nvidia implementation does some “shady stuff” [yall love the pun] That AMD Hardware isn’t expecting:

There are problems with the NV_mesh_shader extension which are not present in eg. D3D12:

The total number of output vertices is not known in runtime. D3D12 solves this with SetMeshOutputCounts which must appear before any outputs are written. NV_mesh_shader doesn't have this guarantee.

Any shader invocation can read the output of any other which is not possible in D3D12.

The NV indirect command buffer format is not supported by the hardware, so we have to emit several copy packets to make it work. Note that D3D12 uses 3D dispatches without an offset: (x, y, z) but NV_mesh_shader uses an 1D dispatch with offset: (taskCount, firstTask).

My bad I got confused there.

EntropyQ3 · Jan 6, 2022

diamond.g said:
Which apparently isn’t 100% supported on AMD Hardware because nvidia implementation does some “shady stuff” [yall love the pun] That AMD Hardware isn’t expecting:

That was … terrible.
.
😂

thenewperson · Jan 6, 2022

@jdb8167 @leman @JMacHack It was sarcasm lol

leman · Jan 6, 2022

thenewperson said:
@jdb8167 @leman @JMacHack It was sarcasm lol

Well, you definitely fooled me 😂

crazy dave · Jan 6, 2022

leman said:
Well, CUDA? There was a perfectly fine GPGPU API around - OpenCL, developed by Apple, managed by Kronos and fully embraced by AMD. Instead of helping to nurture this vision of unified GPU compute API, Nvidia used their market leader position to sabotage its adoption, pushing their own CUDA instead and denying the HPC GPU market to their competitors. Of course, it was a move that made perfect sense from business perspective for them, but it did end up making things worse for everyone else - especially the users - because now tons of useful software is locked behind the parasitic CUDA.

Overall I don't disagree about the things Nvidia did - especially in not updating OpenCl support on their cards to keep it crippled was an especially dick move on their part (which now that they've effectively killed OpenCL, they've now done ... just as Apple has deprecated OpenCL ... irony is dead and buried apparently) - but there are a couple of corrections and amendments:

1) CUDA actually predates OpenCL by just over 2 years. This gave Nvidia a huge lead. So there was not actually an open source GPGPU API to adopt (not that I think that this was would've stopped Nvidia's CUDA development had there been).

2) Like with their failure to compete successfully against DX with OpenGL, quite a bit of the blame goes to the Khronos group.

a) The biggest problem is that they didn't iterate on it quickly enough. In contrast, Nvidia (and this part is actually like Apple) leveraged their tight integration between software and hardware to iterate their features quickly.

b) Nvidia did a much better job of ensuring that that were great teaching materials, that community feedback got listened to and implemented quickly, and that there was an easy installation processes for a full, easy to use toolset.

The net result was that CUDA was simultaneously easier to use and more feature rich.

3) AMD cards, while having very good compute, had difficulty competing for much of the 2010s against Nvidia's graphics cards resulting in a smaller overall market share at all levels. Intel was supposedly going to save OpenCL with Xeon Phi, but that ultimately failed as a product to gain traction. Had Intel released serious GPUs in those days with OpenCL support, the combination of competitive Intel and AMD GPUs *might* have resulted in a different outcome. Or at least forced Nvidia to offer better OpenCL support.

I know we're getting deep off topic of Alder Lake now and, again, I don't want to minimize the things that Nvidia did that were anti-competitive. But a lot of things went wrong for OpenCL. (and this supports your overall thesis of Apple getting burned by supporting the Khronos group's open source projects)

Intel Alder Lake vs. Apple M1

macrumors 65816

macrumors 601

Suspended

macrumors Core

macrumors 6502a

macrumors G4

macrumors Core

macrumors 6502a

macrumors Core

macrumors G4

Suspended

macrumors G4

macrumors Core

macrumors 6502a

macrumors Core

macrumors G4

Suspended

macrumors Core

macrumors G4

macrumors Core

macrumors G4

macrumors 6502a

macrumors 65816

macrumors Core

macrumors 68000

Our Staff