and if you give a BS marketing "make the number as big as possible" response, I'll immediately tune you out...
Why would I do that? I am interested in the objective truth, not some internet dick measuring contest. If I am wrong, that would be an opportunity to learn something, which would be great.
- Apple's visible register space is up to 128 4B registers (or 256 2B registers) per thread; I think nV and AMD are the same. So up to 512B of register space per lane, or 16KB per 32-wide SIMD.
- a core is the equivalent of a current (not past) nV SM, and is split into four essentially independent quadrants.
- the register file is 384KB per core, i.e. 96KB per quadrant.
Which is 6 "full" threadblocks per quadrant (each using its entire 16KB of registers) or up to 24 threadblocks (if each uses only a quarter of the maximum register space it is entitled to); the arithmetic is spelled out below.
I believe these numbers are essentially 1.5x the equivalents on the nV side.
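For concreteness, here is the occupancy arithmetic behind those numbers, written as compile-time checks (my own worked example in C++, using the 384KB estimate; the smaller estimates discussed below shrink everything proportionally):
Code:
// per-quadrant occupancy arithmetic for the figures above
constexpr unsigned bytes_per_thread = 128 * 4;               // 512 B max
constexpr unsigned bytes_per_simd   = 32 * bytes_per_thread; // 16 KB
constexpr unsigned quadrant_bytes   = 384 * 1024 / 4;        // 96 KB
static_assert(quadrant_bytes / bytes_per_simd == 6,
              "6 resident SIMDs at full register use");
static_assert(quadrant_bytes / (bytes_per_simd / 4) == 24,
              "24 resident SIMDs at quarter register use");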
I suppose you are looking at Philip Turner's numbers? I'd be curious how he arrives at them.
Rosenzweig estimates the register file to be 208KB per core (52KB per partition), so that's the minimum (and the number I personally consider more realistic). Then there is the argument about 24 threadgroups running in parallel on M1, but the details of this were never clear to me. I think this is definitely an area that needs more investigation...
At any rate, the estimates we have right now range from 52KB per partition (12KB less than Nvidia's 64KB) to 96KB per partition (1.5x more than Nvidia). This is different from what I remembered, so I stand corrected.
I'm not sure that the "limited coordination between individual cores" matters much. What I have SEEN in this space is that most of the nV work in this area consists of hacks around their performance issues, followed by devs who don't understand Apple Silicon assuming those same hacks are necessary on Apple. I've seen ZERO comparisons of the issue in terms of real (important) algorithms, written by people who know what they are doing and have genuinely optimized for each platform, comparing and contrasting.
I fully agree that it's a niche requirement. But I think this shortcoming of Apple GPUs is relevant in the context of the current discussion, as it makes them less interesting to advanced GPU programmers. Which in turn has the effect that certain technical discussions and advances focus only on Nvidia GPUs.
Where Apple is lacking compared to nV and AMD (i.e. genuine, not BS, complaints):
- no support for 64b FP (even if only in MSL as a software fallback, and even if poorly performing) [STEP 1]
- no hardware support for 64b integers and FP (the integer support could be added at fairly low cost, even the multiplies if we allow them to take a few cycles; the FP support could be added as a slow multicycle path in the FP32 unit, or as something like a single FP64 unit shared by sixteen lanes). Along with this, decent "complete" support for 64b atomics; a sketch of the CAS-loop emulation you're otherwise stuck with follows this list. [STEP 2]
- coalescing of both divergent control and divergent memory access. nV has support for the control side (kinda, in a lame fashion, and for ray tracing only?), and nothing as far as I know for divergent memory access.
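To make the atomics point concrete, this is the kind of CAS loop that missing native atomics forces on you today. A minimal MSL sketch (my illustration, with float add standing in), and note you can't even build 64b atomics this way when the widest compare-exchange available is 32 bits:
Code:
#include <metal_stdlib>
using namespace metal;

// Emulate an atomic float add with a 32-bit compare-exchange loop.
void atomic_add_float(device atomic_uint *addr, float val)
{
    uint expected = atomic_load_explicit(addr, memory_order_relaxed);
    uint desired;
    do {
        desired = as_type<uint>(as_type<float>(expected) + val);
    } while (!atomic_compare_exchange_weak_explicit(
                 addr, &expected, desired,
                 memory_order_relaxed, memory_order_relaxed));
}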
I would add support for texture atomics (currently emulated as a fairly convoluted combination of device memory operations and some SIMD black magic; I would be curious to know how NV does it). A sketch of the core of that emulation is below. On the topic of divergent memory access, if I remember correctly Apple mentioned a technique in a WWDC session last year, something about using SIMD intrinsics to reorder threads based on RT hits, but I can't remember the details.
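Here is that emulation reduced to its core idea: back the image with a plain device buffer and do the atomics there. A hedged MSL sketch (the linear layout and `width` are my assumptions; the SIMD trickery that makes the real thing convoluted, e.g. combining lanes that hit the same texel before touching device memory, is omitted):
Code:
#include <metal_stdlib>
using namespace metal;

// "Texture atomic add" on an R32Uint image that aliases a linear buffer.
kernel void splat(device atomic_uint *image  [[buffer(0)]],
                  constant uint      &width  [[buffer(1)]],
                  uint2 gid [[thread_position_in_grid]])
{
    atomic_fetch_add_explicit(&image[gid.y * width + gid.x], 1u,
                              memory_order_relaxed);
}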
One thing that I am not sure about is FP64 support (and 64-bit support in general). It seems like it would introduce a lot of additional complexity to support a very niche case. There is a good reason why others do it with an anaemic auxiliary processing unit (which Apple doesn't have). What I would find more interesting is the addition of assist instructions that make it cheaper to perform 64-bit operations using 32-bit hardware. E.g. a 64-bit integer addition on my M1 Max compiles to
Code:
// 64-bit int summands are in r0_r1 and r2_r3, respectively
iadd r2.cache, r0.cache, r2.discard           // low words: r2 = r0 + r2
iadd r1.cache, r1.discard, r3.discard         // high words: r1 = r1 + r3
icmpsel ult, r0l.cache, r2, r0.discard, 1, 0  // carry: r0l = (r2 < r0) ? 1 : 0
iadd r0, r0l.discard, r1.discard              // high result: r0 = r1 + carry
which I think is a neat way to do it (also, Apple's cool icmpsel instruction really comes in handy in all kinds of contexts). They could accelerate it further by implementing add-with-carry: still strictly 32-bit math, but 2x faster. And I suppose something similar could be done for 64-bit FP: maybe some instructions to help align mantissas etc. that would bring the number of required instructions down to around 8.
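For anyone who doesn't read Apple asm, the same four instructions written out as MSL-flavored C (my sketch of the semantics, not compiler source):
Code:
// 64-bit add from 32-bit halves; .x = low word, .y = high word
static uint2 add_u64(uint2 a, uint2 b)
{
    uint lo    = a.x + b.x;
    uint carry = (lo < a.x) ? 1u : 0u;  // the icmpsel ult step
    uint hi    = a.y + b.y + carry;     // the two remaining iadds
    return uint2(lo, hi);
}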
I'd hope an Apple solution would be better engineered, both at the HW level (supporting both divergent control and divergent memory access) and at the MSL level, with a way to indicate that a particular kernel is one where coalescing by the hardware is allowed (so not just stochastic processes like ray tracing, but also things like fancy multipole PDE solvers, where there's no natural spatial structure to the problem anyway, so coalescing is no big deal).
Their RT patents explicitly talk about the RT unit launching new compacted and coalesced threadblocks to process hits. From what I understand, this would be a completely new way to handle these things. Nvidia still uses monolithic shaders where each thread launches a request and receives a result. An analogy I like is that Nvidia's RT is like async programming (each thread starts an `await test_ray()`) while Apple's is more like continuation passing style (the threadblock executes `yield test_rays(&hit_epilogue_program)`). The second is obviously more efficient.
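A toy sketch of the two control-flow shapes, in plain C++ (every name here is made up for illustration; neither vendor's actual API looks like this):
Code:
struct Ray {};
struct Hit {};

Hit  trace(Ray r) { return {}; }  // stand-in for the RT unit
void shade(Hit h) {}

// Nvidia-style monolithic shader: the thread issues a query and logically
// waits on its own result, occupying its slot the whole time.
void shade_monolithic(Ray ray) {
    Hit hit = trace(ray);  // like `await`
    shade(hit);
}

// Apple-style, per the patents: hand over a continuation; the RT unit
// gathers hits and launches fresh, compacted threadblocks that run it.
void hit_epilogue(Hit hit) { shade(hit); }

void trace_async(Ray r, void (*k)(Hit)) { k(trace(r)); }  // toy stand-in

void shade_cps(Ray ray) {
    trace_async(ray, &hit_epilogue);  // like CPS `yield`; thread retires
}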