
leman

macrumors Core
Oct 14, 2008
19,521
19,674
This. Plus, let's not forget that whatever the current performance / Watt advantages, the primary motivation for moving Macs to Apple Silicon is to lower costs / increase profits. Not just because they make the chips themselves, but because porting macOS to ARM means it can increasingly share development with their main operating system, iOS.

And to differentiate their products from the rest of the PC market. By moving to their own hardware platform Apple can do what they want and make the Mac different enough. Makes it much more difficult for others to copy.

Because Nvidia makes GPUs with massive heatsinks, whereas Apple makes energy-sipping SoCs. Can't see Apple putting an RTX3090's (or even a 3060's) worth of firepower on the same SoC die as a load of CPU cores, neural engines and whatever else. Not unless the die will be the size of a beer mat and consume 500W.

Don’t see any problem here. There are many chips on the market smaller than the M1 Ultra that draw 300+ watts. Cooling such chips is a solved engineering problem. Additionally, Apple’s performance-to-power ratio is always going to be higher. A substantial chunk of power for these big dGPUs goes to the RAM, which has to be run at very high frequencies. Apple instead can dedicate most of that power to compute. And let’s not forget that they will continue having the node advantage for the foreseeable future. There is enough headroom there.

The thing is, if Apple wants to be competitive in the desktop space they will have to up the frequency and power at some point. It’s just not realistic to compete with a 300W GPU by limiting yourself to 80W, no matter how much better your tech is, nor does it make much sense. Sure, in the laptop space the limitations are hard constraints and there Apple will continue to reign supreme, but not on desktop. So they either have to scrap their desktop market (or limit themselves to energy-efficient home and office computers) or up the power.
 

EntropyQ3

macrumors 6502a
Mar 20, 2009
718
824
The thing is, if Apple wants to be competitive in the desktop space they will have to up the frequency and power at some point. It’s just not realistic to compete with a 300W GPU by limiting yourself to 80W, no matter how much better your tech is, nor does it make much sense. Sure, in the laptop space the limitations are hard constraints and there Apple will continue to reign supreme, but not on desktop. So they either have to scrap their desktop market (or limit themselves to energy-efficient home and office computers) or up the power.
I think this is a reasonable analysis of the situation. And my honest belief is that Apple will not follow the Win/x86 market up to kW power draws. While Steve Jobs is long dead, it is not in Apple's DNA or desktop tradition.

The vast majority of that market is tech-enthusiast gaming, a market segment Apple has problems addressing for reasons that are much more difficult to resolve than mere hardware performance.

Both AMD and Intel are now quite up front about their desktop offerings being targeted at gaming. Professional usage barely merits a mention at their presentations. It exists, but the volume is modest. Also - and this is important for Apple - professionals care about the ergonomics of their workspace. So they can address professionals (and of course amateur photographers, videographers, et cetera) with nice desktop offerings. That market segment doesn’t necessarily need or even want the most powerful and power-hungry systems available. They want reliable tools that they enjoy using.

If that niche is interesting to Apple beyond the Studio form factor, I believe Apple will still maintain a similar ethos - capable systems in nice form factors with terrible fan curves. :)
 

deconstruct60

macrumors G5
Mar 10, 2009
12,493
4,053
And let’s not forget that they will continue having the node advantage for the foreseeable future. There is enough headroom there.

Is that really a 'sure thing'? The MediaTek Dimensity 9000 on TSMC N4 shipped a quarter ago, before Apple's N4 A16, and it actually has a working 5G modem on N4. The M2 shipped in June, about a year after TSMC N5P came online, so it's pretty far off the bleeding edge. There are a couple of rumors that the M2 Pro is on N5P when ... the M1 Pro was on basically the same thing.

Before they got ensnared in the trade-war battles and export embargoes, HiSilicon (Huawei) was signed up for advanced nodes at the same pace. Intel, unsurprisingly, stumbled, but they too were signed up to roll out iGPUs on N3 at the bleeding edge.

Nvidia and AMD? Yes, for the next year or so. Their cadence is longer (they don't have a big "every September" dog and pony show to put on). However, will Apple get off of N4 before AMD moves the Zen 4 APU onto N4? Maybe, maybe not, at this reduced adoption pace for the M-series (if the big-die M1 -> M2 is a sideways update).

If the big-die M2 is on TSMC N3, then there is a bigger gap.

There is a front page story today about Apple balking at TSMC prices...

https://www.macrumors.com/2022/09/28/apple-refuses-to-accept-chip-price-hike/

Apple isn't going to pay just anything to stay out in front. The M2 on N5P is illustrative. (I suspect Apple may be haggling over excess N3 capacity that TSMC wants them to soak up 'sooner rather than later'. But it may be driven more by seeing the 'bust' coming after the scarcity boom. There will be lots of wafer slots for folks with money (Qualcomm, AMD, etc.) to move up into if they want, for specific products.)


Additionally, SRAM density is scaling more slowly than logic density. So the "way bigger than the other guys" caches won't get as much help from newer nodes. It is going to be tougher to stay "way bigger" than everyone else going forward.





The thing is, if Apple wants to be competitive in the desktop space they will have to up the frequency and power at some point. It’s just not realistic to compete with a 300W GPU by limiting yourself to 80W, no matter how much better your tech is, nor does it make much sense. Sure, in the laptop space the limitations are hard constraints and there Apple will continue to reign supreme, but not on desktop. So they either have to scrap their desktop market (or limit themselves to energy-efficient home and office computers) or up the power.

That's hyperbole. The vast bulk of the desktop market is not sitting on 300W dGPUs. The Mini is doing fine (actually way better than it was, GPU-wise, given its constraints). The 24" iMac isn't pressed... although if Apple goes into Rip Van Winkle mode because it can't walk and chew gum at the same time, and coasts the iMac on the M1 for 3 years... yeah, that could be an issue.

The Mac Pro is not a large percentage of Apple's desktop lineup by revenue or unit volume. If Apple chops the Mac Pro unit numbers in half with some self-imposed limitations, that won't substantively move the overall Mac revenue numbers much at all. It didn't from 2013-2019. It won't now either.
 

mode11

macrumors 65816
Jul 14, 2015
1,452
1,172
London
A substantial chunk of power for these big dGPUs goes to the RAM, which has to be run at very high frequencies. Apple instead can dedicate most of that power to compute.

Is there any reason that dGPUs can't use HBM too (as various Radeons have)? Especially for workstation-level cards.

And let’s not forget that they will continue having the node advantage for the foreseeable future. There is enough headroom there.

Sure, they will likely get the smallest nodes at TSMC before others. Though with Nvidia now jumping from 8nm to 5nm with Lovelace, the gap is closing quite a bit.

The thing is, if Apple wants to be competitive in the desktop space they will have to up the frequency and power at some point. It’s just not realistic to compete with a 300W GPU by limiting yourself to 80W, no matter how much better your tech is, nor does it make much sense. Sure, in the laptop space the limitations are hard constraints and there Apple will continue to reign supreme, but not on desktop. So they either have to scrap their desktop market (or limit themselves to energy-efficient home and office computers) or up the power.

This is the issue all over. Their iPhone-derived SoCs absolutely kick arse in portable / thin applications. Unfortunately, the form factor that departs furthest from this, the expandable desktop, is also the very nichest of their products. It's therefore extremely unlikely Apple will diverge much from their current SoC offerings, and the worry is they'll essentially not bother. I guess we'll find out by the end of this year (or perhaps not - knowing Apple we might not hear anything until WWDC).

I believe Apple will still maintain a similar ethos - capable systems in nice form factors with terrible fan curves.

Throughout the last couple of decades, they've always had pretty good CPUs (PPC limitations aside, but not through lack of effort), and relatively weak GPUs. Apple have always liked compact, quiet form factors, and the requirements of cooling a hot GPU on an expansion card directly fly in the face of that. Apple have traditionally happily forfeited gaming performance for form-factor (a reasonable trade-off for most professional use), and with the rise of GPGPU, are trying to side-step it with the use of dedicated functional blocks like the Media Engine. High-end real-time 3D is increasingly being used in mainstream media production, however.
 
Last edited:

deconstruct60

macrumors G5
Mar 10, 2009
12,493
4,053
To be honest, I do not share your pessimism. Apple currently generally offers higher RAM bandwidth per unit of GPU compute throughput than existing dGPUs (e.g. 40GB/s per TFLOP for M1 Pro/Max/Ultra, ~25GB/s per TFLOP for the 3090, ~10GB/s per TFLOP for the 4090, etc.), so they still seem to have some headroom here (sure, some of this bandwidth is consumed by the CPU, but that's just going to be crumbs if you are going for a massive GPU workload). And they can still get more out of LPDDR5 by using the 8533 MT/s variant. For example, a hypothetical Ultra based on the 5-core GPU cluster layout would have 10240 compute cores and memory bandwidth comparable to the RTX 4090. If Nvidia believes that 1TB/s is enough for 16k shader cores at 2.2GHz, who am I to argue otherwise? The same hypothetical Ultra at, say, a 2.2GHz GPU clock could deliver ~45TFLOPs of sustained FP32 performance and would probably still consume under 200W...
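(For reference, a quick back-of-the-envelope check of the figures quoted above — a sketch only; the ALU count and clock are the hypothetical values from the quote, and the bandwidth/TFLOP ratios use approximate published specs, so they only roughly match the numbers cited:)

```python
# Back-of-the-envelope check of the quoted figures (approximate, not official specs).

def fp32_tflops(alus, clock_ghz):
    # FP32 throughput assuming one fused multiply-add (2 FLOPs) per ALU per cycle.
    return alus * 2 * clock_ghz / 1000

# Hypothetical "Ultra" from the quoted post: 10240 GPU ALUs at a 2.2 GHz clock.
print(f"hypothetical Ultra: ~{fp32_tflops(10240, 2.2):.0f} TFLOPs")   # ~45 TFLOPs

# Bandwidth per TFLOP, using approximate published bandwidth and FP32 figures.
parts = [
    ("M1 Max",   400,  10.4),   # GB/s, TFLOPs
    ("RTX 3090", 936,  35.6),
    ("RTX 4090", 1008, 82.6),
]
for name, bw, tflops in parts:
    print(f"{name}: ~{bw / tflops:.0f} GB/s per TFLOP")
```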

I think that is a somewhat shallow system analysis. I don't think Apple has the headroom you think they do. What you are likely ignoring is the L3/system-cache churn that the diverse CPU, NPU, video decode, PCIe/TB DMA, SSD controller, and relatively large display controllers (different workloads with different access widths and patterns) have on cache evictions. You are looking at this from the perspective that only the GPU matters and that the SoC doesn't have to walk and chew gum at the same time. That is nice for micro-benchmarks. It doesn't necessarily play so well in real life.

Some of the 'headroom' is likely going into eviction backfills for data that another processor wants because it has different access patterns than the GPU. Sharing everything is double-edged. The upside is that you cut down on copies where you can. The downside is that you have to share limited resources; multiple addresses map to the same cache-line set. To keep evictions down, some of the data that could go through the cache on a dGPU (which sees less conflicting heterogeneous traffic) isn't going to go through the cache on the M-series SoC.


Somewhat related is the tile cache memory. If you have to replicate redundant data into multiple caches, then some of that 'headroom' goes into making duplicates on fills. There is overhead along with the headroom. Pragmatically, it isn't all 'excess wasted' bandwidth.
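(As a toy illustration of the cache-churn point above — a generic LRU model, not a description of Apple's actual SLC replacement policy — a streaming client sharing a last-level cache with a latency-sensitive client can wipe out the latter's working set:)

```python
# Toy LRU cache model: shows how a streaming client can evict another client's
# working set when both share one last-level cache. Illustrative only.
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()

    def access(self, addr):
        hit = addr in self.lines
        if hit:
            self.lines.move_to_end(addr)
        else:
            self.lines[addr] = True
            if len(self.lines) > self.capacity:
                self.lines.popitem(last=False)   # evict least-recently-used line
        return hit

def cpu_like(n, working_set=200):
    # Latency-sensitive client: keeps re-using a modest working set.
    return (1_000_000 + i % working_set for i in range(n))

def gpu_like(n):
    # Streaming client: touches a new line on every access.
    return iter(range(n))

# Alone, the CPU-like working set (200 lines) fits in a 256-line cache: ~98% hits.
alone = LRUCache(256)
hits = [alone.access(a) for a in cpu_like(10_000)]
print("alone:  ", sum(hits) / len(hits))

# Shared with the streaming client, the very same accesses now mostly miss,
# because the stream keeps evicting the CPU-like client's lines.
shared = LRUCache(256)
hits = []
for cpu_addr, gpu_addr in zip(cpu_like(10_000), gpu_like(10_000)):
    shared.access(gpu_addr)
    hits.append(shared.access(cpu_addr))
print("shared: ", sum(hits) / len(hits))
```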
 

NT1440

macrumors Pentium
May 18, 2008
15,092
22,158
I think that is a somewhat shallow system analysis. I don't think Apple has the headroom you think they do. What you are likely ignoring is the L3/system-cache churn that the diverse CPU, NPU, video decode, PCIe/TB DMA, SSD controller, and relatively large display controllers (different workloads with different access widths and patterns) have on cache evictions. You are looking at this from the perspective that only the GPU matters and that the SoC doesn't have to walk and chew gum at the same time. That is nice for micro-benchmarks. It doesn't necessarily play so well in real life.

Some of the 'headroom' is likely going into eviction backfills for data that another processor wants because it has different access patterns than the GPU. Sharing everything is double-edged. The upside is that you cut down on copies where you can. The downside is that you have to share limited resources; multiple addresses map to the same cache-line set. To keep evictions down, some of the data that could go through the cache on a dGPU (which sees less conflicting heterogeneous traffic) isn't going to go through the cache on the M-series SoC.


Somewhat related is the tile cache memory. If you have to replicate redundant data into multiple caches, then some of that 'headroom' goes into making duplicates on fills. There is overhead along with the headroom. Pragmatically, it isn't all 'excess wasted' bandwidth.
Without being a master of this topic, I’ve seen you reference Apple Silicon not being able to “walk and chew gum at the same time”. Can you provide some real world examples where this problem has come up?

I ask because I see a lot of users post from a deep understanding of what the industry as a whole has been doing for years, and then I look at some deeper-dive analysis where it’s discovered that Apple is already doing X, Y, or Z in an alternate way.

Can you provide some of the real limitations you’re seeing that you’ve been commenting on, and maybe dumb it down just a shade for those of us that don’t have the deepest knowledge on these topics?
 

mode11

macrumors 65816
Jul 14, 2015
1,452
1,172
London
To keep evictions down, some of the data that could go through the cache on a dGPU (which sees less conflicting heterogeneous traffic) isn't going to go through the cache on the M-series SoC.
Although you're likely talking about something a bit different here, is the traditional model of having a large pool of memory attached to the GPU not such a problem in practice then? I.e., is the majority of it used for texture storage etc., which the CPU doesn't particularly need access to? And when it does, 16 lanes of PCIe 5.0 should provide quite a bit of bandwidth to access VRAM (though latency may be an issue).
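(For rough context on that last point, here is the arithmetic — a sketch; real-world throughput is lower once protocol overhead is included:)

```python
# Rough PCIe 5.0 x16 bandwidth vs. local graphics memory (approximate figures).
lanes = 16
gt_per_lane = 32            # PCIe 5.0: 32 GT/s per lane
line_coding = 128 / 130     # 128b/130b encoding

pcie5_x16 = lanes * gt_per_lane * line_coding / 8   # GB/s, per direction
print(f"PCIe 5.0 x16: ~{pcie5_x16:.0f} GB/s per direction")   # ~63 GB/s

# Versus on-card / on-package memory (published figures):
print("RTX 3090 GDDR6X:         ~936 GB/s")
print("M1 Ultra unified memory: ~800 GB/s")
```

So host access to VRAM over the bus runs roughly an order of magnitude slower than the GPU's local access, which is workable as long as the CPU rarely needs to touch that data.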
 

Boil

macrumors 68040
Oct 23, 2018
3,477
3,173
Stargate Command
The N3X process should be what an M3 "All The Cores" Mac Pro is based on...

Higher transistor density, higher clock rates, higher power limits...

IF Apple decides to make ASi Mac Pro-specific SoCs (I am talking about SoCs ONLY available in the ASi Mac Pro), well, I feel they could produce a DCC workstation that blows everything else out there clean out of the water, especially once third-party DCC software developers get fully on board with Metal & TBDR...!
 

leman

macrumors Core
Oct 14, 2008
19,521
19,674
I think that is a somewhat shallow system analysis. I don't think Apple has the headroom you think they do. What you are likely ignoring is the L3/system-cache churn that the diverse CPU, NPU, video decode, PCIe/TB DMA, SSD controller, and relatively large display controllers (different workloads with different access widths and patterns) have on cache evictions. You are looking at this from the perspective that only the GPU matters and that the SoC doesn't have to walk and chew gum at the same time. That is nice for micro-benchmarks. It doesn't necessarily play so well in real life.

How much do we know about how the SLC on Apple Silicon works? I’m fairly certain, for example, that the display controllers bypass the cache, etc. The entire setup might just be a bit smarter than you give it credit for.

But regardless… workloads that push the GPU to its limits while simultaneously requiring a lot of memory access from other components are very rare in practice. So again, I don’t see much problem here. And even if you are right, it’s not like the situation for Nvidia is any better. Their GPU cache sizes are pitiful relative to the size of the compute engine. So even if Apple can’t use the SLC at all for the GPU, they still have tons more available bandwidth per GPU core.
 

leman

macrumors Core
Oct 14, 2008
19,521
19,674
Is there any reason that dGPUs can't use HBM too (as various Radeons have)? Especially for workstation-level cards.

Sure, and they do. But that’s super niche and these products are very expensive (more expensive than what Apple offers). And you’ll find that workstation-class GPUs usually have lower clocks because, well, reliability matters :)



This is the issue all over. Their iPhone-derived SoCs absolutely kick arse in portable / thin applications. Unfortunately, the form factor that departs furthest from this, the expandable desktop, is also the very nichest of their products. It's therefore extremely unlikely Apple will diverge much from their current SoC offerings, and the worry is they'll essentially not bother. I guess we'll find out by the end of this year (or perhaps not - knowing Apple we might not hear anything until WWDC).

Yeah, that’s why I wouldn’t be surprised if they forgo the high-performance desktop altogether, at least in its traditional form. Business-wise, focusing on energy-efficient, ultra-compact desktops is not stupid. The mini is perfect for the office and the Studio is a champion for photo/video editors thanks to the media engine, etc. Maybe that’s the master plan.
 

exoticSpice

Suspended
Jan 9, 2022
1,242
1,952
Is that really a 'sure thing'? The MediaTek Dimensity 9000 on TSMC N4 shipped a quarter ago, before Apple's N4 A16, and it actually has a working 5G modem on N4. The M2 shipped in June, about a year after TSMC N5P came online, so it's pretty far off the bleeding edge. There are a couple of rumors that the M2 Pro is on N5P when ... the M1 Pro was on basically the same thing.
I assure you that Apple will be the first to implement TSMC N3 or N3E. N4 was still part of the N5 family and not a real node jump like N3E.
 

Colstan

macrumors 6502
Jul 30, 2020
330
711
Yeah, that’s why I wouldn’t be surprised if they forgo the high-performance desktop altogether, at least in its traditional form. Business-wise, focusing on energy-efficient, ultra-compact desktops is not stupid. The mini is perfect for the office and the Studio is a champion for photo/video editors thanks to the media engine, etc. Maybe that’s the master plan.
I'm going to tell my grandchildren of the hard days past, when energy prices were sky-high, inflation ruined the global economy, and we nearly froze during a harsh winter. Our only way of sustaining the body warmth we desperately needed was to huddle around the RTX 4090 and 13900K in our custom gaming PC, furnished inside an anime-themed case, the flashing LEDs our sole light source in the darkest of times. Otherwise, we would have surely perished on this bleak, blasted plane of meager existence.

Apple concentrated on small form factor desktops, thus dooming the company, for they did not provide the life-giving wattage that we needed to survive during this most woeful epoch.
 
  • Haha
Reactions: diamond.g and leman

mode11

macrumors 65816
Jul 14, 2015
1,452
1,172
London
Sure, and they do. But that’s super niche and these products are very expensive (more expensive than what Apple offers). And you’ll find that workstation-class GPUs usually have lower clocks because, well, reliability matters :)

Nothing's more expensive than what Apple offers*. If someone's got the budget for a Mac Pro, they could have their pick of the PC market's prosumer / workstation graphics cards. HBM is relatively rare, but that could change if prices come down and / or VRAM power consumption becomes a significant issue. Though the latter is less important when you have the surface area of a PCIe card to deal with the heat, than when it's concentrated in an SoC package.

*Well, for the level of performance.

Yeah, that’s why I wouldn’t be surprised if they forgo the high-performance desktop altogether, at least in its traditional form. Business-wise, focusing on energy-efficient, ultra-compact desktops is not stupid. The mini is perfect for the office and the Studio is a champion for photo/video editors thanks to the media engine, etc. Maybe that’s the master plan.

I wouldn't be surprised either, but not because the high-performance desktop is some relic of the past that Apple has the vision to see beyond. Regardless of whether that's true, Apple is ultimately guided by what's profitable for them to build. There's no question that sticking to laptops and iMacs would capture the majority of the Mac market, with a chip design that has a high degree of commonality across all their products (including, crucially, those running iOS).

The issue is whether the lack of a machine with high-end GPU capabilities hurts the Mac platform in the long term. Apple's 2017 'mea culpa' event suggested they thought it would (hence the massively expandable 2019 MP, with multiple MPX GPU slots). Certainly for those working in 3D visualisation, PC workstations are a far more economical way of delivering high performance. Businesses also appreciate long-term road maps. Keeping your cards close to your chest is fine in the consumer space, but we don't even know if the next Mac Pro will accept graphics cards. How can you plan around that?
 
Last edited:
  • Like
Reactions: singhs.apps

playtech1

macrumors 6502a
Oct 10, 2014
695
889
I'm enjoying the technical discussion, but I agree it's the business issues that will win out: a powerful Mac Pro-only dGPU won't sell enough to make it worth Apple abandoning its SoC philosophy.

If keeping the GPU on the SoC presents an insurmountable wall to reach the peak of performance, I am pretty sure Apple will simply not climb that wall as it doesn't want or need to be in every market segment.
 

leman

macrumors Core
Oct 14, 2008
19,521
19,674
I'm enjoying the technical discussion, but I agree it's the business issues that will win out: a powerful Mac Pro-only dGPU won't sell enough to make it worth Apple abandoning its SoC philosophy.

If keeping the GPU on the SoC presents an insurmountable wall to reach the peak of performance, I am pretty sure Apple will simply not climb that wall as it doesn't want or need to be in every market segment.

I think this is a great summary.
 

Colstan

macrumors 6502
Jul 30, 2020
330
711
I asked our favorite Opteron architect where he sees the M-series for the Mac in another half-decade, with say an M6 or M7. Cliff's response:

Obviously it would be complete speculation, and it’s not really something I’ve even thought about. But that many generations out, if I had to guess, I think, at least for Macs, we’d see much bigger packages with much more powerful GPUs that live on their own die. I’d expect more heterogeneous compute units across the board, but I don’t know how that will actually shake out because I don’t know enough about trends in software. Maybe there will be three levels of CPU, maybe there will be much bigger ML components, etc. More and more of the auxiliary dies are going to end up in the package. Really, I expect the packaging to become as important as the die. Bandwidth is Apple’s focus, and they will find ways to get data into and out of the package much faster, and across the package faster too.
 

mi7chy

macrumors G4
Oct 24, 2014
10,622
11,294
Maier hasn't been active in the silicon design industry since 2006, vs. someone like Jim Keller, who has worked since 2006 at Palo Alto Semi, Apple, AMD, Tesla, and Intel, and is currently at AI chip company Tenstorrent, so he has a more comprehensive, accurate, and up-to-date perspective.

 
  • Like
Reactions: singhs.apps

Colstan

macrumors 6502
Jul 30, 2020
330
711
Maier hasn't been active in the silicon design industry since 2006, vs. someone like Jim Keller, who has worked since 2006 at Palo Alto Semi, Apple, AMD, Tesla, and Intel, and is currently at AI chip company Tenstorrent, so he has a more comprehensive, accurate, and up-to-date perspective.
Keller is the guy who can't stay in one place for more than a few years, thinks that CISC and RISC are basically the same, and has turned into a cult hero for internet nerds who desperately need an idol to worship. He's a smart guy, but there are plenty in the industry; the tech media just latched onto him for whatever reason. It's obvious you don't like Maier, and it's even more obvious why, so you'll forgive me for summarily dismissing your opinion on him.
 

leman

macrumors Core
Oct 14, 2008
19,521
19,674
thinks that CISC and RISC are basically the same

I still have difficulty understanding why these notions are useful. Haven’t they been rendered obsolete by decades of CPU advancements?
 

mode11

macrumors 65816
Jul 14, 2015
1,452
1,172
London
As I understand it, modern CISC cores are essentially RISC-like inside, with complex CISC instructions automatically broken down into multiple, simpler internal micro-ops.

One of ARM's advantages is that its instructions are all the same length, though, whereas x86 ones aren't. This apparently vastly simplifies instruction decoding, which enables ARM decoders to be much wider than the equivalent on x86 (for which around 4-6 wide has been the practical limit).
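(A toy sketch of that decoding point — illustrative only; real decoders do this in hardware and are far more sophisticated, and the byte encoding in the demo is made up:)

```python
# Fixed-length (AArch64-style): instruction i starts at 4*i, so a wide decoder
# can slice out many instructions independently in a single step.
def fixed_length_starts(code: bytes, width: int = 4):
    return list(range(0, len(code), width))

# Variable-length (x86-style): the start of instruction i+1 is only known after
# the length of instruction i is determined, so finding boundaries is serial
# (or has to be done speculatively at every possible byte offset).
def variable_length_starts(code: bytes, length_of):
    starts, pc = [], 0
    while pc < len(code):
        starts.append(pc)
        pc += length_of(code, pc)   # must (partially) decode just to advance
    return starts

# Demo with a made-up encoding where the first byte gives the instruction length.
demo = bytes([1, 2, 0, 3, 0, 0, 1])
print(fixed_length_starts(bytes(16)))                           # [0, 4, 8, 12]
print(variable_length_starts(demo, lambda code, pc: code[pc]))  # [0, 1, 3, 6]
```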
 

mi7chy

macrumors G4
Oct 24, 2014
10,622
11,294
Keller is the guy who can't stay in one place for more than a few years

Now that I think about it, I remember Maier making that same statement.

You make that sound like a bad thing. Have you ever worked in the tech industry? Ask anyone and they'll tell you that's a common strategy to climb the ladder and improve pay, benefits, work location, etc.

You've only mentioned Maier a million times. Didn't mean to offend any special relationship by stating the obvious about resume experience once.

https://forums.macrumors.com/search/3673473/?q=cliff&c[users]=Colstan&o=relevance
 
Last edited:

leman

macrumors Core
Oct 14, 2008
19,521
19,674
As I understand it, modern CISC cores are essentially RISC-like inside, with complex CISC instructions automatically broken down into multiple, simpler internal micro-ops.

One of ARM's advantages is that its instructions are all the same length, though, whereas x86 ones aren't. This apparently vastly simplifies instruction decoding, which enables ARM decoders to be much wider than the equivalent on x86 (for which around 4-6 wide has been the practical limit).

My problem with RISC and CISC is that these notions, as commonly used, conflate multiple things. You mention fixed-length encoding as an advantage for ARM, and yet many ARM instructions are broken down into multiple operations on modern CPUs, which would make them CISC-like. In contrast, the Apple GPU uses variable-length encoding, but the instructions appear to be executed directly and immediately, etc.

If someone tells me that a CPU is RISC or CISC, they don’t really tell me anything. Besides, the CPU and the ISA are two different things.
 

leman

macrumors Core
Oct 14, 2008
19,521
19,674
You make that sound like a bad thing. Have you ever worked in the tech industry? Ask anyone and they'll tell you that's a common strategy to climb the ladder and improve pay, benefits, work location, etc.

I can’t believe that a day would come where I would agree with mi7chy. And yet here we are 😁
 
  • Haha
Reactions: diamond.g

Colstan

macrumors 6502
Jul 30, 2020
330
711
You've only mentioned Maier a million times. Didn't mean to offend any special relationship by stating the obvious about resume experience once.
I don't have any special relationship with Maier, I just appreciate his advice. I will accept that you didn't mean to offend, and apologize for being overly harsh myself. I am used to you being, simply put, a pain in the ass for years now. You seem to be attempting to improve your demeanor around here, so I will give you credit for that. I hope that continues and no further misunderstandings will take place. I will attempt to be more even-tempered in the future, if you will do the same. It's easy to forget that we are speaking to living, breathing human beings on the other side of the screen.

Yes, I have worked in the tech industry, and job switching happens often. My overall point with Keller is that he seems to be given an outsized influence by the tech press and a subset of internet nerds. This is not at all his fault; I don't think he decided to be a semiconductor superstar. It was going to be somebody, and it just happened to be him. I do take issue with his "CISC vs. RISC = no difference" claims, but that's another matter entirely.
 