I am not convinced it means what people think it means. In particular, moving to 2.5D or 3D designs is not guaranteed to make the die arrangement more flexible. What I can see however is a larger GPU complex enabled by stacking.

Considering the M4 Max gets hotter than the previous chips, would separating the CPU and GPU allow more thermal headroom and higher clock speeds?
 
Considering the M4 Max gets hotter than the previous chips, would separating the CPU and GPU allow more thermal headroom and higher clock speeds?

Not an expert, you’d have to ask someone in semiconductor manufacturing. I’ve read that there are some new technologies that can help with power management, like back-side power delivery, but again, not an expert.

What I can add is that these chips are really low-power compared to the industry average. M4 Max is what, 100 watts or so at peak power draw? Yet others use 200+ watt systems.

I see two main advantages to stacked packages. First, they can pack more stuff into the same area: if you overlap dies, you can compress a large planar design into a fairly compact footprint. Second, they can help optimize manufacturing costs by leveraging different nodes simultaneously. N3 is very expensive and does not add much advantage to some of the larger structures like caches or memory controllers. By splitting those off to an N5 die, for example, one could save quite a bit of premium N3 manufacturing capacity. There is also the idea of modularity; I’m not sure how feasible it is in practice since you’d have to maintain different production lines.
 
I am not convinced it means what people think it means.
Kuo is worse than Gurman.

BTW, the author of that article has a delightful YouTube channel. He recently did an unboxing video for a Mac Studio and it's the best unboxing video I've seen.

Back to the "stacking" issue:

The problem is cost. Now instead of one die you've basically got two. TSMC has stated it will cost more, like the 2nm process. Both will raise the price of the SoC for the same die size.

So Apple will likely shrink the die size for the M5 Pro and M5 Max. Thus, more sellable chips per wafer.
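
To put rough numbers on the die-shrink point, here's a quick sketch using the standard dies-per-wafer approximation; the die areas below are made-up placeholders, not actual Apple figures.

```swift
import Foundation

// Rough dies-per-wafer approximation (ignores defect yield):
//   dies ≈ π·(d/2)²/A - π·d/√(2·A)
// where d = wafer diameter (mm) and A = die area (mm²).
func diesPerWafer(waferDiameter d: Double, dieArea a: Double) -> Int {
    let grossDies = Double.pi * (d / 2) * (d / 2) / a
    let edgeLoss = Double.pi * d / (2 * a).squareRoot()
    return Int(grossDies - edgeLoss)
}

// Hypothetical die areas: a ~500 mm² Max-class die vs. the same design shrunk by 30%.
print(diesPerWafer(waferDiameter: 300, dieArea: 500))  // ≈ 111 dies per 300 mm wafer
print(diesPerWafer(waferDiameter: 300, dieArea: 350))  // ≈ 166 dies per wafer
```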

TSMC has stated they will be able to do 3nm on top of 5nm this year.

What you put on the 5nm are those things that cannot take advantage of the 3nm process. (Some things can't be shrunk well.) All those I/O controllers on SoC, for example.

Kuo is likely wrong about the processors, but perhaps the Efficiency cores can be put on the 5nm process as they run at only a bit over 1GHz anyway.

This leaves the top (3nm process) for the Performance cores, the Graphics cores, and the hardly-anybody-really-uses-it "Neural" engine.

If Apple does shrink the area for cost cutting, then the M5 Pro and M5 Max may have the same core-specs they did in the M4. The customers who buy for pride may not like that. (See the M3 Pro brouhaha.) The customers who buy on price will.
 
I’m not sure how feasible it is in practice since you’d have to maintain different production lines.
I believe (but cannot be sure) that TSMC will use what looks like one line. That's why they had to build a new plant and fab to do this. The tricky part is the so-called "through silicon" connections. So the wafer takes a trip down a very long path.

TSMC has opened multiple new plants in Taiwan, for both the 2.5D packaging and the 2nm processes.

Allegedly TSMC has claimed that they will be able to do 2nm on top of 3nm in 2026, though I wouldn't look for it before late 2026, given the upheaval in global economies.
 
I don't think they're "falling" behind, Apple Silicon has always been behind in terms of GPU.
 
You guys need to get away from benchmarks and other meaningless crap and start using the machine for work.

The Apple Silicon Macs, desktop and laptop, SMOKE ANY Windows machine on the market at nearly everything that utilizes a GPU for processing - LLMs, rendering/exporting video, DNA folding, running a bazillion lines of code, or using ANY of MS’s, Adobe’s, or Google’s suites/apps.

The M series Macs destroy 99% of GPU-laden tasks unrelated to gaming, as gaming isn’t a profitable business model for Apple. As well, most buy consoles to play, where games generally release first, cost $500 for the entire machine, and get a redesign once a decade - not annually.

Xbox plays Flight Simulator, and plays it well on my Series X; it has a marketplace and plenty of peripheral hardware to use with it: yoke, rudder pedals, throttle quadrants with trim, flaps, gear, and plenty of mappable switches, hats, buttons, and triggers. Yes, PC looks a little better on a 5090 @ 10 times the price (for a system) - but I am no longer a kid without responsibilities, and I’m a few years from retirement. Perhaps I will invest in a PC for a sim pit when done working - but if you need to create photos, stills, music (NO PC has EVER run Logic faster than an M series Mac, not a Hackintosh, or any machine using x86 - guaranteed), art (with iPad as Sidecar), writing, presentations (PowerPoint is a joke compared to Keynote), or a thousand other things, the Mac is the better tool.
And it didn’t take long for the big players to join Apple by updating their software. Whether I’m doing a project in After Effects or Lightroom, Final Cut or Logic, Excel or Audition - my M4 Pro MacBook Pro smokes my PC laptop with a 4080 and an Intel 14-series CPU. And without the noise of a jet engine when a few Chrome tabs are open… no 2-pound wall wart, no single (or zero) Thunderbolt port - instead, a port that with a single wire run to a dock can charge your computer while offering another pair of Thunderbolt controllers, HDMI 2.1, 2.5Gb Ethernet, SDXC and audio I/O, 3 or 4 USB-A 3.1, another USB-C 3.2, and DisplayPort or eSATA - and no external power supply, it’s built in! A single cord to the dock and you’re still able to use the other two TB5 ports on the computer, plus its HDMI, SD, and audio ports.

Windows doesn’t have a fraction of the volume of games available on iOS/iPadOS, and those are HUGE moneymakers. AAA titles are too risky these days. They require teams of coders and a year or two or three to finish, with no guarantee of recouping the investment - and who gives two turds about legacy titles!

When you grow up, the days spent playing Diablo for 10 hours straight end. Other interests and responsibilities pop up: rent or a mortgage, a girlfriend or spouse, a boyfriend or car payments, a job! Kids, and dogs, and - and - I could go on forever as to why the millions of games in the App Store for iOS and iPadOS are far more conducive to my schedule as an adult than the hundreds of hours spent on much lesser RPGs and racing games, flight sims and the other Sims. They ALL look, play, and work well and reliably, and come in significantly more genres and niche formats for any and everyone. For every Halo or Spider-Man, there are dozens of Angry Birds and Subway Surfers coded by individuals or small teams who are able to make their dreams a reality - by learning Swift, C++, and some JavaScript and building their apps.

Today’s smartphone is the primary computer for 90+% of people - the phone function almost a tertiary capability of the device. It’s our everything, with power that destroys supercomputers of just a decade or two ago, and scaling that RISC architecture up to laptop and desktop machines - after a decade of hiring the brightest Nvidia, AMD, and other chip engineers to build their own in-house SoC - was a brilliant move and a revolutionary shift in computer manufacturing. No more highways or cheese boards with separate system RAM, a CPU slot, PCIe lanes, and GPUs that use 500 watts and cost more than a couple of MacBook Airs.

Most folks - EVEN here, on MacRumors - aren’t capable of running an M4 machine to its limits, and I’m talking about a small bump to 24GB of RAM and 1TB storage with a Pro chip, not a Max…. You aren’t doing anything capable of beachballing the last couple of generations of laptops - some are, but that number is VERY few: local LLM builds, folding DNA, and compiling operating-system-sized code builds. There are maybe a dozen or two 3D modelers who frequent the site - real, true animators and render-farm supervisors - who will benefit from the power in these machines.

Will the 5090 slay the MBP playing Red Dead Redemption? Yep.

Does anyone really care? I mean, an Xbox is 300-500 bucks and a PlayStation is $500. You can game all ya want and only have to upgrade every ten years!!!!

Meanwhile your MacBook Pro is still worth 50% of what you bought it for on the used market a decade later! Try that with an MSI!
 
Because Apple has no interest in gamers.

:D "no interest in gamers"


They're also bringing over several AAA titles.

Although they may not have the amount of interest in gamers that you would like, they clearly ARE interested in gamers to some extent. You've obviously seen the news about the AAA titles and you're obviously aware of the porting kit. You've seen the addition of ray tracing capabilities, right?
 
The problem is cost. Now instead of one die you've basically got two. TSMC has stated it will cost more, like the 2nm process. Both will raise the price of the SoC for the same die size.

Not if the disaggregated dies cost less overall. E.g. if the new packaging allows them to use 30% smaller N3 dies, and the cost of packaging is less than the N5 die, the entire thing can be economically advantageous. Not to mention that they could build a better product (more area for compute, more caches, etc.).
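
As a back-of-the-envelope illustration of that argument - every price and area below is a made-up placeholder, not real TSMC or Apple data:

```swift
// Hypothetical per-mm² wafer costs and die areas (placeholders only).
let n3CostPerMM2 = 0.25    // premium leading-edge node
let n5CostPerMM2 = 0.10    // older, cheaper node
let packagingCost = 10.0   // assumed extra cost of advanced packaging per part

// Monolithic: everything on one hypothetical 500 mm² N3 die.
let monolithic = 500.0 * n3CostPerMM2                 // 125.0

// Disaggregated: a 30% smaller N3 compute die plus a 150 mm² N5 die
// carrying I/O, SLC, memory controllers, etc.
let disaggregated = 350.0 * n3CostPerMM2 +
                    150.0 * n5CostPerMM2 +
                    packagingCost                     // 87.5 + 15 + 10 = 112.5

print(monolithic, disaggregated)
```

And that sketch ignores yield: smaller dies yield better, which tilts the math further toward disaggregation.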

Kuo is likely wrong about the processors, but perhaps the Efficiency cores can be put on the 5nm process as they run at only a bit over 1GHz anyway.

This leaves the top (3nm process) for the Performance cores, the Graphics cores, and the hardly-anybody-really-uses-it "Neural" engine.

Putting CPU cores on different dies is probably not the best idea, and E-cores are super tiny. But stuff like the display engine, I/O, and maybe even the SLC + memory controllers could easily go onto a separate die.
 
I don't think they're "falling" behind, Apple Silicon has always been behind in terms of GPU.

Exactly. If anything, they are catching up.

Let's look at Blender benchmarks. The M1 Max was 5.5x slower than its contemporary Nvidia desktop flagship, the RTX 3090. The M4 Max is 2.9x slower than the RTX 5090. Since then, Apple has improved performance by a factor of ~5x, Nvidia by ~3x. I expect this gap to shrink further as Apple implements their new tech.
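
A quick sanity check of those ratios, using only the figures quoted above (which I haven't independently verified):

```swift
// Figures from the post above, unverified.
let m1MaxGap = 5.5    // RTX 3090 Blender score ÷ M1 Max score
let m4MaxGap = 2.9    // RTX 5090 Blender score ÷ M4 Max score
let appleGain = 5.0   // claimed M1 Max → M4 Max improvement

// Nvidia generational gain implied by the three numbers above:
let impliedNvidiaGain = m4MaxGap * appleGain / m1MaxGap
print(impliedNvidiaGain)  // ≈ 2.6, i.e. roughly the ~3x quoted
```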
 
Big picture, I think Apple does a good job with its laptop offerings relative to NVIDIA, but falls short in comparison to NVIDIA's desktop line.

Yes, the 5090 laptop GPU is significantly more powerful than the top M4 Max GPU in the MBP. But the former requires significant compromises, namely a hot, noisy laptop with limited battery life. Most people want a more balanced package, and AFAIK only the MBP offers that with a top-end GPU.

Where Apple continues to fall short is in desktop GPUs. You should be able to build a 5090 desktop PC with much less significant compromises. Yes, it's going to be bigger and hotter than the M3 Ultra Studio. But with sufficient cooling it should be reasonably quiet (maybe not quite as quiet as the Studio), and battery life becomes a non-issue. And it will have significantly more GPU processing power (losing out only* in the amount of RAM available to the GPU: 32 GB for the 5090 vs. ≈0.5 TB for the M3 Ultra).

[*Here I'm referring to the video side only; I believe the M3 Ultra's CPU side equals or exceeds the other top workstation CPU's, except in max RAM, where the latter can be >512 GB.]
My guess/hope is that within the next 12 months we will see the next Mac Pro bring an interesting useful direction to high desktop performance. The M3 Ultra with 500 GB RAM available was a very interesting start.
 
I wouldn't expect Apple to put the CPU and GPU on different SoC's, since I think they'd want to keep them as close together as possible to minimize latency (though, granted, thermal management would be easier if they were on separate SoC's).

However, what they might do for their higher-end devices is to keep the CPU and GPU on the same SoC, but put the GPU on a separate die or dies. This would enable them to offer more GPU cores, and also offer more flexibility to their buyers in choosing the relative number of CPU and GPU cores for their machine.
 
Considering the M4 Max gets hotter than the previous chips, would separating the CPU and GPU allow more thermal headroom and higher clock speeds?
A) Physically separating chip units reduces performance (physics) and is exactly opposite to the engineering direction Apple committed to with its Unified Memory Architecture.

B) Apple clearly is not about simplistically chasing higher clock speeds with concomitant additional heat like Intel et al. have done for decades.

It will be interesting to see how Apple manages A & B as they build a new Mac Pro.
 
Exactly. If anything, they are catching up.

Let's look at Blender benchmarks. The M1 Max was 5.5x slower than its contemporary Nvidia desktop flagship, the RTX 3090. The M4 Max is 2.9x slower than the RTX 5090. Since then, Apple has improved performance by a factor of ~5x, Nvidia by ~3x. I expect this gap to shrink further as Apple implements their new tech.
It's my understanding that a lot of the improvement in Blender performance on AS is due to Apple's contribution to Blender software development, making Blender better optimized to run on AS:


Thus to use Blender to assess how much AS hardware is catching up, you'd need to look at M1, M4, 3090, and 5090 on the same version of Blender (ideally the latest version, which should be well-optimized for both platforms). Is that the case with your figures?

This also points up that a lot of the deficiency in AS performance vs. NVIDIA, particularly on complex software that was originally written for PC, and then ported to AS, could be due to software optimization rather than just the hardware. A good example is AAA games. "Native version" doesn't necessarily mean "optimized"—most of those are still ports of software written from the ground up for the PC, and that has been subject to years of optimization for the PC.
 
A) Physically separating chip units reduces performance (physics) and is exactly opposite to the engineering direction Apple committed to with its Unified Memory Architecture.

I do not understand how it reduces performance?

It's my understanding that a lot of the improvement in Blender performance on AS is due to Apple's contribution to Blender software development, making Blender better optimized to run on AS:


Thus to use Blender to assess how much AS hardware is catching up, you'd need to look at M1, M4, 3090, and 5090 on the same version of Blender (ideally the latest version, which should be well-optimized for both platforms). Is that the case with your figures?

You are correct, and this is why I used the same Blender version (4.3.0) for all results.
 
I do not understand how it reduces performance?
Performance is reduced

A) Because the farther the electrons have to travel, the longer it takes.

B) As components get moved farther away and off-chip they require controllers that add both complexity (extra clock cycles) and latency (start/stop time) to the process. Apple's Unified Memory Architecture is an effective methodology to reduce complexity and latency.
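
For a sense of scale on point A, here's a rough sketch; the ~0.5c propagation speed is an assumption for package-level traces (on-die wires are considerably slower due to RC delay):

```swift
// How far can a signal plausibly travel in one clock cycle?
let clockGHz = 4.0                          // ballpark P-core clock
let periodSeconds = 1.0 / (clockGHz * 1e9)  // ≈ 0.25 ns per cycle
let signalSpeed = 0.5 * 3.0e8               // assumed ~0.5c, in metres per second

let reachPerCycleMM = periodSeconds * signalSpeed * 1000
print(reachPerCycleMM)                      // ≈ 37 mm per cycle
```

So within a single package the raw millimetres are arguably less of an issue than the extra interface cycles described in B.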
 
That's fair - so what would be a good way to go about comparing Apple Silicon to Nvidia GPUs?

I suppose it depends on what you want to do with your computer. My M4 Pro is not as performant playing games as my 1400k/RX 6800 Hackintosh was, but I was willing to make that trade-off for the many other advantages (though I am bummed that Cities: Skylines 2 stopped working in Whisky, even if it was choppier).

Also, a well-done price/performance comparison would be interesting to see.

High-end discrete GPUs are expensive for sure, but so is adding an 80-core GPU to a Mac Studio...
I’m not sure there’s ever going to be a way to do a real apple to apples or apples to oranges comparison of GPU between Apple and NVIDIA.

NVIDIA bar none is probably always going to have the absolute fastest GPU at the end of the day. That’s their specialty, that’s what their whole business is built on, so they sort of have to be the best…their existence depends on them staying on top.

Flip side, Apple is able to get better performance per watt or with the actual total power draw of Apple Silicon. The good side of this for all of us is that if Apple intends on catching up with AI, they are now under pressure to upgrade their SoC GPUs all the way around, because local LLMs are going to be a thing moving forward and more DRAM and more GPU is going to be paramount to Apple being seen as a legitimate competitor in the space. I’m happy to hear arguments to the contrary, but my gut tells me AI is going to drag Apple along with it, and in order for sales and margins to keep from sliding too much, Apple is going to have to play catch-up, specs-wise.

Just my 2¢.
 
When the M1 Max launched, Apple said that it rivaled the flagship NVIDIA GPU at the time, the RTX 3080.

However, in 2025, when comparing the M4 Max MacBook Pro to PC laptops equipped with the flagship NVIDIA GPU, the RTX 5090, which cost the same as an M4 Max MacBook Pro, the MacBook Pro gets destroyed; it is not even close.

And this is true even on battery power.

[Attached benchmark screenshots]
I'm late to the discussion and I'm sure it's already been said but I'll post my results anyway.
[Attached Geekbench results screenshot, 2025-02-15]
 
I am not a GPU expert and my requirements for a GPU are modest. However, I like PC gaming and have had many gaming laptops with the latest AMD and Nvidia dedicated graphics.

Those gaming laptops were great. Gaming on them was a lot of fun. I had a few Alienware, Asus ROG, Razer, and more. The problem they all suffered from was poor battery life: halving performance on battery unless tinkered with, lower framerates unless plugged in. At best you would get 4 hours, but mostly 2 hours, with fans going crazy.

I have never thought of Apple or MacBooks as a gaming computer. Now that Apple silicon has greatly improved GPU on pro machines the potential is there.

However, optimization of certain technologies native to Macs, like Metal and ProRes, was more focused on professional graphics and photography than gaming. And now Apple is just starting to make that transition. AI is another factor where the GPU and NPU are important and, as others have said, will push GPU development.

I can see a potential in, say, 5 to 10 years where Macs become a gaming standard, simply because of the efficiency of M series chips and their architecture. This may allow up to 8 hours of gaming on battery, or more, while barely hearing the fans.

Who knows what will happen, and direct comparison between Nvidia and others vs Apple will always be difficult. I don't see gaming PCs going anywhere anytime soon. I don't see Nvidia being a second-rate GPU any time soon. Certain optimized applications on Windows with dedicated GPUs will probably always be more powerful and a cheaper solution.

At the end of the day, having a better iGPU is always a huge plus for Apple consumers, and I am happy to see this change even being possible due to the work on M series chips. I doubt any of this gaming talk would even start to happen if Apple were still using Intel. Apple would still be using AMD graphics and it would be a bottleneck.
 
I’m not sure there’s ever going to be a way to do a real apple to apples or apples to oranges comparison of GPU between Apple and NVIDIA.

NVIDIA bar none is probably always going to have the absolute fastest GPU at the end of the day. That’s their specialty, that’s what their whole business is built on, so they sort of have to be the best…their existence depends on them staying on top.

Flip side, Apple is able to get better performance per watt or with the actual total power draw of Apple Silicon. The good side of this for all of us is that if Apple intends on catching up with AI, they are now under pressure to upgrade their SoC GPUs all the way around, because local LLMs are going to be a thing moving forward and more DRAM and more GPU is going to be paramount to Apple being seen as a legitimate competitor in the space. I’m happy to hear arguments to the contrary, but my gut tells me AI is going to drag Apple along with it, and in order for sales and margins to keep from sliding too much, Apple is going to have to play catch-up, specs-wise.

Just my 2¢.
Shouldn’t AI on Apple Silicon be using the Neural Engine rather than the GPU in the SoC?

In much the way that my video software uses the media engine to encode/decode as opposed to the GPU cores.

So in effect the GPU cores in Apple Silicon have much less to do than an Nvidia GPU in a PC, since Apple put separate engines into the SoC?
 
Performance is reduced

A) Because the farther the electrons have to travel, the longer it takes.

B) As components get moved farther away and off-chip they require controllers that add both complexity (extra clock cycles) and latency (start/stop time) to the process. Apple's Unified Memory Architecture is an effective methodology to reduce complexity and latency.

Nobody is talking about moving components far away. We are talking about multi-chip modules, which are connected with high bandwidth interfaces. And you don’t need extra controllers for vertical wires between dies. Intel builds CPUs like that and the performance didn’t go down. Same for Mx Ultra series.

The only drawback I can see (and I might be wrong) is increased power consumption.
 
Shouldn’t AI on Apple Silicon be using the Neural Engine rather than the GPU in the SoC?

They are for different purposes. The NPU is optimized for low-power inference and is ideal to power many models that Apple uses in their software (e.g. FaceTime, camera, photo classification). At the same time, its performance is limited. One could make the NPU large enough, but that would come at the expense of other units (such as the CPU and the GPU).

For high-performance ML, it makes a lot of sense to integrate accelerators into the GPU. The GPU is already a very wide parallel processor, and it has access to a lot of memory bandwidth that larger models need. Also, it’s more flexible when it comes to writing programs, which makes it great for advanced ML research and applications.

Sometimes it makes sense to replicate functionality. Apple already has four different ways to do machine learning in their systems. There is the NPU, optimized for streaming models of low to medium complexity. There is the CPU, with limited performance but advanced programmability. There is a matrix/vector CPU coprocessor that can be used to accelerate scientific and ML workloads. And there is the GPU, with thousands of processing units that can be used to accelerate large-data applications. Improving the ML capabilities of the GPU would make Macs much better value for anyone working with larger models, from data scientists and researchers to folks using language and image generation models locally. And it won’t make the NPU obsolete, as the latter is still useful for tasks where energy efficiency is important.
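
To show how those options surface to developers, here's a minimal Core ML sketch; "SomeModel" is a hypothetical Xcode-generated model class used purely for illustration.

```swift
import CoreML

// Core ML lets you hint which of those engines a model should run on.
let config = MLModelConfiguration()

// Low-power, streaming-style inference: prefer the Neural Engine (with CPU fallback).
config.computeUnits = .cpuAndNeuralEngine

// Large, bandwidth-hungry models: let Core ML use everything, including the GPU.
// config.computeUnits = .all

// "SomeModel" stands in for whatever generated model class you'd actually load:
// let model = try SomeModel(configuration: config)
```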
 
Nobody is talking about moving components far away. We are talking about multi-chip modules, which are connected with high bandwidth interfaces. And you don’t need extra controllers for vertical wires between dies. Intel builds CPUs like that and the performance didn’t go down. Same for Mx Ultra series.

The only drawback I can see (and I might be wrong) is increased power consumption.
I'm wondering if there might be some confusion due to the use of the word "chip", which can refer either to the die or to the SoC (= System-on-Chip).

I agree putting the CPU and GPU on separate, connected dies on the same SoC wouldn't add significant distance, especially if they're vertically stacked so you can have close communication at multiple edges (and maybe some in the middle). But putting them on separate chips (SoC's) would, right?

Or if you have vertically-stacked dies (which would be close), is each die considered to be on its own SoC (in which case having the dies on two separate SoC's in that sense wouldn't indicate a distant arrangement)?
 
I'm wondering if there might be some confusion due to the use of the word "chip", which can refer either to the die or to the SoC (= System-on-Chip).

I agree putting the CPU and GPU on separate, connected dies on the same SoC wouldn't add significant distance, especially if they're vertically stacked so you can have close communication at multiple edges (and maybe some in the middle). But putting them on separate chips (SoC's) would, right?

Or if you have vertically-stacked dies (which would be close), is each die considered to be on its own SoC (in which case having the dies on two separate SoC's in that sense wouldn't indicate a distant arrangement)?

What I can say is that in all Apple patents discussing 2.5D and 3D packaging, they refer to the package as a single SoC comprised of multiple heterogeneous dies. This is in contrast to the multi-SoC systems (Ultra), which are described using a different methodology.
 
The NPU is optimized for low-power inference and is ideal to power many models that Apple uses in their software (e.g. FaceTime, camera, photo classification). At the same time, its performance is limited. ...

...Apple already has four different ways to do machine learning in their systems. There is the NPU, optimized for streaming models of low to medium complexity. There is the CPU, with limited performance but advanced programmability.
I thought, at least for LLM's, that the GPU is the unit most performant for training, while the CPU is the unit most performant for inference, indicating that it's not always the case that the CPU has limited performance for AI. And based on what you've written, the NPU would be used as a CPU-alternative for inference when you want to optimize efficiency. Do I have that right?
 