
nquinn

macrumors 6502a
Jun 25, 2020
829
621
And remember, the 16” with Intel still needs a dGPU that is also power hungry.

My guess is they will release a version this round with just integrated graphics and maybe add a more robust dgpu later. The big question is if they would ever go to nvidia/amd for a dgpu again, or if they will only scale up their apple silicon.
 

cmaier

Suspended
Jul 25, 2007
25,405
33,474
California
My guess is they will release a version this round with just integrated graphics and maybe add a more robust dgpu later. The big question is if they would ever go to nvidia/amd for a dgpu again, or if they will only scale up their apple silicon.
No, they will not go to nvidia or amd. They’ve been working on a separate GPU for a long time, which will sit in the same package as the CPU die.

The only machine that could conceivably involve a conventional third-party DGPU is the mac pro, and even there I seriously doubt it.
 

nquinn

macrumors 6502a
Jun 25, 2020
829
621
No, they will not go to nvidia or amd. They’ve been working on a separate GPU for a long time, which will sit in the same package as the CPU die.

The only machine that could conceivably involve a conventional third-party DGPU is the mac pro, and even there I seriously doubt it.
Ya, I agree with this, though the main problem is that to match top-tier GPUs they would need to scale up their AS chips by something like 30x for the Mac Pro.

For the 14"/16", though, I think they will already be at around mobile 1050 to 1060 levels, not quite 2060/3080 mobile levels, so there is probably near 0% chance they use any 3rd-party GPU for the laptops. They are only off by maybe 30-50% there, I think, which they can probably make up just by scaling up.
 

cmaier

Suspended
Jul 25, 2007
25,405
33,474
California
Ya, I agree with this, though the main problem is that to match top-tier GPUs they would need to scale up their AS chips by something like 30x for the Mac Pro.

For the 14"/16", though, I think they will already be at around mobile 1050 to 1060 levels, not quite 2060/3080 mobile levels, so there is probably near 0% chance they use any 3rd-party GPU for the laptops. They are only off by maybe 30-50% there, I think, which they can probably make up just by scaling up.

The rumors are 16x the number of GPU cores, plus who knows how much faster each of those cores will be.
 

quarkysg

macrumors 65816
Oct 12, 2019
1,247
841
Ya, I agree with this, though the main problem is that to match top-tier GPUs they would need to scale up their AS chips by something like 30x for the Mac Pro.

For the 14"/16", though, I think they will already be at around mobile 1050 to 1060 levels, not quite 2060/3080 mobile levels, so there is probably near 0% chance they use any 3rd-party GPU for the laptops. They are only off by maybe 30-50% there, I think, which they can probably make up just by scaling up.
For graphics rendering, I think Apple’s GPU using TBDR with UMA can outperform a lot of IMR dGPUs, as the PCIe bus bottleneck is holding the dGPUs back. That’s the reason why I think most high-end dGPUs have massive amounts of VRAM: to somewhat mitigate the bottleneck of the PCIe bus.

Apple is most definitely upping the UMA bandwidth from the existing 68GB/s for the next Mac SoC, and I think it’ll be at least double what the M1 has. With more GPU cores and a larger pipe, graphics rendering prowess will increase. I see a lot of YouTube videos of games running on M1 Macs at 60 FPS, and many are through Rosetta 2. Quite impressive for an entry-level SoC. I would think most current games can achieve good frame rates with proper optimisation for the M1.

As for GPU compute, it’ll probably trail the top-end dGPUs like the RTX 3090, but Apple Silicon has other compute engines, like the NPU, which according to Apple can already produce 11 TOPS and which, IMHO, is barely used at the moment by third-party developers, probably also due to Apple’s limited APIs for taking advantage of it. From what I read, the NPU is essentially a matrix computation unit, so I imagine the NPU and the GPU of Apple Silicon can be combined to achieve quite a high level of compute performance, without the overhead of PCIe bus memory copies.
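To make the shared-memory point a bit more concrete, here is a rough Metal sketch (my own illustration, not anything specific to the next SoC): a buffer allocated with shared storage is the same physical memory for both the CPU and the GPU, so nothing has to be staged across a PCIe bus into VRAM first.

import Metal

// Rough sketch: on Apple Silicon the CPU and GPU share one pool of memory,
// so a .storageModeShared buffer needs no upload/blit step, unlike a dGPU
// where the data would have to cross the PCIe bus into VRAM first.
let device = MTLCreateSystemDefaultDevice()!
let values: [Float] = [1, 2, 3, 4]
let buffer = device.makeBuffer(bytes: values,
                               length: values.count * MemoryLayout<Float>.stride,
                               options: .storageModeShared)!

// A compute or render encoder can bind `buffer` directly, and the CPU can
// read any results back through the very same pointer afterwards.
let shared = buffer.contents().bindMemory(to: Float.self, capacity: values.count)
print(shared[0])   // 1.0 -- same memory, no copy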
 

nquinn

macrumors 6502a
Jun 25, 2020
829
621
For graphics rendering, I think Apple’s GPU using TBDR with UMA can outperform a lot of IMR dGPUs, as the PCIe bus bottleneck is holding the dGPUs back. That’s the reason why I think most high-end dGPUs have massive amounts of VRAM: to somewhat mitigate the bottleneck of the PCIe bus.

Apple is most definitely upping the UMA bandwidth from the existing 68GB/s for the next Mac SoC, and I think it’ll be at least double what the M1 has. With more GPU cores and a larger pipe, graphics rendering prowess will increase. I see a lot of YouTube videos of games running on M1 Macs at 60 FPS, and many are through Rosetta 2. Quite impressive for an entry-level SoC. I would think most current games can achieve good frame rates with proper optimisation for the M1.

As for GPU compute, it’ll probably trail the top-end dGPUs like the RTX 3090, but Apple Silicon has other compute engines, like the NPU, which according to Apple can already produce 11 TOPS and which, IMHO, is barely used at the moment by third-party developers, probably also due to Apple’s limited APIs for taking advantage of it. From what I read, the NPU is essentially a matrix computation unit, so I imagine the NPU and the GPU of Apple Silicon can be combined to achieve quite a high level of compute performance, without the overhead of PCIe bus memory copies.
This new 16" MacBook is gonna be ridiculous :)
 

xraydoc

Contributor
Oct 9, 2005
11,030
5,489
192.168.1.1
Apple might release an updated Intel Mac Pro (since I suspect the Mac Pro will be the last machine in the lineup to go Apple Silicon), but I'm 99.9999999999999999999% certain that no other Intel Macs will ever be released.
 

Kung gu

Suspended
Oct 20, 2018
1,379
2,434
Ya, I agree with this, though the main problem is to match top tier gpu's they would need to scale up their AS chips by something like 30x for mac pro.

For the 14"/16" though I think they will already be at around mobile 1050 to 1060 levels, not quite 2060/3080 mobile levels so probably near 0% chance they use any 3rd party for the laptops. They are only off by maybe 30-50% for those I think which they can probably hit by just scaling up.
The M1 is already at 1050 mobile levels, and that's using an 8-core chip.

Using higher clocks and a bigger bus, they can easily reach 2060 levels with a 16-core GPU.
 

Sydde

macrumors 68030
Aug 17, 2009
2,563
7,061
IOKWARDI
Apple did release some new 68K models more than a year after the PowerPC transition began and carried them forward another year. It is worth noting, though, that they had scores of models with numbers and trim-code letters and goofy names. The purchase of NeXT brought the simplification that did away with all that confusing nonsense.

The Intel transition was a line in the sand: no new PowerPC Macs appeared after the Intel models came out. It may also be worth noting that there were no "Pro" models during the PowerPC era, so they could well decide to drop "Pro" entirely. With Rosetta, I seriously doubt that there will be any new Intel Macs at all.
 

cmaier

Suspended
Jul 25, 2007
25,405
33,474
California
Apple did release some new 68K models more than a year after the PowerPC transition began and carried them forward another year. It is worth noting, though, that they had scores of models with numbers and trim-code letters and goofy names. The purchase of NeXT brought the simplification that did away with all that confusing nonsense.

The Intel transition was a line in the sand: no new PowerPC Macs appeared after the Intel models came out. It may also be worth noting that there were no "Pro" models during the PowerPC era, so they could well decide to drop "Pro" entirely. With Rosetta, I seriously doubt that there will be any new Intel Macs at all.
Agreed. At best they may offer updated configurations on a couple of lingering models (the Mac Pro is the likeliest), but they aren’t going to come out with new Intel machines.
 

leman

macrumors Core
Oct 14, 2008
19,521
19,679
As for GPU compute, it’ll probably trail the top-end dGPUs like the RTX 3090, but Apple Silicon has other compute engines, like the NPU, which according to Apple can already produce 11 TOPS and which, IMHO, is barely used at the moment by third-party developers, probably also due to Apple’s limited APIs for taking advantage of it. From what I read, the NPU is essentially a matrix computation unit, so I imagine the NPU and the GPU of Apple Silicon can be combined to achieve quite a high level of compute performance, without the overhead of PCIe bus memory copies.

Careful here. The NPU is a specialized processing unit and its TFLOPS don't mean much in the grand scheme of things, unless all you care about is multiplying limited-precision matrices. The NPU cannot be used for general-purpose computation. And 11 TFLOPS is nothing too impressive either: an RTX 2060 has over 50 "tensor" (matrix) TFLOPS.

Besides, the NPU is absolutely getting used by developers. It's just that it's hidden behind Apple's APIs and it's not really clear what it can do. It seems to be specialized for a subset of signal processing and won't run every ML model. Apple also has the AMX accelerator, another (more flexible?) matrix multiplication unit, and ML workloads are supported on the GPU as well.
 

quarkysg

macrumors 65816
Oct 12, 2019
1,247
841
Careful here. The NPU is a specialized processing unit and its TFLOPS don't mean much in the grand scheme of things, unless all you care about is multiplying limited-precision matrices. The NPU cannot be used for general-purpose computation. And 11 TFLOPS is nothing too impressive either: an RTX 2060 has over 50 "tensor" (matrix) TFLOPS.

Besides, the NPU is absolutely getting used by developers. It's just that it's hidden behind Apple's APIs and it's not really clear what it can do. It seems to be specialized for a subset of signal processing and won't run every ML model. Apple also has the AMX accelerator, another (more flexible?) matrix multiplication unit, and ML workloads are supported on the GPU as well.
I don’t think Apple is advertising 11 TFLOPS but 11 TOPS, so it could be integer ops as well. Anyway, I’m not familiar with these ops, but I just think the NPU has potential beyond ML-type applications. I understand most NNs are matrix multiplications, so for applications in graphics or any field that needs high-throughput matrix computation, the NPU would be a good target to exploit.

I understand AMX is the CPU’s co-processor, so it will be useful where developers need low-latency matrix computation. But for high-throughput matrix ops it’s probably not suitable, as it will tie down the CPU with the matrix work. The NPU is likely useful where we need bulk matrix computation, but there is a need to set up and program the NPU cores, so it’s not good for low-latency responses.
 

leman

macrumors Core
Oct 14, 2008
19,521
19,679
I don’t think Apple is advertising 11 TFLOPS but 11 TOPS, so it could be integer ops as well. Anyway, I’m not familiar with these ops, but I just think the NPU has potential beyond ML-type applications. I understand most NNs are matrix multiplications, so for applications in graphics or any field that needs high-throughput matrix computation, the NPU would be a good target to exploit.

ML often relies on reduced precision, making these multiplication units fairly useless for general-purpose matrix operations.

I understand AMX is the CPU’s co-processor, so it will be useful where developers need low-latency matrix computation. But for high-throughput matrix ops it’s probably not suitable, as it will tie down the CPU with the matrix work. The NPU is likely useful where we need bulk matrix computation, but there is a need to set up and program the NPU cores, so it’s not good for low-latency responses.

AMX is the high-throughput "general purpose" matrix unit. It doesn't really tie down the CPU, since it's a separate unit with dedicated hardware; the CPU can speculatively run other stuff while waiting for the AMX to be done. I think the big giveaway is that Apple does not use the NPU for its ML compute framework at all: it's either AMX or the GPU. This makes me think that the NPU lacks flexibility and is only designed to accelerate a limited set of use cases such as image classification and audio stuff.

For bulk matrix multiplication you can also use the GPU. Metal has built-in matrix multiplication intrinsics that rely on the SIMD nature of the GPU. I am not quite sure how the AMX performance compares to the GPU. If I understand Dougall Johnson's documentation correctly, AMX is capable of around 1.6 FP32 TFLOPS. You might be able to beat it with the GPU if your data set is very large. It's probably best to just use Apple's Accelerate for matrix multiplication anyway.
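To illustrate that last suggestion, here is a minimal sketch (my own, with made-up numbers) of matrix multiplication through Accelerate's BLAS interface; the caller never targets AMX directly, Accelerate just routes the sgemm call to whatever hardware Apple deems best:

import Accelerate

// Minimal sketch: C = A * B for two 2x2 single-precision matrices.
// Accelerate picks the backing hardware; the caller never addresses AMX itself.
let a: [Float] = [1, 2,
                  3, 4]
let b: [Float] = [5, 6,
                  7, 8]
var c = [Float](repeating: 0, count: 4)

cblas_sgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
            2, 2, 2,        // M, N, K
            1.0, a, 2,      // alpha, A, leading dimension of A
            b, 2,           // B, leading dimension of B
            0.0, &c, 2)     // beta, C, leading dimension of C

print(c)   // [19.0, 22.0, 43.0, 50.0]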
 
Last edited:

quarkysg

macrumors 65816
Oct 12, 2019
1,247
841
This makes me think that the NPU lacks flexibility and is only designed to accelerate a limited set of use cases such as image classification and audio stuff.
It’s an awful waste of die space, though, if the NPU has limited uses.
 

Kung gu

Suspended
Oct 20, 2018
1,379
2,434
It’s an awful waste of die space, though, if the NPU has limited uses.
The NPU is useful in some cases, and where it is useful it is REALLY good.

One example would be Photoshop’s Super Resolution, where AI is used to sharpen the image.

In the video below, the Intel i7 11th gen took 10 minutes and the M1 took 13 seconds for the Photoshop test. Skip to 13:23 for the test.

 

leman

macrumors Core
Oct 14, 2008
19,521
19,679
It’s an awful waste of die space, though, if the NPU has limited uses.

Not at all. First, the NPU does not take much space at all. Second, it allows advanced, high-performance image and audio processing with very low energy expenditure. It's the reason why iPhones can do on-device image classification or low-light photography without breaking a sweat. If you used the GPU for that, you'd end up burning ten times more energy and taking five times as long. ML is used a lot across the modern Apple ecosystem, so dedicated hardware is absolutely justified. The NPU is one of the reasons why the M1 seems to punch way above its weight in things like video editing.
 

Sydde

macrumors 68030
Aug 17, 2009
2,563
7,061
IOKWARDI
Whatever the NPU is used for, the API obfuscates it. The CPU cores or the GPU can run the same code, and I have seen at least one anecdote from a programmer saying that it was not possible to determine which subsystem their ML processes were running on (that is, they could get snapshot data showing that information, but the work might run on a different subsystem under other system-wide circumstances). Which is probably a good thing, since the OS can prioritize resource usage across the system.
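For reference, this is roughly what it looks like from the developer's side with Core ML (a hedged sketch; MyModel below is just a placeholder for whatever generated model class you have). You can state a preference for which compute units a model may use, but the OS still decides at run time where it actually executes:

import CoreML

// Sketch only; `MyModel` is a stand-in for any Core ML generated model class.
// `computeUnits` is a preference, not a guarantee: the framework may still
// split the work across CPU, GPU, and the Neural Engine as it sees fit.
let config = MLModelConfiguration()
config.computeUnits = .all        // alternatives: .cpuOnly, .cpuAndGPU

let model = try MyModel(configuration: config)
// There is no API to say "run this layer on the NPU"; where the work lands
// is decided by Core ML, which matches the behaviour described above.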
 
  • Like
Reactions: JMacHack