
theorist9

macrumors 68040
May 28, 2015
3,880
3,059
I think the reasonable possibilities for the Mac Pro GPU are the following:

- a single SoC with a big GPU (e.g. four dies, around 40-50k FP32 ALUs). That’s the easiest option for Apple and it won’t really be able to challenge any of the high-end multi-GPU systems

- a single SoC with a VERY BIG GPU (multiple GPU-only tiles, a lot of cores). Very expensive, very custom, very big, but it still uses the same programming model as any other model

- multiple SoCs on separate compute boards, connected via some sort of PCIe-facilitated bus (maybe cache-coherent CXL), maybe with a shared pool of traditional RAM. This is something I’ve been thinking about for a while, as this approach would solve the issues with modularity and expandability. But it will require a new programming model that can efficiently use non-local compute clusters.
Here's an alternate way to get increased GPU power: Instead of adding separate GPU-only dies to the SoC, they could have multiple dies with the current design (CPU and GPU on the same die), but with a much higher ratio of GPU cores : CPU cores.

E.g., instead of the 2.4 GPU : 1 CPU ratio they have now, what about 10 GPU : 1 CPU? If they used enough dies to give them 40 CPU cores (i.e., the same number of CPU cores as a current 2 x Ultra), and increased the TFLOPS/GPU core by 20% over the M1's, that would give them ~200 TFLOPS, i.e., about twice what people are estimating for the future 4090 Ti. And if they also offered a 4 x Ultra with 80 CPU cores, that would give them ~ 4 x 4090 Ti. Those would be killer machines.
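
The back-of-the-envelope math, as a quick sketch (the per-core figure here is an assumption chosen to reproduce the ~200 TFLOPS estimate above, not a published spec):

```swift
// Scaling estimate for a multi-die SoC with a 10:1 GPU-core : CPU-core ratio.
// The per-core throughput is a hypothetical figure, not an Apple spec.
let cpuCores = 40                 // same CPU core count as a 2 x Ultra
let gpuCoresPerCpuCore = 10       // proposed 10:1 ratio instead of today's ~2.4:1
let tflopsPerGpuCore = 0.5        // assumed FP32 TFLOPS per GPU core (hypothetical)

let gpuCores = cpuCores * gpuCoresPerCpuCore
let totalTflops = Double(gpuCores) * tflopsPerGpuCore
print("GPU cores: \(gpuCores), estimated FP32 throughput: \(totalTflops) TFLOPS")
// An 80-CPU-core (4 x Ultra-class) part would scale to roughly twice this figure.
```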

Would this benefit them by maintaining close "local" CPU-GPU integration (local to each die), or hurt them by scattering the CPU cores over many different dies, rather than a few?

If they wanted to make it modular, they could offer 20 CPU cores/SoC, with the machine taking, say, up to four of these. And if they wanted to make it more customizable, they could offer SoC's with, say, two different ratios of GPU cores : CPU cores, catering to both CPU-heavy and GPU-heavy workloads.
 
Last edited:

diamond.g

macrumors G4
Mar 20, 2007
11,438
2,663
OBX
Here's an alternate way to get increased GPU power: Instead of adding separate GPU-only dies to the SoC, they could have multiple dies with the current design (CPU and GPU on the same die), but with a much higher ratio of GPU cores : CPU cores.

E.g., instead of the 2.4 GPU : 1 CPU ratio they have now, what about 10 GPU : 1 CPU? If they used enough dies to give them 40 CPU cores (i.e., the same number of CPU cores as a current 2 x Ultra), and increased the TFLOPS/GPU core by 20% over the M1's, that would give them ~200 TFLOPS, i.e., about twice what people are estimating for the future 4090 Ti. And if they also offered a 4 x Ultra with 80 CPU cores, that would give them ~ 4 x 4090 Ti. Those would be killer machines.

Would this benefit them by maintaining close "local" CPU-GPU integration (local to each die), or hurt them by scattering the CPU cores over many different dies, rather than a few?

If they wanted to make it modular, they could offer 20 CPU cores/SoC, with the machine taking, say, up to four of these. And if they wanted to make it more customizable, they could offer SoC's with, say, two different ratios of GPU cores : CPU cores, catering to both CPU-heavy and GPU-heavy workloads.
At 5nm, are Apple's ALUs smaller than Nvidia's/AMD's at the "same size"?
 

theorist9

macrumors 68040
May 28, 2015
3,880
3,059
At 5nm, are Apple's ALUs smaller than Nvidia's/AMD's at the "same size"?
Don't know -- you could estimate those values from the die sizes plus Locuza's annotated die shots on semianalysis.com.

 

DeepIn2U

macrumors G5
May 30, 2002
13,051
6,984
Toronto, Ontario, Canada
I think you are kind of missing the joke of the post you have quoted ;)

That said, “dedicated” these days is mostly an emotional label. Folks use it as a synonym for “fast” or “powerful”. Well, Apple GPUs are plenty fast. We should just retire the term altogether. For the purpose of this thread it makes more sense to talk about a “modular” or “swappable” GPU, because that’s what people really mean.
I’m not so certain if we should retire the term just yet lol
 

tomO2013

macrumors member
Feb 11, 2020
67
102
Canada
I’m sharing this interview with a panel of architects at Imagination Technologies (the IP provider for much of Apple’s GPU).
They were major contributors back in the day (if anybody can remember 3dfx vs PowerVR, OpenGL, the Glide library, etc… these guys were and still are key contributors in the graphics scene). It’s a nostalgic look back at Imagination Tech‘s contributions over 30 years and commentary on the future of graphics.


What’s especially interesting is around 12:30 in the video (shared above) and the commentary on the future of the GPU in the industry at large and the drive towards efficiency. They gave an example earlier where their mobile silicon solution rendered Nvidia’s own ray tracing demo at double the frame rate of the Nvidia desktop card, at a fraction of the power, on the same scene (this goes back to a 2019 demo) [8:17 in the video]. These guys have a history focused on power efficiency. Their current CXT technology is specifically designed for scalable architectures from mobile up to high-density compute data centers.

I’m convinced that Apple will utilize their ‘Photon’ ray tracing derivative technology in a future M2 Bodacious or M3 Cowabunga solution.

Enjoy :)


Edit: to provide a link to this detailed white paper on their ray tracing architecture.
 
  • Like
Reactions: singhs.apps

leman

macrumors Core
Oct 14, 2008
19,521
19,674
I’m convinced that Apple will utilize their ‘Photon’ ray tracing derivative technology in a future M2 Bodacious or M3 Cowabunga solution.

A while ago I linked a bunch of Apple's ray-tracing patents, which are loosely based on IMG technology but take things a bit further. They describe a very energy- and area-efficient way to do ray tracing.
 
  • Like
Reactions: spaz8

leman

macrumors Core
Oct 14, 2008
19,521
19,674

Why does Metal support both? Isn't explicit sync superior to implicit sync?

Developer convenience. Metal is designed to scale with your needs. It can do some tedious low-level plumbing stuff for you (like resource tracking, synchronisation, and memory management) or let you do it yourself. For simpler applications the provided automatic tracking is absolutely sufficient and can save you a lot of error-prone work. And as @mr_roboto says, it can help with porting code from older APIs.
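
To make that concrete, here is a minimal Swift sketch (illustrative, not production code): the first buffer relies on Metal's default hazard tracking, so the driver orders the two blit passes by itself; the second buffer opts out with .hazardTrackingModeUntracked, so the dependency has to be expressed by hand with an MTLFence.

```swift
import Metal

let device = MTLCreateSystemDefaultDevice()!
let queue = device.makeCommandQueue()!

// Implicit sync: a tracked resource (the default). Metal notices that the
// second encoder reads what the first one wrote and orders them for us.
let tracked = device.makeBuffer(length: 4096, options: .storageModeShared)!

// Explicit sync: an untracked resource. We become responsible for hazards.
let untracked = device.makeBuffer(length: 4096,
                                  options: [.storageModeShared, .hazardTrackingModeUntracked])!
let fence = device.makeFence()!

let cmd = queue.makeCommandBuffer()!

// Pass 1: write both buffers.
let write = cmd.makeBlitCommandEncoder()!
write.fill(buffer: tracked, range: 0..<4096, value: 0xAB)
write.fill(buffer: untracked, range: 0..<4096, value: 0xCD)
write.updateFence(fence)            // signal: writes to the untracked buffer are done
write.endEncoding()

// Pass 2: read both buffers.
let read = cmd.makeBlitCommandEncoder()!
read.waitForFence(fence)            // required for the untracked buffer;
                                    // the tracked one would be handled automatically
read.copy(from: untracked, sourceOffset: 0, to: tracked, destinationOffset: 0, size: 4096)
read.endEncoding()

cmd.commit()
cmd.waitUntilCompleted()
```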
 

Pressure

macrumors 603
May 30, 2006
5,179
1,544
Denmark
I'm really excited for Asahi Linux and the progress they have made with the GPU drivers!

It will be opening up gaming on Apple Silicon so much with support for Vulkan and Steam / Proton.

Apple should really be doing more than they have been doing and Microsoft should also get their act together on AArch64 support. The hardware is capable.
 
  • Like
Reactions: sauria

Xiao_Xi

macrumors 68000
Oct 27, 2021
1,627
1,101
It will be opening up gaming on Apple Silicon so much with support for Vulkan and Steam / Proton.
It would be ironic if the "best" way to play games on macOS was to have to boot Linux to play Windows games.

Presumably it's there to give developers options. If you're porting code written for 3D APIs which relies on implicit sync, having it available makes the initial porting job easier.
Is it possible that Metal needs implicit sync due to the OpenGL to Metal translation layer?
 

leman

macrumors Core
Oct 14, 2008
19,521
19,674
Is it possible that Metal needs implicit sync due to the OpenGL to Metal translation layer?

Hardly. In the OpenGL supported by Apple all bindings are explicit, so it's easy to track dependencies. So the translation layer/driver should be able to insert appropriate fences just by examining the state.
 
  • Like
Reactions: Xiao_Xi

Yebubbleman

macrumors 603
May 20, 2010
6,024
2,616
Los Angeles, CA
What are the odds that Apple will move away from integrated graphics for their Mac Pro and iMac Pro?

Put it this way: I wouldn't go to Vegas to gamble that such a thing will exist. The odds are extremely low.

1 big GPU card is better than 4 M2 Max fused together in the end.

Combining multiple AMD or NVIDIA GPUs also wasn’t very good using SLI or Crossfire. It is better to just have 1 big powerful one.
Honestly, the people wanting a traditional GPU in an Apple Silicon Mac Pro don't really get how Apple GPUs in Apple SoCs work. Because, if they did, they'd realize (a) it's not going to happen, (b) it doesn't NEED to happen in order to still achieve insane amounts of GPU performance, and (c) they're still thinking about Apple Silicon Macs like they're Intel Macs despite the fact that the system architecture (beyond the processor architecture) is completely different.
 

senttoschool

macrumors 68030
Nov 2, 2017
2,626
5,482
What are the odds that Apple will move away from integrated graphics for their Mac Pro and iMac Pro?

1 big GPU card is better than 4 M2 Max fused together in the end.

Combining multiple AMD or NVIDIA GPUs also wasn’t very good using SLI or Crossfire. It is better to just have 1 big powerful one.
The future of the GPU is exactly how it’s done in the M1 Ultra. MCM approach. It’s uneconomical to keep producing bigger and bigger GPUs on a single die due to defects.

AMD is already there with RDNA3. Nvidia will use an MCM approach too. Apple has done it with the M1 Ultra.

The reason SLI and Crossfire sucked was because those techniques still presented the OS with 2 GPUs instead of one. The holy grail is to connect multiple GPUs together so that the system only sees one GPU. This means software does not have to be written manually to account for multiple GPUs, unlike with SLI or Crossfire.
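
To illustrate the difference, a minimal sketch: on macOS an application can enumerate every GPU Metal exposes. A classic SLI/Crossfire setup surfaces as multiple devices the software has to split work across itself, while an M1 Ultra, despite containing two dies, shows up as a single device.

```swift
import Metal

// List every GPU the system exposes to applications. An M1 Ultra prints a
// single entry even though the package contains two dies; a classic
// multi-GPU setup would print one entry per card for the software to juggle.
for device in MTLCopyAllDevices() {
    print(device.name, "- unified memory:", device.hasUnifiedMemory)
}
```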
 
Last edited:

Boil

macrumors 68040
Oct 23, 2018
3,477
3,173
Stargate Command
So over in this thread...

For a useful Mac Pro, not getting the ASi M3 doesn't represent a huge loss, except maybe if it includes dedicated ray tracing (the ASi matrix coprocessor does decent RT, but nothing close to a dedicated ASIC).

For developers it's not an issue either, as according to multiple sources the ASi Mac Pro will support compute/rendering on AMD dGPUs (native RT support), at least for the PCIe version. For those hoping for a cheap GPU, an ASi dedicated GPU will likely join the party, but at most it will be 30%-60% (duo/quad) as powerful as a single AMD RX 7900 XTX.

Maybe the ASi dedicated GPU(s) offer an Apple silicon way to get hardware ray-tracing into the M2 Ultra/Extreme Mac Pro...?
 

Xiao_Xi

macrumors 68000
Oct 27, 2021
1,627
1,101
While we wait for Apple to unveil a GPU with hardware-based ray tracing, ChipsAndCheese has written about the tricks that Nvidia and AMD use for their hardware-based ray-tracing solutions. I wonder what tricks Apple will use.
 
  • Like
Reactions: jujoje

leman

macrumors Core
Oct 14, 2008
19,521
19,674
While we wait for Apple to unveil a GPU with hardware-based ray tracing, ChipsAndCheese has written about the tricks that Nvidia and AMD use for their hardware-based ray-tracing solutions. I wonder what tricks Apple will use.

We can actually make a fairly good guess based on the patents published last year. A very rough summary:

- specialised parallel ray-tracing coprocessor that traverses the BVH and performs testing using limited precision to minimise power usage and required die area (this idea seems to be borrowed from IMG); these results might include false positives and have to be checked again using regular precision shaders

- the ray intersection hardware sorts the results and launches a compute kernel that will re-check the results and perform the shading

- the BVH uses a flexible layout and can encode many-to-many spatial relationships; from a cursory glance, the idea behind all this is to make the tree wider

The result of all this is an asynchronous system involving specialised hardware units and general-purpose units which feed each other with work. All of this is designed to improve hardware utilisation and reduce energy usage. You have advanced techniques like adaptive-precision computation, ray compacting, thread reordering with just-in-time SIMD group issue, on-GPU BVH generation, etc.

I'm very curious to see how this system will behave. The big issue with ray tracing is that it is inherently work and cache inefficient. AMD just brute-forces their way through this in a very lazy way and hopes that their cache hierarchy can amortise most of it. Nvidia does something much more sophisticated (but I am still not very clear what exactly). Apple's approach seems like it could significantly improve the efficiency, at least compared to AMD.
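
A rough CPU-side sketch of the limited-precision idea described above (illustrative only, not Apple's actual hardware algorithm): box bounds are coarsened conservatively, so the cheap test can produce false positives but never misses a real hit, and the survivors are then re-checked at full precision.

```swift
struct AABB {
    var lo: SIMD3<Float>
    var hi: SIMD3<Float>
}

// Coarsen a box by snapping its bounds outward to a coarse grid. This stands in
// for the limited-precision representation: the coarse box always encloses the
// exact one, so the cheap test may report false positives but no false negatives.
func coarsen(_ box: AABB, step: Float) -> AABB {
    AABB(lo: (box.lo / step).rounded(.down) * step,
         hi: (box.hi / step).rounded(.up) * step)
}

// Standard ray/box slab test.
func hit(_ box: AABB, origin: SIMD3<Float>, invDir: SIMD3<Float>) -> Bool {
    let t0 = (box.lo - origin) * invDir
    let t1 = (box.hi - origin) * invDir
    let tmin = pointwiseMin(t0, t1)
    let tmax = pointwiseMax(t0, t1)
    return max(tmin.x, tmin.y, tmin.z) <= min(tmax.x, tmax.y, tmax.z)
}

let boxes = [AABB(lo: [0, 0, 0], hi: [1, 1, 1]),
             AABB(lo: [5, 2, 2], hi: [6, 3, 3])]
let origin = SIMD3<Float>(-1, 0.5, 0.5)
let invDir = SIMD3<Float>(repeating: 1) / SIMD3<Float>(1, 0, 0)   // ray along +x

// Cheap pass: limited-precision test, may keep boxes the ray actually misses.
let candidates = boxes.enumerated().filter {
    hit(coarsen($0.element, step: 0.25), origin: origin, invDir: invDir)
}
// Precise pass: re-check the surviving candidates at full precision.
let confirmed = candidates.filter { hit($0.element, origin: origin, invDir: invDir) }
print("candidates:", candidates.map { $0.offset }, "confirmed:", confirmed.map { $0.offset })
```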
 
  • Like
Reactions: Xiao_Xi

leman

macrumors Core
Oct 14, 2008
19,521
19,674
AMD is already there with RDNA3.

What's interesting is that AMD still uses a monolithic GPU die, but the cache and memory controllers are on separate dies. It's really a technique to optimise manufacturing costs, more than anything. Cache memory density doesn't scale well with newer nodes, so they can make the cache blocks using a cheaper node and just bridge them with the main die. It's a good use of the technology if power consumption is of secondary concern.

The reason SLI and Crossfire sucked was because those techniques still presented the OS with 2 GPUs instead of one. The holy grail is to connect multiple GPUs together so that the system only sees one GPU. This means software does not have to be written manually to account for multiple GPUs, unlike with SLI or Crossfire.

All of this really boils down to the speed of communication between the processing blocks. People often tend to gloss over this, but GPUs already contain dozens of clusters. The only difference is that it's easier to make a fast communication bus when these clusters are on the same die than when they are on separate dies. So you end up with a system that has NUMA characteristics, and the best way to deal with that is to make the software aware of these things.
 

Xiao_Xi

macrumors 68000
Oct 27, 2021
1,627
1,101
Does the Apple GPU have any advantage over PC GPUs when a game/application uses tiled deferred rendering?
 

leman

macrumors Core
Oct 14, 2008
19,521
19,674
Does the Apple GPU have any advantage over PC GPUs when a game/application uses tiled deferred rendering?

Not 100% sure what your question is, so I'll try to reply to it in a way that seems most relevant. Apple GPUs perform pixel processing as operations on shared memory, and Apple fully expose this fact to the software, permitting you to do complex processing on the contents of all pixels within a tile. In some situations, this can enable simpler and more performant algorithms.
 

Xiao_Xi

macrumors 68000
Oct 27, 2021
1,627
1,101
Not 100% sure what your question is
I read a post about a game that is going to increase its FPS because it is going to use tiled deferred shading.
https://store.steampowered.com/news/app/780310?emclan=103582791464180675&emgid=3722829157005911102

And now I'm wondering if that game would increase its performance more on Apple's GPU because Apple's GPU uses a tiled-based deferred rendering architecture.
 

leman

macrumors Core
Oct 14, 2008
19,521
19,674
I read a post about a game that is going to increase its FPS because it is going to use tiled deferred shading.
https://store.steampowered.com/news/app/780310?emclan=103582791464180675&emgid=3722829157005911102

And now I'm wondering if that game would increase its performance more on Apple's GPU because Apple's GPU uses a tiled-based deferred rendering architecture.

Tiled deferred shading is orthogonal to TBDR; despite the similar names, they are different things that have nothing to do with each other. But yes, the deferred shading method they describe in the blog post can most likely be implemented much more efficiently on Apple GPUs, as they can eliminate multiple passes and memory copies required on desktop architectures.
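
For instance, a minimal host-side Swift sketch of what that looks like on Apple GPUs (the attachment formats and layout are illustrative assumptions): the G-buffer attachments are declared memoryless, so they live entirely in tile memory for the duration of a single render pass and are never written out to or read back from RAM.

```swift
import Metal

let device = MTLCreateSystemDefaultDevice()!
let width = 1920, height = 1080

// G-buffer attachments that never leave on-chip tile memory.
func makeMemorylessTarget(_ format: MTLPixelFormat) -> MTLTexture {
    let desc = MTLTextureDescriptor.texture2DDescriptor(pixelFormat: format,
                                                        width: width, height: height,
                                                        mipmapped: false)
    desc.storageMode = .memoryless      // no backing allocation in RAM
    desc.usage = .renderTarget
    return device.makeTexture(descriptor: desc)!
}

let albedo = makeMemorylessTarget(.rgba8Unorm)
let normals = makeMemorylessTarget(.rgba16Float)

// One render pass: the geometry stage fills the G-buffer, the lighting stage
// reads it back from tile memory, and only the final lit color is stored.
let pass = MTLRenderPassDescriptor()
pass.colorAttachments[0].loadAction = .clear      // final color; texture comes from a drawable elsewhere
pass.colorAttachments[0].storeAction = .store
pass.colorAttachments[1].texture = albedo
pass.colorAttachments[1].loadAction = .dontCare
pass.colorAttachments[1].storeAction = .dontCare  // tile-only: never written to memory
pass.colorAttachments[2].texture = normals
pass.colorAttachments[2].loadAction = .dontCare
pass.colorAttachments[2].storeAction = .dontCare
```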

But from what I see this game doesn't have a macOS port, so all of this remains purely academic.
 
  • Like
Reactions: Xiao_Xi