Apple dedicated GPU

theorist9 · Nov 24, 2022

leman said:
I think the reasonable possibilities for the Mac pro GPU is one of the following:

- a single SoC with a big GPU (e.g. four dies, around 40-50k FP32 ALUs). That’s the easiest option for Apple and it won’t really be able to challenge any of the high-end multi-GPU systems

- a single SoC with a VERY BIG GPU (multiple GPU-only tiles, a lot of cores). Very expensive, very custom, very big, but still uses the same programming model as any other model

- multiple SoCs on separate compute boards, connected via some sort of PCIe-facilitated bus (maybe cache-coherent CXL), maybe with a shared pool of traditional RAM. This is something I’ve been thinking about for a while, as this approach would solve the issues with modularity and expandability. But it will require a new programming model that can efficiently use non-local compute clusters.

Here's an alternate way to get increased GPU power: Instead of adding separate GPU-only dies to the SoC, they could have multiple dies with the current design (CPU and GPU on the same die), but with a much higher ratio of GPU cores : CPU cores.

E.g., instead of the 2.4 GPU : 1 CPU ratio they have now, what about 10 GPU : 1 CPU? If they used enough dies to give them 40 CPU cores (i.e., the same number of CPU cores as a current 2 x Ultra), and increased the TFLOPS/GPU core by 20% over the M1's, that would give them ~200 TFLOPS, i.e., about twice what people are estimating for the future 4090 Ti. And if they also offered a 4 x Ultra with 80 CPU cores, that would give them ~ 4 x 4090 Ti. Those would be killer machines.

Would this benefit them by maintaining close "local" CPU-GPU integration (local to each die), or hurt them by scattering the CPU cores over many different dies, rather than a few?

If they wanted to make it modular, they could offer 20 CPU cores/SoC, with the machine taking, say, up to four of these. And if they wanted to make it more customizable, they could offer SoCs with, say, two different ratios of GPU cores : CPU cores, catering to both CPU-heavy and GPU-heavy workloads.
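The throughput arithmetic in the post can be reproduced in a few lines. This is a sketch under assumed inputs: per-core throughput is derived from Apple's quoted ~21 TFLOPS for the 64-core M1 Ultra GPU (about 0.33 TFLOPS per core), while the 10:1 ratio and the 20% uplift come from the post. With those inputs the 40-CPU-core case lands near 158 TFLOPS, so the ~200 TFLOPS figure presumably rests on a somewhat higher per-core baseline.

```python
# Back-of-envelope FP32 estimate for the hypothetical "10 GPU : 1 CPU" design.
# Assumption: per-core throughput is derived from Apple's quoted ~21 TFLOPS
# for the 64-core M1 Ultra GPU; the 20% uplift is the figure from the post.

M1_ULTRA_TFLOPS = 21.0
M1_ULTRA_GPU_CORES = 64


def estimated_tflops(cpu_cores: int, gpu_per_cpu: float, per_core_uplift: float) -> float:
    """Total FP32 TFLOPS for a given CPU core count and GPU:CPU core ratio."""
    tflops_per_core = M1_ULTRA_TFLOPS / M1_ULTRA_GPU_CORES  # ~0.33 TFLOPS per M1 GPU core
    gpu_cores = cpu_cores * gpu_per_cpu
    return gpu_cores * tflops_per_core * (1.0 + per_core_uplift)


for cpu in (40, 80):
    print(f"{cpu} CPU cores at 10:1 -> ~{estimated_tflops(cpu, 10, 0.20):.0f} TFLOPS")
```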

diamond.g · Nov 30, 2022

theorist9 said:
Here's an alternate way to get increased GPU power: Instead of adding separate GPU-only dies to the SoC, they could have multiple dies with the current design (CPU and GPU on the same die), but with a much higher ratio of GPU cores : CPU cores.

E.g., instead of the 2.4 GPU : 1 CPU ratio they have now, what about 10 GPU : 1 CPU? If they used enough dies to give them 40 CPU cores (i.e., the same number of CPU cores as a current 2 x Ultra), and increased the TFLOPS/GPU core by 20% over the M1's, that would give them ~200 TFLOPS, i.e., about twice what people are estimating for the future 4090 Ti. And if they also offered a 4 x Ultra with 80 CPU cores, that would give them ~ 4 x 4090 Ti. Those would be killer machines.

Would this benefit them by maintaining close "local" CPU-GPU integration (local to each die), or hurt them by scattering the CPU cores over many different dies, rather than a few?

If they wanted to make it modular, they could offer 20 CPU cores/SoC, with the machine taking, say, up to four of these. And if they wanted to make it more customizable, they could offer SoCs with, say, two different ratios of GPU cores : CPU cores, catering to both CPU-heavy and GPU-heavy workloads.

At 5 nm, are Apple's ALUs smaller than Nvidia's/AMD's at the "same size"?

theorist9 · Dec 2, 2022

diamond.g said:
At 5 nm, are Apple's ALUs smaller than Nvidia's/AMD's at the "same size"?

Don't know -- you could estimate those values from the die sizes plus Locuza's annotated die shots on semianalysis.com, e.g.:

Apple M2 Die Shot and Architecture Analysis – Big Cost Increase And A15 Based IP (www.semianalysis.com)

DeepIn2U · Dec 9, 2022

leman said:
I think you are kind of missing the joke of the post you have quoted

That said, “dedicated” these days is mostly an emotional label. Folks use it as a synonym for “fast” or “powerful”. Well, Apple GPUs are plenty fast. We should just retire the term altogether. For the purpose of this thread it makes more sense to talk about a “modular” or “swappable” GPU, because that’s what people really mean.

I’m not so sure we should retire the term just yet, lol

tomO2013 · Dec 9, 2022

I’m sharing this interview with a panel of architects at Imagination Technologies (the IP provider for much of Apple’s GPU).
They were major contributors back in the day (if anybody remembers 3dfx vs. PowerVR, OpenGL, the Glide library, etc., these guys were and still are key contributors in the graphics scene). It’s a nostalgic look back at Imagination Tech’s contributions over 30 years, plus commentary on the future of graphics.

What’s especially interesting is the part around 12:30 in the video (shared above): the commentary on the future of the GPU in the industry at large and the drive towards efficiency. Earlier they gave an example of their mobile silicon rendering Nvidia’s own ray tracing demo at double the frame rate of the Nvidia desktop card, at a fraction of the power, on the same scene (this goes back to a 2019 demo) [8:17 in the video]. These guys have a history focused on power efficiency. Their current CXT technology is specifically designed for scalable architectures, from mobile up to high-density compute data centers.

I’m convinced that Apple will utilize a derivative of their ‘Photon’ ray tracing technology in a future M2 Bodacious or M3 Cowabunga solution.

IMG CXT GPU - Imagination (www.imaginationtech.com)

Enjoy

Edit: added a link to this detailed white paper on their ray tracing architecture.

PowerVR Photon - Imagination (www.imaginationtech.com)

leman · Dec 9, 2022

tomO2013 said:
I’m convinced that Apple will utilize their ‘Photon’ ray tracing derivative technology in a future M2 Bodacious or M3 Cowabunga solution.

A while ago I linked a bunch of Apple's ray tracing patents, which are loosely based on IMG technology but take things a bit further. They describe a very energy- and area-efficient way to do ray tracing.

Xiao_Xi · Mar 21, 2023

Metal supports both explicit sync and implicit sync for some reason

Paving the Road to Vulkan on Asahi Linux - Asahi Linux (asahilinux.org)

Why does Metal support both? Isn't explicit sync superior to implicit sync?

mr_roboto · Mar 21, 2023

Xiao_Xi said:
Paving the Road to Vulkan on Asahi Linux - Asahi Linux (asahilinux.org)

Why does Metal support both? Isn't explicit sync superior to implicit sync?

Presumably it's there to give developers options. If you're porting code written for 3D APIs which relies on implicit sync, having it available makes the initial porting job easier.

leman · Mar 21, 2023

Xiao_Xi said:
Paving the Road to Vulkan on Asahi Linux - Asahi Linux (asahilinux.org)

Why does Metal support both? Isn't explicit sync superior to implicit sync?

Developer convenience. Metal is designed to scale with your needs. It can do some tedious low-level plumbing stuff for you (like resource tracking, synchronisation, and memory management) or let you do it yourself. For simpler applications the provided automatic tracking is absolutely sufficient and can save you a lot of error-prone work. And as @mr_roboto says, it can help with porting code from older APIs.

Pressure · Mar 22, 2023

I'm really excited for Asahi Linux and the progress they have made with the GPU drivers!

It will open up gaming on Apple Silicon considerably, with support for Vulkan and Steam/Proton.

Apple should really be doing more than they have been, and Microsoft should also get its act together on AArch64 support. The hardware is capable.

Xiao_Xi · Mar 22, 2023

Pressure said:
It will be opening up gaming on Apple Silicon so much with support for Vulkan and Steam / Proton.

It would be ironic if the "best" way to play games on macOS was to have to boot Linux to play Windows games.

mr_roboto said:
Presumably it's there to give developers options. If you're porting code written for 3D APIs which relies on implicit sync, having it available makes the initial porting job easier.

Is it possible that Metal needs implicit sync due to the OpenGL to Metal translation layer?

leman · Mar 22, 2023

Xiao_Xi said:
Is it possible that Metal needs implicit sync due to the OpenGL to Metal translation layer?

Hardly. In the OpenGL supported by Apple, all bindings are explicit, so it's easy to track dependencies. The translation layer/driver should therefore be able to insert the appropriate fences just by examining the bound state.
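The idea of deriving fences purely from declared bindings can be illustrated with a toy model. This is a hypothetical Python sketch, not Apple's driver and not a Metal API: the `Pass` class and `insert_fences` function are invented for illustration. It walks passes in submission order and records which earlier passes each one must wait on, covering read-after-write, write-after-write and write-after-read hazards. In Metal itself, explicit synchronisation is expressed with objects such as `MTLFence` and `MTLEvent`.

```python
from dataclasses import dataclass, field


@dataclass
class Pass:
    """A GPU pass with its explicitly bound resources (hypothetical model)."""
    name: str
    reads: set[str] = field(default_factory=set)
    writes: set[str] = field(default_factory=set)


def insert_fences(passes: list[Pass]) -> dict[str, set[str]]:
    """Return, for each pass, the set of earlier passes it must wait on."""
    last_writer: dict[str, str] = {}
    readers_since_write: dict[str, set[str]] = {}
    waits: dict[str, set[str]] = {}

    for p in passes:
        deps: set[str] = set()
        for res in p.reads:
            if res in last_writer:                       # read-after-write
                deps.add(last_writer[res])
        for res in p.writes:
            if res in last_writer:                       # write-after-write
                deps.add(last_writer[res])
            deps |= readers_since_write.get(res, set())  # write-after-read
        deps.discard(p.name)
        waits[p.name] = deps

        # Update the tracking state once this pass is scheduled.
        for res in p.reads:
            readers_since_write.setdefault(res, set()).add(p.name)
        for res in p.writes:
            last_writer[res] = p.name
            readers_since_write[res] = set()
    return waits


frame = [
    Pass("shadow", writes={"shadow_map"}),
    Pass("gbuffer", writes={"albedo", "normals"}),
    Pass("lighting", reads={"albedo", "normals", "shadow_map"}, writes={"hdr"}),
    Pass("tonemap", reads={"hdr"}, writes={"backbuffer"}),
]

for name, deps in insert_fences(frame).items():
    print(name, "waits on", sorted(deps))
```

Because every dependency here is computable from the bindings alone, this kind of tracking needs no help from the application, which is the point being made about explicit bindings.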

Yebubbleman · Mar 23, 2023

Zest28 said:
What are the odds that Apple will move away from integrated graphics for their Mac Pro and iMac Pro?

Put it this way: I wouldn't go to Vegas to gamble that such a thing will exist. The odds are extremely low.

Zest28 said:
1 big GPU card is better than 4 M2 Max fused together in the end.

Combining multiple AMD or NVIDIA GPUs also wasn’t very good using SLI or Crossfire. It is better to just have 1 big powerful one.

Honestly, the people wanting a traditional GPU in an Apple Silicon Mac Pro don't really get how Apple GPUs in Apple SoCs work. If they did, they'd realize that (a) it's not going to happen, (b) it doesn't NEED to happen in order to still achieve insane amounts of GPU performance, and (c) they're still thinking about Apple Silicon Macs as if they were Intel Macs, despite the fact that the system architecture (beyond the processor architecture) is completely different.

spaz8 · Mar 24, 2023

If your definition of insane graphics is a GeForce GTX 1650 (M1 Ultra is actually slightly weaker).

leman · Mar 24, 2023

spaz8 said:
If your definition of insane graphics is a GeForce GTX 1650 (M1 Ultra is actually slightly weaker).

If by weaker you mean 7x faster, then yes.
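The "7x" presumably refers to peak FP32 throughput. A quick check, assuming the usual counting of one fused multiply-add as 2 FLOPs per ALU per clock, 896 CUDA cores at a ~1.665 GHz boost clock for the GTX 1650, and 64 × 128 FP32 ALUs for the M1 Ultra at ~1.278 GHz (that clock is an assumption chosen because it lines up with Apple's ~21 TFLOPS figure). Peak FLOPS is only a rough proxy for real-world graphics performance.

```python
def fp32_tflops(alus: int, clock_ghz: float) -> float:
    """Peak FP32 TFLOPS, counting one FMA as 2 FLOPs per ALU per clock."""
    return alus * 2 * clock_ghz / 1000.0


gtx_1650 = fp32_tflops(896, 1.665)        # 896 CUDA cores, ~1665 MHz boost
m1_ultra = fp32_tflops(64 * 128, 1.278)   # 64 GPU cores x 128 ALUs, assumed clock

print(f"GTX 1650: {gtx_1650:.2f} TFLOPS")
print(f"M1 Ultra: {m1_ultra:.2f} TFLOPS")
print(f"ratio:    {m1_ultra / gtx_1650:.1f}x")
```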

senttoschool · Mar 24, 2023

Zest28 said:
What are the odds that Apple will move away from integrated graphics for their Mac Pro and iMac Pro?

1 big GPU card is better than 4 M2 Max fused together in the end.

Combining multiple AMD or NVIDIA GPUs also wasn’t very good using SLI or Crossfire. It is better to just have 1 big powerful one.

The future of the GPU is exactly how it’s done in the M1 Ultra: the MCM (multi-chip module) approach. It’s uneconomical to keep producing bigger and bigger GPUs on a single die, because the larger the die, the more likely it is to contain a defect.
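The defect argument can be made concrete with the textbook Poisson yield model, Y = exp(-D0 · A). This is an illustrative sketch: the defect density of 0.1 defects/cm² and the 600 mm² total GPU area are assumed numbers, not figures for any real product, and the model ignores packaging cost, interconnect area overhead and wafer-edge losses. It assumes chiplets are tested individually before packaging, so only known-good dies get combined.

```python
import math


def poisson_yield(die_area_mm2: float, defects_per_cm2: float) -> float:
    """Fraction of dies with zero defects: Y = exp(-D0 * A)."""
    return math.exp(-defects_per_cm2 * die_area_mm2 / 100.0)  # mm^2 -> cm^2


def silicon_per_good_gpu(total_area_mm2: float, n_chiplets: int, d0: float) -> float:
    """Silicon area (mm^2) consumed per working GPU when its area is split into
    n equal chiplets that are tested individually before packaging."""
    chiplet_yield = poisson_yield(total_area_mm2 / n_chiplets, d0)
    return total_area_mm2 / chiplet_yield


D0 = 0.1  # assumed defect density, defects per cm^2
for n in (1, 2, 4):
    print(f"{n} die(s): ~{silicon_per_good_gpu(600, n, D0):.0f} mm^2 of wafer per good GPU")
```

Splitting the same total area into smaller dies raises per-die yield, so less wafer area is thrown away per working GPU.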

AMD is already there with RDNA3. Nvidia will use an MCM approach too. Apple has done it with M1 Ultra.

The reason SLI and Crossfire sucked was because those techniques still presented the OS with 2 GPUs instead of one. The holy grail is to connect multiple GPUs together so that the system only sees one GPU. This means software does not have to be written manually to account for multiple GPUs, unlike SLI or Crossfire.

Boil · Mar 24, 2023

So over in this thread...

For a useful Mac Pro, not getting the ASi M3 isn't a huge loss, except maybe if the M3 includes dedicated ray tracing (the ASi matrix coprocessor does good RT, but it's not close to a dedicated ASIC).

For developers it's not an issue either, since according to multiple sources the ASi Mac Pro will support compute/rendering on AMD dGPUs (with native RT support), at least in the PCIe version. For those hoping for a cheap GPU, an ASi dedicated GPU is likely joining the party, but at most it will be 30%-60% (duo/quad) as powerful as a single AMD RX 7900 XTX.

Maybe the ASi dedicated GPU(s) offer an Apple silicon way to get hardware ray-tracing into the M2 Ultra/Extreme Mac Pro...?

Xiao_Xi · Mar 24, 2023

While we wait for Apple to unveil its GPU with hardware ray tracing, ChipsAndCheese has written about the tricks that Nvidia and AMD use for their hardware ray tracing solutions. I wonder what tricks Apple will use.

Raytracing on AMD’s RDNA 2/3, and Nvidia’s Turing and Pascal (chipsandcheese.com)

leman · Mar 25, 2023

Xiao_Xi said:
While we wait for Apple to unveil its hardware-based ray-traced GPU, ChipsAndCheese has written about the tricks that Nvidia and AMD use for their hardware-based ray-tracing solution. I wonder what tricks Apple will use.

We can actually make a fairly good guess based on the patents published last year. A very rough summary:

- specialised parallel ray-tracing coprocessor that traverses the BVH and performs intersection testing using limited precision to minimise power usage and required die area (this idea seems to be borrowed from IMG); these results might include false positives and have to be checked again using regular-precision shaders

- the ray intersection hardware sorts the results and launches a compute kernel that will re-check the results and perform the shading

- the BVH uses a flexible layout and can encode many-to-many spatial relationships; from a cursory glance, the idea behind all this is to make the tree wider
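The "limited precision first, re-check later" idea from the first bullet can be illustrated with a toy model. This is not Apple's design, only a sketch under stated assumptions: low precision is imitated by snapping box bounds outward onto a coarse power-of-two grid (similar in spirit to the quantized bounds used in compressed BVH formats), so the snapped box always contains the real one. The coarse test can therefore report false positives but never miss a real hit, and only the survivors are re-tested at full precision. The function names are hypothetical, and the sketch assumes ray directions have no zero components.

```python
import math

Vec3 = tuple[float, float, float]


def ray_hits_box(origin: Vec3, inv_dir: Vec3, lo: Vec3, hi: Vec3) -> bool:
    """Standard slab test against an axis-aligned box (inv_dir = 1/direction)."""
    t_near, t_far = 0.0, math.inf
    for o, inv, a, b in zip(origin, inv_dir, lo, hi):
        t1, t2 = (a - o) * inv, (b - o) * inv
        t_near = max(t_near, min(t1, t2))
        t_far = min(t_far, max(t1, t2))
    return t_near <= t_far


def quantize_outward(lo: Vec3, hi: Vec3, step: float) -> tuple[Vec3, Vec3]:
    """Snap bounds outward onto a grid; with a power-of-two step the snapping is
    exact in binary floating point, so the result always contains the input box."""
    qlo = tuple(math.floor(v / step) * step for v in lo)
    qhi = tuple(math.ceil(v / step) * step for v in hi)
    return qlo, qhi


def two_stage_test(origin: Vec3, direction: Vec3, lo: Vec3, hi: Vec3,
                   step: float = 0.25) -> tuple[bool, bool]:
    """Return (coarse_hit, exact_hit). A coarse miss is a definite miss."""
    inv_dir = tuple(1.0 / d for d in direction)  # assumes no zero components
    qlo, qhi = quantize_outward(lo, hi, step)
    if not ray_hits_box(origin, inv_dir, qlo, qhi):
        return False, False
    return True, ray_hits_box(origin, inv_dir, lo, hi)


box_lo, box_hi = (1.1, 1.1, 1.1), (1.9, 1.9, 1.9)
print(two_stage_test((0.0, 1.5, 1.5), (1.0, 0.001, 0.001), box_lo, box_hi))
print(two_stage_test((0.0, 1.95, 1.5), (1.0, 0.001, 0.001), box_lo, box_hi))
```

The second ray passes the coarse test but fails the exact one, which is the kind of false positive the patents describe as being filtered out later by regular shaders.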

The result of all this is an asynchronous system involving specialised hardware units and general-purpose units which feed each other with work. All of this is designed to improve hardware utilisation and reduce energy usage. You have advanced techniques like adaptive precision computation, ray compacting, thread reordering with just-in-time SIMD group issue, on-GPU BVH generation, etc.

I'm very curious to see how this system will behave. The big issue with ray tracing is that it is inherently work- and cache-inefficient. AMD just brute-forces their way through this in a very lazy way and hopes that their cache hierarchy can amortise most of it. Nvidia does something much more sophisticated (though I am still not very clear on what exactly). Apple's approach seems like it could significantly improve the efficiency, at least compared to AMD.
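Why sorting hits before launching shading work helps can be shown with a simplified cost model. This is a hypothetical illustration, not how Apple's hardware schedules work: it assumes each hit needs one of several shaders, that hits are packed into 32-wide SIMD groups (the SIMD-group width of Apple GPUs), and that a group containing k different shaders effectively has to run k times.

```python
from collections import defaultdict

SIMD_WIDTH = 32  # threads per SIMD group on Apple GPUs


def issue_in_order(hits):
    """Pack (ray_id, shader_id) hits into SIMD groups in arrival order."""
    return [hits[i:i + SIMD_WIDTH] for i in range(0, len(hits), SIMD_WIDTH)]


def issue_sorted(hits):
    """Bucket hits by shader first, then pack each bucket into SIMD groups,
    so every group runs a single shader."""
    buckets = defaultdict(list)
    for hit in hits:
        buckets[hit[1]].append(hit)
    groups = []
    for shader_id in sorted(buckets):
        b = buckets[shader_id]
        groups.extend(b[i:i + SIMD_WIDTH] for i in range(0, len(b), SIMD_WIDTH))
    return groups


def divergent_passes(groups):
    """Total passes if a group with k distinct shaders must execute k times."""
    return sum(len({shader for _, shader in g}) for g in groups)


hits = [(ray, ray % 4) for ray in range(128)]  # 128 hits cycling through 4 shaders
print("in order:", divergent_passes(issue_in_order(hits)), "passes")
print("sorted:  ", divergent_passes(issue_sorted(hits)), "passes")
```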

leman · Mar 25, 2023

senttoschool said:
AMD is already there with RDNA3.

What's interesting is that AMD still uses a monolithic GPU die, but the cache and memory controllers are on separate dies. It's really a technique to optimise manufacturing costs more than anything. Cache memory density doesn't scale well with newer nodes, so they can make the cache blocks on a cheaper node and just bridge them to the main die. It's a good use of the technology if power consumption is of secondary concern.

senttoschool said:
The reason SLI and Crossfire sucked was because those techniques still presented the OS with 2 GPUs instead of one. The holy grail is to connect multiple GPUs together so that the system only sees one GPU. This means software does not have to be written manually to account for multiple GPUs, unlike SLI or Crossfire.

All of this really boils down to the speed of communication between the processing blocks. People often tend to gloss over this, but GPUs already contain dozens of clusters. The only difference is that it's easier to make a fast communication bus if these clusters are on the same die than when they are on separate dies. So you end up with a system that has NUMA characteristics, and the best way to deal with that is to make the software aware of them.
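A toy model of why NUMA-aware work placement matters. This is a hypothetical simplification: a 16 × 16 grid of work tiles, each exchanging data with its four neighbours, split across two dies. Counting neighbouring pairs that end up on different dies gives a crude proxy for cross-die traffic, and it compares a placement that ignores die boundaries with one that keeps each die's tiles contiguous.

```python
W, H, DIES = 16, 16, 2  # assumed grid size and die count


def interleaved(x: int, y: int) -> int:
    """NUMA-oblivious round-robin placement of tiles onto dies."""
    return (x + y * W) % DIES


def banded(x: int, y: int) -> int:
    """NUMA-aware placement: each die gets one contiguous band of rows."""
    return y * DIES // H


def cross_die_edges(width: int, height: int, assign) -> int:
    """Count neighbouring tile pairs whose tiles live on different dies."""
    count = 0
    for y in range(height):
        for x in range(width):
            here = assign(x, y)
            if x + 1 < width and assign(x + 1, y) != here:
                count += 1
            if y + 1 < height and assign(x, y + 1) != here:
                count += 1
    return count


print("interleaved:", cross_die_edges(W, H, interleaved), "cross-die pairs")
print("banded:     ", cross_die_edges(W, H, banded), "cross-die pairs")
```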

Xiao_Xi · Apr 9, 2023

Does the Apple GPU have any advantage over PC GPUs when a game/application uses tiled deferred rendering?

leman · Apr 9, 2023

Xiao_Xi said:
Does the Apple GPU have any advantage over PC GPUs when a game/application uses tiled deferred rendering?

Not 100% sure what your question is, so I'll try to reply to it in a way that seems most relevant. Apple GPUs perform pixel processing as operations on shared memory, and Apple fully expose this fact to the software, permitting you to do complex processing on the contents of all pixels within a tile. In some situations, this can enable simpler and more performant algorithms.

Xiao_Xi · Apr 9, 2023

leman said:
Not 100% sure what your question is

I read a post about a game that is going to increase its FPS because it is going to use tiled deferred shading.
https://store.steampowered.com/news/app/780310?emclan=103582791464180675&emgid=3722829157005911102

And now I'm wondering if that game would increase its performance more on Apple's GPU because Apple's GPU uses a tile-based deferred rendering architecture.
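For context on what "tiled deferred shading" usually means: the screen is split into small tiles, each tile gets a list of only the lights that can reach it, and the lighting pass for that tile loops over that short list instead of every light. The sketch below is a minimal 2D illustration under assumed parameters (16-pixel tiles, lights with a screen-space centre and radius, a circle-vs-rectangle overlap test); real implementations typically also use each tile's depth range to cull lights further.

```python
from dataclasses import dataclass

TILE = 16  # tile size in pixels (assumed)


@dataclass
class Light:
    x: float       # screen-space centre, in pixels
    y: float
    radius: float  # screen-space radius of influence, in pixels


def cull_lights(width: int, height: int, lights: list[Light]) -> dict[tuple[int, int], list[int]]:
    """Map each tile (tx, ty) to the indices of lights overlapping it."""
    tiles_x = (width + TILE - 1) // TILE
    tiles_y = (height + TILE - 1) // TILE
    result = {}
    for ty in range(tiles_y):
        for tx in range(tiles_x):
            x0, y0 = tx * TILE, ty * TILE
            x1, y1 = min(x0 + TILE, width), min(y0 + TILE, height)
            visible = []
            for i, light in enumerate(lights):
                # Closest point of the tile rectangle to the light centre.
                cx = min(max(light.x, x0), x1)
                cy = min(max(light.y, y0), y1)
                if (cx - light.x) ** 2 + (cy - light.y) ** 2 <= light.radius ** 2:
                    visible.append(i)
            result[(tx, ty)] = visible
    return result


tiles = cull_lights(64, 32, [Light(8, 8, 4), Light(60, 28, 6)])
print(sum(len(v) for v in tiles.values()), "light evaluations instead of", len(tiles) * 2)
```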

leman · Apr 9, 2023

Xiao_Xi said:
I read a post about a game that is going to increase its FPS because it is going to use tiled deferred shading.
https://store.steampowered.com/news/app/780310?emclan=103582791464180675&emgid=3722829157005911102

And now I'm wondering if that game would increase its performance more on Apple's GPU because Apple's GPU uses a tile-based deferred rendering architecture.

Tiled deferred shading is completely orthogonal to TBDR; they are different things that have nothing to do with each other. But yes, the deferred shading method they describe in the blog post can most likely be implemented much more efficiently on Apple GPUs, as they can eliminate the multiple passes and memory copies required on desktop architectures.
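A rough sense of the memory traffic involved. This sketch assumes a G-buffer of four 32-bit render targets (16 bytes per pixel) that a desktop GPU writes out to DRAM and reads back once per frame; the resolution, layout and frame rate are illustrative assumptions, not measurements of that game. On Apple GPUs, such intermediate attachments can be kept in on-chip tile memory (for example with Metal's memoryless storage mode), so much of this traffic can be avoided.

```python
def gbuffer_traffic_gb_per_s(width: int, height: int, bytes_per_pixel: int,
                             fps: int, round_trips: int = 1) -> float:
    """DRAM traffic (GB/s) for writing the G-buffer and reading it back once
    per round trip, every frame."""
    bytes_per_frame = width * height * bytes_per_pixel * 2 * round_trips  # write + read
    return bytes_per_frame * fps / 1e9


for w, h in ((1920, 1080), (3840, 2160)):
    print(f"{w}x{h} @ 60 fps: ~{gbuffer_traffic_gb_per_s(w, h, 16, 60):.1f} GB/s of G-buffer traffic")
```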

But from what I see this game doesn't have a macOS port, so all of this remains purely academic.

Xiao_Xi · Apr 21, 2023

Could the Nvidia RTX 4000 SFF be more efficient than Apple's GPU?

NVIDIA RTX 4000 SFF Ada Generation Graphics Card (www.nvidia.com)
