
leman

macrumors Core
Oct 14, 2008
19,521
19,674
P.S. The same reasoning applies to most dichotomies, be it in the technical domain (e.g. RISC vs. CISC) or the societal domain (e.g. left vs. right). Most of these dichotomous concepts are very simple and were created to refer to concrete things, but people use them in an abstract sense that usually breaks down when you look at the details. This not only renders these notions virtually useless for learning or understanding the actual topic, but also makes them highly susceptible to manipulation. That's why one always needs to look at the details. When you use notion X, what exactly do you mean and why do you consider it interesting?
 

deconstruct60

macrumors G5
Mar 10, 2009
12,493
4,053
P.S. The same reasoning applies to most dichotomies, be it in the technical domain (e.g. RISC vs. CISC) or the societal domain (e.g. left vs. right). Most of these dichotomous concepts are very simple and were created to refer to concrete things, but people use them in an abstract sense that usually breaks down when you look at the details.

'Discrete' vs 'integrated' would be a dichotomy scope/usage mismatch. 'Dedicated' vs 'integrated' is a flawed dichotomy from the start. The central component of a GPU is solely dedicated to being a GPU in either one. 'Dedicated' points far more at the VRAM subsystem than at the central GPU component. Perhaps the word 'dedicated' translates more easily into multiple languages, but it isn't a particularly good fit in the first place.

Discrete connotes that it is a separate chip in a separate package with its own RAM directly attached. It is not necessarily a 'card'.

CXL shared cache coherence across a standard PCIe slot bus really doesn't turn a discrete GPU into an integrated one. So unified (coherent) memory at the application level really doesn't make a material difference to the dichotomy either.

The problem in many of these threads is that folks throw in misplaced euphemisms. For example, "GPU on a PCIe standard-format add-in card" == "dedicated GPU". That is more a mishmash of concepts than a dichotomy problem. People aren't even agreeing on the core semantics/meanings of words. Grossly butchered and/or conflicting semantics lead to communication problems regardless of whether a dichotomy is present or not.

The folks talking about "removable cards" should be calling it a 'modular/removable' GPU, not 'dedicated'. Those are really two different adjectives. But if one wants to push an agenda into what is really a separate topic, it is convenient to mash up the adjectives as though they were truly synonyms (when they really are not).


When Intel releases their CPU tile + GPU tile + SoC tile inside a single package on Gen 14 (Meteor Lake), that GPU tile is not going to become a dGPU just because it is fabricated in a different factory. Nor should 'dGPU' be stripped away from the GPU package on a card that has both display and USB sockets on it (put a USB Type-C socket and an HDMI socket on the same card and there is no longer a singular function there, GPU only, anymore).

There is a quote attributed to Einstein: "Everything should be made as simple as possible, but not simpler."

The problem with this dedicated vs integrated realm is that it often drifts into the "too simple" range, where the RAM is the only determiner of the difference, or the chip die is the only one, etc. It is really multiple criteria combined (essentially a 3D, or 4D, issue being cast as a 2D issue).
 
  • Like
Reactions: leman

deconstruct60

macrumors G5
Mar 10, 2009
12,493
4,053
We use dGPU to mean a card, typically non-unified memory, on PCIe. By contrast, the M1 has an iGPU, meaning integrated rather than dedicated.


That is not really what dGPU means. A 2019 MBP 16" with an AMD Radeon Pro 5600M with 8GB of HBM2 memory is a discrete GPU even though it is soldered to the same motherboard as the Intel CPU. Same for the iMac Pro 2017 and its Vega dGPU.

'Dedicated', as in serving a single, specific purpose, isn't really a good way to separate these out. The GPU cores in an M1 are just as dedicated to serving the GPU function as if they were on a separate die. Are the GPU cores on the 'second' Max die of an M1 Ultra not integrated, even though they could have come from a completely different wafer?

A real discrete GPU has to do with the packaging as well as the RAM, as well as some memory cache coherency, segmentation, and addressing factors. It is a combination of more than two factors that truly outlines the different camps.

Not whether it is a card or not. That is end-user-replaceable modularity, not discreteness or dedication.
 

scottrichardson

macrumors 6502a
Jul 10, 2007
716
293
Ulladulla, NSW Australia
Apple already builds their own dedicated GPU cards in the form of the afterburner card. A very small niche within a niche product, which ultimately led the way to the media encoders you see in the M1 Pro/Max etc.

My thought is that we may end up with the ENTIRE chipset being a card that you can slide in and out of your new Apple Silicon Mac Pro. So if you want to upgrade your Mac Pro, you switch out the M2 Max card for the M2 Ultra Card. This would certainly meet the 'user upgradeable' notion, while also being a solid money maker for Apple. You keep the chassis, but you can swap the CPU/GPU as a single card (or whatever shape/name it has).

I imagine it will be a 3-slot machine. One slot for the custom Apple Silicon chipset, one slot for the secondary RAM banks, and a third PCIe slot for custom I/O, etc.
 

quarkysg

macrumors 65816
Oct 12, 2019
1,247
841
Apple already builds their own dedicated GPU cards in the form of the afterburner card. A very small niche within a niche product, which ultimately led the way to the media encoders you see in the M1 Pro/Max etc.

My thought is that we may end up with the ENTIRE chipset being a card that you can slide in and out of your new Apple Silicon Mac Pro. So if you want to upgrade your Mac Pro, you switch out the M2 Max card for the M2 Ultra Card. This would certainly meet the 'user upgradeable' notion, while also being a solid money maker for Apple. You keep the chassis, but you can swap the CPU/GPU as a single card (or whatever shape/name it has).

I imagine it will be a 3-slot machine. One slot for the custom Apple Silicon chipset, one slot for the secondary RAM banks, and a third PCIe slot for custom I/O, etc.
Highly unlikely for the SoC to sit in one card with memory in another card. That would absolutely negate the bandwidth and latency advantage of the existing AS Macs.

IMHO, at most the AS Mac Pro will have DDR5 DIMM slots and take a latency hit, but they will have to interleave the hell out of the slots, and have many of them, to make up the bandwidth deficit relative to the LPDDR5 modules.
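
For a rough sense of the scale of that deficit, a back-of-the-envelope sketch (the figures below are assumptions for illustration: roughly 800 GB/s for an Ultra-class package and plain DDR5-4800 DIMMs, not anything confirmed about a future Mac Pro):

```swift
// Back-of-the-envelope: how many DDR5 channels would it take to match
// the LPDDR5 bandwidth of an Ultra-class SoC? Numbers are rough assumptions.
let lpddr5Bandwidth = 800.0           // GB/s, approx. M1 Ultra package bandwidth
let ddr5PerChannel = 4.8 * 8.0        // GB/s per DDR5-4800 channel (64-bit bus)
let channelsNeeded = (lpddr5Bandwidth / ddr5PerChannel).rounded(.up)
print("DDR5-4800 channels needed: \(Int(channelsNeeded))")   // ~21 channels
```

Hence the point about interleaving across a lot of slots just to stay in the same ballpark.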
 

mr_roboto

macrumors 6502a
Sep 30, 2020
856
1,866
Apple already builds their own dedicated GPU cards in the form of the afterburner card. A very small niche within a niche product, which ultimately led the way to the media encoders you see in the M1 Pro/Max etc.
Afterburners are FPGA cards (Field Programmable Gate Arrays), not GPUs.
 
  • Like
Reactions: Xiao_Xi

Xiao_Xi

macrumors 68000
Oct 27, 2021
1,627
1,101
Afterburners are FPGA cards (Field Programmable Gate Arrays), not GPUs.
FPGAs can contain whatever you want (CPU, GPU...). Technically, Afterburner is not a GPU, but it has some of the blocks that modern PC GPUs have.
 

leman

macrumors Core
Oct 14, 2008
19,521
19,674
Do Afterburner cards have TMUs?

Depends on what you understand as "TMU". If for you a "TMU" is a dedicated hardware circuit block specialised in texture processing, then no, an FPGA array cannot contain a "TMU" by definition. If instead you understand "TMU" as a functional block, then sure, an FPGA array can perfectly well implement a TMU.

It's an interesting point though: people like to talk about hardware vs software, but where does "hardware" end and "software" begin? If one wants to be pedantic, one can claim that the Apple GPU does primitive fetch or shader scheduling in "software". Only that this software is running on highly specialised hardware, dedicated to and optimised for this purpose :)

Edit: to clarify - it should be absolutely possible to build a chip that combines an FPGA array and dedicated TMU circuitry, if one wants that kind of thing. Can't imagine a useful application for it right now, but who knows?
 
Last edited:

diamond.g

macrumors G4
Mar 20, 2007
11,438
2,663
OBX
Depends on what you understand as "TMU". If for you a "TMU" is a dedicated hardware circuit block specialised in texture processing, then no, an FPGA array cannot contain a "TMU" by definition. If instead you understand "TMU" as a functional block, then sure, an FPGA array can perfectly well implement a TMU.

It's an interesting point though: people like to talk about hardware vs software, but where does "hardware" end and "software" begin? If one wants to be pedantic, one can claim that the Apple GPU does primitive fetch or shader scheduling in "software". Only that this software is running on highly specialised hardware, dedicated to and optimised for this purpose :)
Yeah it is an interesting question. I think AMD considers a GPU a device with video/display out. So the MI line isn't considered a GPU because it is missing those blocks, they consider it an accelerator. I think nvidia considers everything they make a GPU even if it doesn't have display out (like A100). So it appears "the industry" doesn't have a set definition...
 
  • Like
Reactions: singhs.apps

leman

macrumors Core
Oct 14, 2008
19,521
19,674
Yeah it is an interesting question. I think AMD considers a GPU a device with video/display out. So the MI line isn't considered a GPU because it is missing those blocks, they consider it an accelerator. I think nvidia considers everything they make a GPU even if it doesn't have display out (like A100). So it appears "the industry" doesn't have a set definition...

Not surprising. These kinds of definitions are usually very limited and ultimately become less and less useful as the domain evolves. Humans love categorizing things; it keeps the world neat and organized for us. But failing to update the categories as the world and our understanding of it evolve is like trying to put on your favorite onesie from when you were a toddler. I mean, one might even succeed, but there won't be much left of the onesie and one will look like a moron.
 

Xiao_Xi

macrumors 68000
Oct 27, 2021
1,627
1,101
Do Afterburner cards have TMUs?

Honestly, I don't know. I would say no, because the Afterburner only offloads work from the CPU, not the GPU.

[Image: afterburner.png]


I think AMD considers a GPU a device with video/display out. So the MI line isn't considered a GPU because it is missing those blocks, they consider it an accelerator. I think nvidia considers everything they make a GPU even if it doesn't have display out (like A100).
I like AMD's nomenclature better. I wouldn't consider a chip a GPU if it doesn't have the rendering pipeline.
 

leman

macrumors Core
Oct 14, 2008
19,521
19,674
I like AMD's nomenclature better. I wouldn't consider a chip a GPU if it doesn't have the rendering pipeline.

What's a "rendering pipeline"?

BTW, according to AMD's nomenclature (at least as presented here), Apple Silicon does not have a GPU, since the Apple GPU does not output any video signals. It just takes data from memory, does some processing on it and writes it back to memory. What happens to that data afterwards is none of the GPU's concern. There is a completely independent hardware unit that reads this data and sends it to the display.
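
To make that split concrete, here is roughly how it surfaces in Metal (a minimal sketch, not a statement about which silicon block does what; the function below is hypothetical):

```swift
import Metal
import QuartzCore

// The GPU's job ends at "write pixels into a texture in memory". Handing
// those pixels to a screen is a separate presentation step that goes
// through CAMetalLayer / the display hardware, not the GPU cores.
func renderFrame(queue: MTLCommandQueue,
                 pipeline: MTLRenderPipelineState,
                 layer: CAMetalLayer) {
    guard let drawable = layer.nextDrawable(),
          let commandBuffer = queue.makeCommandBuffer() else { return }

    let pass = MTLRenderPassDescriptor()
    pass.colorAttachments[0].texture = drawable.texture   // just a texture, as far as the GPU cares
    pass.colorAttachments[0].loadAction = .clear
    pass.colorAttachments[0].storeAction = .store

    if let encoder = commandBuffer.makeRenderCommandEncoder(descriptor: pass) {
        encoder.setRenderPipelineState(pipeline)
        // ... draw calls would go here ...
        encoder.endEncoding()
    }

    commandBuffer.present(drawable)   // hand-off to the display path
    commandBuffer.commit()
}
```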
 
  • Like
Reactions: singhs.apps

singhs.apps

macrumors 6502a
Oct 27, 2016
660
400
What's a "rendering pipeline"?

BTW, according to AMD's nomenclature (at least as presented here), Apple Silicon does not have a GPU, since the Apple GPU does not output any video signals. It just takes data from memory, does some processing on it and writes it back to memory. What happens to that data afterwards is none of the GPU's concern. There is a completely independent hardware unit that reads this data and sends it to the display.
Ah. Interesting detail.
 

leman

macrumors Core
Oct 14, 2008
19,521
19,674
The graphics pipeline

Yeah, but what does it mean exactly? Historically, a graphics processor implemented consecutive hardware stages, e.g. vertex shading -> primitive assembly -> rasterisation -> fragment shading -> blending etc, which formed the graphics pipeline.

These days, "graphics pipeline" is mostly a software term, as many of these stages (vertex shading, fragment shading, mesh shading, blending in Apple's case) run on the same hardware units. It's just a mix of various programs and work items, executed in a particular order with the results synchronised, plus the odd fixed-function unit or two to accelerate repetitive stuff. Frankly, one could probably make a GPU API that does away with the idea of the graphics pipeline entirely: just compute shaders, synchronisation primitives and special API calls to invoke the fixed-function hardware.
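
To illustrate how much of the "pipeline" is an API-level construct, here is a minimal Metal sketch (the shader names are placeholders; this says nothing about what the hardware does underneath):

```swift
import Metal

// In Metal, a "render pipeline" is a software object assembled from shader
// functions plus some state; how much of it maps onto fixed-function
// hardware is entirely up to the GPU underneath.
func makePipeline(device: MTLDevice, library: MTLLibrary) throws -> MTLRenderPipelineState {
    let descriptor = MTLRenderPipelineDescriptor()
    descriptor.vertexFunction = library.makeFunction(name: "myVertexShader")     // placeholder name
    descriptor.fragmentFunction = library.makeFunction(name: "myFragmentShader") // placeholder name
    descriptor.colorAttachments[0].pixelFormat = .bgra8Unorm
    return try device.makeRenderPipelineState(descriptor: descriptor)
}
```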

So where would you personally draw the line? Presence of a hardware rasteriser? Tessellator? Dedicated TMUs? ROPs? (Do Apple GPUs even have ROPs?)
 

diamond.g

macrumors G4
Mar 20, 2007
11,438
2,663
OBX
What's a "rendering pipeline"?

BTW, according to AMD's nomenclature (at least as presented here), Apple Silicon does not have a GPU, since the Apple GPU does not output any video signals. It just takes data from memory, does some processing on it and writes it back to memory. What happens to that data afterwards is none of the GPU's concern. There is a completely independent hardware unit that reads this data and sends it to the display.
Yeah, looking at this further, does Apple have/use ROPs? It isn't clear if it is possible to "display" graphics from a GPU without them, and it also looks like that is the block/unit that AMD leaves out of its accelerators (they have TMUs), along with the display engine.

More annoyingly, AMD still calls the Instinct line Datacenter GPUs even though they also refer to it as an accelerator (on the site you can't get to it via the Graphics link, you get to it via the Accelerator link...).
 

diamond.g

macrumors G4
Mar 20, 2007
11,438
2,663
OBX
Yeah, but what does it mean exactly? Historically, a graphics processor implemented consecutive hardware stages, e.g. vertex shading -> primitive assembly -> rasterisation -> fragment shading -> blending etc, which formed the graphics pipeline.

These days, "graphics pipeline" is mostly a software term, as many of these stages (vertex shading, fragment shading, mesh shading, blending in Apple's case) run on the same hardware units. It's just a mix of various programs and work items, executed in a particular order with the results synchronised, plus the odd fixed-function unit or two to accelerate repetitive stuff. Frankly, one could probably make a GPU API that does away with the idea of the graphics pipeline entirely: just compute shaders, synchronisation primitives and special API calls to invoke the fixed-function hardware.

So where would you personally draw the line? Presence of a hardware rasteriser? Tessellator? Dedicated TMUs? ROPs? (Do Apple GPUs even have ROPs?)
[Image: meshlets_pipeline.png]

The traditional pipeline is what is going away these days (it is kinda how Nanite works). Well, if you believe Nvidia, lol.

I think Apple's GPUs don't need ROPs because they write to the framebuffer directly. I think Apple's GPUs do have hardware rasterizers (and it appears they take far less power to use than bypassing them and using the compute side, assuming we trust GravityMark).
 

leman

macrumors Core
Oct 14, 2008
19,521
19,674
Yeah, looking at this further, does Apple have/use ROPs? It isn't clear if it is possible to "display" graphics from a GPU without them, and it also looks like that is the block/unit that AMD leaves out of its accelerators (they have TMUs), along with the display engine.

I'm wondering about the same thing. Frankly, I am not even sure what exactly the function of ROPs is on modern GPUs, as any discussion of the term I can find dates back at least a decade. In a traditional dGPU, where there can be a data race for the same pixel and blending is done in an extra step, it kind of makes sense to have a dedicated hardware unit responsible for pixel ordering and composition (although I am not quite sure how such a unit would function). But in a TBDR GPU, pixel access is exclusive and there are no races within the same tile. A similar consideration goes for MSAA. You still need some specialised hardware to compress the tile when it's flushed, but that's hardly a ROP anymore, more of a memory controller task.
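
One place this shows up at the API level is Metal's memoryless render targets on Apple GPUs (a sketch for illustration, not a claim about which hardware block handles it): an intermediate attachment can live entirely in on-chip tile memory and never be flushed to RAM at all.

```swift
import Metal

// On Apple's TBDR GPUs, an attachment that is only needed within a render
// pass (e.g. depth) can be declared memoryless: it exists only in on-chip
// tile memory and has no backing allocation in system memory.
func makeTileOnlyDepth(device: MTLDevice, width: Int, height: Int) -> MTLTexture? {
    let desc = MTLTextureDescriptor.texture2DDescriptor(pixelFormat: .depth32Float,
                                                        width: width,
                                                        height: height,
                                                        mipmapped: false)
    desc.storageMode = .memoryless   // tile memory only, never written back to RAM
    desc.usage = .renderTarget
    return device.makeTexture(descriptor: desc)
}

// .dontCare tells the GPU the tile contents can simply be discarded at the
// end of the pass instead of being "flushed" anywhere.
func attach(_ depth: MTLTexture, to pass: MTLRenderPassDescriptor) {
    pass.depthAttachment.texture = depth
    pass.depthAttachment.loadAction = .clear
    pass.depthAttachment.storeAction = .dontCare
}
```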

[Image: meshlets_pipeline.png]

The traditional pipeline is what is going away these days (it is kinda how Nanite works). Well, if you believe Nvidia, lol.

That's the logical pipeline (as presented by the API), not how the GPU actually works. All these shader stages run on the same hardware, and it's unclear if "mesh generation" is even a thing (it probably refers to scheduling/launching the mesh shader compute tasks). The point is that you don't need any dedicated hardware to implement this logical pipeline.

I think Apple's GPUs don't need ROPs because they write to the framebuffer directly. I think Apple's GPUs do have hardware rasterizers (and it appears they take far less power to use than bypassing them and using the compute side, assuming we trust GravityMark).

Oh, Apple definitely has hardware rasterisers and many other fixed-function hardware units in their GPUs.
 

mr_roboto

macrumors 6502a
Sep 30, 2020
856
1,866
FPGAs can contain whatever you want (CPU, GPU...). Technically, Afterburner is not a GPU, but it has some of the blocks that modern PC GPUs have.
I work with FPGAs so I'm well aware of what they can do. :)

Yes, you could put a GPU design into the FPGA on an Afterburner card, but nobody would want that. There are many problems. First, Apple almost certainly did not provision the FPGA on Afterburner with enough (or fast enough) RAM for a GPU. Next, at $2K for the entire card at Apple markups, I guarantee you the FPGA is a relatively modest one. Even the $10K+ monster FPGAs I have sometimes worked with (and that's $10K just for the chip, not the whole thing) probably would not be able to compete with a sub-$100 consumer GPU. Making a chip field reprogrammable has enormous costs: area, clock speed, and power are all far worse. There's a reason why people don't just do everything with FPGAs.

Apple's not crazy, so they aren't using Afterburner as a GPU. Its function is to accelerate video stream encode and decode. This is a much smaller block better suited to the limits of FPGA technology than a GPU would be.
 
  • Like
Reactions: Xiao_Xi

diamond.g

macrumors G4
Mar 20, 2007
11,438
2,663
OBX
I'm wondering about the same thing. Frankly, I am not even sure what exactly the function of ROPs is on modern GPUs, as any discussion of the term I can find dates back at least a decade. In a traditional dGPU, where there can be a data race for the same pixel and blending is done in an extra step, it kind of makes sense to have a dedicated hardware unit responsible for pixel ordering and composition (although I am not quite sure how such a unit would function). But in a TBDR GPU, pixel access is exclusive and there are no races within the same tile. A similar consideration goes for MSAA. You still need some specialised hardware to compress the tile when it's flushed, but that's hardly a ROP anymore, more of a memory controller task.



That's the logical pipeline (as presented by the API), not how the GPU actually works. All these shader stages run on the same hardware, and it's unclear if "mesh generation" is even a thing (it probably refers to scheduling/launching the mesh shader compute tasks). The point is that you don't need any dedicated hardware to implement this logical pipeline.



Oh, Apple definitely has hardware rasterisers and many other fixed-function hardware units in their GPUs.
Yeah, from my understanding GPUs actually work better under the "mesh" pipeline than the legacy one (supposedly it is closer to how GPUs work anyway).

That made it odd to see it take Apple as long as it did to add it/them (the logical side of the mesh pipeline) to Metal.

Also, I wonder if Apple could separate out each block into its own "die" on top of their existing interposer (to keep bandwidth up) in order to make mixing and matching parts easier (and to remove duplication of parts that they don't use, which wastes die space). AMD has an interesting talk about how keeping some stuff on older nodes in an MCM package is actually helpful in making it "faster" to design newer parts. They did mention the downside of breaking out blocks (bandwidth is the biggest one), which it seems Apple already has an answer for.
 

leman

macrumors Core
Oct 14, 2008
19,521
19,674
Yeah, from my understanding GPUs actually work better under the "mesh" pipeline than the legacy one (supposedly it is closer to how GPUs work anyway).

That's not necessarily the case. From what I've seen, the official recommendation is not to simply use mesh shaders instead of the traditional pipeline. What you are probably referring to is that mesh shaders can end up being much more efficient for geometry generation, since they can directly feed the rasteriser without the data ever leaving the chip. And of course, they are much better than the older "geometry shader" stage, which was a horrible mistake to begin with (no wonder Apple never implemented it).

I wanted to do some in-depth testing of mesh shading performance on Apple Silicon but never got to it... maybe something for the holiday season, let's see.

That made it odd to see it take Apple as long as it did to add it/them (the logical side of the mesh pipeline) to Metal.

There was some speculation last year that Apple wouldn't add mesh shading at all. The thing is, mesh shading could be a bit at odds with the entire TBDR story. The big benefit of mesh shaders as touted by Nvidia was that the GPU can directly generate and consume (rasterise) the geometry on the GPU cluster, saving I/O and synchronisation. But a TBDR GPU rasterises the primitives in small tiles, where each tile is processed on a different GPU core. Depending on how these things actually work, this might (or might not) be a problem in practice. It is possible that Apple needs to write out the results produced by the mesh shaders into RAM so that they can be consumed by the rasteriser. But then again, there could be smart ways to work around it.
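
For reference, this is roughly what the host side of Metal's mesh pipeline looks like (a sketch from memory of the Metal 3 API; the shader names are placeholders, and it says nothing about how Apple schedules the work internally):

```swift
import Metal

// Host-side setup for a Metal 3 mesh pipeline: an object stage that decides
// what to amplify or cull, a mesh stage that emits geometry, and a fragment stage.
func makeMeshPipeline(device: MTLDevice, library: MTLLibrary) throws -> MTLRenderPipelineState {
    let desc = MTLMeshRenderPipelineDescriptor()
    desc.objectFunction = library.makeFunction(name: "myObjectStage")     // placeholder names
    desc.meshFunction = library.makeFunction(name: "myMeshStage")
    desc.fragmentFunction = library.makeFunction(name: "myFragmentStage")
    desc.colorAttachments[0].pixelFormat = .bgra8Unorm
    return try device.makeRenderPipelineState(descriptor: desc, options: [], reflection: nil)
}

// Dispatching looks more like a compute grid launch than a classic draw call.
func encodeMeshDraw(_ encoder: MTLRenderCommandEncoder, pipeline: MTLRenderPipelineState) {
    encoder.setRenderPipelineState(pipeline)
    encoder.drawMeshThreadgroups(MTLSize(width: 64, height: 1, depth: 1),
                                 threadsPerObjectThreadgroup: MTLSize(width: 32, height: 1, depth: 1),
                                 threadsPerMeshThreadgroup: MTLSize(width: 32, height: 1, depth: 1))
}
```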


Also, I wonder if Apple could separate out each block into its own "die" on top of their existing interposer (to keep bandwidth up) in order to make mixing and matching parts easier (and to remove duplication of parts that they don't use, which wastes die space). AMD has an interesting talk about how keeping some stuff on older nodes in an MCM package is actually helpful in making it "faster" to design newer parts. They did mention the downside of breaking out blocks (bandwidth is the biggest one), which it seems Apple already has an answer for.

What AMD is currently doing is mostly about optimising cost (and maybe production capacity). I don't think Apple is that concerned about this, unless the costs of newer nodes truly become astronomical. I could see them doing more modular packages in the future, just to offer more flexible configurations, but I wouldn't be surprised if they don't go that way either.
 

gpat

macrumors 68000
Mar 1, 2011
1,931
5,341
Italy
If they are serious about this, they'll engineer an ARM CPU-only chip and include dGPU support from AMD.
No way they replace the Mac Pro with a totally integrated SoC.
But who knows. The effort may not be worth it for them.
 