
thedocbwarren

macrumors 6502
Nov 10, 2017
430
378
San Francisco, CA
I've seen a couple of early reviews of the dedicated Xe cards, the performance is laughable to say the least.
I have too. I seriously wonder why they want to go through all the trouble of producing these given how poor they are. I mean, they fall somewhere between an old GT 1030 and a GTX 1050.
 

leman

macrumors Core
Oct 14, 2008
19,521
19,678
@dmccloud I am afraid your understanding of these things is a bit incomplete. I will do my best to clarify some points below. Most importantly: yes, Intel's CPU and GPU access the same physical memory, and their implementation is a fully cache-coherent UMA. Your claims about Intel's implementation are unfortunately only speculation that is directly refuted by technical documentation and API behavior. I assume that AMD's iGPUs are also full UMA, but I haven't looked into the technical documentation there, so I can't comment.



For operations where the data has to be manipulated by both the CPU and iGPU, it is copied twice across the system bus into RAM (once per partition), then the system has to reconcile the two sets of data once passed back from RAM, which adds additional processing time. With the UMA setup the M1 uses, both the CPU and iGPU can access the full system RAM simultaneously (i.e., there is no partitioning of the RAM between CPU and GPU.) This means that data is only copied to RAM once, and since all operations can happen simultaneously, there is no overhead associated with reconciling two versions of the same data once passed back to the CPU from RAM.

I think this is where the true source of confusion is. A UMA system can achieve zero-copy data sharing between CPU and GPU. It does not mean that it is always zero copy. Our graphics and compute APIs work with driver-managed buffers. The usual programming model is that the data comes from somewhere and gets copied into the buffer for GPU use. This is no different in Metal: when you create a buffer or a texture, you will usually incur a memory copy. On a system with a dGPU, this copy is mandatory anyway.

However, both Apple and Intel offer means to avoid the copy. Metal has a way to use a CPU-side memory allocation as a buffer on UMA systems. Intel offers very similar functionality (@mi7chy has provided a link in their post). Furthermore, the restrictions on both platforms are very similar: the data has to be page-aligned and its size has to be aligned as well, which again tells us that the technical implementation is very similar.

To sum it up: you need to do some extra work on a UMA system to avoid the copy. Using the API as normal will still require you to copy your data.
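For illustration, here is a minimal Metal sketch of the two paths (my own example, not taken from any vendor documentation; it assumes an Apple Silicon Mac and that the length is a multiple of the page size):

```swift
import Darwin
import Metal

let device = MTLCreateSystemDefaultDevice()!
let pageSize = Int(getpagesize())
let length = 4 * pageSize

// Page-aligned CPU allocation that the application fills directly.
var rawMemory: UnsafeMutableRawPointer?
posix_memalign(&rawMemory, pageSize, length)

// The "normal" path: the API copies the data into a driver-managed buffer.
let copiedBuffer = device.makeBuffer(bytes: rawMemory!, length: length,
                                     options: .storageModeShared)

// The zero-copy path on a UMA system: wrap the existing allocation.
// No copy is made; CPU and GPU now see the same physical pages.
let sharedBuffer = device.makeBuffer(bytesNoCopy: rawMemory!, length: length,
                                     options: .storageModeShared,
                                     deallocator: { pointer, _ in free(pointer) })
```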

x86 does NOT use unified memory for system RAM.

First and foremost, x86 is a processor architecture. It does not mandate a GPU setup. There are different implementations of the x86 platform, with different characteristics. Some are UMA, some are not.

UMA refers to the RAM setup in the system, not CPU cache.

UMA refers to everything. The RAM setup alone is not enough: you need to synchronize GPU and CPU caches on common memory accesses or you risk getting garbage (e.g. the CPU not seeing the GPU's latest writes). This is what "cache coherency" means. On both Apple's and Intel's current implementations, this is partly achieved by having the CPU and the GPU share the last-level cache (the SoC-level cache; Intel also calls this the L3 cache). The CPU and the GPU share the same memory subsystem (memory controllers etc.), and can address exactly the same RAM.

With the x86 platform, the system partitions the RAM into a CPU and iGPU section. For the iGPU, the system usually allocates around 2GB for GPU operations, meaning that only 6GB are available for the CPU.

This is just driver behavior. Some memory has to be reserved for GPU-only operation. Apple also does the same on M1 machines (RAM for the frame buffers and various GPU-internal state, plus some overhead for the window manager). It has nothing to do with the ability of the GPU to address physical RAM. Both Intel and Apple GPUs are able to address the entirety of the system RAM.

One thing you have to realize is that Intel's use of the term "UMA" is misleading. For Intel's purposes, they just renamed Intel HD to UMA, but made no changes to the underlying architecture. On the other hand, Apple's approach is essentially what AMD has been trying to do for years with the development of their Infinity Fabric technology for Ryzen-series CPUs.

As I explain above, Intel has offered a full UMA implementation for many years now. Apple didn't invent UMA. We do not know what the difference between Intel's and Apple's implementations is. I have a suspicion that in Apple's implementation virtual memory is fully under OS control (all M1 processors share the same page table), so the OS can very efficiently swap GPU memory when necessary (it is unclear whether an Intel-based system can do the same). And yes, it's very similar in spirit to Infinity Fabric. Again, Intel has had this implemented in their iGPUs for a couple of years now, it's just that their implementation is not particularly performant or scalable.

By the way, the one really mismarketing UMA is Nvidia. They claimed to have "CUDA UMA" some time ago, but it's just BS.

Intel still partitions the CPU and GPU RAM, so you have to copy data from the CPU side to GPU side and vice versa (bolded section is Apple's approach, the italicized section is Intel's):

These quotes do describe some older implementations of iGPUs (like 10 years old). Again, that's not how more recent Intel GPUs (Sandy Bridge and up) work.

My understanding after re-skimming the video (it’s been a few months since I last watched it) is that this doesn’t necessarily impact the amount of video memory this needs all that much, but rather the pressure placed on memory bandwidth.

I still need X MB for a texture of a given size, and X MB for the frame buffer in either design. However, TBDR reduces how often you need to reach out to (V)RAM, especially in situations where you need to make multiple passes. It *might* reduce intermediate buffers a little, but that assumes intermediate buffers are a noticeable contribution compared to the other buffers in use. My understanding was that the back buffer itself was used as the intermediate buffer, so I am a bit skeptical that there’s big gains to be had there. Draw to texture seems to be common these days, so there might be more than I expect, assuming these scenarios can be all done at the tile level, rather than a texture.

You are exactly spot on. TBDR reduces the number of data accesses in fragment shaders, which means a reduction in required bandwidth. You still need the RAM to hold the data though. There are of course things like texture compression etc. and Apple's support here is industry-leading.

In regards to intermediate buffers, TBDR makes it possible to eliminate them in many common cases. Since all processing occurs on the tile, you can mix multiple shader invocations (compute and fragment) that apply different processing effects to the tile data without the need to move it to memory. You only need to store it if you are using this data for something else entirely (e.g. a sharpening pass), but even then you can store only what you need and eliminate the other intermediate buffers. It also allows completely new applications, such as fast vector rendering using compute shaders. I don't know of any other API than Metal that would allow you to chain compute shaders that share on-chip memory.
 
  • Like
Reactions: crevalic and Wizec

Gnattu

macrumors 65816
Sep 18, 2020
1,107
1,671
Both Intel and Apple GPUs are able to address the entirety of the system RAM.
This is not true, at least on macOS. The Intel GPU cannot address all of the system RAM. The area of memory it can address is static, and you have to reboot if you want to change the size of the addressable area. Even if you manually change that value to a very large one, the Intel GPU driver on macOS has a hard-coded upper limit of 2GB and will not use any memory beyond that (I don't know if that has changed now).

It is true that an Intel UHD 630 can, in principle, use all 64GB of memory of the CPU it is integrated into, but as you said, there is much more to making good use of UMA than hardware capability, and Apple excels at integrating software with hardware.

By the way, Apple and Intel define "UMA" differently. From the Intel document:

By zero copy, we mean that no buffer copy is necessary since the physical memory is shared.

Which means Intel's UMA is available only for buffers, not textures. If a texture is accessed frequently by both CPU and GPU, you need to make two copies, one in each "memory space", and sync the changes, which uses extra memory as a result. On iOS and tvOS, where all hardware features an Apple GPU, textures and buffers are both shared between CPU and GPU. Apple's documentation still states that shared mode is only available for buffers on macOS. I don't know if they are doing anything special for Apple Silicon Macs, but Apple Silicon should have the capability to share textures because it is already available on iOS.
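In Metal terms this shows up as which storage modes you are allowed to use. A rough sketch of my own (assuming a recent macOS SDK; the hasUnifiedMemory check and the fallback are my choices, not anything from Apple's sample code):

```swift
import Metal

let device = MTLCreateSystemDefaultDevice()!

let descriptor = MTLTextureDescriptor.texture2DDescriptor(pixelFormat: .rgba8Unorm,
                                                          width: 1024,
                                                          height: 1024,
                                                          mipmapped: false)
if device.hasUnifiedMemory {
    // Apple GPUs (iOS, tvOS, Apple Silicon Macs): the texture itself can live
    // in shared memory, visible to both CPU and GPU without copies.
    descriptor.storageMode = .shared
} else {
    // Intel/AMD Macs: textures cannot use shared storage. "Managed" keeps a
    // CPU-side and a GPU-side copy that the driver synchronizes on demand.
    descriptor.storageMode = .managed
}
let texture = device.makeTexture(descriptor: descriptor)
```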
 
  • Like
Reactions: jdb8167

leman

macrumors Core
Oct 14, 2008
19,521
19,678
This is not true, at least on macOS. The Intel GPU cannot address all of the system RAM. The area of memory it can address is static, and you have to reboot if you want to change the size of the addressable area. Even if you manually change that value to a very large one, the Intel GPU driver on macOS has a hard-coded upper limit of 2GB and will not use any memory beyond that (I don't know if that has changed now).

If this is the case, then it's a deliberate limitation of macOS and the Apple Intel GPU driver. It is entirely possible that the software implementation on Apple platforms is limited, but it doesn't make the hardware itself less capable. We should at least give Intel credit where it's due :)

Which means Intel's UMA is available only for buffers, not textures. If a texture is accessed frequently by both CPU and GPU, you need to make two copies, one in each "memory space", and sync the changes, which uses extra memory as a result. On iOS and tvOS, where all hardware features an Apple GPU, textures and buffers are both shared between CPU and GPU. Apple's documentation still states that shared mode is only available for buffers on macOS. I don't know if they are doing anything special for Apple Silicon Macs, but Apple Silicon should have the capability to share textures because it is already available on iOS.

Well, "buffer" is a generic term. Texture data definitely resides in CPU-addressable memory on Intel, it's just that you application does not have a pointer to that data (this is also true for normally created textures on iOS). But the big limitation here is the API itself. From what I have seen, Intel only exposes zero-copy data in OpenCL. Frankly, I don't whether Intel or AMD expose zero copy in any graphical APIs, I was not able to find any information on shared memory for Vulkan or DX12. Given that these GPUs are relatively show, there is probably very little practical interest from the consumers.

In Metal you can create buffers backed by previously allocated memory (with alignment restrictions), and you can use the data from these buffers to create textures (https://developer.apple.com/documentation/metal/mtlbuffer/1613852-maketexture). So you absolutely can have zero-copy textures on supported hardware. But there is a big caveat: if you create textures this way, the GPU will be prevented from optimizing their layout and your texture filtering performance might suffer. For textures, it's almost always best to have them copied to driver-owned memory and optimized.
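Roughly, in code (my own sketch, assuming an Apple Silicon Mac; the linear-texture alignment query is the part that is easy to get wrong):

```swift
import Metal

let device = MTLCreateSystemDefaultDevice()!
let width = 512, height = 512
let pixelFormat = MTLPixelFormat.rgba8Unorm

// Linear textures require bytesPerRow to satisfy the device's alignment rules.
let alignment = device.minimumLinearTextureAlignment(for: pixelFormat)
let bytesPerRow = ((width * 4 + alignment - 1) / alignment) * alignment

// A shared buffer; a page-aligned makeBuffer(bytesNoCopy:) allocation works too.
let buffer = device.makeBuffer(length: bytesPerRow * height,
                               options: .storageModeShared)!

let descriptor = MTLTextureDescriptor.texture2DDescriptor(pixelFormat: pixelFormat,
                                                          width: width,
                                                          height: height,
                                                          mipmapped: false)
descriptor.storageMode = .shared

// The texture aliases the buffer's memory: no copy is made, but the GPU
// cannot re-tile the layout, so filtering may be slower than a private texture.
let linearTexture = buffer.makeTexture(descriptor: descriptor,
                                       offset: 0,
                                       bytesPerRow: bytesPerRow)
```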

So as usual, the reality is a bit trickier than the theory. Graphics and compute APIs are fundamentally based around copying data (even in Metal, see the basic buffer creation API https://developer.apple.com/documentation/metal/mtldevice/1433429-makebuffer), so your average app won't be "zero-copy" at all. It will still benefit from UMA on Apple platforms, as you are copying from system memory to system memory, which is much faster than copying the data to the GPU over the PCIe bus. And if the app uses memory mapping APIs to modify resources, that will also be much cheaper and faster than with dGPUs. Ultimately, productivity apps that use the GPU extensively will want to allocate their memory accordingly and use specialized APIs like https://developer.apple.com/documentation/metal/mtldevice/1433382-makebuffer to allow true zero-copy behavior.

P.S. Just looked at the Vulkan spec; it seems that VK_EXT_external_memory_host would be a way to do zero-copy on UMA systems, but there are some nuances to consider. You could also check the API-provided memory heaps, "guess" which of them are the UMA memory and just use those to ensure zero copy. But that is tricky again, as modern Nvidia and AMD implementations often provide a small pinned memory buffer that is actually synchronized via PCIe. Apple certainly makes it much more straightforward.
 
Last edited:

dmccloud

macrumors 68040
Sep 7, 2009
3,142
1,899
Anchorage, AK
This is just driver behavior. Some memory has to be reserved for GPU-only operation. Apple also does the same on M1 machines (RAM for the frame buffers and various GPU-internal state, plus some overhead for the window manager). It has nothing to do with the ability of the GPU to address physical RAM. Both Intel and Apple GPUs are able to address the entirety of the system RAM.

Go look at any Intel laptop on the shelf at Best Buy, Office Max/Staples, etc. If you check the system settings, it will show the amount of available system RAM (i.e., the RAM NOT partitioned off for the iGPU) as right around 1.5-2GB less than the installed system RAM. That's because Intel still partitions off the iGPU RAM separately from the RAM allocated to CPU operations. Even on my 10th Gen MSI gaming laptop, it shows 16.0 GB installed RAM with 14.8 usable, and this is a system with both an iGPU and dedicated GPU.
 

Tchakatak

macrumors member
Dec 15, 2020
52
69
How is that supposed to support the claim you are making? What I am asking is - do you have any factual evidence, or a reference to a source presenting such factual evidence that GPU memory allocation works differently on Intel and Apple GPUs? Both use unified memory architecture with last level cache shared between CPU and GPU. M1 definitely reserves some memory for GPU use, although I am unsure how much.
Just read the anandtech article on the m1..
 

leman

macrumors Core
Oct 14, 2008
19,521
19,678
Go look at any Intel laptop on the shelf at Best Buy, Office Max/Staples, etc. If you check the system settings, it will show the amount of available system RAM (i.e., the RAM NOT partitioned off for the iGPU) as right around 1.5-2GB less than the installed system RAM. That's because Intel still partitions off the iGPU RAM separately from the RAM allocated to CPU operations. Even on my 10th Gen MSI gaming laptop, it shows 16.0 GB installed RAM with 14.8 usable, and this is a system with both an iGPU and dedicated GPU.

I can certainly imagine that the OS will reserve some RAM for basic GPU operation. I think I have mentioned this before myself. But this doesn’t mean that the GPU can’t access more RAM.

This is what Intel itself says: https://www.intel.com/content/www/us/en/support/articles/000020962/graphics.html

I’ll see if I can write a quick test tomorrow and see how much GPU memory I can allocate on Intel, AMD and Apple GPUs.

Anyway, it’s an established fact that modern Intel and AMD iGPUs are UMA architectures with very similar properties to Apple Silicon. There is just no way one can argue against it given the available technical documentation and verified hardware behavior.
 
  • Like
Reactions: Wizec

leman

macrumors Core
Oct 14, 2008
19,521
19,678
Just read the anandtech article on the m1..

I’ve had plenty of fruitful discussions with Andrei (the author of that article) on Twitter and elsewhere. There isn’t much in the article itself pertaining to this topic that has not already been mentioned in the thread.
 

ArPe

macrumors 65816
May 31, 2020
1,281
3,325
Go look at any Intel laptop on the shelf at Best Buy, Office Max/Staples, etc. If you check the system settings, it will show the amount of available system RAM (i.e., the RAM NOT partitioned off for the iGPU) as right around 1.5-2GB less than the installed system RAM. That's because Intel still partitions off the iGPU RAM separately from the RAM allocated to CPU operations. Even on my 10th Gen MSI gaming laptop, it shows 16.0 GB installed RAM with 14.8 usable, and this is a system with both an iGPU and dedicated GPU.
Disable iGPU in BIOS
 

Sydde

macrumors 68030
Aug 17, 2009
2,563
7,061
IOKWARDI
It only requires a few megabytes to run a display but GPUs are also used for machine learning, pixel arrays, and specialized calculations. It really depends on the software you are using.

Look around in the prefs and see if you can spot GPU settings. For example, Photoshop can make heavy use of GPUs but it can also be disabled.

I fail to see an upside. Using the GPU could greatly improve Photoshop performance. With unified memory in the M1, the data being accessed by the GPU would be exactly the same memory that Photoshop would use the CPU to work on. Actual memory usage would not change significantly, if at all, so what is there to gain?
 

warp9

macrumors 6502
Jun 8, 2017
450
641
I fail to see an upside. Using the GPU could greatly improve PhotoShop performance. With unified memory in the M1, the data being accessed by the GPU would be exactly the same memory that PhotoShop would use the CPU to work on. Actual memory usage would not change significantly, if at all, so what is there to gain?
I assumed it was a Rosetta app and I assumed it would be duplicating buffers and arrays and such things. I don't actually know how Apple deals with that internally though so I can see how I would be wrong.
 

ArPe

macrumors 65816
May 31, 2020
1,281
3,325
I fail to see an upside. Using the GPU could greatly improve Photoshop performance.

Meh, not really. A basic GPU performs most of that app's functions as fast as the most powerful GPU. There are a few filters and options that utilize OpenCL, but you won’t notice the difference unless the file is like a ridiculously intense benchmark suite that doesn’t reflect real-world use.

OpenCL should be replaced by now, as it is deprecated. We're still not seeing Adobe make the change to Metal. The beta is still quite buggy (the patch tool and healing brush produce errors) and the neural filters fail to process even though they work in the cloud.
 

Toutou

macrumors 65816
Jan 6, 2015
1,082
1,575
Prague, Czech Republic
Go look at any Intel laptop on the shelf at Best Buy, Office Max/Staples, etc. If you check the system settings, it will show the amount of available system RAM (i.e., the RAM NOT partitioned off for the iGPU) as right around 1.5-2GB less than the installed system RAM. That's because Intel still partitions off the iGPU RAM separately from the RAM allocated to CPU operations. Even on my 10th Gen MSI gaming laptop, it shows 16.0 GB installed RAM with 14.8 usable, and this is a system with both an iGPU and dedicated GPU.
I must agree with @leman that your understanding is a bit incomplete. He's not disagreeing with what you're saying above. AFAIK the newer Intel UMA chips really do partition some RAM off for the iGPU.
The documentation linked earlier clearly states that the zero-copy functionality is opt-in, that the programmer has to declare a buffer explicitly to be shared between the CPU and the GPU. Software written before that API was introduced is not able to use the functionality, so there still needs to be the notion of GPU-exclusive memory. The UMA capabilities are completely optional.
 

leman

macrumors Core
Oct 14, 2008
19,521
19,678
Software written before that API was introduced is not able to use the functionality, so there still needs to be the notion of GPU-exclusive memory. The UMA capabilities are completely optional.

The point is that the "GPU-exclusive" memory in this case is an artificial construct. The memory is still allocated in the CPU-visible area and both the CPU and GPU have equal access to it. It's just that you as a user don't have the pointer to the data, so you can't write to it without using the appropriate APIs. This is the same on Apple GPUs as well — when you create a texture using the standard API, Metal will reserve the texture data in the system unified memory, but your application won't have direct access to that memory.
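To illustrate: with the standard path you never see a pointer; you can only hand your data to the API and let it copy. A minimal sketch of my own:

```swift
import Metal

let device = MTLCreateSystemDefaultDevice()!

let descriptor = MTLTextureDescriptor.texture2DDescriptor(pixelFormat: .rgba8Unorm,
                                                          width: 256,
                                                          height: 256,
                                                          mipmapped: false)
// Driver-owned texture: on an Apple GPU it still lives in unified memory,
// but unlike MTLBuffer there is no contents() call exposing its storage.
let texture = device.makeTexture(descriptor: descriptor)!

// The only way in is through the API, which copies the bytes for us.
var pixels = [UInt8](repeating: 255, count: 256 * 256 * 4)
texture.replace(region: MTLRegionMake2D(0, 0, 256, 256),
                mipmapLevel: 0,
                withBytes: &pixels,
                bytesPerRow: 256 * 4)
```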
 

LinkRS

macrumors 6502
Oct 16, 2014
402
331
Texas, USA
Hi,

How much ram does the M1 take up from the memory pool to run the GPU?

I want to get it for music production and I'd be happy with 16 gigs but I'm unsure as to how much memory the GPU using, especially with an external display?
Howdy majormike,

I thought I would pipe in here real quick :). In order to drive the built-in display on the 13" M1 MacBook Pro, which is 2560x1600 pixels, you will need 16 MB (yes, megabytes) for a single-buffered frame buffer at 32 bpp. Assuming that macOS does double buffering (a safe assumption), you can extrapolate that out to 32 MB for the display. This does NOT account for any memory needed for 3D compositing or texture storage, but these days, unless you are running 3D, texture-based workloads, the amount of RAM needed for the display is relatively minimal. Your choice of 16 GB will have no problem with your audio-based workload. Some would argue that you could save some money and go with the 8 GB version, but I think going with 16 GB is a good choice. Good luck!
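Back-of-the-envelope, the numbers work out like this (my own arithmetic, ignoring compositor and window-server overhead):

```swift
// 13" M1 MacBook Pro built-in panel at 32 bits (4 bytes) per pixel.
let width = 2560, height = 1600, bytesPerPixel = 4

let singleBuffer = width * height * bytesPerPixel   // 16,384,000 bytes
let doubleBuffered = 2 * singleBuffer               // 32,768,000 bytes

print(Double(singleBuffer) / 1_048_576)    // ~15.6 MiB per frame buffer
print(Double(doubleBuffered) / 1_048_576)  // ~31.3 MiB double buffered
```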

Rich S.
 

Krevnik

macrumors 601
Sep 8, 2003
4,101
1,312
You are exactly spot on. TBDR reduces the number of data accesses in fragment shaders, which means a reduction in required bandwidth. You still need the RAM to hold the data though. There are of course things like texture compression etc. and Apple's support here is industry-leading.

In regards to intermediate buffers, TBDR makes it possible to eliminate them in many common cases. Since all processing occurs on the tile, you can mix multiple shader invocations (compute and fragment) that apply different processing effects to the tile data without the need to move it to memory. You only need to store it if you are using this data for something else entirely (e.g. a sharpening pass), but even then you can store only what you need and eliminate the other intermediate buffers. It also allows completely new applications, such as fast vector rendering using compute shaders. I don't know of any other API than Metal that would allow you to chain compute shaders that share on-chip memory.

The bolded bit is where I get a bit fuzzy. Mostly because I’ve been out of the loop long enough that I couldn’t really say how much memory gets spent on these sort of things, and how much TBDR actually saves.

Although since you bring up vector rendering, I can’t help but think Apple has been chomping at the bit for such functionality. Is that really that new? I know there are definite wins if you can avoid rasterizing CoreGraphics draw calls on the CPU or to textures for later compositing, but for some reason I thought that was doable a while ago. But maybe I’m mis-reading your comment here.

The point is that the "GPU-exclusive" memory in this case is an artificial construct. The memory is still allocated in the CPU-visible area and both CPU and GPU have equal access to it. It's just that you as a user don't have the pointer to the data, so you can't write to it without using the appropriate APIs. This is the same on Apple GPUs as well — when you create a texture using the standard API, Metal will reserve the texture data in the system unified memory, but your application won't have direct access to that memory.

It feels like folks (and I’m a bit guilty here) are getting hung up on “unified memory”. Since we’re discussing memory usage, it seems that any changes that allow for fewer duplicated buffers (or cut out buffers entirely) are the key metric.
 

leman

macrumors Core
Oct 14, 2008
19,521
19,678
The bolded bit is where I get a bit fuzzy. Mostly because I’ve been out of the loop long enough that I couldn’t really say how much memory gets spent on these sort of things, and how much TBDR actually saves.

It depends on your rendering algorithm. Let's look at a common scenario — deferred rendering. Here you are not shading the pixels immediately but instead collect per-pixel information in a fat buffer (the G-buffer). This buffer can contain data such as the type of material at the given pixel, the normal vector, texture coordinates and other things. Afterwards you can perform per-pixel shading using the values from the G-buffer, which allows you to do more complex lighting and other nifty stuff. One problem with deferred rendering is that the G-buffer is a relatively large structure, so moving the data back and forth requires a lot of memory bandwidth. At least on a traditional immediate-mode pipeline, where the G-buffer is stored as a series of textures that are created during one render pass and then consumed during a different render pass. On Apple GPUs, you know that you are doing your processing one small pixel tile at a time. So you don't need to store the complete G-buffer — only the information for pixels in the given tile. That part of the G-buffer is small enough to fit into GPU on-chip memory (you only get 32KB per GPU cluster). Since Metal allows you to persist on-chip memory between different shader invocations, you don't have to write the per-pixel data into a texture — you just keep it on chip, run your computations and only write out the final values.

A scenario I was mentioning is when you need parts of the G-buffer for some subsequent processing, e.g. you need depth, normals and motion vectors for an AA pass or an image enhancement pass. You will stream out this data as regular textures, but you are still saving a lot of bandwidth on the other parts of the G-buffer that you don't need afterwards.
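As a rough Metal-side sketch of the "keep the G-buffer on chip" idea (my own example, assuming an Apple GPU; the fragment/tile shaders that actually produce and consume the data are omitted):

```swift
import Metal

let device = MTLCreateSystemDefaultDevice()!
let width = 1920, height = 1080

// A G-buffer attachment that is never backed by RAM: .memoryless keeps it
// in tile memory for the lifetime of the render pass.
func makeGBufferAttachment(_ format: MTLPixelFormat) -> MTLTexture {
    let desc = MTLTextureDescriptor.texture2DDescriptor(pixelFormat: format,
                                                        width: width,
                                                        height: height,
                                                        mipmapped: false)
    desc.usage = .renderTarget
    desc.storageMode = .memoryless
    return device.makeTexture(descriptor: desc)!
}

let pass = MTLRenderPassDescriptor()
pass.colorAttachments[1].texture = makeGBufferAttachment(.rgba8Unorm)   // albedo
pass.colorAttachments[2].texture = makeGBufferAttachment(.rgba16Float)  // normals
for index in 1...2 {
    // .dontCare on both ends: the data is produced and consumed within the
    // tile, so nothing is ever loaded from or stored to system memory.
    pass.colorAttachments[index].loadAction = .dontCare
    pass.colorAttachments[index].storeAction = .dontCare
}
// colorAttachments[0] would be the final lit image, written to a regular
// (non-memoryless) drawable texture and stored as usual.
```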



Although since you bring up vector rendering, I can’t help but think Apple has been chomping at the bit for such functionality. Is that really that new? I know there are definite wins if you can avoid rasterizing CoreGraphics draw calls on the CPU or to textures for later compositing, but for some reason I thought that was doable a while ago. But maybe I’m mis-reading your comment here.

No, it's not new at all and there has been some really cool work in this department (e.g. https://raphlinus.github.io/rust/graphics/gpu/2020/06/12/sort-middle.html), but using persistent on-chip memory is something exclusive to Apple as far as I am aware. When you do this work with other GPUs, you have to use intermediate data buffers to communicate between shader passes. On Apple GPUs, you can use tile compute shaders that share the same persistent on-chip memory, which simplifies the programming model and potentially improves performance. I am still experimenting with this stuff, but I think it's a really cool feature that is still poorly understood. I'd expect people to come up with all kinds of crazy rendering techniques using this functionality that cannot be done efficiently with a traditional rendering pipeline.
 
  • Like
Reactions: Krevnik