
Xiao_Xi

macrumors 68000
Oct 27, 2021
1,628
1,101
Only Nvidia has proper hardware RT; Intel and AMD just slapped on some intersection acceleration.
According to Imagination's ray tracing levels system,
Intel's ray tracing hardware is more advanced than Nvidia's because it can sort rays.
 
  • Like
Reactions: jujoje

leman

macrumors Core
Oct 14, 2008
19,522
19,679
Intel's ray tracing hardware is more advanced than Nvidia's because it can sort rays.

Ah, thanks for pointing out that Intel's hardware is more sophisticated than I thought! Nvidia has that stuff too though, probably since the beginning.
 

Xiao_Xi

macrumors 68000
Oct 27, 2021
1,628
1,101
The 40 series does Shader Execution Reordering, which I believe is the exact same thing.
You're right! Although I saw the presentation, I completely forgot about it.

ser.png
 

leman

macrumors Core
Oct 14, 2008
19,522
19,679
Btw, I was just made aware of this very interesting Apple patent:


It describes an energy-efficient way to do RT intersection tests - dedicated hardware that runs intersection tests using limited precision and then follows up in a compute shader if the initial test indicates a hit. Since most tests in RT are misses it should massively cut down the time hardware spends doing RT checks.
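To make that concrete, here is a rough CPU-side sketch of how such a two-phase test could work. All names, types and the error bound are made up by me for illustration (the patent obviously doesn't spell out code like this): a cheap conservative test can only ever answer "definitely a miss" or "maybe a hit", and only the maybes pay for the full-precision test.

```cpp
// Toy two-phase ray/AABB test. Phase 1 is a conservative, "low precision"
// check (simulated here by padding the box by an error bound): a miss is
// final, a hit only means "maybe". Phase 2 re-tests the maybes at full
// precision. All names and the error bound are invented for illustration.
#include <algorithm>
#include <cstdio>

struct Ray { float ox, oy, oz, dx, dy, dz; };
struct Box { float mn[3], mx[3]; };

// Full-precision slab test (the expensive path).
bool intersectFull(const Ray& r, const Box& b) {
    const float o[3] = { r.ox, r.oy, r.oz };
    const float d[3] = { r.dx, r.dy, r.dz };
    float t0 = 0.0f, t1 = 1e30f;
    for (int i = 0; i < 3; ++i) {
        float inv = 1.0f / d[i];
        float tn = (b.mn[i] - o[i]) * inv;
        float tf = (b.mx[i] - o[i]) * inv;
        if (tn > tf) std::swap(tn, tf);
        t0 = std::max(t0, tn);
        t1 = std::min(t1, tf);
    }
    return t0 <= t1;
}

// Conservative reduced-precision stand-in: pad the box so rounding error
// can never turn a true hit into a miss.
bool maybeHitLowPrecision(const Ray& r, Box b) {
    const float pad = 1e-3f;                       // made-up error bound
    for (int i = 0; i < 3; ++i) { b.mn[i] -= pad; b.mx[i] += pad; }
    return intersectFull(r, b);                    // imagine cheap, narrow hardware here
}

bool intersectTwoPhase(const Ray& r, const Box& b) {
    if (!maybeHitLowPrecision(r, b)) return false; // most rays bail out here
    return intersectFull(r, b);                    // only candidates pay full price
}

int main() {
    Ray r{ -5.0f, 0.5f, 0.5f, 1.0f, 0.0f, 0.0f };
    Box b{ { 0, 0, 0 }, { 1, 1, 1 } };
    std::printf("hit: %d\n", intersectTwoPhase(r, b));
}
```

The interesting bit is that the conservative test can be much cheaper in silicon than an exact one, and since the vast majority of ray/box tests are misses, the expensive follow-up (in the patent's description, a compute shader) runs only rarely.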
 

diamond.g

macrumors G4
Mar 20, 2007
11,437
2,665
OBX
Btw, I was just made aware of this very interesting Apple patent:


It describes an energy-efficient way to do RT intersection tests - dedicated hardware that runs intersection tests using limited precision and then follows up in a compute shader if the initial test indicates a hit. Since most tests in RT are misses it should massively cut down the time hardware spends doing RT checks.
Is that different than what AMD is currently doing?
 

terminator-jq

macrumors 6502a
Nov 25, 2012
719
1,515
Btw, I was just made aware of this very interesting Apple patent:


It describes an energy-efficient way to do RT intersection tests - dedicated hardware that runs intersection tests using limited precision and then follows up in a compute shader if the initial test indicates a hit. Since most tests in RT are misses it should massively cut down the time hardware spends doing RT checks.
Wow that was interesting! Great find!

Based on the filing date and publication date, it seems like we could see this ray tracing tech debut in the Mac Pro and then make its way into the M3 series (the M2 Pro/Max could be possible as well).

It’s good to see Apple making a move like this on the GPU front. On the CPU side, they have definitely demonstrated their ability to beat Intel and AMD with better performance and efficiency, but on the GPU side… they need some work. There’s no excuse for a “Pro” level chip not to have at least some sort of hardware ray tracing built in. Let’s hope Apple can implement this quickly and effectively.
 
  • Like
Reactions: vel0city

leman

macrumors Core
Oct 14, 2008
19,522
19,679
Is that different than what AMD is currently doing?

No idea. The way I interpret the documentation is that AMD has dedicated instructions that do intersection tests. I do not know whether they have a separate traversal hardware unit or any other optimisations.
 

diamond.g

macrumors G4
Mar 20, 2007
11,437
2,665
OBX
No idea. The way I interpret the documentation is that AMD has dedicated instructions that do intersection tests. I do not know whether they have a separate traversal hardware unit or any other optimisations.
From my understanding, the texture processing unit does that, plus the intersection traversal. So in a way I guess that part isn't dedicated, but AMD has the shaders do the rest (where Nvidia has more hardware dedicated to other stuff that helps the speed along). It seems like Apple is leaning towards the AMD route, per the patent - or at least that's my understanding of it, which admittedly could be wrong.
 

jmho

macrumors 6502a
Jun 11, 2021
502
996
I found another RT patent.
Can anyone here explain it a bit more?

Could this patent work together with the other patent posted above?


Shaders in Metal have essentially three levels:

Threads - this is the number of times you want the shader to run, often in the millions.
Threadgroups - threads that run together in a batch. The GPU will often take, say, 512 threads and run them together; these threads can potentially work together and share memory.
Simdgroups - threadgroups are then broken down even further into simdgroups for actual parallel execution on the GPU. Every thread in a simdgroup executes the exact same code, so it's really important that shaders in a simdgroup do basically the same thing. If shaders in a simdgroup do task A or B or C, then every single shader in the group will have to do A and B and C, which is obviously very slow (see the toy sketch below). Ideally you would want to put all task A shaders in one simdgroup, all task B in another, etc.

This patent seems to be about forming optimal simdgroups while ray tracing, which is probably a bit like Nvidia's Shader Execution Reordering.
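To put a rough number on the divergence point, here is a deliberately silly CPU-side model of a 32-wide simdgroup. None of this is actual Metal code and all the names are made up: the point is just that if the lanes disagree on which task to run, the group has to step through every branch any lane wants.

```cpp
// Toy model of SIMD divergence: a "simdgroup" of 32 lanes shares one
// instruction stream. If lanes want different branches, the group executes
// every branch that is present and masks off the lanes that didn't take it.
#include <array>
#include <cstdio>

constexpr int kSimdWidth = 32;
enum class Task { A, B, C };

// Count how many serial passes the group needs for a given mix of tasks.
int passesForGroup(const std::array<Task, kSimdWidth>& laneTask) {
    int passes = 0;
    for (Task t : { Task::A, Task::B, Task::C }) {
        bool anyLaneWantsIt = false;
        for (Task lt : laneTask)
            if (lt == t) anyLaneWantsIt = true;   // this lane stays active, others masked
        if (anyLaneWantsIt) ++passes;             // whole group pays for this branch
    }
    return passes;
}

int main() {
    std::array<Task, kSimdWidth> mixed{}, uniform{};
    for (int i = 0; i < kSimdWidth; ++i) {
        mixed[i]   = static_cast<Task>(i % 3);    // lanes disagree: A, B, C interleaved
        uniform[i] = Task::A;                     // lanes agree: e.g. reordered/sorted rays
    }
    std::printf("mixed group:   %d passes\n", passesForGroup(mixed));   // 3
    std::printf("uniform group: %d passes\n", passesForGroup(uniform)); // 1
}
```

Grouping rays so that each simdgroup ends up mostly uniform is exactly the kind of reordering that this patent (and Nvidia's SER) appears to be after.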
 

leman

macrumors Core
Oct 14, 2008
19,522
19,679
I found another RT patent.
Can anyone here explain it a bit more?

Could this patent work together with the other patent posted above?



I find patents notoriously difficult to read, but from a cursory glance it seems to me that this patent simply describes hardware-accelerated RT as such. It appears to be about the ability of the ray-tracing hardware to issue work to compute cores and vice versa. In a nutshell, the compute engine initiates ray tracing, the dedicated scene traversal hardware performs the ray tracing and invokes a new compute task that processes the results. Other parts of the patent discuss in detail the interaction between the RT hardware and the compute engine. I can’t really tell if the patent mentions work compaction or ray reordering at all.

To add to @jmho’s reply: Apple GPUs are vector processors that operate on 1024-bit vectors (32 FP32 numbers) per cycle. A single add operation, for example, operates on inputs of 32 numbers at once. This is what is known as SIMD (single instruction, multiple data). So a simdgroup is essentially just one instance of a shader program. As @jmho mentions, with this kind of hardware you really want every instruction to do as much useful work as possible, so it’s advantageous to design the program (and your data layouts!) in a way that 32 values can be processed at once. Otherwise you are just wasting compute resources. Example: most modern GPUs have massive problems with small or “long” triangles, because they process blocks of 4x8 pixels at once and most of such a block will be empty along the edges of those triangles.
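As a small illustration of the data-layout point (plain C++ rather than shader code, and the ray struct is just something I made up): a structure-of-arrays layout puts the 32 values a SIMD instruction wants next to each other in memory, while the array-of-structs version scatters them.

```cpp
// Data-layout sketch: with struct-of-arrays, one operation streams over
// consecutive floats (one per SIMD lane); with array-of-structs, the same
// field is strided across memory and vectorizes far less cleanly.
#include <cstddef>
#include <cstdio>
#include <vector>

struct RayAoS { float ox, oy, oz, dx, dy, dz, tmax; };  // array-of-structs element

struct RaysSoA {                                         // struct-of-arrays
    std::vector<float> ox, oy, oz, dx, dy, dz, tmax;
};

// Each iteration maps naturally onto one lane: 32 of these fit one
// 1024-bit vector operation on 32-wide hardware.
void clampTmaxSoA(RaysSoA& rays, float limit) {
    for (std::size_t i = 0; i < rays.tmax.size(); ++i)
        rays.tmax[i] = rays.tmax[i] > limit ? limit : rays.tmax[i];
}

// Same logic on AoS: the tmax values sit 28 bytes apart, so the hardware
// has to gather them before it can do the same 32-wide operation.
void clampTmaxAoS(std::vector<RayAoS>& rays, float limit) {
    for (RayAoS& r : rays)
        r.tmax = r.tmax > limit ? limit : r.tmax;
}

int main() {
    RaysSoA soa;
    soa.tmax = { 5.0f, 0.5f, 9.0f };
    clampTmaxSoA(soa, 1.0f);
    std::printf("%.1f %.1f %.1f\n", soa.tmax[0], soa.tmax[1], soa.tmax[2]);
}
```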

And finally, let’s talk about timing. Nvidia RT patents were filed in 2014/2015. If I remember correctly the first RT hardware came out in 2018 - that's three to four years to market. Apple's patents were filed in 2020, so I’d expect the hardware to arrive in 2023 at the earliest. But they might surprise us and give us the M2 Pro with this stuff included already this year. Anyway, it kind of explains why there were no changes to the GPU from the A15 on. Apple's team must be very busy :)


P.S. These are the Apple RT patents I found, maybe there are more, no idea (we already discussed two of them)



 
Last edited:

exoticSpice

Suspended
Jan 9, 2022
1,242
1,952
I find patents notoriously difficult to read, but from a cursory glance it seems to me that this patent simply describes hardware-accelerated RT as such. It appears to be about the ability of the ray-tracing hardware to issue work to compute cores and vice versa. In a nutshell, the compute engine initiates ray tracing, the dedicated scene traversal hardware performs the ray tracing and invokes a new compute task that processes the results. Other parts of the patent discuss in detail the interaction between the RT hardware and the compute engine. I can’t really tell if the patent mentions work compaction or ray reordering at all.

To add to @jmho’s reply: Apple GPUs are vector processors that operate on 1024-bit vectors (32 FP32 numbers) per cycle. A single add operation, for example, operates on inputs of 32 numbers at once. This is what is known as SIMD (single instruction, multiple data). So a simdgroup is essentially just one instance of a shader program. As @jmho mentions, with this kind of hardware you really want every instruction to do as much useful work as possible, so it’s advantageous to design the program (and your data layouts!) in a way that 32 values can be processed at once. Otherwise you are just wasting compute resources. Example: most modern GPUs have massive problems with small or “long” triangles, because they process blocks of 4x8 pixels at once and most of such a block will be empty along the edges of those triangles.

And finally, let’s talk about timing. Nvidia RT patents were filed in 2014/2015. If I remember correctly the first RT hardware came out in 2018 - that's three to four years to market. Apple's patents were filed in 2020, so I’d expect the hardware to arrive in 2023 at the earliest. But they might surprise us and give us the M2 Pro with this stuff included already this year. Anyway, it kind of explains why there were no changes to the GPU from the A15 on. Apple's team must be very busy :)


P.S. These are the Apple RT patents I found, maybe there are more, no idea (we already discussed two of them)



Interesting, leman. Thank you for your explanation, and you as well, @jmho. These next few years are going to be fun!
 

deconstruct60

macrumors G5
Mar 10, 2009
12,493
4,053
Based on the filing date and publication date, it seems like we could see this ray tracing tech debut in the Mac Pro and then make its way into the M3 series (the M2 Pro/Max could be possible as well).

Debut in the Mac Pro? Errr, a tetherless, battery-only-powered VR/AR headset likely has a much more pressing need for hyper-low-power RT hardware.

Several of these "work way smarter, not harder" patents smell far more driven by the limits of a small battery than by a primary focus on constructing some Nvidia x090/x080 or AMD x900/x800 'killer' large GPU.

It would trickle out to the Macs (and other SoCs), but it won't be surprising if it doesn't start there.
 

Boil

macrumors 68040
Oct 23, 2018
3,478
3,173
Stargate Command
Ray-Tracing debuting in the ASi Mac Pro as part of the toolset for creating content for the forthcoming Apple Mixed Reality glasses/goggles/headset/whatever...?
 

leman

macrumors Core
Oct 14, 2008
19,522
19,679
Ray-Tracing debuting in the ASi Mac Pro as part of the toolset for creating content for the forthcoming Apple Mixed Reality glasses/goggles/headset/whatever...?

What would be the value of that? Metal has had RT for, what, two years now? And something like the M1 Pro is already fast enough for real-time RT on simpler geometries if you forgo denoising.
 

deconstruct60

macrumors G5
Mar 10, 2009
12,493
4,053
Ray-Tracing debuting in the ASi Mac Pro as part of the toolset for creating content for the forthcoming Apple Mixed Reality glasses/goggles/headset/whatever...?


If the current API has a call that takes a 'ray' and a pointer to a function that does the ray computation if necessary, then that call doesn't really need to change at all if there is now hardware that can decide whether the compute function call can be skipped. The Apple library code invokes the 'test if it's a complete miss' check and then skips the user-supplied call as appropriate. The application-level code wouldn't change at all.

The same applies if the work scheduling is largely being handled by Apple's foundation library code.

Basically the same principle as how Apple dispatches to a ProRes accelerator if the hardware is present, or 'falls back' to software ProRes decoding if it isn't. Applications are supposed to make the same Apple library API call regardless.
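A very rough sketch of that shape of API (my own made-up names, purely to illustrate the dispatch idea, not Apple's actual interface): the library owns the traversal, the app only hands it an intersection function, and whether a hardware 'definitely a miss' test exists is invisible to the caller.

```cpp
// Illustration of library-side dispatch: the application supplies an
// intersection function; the library may consult a cheap hardware
// "definitely a miss" test and skip the user function entirely.
// The application-level call looks the same either way.
#include <cstdio>
#include <functional>
#include <optional>

struct Ray { float origin[3]; float dir[3]; };
struct Hit { float t; };

// Stand-in for the hypothetical hardware query: it can only ever answer
// "certain miss" or "maybe a hit" - never a confirmed hit.
bool hardwareSaysCertainMiss(const Ray&) { return true; }   // placeholder

using IntersectFn = std::function<std::optional<Hit>(const Ray&)>;

// The "library" call the app uses, unchanged whether or not RT hardware exists.
std::optional<Hit> traceRay(const Ray& ray, const IntersectFn& userIntersect,
                            bool hasMissTestHardware) {
    if (hasMissTestHardware && hardwareSaysCertainMiss(ray))
        return std::nullopt;          // hardware fast path: skip the user function
    return userIntersect(ray);        // software fallback / confirmed candidates
}

int main() {
    Ray r{{0, 0, 0}, {0, 0, 1}};
    auto hit = traceRay(r,
        [](const Ray&) -> std::optional<Hit> { return Hit{1.0f}; },  // "app's" code
        /*hasMissTestHardware=*/true);
    std::printf("hit: %s\n", hit ? "yes" : "no");
}
```

Same shape as the ProRes example: the app makes one call, and whether it lands on dedicated hardware or a software path is the library's problem.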

Apple has been promoting their AR libraries for quite a few years now. The point of hardware acceleration should be to make the existing code go faster at far lower power consumption, not to make app developers rewrite for every iteration of hardware evolution. Few folks are going to use an RT hardware interface that churns every couple of years.

There is a huge, increasingly legacy preconception that you have to be tethered to some gigantic, fire-breathing GPU to construct an AR/VR app, and that the app then runs on a tethered headset. That should not be the case at all. For better or worse, Apple killed off that development approach years ago. Their efforts have been to route around that generally across the whole lineup, so that what folks build for the Apple GPU would move straightforwardly onto the headset. The headset is going to use the baseline Apple GPU tech also.
 

theorist9

macrumors 68040
May 28, 2015
3,882
3,061
Given that NVIDIA's 40-series hardware RT is the product of three generations of refinement (it was introduced with the 20-series), I wonder if Apple's first offering, which I assume will use Imagination's RT tech, will be as performant.

Imagination's marketing materials say it will be, but has there been any independent testing of Imagination's commercial implementation of this, which can be seen in the IMG CXT GPU, to see if the reality lives up to the claims? I did a quick Google search and didn't turn up anything.
 
Last edited:

leman

macrumors Core
Oct 14, 2008
19,522
19,679
If the current API has a call that takes a 'ray' and a pointer to a function that does the ray computation if necessary, then that call doesn't really need to change at all if there is now hardware that can decide whether the compute function call can be skipped. The Apple library code invokes the 'test if it's a complete miss' check and then skips the user-supplied call as appropriate. The application-level code wouldn't change at all.

Precisely. Metal already has state-of-the-art ray-tracing support, and it has obviously been designed with hardware acceleration in mind. The software layer won’t change.

I wonder if Apple's first offering, which I assume will use Imagination's RT tech, will be as advanced.

Why do you think they will use Imagination's IP? Apple's RT patents seem to be an entirely new effort.
 

theorist9

macrumors 68040
May 28, 2015
3,882
3,061
Why do you think they will use Imagination's IP? Apple's RT patents seem to be an entirely new effort.
I don't know for certain, but I assumed that would be the case because of the deal Apple signed with Imagination in 2020:

The patents seem less definitive, since Apple implements only a tiny percentage of what they patent. Often patents are filed either speculatively (for technology a company doesn't intend to use now, but wants the option of using in the future) or defensively (for tech a company has no intention of using now or in the future, but wants to prevent competitors from using).

Do you have anything definitive showing what Apple will use?

And regardless of what Apple uses, my essential question remains: How likely is it that Apple, with their first hardware RT implementation, will be able to offer something as performant as that in NVIDIA's 40-series which, again, is the product of three generations of refinement?

Remember that while Apple's M-series chips were a game-changer, they were an extension of something with which Apple had years of practical experience: The A-series chips. Hardware RT is qualitatively different, since it's something they've never done before, in any form (right?). Hence my question.
 
Last edited:

Xiao_Xi

macrumors 68000
Oct 27, 2021
1,628
1,101
How likely is it that Apple, with their first hardware RT implementation, will be able to offer something as performant as that in NVIDIA's 40-series?
For reference, the Arc A770 can render almost as fast as an RTX 3050.
Intel-Arc-A770-and-A750-Performance-Blender-BMW-and-Classroom-680x383.jpg


And it should render faster than the M1 Ultra with 64 cores.
a770vsM1Ultra.png
 

deconstruct60

macrumors G5
Mar 10, 2009
12,493
4,053
I don't know for certain, but I assumed that would be the case because of the deal Apple signed with Imagination in 2020:

The patents seem less definitive, since Apple implements only a tiny percentage of what they patent.

Do you have anything definitive showing what Apple will use?

Pretty good chance Apple's implementation will not be exactly the same. The bigger issue is how much the approaches overlap. Patents usually have a section where they reference prior art and related patents. I haven't done an exhaustive search of these two new ones, but there is a pretty good chance they reference some 'concepts' in some other Imagination Tech patents.

For example, the first new one listed:

"...

Patent Citations (3)

Publication number | Priority date | Publication date | Assignee | Title
Family To Family Citations
US9928640B2 * | 2015-12-18 | 2018-03-27 | Intel Corporation | Decompression and traversal of a bounding volume hierarchy
US10825230B2 * | 2018-08-10 | 2020-11-03 | Nvidia Corporation | Watertight ray triangle intersection
US10970914B1 * | 2019-11-15 | 2021-04-06 | Imagination Technologies Limited | Multiple precision level intersection testing in a ray tracing system
* Cited by examiner, † Cited by third party
..."


Surprise, surprise, surprise. There is an Imagination Tech patent with almost the same title. Completely zero overlap in how those two approaches leverage the TBDR tile memory cache or other foundationally common infrastructure. Apple has a different GPU implementation, but the stack of patents it is built on shares lots of infrastructure patents with ImgTech. Apple is still licensing stuff.

Rather than get into an exhaustive litigation match, it makes sense just to license the stuff from Imagination Tech. It is also a defensive move, because you can see Nvidia is still there as well with some similar techniques.




Similar with the other one dug up:
https://patents.google.com/patent/US20220036630A1/en?oq=US20220036630A1

Go to the citations list and look for an absence of Imagination Tech. Doesn't happen.


And regardless of what Apple uses, my essential question remains: How likely is it that Apple, with their first hardware RT implementation, will be able to offer something as performant as that in NVIDIA's 40-series which, again, is the product of three generations of refinement?

Your presumption is that they are solely after 'performance' as opposed to 'perf/watt'. Every year from 2020 through 2022, at every new Apple Silicon SoC introduction, Apple gets up and preaches another sermon on "perf/watt". When Apple rolls out hardware RT, there is a very good chance we are going to get another "perf/watt" sermon.

Apple isn't out to build an x090 'killer' GPU. Nothing that Nvidia makes is even remotely suitable for an AR/VR headset. Qualcomm is the vendor that has real, shipping-volume headsets with an SoC. Not Nvidia.

If the point was to make something that was plugged into a wall for power and completely tethered, then Nvidia might be highly relevant.


The other issue is the software. The Nvidia 40 series is benefiting from multiple years of laying the foundation for the software RT calls that invoke the hardware. The first 1-1.5 years after Nvidia rolled out their first-generation RT hardware pragmatically didn't buy a whole lot.



Hardware RT is qualitatively different, since it's something they've never done before, in any form (right?). Hence my question.

Pragmatically, It is not purely a hardware issue. Nvidia GPU hardware is how useful to macOS 13 how with no GPU drivers?






P.S. Just like it is in Apple's best interest to keep getting new architecture licenses for new ARM architectures (to keep the ecosystem healthy), it is also in Apple's general interest to keep ImgTech somewhat afloat with new licensing.
If ImgTech's patents fell into the hands of an entity that was hyper-hostile to Apple ... Apple may not be saving any money long term by feeding ImgTech to the wolves.
 
Last edited:

leman

macrumors Core
Oct 14, 2008
19,522
19,679
And regardless of what Apple uses, my essential question remains: How likely is it that Apple, with their first hardware RT implementation, will be able to offer something as performant as that in NVIDIA's 40-series which, again, is the product of three generations of refinement?

Remember that while Apple's M-series chips were a game-changer, they were an extension of something with which Apple had years of practical experience: The A-series chips. Hardware RT is qualitatively different, since it's something they've never done before, in any form (right?). Hence my question.

I think that Apple has enough resources and talent to deliver a good implementation. Hardware RT is a new domain in general, and while Nvidia is an absolute pioneer in this area, it doesn't mean that there is only one viable approach. As @deconstruct60 writes, Apple will probably pursue efficiency (which is not Nvidia's strongest point).
 

theorist9

macrumors 68040
May 28, 2015
3,882
3,061
Your presumption is that they are solely after 'performance' as opposed to 'perf/watt'.
I think you misunderstand what I'm asking. I'm wondering about how advanced/performant Apple's RT implementation will be compared to NVIDIA's. I.e., NVIDIA's implementation gives tasks that benefit from RT a certain percent increase in performance over what the performance would be without RT. So when I ask how performant Apple's application of RT is compared to NVIDIA's, I'm asking how Apple's percentage performance gain from RT will compare. I'm not asking about absolute performance.

Or are you saying that you can get a higher percentage increase in performance from RT by devoting a bigger area of the die to RT, which would cost more watts?
I think that Apple has enough resources and talent to deliver a good implementation. Hardware RT is a new domain in general, and while Nvidia is an absolute pioneer in this area, it doesn't mean that there is only one viable approach. As @deconstruct60 writes, Apple will probably pursue efficiency (which is not Nvidia's strongest point).
See my reply to deconstruct60, above.
Pragmatically, It is not purely a hardware issue. Nvidia GPU hardware is how useful to macOS 13 how with no GPU drivers?
This is just word salad. When you write this way, you're asking your readers to do the work to translate your words into something comprehensible. Don't make me work so hard to understand you!
 
Last edited:
  • Like
Reactions: sirio76

leman

macrumors Core
Oct 14, 2008
19,522
19,679
I think you misunderstand what I'm asking. I'm wondering about how advanced/performant Apple's RT implementation will be compared to NVIDIA's. I.e., NVIDIA's implementation gives tasks that benefit from RT a certain percent increase in performance over what the performance would be without RT. So when I ask how performant Apple's application of RT is compared to NVIDIA's, I'm asking how Apple's percentage performance gain from RT will compare. I'm not asking about absolute performance.


Since you also directed this question at me: the simple truth is that we don’t know. As Apple is clearly pursuing a different approach from other companies, we’ll have to wait and see. But judging by the patents we’ve seen, I think there are reasons to be cautiously optimistic. Using reduced precision for conservative preliminary tests suggests an area-efficient solution, so there is a possibility that Apple's hardware might end up with a higher ray-testing throughput than the competitors', at lower power.
 