
treehuggerpro

macrumors regular
Oct 21, 2021
111
124
For some scene-specific numbers on how Apple's 4x RT metric might work out . . .

From the Redshift Benchmark:

1x 4090+RTX - 1m23s (83s)


2x 3090+RTX - 1m14s (74s)


M2 Ultra x60 GPU - 4m29s (269s)


All things remaining the same, a 60-core M2 Ultra would need about a 3.25x boost from Apple's RT to land where a 4090 benches with RTX on (in the current version of Redshift).
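Just to show the arithmetic behind that estimate, a quick back-of-envelope sketch in Python using only the times quoted above:

```python
# Back-of-envelope estimate from the Redshift benchmark times quoted above.
m2_ultra_60core_s = 269  # M2 Ultra, 60-core GPU
rtx_4090_rtx_on_s = 83   # single 4090 with RTX enabled

# Boost the M2 Ultra would need to match the 4090-with-RTX result,
# assuming the scene, renderer version, and everything else stay the same.
required_boost = m2_ultra_60core_s / rtx_4090_rtx_on_s
print(f"required boost: ~{required_boost:.2f}x")  # ~3.24x
```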
 

leman

macrumors Core
Oct 14, 2008
19,521
19,674
(a) Can you give a reference to the Ray Tracing patents you've been looking at?

Sure, here are some of the patents that give you pretty much the detailed story (there is substantial overlap between them):


(b) The reason I mentioned Geometry hardware is even many of the recent Apple patents talk about a Vertex Master (along with Pixel Master and Data Master). It's hard to know what these refer to, but Imagination documentation from say around 2018 says that these ultimately refer to dedicated hardware as opposed to the generic Shader hardware.

Yeah, I was wondering about the same thing. I would speculate these modules have to do with configuring the hardware to run the shaders and invoke the associated fixed-function hardware, etc. I mean, even with the most flexible compute architectures there will be fixed-function stuff (binning, rasterization, early depth and alpha test, compute data compression, etc.). Asahi people probably know more; one would need to look around in their drivers and maybe ask Alyssa.


The other question is how "programmable" this RT hardware is. For example, can it ONLY walk a BVH tree and perform intersections? Or can it walk a generic pointer based data structure and apply a function to every end node?

As far as I know, nVidia's hardware can still only do Ray Tracing, which I find strange. You'd think, based on their history, they'd be defining "Node shaders" that operate as I described, to get more value out of the hardware.

I understand the appeal of general graph traversal acceleration, but this would mean giving the accelerator the ability to execute arbitrary node-test code, and that probably introduces a lot of additional complexity as well as requiring a bigger transistor and power budget. A device specialised for RT can be designed for a specific hard-coded graph layout and with specific features in place. E.g., one of Apple's core ideas is that intersection calculation is done conservatively, using low precision (and hits are then verified on the regular shader hardware using full precision). This means that the RT accelerator doesn't need to implement full-precision ALUs (or even full-functionality ALUs). I can imagine that this can save them a lot of transistors, while also enabling more intersection circuits at the same budget.
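A minimal sketch of that two-phase pattern, purely illustrative (Python/NumPy; the widened float16 box test stands in for the RT unit and the full-precision re-test stands in for the shader-side verification — this is not Apple's actual scheme, just the general idea):

```python
import numpy as np

def conservative_aabb_hit(origin, inv_dir, box_min, box_max, eps=1e-2):
    # Stand-in for the RT unit: a slab test computed in float16 with the box
    # widened by eps, so the low-precision rounding can only produce false
    # positives ("maybe hit"), not false negatives.
    o = origin.astype(np.float16)
    d = inv_dir.astype(np.float16)
    lo = (box_min.astype(np.float16) - eps - o) * d
    hi = (box_max.astype(np.float16) + eps - o) * d
    t_near = np.minimum(lo, hi).max()
    t_far = np.maximum(lo, hi).min()
    return t_far >= max(t_near, 0.0)

def exact_aabb_hit(origin, inv_dir, box_min, box_max):
    # Stand-in for the shader-side verification: the same test at full precision.
    lo = (box_min - origin) * inv_dir
    hi = (box_max - origin) * inv_dir
    t_near = np.minimum(lo, hi).max()
    t_far = np.maximum(lo, hi).min()
    return t_far >= max(t_near, 0.0)

origin = np.array([0.0, 0.0, 0.0])
inv_dir = 1.0 / np.array([1.0, 1.0, 1.0])
box_min, box_max = np.array([3.0, 3.0, 3.0]), np.array([4.0, 4.0, 4.0])

# Cheap conservative test first; only surviving candidates pay for the full test.
if conservative_aabb_hit(origin, inv_dir, box_min, box_max):
    print("confirmed hit:", exact_aabb_hit(origin, inv_dir, box_min, box_max))
```

The point of the pattern is that the cheap stage never has to be exactly right, only never wrong in the "miss" direction, which is what lets the dedicated hardware get away with smaller, lower-precision ALUs.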

Maybe one day we will see more general graph traversal hardware in the GPUs, but it sounds expensive to me.
 

Xiao_Xi

macrumors 68000
Oct 27, 2021
1,627
1,101
To predict the potential performance of M3 Ultra with hardware-based ray tracing, we can compare the performance of RTX 3090 on CUDA with that of RTX 4090 on OptiX. For example, in Blender, RTX 4090 on OptiX takes 17 seconds, while RTX 3090 on CUDA takes 50 seconds to render this scene, so RTX 4090 is almost 3 times faster than RTX 3090. So, M3 Ultra is likely to be 3 to 4 times faster than M2 Ultra.
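For what it's worth, here is the ratio behind that extrapolation as a short sketch; the M3 part is pure analogy, not a measurement:

```python
# Blender render times quoted above for this scene.
rtx3090_cuda_s  = 50  # no RT cores used
rtx4090_optix_s = 17  # hardware RT path
print(f"~{rtx3090_cuda_s / rtx4090_optix_s:.1f}x")  # ~2.9x

# The analogy assumed in this post: if the M2 Ultra -> M3 Ultra jump tracks a
# similar software-RT -> hardware-RT jump, a 3-4x gain is plausible.
# Purely speculative until real hardware is benchmarked.
```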

[Chart: Blender Cycles GPU render performance, Secret Deer scene, CUDA vs. OptiX]

 
  • Like
Reactions: jujoje

stevemiller

macrumors 68020
Oct 27, 2008
2,057
1,607
i'll be curious to see how this pans out in m3 hardware, but my prediction is everyone hoping for a 3-4x speedup is going to be disappointed. i suspect we'll see a 2x speedup at most (and probably less) going from m2 to m3 (which at least in blender is comparable to m1 to m2 gpu improvements).

my gut is that apple wants to stay relevant with 'decent' performance and modern tech (like rt and neural engines) but isn't fighting for some sort of performance crown. more like performance per watt instead.

also their iphone demo showing rt on and off was super awkward in my opinion. i attached the 50% wipe between rt on and off, and sure, for people who are looking for it you could pick out some extra nuance in reflections. but for a casual observer, this doesn't feel like a meaningful difference at all. makes me wonder if even apple knows what to do with rt.
 

Attachments

  • Screenshot 2023-09-14 at 9.23.07 AM.png

innerproduct

macrumors regular
Jun 21, 2021
222
353
Well, tbh, cpu rendering is still the best choice for massive things.
I mostly hope for some really smart xpu renderer that supports mac. The shared mem should be a game changer right?
 
  • Like
Reactions: aytan and sirio76

leman

macrumors Core
Oct 14, 2008
19,521
19,674
i'll be curious to see how this pans out in m3 hardware, but my prediction is everyone hoping for a 3-4x speedup is going to be disappointed. i suspect we'll see a 2x speedup at most (and probably less) going from m2 to m3 (which at least in blender is comparable to m1 to m2 gpu improvements).

What makes you think that? Genuinely curious. Nvidia achieves a huge speedup between their CUDA and OPTIX backends in Blender (over 3x), and there are reasons to believe that Apple's implementation is more sophisticated.

also their iphone demo showing rt on and off was super awkward in my opinion. i attached the 50% wipe between rt on and off, and sure, for people who are looking for it you could pick out some extra nuance in reflections. but for a casual observer, this doesn't feel like a meaningful difference at all. makes me wonder if even apple knows what to do with rt.

They could have chosen a more visually impressive demo (there is a good reason one often goes for reflection-heavy scenes to show off RT), but their raytraced scene clearly shows impressive soft shadows and light volumes, as well as global illumination. It's subtle, but it's there, and it's no less impressive.
 

Xiao_Xi

macrumors 68000
Oct 27, 2021
1,627
1,101
What makes you think that? Genuinely curious. Nvidia achieves a huge speedup between their CUDA and OPTIX backends in Blender (over 3x), and there are reasons to believe that Apple's implementation is more sophisticated.
To achieve such an incredible improvement between RTX 3090 and RTX 4090, Nvidia has had to use a more advanced node (from Samsung 8nm to TSMC 5nm) and increase power consumption. Apple may have less leeway to improve performance between generations.
 

NT1440

macrumors Pentium
May 18, 2008
15,092
22,158
To achieve such an incredible improvement between RTX 3090 and RTX 4090, Nvidia has had to use a more advanced node (from Samsung 8nm to TSMC 5nm) and increase power consumption. Apple may have less leeway to improve performance between generations.
If we follow this trajectory GPU makers are in trouble, unless they somehow convince people a standalone power supply for just the card will be acceptable for the 5 and 6 series…
 
  • Like
Reactions: MRMSFC

leman

macrumors Core
Oct 14, 2008
19,521
19,674
To achieve such an incredible improvement between RTX 3090 and RTX 4090, Nvidia has had to use a more advanced node (from Samsung 8nm to TSMC 5nm) and increase power consumption. Apple may have less leeway to improve performance between generations.

That’s not what I am talking about. I was referring to OPTIX vs. CUDA performance. Nvidia can achieve 3x improvement on same hardware with hardware RT enabled. Why can’t Apple?
 

Xiao_Xi

macrumors 68000
Oct 27, 2021
1,627
1,101
I was referring to OPTIX vs. CUDA performance. Nvidia can achieve 3x improvement on same hardware with hardware RT enabled. Why can’t Apple?
Only on less powerful GPUs. The more powerful the GPU, the less OptiX improves.

Improvement varies greatly from scene to scene.

[Chart: Blender Cycles GPU render performance, Scanlands scene, CUDA vs. OptiX]

[Chart: Blender Cycles GPU render performance, White Lands scene, CUDA vs. OptiX]

 
Last edited:
  • Like
Reactions: innerproduct

Chuckeee

macrumors 68040
Aug 18, 2023
3,062
8,723
Southern California
To predict the potential performance of M3 Ultra with hardware-based ray tracing
Seems like the discussion is beginning to “go down the rabbit hole” of benchmarking standards and approaches to quantify capabilities. More than just questioning numbers, I’m questioning the approach. I believe that there is a general problem in the industry (actually many industries) where capabilities are tailored to make benchmarks look good vs. actual capabilities for users.

I don’t have a solution, just let’s be thoughtful when tossing around numbers like 17.8% better or 57x faster.
 
Last edited:
  • Like
Reactions: sirio76 and leman

jeanlain

macrumors 68020
Mar 14, 2009
2,459
953
To predict the potential performance of M3 Ultra with hardware-based ray tracing, we can compare the performance of RTX 3090 on CUDA with that of RTX 4090 on OptiX. For example, in Blender, RTX 4090 on OptiX takes 17 seconds, while RTX 3090 on CUDA takes 50 seconds to render this scene, so RTX 4090 is almost 3 times faster than RTX 3090. So, M3 Ultra is likely to be 3 to 4 times faster than M2 Ultra.

View attachment 2261432
Why is CUDA faster than Optix in this test?
 
  • Wow
Reactions: Xiao_Xi

bcortens

macrumors 65816
Aug 16, 2007
1,324
1,796
Canada
Only on less powerful GPUs. The more powerful the GPU, the less OptiX improves.

Improvement varies greatly from scene to scene.

View attachment 2261761
View attachment 2261762

View attachment 2261760
View attachment 2261764
While it is true that the scaling appears to get worse as the GPUs get more powerful, the difference isn't huge, and it isn't as consistent as saying the slower cards always benefit more than the faster cards.

As we can see below, the White Lands scene (the one of the two you posted that benefits the least from RT) shows the benefit from RT differing by less than 10% based on the performance of the card (speedup factors listed below; see the short sketch at the end of this post).

White Lands Scene
1.20x - 4080, 4090
1.22x - 4070

1.18x - 3090
1.21x - 3070
1.20x - 3050


In Scan Lands, with the 30XX series, the slower cards do appear to benefit more from RT than the faster cards, but that isn't as clear in the 40XX series since we only have 3 data points.

Scan Lands Scene
2.38x - 4090
2.65x - 4080
2.57x - 4070

2.55x - 3090
2.67x - 3070
2.82x - 3050


The subtask breakdowns in which you compare Intel, AMD, and Nvidia aren't useful for determining whether the improvement changes based on the power of the hardware, because there are underlying architectural differences that make that comparison impossible. Those breakdowns DO show that it really varies from scene to scene, however.
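For clarity, the per-card factors above are (assuming they come straight from the CUDA vs. OptiX charts) simply the CUDA render time divided by the OptiX render time. A trivial sketch, with placeholder times rather than the actual chart values:

```python
# Speedup factor from enabling hardware RT: CUDA time / OptiX time,
# for the same card and the same scene.
def rt_speedup(cuda_seconds: float, optix_seconds: float) -> float:
    return cuda_seconds / optix_seconds

# Placeholder example values, not actual chart data.
print(f"{rt_speedup(119.0, 50.0):.2f}x")  # -> 2.38x
```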
 
  • Like
Reactions: Xiao_Xi

Xiao_Xi

macrumors 68000
Oct 27, 2021
1,627
1,101
Why is CUDA faster than Optix in this test?
The labels are interchanged. :eek:
To avoid confusion, those CUDA/OptiX graphs should have retained the same sorting as the regular OptiX-only graphs. The problem is that the primary color in each graph is blue, and that happened to be CUDA in the secondary graphs. Ignoring the coloring, they're still accurate.
 

innerproduct

macrumors regular
Jun 21, 2021
222
353
If you look at Octane's benchmark, https://render.otoy.com/octanebench...le_by=linear&filter=&singleGPU=1&showRTXOff=1
RTX on/off gives an average advantage across scenes of 20-40%. The same was true in Redshift when RTX was enabled. Blender's bench is weird in that way.
Do a landscape scene with 1000s of non-instanced trees that uses a single texture and you will be in RT acceleration benefit heaven. Do a super-realistic rendering of a human face with gigs of texture data, sub-surface scattering, hair using curves, etc., and the benefit will be less. Etc.
 
  • Like
Reactions: sirio76

stevemiller

macrumors 68020
Oct 27, 2008
2,057
1,607
What makes you think that? Genuinely curious. Nvidia achieves a huge speedup between their CUDA and OPTIX backends in Blender (over 3x), and there are reasons to believe that Apple's implementation is more sophisticated.
Again I could be wrong. But raytracing by no means guarantees enormous performance jumps. AMD’s first attempt seems pretty modest, for example.
 

Attachments

  • IMG_1470.jpeg

quarkysg

macrumors 65816
Oct 12, 2019
1,247
841
Ray tracing, from a consumer point of view is not that interesting if you asked me. I would think most folks with AMD and Nvidia GPUs do not really make use of RT for gaming if they value high frame-rate. And I don't think many will own the highest end GPUs just for gaming.

Apple likely is targeting RT for the professional folks rendering movies, especially large scenes with tons of render assets. Hundreds of GB for the GPU to use is really good for such an application.

An M3 Pro/Max with HW RT (if it comes with the M3) will likely not perform exceptionally well in games, so likely RT will not be used.
 
  • Like
  • Wow
Reactions: jujoje and Chuckeee

leman

macrumors Core
Oct 14, 2008
19,521
19,674
Again I could be wrong. But raytracing by no means guarantees enormous performance jumps. AMD’s first attempt seems pretty modest, for example.

AMD’s first attempt is a low-effort hack; they have slapped a naive ray-intersection loop on top of their texture unit and that’s it. Apple’s implementation is much more sophisticated. Of course, the proof will be in actual benchmarks. I’m very curious to see what they have achieved.
 
  • Like
Reactions: altaic

jeanlain

macrumors 68020
Mar 14, 2009
2,459
953
Ray tracing, from a consumer point of view is not that interesting if you asked me. I would think most folks with AMD and Nvidia GPUs do not really make use of RT for gaming if they value high frame-rate. And I don't think many will own the highest end GPUs just for gaming.

Apple likely is targeting RT for the professional folks rendering movies, especially large scenes with tons of render assets. Hundreds of GB for the GPU to use is really good for such an application.

An M3 Pro/Max with HW RT (if it comes with the M3) will likely not perform exceptionally well in games, so likely RT will not be used.
I concur. For me, RT in games is mostly a marketing gimmick. The difference in visual quality, compared to the best rasterization techniques, is minor and the performance hit is huge. I would have preferred if Apple improved rasterization performance instead.
 
  • Like
Reactions: quarkysg

jujoje

macrumors regular
May 17, 2009
247
288
Ray tracing, from a consumer point of view is not that interesting if you asked me. I would think most folks with AMD and Nvidia GPUs do not really make use of RT for gaming if they value high frame-rate. And I don't think many will own the highest end GPUs just for gaming.

Apple likely is targeting RT for the professional folks rendering movies, especially large scenes with tons of render assets. Hundreds of GB for the GPU to use is really good for such an application.

An M3 Pro/Max with HW RT (if it comes with the M3) will likely not perform exceptionally well in games, so likely RT will not be used.


Yeah def agree with this.

So far raytracing has been pretty underwhelming; starting to see it get adopted a bit more meaningfully now - Alan Wake 2 and the new Cyberpunk update spring to mind - but to be honest still feels a ways out from being a must have feature.

For 3D content creation, if the new hardware raytracing is decent, it makes Apple's GPU pretty tempting; if you can render large sets or detailed FX on a Mac Studio it becomes pretty compelling, particularly given the amount of memory. Similar for GPU-accelerated simulations, which will easily exceed 24GB, putting them in the 'very expensive' category if you were looking at Nvidia hardware...

Raytracing-wise, I feel it doesn't have to beat the 4090, but it has to be fast enough.
 
  • Like
Reactions: sirio76 and MRMSFC

leman

macrumors Core
Oct 14, 2008
19,521
19,674
I concur. For me, RT in games is mostly a marketing gimmick. The difference in visual quality, compared to the best rasterization techniques, is minor and the performance hit is huge. I would have preferred if Apple improved rasterization performance instead.

Not that long ago, people were saying the same thing about programmable shading.

Graphics is a constant game of evolution: as new approaches and techniques become more feasible, old hacks are phased out. I have little doubt that in the not-too-distant future 3D games will be fully raytraced, simply because RT dramatically simplifies the engine code.
 

jeanlain

macrumors 68020
Mar 14, 2009
2,459
953
I have little doubt that in the not-too-distant future 3D games will be fully raytraced, simply because RT dramatically simplifies the engine code.
But at what cost? In this day and age, we should be aiming at power efficiency. Ray tracing is the opposite of efficiency. It is a brute-force solution that consumes more power. Rasterization tries to find smart ways to use fewer resources.
 
Last edited:

leman

macrumors Core
Oct 14, 2008
19,521
19,674
But at what cost? In this day and age, we should be aiming at power efficiency. Ray tracing is the opposite of efficiency. It is a brute-force solution that consumes more power. Rasterization tries to find smart ways to use fewer resources.

I'd say it's more about achieving your goals within the given (power, performance) budget. If you target frame time under 16ms and can achieve it on your target hardware using RT, why wouldn't you do it?

What I am trying to say is that the relative cost of certain algorithms and approaches goes down as the technology improves. Not so long ago we were using fixed-precision math to do calculations to get better power efficiency. Now it's full floating point everywhere. Games used to have pre-baked fixed lighting; now it's fully dynamic stuff with advanced effects. Over the last two decades we got dramatically less power-efficient in our approaches, because the complexity and ambition of the algorithms grew a lot, but so did the hardware capability. Much more work at the same power budget. If the goal were to simply reduce energy consumption, then folks should stop using their fancy shaders.

Twenty years ago a real-time rasterization pipeline with volumetric soft shadows, dynamic lights, and ambient occlusion was entirely unrealistic, and I bet some people were arguing that the GPUs should improve the basic vertex processing rates instead of pursuing decadent things like programmable shaders. But it's just as possible that twenty years in the future every mobile phone will be capable of real-time raytracing on high-complexity scenes in 5K.

And finally, you say that RT is inherently power-hungry because that's how contemporary implementations are. But these are not the only possible implementations. Just like the shader utilisation and cache efficiency of a rasteriser can be dramatically improved via techniques like binning and deferred rendering, similar things are possible with hardware RT as well.
 

name99

macrumors 68020
Jun 21, 2004
2,410
2,317
Ray tracing, from a consumer point of view is not that interesting if you asked me. I would think most folks with AMD and Nvidia GPUs do not really make use of RT for gaming if they value high frame-rate. And I don't think many will own the highest end GPUs just for gaming.

Apple likely is targeting RT for the professional folks rendering movies, especially large scenes with tons of render assets. Hundreds of GB for the GPU to use is really good for such an application.

An M3 Pro/Max with HW RT (if it comes with the M3) will likely not perform exceptionally well in games, so likely RT will not be used.
Ray tracing may be important for AR, to properly "ground" artificial objects in the real world. In earlier years Apple has shown demos of how, if you don't include things like shadows, it's hard to see exactly where an artificial AR object is supposed to be in space; it may look like it is floating a foot above the ground.

So the value provided by RT hardware would appear to be:
- (definite) making high-end Apple hardware at least somewhat competitive and relevant to people running a lot of Blender-type apps.
And presumably the cheaper it becomes (in performance), the more adding ray-tracing flair will move from the specialty of a few professionals to something present in more mainstream apps like Illustrator and Photoshop, maybe even video editing. (In these uses it pairs well with AI hardware, which can extract a good-enough 3D model from image/video, and then apply ray tracing to that.)

- (definite, but mostly for boasting purposes) making consumer games prettier

- (definite, and perhaps very important, who knows?) making AR feel more natural, less uncanny valley. This is mainly value for Vision Pro, but hell, OF COURSE Apple are going to add hardware to the SoCs that benefits Vision Pro, and if every other product can also get some extra benefit, that's just a nice bonus.
(Which raises the issue of how much else on the A17 that Apple did not mention is mainly there for Vision Pro; and getting that right was, in fact, the single highest priority of the A17 team...)

- (very tentative...) the hardware required for ray tracing MAY possibly be useful for other GPU tasks (I've suggested this in the context of walking large pointer-based data structures) in a way that's of value to tasks apparently totally unrelated to RT. This may be present on day one; it may be a goal Apple is aware of, but was unable to fit into this year's design; or it may be a crazy idea that will never make sense!
 