
Homy

macrumors 68030
Jan 14, 2006
2,502
2,450
Sweden
Are you sure that was actually written by a human? It's very repetitive and meanders all over the place; it reeks of LLM-generated slop.

It gives no evidence that CUDA is better for simple scenes, it just asserts that it is. I personally doubt that claim a lot. CUDA is a GPGPU API targeted at utilizing compute shaders. As far as I know it doesn't take advantage of raytracing cores at all, so if you're using Blender on an Nvidia GPU which has RT, you should probably always use OptiX.

The blog is written by "Sushith Balu", based in Kerala, India.


Here is his LinkedIn. Here is their Facebook. They’re on Quora too.

However, the blog post appears to be based on an article by the company iRender, based in Singapore and Vietnam. At the same time, the blog post is older than the company's article.

They both say "However, this may depend on the specific hardware configuration and the rendering engine being used".

Again, I didn't say CUDA uses RT, and you're welcome to add your own links and sources if you find some information about the subject. The discussion was about whether CUDA or OptiX was used in the user test above. My comment was just a side note about the possible speed difference. OptiX is the way to go for the fastest rendering with RT cores, but in some other cases it's still better to use CUDA.

Nvidia gives some explanation, though it's about application programming rather than Blender: "CUDA launches allow use of shared memory and warp/block intrinsics, where OptiX launches require a single-threaded programming model. So if you want to do any of the kinds of fancy thread synchronization that CUDA allows, then using a CUDA launch would be preferable to using an OptiX launch."
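
To make the quote a bit more concrete, here's a minimal CUDA sketch (my own illustration, not from Nvidia or the blog) of the kind of block-wide cooperation being described: shared memory, a block-wide barrier and warp shuffle intrinsics, none of which fit into OptiX's single-threaded launch model.

Code:
// Minimal illustration of features only available in a native CUDA launch:
// shared memory, block-wide synchronization and warp shuffle intrinsics.
// Sums an array by reducing within each 256-thread block.
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

__global__ void blockSum(const float* in, float* out, int n)
{
    __shared__ float partial[256];              // per-block scratch, assumes blockDim.x == 256

    int tid = threadIdx.x;
    int i   = blockIdx.x * blockDim.x + tid;
    partial[tid] = (i < n) ? in[i] : 0.0f;
    __syncthreads();                            // block-wide barrier

    // Tree reduction in shared memory down to 32 partial sums.
    for (int stride = blockDim.x / 2; stride >= 32; stride >>= 1) {
        if (tid < stride)
            partial[tid] += partial[tid + stride];
        __syncthreads();
    }

    // Final 32 values: reduce within one warp using shuffle intrinsics.
    if (tid < 32) {
        float v = partial[tid];
        for (int offset = 16; offset > 0; offset >>= 1)
            v += __shfl_down_sync(0xffffffff, v, offset);
        if (tid == 0)
            out[blockIdx.x] = v;                // one partial sum per block
    }
}

int main()
{
    const int n = 1 << 20;
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;

    float *d_in, *d_out;
    cudaMalloc(&d_in,  n * sizeof(float));
    cudaMalloc(&d_out, blocks * sizeof(float));

    std::vector<float> host(n, 1.0f);           // n ones, so the total should equal n
    cudaMemcpy(d_in, host.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    blockSum<<<blocks, threads>>>(d_in, d_out, n);

    std::vector<float> partials(blocks);
    cudaMemcpy(partials.data(), d_out, blocks * sizeof(float), cudaMemcpyDeviceToHost);

    float total = 0.0f;
    for (float p : partials) total += p;
    printf("sum = %.0f (expected %d)\n", total, n);

    cudaFree(d_in);
    cudaFree(d_out);
    return 0;
}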

 
Last edited:

mr_roboto

macrumors 6502a
Sep 30, 2020
856
1,866
The blog is written by "Sushith Balu", based in Kerala, India.

Here is his LinkedIn. Here is their Facebook. They’re on Quora too.
I hadn't looked that up, but what made me think LLM is that I clicked through to another of his blog posts, one about how to do a specific task in Blender, and it wasn't the same writing style at all. It was short, focused, and clear; it seemed like the writing of an actual human who knew what they were talking about. By contrast, the post you linked doesn't read that way.

I don't expect a 3D artist to know much about programming GPUs, or vice versa - very different skill sets! So what I'm suspicious of here is that maybe when he decided to blog about a topic he didn't feel confident about, he asked a chatbot to write that post for him.

Again, I didn't say CUDA uses RT, and you're welcome to add your own links and sources if you find some information about the subject.
I'm not saying you said that, and for clarity I'm not really engaging with the discussion you were having earlier. I'm only saying that a specific claim - that sometimes CUDA is faster at raytracing than OptiX - seems highly questionable. Take that for what you will.

Nvidia gives some explanation: "CUDA launches allow use of shared memory and warp/block intrinsics, where OptiX launches require a single-threaded programming model. So if you want to do any of the kinds of fancy thread synchronization that CUDA allows, then using a CUDA launch would be preferable to using an OptiX launch."

I think you're reading too much into that response. David is telling the person asking the question that when they use OptiX to do things which aren't raytracing (key phrase: "never calls optixTrace()"), they need to keep OptiX's limitations in mind, and consider switching to CUDA for those tasks.

I'm no expert in this but from what I can tell, OptiX isn't purely for raytracing. It has some CUDA-style GPGPU on the side, presumably so that you can integrate other types of computations into your raytracer. Its level of GPGPU support isn't a full replacement for CUDA because of those limitations with respect to thread synchronization, higher startup overhead, and so forth, but in circumstances where those things don't matter it should be fine to use OptiX for GPGPU work anyways.

But the other direction doesn't make sense. I don't think CUDA can use the RT cores at all (something we all seem to be in violent agreement on), so it'll be stuck doing ray casts using software running on GPGPU compute resources.

That absolutely is something which is possible to do; after all NVidia itself demoed GPGPU raytracing using CUDA years before they shipped their first RT GPU. But the whole point of the RT cores is that they're much faster and more power efficient at that specific task than GPGPU compute ever can be, so if you're writing a raytracer and you want to maximize performance, you're going to need to use OptiX.
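
For what it's worth, here's a rough sketch (purely my own illustration, nobody's production renderer) of what "ray casts in software on the compute resources" means: an ordinary CUDA kernel intersecting camera rays with a sphere using plain arithmetic. It works fine, it's just exactly the kind of work the RT cores exist to do much faster and more efficiently.

Code:
// Purely illustrative: "software" ray casting on the GPGPU compute cores.
// Each thread intersects one primary ray with a single sphere using plain
// arithmetic - no RT cores, no BVH hardware, just FP32 math.
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

struct Vec3 { float x, y, z; };

__device__ float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

__global__ void castRays(float* depth, int width, int height)
{
    int px = blockIdx.x * blockDim.x + threadIdx.x;
    int py = blockIdx.y * blockDim.y + threadIdx.y;
    if (px >= width || py >= height) return;

    // Camera at the origin looking down -z; unit sphere at (0, 0, -3).
    Vec3 origin = {0.0f, 0.0f, 0.0f};
    Vec3 dir    = { (px + 0.5f) / width  * 2.0f - 1.0f,
                    (py + 0.5f) / height * 2.0f - 1.0f,
                    -1.0f };
    Vec3 center = {0.0f, 0.0f, -3.0f};
    float radius = 1.0f;

    // Ray-sphere intersection: solve |o + t*d - c|^2 = r^2 for t.
    Vec3 oc = { origin.x - center.x, origin.y - center.y, origin.z - center.z };
    float a = dot(dir, dir);
    float b = 2.0f * dot(oc, dir);
    float c = dot(oc, oc) - radius * radius;
    float disc = b * b - 4.0f * a * c;

    float t = -1.0f;                            // -1 means "miss"
    if (disc >= 0.0f)
        t = (-b - sqrtf(disc)) / (2.0f * a);

    depth[py * width + px] = t;
}

int main()
{
    const int w = 512, h = 512;
    float* d_depth;
    cudaMalloc(&d_depth, w * h * sizeof(float));

    dim3 block(16, 16);
    dim3 grid((w + 15) / 16, (h + 15) / 16);
    castRays<<<grid, block>>>(d_depth, w, h);

    std::vector<float> depth(w * h);
    cudaMemcpy(depth.data(), d_depth, depth.size() * sizeof(float), cudaMemcpyDeviceToHost);
    printf("hit depth at image center: %.3f\n", depth[(h / 2) * w + (w / 2)]);

    cudaFree(d_depth);
    return 0;
}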
 

Homy

macrumors 68030
Jan 14, 2006
2,502
2,450
Sweden
I think you're reading too much into that response. David is telling the person asking the question that when they use OptiX to do things which aren't raytracing (key phrase: "never calls optixTrace()"), they need to keep OptiX's limitations in mind, and consider switching to CUDA for those tasks.

I'm no expert in this but from what I can tell, OptiX isn't purely for raytracing. It has some CUDA-style GPGPU on the side, presumably so that you can integrate other types of computations into your raytracer. Its level of GPGPU support isn't a full replacement for CUDA because of those limitations with respect to thread synchronization, higher startup overhead, and so forth, but in circumstances where those things don't matter it should be fine to use OptiX for GPGPU work anyways.

But the other direction doesn't make sense. I don't think CUDA can use the RT cores at all (something we all seem to be in violent agreement on), so it'll be stuck doing ray casts using software running on GPGPU compute resources.

That absolutely is something which is possible to do; after all NVidia itself demoed GPGPU raytracing using CUDA years before they shipped their first RT GPU. But the whole point of the RT cores is that they're much faster and more power efficient at that specific task than GPGPU compute ever can be, so if you're writing a raytracer and you want to maximize performance, you're going to need to use OptiX.

I'm no expert either, and I pointed out that they weren't talking about Blender but about single-threaded programming and multi-threaded synchronization, which, as I understand it, can affect performance if done wrong. It was just an example. To be fair, he didn't say CUDA is sometimes faster than OptiX at ray tracing but at "rendering simpler scenes or materials". He does say "OptiX is faster for complex scenes with reflections and refractions", which are used in ray tracing. We both agree that CUDA can't use RT.

Yes, RT cores are really fast as we've seen with MetalRT in the new M3 Macs too.
 
Last edited:

komuh

macrumors regular
May 13, 2023
126
113
I snipped a bunch. Has Nvidia allowed CUDA to run on the RT cores now? I thought that was the whole point of OptiX.
I'm no expert either, and I pointed out that they weren't talking about Blender but about single-threaded programming and multi-threaded synchronization, which, as I understand it, can affect performance if done wrong. It was just an example. To be fair, he didn't say CUDA is sometimes faster than OptiX at ray tracing but at "rendering simpler scenes or materials". He does say "OptiX is faster for complex scenes with reflections and refractions", which are used in ray tracing. We both agree that CUDA can't use RT.

Yes, RT cores are really fast as we've seen with MetalRT in the new M3 Macs too.

CUDA (cuBLAS) has been able to run matmul on RT* (tensor) cores for a long time, at least since the 3000 series, but I'm pretty sure it was implemented somewhere closer to the 2000 series release.
 

crazy dave

macrumors 65816
Sep 9, 2010
1,450
1,220
CUDA (cuBLAS) has been able to run matmul on RT* (tensor) cores for a long time, at least since the 3000 series, but I'm pretty sure it was implemented somewhere closer to the 2000 series release.
RT and tensor cores are separate pieces of hardware. Nvidia GPUs can use the tensor cores during ray tracing for AI-based denoising in OptiX, but the RT and tensor cores have different functions, and the CUDA API cannot access RT cores - only the OptiX API can. Tensor cores are for mixed-precision matrix multiplication, while RT cores accelerate the traversal and bundling of rays to determine intersections with objects (more on that below). That's why, if you explore the spec page for an Nvidia GPU, there are separate listings for how many RT cores and how many Tensor cores it has.
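
To put that difference in code terms (a sketch of my own, not anything from Nvidia's docs): CUDA does expose the tensor cores directly, through the wmma API in mma.h, but there is no equivalent CUDA intrinsic for the RT cores; the ray traversal hardware is only reachable via optixTrace() inside an OptiX pipeline.

Code:
// Sketch: tensor cores are reachable from plain CUDA through the wmma API
// (one 16x16x16 half-precision tile multiply per warp). There is no
// comparable CUDA intrinsic for the RT cores. Requires sm_70 or newer.
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>
#include <cuda_fp16.h>
#include <mma.h>
using namespace nvcuda;

__global__ void tileMatmul(const half* A, const half* B, float* C)
{
    // One warp computes one 16x16 output tile: C = A * B, accumulated in FP32.
    wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> a;
    wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::row_major> b;
    wmma::fragment<wmma::accumulator, 16, 16, 16, float> acc;

    wmma::fill_fragment(acc, 0.0f);
    wmma::load_matrix_sync(a, A, 16);           // leading dimension 16
    wmma::load_matrix_sync(b, B, 16);
    wmma::mma_sync(acc, a, b, acc);             // executes on the tensor cores
    wmma::store_matrix_sync(C, acc, 16, wmma::mem_row_major);
}

int main()
{
    half  *dA, *dB;
    float *dC;
    cudaMalloc(&dA, 16 * 16 * sizeof(half));
    cudaMalloc(&dB, 16 * 16 * sizeof(half));
    cudaMalloc(&dC, 16 * 16 * sizeof(float));

    std::vector<half> ones(16 * 16, __float2half(1.0f));   // A = B = all ones
    cudaMemcpy(dA, ones.data(), ones.size() * sizeof(half), cudaMemcpyHostToDevice);
    cudaMemcpy(dB, ones.data(), ones.size() * sizeof(half), cudaMemcpyHostToDevice);

    tileMatmul<<<1, 32>>>(dA, dB, dC);          // a single warp

    std::vector<float> out(16 * 16);
    cudaMemcpy(out.data(), dC, out.size() * sizeof(float), cudaMemcpyDeviceToHost);
    printf("C[0][0] = %.1f (expected 16.0)\n", out[0]);

    cudaFree(dA); cudaFree(dB); cudaFree(dC);
    return 0;
}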


To give another, more in-depth example: Apple's M3 and M4 GPUs have ray tracing cores but not matmul cores (though of course Apple has the NPU on-SoC, which isn't quite the same as in-GPU). While there are differences, Apple's RT cores are conceptually similar enough in broad strokes to Nvidia's. As such, I've linked Apple's developer video below so you can see a description of the work RT cores are meant to accelerate.


Ray tracing starts around 17 minutes, 42 seconds.
 
Last edited:
  • Like
Reactions: Homy

mi7chy

macrumors G4
Oct 24, 2014
10,619
11,292
How is CUDA better for simpler scenes and what is considered a simple scene? BMW is a simple scene but it takes twice as long to render with CUDA vs OptiX.

For those with a 4090, what is your minimum power limit, and how do rendering times at the minimum power limit compare to the default for the Barbershop, Classroom and BMW scenes?

Here's min power limit for 4080 Super:

Code:
nvidia-smi -q -d POWER

==============NVSMI LOG==============

Timestamp                                 : Tue Sep  3 01:02:12 2024
Driver Version                            : 560.94
CUDA Version                              : 12.6

Attached GPUs                             : 1
GPU 00000000:01:00.0
    GPU Power Readings
        Power Draw                        : 11.76 W
        Current Power Limit               : 150.00 W
        Requested Power Limit             : 150.00 W
        Default Power Limit               : 320.00 W
        Min Power Limit                   : 150.00 W
        Max Power Limit                   : 352.00 W
    Power Samples
        Duration                          : 9.15 sec
        Number of Samples                 : 119
        Max                               : 51.26 W
        Min                               : 10.72 W
        Avg                               : 13.84 W
    GPU Memory Power Readings
        Power Draw                        : N/A
    Module Power Readings
        Power Draw                        : N/A
        Current Power Limit               : N/A
        Requested Power Limit             : N/A
        Default Power Limit               : N/A
        Min Power Limit                   : N/A
        Max Power Limit                   : N/A

To set power limit with administrative access (sudo on Linux or run cmd as administrator on Windows):

Code:
nvidia-smi -pl 150
Power limit for GPU 00000000:01:00.0 was set to 150.00 W from 320.00 W.
All done.
 
Last edited:

Xiao_Xi

macrumors 68000
Oct 27, 2021
1,627
1,101
It seems that Apple is developing Open Subdivision support for Metal.
Apple is working on OpenSubDiv support. This is a proof of concept and its goal is to figure out what needs to be done.
Could other programs besides Blender take advantage of it?
 
  • Like
Reactions: Lone Deranger

jujoje

macrumors regular
May 17, 2009
247
288
It seems that Apple is developing Open Subdivision support for Metal.

Could other programs besides Blender take advantage of it?
Preview/Quicklook/AR View (for USD and the Storm delegate). It already supports subdivision tessellation, but presumably not OpenSubdiv yet, so this will give it better (and more widely adopted) subdivision support.
 
  • Like
Reactions: Xiao_Xi

singhs.apps

macrumors 6502a
Oct 27, 2016
660
400
How is CUDA better for simpler scenes and what is considered a simple scene? BMW is a simple scene but it takes twice as long to render with CUDA vs OptiX.

For those with a 4090, what is your minimum power limit, and how do rendering times at the minimum power limit compare to the default for the Barbershop, Classroom and BMW scenes?

Here's min power limit for 4080 Super:

Code:
nvidia-smi -q -d POWER

==============NVSMI LOG==============

Timestamp                                 : Tue Sep  3 01:02:12 2024
Driver Version                            : 560.94
CUDA Version                              : 12.6

Attached GPUs                             : 1
GPU 00000000:01:00.0
    GPU Power Readings
        Power Draw                        : 11.76 W
        Current Power Limit               : 150.00 W
        Requested Power Limit             : 150.00 W
        Default Power Limit               : 320.00 W
        Min Power Limit                   : 150.00 W
        Max Power Limit                   : 352.00 W
    Power Samples
        Duration                          : 9.15 sec
        Number of Samples                 : 119
        Max                               : 51.26 W
        Min                               : 10.72 W
        Avg                               : 13.84 W
    GPU Memory Power Readings
        Power Draw                        : N/A
    Module Power Readings
        Power Draw                        : N/A
        Current Power Limit               : N/A
        Requested Power Limit             : N/A
        Default Power Limit               : N/A
        Min Power Limit                   : N/A
        Max Power Limit                   : N/A

To set power limit with administrative access (sudo on Linux or run cmd as administrator on Windows):

Code:
nvidia-smi -pl 150
Power limit for GPU 00000000:01:00.0 was set to 150.00 W from 320.00 W.
All done.
This looks neat.
So I just run the command as-is in a CMD window for an Nvidia GPU on Windows?
 

mi7chy

macrumors G4
Oct 24, 2014
10,619
11,292
This looks neat.
So I just run the command as-is in a CMD window for an Nvidia GPU on Windows?

'nvidia-smi' is a command-line utility for controlling the power limit; it's part of the minimal driver install and works in Windows cmd or PowerShell. Once the sweet spot for performance per watt is discovered for your GPU model, the command can be saved to a .bat file and launched with Task Scheduler on Windows startup.
 

singhs.apps

macrumors 6502a
Oct 27, 2016
660
400
'nvidia-smi' is a command-line utility for controlling the power limit; it's part of the minimal driver install and works in Windows cmd or PowerShell. Once the sweet spot for performance per watt is discovered for your GPU model, the command can be saved to a .bat file and launched with Task Scheduler on Windows startup.
I see... every time I boot into Windows, I have to execute the .bat file? I'll look into it.

Is there a chart somewhere that lists the ideal power draw for Nvidia GPUs?

Thanks
 

mi7chy

macrumors G4
Oct 24, 2014
10,619
11,292
I see... every time I boot into Windows, I have to execute the .bat file? I'll look into it.

Is there a chart somewhere that lists the ideal power draw for Nvidia GPUs?

Thanks

Automatically set the GPU power limit on Windows startup with a batch file launched through Task Scheduler.

The sweet spot is usually below 70% of the default power limit, but it varies with the model and the silicon lottery, so it's best to find it through your own testing.
 
  • Like
Reactions: singhs.apps

Xiao_Xi

macrumors 68000
Oct 27, 2021
1,627
1,101
It seems that Apple is developing Open Subdivision support for Metal.
The first implementation doesn't look very good.
Experimental Metal version for open subdiv. The initial implementation showed an order of magnitude slower performance as each vertex is evaluated in its own GPU submission. Currently checking if we can push the vertex loop closer to the actual submission so more work can be shared. There are still options to move to a totally different implementation where we do a batch based approach. This needs more research as Blender does more than a regular subdivision and that might be limited on user side.
 

hifimac

macrumors member
Mar 28, 2013
64
40
What's everyone's thoughts on the new M4 Max? I have a M1 Max and the IPR in Blender is a little too pokey for me to use it over my 3090 PC. Would the M4 Max be better suited? They claim it is 1.9x faster than the M1 Max.
 

sunny5

macrumors 68000
Jun 11, 2021
1,835
1,706
What's everyone's thoughts on the new M4 Max? I have a M1 Max and the IPR in Blender is a little too pokey for me to use it over my 3090 PC. Would the M4 Max be better suited? They claim it is 1.9x faster than the M1 Max.
Maybe close to a mobile RTX 3080, but due to limited software, I'm not sure it's up to competing with discrete GPUs.
 
  • Like
Reactions: komuh

komuh

macrumors regular
May 13, 2023
126
113
What's everyone's thoughts on the new M4 Max? I have a M1 Max and the IPR in Blender is a little too pokey for me to use it over my 3090 PC. Would the M4 Max be better suited? They claim it is 1.9x faster than the M1 Max.
It won't be even close to Nvidia. If you want an upgrade, wait for the 5000 series; if it's something extra and you just like macOS, upgrade if you feel 1.9x the performance is worth 5k USD.
 

leman

macrumors Core
Oct 14, 2008
19,516
19,664
What's everyone's thoughts on the new M4 Max? I have a M1 Max and the IPR in Blender is a little too pokey for me to use it over my 3090 PC. Would the M4 Max be better suited? They claim it is 1.9x faster than the M1 Max.

Should be in the ballpark of an RTX 4070/RTX 3090 (both desktop versions). Depends on what you want to use it for. Larger Nvidia GPUs (4080/4090) will obviously be better; the Mac would work better on very large scenes (but if you work with that kind of stuff you probably want a completely different setup anyway).

I agree with others who advise to wait for the 5000 series and see what it delivers.
 

mi7chy

macrumors G4
Oct 24, 2014
10,619
11,292
No big gain from a node jump, so even with minor frequency and memory bandwidth increases and whatever improvements to RT, I'm guesstimating around a +30% increase, so the M4 Max 40-core GPU landing around a 3080 sounds about right.
 

terminator-jq

macrumors 6502a
Nov 25, 2012
719
1,506
M4 Max doesn't look bad at all for a mobile chip. Now let's see if they do an M4 Ultra. An 80 core GPU with raytracing sounds awesome!
 
  • Like
Reactions: M4pro

komuh

macrumors regular
May 13, 2023
126
113
Macs are getting a new port of Cyberpunk 2077 - so we’ll gain a new and def more fun way to make cross-platform GPU comparisons.
Unified memory is perfect for gaming; sadly, macOS is just ignored by every dev and even Apple, but it has potential.
 
  • Like
Reactions: M4pro

M4pro

macrumors member
May 15, 2024
67
109
Apple is very spend-y lately, paying $$ to get ports of Control and now Cyberpunk 2077 done for the Mac.

So Apple is not ignoring me 😇
 
  • Haha
Reactions: name99

Homy

macrumors 68030
Jan 14, 2006
2,502
2,450
Sweden
Apple is very spend-y lately, paying $$ to get ports of Control and now Cyberpunk 2077 done for the Mac.

So Apple is not ignoring me 😇

More and more developers support AS, on and off stage, every year: CD Projekt Red, 11 bit Studios, Capcom, Ubisoft, Remedy, 4A, Larian Studios, Kojima Productions, NEOWIZ, Hello Games, Iron Gate, Cyan, Teyon, Pocketpair, Dambuster Studios, Firaxis Games, Rebellion, Supergiant Games, Grinding Gear Games, Sports Interactive, Bloober Team, Fallen Leaf, BlueTwelve Studio, Piranha Bytes, Saber Interactive, Rockfish Games, BlackMill Games, Feral/Sega/Codemasters, Nimble Giant, NetEase and many more, apart from all the indie developers, but sure, "macOS is just ignored by every dev and even Apple".

We're getting Cyberpunk 2077: Ultimate Edition with Phantom Liberty, with path tracing, frame generation, and built-in Spatial Audio, on the Mac App Store, GOG, Steam and the Epic Games Store while they're ignoring us, so imagine what we could get if they didn't ignore us. 😄


Let's not forget Where Winds Meet.

 