
Homy

macrumors 68030
Jan 14, 2006
2,502
2,450
Sweden
Are you sure that was actually written by a human? It's very repetitive and meanders all over the place; it reeks of LLM-generated slop.

It gives no evidence that CUDA is better for simple scenes, it just asserts that it is. I personally doubt that claim a lot. CUDA is a GPGPU API targeted at utilizing compute shaders. As far as I know it doesn't take advantage of raytracing cores at all, so if you're using Blender on an Nvidia GPU which has RT, you should probably always use OptiX.

The blog is written by "Sushith Balu", based in Kerala, India.


Here is his LinkedIn. Here is their Facebook. They’re on Quora too.

However, the blog post appears to be based on an article by the company iRender, based in Singapore and Vietnam. At the same time, the blog post is older than the company's article.

They both say "However, this may depend on the specific hardware configuration and the rendering engine being used".

Again, I didn't say CUDA uses RT, and you're welcome to add your own links and sources if you find some information about the subject. The discussion was about whether CUDA or OptiX was used in the user test above. My comment was just a side note about the possible speed difference. OptiX is the way to go for the fastest rendering with RT cores, but in some other cases it's still better to use CUDA.

Nvidia gives some explanation, though it's about application programming rather than Blender: "CUDA launches allow use of shared memory and warp/block intrinsics, where OptiX launches require a single-threaded programming model. So if you want to do any of the kinds of fancy thread synchronization that CUDA allows, then using a CUDA launch would be preferable to using an OptiX launch."
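
To make the quote a bit more concrete, here's a minimal CUDA sketch (my own illustration, not from Nvidia or the blog) of the kind of block-wide cooperation being described: shared memory, a block-wide barrier and warp shuffle intrinsics, none of which fit into OptiX's single-threaded launch model.

Code:
// Minimal illustration of features only available in a native CUDA launch:
// shared memory, block-wide synchronization and warp shuffle intrinsics.
// Sums an array by reducing within each 256-thread block.
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

__global__ void blockSum(const float* in, float* out, int n)
{
    __shared__ float partial[256];              // per-block scratch, assumes blockDim.x == 256

    int tid = threadIdx.x;
    int i   = blockIdx.x * blockDim.x + tid;
    partial[tid] = (i < n) ? in[i] : 0.0f;
    __syncthreads();                            // block-wide barrier

    // Tree reduction in shared memory down to 32 partial sums.
    for (int stride = blockDim.x / 2; stride >= 32; stride >>= 1) {
        if (tid < stride)
            partial[tid] += partial[tid + stride];
        __syncthreads();
    }

    // Final 32 values: reduce within one warp using shuffle intrinsics.
    if (tid < 32) {
        float v = partial[tid];
        for (int offset = 16; offset > 0; offset >>= 1)
            v += __shfl_down_sync(0xffffffff, v, offset);
        if (tid == 0)
            out[blockIdx.x] = v;                // one partial sum per block
    }
}

int main()
{
    const int n = 1 << 20;
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;

    float *d_in, *d_out;
    cudaMalloc(&d_in,  n * sizeof(float));
    cudaMalloc(&d_out, blocks * sizeof(float));

    std::vector<float> host(n, 1.0f);           // n ones, so the total should equal n
    cudaMemcpy(d_in, host.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    blockSum<<<blocks, threads>>>(d_in, d_out, n);

    std::vector<float> partials(blocks);
    cudaMemcpy(partials.data(), d_out, blocks * sizeof(float), cudaMemcpyDeviceToHost);

    float total = 0.0f;
    for (float p : partials) total += p;
    printf("sum = %.0f (expected %d)\n", total, n);

    cudaFree(d_in);
    cudaFree(d_out);
    return 0;
}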

 
Last edited:

mr_roboto

macrumors 6502a
Sep 30, 2020
856
1,866
The blog is written by "Sushith Balu", based in Kerala, India.

Here is his LinkedIn. Here is their Facebook. They’re on Quora too.
I hadn't looked that up, but what made me think LLM is that I clicked through to another of his blog posts, one about how to do a specific task in Blender, and it wasn't the same writing style at all. It was short, focused, and clear; it seemed like the writing of an actual human who knew what they were talking about. By contrast, the post you linked doesn't read that way.

I don't expect a 3D artist to know much about programming GPUs, or vice versa - very different skill sets! So what I'm suspicious of here is that maybe when he decided to blog about a topic he didn't feel confident about, he asked a chatbot to write that post for him.

Again, I didn't say CUDA uses RT, and you're welcome to add your own links and sources if you find some information about the subject.
I'm not saying you said that, and for clarity I'm not really engaging with the discussion you were having earlier. I'm only saying that a specific claim - that sometimes CUDA is faster at raytracing than OptiX - seems highly questionable. Take that for what you will.

Nvidia gives some explanation: "CUDA launches allow use of shared memory and warp/block intrinsics, where OptiX launches require a single-threaded programming model. So if you want to do any of the kinds of fancy thread synchronization that CUDA allows, then using a CUDA launch would be preferable to using an OptiX launch."

I think you're reading too much into that response. David is telling the person asking the question that when they use OptiX to do things which aren't raytracing (key phrase: "never calls optixTrace()"), they need to keep OptiX's limitations in mind, and consider switching to CUDA for those tasks.

I'm no expert in this but from what I can tell, OptiX isn't purely for raytracing. It has some CUDA-style GPGPU on the side, presumably so that you can integrate other types of computations into your raytracer. Its level of GPGPU support isn't a full replacement for CUDA because of those limitations with respect to thread synchronization, higher startup overhead, and so forth, but in circumstances where those things don't matter it should be fine to use OptiX for GPGPU work anyways.

But the other direction doesn't make sense. I don't think CUDA can use the RT cores at all (something we all seem to be in violent agreement on), so it'll be stuck doing ray casts using software running on GPGPU compute resources.

That absolutely is something which is possible to do; after all NVidia itself demoed GPGPU raytracing using CUDA years before they shipped their first RT GPU. But the whole point of the RT cores is that they're much faster and more power efficient at that specific task than GPGPU compute ever can be, so if you're writing a raytracer and you want to maximize performance, you're going to need to use OptiX.
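
For what it's worth, here's a rough sketch (purely my own illustration, nobody's production renderer) of what "ray casts in software on the compute resources" means: an ordinary CUDA kernel intersecting camera rays with a sphere using plain arithmetic. It works fine, it's just exactly the kind of work the RT cores exist to do much faster and more efficiently.

Code:
// Purely illustrative: "software" ray casting on the GPGPU compute cores.
// Each thread intersects one primary ray with a single sphere using plain
// arithmetic - no RT cores, no BVH hardware, just FP32 math.
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

struct Vec3 { float x, y, z; };

__device__ float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

__global__ void castRays(float* depth, int width, int height)
{
    int px = blockIdx.x * blockDim.x + threadIdx.x;
    int py = blockIdx.y * blockDim.y + threadIdx.y;
    if (px >= width || py >= height) return;

    // Camera at the origin looking down -z; unit sphere at (0, 0, -3).
    Vec3 origin = {0.0f, 0.0f, 0.0f};
    Vec3 dir    = { (px + 0.5f) / width  * 2.0f - 1.0f,
                    (py + 0.5f) / height * 2.0f - 1.0f,
                    -1.0f };
    Vec3 center = {0.0f, 0.0f, -3.0f};
    float radius = 1.0f;

    // Ray-sphere intersection: solve |o + t*d - c|^2 = r^2 for t.
    Vec3 oc = { origin.x - center.x, origin.y - center.y, origin.z - center.z };
    float a = dot(dir, dir);
    float b = 2.0f * dot(oc, dir);
    float c = dot(oc, oc) - radius * radius;
    float disc = b * b - 4.0f * a * c;

    float t = -1.0f;                            // -1 means "miss"
    if (disc >= 0.0f)
        t = (-b - sqrtf(disc)) / (2.0f * a);

    depth[py * width + px] = t;
}

int main()
{
    const int w = 512, h = 512;
    float* d_depth;
    cudaMalloc(&d_depth, w * h * sizeof(float));

    dim3 block(16, 16);
    dim3 grid((w + 15) / 16, (h + 15) / 16);
    castRays<<<grid, block>>>(d_depth, w, h);

    std::vector<float> depth(w * h);
    cudaMemcpy(depth.data(), d_depth, depth.size() * sizeof(float), cudaMemcpyDeviceToHost);
    printf("hit depth at image center: %.3f\n", depth[(h / 2) * w + (w / 2)]);

    cudaFree(d_depth);
    return 0;
}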
 

Homy

macrumors 68030
Jan 14, 2006
2,502
2,450
Sweden
I think you're reading too much into that response. David is telling the person asking the question that when they use OptiX to do things which aren't raytracing (key phrase: "never calls optixTrace()"), they need to keep OptiX's limitations in mind, and consider switching to CUDA for those tasks.

I'm no expert in this but from what I can tell, OptiX isn't purely for raytracing. It has some CUDA-style GPGPU on the side, presumably so that you can integrate other types of computations into your raytracer. Its level of GPGPU support isn't a full replacement for CUDA because of those limitations with respect to thread synchronization, higher startup overhead, and so forth, but in circumstances where those things don't matter it should be fine to use OptiX for GPGPU work anyways.

But the other direction doesn't make sense. I don't think CUDA can use the RT cores at all (something we all seem to be in violent agreement on), so it'll be stuck doing ray casts using software running on GPGPU compute resources.

That absolutely is something which is possible to do; after all NVidia itself demoed GPGPU raytracing using CUDA years before they shipped their first RT GPU. But the whole point of the RT cores is that they're much faster and more power efficient at that specific task than GPGPU compute ever can be, so if you're writing a raytracer and you want to maximize performance, you're going to need to use OptiX.

I'm no expert either, and I pointed out that they weren't talking about Blender but about single-threaded programming and multi-threaded synchronization, which, as I understand it, can affect performance if done wrong. It was just an example. To be fair, he didn't say CUDA is sometimes faster than OptiX at ray tracing but at "rendering simpler scenes or materials". He does say "OptiX is faster for complex scenes with reflections and refractions", which are used in ray tracing. We both agree that CUDA can't use RT.

Yes, RT cores are really fast as we've seen with MetalRT in the new M3 Macs too.
 
Last edited:

komuh

macrumors regular
May 13, 2023
126
113
I snipped a bunch. Has Nvidia allowed CUDA to run on the RT cores now? I thought that was the whole point of OptiX.
I'm no expert either, and I pointed out that they weren't talking about Blender but about single-threaded programming and multi-threaded synchronization, which, as I understand it, can affect performance if done wrong. It was just an example. To be fair, he didn't say CUDA is sometimes faster than OptiX at ray tracing but at "rendering simpler scenes or materials". He does say "OptiX is faster for complex scenes with reflections and refractions", which are used in ray tracing. We both agree that CUDA can't use RT.

Yes, RT cores are really fast as we've seen with MetalRT in the new M3 Macs too.

CUDA (cuBLAS) has been able to run matmul on RT* (tensor) cores for a long time, at least since the 3000 series, but I'm pretty sure it was implemented somewhere closer to the 2000 series release.
 

crazy dave

macrumors 65816
Sep 9, 2010
1,450
1,220
CUDA (cuBLAS) has been able to run matmul on RT* (tensor) cores for a long time, at least since the 3000 series, but I'm pretty sure it was implemented somewhere closer to the 2000 series release.
RT and tensor cores are separate pieces of hardware. Nvidia GPUs can use the tensor cores during ray tracing for AI-based denoising in OptiX, but the RT and tensor cores have different functions, and the CUDA API cannot access RT cores - only the OptiX API can. Tensor cores are for mixed-precision matrix multiplication, while RT cores accelerate the traversal and bundling of rays to determine intersections with objects (more on that below). That's why, if you explore the spec page for an Nvidia GPU, there are separate listings for how many RT cores and how many Tensor cores it has.
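
To put that difference in code terms (a sketch of my own, not anything from Nvidia's docs): CUDA does expose the tensor cores directly, through the wmma API in mma.h, but there is no equivalent CUDA intrinsic for the RT cores; the ray traversal hardware is only reachable via optixTrace() inside an OptiX pipeline.

Code:
// Sketch: tensor cores are reachable from plain CUDA through the wmma API
// (one 16x16x16 half-precision tile multiply per warp). There is no
// comparable CUDA intrinsic for the RT cores. Requires sm_70 or newer.
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>
#include <cuda_fp16.h>
#include <mma.h>
using namespace nvcuda;

__global__ void tileMatmul(const half* A, const half* B, float* C)
{
    // One warp computes one 16x16 output tile: C = A * B, accumulated in FP32.
    wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> a;
    wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::row_major> b;
    wmma::fragment<wmma::accumulator, 16, 16, 16, float> acc;

    wmma::fill_fragment(acc, 0.0f);
    wmma::load_matrix_sync(a, A, 16);           // leading dimension 16
    wmma::load_matrix_sync(b, B, 16);
    wmma::mma_sync(acc, a, b, acc);             // executes on the tensor cores
    wmma::store_matrix_sync(C, acc, 16, wmma::mem_row_major);
}

int main()
{
    half  *dA, *dB;
    float *dC;
    cudaMalloc(&dA, 16 * 16 * sizeof(half));
    cudaMalloc(&dB, 16 * 16 * sizeof(half));
    cudaMalloc(&dC, 16 * 16 * sizeof(float));

    std::vector<half> ones(16 * 16, __float2half(1.0f));   // A = B = all ones
    cudaMemcpy(dA, ones.data(), ones.size() * sizeof(half), cudaMemcpyHostToDevice);
    cudaMemcpy(dB, ones.data(), ones.size() * sizeof(half), cudaMemcpyHostToDevice);

    tileMatmul<<<1, 32>>>(dA, dB, dC);          // a single warp

    std::vector<float> out(16 * 16);
    cudaMemcpy(out.data(), dC, out.size() * sizeof(float), cudaMemcpyDeviceToHost);
    printf("C[0][0] = %.1f (expected 16.0)\n", out[0]);

    cudaFree(dA); cudaFree(dB); cudaFree(dC);
    return 0;
}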


To give another, more in-depth example: Apple's M3 and M4 GPUs have ray tracing cores but not matmul cores (though of course Apple has the NPU on-SoC, which isn't quite the same as in-GPU). While there are differences, Apple's RT cores are conceptually similar enough in broad strokes to Nvidia's. As such, I've linked Apple's developer video below so you can see a description of the work RT cores are meant to accelerate.


Ray tracing starts around 17 minutes, 42 seconds.
 
Last edited:
  • Like
Reactions: Homy

mi7chy

macrumors G4
Oct 24, 2014
10,619
11,292
How is CUDA better for simpler scenes and what is considered a simple scene? BMW is a simple scene but it takes twice as long to render with CUDA vs OptiX.

For those with a 4090, what is your minimum power limit, and how do rendering times at the minimum power limit compare to the default for the Barbershop, Classroom and BMW scenes?

Here's min power limit for 4080 Super:

Code:
nvidia-smi -q -d POWER

==============NVSMI LOG==============

Timestamp                                 : Tue Sep  3 01:02:12 2024
Driver Version                            : 560.94
CUDA Version                              : 12.6

Attached GPUs                             : 1
GPU 00000000:01:00.0
    GPU Power Readings
        Power Draw                        : 11.76 W
        Current Power Limit               : 150.00 W
        Requested Power Limit             : 150.00 W
        Default Power Limit               : 320.00 W
        Min Power Limit                   : 150.00 W
        Max Power Limit                   : 352.00 W
    Power Samples
        Duration                          : 9.15 sec
        Number of Samples                 : 119
        Max                               : 51.26 W
        Min                               : 10.72 W
        Avg                               : 13.84 W
    GPU Memory Power Readings
        Power Draw                        : N/A
    Module Power Readings
        Power Draw                        : N/A
        Current Power Limit               : N/A
        Requested Power Limit             : N/A
        Default Power Limit               : N/A
        Min Power Limit                   : N/A
        Max Power Limit                   : N/A

To set power limit with administrative access (sudo on Linux or run cmd as administrator on Windows):

Code:
nvidia-smi -pl 150
Power limit for GPU 00000000:01:00.0 was set to 150.00 W from 320.00 W.
All done.
 
Last edited:

Xiao_Xi

macrumors 68000
Oct 27, 2021
1,627
1,101
It seems that Apple is developing Open Subdivision support for Metal.
Apple is working on OpenSubDiv support. This is a proof of concept and its goal is to figure out what needs to be done.
Could other programs besides Blender take advantage of it?
 
  • Like
Reactions: Lone Deranger

jujoje

macrumors regular
May 17, 2009
247
288
It seems that Apple is developing Open Subdivision support for Metal.

Could other programs besides Blender take advantage of it?
Preview/Quicklook/AR View (for USD and the Storm delegate). It already supports subdivision tessellation, but presumably not OpenSubdiv yet, so this will give it better (and more widely adopted) subdivision support.
 
  • Like
Reactions: Xiao_Xi

singhs.apps

macrumors 6502a
Oct 27, 2016
660
400
How is CUDA better for simpler scenes and what is considered a simple scene? BMW is a simple scene but it takes twice as long to render with CUDA vs OptiX.

For those with a 4090, what is your minimum power limit, and how do rendering times at the minimum power limit compare to the default for the Barbershop, Classroom and BMW scenes?

Here's min power limit for 4080 Super:

Code:
nvidia-smi -q -d POWER

==============NVSMI LOG==============

Timestamp                                 : Tue Sep  3 01:02:12 2024
Driver Version                            : 560.94
CUDA Version                              : 12.6

Attached GPUs                             : 1
GPU 00000000:01:00.0
    GPU Power Readings
        Power Draw                        : 11.76 W
        Current Power Limit               : 150.00 W
        Requested Power Limit             : 150.00 W
        Default Power Limit               : 320.00 W
        Min Power Limit                   : 150.00 W
        Max Power Limit                   : 352.00 W
    Power Samples
        Duration                          : 9.15 sec
        Number of Samples                 : 119
        Max                               : 51.26 W
        Min                               : 10.72 W
        Avg                               : 13.84 W
    GPU Memory Power Readings
        Power Draw                        : N/A
    Module Power Readings
        Power Draw                        : N/A
        Current Power Limit               : N/A
        Requested Power Limit             : N/A
        Default Power Limit               : N/A
        Min Power Limit                   : N/A
        Max Power Limit                   : N/A

To set power limit with administrative access (sudo on Linux or run cmd as administrator on Windows):

Code:
nvidia-smi -pl 150
Power limit for GPU 00000000:01:00.0 was set to 150.00 W from 320.00 W.
All done.
This looks neat.
So I just run the command as-is in a CMD window for an Nvidia GPU on Windows?
 

mi7chy

macrumors G4
Oct 24, 2014
10,619
11,292
This looks neat.
So I just run the command as-is in a CMD window for an Nvidia GPU on Windows?

'nvidia-smi' is a command-line utility for controlling the power limit; it's part of the minimal driver install and works in Windows cmd or PowerShell. Once the sweet spot for performance per watt is discovered for your GPU model, the command can be saved to a .bat file and launched with Task Scheduler on Windows startup.
 

singhs.apps

macrumors 6502a
Oct 27, 2016
660
400
'nvidia-smi' is a command-line utility for controlling the power limit; it's part of the minimal driver install and works in Windows cmd or PowerShell. Once the sweet spot for performance per watt is discovered for your GPU model, the command can be saved to a .bat file and launched with Task Scheduler on Windows startup.
I see... every time I boot into Windows, I have to execute the .bat file? I'll look into it.

Is there a chart somewhere that lists the ideal power draw for Nvidia GPUs?

Thanks
 

mi7chy

macrumors G4
Oct 24, 2014
10,619
11,292
I see... every time I boot into Windows, I have to execute the .bat file? I'll look into it.

Is there a chart somewhere that lists the ideal power draw for Nvidia GPUs?

Thanks

Automatically set the GPU power limit on Windows startup with a batch file launched through Task Scheduler.

The sweet spot is usually below 70% of the default power limit, but it varies with the model and the silicon lottery, so it's best to find it through your own testing.
 
  • Like
Reactions: singhs.apps

Xiao_Xi

macrumors 68000
Oct 27, 2021
1,627
1,101
It seems that Apple is developing Open Subdivision support for Metal.
The first implementation doesn't look very good.
Experimental Metal version for open subdiv. The initial implementation showed an order of magnitude slower performance as each vertex is evaluated in its own GPU submission. Currently checking if we can push the vertex loop closer to the actual submission so more work can be shared. There are still options to move to a totally different implementation where we do a batch based approach. This needs more research as Blender does more than a regular subdivision and that might be limited on user side.
 

hifimac

macrumors member
Mar 28, 2013
64
40
What's everyone's thoughts on the new M4 Max? I have a M1 Max and the IPR in Blender is a little too pokey for me to use it over my 3090 PC. Would the M4 Max be better suited? They claim it is 1.9x faster than the M1 Max.
 

sunny5

macrumors 68000
Jun 11, 2021
1,835
1,706
What's everyone's thoughts on the new M4 Max? I have a M1 Max and the IPR in Blender is a little too pokey for me to use it over my 3090 PC. Would the M4 Max be better suited? They claim it is 1.9x faster than the M1 Max.
Maybe close to a mobile RTX 3080, but due to limited software, I'm not sure it's up to competing with discrete GPUs.
 
  • Like
Reactions: komuh

komuh

macrumors regular
May 13, 2023
126
113
What's everyone's thoughts on the new M4 Max? I have a M1 Max and the IPR in Blender is a little too pokey for me to use it over my 3090 PC. Would the M4 Max be better suited? They claim it is 1.9x faster than the M1 Max.
It won't be even close to Nvidia. If you want an upgrade, wait for the 5000 series; if it's something extra and you just like macOS, upgrade if you feel 1.9x the performance is worth 5k USD.
 

leman

macrumors Core
Oct 14, 2008
19,516
19,664
What's everyone's thoughts on the new M4 Max? I have a M1 Max and the IPR in Blender is a little too pokey for me to use it over my 3090 PC. Would the M4 Max be better suited? They claim it is 1.9x faster than the M1 Max.

Should be in the ballpark of an RTX 4070/RTX 3090 (both desktop versions). Depends on what you want to use it for. Larger Nvidia GPUs (4080/4090) will obviously be better; the Mac would work better on very large scenes (but if you work with that kind of stuff you probably want a completely different setup anyway).

I agree with others who advise to wait for the 5000 series and see what it delivers.
 

mi7chy

macrumors G4
Oct 24, 2014
10,619
11,292
No big gain from a node jump, so even with minor frequency and memory bandwidth increases and whatever improvements to RT, I'm guesstimating around a +30% increase, so the M4 Max 40-core GPU landing around a 3080 sounds about right.
 

terminator-jq

macrumors 6502a
Nov 25, 2012
719
1,506
M4 Max doesn't look bad at all for a mobile chip. Now let's see if they do an M4 Ultra. An 80 core GPU with raytracing sounds awesome!
 
  • Like
Reactions: M4pro

komuh

macrumors regular
May 13, 2023
126
113
Macs are getting a new port of Cyberpunk 2077 - so we’ll gain a new and def more fun way to make cross-platform GPU comparisons.
Unified memory is perfect for gaming; sadly, macOS is just ignored by every dev and even Apple, but it has potential.
 
  • Like
Reactions: M4pro

M4pro

macrumors member
May 15, 2024
67
109
Apple is very spend-y lately, paying $$ to get ports of Control and now Cyberpunk 2077 done for the Mac.

So Apple is not ignoring me 😇
 
  • Haha
Reactions: name99

Homy

macrumors 68030
Jan 14, 2006
2,502
2,450
Sweden
Apple is very spend-y lately, paying $$ to get ports of Control and now Cyberpunk 2077 done for the Mac.

So Apple is not ignoring me 😇

More and more developers support AS, on and off stage, every year: CD Projekt Red, 11 bit Studios, Capcom, Ubisoft, Remedy, 4A, Larian Studios, Kojima Productions, NEOWIZ, Hello Games, Iron Gate, Cyan, Teyon, Pocketpair, Dambuster Studios, Firaxis Games, Rebellion, Supergiant Games, Grinding Gear Games, Sports Interactive, Bloober Team, Fallen Leaf, BlueTwelve Studio, Piranha Bytes, Saber Interactive, Rockfish Games, BlackMill Games, Feral/Sega/Codemasters, Nimble Giant, NetEase and many more, apart from all the indie developers, but sure, "macOS is just ignored by every dev and even Apple".

We're getting Cyberpunk 2077: Ultimate Edition with Phantom Liberty, with path tracing, frame generation, and built-in Spatial Audio, on the Mac App Store, GOG, Steam and the Epic Games Store while they're ignoring us, so imagine what we could get if they didn't ignore us. 😄


Let's not forget Where Winds Meet.

 