3D Rendering on Apple Silicon, CPU&GPU

leman · Oct 14, 2022

Xiao_Xi said:
CUDA cores are slower than RT cores, so people don't use them.

I think you misunderstand what the "CUDA core" is and what the "RT core" is. A "CUDA core" is the GPU compute shader core, the hardware capable of executing shader programs (be it for graphical or general purpose computation tasks). An "RT core" is an auxiliary hardware unit that accelerates the task of finding which triangle is hit by a ray. But "RT cores" cannot run programs, all they can do is take a ray and a list of triangles and say "oh, this triangle is hit", but they can do it really fast, much faster than if you wrote a program for it using the general purpose shader cores. You still need a program that generates the rays and decides what to do with the hit information (e.g. shade the pixel based on the light bounces etc.). So an RT accelerated GPU program runs on "CUDA cores" and uses "RT cores" to make raytracing fast.

diamond.g said:
What is the point of the optix renderer if the cuda renderer also uses the RT hardware?

It's been a while that I last worked with CUDA and I never touched Optix but from what I understand CUDA itself does not have any APIs to access the RT hardware and Optix is basically a framework that has access to RT hardware and uses CUDA to specify what to do with the RT results. It's a little bit confusing, but one (a bit naive and not entirely correct) way is to consider Optix as CUDA with hardware raytracing.

Xiao_Xi · Oct 14, 2022

It looks like there are two pipelines, one for rasterization and one for ray tracing.

Are CUDA cores used for rasterization and RTX cores for ray tracing?

leman · Oct 14, 2022

Xiao_Xi said:
Are CUDA cores used for rasterization and RTX cores for ray tracing?

No. In the graphs you posted the white nodes represent programmable stages, the grey rhombi represent fixed-function stages and the partially grey-filled nodes represent fixed-function partially configurable stages*. CUDA cores are what runs the programmable stages (shading and ray generation) while the RT cores run the "traversal & intersection" stage. Rasterisation is a fixed function hardware stage that receives primitive data and generates pixel data to be shaded.

*I don't necessarily agree with that kind of representation but it's ok as a simplification.

dmr727 · Oct 14, 2022

Xiao_Xi said:
It looks like there are two pipelines, one for rasterization and one for ray tracing.
View attachment 2094849
Are CUDA cores used for rasterization and RTX cores for ray tracing?

If you look at the video where he talks about this slide (thanks for posting it btw, it was interesting to me!) - only the middle of the Ray Tracing pipeline is handled by RTX (the green box):

Screen Shot 2022-10-14 at 12.46.01 PM.jpg

sirio76 · Oct 14, 2022

diamond.g said:
What is the point of the optix renderer if the cuda renderer also uses the RT hardware?

Vray for example have two GPU mode, CUDA and RTX. The CUDA engine will use only the traditional GPU cores, the RTX will use both the standard core and the RT core to accelerate some part of the render.

diamond.g · Oct 14, 2022

sirio76 said:
Vray for example have two GPU mode, CUDA and RTX. The CUDA engine will use only the traditional GPU cores, the RTX will use both the standard core and the RT core to accelerate some part of the render.

Why wouldn't they just have one renderer/API like Apple?

tomO2013 · Oct 14, 2022

Just for ****s and giggles…. here is the PowerVR Photon approach to ray tracing. I’m truly excited to see this IP in the wild in working Apple Silicon.

PowerVR Photon - Imagination

The most advanced ray tracing architecture in the world, enabling desktop-level ray tracing to mobile.

www.imaginationtech.com

I like their overview of CXT and photon here..

I could see a future M2 Ultra/M3 Ultra including their L4 ray tracing solution - it would allow Apple to compete with nVidia on workloads that benefit from RT cores, but to do so at massively lower power consumption figures.

I have to say that I’m somewhat disappointed by the approach taken with the 4090. Personally I’d have taken a cheaper card with 40-50% performance improvement over the 3090 but at significantly lower power usage and lower heat/noise when under load. Noise output from my workstations is a big thing for me. Even when PC gaming - without getting into LN or custom liquid cooling solutions - I hate when graphics cards get loud and noise under load.

innerproduct · Oct 14, 2022

Cuda is the general api for doing compute work on nvida GPUs. If I remember correctly you may not access the RT cores directly from Cuda. Optix is a specific ray tracing framework that is higher level and easy for developers to use. All gpu based nvidia ray tracers used to use Cuda but that required deep understanding of solving things like bvhs etc. witg optix, that is taken care of in hardware optimal ways for nvidias different chips. So, all in all, in order to use nvidas latest architectures you need to use Optix. There is absolutely no sane reason to just use plain Cuda anymore for most raytracing. It would be like writing your own video codecs instead of using the built in hw that is designed for that specific task. In a way metalRT is similar to Optix but there is just no hw backend at the moment.

mi7chy · Oct 14, 2022

tomO2013 said:
I have to say that I’m somewhat disappointed by the approach taken with the 4090. Personally I’d have taken a cheaper card with 40-50% performance improvement over the 3090 but at significantly lower power usage and lower heat/noise when under load. Noise output from my workstations is a big thing for me. Even when PC gaming - without getting into LN or custom liquid cooling solutions - I hate when graphics cards get loud and noise under load.

Not rocket science. Just scale the 4090 power consumption at 60% to achieve 85% performance or 50% power for 75% performance with silent operation.

tomO2013 · Oct 14, 2022

mi7chy said:
Not rocket science. Just scale the 4090 power consumption at 60% to achieve 85% performance or 50% power for 75% performance with silent operation.

For most users that is not an option or read differently, it’s not an option for those who work for big companies who will get their nvidia 4090 as part of a developer/creator pre-configured workstation from Dell, HP, Lenovo etc…

From a business perspective - not a tinkerer/hobbyist who builds his/her/their own PC - do you think that most people / IT departments / graphics departments allow their staff to tinker with power profiles on their GPUs other than stock or do you think that there is an expectation that they run the hardware that they buy from such vendors in a stock profile so as not to impact the vendor provided warranty?

The ‘rocket science’ that you present is simply not pragmatic for most large IT, graphics design organizations where rubber meets the road. Unfortunately as is often the case, a choice in hardware spec is distilled to a conversation along the lines of…
“ here are the corporately provided and approved hardware configurations that we can offer staff. We have a contract negotiated with our favorite vendor (insert Dell, HP or Lenovo). Please pick option A, B or C and we’ll get your hardware ordered“
”well I was watching Jayz two cents/Linus Tech tips/<insert reviewsite> and we can get 4090 and custom build and underclock/overclock so that it will be less noisey”
”let me interrupt you…. pick either A, B or C. You’ll be using it stock. Hardware is managed to corporate standards by IT operations team. Have a nice day! “.

diamond.g · Oct 14, 2022

tomO2013 said:
For most users that is not an option or read differently, it’s not an option for those who work for big companies who will get their nvidia 4090 as part of a developer/creator pre-configured workstation from Dell, HP, Lenovo etc…

From a business perspective - not a tinkerer/hobbyist who builds his/her/their own PC - do you think that most people / IT departments / graphics departments allow their staff to tinker with power profiles on their GPUs other than stock or do you think that there is an expectation that they run the hardware that they buy from such vendors in a stock profile so as not to impact the vendor provided warranty?

The ‘rocket science’ that you present is simply not pragmatic for most large IT, graphics design organizations where rubber meets the road. Unfortunately as is often the case, a choice in hardware spec is distilled to a conversation along the lines of…
“ here are the corporately provided and approved hardware configurations that we can offer staff. We have a contract negotiated with our favorite vendor (insert Dell, HP or Lenovo). Please pick option A, B or C and we’ll get your hardware ordered“
”well I was watching Jayz two cents/Linus Tech tips/<insert reviewsite> and we can get 4090 and custom build and underclock/overclock so that it will be less noisey”
”let me interrupt you…. pick either A, B or C. You’ll be using it stock. Hardware is managed to corporate standards by IT operations team. Have a nice day! “.

Those places should be buying the "Quadro" cards anyways...

tomO2013 · Oct 14, 2022

diamond.g said:
Those places should be buying the "Quadro" cards anyways...

Actually many places and game studios, including XBox game studios, use consumer cards during the development process

It’s really not uncommon.

Also the 4090 is pitched by nVidia as a content creation card, taking over from Titan and not just a gaming card.

Puget systems for example , tout the 4090 advantages for workstation loads (rightly so) over a 3090.

In any case, I standby my point. Most folks who get their mits on a 4090 or quadro derivative in a business situation won’t be touching the card to underclock/overclock beyond default manufacturers configuration. Those who follow the add-in-board home builder are for the most part in the minority.

diamond.g · Oct 14, 2022

tomO2013 said:
Actually many places and game studios, including XBox game studios, use consumer cards during the development process It’s really not uncommon.

Also the 4090 is pitched by nVidia as a content creation card, taking over from Titan and not just a gaming card.

Puget systems for example , tout the 4090 advantages for workstation loads (rightly so) over a 3090.

In any case, I standby my point. Most folks who get their mits on a 4090 or quadro derivative in a business situation won’t be touching the card to underclock/overclock beyond default manufacturers configuration. Those who follow the add-in-board home builder are for the most part in the minority.

Yeah. Was just thinking the workstation cards pull less power than the consumer cards. It will be interesting to see how Dell HP and so on handle these 4090's this go around.

leman · Oct 14, 2022

diamond.g said:
Those places should be buying the "Quadro" cards anyways...

Why? Unless you have some very very special requirements you’d be better off with a gaming card from a reputable brand.

mi7chy · Oct 14, 2022

tomO2013 said:
For most users that is not an option or read differently, it’s not an option for those who work for big companies who will get their nvidia 4090 as part of a developer/creator pre-configured workstation from Dell, HP, Lenovo etc…

You wanted a "cheaper card" so personal use was implied. Employee time is money to companies so $1600 is cheap but any decent company has a centralized render/compute farm anyhow.

Xiao_Xi · Oct 15, 2022

Is GPU-accelerated MNEE coming to macOS in Blender?

Cycles: Enable MNEE on Metal (macOS >= 13) · ba67a383fa

This patch enables MNEE on macOS >= 13. There was an inefficiency in the calculation of spill requirements, fixed as of macOS 13. This patch also adds a temporary inlining workaround for a Metal compiler bug which causes `mnee_compute_constraint_derivatives` to behave incorrectly. Reviewed By: ...

developer.blender.org

galad · Oct 15, 2022

That's what it says.

altaic · Oct 15, 2022

Xiao_Xi said:
Is GPU-accelerated MNEE coming to macOS in Blender?

Cycles: Enable MNEE on Metal (macOS >= 13) · ba67a383fa

This patch enables MNEE on macOS >= 13. There was an inefficiency in the calculation of spill requirements, fixed as of macOS 13. This patch also adds a temporary inlining workaround for a Metal compiler bug which causes `mnee_compute_constraint_derivatives` to behave incorrectly. Reviewed By: ...

developer.blender.org

The first of the Metal 3 commits, cool! 🚀 🎉

Edit: Never mind. Premature celly, my bad.

galad · Oct 15, 2022

Not really, it's just that Apple improved their shaders compiler on macOS 13 and fixed some bugs, that commit doesn't use any of the new Metal 3 features.

altaic · Oct 15, 2022

galad said:
Not really, it's just that Apple improved their shaders compiler on macOS 13 and fixed some bugs, that commit doesn't use any of the new Metal 3 features.

Aw, you’re right. I should have looked at the diffs.

jmho · Oct 15, 2022

Metal 3 only works on Ventura which isn't out yet, so it'll probably be a while before they start putting Metal 3 features in Blender.

iPadified · Oct 15, 2022

diamond.g said:
Those places should be buying the "Quadro" cards anyways...

The price/performance ratio goes down the drain for Quadro card so the self proclaimed "Pros" at MR uses gaming cards as reference instead.

Xiao_Xi · Oct 16, 2022

Unexpected and not relevant to this thread, but shows Apple's commitment.

Blender Archive - developer.blender.org

developer.blender.org

Xiao_Xi · Oct 17, 2022

This is getting very interesting. There is an add-on to use Stable Diffusion in Blender.

Is there something similar for other software?

altaic · Oct 17, 2022

jmho said:
Metal 3 only works on Ventura which isn't out yet, so it'll probably be a while before they start putting Metal 3 features in Blender.

That commit was specifically for macOS >= 13, i.e. Ventura, hence my presumption that it had to do with Metal 3.

3D Rendering on Apple Silicon, CPU&GPU

macrumors Core

macrumors 68000

macrumors Core

macrumors G4

macrumors 6502a

macrumors G5

macrumors member

macrumors regular

Suspended

macrumors member

macrumors G5

macrumors member

macrumors G5

macrumors Core

Suspended

macrumors 68000

macrumors 6502a

macrumors 6502a

macrumors 6502a

macrumors 6502a

macrumors 6502a

macrumors 68020

macrumors 68000

macrumors 68000

macrumors 6502a

Our Staff