3D Rendering on Apple Silicon, CPU&GPU

loekf · Dec 22, 2022

Xiao_Xi said:
Vulkan is so verbose that most applications are still using OpenGL and that makes application development difficult.
View attachment 2131605
The Vulkan program still needs almost another full page. Around 15:15 from "Introduction to WebGPU - CIS 565 GPU Programming Fall 2022".

For example, Blender is adapting most compositor operations to the GPU, but those that require complex mathematical operations are more difficult to implement. Since Metal can do computer shaders, it is possible that the compositor in Blender has better support for the Apple GPU than for any PC GPU.

Real-time Compositor: Feedback and discussion

Hi everyone, I previously presented the difficulty of implementing the Glare node in the following post: While I wanted to implement a new Glare node that is more realistic, more easily controllable, and more performant and GPU friendly as required by the real time compositor, it was clear...

devtalk.blender.org

I imagine something similar will happen to other 3D programs until they adopt Vulkan or WebGPU if it eventually becomes an alternative to Vulkan.

Which 3D software has adopted Vulkan and which Metal?

Blender can use Metal (incl. M1/M2 Macs). Vulcan is still a work in progress (for Blender).

leman · Dec 22, 2022

jujoje said:
Out of curiosity is there something that metal is significantly lacking these days? From what I can recall most of the critical ones have been addressed (but totally not an area I'm particularly familiar with).

In terms of gaming and basic compute APIs, philosophical differences aside (which can be a major pain point when trying to bring these Apis to a common shared denominator), nothing specific comes to mind.

For more serious general purpose compute, CUDA certainly is ahead in some key areas. For example, CUDA uses unified virtual addresses (you can pass a pointer between the CPU and the GPU as-is), which is something that Apple curiously lacks (they have true UMA unlike Nvidia but they don't use the same virtual address mappings across CPU and GPU). This is a major pain point for porting CUDA kernel to Metal from what I hear. CUDA programs can also allocate memory and do other fancy stuff, something that Metal does not offer.

I am sure I am forgetting many important things though. Hopefully someone can correct me and/or offer more info.

jmho · Dec 22, 2022

leman said:
In terms of gaming and basic compute APIs, philosophical differences aside (which can be a major pain point when trying to bring these Apis to a common shared denominator), nothing specific comes to mind.

For more serious general purpose compute, CUDA certainly is ahead in some key areas. For example, CUDA uses unified virtual addresses (you can pass a pointer between the CPU and the GPU as-is), which is something that Apple curiously lacks (they have true UMA unlike Nvidia but they don't use the same virtual address mappings across CPU and GPU). This is a major pain point for porting CUDA kernel to Metal from what I hear. CUDA programs can also allocate memory and do other fancy stuff, something that Metal does not offer.

I am sure I am forgetting many important things though. Hopefully someone can correct me and/or offer more info.

The Metal 3 bindless stuff helps here. You can allocate a buffer via a heap and then get its gpuAddress and then do a bunch of pointer arithmetic in your compute shader instead of having to specifically bind the buffer. I think you're correct that you can't allocate new memory from a compute shader though.

It's really nice. Except if you have a bug you can now easily completely hard-lock your Mac (and then macOS will helpfully try to re-open your buggy app when you restart and then crash immediately

)

leman · Dec 22, 2022

jmho said:
The Metal 3 bindless stuff helps here. You can allocate a buffer via a heap and then get its gpuAddress and then do a bunch of pointer arithmetic in your compute shader instead of having to specifically bind the buffer. I think you're correct that you can't allocate new memory from a compute shader though.

It's really nice. Except if you have a bug you can now easily completely hard-lock your Mac (and then macOS will helpfully try to re-open your buggy app when you restart and then crash immediately )

Metal had pointers for a while and you don’t need Metal 3 to do buffer pointer arithmetic inside shaders, Metal 3 just makes sending the data to the GPU simpler.

But with CUDA you can use exactly the same addresses on the CPU and the GPU, e.g. you can write pointer values in your CPU code and copy it to the GPU as is. With Metal you have to either marshal them via GPU address API or work with offsets. Not a big deal per se but super annoying if you have a big CUDA kernel you want to port.

jmho · Dec 22, 2022

leman said:
Metal had pointers for a while and you don’t need Metal 3 to do buffer pointer arithmetic inside shaders, Metal 3 just makes sending the data to the GPU simpler.

Really? The gpuAddress param on MTLBuffer is macOS 13+ / iOS 16+ only.

How were you able to access gpuAddresses before?

I should be clearer that obviously you've always been able to do buffer point arithmetic like "I want to go 512 bytes into buffer A", but afaik it's new that you can now say "Here is buffer B and it contains the address of buffer A, and you can use this address to access buffer A (as long as you've marked it, or the heap containing it as being used)"

leman · Dec 22, 2022

jmho said:
Really? The gpuAddress param on MTLBuffer is macOS 13+ / iOS 16+ only.

How were you able to access gpuAddresses before?

Via setBuffer either on the argument encoder or the command encoder. All of these methods set the data pointer on the shader side which you can manipulate as usual. That ability has been there for years.

In that sense, Metal 3 doesn’t bring anything new. It just directly exposes the addresses/handles and lets you use them directly via memcpy instead of forcing the argument encoder on you. I suspect there were two reason for making the handles public: first, it removes an inelegant part of the API and makes it more streamlined, second, Apple has been trying to smooth out the differences between Metal and DX12.

jmho · Dec 22, 2022

I think it's more because it makes working with argument buffers and indirect command buffers a lot easier, which have a huge benefit for ray tracing - where you might want to have lots of buffers containing material parameters etc. where you don't want to have to drop out to the CPU to call setBuffer.

Much easier if you can just have buffers full of gpuAddresses and gpuResourceIds that the GPU can look up directly if it needs to find which texture is being used by object #18482 and don't want to have to bind tens of thousands of buffers and resources to your RT compute shader (or cause a sync point where you bind the necessary resources on the CPU and then run a second compute shader)

leman · Dec 22, 2022

jmho said:
I think it's more because it makes working with argument buffers and indirect command buffers a lot easier, which have a huge benefit for ray tracing - where you might want to have lots of buffers containing material parameters etc. where you don't want to have to drop out to the CPU to call setBuffer.

Much easier if you can just have buffers full of gpuAddresses and gpuResourceIds that the GPU can look up directly if it needs to find which texture is being used by object #18482 and don't want to have to bind tens of thousands of buffers and resources to your RT compute shader (or cause a sync point where you bind the necessary resources on the CPU and then run a second compute shader)

Oh, absolutely. It’s easier to use and also easier to learn as you can skip the argument encoder and think in terms of old good arrays and structs.

At the same time, it’s not like the new API enables principally new applications. You still have to write these handles to a buffer and whether you call memcpy(), use an assignment operator or call a MTLArgumentEncoder method is just a syntactic difference. Of course, the encoder method has higher overhead if you have to
set a lot of these things, but it’s not like getting the handle is without overhead either.

In the end, you get a cleaner, easier to understand API (if you are a seasoned system programmer and have a good grasp of pointers and memory layouts), but not that much else.

BTW, you can also set bindings on the GPU. That has been a huge advantage if Metal over other APIs for a while now.

P.S. There are of course more curiosities, like for example how Apple manages to use 8 bytes for all these handles where we know for a fact that texture and sampler descriptors are larger on some hardware. I believe that Apple maintains a hidden descriptor table and the handles are offsets into this table.

jmho · Dec 22, 2022

It might not enable principally new applications, but it does enable much larger applications by removing all the limitations of the old argumentEncoder stuff (like a maximum number of buffers and textures etc.) and just letting you work with memory directly.

Although we're back to the original point which was that Metal 3 lets you manage your memory more directly

I don't really know too much about CUDA though, apart from the fact that it seems more like an entire all-encompassing platform which does a ton of stuff for you like standardising addresses, so yeah it makes sense that it's a) attractive, and b) a huge pain to port to a slightly lower-level API like Metal.

iBug2 · Dec 22, 2022

jmho said:
They're not developing FOR 4090 boxes, but I guarantee they're developing ON 4090 boxes.

I think you're confusing "making a Mac port" (which lots of people are doing) with "actively developing on Mac" (which literally nobody is doing)

How can you develop a Mac app without XCode?

leman · Dec 22, 2022

jmho said:
It might not enable principally new applications, but it does enable much larger applications by removing all the limitations of the old argumentEncoder stuff (like a maximum number of buffers and textures etc.) and just letting you work with memory directly.

Na, there weren’t any limitations to begin with. What’s funny that some time ago there was a discussion on Reddit and Twitter about the alleged “500000” textures limit in Metal, and I’ve reached out to Apple for clarifications. Turned out it was just bad documentation and the limitation was always just the viable memory. What’s funny is that Apple updated the number to one million in the latest capability tables, probably to to match the DX12 limits and to remove all speculation. I prefer to think of this as my achievement

jmho said:
I don't really know too much about CUDA though, apart from the fact that it seems more like an entire all-encompassing platform which does a ton of stuff for you like standardising addresses, so yeah it makes sense that it's a) attractive, and b) a huge pain to port to a slightly lower-level API like Metal.

CUDA is just parasitic. The way how you write CPU and GPU code is super cool but then you realize that you are committed to Nvidias C++ compiler and everything gets messy. They have very smart strategists. It’s deliberately designed to hook you in and never let you out.

exoticSpice · Dec 22, 2022

Really? Then why did apple claim M1 had the fastest CPU core in the 2020 November keynote.

Apple does care about top performance but right now their GPUs are weak. The RTX 4090 is very efficient for the power it provides after all it's a very good architecture and made on TSMC custom node.

jeanlain said:
Indeed. It is becoming clear that the Apple Silicon transition has never really been about top performance, but power efficiency.

Apple has nothing to compete with the best GPUs from AMD and nVidia. They just showed results that no one else can reproduce except with useless apps like GBXBench.

exoticSpice · Dec 22, 2022

Look if Apple want Metal to gain a foothold in the 3D industry, gaming is the way to go. Make a console like Xbox or PS.

Make a mass affordable product that is value oriented and also mid range performance. Then the market adopts Metal and no Apple TV is going to to cut it. It needs be a swicth like console ie handheld or a home console.

Unity and UE5 and various 3D programs are used in game development. Apple lost the gaming boat a long time ago sadly.

jmho · Dec 22, 2022

leman said:
Na, there weren’t any limitations to begin with. What’s funny that some time ago there was a discussion on Reddit and Twitter about the alleged “500000” textures limit in Metal, and I’ve reached out to Apple for clarifications. Turned out it was just bad documentation and the limitation was always just the viable memory. What’s funny is that Apple updated the number to one million in the latest capability tables, probably to to match the DX12 limits and to remove all speculation. I prefer to think of this as my achievement

Ahh, yeah there is a difference between tier1 and tier2. Not that it matters though because Metal 3 is so much nicer that I like to pretend that argument buffer encoders don't exist anymore

Where do all the people who know Metal hang out?

Slartibart · Dec 22, 2022

iBug2 said:
How can you develop a Mac app without XCode?

for example using Qt for GUI (and other stuff), gcc and make in the terminal.

jmho · Dec 22, 2022

iBug2 said:
How can you develop a Mac app without XCode?

There is a difference between a primary and secondary platform.

Let's say you are developing a cross-platform ray-tracer and every time you compile you run a test scene. On a 4090 this test scene takes 3 seconds to render. On an M1 Ultra this test scene takes 30 seconds to render. You run this test scene 100 times per day, so we're talking 5 minutes of downtime if you develop on a PC, vs 50 minutes of downtime if you develop on the Mac.

Therefore what happens is you develop the software on the PC, and then when it's finished and working on the PC, you port it to Mac (likely using Xcode), and as such the PC version tends to be much better than the Mac version because you made all the architectural decisions for the PC.

bcortens · Dec 22, 2022

A lot of people are claiming M series feature weak GPUs. This is terminology I take some issue with. While I wish Apple would push for the high end the GPUs that they do sell aren't actually weak in the scope of the entire marketplace. They only look weak if you limit your scope of comparison to exclusively the mid to high end of the market.

The M1Pro and M1Max are fantastic in the MBP lineup, as desktop GPUs, sure they are mediocre, but that isn't their primary purpose. Sure PC laptops can throw in higher end hardware but that typically means it is more of a portable gaming/workstation rather than being a true laptop. The M1/2 air is just straight better than other laptops in it's class GFXBench, and even holds its own against higher end laptop GPUs GFXBench M2 vs 3050 Laptop.

One of the best things the M series has done is raise the performance floor on Macs to a very high level. It hasn't raised the ceiling or kept pace with the top of the line on the PC side of things but that doesn't mean it makes weak GPUs. We have been spoiled by GPU performance increases in the last few iterations as the RTX 20 series, and now 40 series really push performance to incredible heights. Apple needs to continuously execute every single year to keep pace with this pace of improvement, I don't know if Tim Cooks Apple is capable of that.

There are many other issues with the apple silicon transition, the lack of RT cores, allowing the M series to fall behind the A series by so much (The M1 came 2 months after the A14 but the M2 came out 9 months later. The M1Pro/Max were a full year behind the M1. I don't know if they are understaffed or don't have enough high quality staff but they have allowed themselves to fall behind.

What I want to see next is Apple raise the bar on RT like they did with rasterization performance. The M3(or 4) should have good enough RT performance to make basic RT backed rendering and gaming with RT viable on the MacBook Air and iPad Pro.

leman · Dec 22, 2022

jmho said:
Ahh, yeah there is a difference between tier1 and tier2. Not that it matters though because Metal 3 is so much nicer that I like to pretend that argument buffer encoders don't exist anymore

I’m not even thinking about tier 1. That’s old hardware and developing for it is a nightmare due to old buggy drivers anyway. Anything done on a Mac these days should target Apple Silicon anyway.

jmho said:
Where do all the people who know Metal hang out?

Id also like to know… it has been incredibly hard learning this stuff.

leman · Dec 22, 2022

bcortens said:
What I want to see next is Apple raise the bar on RT like they did with rasterization performance. The M3(or 4) should have good enough RT performance to make basic RT backed rendering and gaming with RT viable on the MacBook Air and iPad Pro.

Judging by their GPU patents they definitely have a next-gen GPU in the oven. Hardware RT, more efficient execution, likely more compute resources…

HiddenPaul · Dec 22, 2022

leman said:
Judging by their GPU patents they definitely have a next-gen GPU in the oven. Hardware RT, more efficient execution, likely more compute resources…

IIRC someone posted here about PowerVR RT cores being more efficient and powerful. If that happened in M3, I will be happy but will not trade my M2 Air right now but I will get the 2nd Gen RT cores.

sirio76 · Dec 22, 2022

iBug2 said:
How can you develop a Mac app without XCode?

It’s a kind of magic

That, or he is just starting from the common and wrong assumption that anyone is (or should be) using a 4090.

jmho · Dec 22, 2022

sirio76 said:
It’s a kind of magic
That, or he is just starting from the common and wrong assumption that anyone is (or should be) using a 4090.

You asked someone earlier to show you their art, so show me your code.

sunny5 · Dec 22, 2022

bcortens said:
A lot of people are claiming M series feature weak GPUs. This is terminology I take some issue with. While I wish Apple would push for the high end the GPUs that they do sell aren't actually weak in the scope of the entire marketplace. They only look weak if you limit your scope of comparison to exclusively the mid to high end of the market.

The M1Pro and M1Max are fantastic in the MBP lineup, as desktop GPUs, sure they are mediocre, but that isn't their primary purpose. Sure PC laptops can throw in higher end hardware but that typically means it is more of a portable gaming/workstation rather than being a true laptop. The M1/2 air is just straight better than other laptops in it's class GFXBench, and even holds its own against higher end laptop GPUs GFXBench M2 vs 3050 Laptop.

One of the best things the M series has done is raise the performance floor on Macs to a very high level. It hasn't raised the ceiling or kept pace with the top of the line on the PC side of things but that doesn't mean it makes weak GPUs. We have been spoiled by GPU performance increases in the last few iterations as the RTX 20 series, and now 40 series really push performance to incredible heights. Apple needs to continuously execute every single year to keep pace with this pace of improvement, I don't know if Tim Cooks Apple is capable of that.

There are many other issues with the apple silicon transition, the lack of RT cores, allowing the M series to fall behind the A series by so much (The M1 came 2 months after the A14 but the M2 came out 9 months later. The M1Pro/Max were a full year behind the M1. I don't know if they are understaffed or don't have enough high quality staff but they have allowed themselves to fall behind.

What I want to see next is Apple raise the bar on RT like they did with rasterization performance. The M3(or 4) should have good enough RT performance to make basic RT backed rendering and gaming with RT viable on the MacBook Air and iPad Pro.

Benchmark results doesn't really represent the actual performance. M1 Max itself is close to 3060 laptop with gaming performance so I would say that's meaningless when you actually work with software. Apple Silicon GPU's performance is weak, forget it everyone.

bcortens · Dec 22, 2022

sunny5 said:
Benchmark results doesn't really represent the actual performance. M1 Max itself is close to 3060 laptop with gaming performance so I would say that's meaningless when you actually work with software. Apple Silicon GPU's performance is weak, forget it everyone.

But that isn't what was claimed when 'weak' was brought up. You're redefining weak in the context of the conversation to be more about optimization and real world performance than theoretical performance. The comparison the TFlops of the GPUs and the benchmark results show that Apple's GPUs can hang well with their competitors in well optimized software. Bad optimization doesn't make the GPUs themselves weak where they are positioned in the overall market.

HiddenPaul · Dec 22, 2022

sunny5 said:
Benchmark results doesn't really represent the actual performance. M1 Max itself is close to 3060 laptop with gaming performance so I would say that's meaningless when you actually work with software. Apple Silicon GPU's performance is weak, forget it everyone.

Not quite. There are applications or operations that are certainly faster on AS GPUs. And in some applications, price per perf are very much in favor of AS.

3D Rendering on Apple Silicon, CPU&GPU

macrumors 6502a

macrumors Core

macrumors 6502a

macrumors Core

macrumors 6502a

macrumors Core

macrumors 6502a

macrumors Core

macrumors 6502a

macrumors 601

macrumors Core

Suspended

Suspended

macrumors 6502a

macrumors 68040

macrumors 6502a

macrumors 65816

macrumors Core

macrumors Core

macrumors newbie

macrumors 6502a

macrumors 6502a

Suspended

macrumors 65816

macrumors newbie

Our Staff