
vinegarshots

macrumors 6502a
Sep 24, 2018
983
1,349
I may have misunderstood the meeting notes. What is your interpretation?
Apple contributed an idea to use specialized kernels to improve Metal performance. The Blender team looked into how that could work in Cycles and found some problems that don't have a clear path to a solution right now. Metal's implementation uses source code, while the Nvidia/AMD backends use byte-code mechanisms, so they would need to figure out a way to get the Nvidia/AMD backends to work with Metal's method (and in earlier tests, it didn't make enough of a performance improvement in Nvidia's OptiX to be worthwhile).

TLDR version: Apple proposed a method to improve Metal performance. The Blender team sees usability issues with that method that would need to be solved, and the backend work needed to solve them doesn't exist yet.
 

Xiao_Xi

macrumors 68000
Oct 27, 2021
1,628
1,101
Could Apple's method for improving Metal performance in Blender be useful/applied in other 3D software?
 

jmho

macrumors 6502a
Jun 11, 2021
502
996
The way I understand it, it's the age-old shader issue of whether or not to have one huge "monster shader" with a lot of branching, or to split that monster shader up into a lot of smaller ones - basically, every time you would have an if / else statement to toggle a feature in a shader, you just make a new shader and delete the branch.

So instead of the main shader saying if material.specular > 0 { ... calculate specular component ... }, they just make two shaders, one that always calculates a specular component and one that never does, and then you just load the respective shader based on whether the material has its specular value set to black or not.

Obviously this means that you need more time compiling many different shaders though, but now each shader will execute quicker because it's doing less work. It sounds like there is something causing Cycles to block while shaders are compiling, so they need to make some changes to Cycles itself so that it doesn't just block for a long time while compiling many different (potentially unnecessary) shaders.
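
To make that concrete, here's a minimal Metal Shading Language sketch of the two approaches. It's not Cycles code - the Material fields, the kernel names, and the 0.5 specular weight are made up purely for illustration:

```cpp
// Minimal MSL sketch of uber-shader vs. specialized kernels (illustrative only).
#include <metal_stdlib>
using namespace metal;

struct Material {
    float3 diffuse;
    float3 specular;
};

// "Monster" / uber-shader: one kernel, the feature is toggled by a branch.
kernel void shade_uber(device const Material *mats [[buffer(0)]],
                       device float3 *out          [[buffer(1)]],
                       uint id [[thread_position_in_grid]])
{
    float3 color = mats[id].diffuse;
    if (any(mats[id].specular > 0.0f)) {      // branch lives inside the shader
        color += 0.5f * mats[id].specular;    // placeholder specular term
    }
    out[id] = color;
}

// Specialized variants: the branch is deleted and two kernels are compiled;
// the host picks the matching pipeline per material.
kernel void shade_no_specular(device const Material *mats [[buffer(0)]],
                              device float3 *out          [[buffer(1)]],
                              uint id [[thread_position_in_grid]])
{
    out[id] = mats[id].diffuse;
}

kernel void shade_with_specular(device const Material *mats [[buffer(0)]],
                                device float3 *out          [[buffer(1)]],
                                uint id [[thread_position_in_grid]])
{
    out[id] = mats[id].diffuse + 0.5f * mats[id].specular;
}
```

(Metal's function constants can generate this kind of variant from a single source at pipeline-creation time, which is roughly the trade-off being discussed: more compilation up front in exchange for leaner kernels at run time.)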
 

galad

macrumors 6502a
Apr 22, 2022
611
492
Improvements and bug fixes to the Metal Shader Compiler will benefit every app.
 

Xiao_Xi

macrumors 68000
Oct 27, 2021
1,628
1,101
Has anybody tested whether Cycles rendering is faster than before?
Still the same as far as I can tell.
The new version of Blender should work the same. The Metal backend for the viewport is still in its infancy, and the Blender developers have not yet included Apple's Cycles optimizations.

By the way, it looks like Apple is working on Blender more than I thought. There are six people listed as authors on the first patch for the Metal backend for the viewport.
 

Lone Deranger

macrumors 68000
Apr 23, 2006
1,900
2,145
Tokyo, Japan
The new version of Blender should work the same. The Metal backend for the viewport is still in its infancy, and the Blender developers have not yet included Apple's Cycles optimizations.

By the way, it looks like Apple is working on Blender more than I thought. There are six people listed as authors on the first patch for the Metal backend for the viewport.
We're in good hands folks, even Tim Cook is helping out. ;)

[Attached image: Screenshot 2022-05-04 at 21.50.08.png]
 

jujoje

macrumors regular
May 17, 2009
247
288
Another day, another benchmark. But wait! One that shows the M1 Ultra trading blows with a 3090!

This is for the Axiom fluid solver, which runs on OpenCL / Metal. You can see the benchmarks here:


The interesting bit from the page wrt the M1 Ultra:

The M1 Ultra sits comfortably in second place in most of the tests beating out the A6000. The AMD GPU in the Mac Pro shows the significant performance gains you can expect when using Metal over OpenCL, although the difference is not as staggering with the M1 chips.

And

I will also note that Axiom does not utilize any extra features found on modern GPUs. Axiom’s workload only uses the standard GPU cores, no AI accelerators, no raytracing cores, no media encoders / decoders, etc. It’s as fair a fight between GPUs as you can get.

Not entirely surprising, but interesting to see the M1 compete with the 3090 on raw GPU performance. The lack of raytracing cores really hurts the Apple Silicon GPU; fingers crossed for the Mac Pro...
 

leman

macrumors Core
Oct 14, 2008
19,522
19,679
Not entirely surprising, but interesting to see the M1 compete with the 3090 on raw GPU performance. The lack of raytracing cores really hurts the Apple Silicon GPU; fingers crossed for the Mac Pro...

In raw computational power, the 3090 should be at least 30-50% faster, possibly more. I assume that the bottleneck in this particular test is communication between the GPU work packages, where M1 might have an edge thanks to much larger cache.

And of course, they are using OpenCL, which probably means a lack of optimization on Nvidia. It would probably look different with CUDA and some profiling.
 

Xiao_Xi

macrumors 68000
Oct 27, 2021
1,628
1,101
Not entirely surprising, but interesting to see the M1 compete with the 3090 on raw GPU performance.
I find it more surprising that a company decided to optimize their software using Metal first instead of Cuda/Optix.
 

Andropov

macrumors 6502a
May 3, 2012
746
990
Spain
Obviously this means that you need more time compiling many different shaders though, but now each shader will execute quicker because it's doing less work.
Less register pressure too, since you free registers that were being used for flow control.
 

vinegarshots

macrumors 6502a
Sep 24, 2018
983
1,349
Not entirely surprising, but interesting to see the M1 compete with the 3090 on raw GPU performance. The lack of raytracing cores really hurts the Apple Silicon GPU; fingers crossed for the Mac Pro...

No it doesn't. OpenCL is not equivalent to Metal. In fact, Metal is what replaced OpenCL and OpenGL when Apple deprecated them in macOS. The fair comparison would be Metal to CUDA, not Metal to OpenCL.
 

jujoje

macrumors regular
May 17, 2009
247
288
The whole 'But wait! One that shows the M1 Ultra trading blows with a 3090!' was largely tongue in cheek (hence the '!'), given how many random, largely spurious benchmark results we seem to end up discussing, but apparently it didn't read that way so much. Welcome to the internet I guess :p

In raw computational power, the 3090 should be at least 30-50% faster, possibly more. I assume that the bottleneck in this particular test is communication between the GPU work packages, where M1 might have an edge thanks to much larger cache.

I didn't think the 3090 was that much faster (unless it was using Optix), been a while since I looked at any benchmarks though, so could be way off base.

Would that mean that with larger datasets the performance difference would increase? I'm a bit unsure what you mean, but I assume the larger available memory would mean larger work items and less shuffling of data around?

I find it more surprising that a company decided to optimize their software using Metal first instead of Cuda/Optix.

I was pretty surprised myself; kinda neat though. Tbh I largely posted it because I thought it was interesting, as I haven't seen this kind of thing being optimised for Metal (let alone Metal first).

Gimped by OpenCL. Nice try though.

Probably shouldn't dignify this low-effort troll with a response, but hey ho. There's a compelling case for OpenCL (at least historically). Off the top of my head:

  1. Vendor and platform agnostic.
  2. Tends to be more stable. Nvidia semi-regularly seem to release drivers that break things, which makes using them in production fun.
  3. Works on both the GPU and CPU; this enables you to develop things fast locally, submit to the farm, and get the same results (ish).
Curious to see how things go as we move away from OpenCL/GL.

No it doesn't. OpenCL is not equivalent to Metal. In fact, Metal is what replaced OpenCL and OpenGL when Apple deprecated them in macOS. The fair comparison would be Metal to CUDA, not Metal to OpenCL.

Agree that the story would be different with CUDA, but I was curious to see the improvement in performance on the Mac between Metal and OpenCL. I guess that out-of-date OpenCL version was really hindering things. Besides, for my use case OpenCL is a far better comparison, since most of Houdini (outside of Karma GPU) uses OpenCL (I think the only things that use OptiX are the Vellum pressure constraint nodes).

I guess, comparing Apples to Apples (heh), the main takeaway would be that Nvidia offers terrible performance for OpenCL, given that Apple's massively out-of-date version (1.2 from 2013 iirc) gets pretty close to it.
 

leman

macrumors Core
Oct 14, 2008
19,522
19,679
I didn't think the 3090 was that much faster (unless it was using Optix), been a while since I looked at any benchmarks though, so could be way off base.

Well, the 3090 has 10496 ALUs running at ~1.5GHz, whereas the Ultra has 8192 ALUs running at 1.3GHz; that already makes a big practical difference. Of course, direct comparisons are complex, as Ampere is a more sophisticated GPU with limited superscalar execution capabilities, whereas Apple's G13 is a very straightforward, streamlined device.
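
As a rough back-of-the-envelope check using those numbers (assuming 2 FLOPs per ALU per clock from fused multiply-add - real throughput will differ):

```latex
% Rough peak FP32 estimates from the ALU counts and clocks quoted above.
\begin{align*}
\text{RTX 3090:} \quad & 10496 \times 2 \times 1.5\,\text{GHz} \approx 31.5\ \text{TFLOPS}\\
\text{M1 Ultra:} \quad & 8192 \times 2 \times 1.3\,\text{GHz} \approx 21.3\ \text{TFLOPS}
\end{align*}
```

That's roughly 45-50% more theoretical FP32 throughput for the 3090, which is consistent with the "at least 30-50% faster" estimate above.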

Would that mean that with larger datasets the performance difference would increase? A bit unsure on what you mean, but assuming that the larger available memory would mean larger work items and less transferring of data around?

I think Apple has two big advantages. One is access to much more RAM: if you are working on a very large problem, Apple can just, well, work on it, where Nvidia would need to fetch the data over the extremely slow PCIe interface. Another one is the cache size and CPU coherency: if you need a lot of synchronization between your compute shaders (among themselves or with CPU work), Apple will stall less. So yeah, I'd say that Apple GPUs are better for large workloads with complex data dependencies. But Nvidia will often have an advantage if the shaders themselves are large/complex, as they have more registers and more compute throughput.

Curious to see how things go as we move away from OpenCL/GL.

I always wondered why there is a seeming performance discrepancy between CUDA and OpenCL. It kind of boils down to the same thing, just some C-like code compiled for the GPU, but a paper some years ago found a whopping 13 to 60% performance difference. Could be that the OpenCL memory management is not a perfect fit for Nvidia, but I somehow suspect that the culprit is the lack of quality in the OpenCL implementation: the kernels are likely very poorly optimized. Not that I would find it surprising; Nvidia's OpenCL implementation was always more akin to malicious compliance. After all, it was Nvidia who effectively killed OpenCL.
 

mi7chy

macrumors G4
Oct 24, 2014
10,625
11,298
Made up fairy tales. Who updates drivers or anything in production without first testing in dev and/or QA? AMD HIP performance has been broken with newer drivers for the last two months, so I have to roll back to the February drivers, while Nvidia OptiX/CUDA are still working without issue.
 

jujoje

macrumors regular
May 17, 2009
247
288
I think Apple has two big advantages. One is access to much more RAM: if you are working on a very large problem, Apple can just, well, work on it, where Nvidia would need to fetch the data over the extremely slow PCIe interface. Another one is the cache size and CPU coherency: if you need a lot of synchronization between your compute shaders (among themselves or with CPU work), Apple will stall less. So yeah, I'd say that Apple GPUs are better for large workloads with complex data dependencies. But Nvidia will often have an advantage if the shaders themselves are large/complex, as they have more registers and more compute throughput.

Thanks for the explanation!

Apple's GPUs seem to be in a bit of an awkward place, in that the areas where they offer the most advantage are probably less common and harder to fully utilise, at least in terms of 3D workflows (simulations or rendering a large scene might well do it; I figure for a lot of the other use cases Nvidia would probably have the advantage).

I always wondered why there is a seeming performance discrepancy between CUDA and OpenCL. It kind of boils down to the same thing, just some C-like code compiled for the GPU, but a paper some years ago found a whopping 13 to 60% performance difference. Could be that the OpenCL memory management is not a perfect fit for Nvidia, but I somehow suspect that the culprit is the lack of quality in the OpenCL implementation: the kernels are likely very poorly optimized. Not that I would find it surprising; Nvidia's OpenCL implementation was always more akin to malicious compliance. After all, it was Nvidia who effectively killed OpenCL.

I'm definitely going to go with Nvidia gimping their OpenCL drivers as well. Moving towards proprietary, vendor-specific APIs is a bit frustrating; at least with OpenCL/OpenGL there was a common core to target that should mostly behave the same across vendors (even if, as in Nvidia's case, the driver deliberately sucks).

I suspect that 3D apps might well end up going with Vulkan/MoltenVK rather than Metal for the viewports, for cross-platform compatibility. On the upside there would be parity and faster development; on the downside I'm assuming there are some things that still wouldn't work on macOS, and we'd never really see Metal used to its full potential. Apart from in Blender, at least.

If you have a working driver (no bugs in applications you use) why would you be updating it?

There's no such thing as a perfect driver, and sometimes problems show up over time, or through unexpected behaviour in another application (nothing like a driver fixing a crash in Maya but causing a crash in Nuke). Besides, if a new feature or application would save a significant amount of time in production, then updating makes sense.

GPU renderers also tend to be pretty sensitive to drivers; at one studio we went with CPU over GPU rendering to a certain extent because of the instability and unreliability of GPU drivers (this was a while ago though).
 