
vinegarshots

macrumors 6502a
Sep 24, 2018
983
1,349
I may have misunderstood the meeting notes. What is your interpretation?
Apple contributed an idea to use specialized kernels to improve Metal performance. The Blender team looked into how that could work in Cycles and found some problems that don't have a clear path to a solution right now. Metal's implementation uses source code, while the Nvidia/AMD backends use byte-code mechanisms, so they would need to figure out a way to get the Nvidia/AMD backends to work with Metal's method (and in earlier tests, it didn't make enough of a performance improvement in Nvidia's OptiX to be worthwhile).

TLDR version: Apple proposed a method to improve Metal performance. The Blender team sees usability issues with that method that would need to be solved, and the backend work needed to solve them doesn't exist yet.
 

Xiao_Xi

macrumors 68000
Oct 27, 2021
1,628
1,101
Could Apple's method for improving Metal performance in Blender be useful/applied in other 3D software?
 

jmho

macrumors 6502a
Jun 11, 2021
502
996
The way I understand it, it's the age-old shader issue of whether or not to have one huge "monster shader" with a lot of branching, or to split that monster shader up into a lot of smaller ones - basically, every time you would have an if / else statement to toggle a feature in a shader, you just make a new shader and delete the branch.

So instead of the main shader saying if material.specular > 0 { ... calculate specular component ... }, they just make two shaders, one that always calculates a specular component and one that never does, and then you just load the respective shader based on whether the material has its specular value set to black or not.

Obviously this means that you need more time compiling many different shaders though, but now each shader will execute quicker because it's doing less work. It sounds like there is something causing Cycles to block while shaders are compiling, so they need to make some changes to Cycles itself so that it doesn't just block for a long time while compiling many different (potentially unnecessary) shaders.
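
To make that concrete, here's a minimal Metal Shading Language sketch of the two approaches. It's not Cycles code - the Material fields, the kernel names, and the 0.5 specular weight are made up purely for illustration:

```cpp
// Minimal MSL sketch of uber-shader vs. specialized kernels (illustrative only).
#include <metal_stdlib>
using namespace metal;

struct Material {
    float3 diffuse;
    float3 specular;
};

// "Monster" / uber-shader: one kernel, the feature is toggled by a branch.
kernel void shade_uber(device const Material *mats [[buffer(0)]],
                       device float3 *out          [[buffer(1)]],
                       uint id [[thread_position_in_grid]])
{
    float3 color = mats[id].diffuse;
    if (any(mats[id].specular > 0.0f)) {      // branch lives inside the shader
        color += 0.5f * mats[id].specular;    // placeholder specular term
    }
    out[id] = color;
}

// Specialized variants: the branch is deleted and two kernels are compiled;
// the host picks the matching pipeline per material.
kernel void shade_no_specular(device const Material *mats [[buffer(0)]],
                              device float3 *out          [[buffer(1)]],
                              uint id [[thread_position_in_grid]])
{
    out[id] = mats[id].diffuse;
}

kernel void shade_with_specular(device const Material *mats [[buffer(0)]],
                                device float3 *out          [[buffer(1)]],
                                uint id [[thread_position_in_grid]])
{
    out[id] = mats[id].diffuse + 0.5f * mats[id].specular;
}
```

(Metal's function constants can generate this kind of variant from a single source at pipeline-creation time, which is roughly the trade-off being discussed: more compilation up front in exchange for leaner kernels at run time.)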
 

galad

macrumors 6502a
Apr 22, 2022
611
492
Improvements and bug fixes to the Metal Shader Compiler will benefit every app.
 

Xiao_Xi

macrumors 68000
Oct 27, 2021
1,628
1,101
Has anybody tested whether Cycles rendering is faster than before?
Still the same as far as I can tell.
The new version of Blender should work the same. The Metal backend for the viewport is still in its infancy, and the Blender developers have not yet included Apple's Cycles optimizations.

By the way, it looks like Apple is working on Blender more than I thought. There are six people listed as authors on the first patch for the Metal backend for the viewport.
 

Lone Deranger

macrumors 68000
Apr 23, 2006
1,900
2,145
Tokyo, Japan
The new version of Blender should work the same. The Metal backend for the viewport is still in its infancy, and the Blender developers have not yet included Apple's Cycles optimizations.

By the way, it looks like Apple is working on Blender more than I thought. There are six people listed as authors on the first patch for the Metal backend for the viewport.
We're in good hands folks, even Tim Cook is helping out. ;)

[Attached image: Screenshot 2022-05-04 at 21.50.08.png]
 

jujoje

macrumors regular
May 17, 2009
247
288
Another day, another benchmark. But wait! One that shows the M1 Ultra trading blows with a 3090!

This is for the Axiom fluid solver, which runs on OpenCL / Metal. You can see the benchmarks here:


The interesting bit from the page wrt the M1 Ultra:

The M1 Ultra sits comfortably in second place in most of the tests beating out the A6000. The AMD GPU in the Mac Pro shows the significant performance gains you can expect when using Metal over OpenCL, although the difference is not as staggering with the M1 chips.

And

I will also note that Axiom does not utilize any extra features found on modern GPUs. Axiom’s workload only uses the standard GPU cores, no AI accelerators, no raytracing cores, no media encoders / decoders, etc. It’s as fair a fight between GPUs as you can get.

Not entirely surprising, but interesting to see the M1 compete with the 3090 on raw GPU performance. The lack of raytracing cores really hurts the Apple Silicon GPU; fingers crossed for the Mac Pro...
 

leman

macrumors Core
Oct 14, 2008
19,522
19,679
Not entirely surprising, but interesting to see the M1 compete with the 3090 on raw GPU performance. The lack of raytracing cores really hurts the Apple Silicon GPU; fingers crossed for the Mac Pro...

In raw computational power, the 3090 should be at least 30-50% faster, possibly more. I assume that the bottleneck in this particular test is communication between the GPU work packages, where M1 might have an edge thanks to much larger cache.

And of course, they are using OpenCL, which probably means a lack of optimization on Nvidia. It would probably look different with CUDA and some profiling.
 

Xiao_Xi

macrumors 68000
Oct 27, 2021
1,628
1,101
Not entirely surprising, but interesting to see the M1 compete with the 3090 on raw GPU performance.
I find it more surprising that a company decided to optimize their software using Metal first instead of Cuda/Optix.
 

Andropov

macrumors 6502a
May 3, 2012
746
990
Spain
Obviously this means that you need more time compiling many different shaders though, but now each shader will execute quicker because it's doing less work.
Less register pressure too, since you free registers that were being used for flow control.
 

vinegarshots

macrumors 6502a
Sep 24, 2018
983
1,349
Not entirely surprising, but interesting to see the M1 compete with the 3090 on raw GPU performance. The lack of raytracing cores really hurts the Apple Silicon GPU; fingers crossed for the Mac Pro...

No it doesn't. OpenCL is not equivalent to Metal. In fact, Metal is what replaced OpenCL and OpenGL when Apple deprecated them in macOS. The fair comparison would be Metal to CUDA, not Metal to OpenCL.
 

jujoje

macrumors regular
May 17, 2009
247
288
The whole 'But wait! One that shows the M1 Ultra trading blows with a 3090!' was largely tongue in cheek (hence the '!'), given how many random, largely spurious benchmark results we seem to end up discussing, but apparently it didn't read that way so much. Welcome to the internet I guess :p

In raw computational power, the 3090 should be at least 30-50% faster, possibly more. I assume that the bottleneck in this particular test is communication between the GPU work packages, where M1 might have an edge thanks to much larger cache.

I didn't think the 3090 was that much faster (unless it was using Optix), been a while since I looked at any benchmarks though, so could be way off base.

Would that mean that with larger datasets the performance difference would increase? I'm a bit unsure what you mean, but I assume the larger available memory would mean larger work items and less shuffling of data around?

I find it more surprising that a company decided to optimize their software using Metal first instead of Cuda/Optix.

I was pretty surprised myself; kinda neat though. Tbh I largely posted it because I thought it was interesting, as I haven't seen this kind of thing being optimised for Metal (let alone Metal first).

Gimped by OpenCL. Nice try though.

Probably shouldn't dignify this low-effort troll with a response, but hey ho. There's a compelling case for OpenCL (at least historically). Off the top of my head:

  1. Vendor and platform agnostic.
  2. Tends to be more stable. Nvidia semi-regularly seem to release drivers that break things, which makes using them in production fun.
  3. Works on both the GPU and CPU; this enables you to develop things fast locally, submit to the farm, and get the same results (ish).
Curious to see how things go as we move away from OpenCL/GL.

No it doesn't. OpenCL is not equivalent to Metal. In fact, Metal is what replaced OpenCL and OpenGL when Apple deprecated them in macOS. The fair comparison would be Metal to CUDA, not Metal to OpenCL.

Agree that the story would be different with CUDA, but I was curious to see the improvement in performance on the Mac between Metal and OpenCL. I guess that out-of-date OpenCL version was really hindering things. Besides, for my use case OpenCL is a far better comparison, since most of Houdini (outside of Karma GPU) uses OpenCL (I think the only things that use OptiX are the Vellum pressure constraint nodes).

I guess, comparing Apples to Apples (heh), the main takeaway would be that Nvidia offers terrible performance for OpenCL, given that Apple's massively out-of-date version (1.2 from 2013 iirc) gets pretty close to it.
 

leman

macrumors Core
Oct 14, 2008
19,522
19,679
I didn't think the 3090 was that much faster (unless it was using Optix), been a while since I looked at any benchmarks though, so could be way off base.

Well, the 3090 has 10496 ALUs running at ~1.5GHz, whereas the Ultra has 8192 ALUs running at 1.3GHz; that already makes a big practical difference. Of course, direct comparisons are complex, as Ampere is a more sophisticated GPU with limited superscalar execution capabilities, whereas Apple's G13 is a very straightforward, streamlined device.
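
As a rough back-of-the-envelope check using those numbers (assuming 2 FLOPs per ALU per clock from fused multiply-add - real throughput will differ):

```latex
% Rough peak FP32 estimates from the ALU counts and clocks quoted above.
\begin{align*}
\text{RTX 3090:} \quad & 10496 \times 2 \times 1.5\,\text{GHz} \approx 31.5\ \text{TFLOPS}\\
\text{M1 Ultra:} \quad & 8192 \times 2 \times 1.3\,\text{GHz} \approx 21.3\ \text{TFLOPS}
\end{align*}
```

That's roughly 45-50% more theoretical FP32 throughput for the 3090, which is consistent with the "at least 30-50% faster" estimate above.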

Would that mean that with larger datasets the performance difference would increase? A bit unsure on what you mean, but assuming that the larger available memory would mean larger work items and less transferring of data around?

I think Apple has two big advantages. One is access to much more RAM: if you are working on a very large problem, Apple can just, well, work on it, where Nvidia would need to fetch the data over the extremely slow PCIe interface. Another one is the cache size and CPU coherency: if you need a lot of synchronization between your compute shaders (among themselves or with CPU work), Apple will stall less. So yeah, I'd say that Apple GPUs are better for large workloads with complex data dependencies. But Nvidia will often have an advantage if the shaders themselves are large/complex, as they have more registers and more compute throughput.

Curious to see how things go as we move away from OpenCL/GL.

I always wondered why there is a seeming performance discrepancy between CUDA and OpenCL. It kind of boils down to the same thing, just some C-like code compiled for the GPU, but a paper some years ago found a whopping 13 to 60% performance difference. Could be that the OpenCL memory management is not a perfect fit for Nvidia, but I somehow suspect that the culprit is the lack of quality in the OpenCL implementation: the kernels are likely very poorly optimized. Not that I would find it surprising; Nvidia's OpenCL implementation was always more akin to malicious compliance. After all, it was Nvidia who effectively killed OpenCL.
 

mi7chy

macrumors G4
Oct 24, 2014
10,625
11,298
Made up fairy tales. Who updates drivers or anything in production without first testing in dev and/or QA? AMD HIP performance has been broken with newer drivers for the last two months, so I have to roll back to the February drivers, while Nvidia OptiX/CUDA are still working without issue.
 

jujoje

macrumors regular
May 17, 2009
247
288
I think Apple has two big advantages. One is access to much more RAM: if you are working on a very large problem, Apple can just, well, work on it, where Nvidia would need to fetch the data over the extremely slow PCIe interface. Another one is the cache size and CPU coherency: if you need a lot of synchronization between your compute shaders (among themselves or with CPU work), Apple will stall less. So yeah, I'd say that Apple GPUs are better for large workloads with complex data dependencies. But Nvidia will often have an advantage if the shaders themselves are large/complex, as they have more registers and more compute throughput.

Thanks for the explanation!

Apple's GPUs seem to be in a bit of an awkward place, in that the areas where they offer the most advantage are probably less common and harder to fully utilise, at least in terms of 3D workflows (simulations or rendering a large scene might well do it; I figure for a lot of the other use cases Nvidia would probably have the advantage).

I always wondered why there is a seeming performance discrepancy between CUDA and OpenCL. It kind of boils down to the same thing, just some C-like code compiled for the GPU, but a paper some years ago found a whopping 13 to 60% performance difference. Could be that the OpenCL memory management is not a perfect fit for Nvidia, but I somehow suspect that the culprit is the lack of quality in the OpenCL implementation: the kernels are likely very poorly optimized. Not that I would find it surprising; Nvidia's OpenCL implementation was always more akin to malicious compliance. After all, it was Nvidia who effectively killed OpenCL.

I'm definitely going to go with Nvidia gimping their OpenCL drivers as well. Moving towards proprietary, vendor-specific APIs is a bit frustrating; at least with OpenCL/OpenGL there was a common core to target that should mostly behave the same across vendors (even if, as in Nvidia's case, the driver deliberately sucks).

I suspect that 3D apps might well end up going with Vulkan/MoltenVK rather than Metal for the viewports, for cross-platform compatibility. On the upside there would be parity and faster development; on the downside I'm assuming there are some things that still wouldn't work on macOS, and we'd never really see Metal used to its full potential. Apart from in Blender, at least.

If you have a working driver (no bugs in applications you use) why would you be updating it?

There's no such thing as a perfect driver, and sometimes problems show up over time, or through unexpected behaviour in another application (nothing like a driver fixing a crash in Maya but causing a crash in Nuke). Besides, if a new feature or application would save a significant amount of time in production, then updating makes sense.

GPU renderers also tend to be pretty sensitive to drivers; at one studio we went with CPU over GPU rendering to a certain extent because of the instability and unreliability of GPU drivers (this was a while ago though).
 