
jujoje

macrumors regular
May 17, 2009
247
288
Looks like Octane are getting some nice performance benefits from the new hw raytracing:

Octane X 2024 is being optimized for new ray tracing hardware on M3 Apple GPUs. This is scene dependent, but is already showing 2x to 12x(!) speed gains in heavily instanced scenes (link)

And the other nice thing that stood out for me was rendering across different platforms/hardware

Mixed Platform Network Rendering
Until now we used a different memory layout for geometry data on Metal (macOS) and CUDA. This made the compiled geometry data incompatible between macOS and Windows/Linux, prohibiting mixed platform network rendering. In version 2024.1 we unified the memory layout, allowing now network rendering on both platforms
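
The release note does not say what actually differed between the two layouts. As a purely illustrative sketch (not Octane's format), here is one common kind of mismatch: one side stores a vertex as three tightly packed floats, the other pads each vertex to 16 bytes. All names and values below are hypothetical.

```python
import struct

# Hypothetical vertex positions; not taken from Octane.
vertices = [(1.0, 2.0, 3.0), (4.0, 5.0, 6.0)]

# Two made-up layouts for the same data.
PACKED = struct.Struct("<3f")    # 12 bytes per vertex, no padding
PADDED = struct.Struct("<3f4x")  # 16 bytes per vertex, 4 trailing pad bytes

def encode(layout, verts):
    """Serialize vertices using the given per-vertex layout."""
    return b"".join(layout.pack(*v) for v in verts)

def decode(layout, blob):
    """Read back as many whole vertices as fit, using the given layout."""
    last_start = len(blob) - layout.size
    return [layout.unpack_from(blob, off)
            for off in range(0, last_start + 1, layout.size)]

padded_blob = encode(PADDED, vertices)

right = decode(PADDED, padded_blob)  # same layout on both ends
wrong = decode(PACKED, padded_blob)  # reader assumes the other layout

print(right)
print(wrong)
```

Reading the padded data with the packed stride drifts out of step after the first vertex, which is the sort of incompatibility a single shared layout avoids.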

I don't really use Octane myself (I gave it a go, but went back to Karma CPU in the end). I'd be interested to hear how anyone finds it these days on macOS. Otoy seem much more focused on Apple Silicon and Metal performance than Redshift (although I always feel Otoy overpromise a bit).

Also, as a side note, they have some kind of Black Friday deal going at the moment.
 

Spybreakjj

macrumors newbie
May 11, 2015
16
37
Spain
I'm really excited about Octane 2024 and plan to test it extensively as soon as my M3 Max comes in on Monday/Tuesday.

I've used it this last year on my M1 Max but mostly during the first 10%-20% of building a scene. Once it got too much for the live viewer I would move the project to my 4090+3090 PC and finish there. Looking forward to being able to do more work on the M3 and hopefully network render with both machines all from MacOS.
 

Appletoni

Suspended
Mar 26, 2021
443
177
I do lighter, more technical work, mainly in Houdini, and have been using an M1 Max for the last few years as my primary machine. It has been rough in many ways but also enough. When doing any serious work, I am usually connected to a display and have a lot of gadgets connected, so I have been waiting for “the serious Mac desktop” for a while. When doing any sim or render that takes more than 5 mins, the fans spin up. Not loudly, but enough to stress me about what happens with dust buildup etc. Also, the battery gets worse over time, so in the end it will become a desktop anyway.
The M3 Max uses a lot more power, it seems, so the fans will have to work harder. The “lust” for this machine is high, but I will wait for the Ultra. That will be a machine that can stay with me for many years. An M3 Max specced out enough is in the $7000 range, way too much for a biannual investment.
Funny I'm doing exactly the same things using Houdini and ChessBase 17 and Fritz 19😁.
I will wait for the ULTRA too, hopefully something between 18- to 20-inch and until then I will stay with the M3 MAX maxed out MacBook.
 

Xiao_Xi

macrumors 68000
Oct 27, 2021
1,627
1,101
What are those differences in Mesh Shaders?
Experiments for optimizing shadow maps for the dynamic shadow maps. Idea is to add a simple depth pass which is a transform using mesh shaders, just moving the position and most likely be optimized away. This is for now only used by the Metal backend due to the differences between Mesh Shader optimizations on other platforms.
 

singhs.apps

macrumors 6502a
Oct 27, 2016
660
400
I'm really excited about Octane 2024 and plan to test it extensively as soon as my M3 Max comes in on Monday/Tuesday.

I've used it this last year on my M1 Max but mostly during the first 10%-20% of building a scene. Once it got too much for the live viewer I would move the project to my 4090+3090 PC and finish there. Looking forward to being able to do more work on the M3 and hopefully network render with both machines all from MacOS.
Oh man, yes. I would like to hook my PC up to the Mac to get render tasks done via the PC. Let me know how you have set it up whenever you have the time.
 

Xiao_Xi

macrumors 68000
Oct 27, 2021
1,627
1,101
Some say Blender 4 might be slower because of the new Material nodes but not sure.
A forum user has quoted the lead Cycles developer on the differences between Blender 3.6 and Blender 4.0 in the Blender forum.
For 4.0, the main reason you shouldn’t compare its performance with previous versions of Blender is because of the shader and light changes. Changes to the shaders and lights will make them render differently. This makes comparisons of performance unfair because you’re not comparing the same render between versions.
 

aytan

macrumors regular
Dec 20, 2022
161
110
A forum user has quoted the lead Cycles developer on the differences between Blender 3.6 and Blender 4.0 in the Blender forum.

There is no significant difference between 3.6 and 4.0.1 on the M1 Ultra. I rendered the same scene in both, and the difference is 18 seconds in favor of Blender 4.0.1, at a total render time of 4 minutes 22 seconds. This issue could be related only to AMD GPUs. I have no indication that the new shader system is slower on any Apple Mx devices. It works just fine on Mx.
 

leman

macrumors Core
Oct 14, 2008
19,518
19,669
There is no significant difference between 3.6 and 4.0.1 on the M1 Ultra. I rendered the same scene in both, and the difference is 18 seconds in favor of Blender 4.0.1, at a total render time of 4 minutes 22 seconds. This issue could be related only to AMD GPUs. I have no indication that the new shader system is slower on any Apple Mx devices. It works just fine on Mx.

According to the official Blender benchmark database, the Mx series seems to be around 6% faster in 3.6 compared to 4.0 (with the M3 being the obvious exception, of course).

 

Appletoni

Suspended
Mar 26, 2021
443
177
$1000 upgrade to get a 128 GB RAM GPU render setup is amazingly cheap. Try that with NVIDIA.
Don't forget it's also for DDR5 RAM + much faster
and not DDR4 + much slower or DDR3 + much much slower.
By the way I paid $500 for 64GB DDR3 RAM!! (Windows PC self configuration) FOR CHESS.
 

Xiao_Xi

macrumors 68000
Oct 27, 2021
1,627
1,101
What could those issues be?
Metal support is under development in OIDN [Open Image Denoise]. There are some technical issues regarding how Metal needs handles instead of device memory pointers, and there was some discussion about the precise implementation of that.
 

leman

macrumors Core
Oct 14, 2008
19,518
19,669
What could those issues be?


Isn't that written in the notes? It seems that the code relies on unified virtual memory between the CPU and the GPU, which Apple does not support.
 

diamond.g

macrumors G4
Mar 20, 2007
11,438
2,660
OBX
Isn't that written in the notes? It seems that the code relies on unified virtual memory between the CPU and the GPU, which Apple does not support.
This feels like a dumb question, but how does Apple have literal unified memory, but not support it virtually?
 

leman

macrumors Core
Oct 14, 2008
19,518
19,669
This feels like a dumb question, but how does Apple have literal unified memory, but not support it virtually?

Those are two very different things. A virtual address is what your application sees; the hardware uses a sophisticated mechanism to translate it to the actual physical address. This enables a bunch of useful features (e.g. the address might point to a file on disk instead of physical RAM, you get additional safety, and the OS can move/compress the physical memory without you even noticing, etc.).

For unified virtual memory you need to use the same address space on the CPU and the GPU, so that if you copy a memory address between devices it will be valid and point to the same data. I do not know why Apple does not support this, one would think that their hardware would be perfectly capable of sharing memory page descriptors. Maybe there is still some legacy reason, or maybe there are additional complications. But who knows, it is also possible that Metal 4 will have unified virtual memory on Apple Silicon but will drop Intel-based Macs or something like that.
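
To make the distinction concrete, here is a toy model (not how Metal or any real driver works; every name and address is made up). Two devices share one physical page, but each has its own page table, so a CPU pointer only makes sense on the GPU if both use the same virtual mapping.

```python
PAGE = 4096

class AddressSpace:
    """Toy page table: virtual page number -> physical page number."""

    def __init__(self, name):
        self.name = name
        self.table = {}

    def map(self, vaddr, ppage):
        """Map the virtual page containing vaddr to a physical page."""
        self.table[vaddr // PAGE] = ppage

    def translate(self, vaddr):
        """Return the physical address for vaddr, or fail if unmapped."""
        vpage, offset = divmod(vaddr, PAGE)
        if vpage not in self.table:
            raise LookupError(f"{self.name}: {vaddr:#x} is not mapped")
        return self.table[vpage] * PAGE + offset

physical_page = 7  # one page of shared physical RAM (hypothetical)

# Unified *physical* memory, but separate virtual address spaces.
cpu = AddressSpace("cpu")
gpu = AddressSpace("gpu")
cpu.map(0x10000, physical_page)
gpu.map(0x80000, physical_page)

cpu_ptr = 0x10000 + 0x20
# Both devices can reach the same bytes, each through its own address.
same_data = cpu.translate(cpu_ptr) == gpu.translate(0x80000 + 0x20)

# Copying the raw CPU pointer to the GPU does not work here.
try:
    gpu.translate(cpu_ptr)
    cpu_ptr_valid_on_gpu = True
except LookupError:
    cpu_ptr_valid_on_gpu = False

# Unified *virtual* memory: the GPU uses the same mapping as the CPU.
uvm_gpu = AddressSpace("gpu-uvm")
uvm_gpu.map(0x10000, physical_page)
uvm_ok = uvm_gpu.translate(cpu_ptr) == cpu.translate(cpu_ptr)

print(same_data, cpu_ptr_valid_on_gpu, uvm_ok)
```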
 

name99

macrumors 68020
Jun 21, 2004
2,407
2,309
Those are two very different things. A virtual address is what your application sees; the hardware uses a sophisticated mechanism to translate it to the actual physical address. This enables a bunch of useful features (e.g. the address might point to a file on disk instead of physical RAM, you get additional safety, and the OS can move/compress the physical memory without you even noticing, etc.).

For unified virtual memory you need to use the same address space on the CPU and the GPU, so that if you copy a memory address between devices it will be valid and point to the same data. I do not know why Apple does not support this, one would think that their hardware would be perfectly capable of sharing memory page descriptors. Maybe there is still some legacy reason, or maybe there are additional complications. But who knows, it is also possible that Metal 4 will have unified virtual memory on Apple Silicon but will drop Intel-based Macs or something like that.
What notes are you referring to, beyond what I see on the screen, which does not say this?

The issue, as far as I can tell, is not one of physical or virtual access; it is one of API.
I am NOT a Metal expert, but this is my understanding:

Metal wants you to use Handles (essentially indices into a table of address ranges) not raw pointers. This is not because raw pointers won't, in some sense, "work", it's because the Metal API depends on being told how and when blocks of data are being used.
This is because the L1 caches are not coherent, and *SW conventions* are required to handle coherency.

After every unit of work (called a "kick", but you can think of this as a shader) the L1 caches are flushed to L2, so that subsequent shaders that depend on the results of this shader, but which may run on a different core, can see the work that was done.
This mechanism for flushing data from L1 to L2 is very sophisticated in terms of flushing the minimum amount of data, and in terms of scheduling non-dependent kicks to execute at the same time that flushing is happening, BUT it depends on knowing which address ranges are used in what way by each kick – ie which address ranges were read, which were written.
If you start passing around raw pointers without informing Metal of how the associated data ranges are being used, you will lose this flushing/coherence.
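
A rough sketch of that idea, as I understand it from the description above (this is not the Metal API; the classes, handle numbers, and kick names are invented): each kick declares which buffer handles it reads and writes, and the scheduler derives from those declarations which earlier kicks must be flushed first. A kick that touches memory through an undeclared raw pointer gets no ordering at all.

```python
from dataclasses import dataclass, field

@dataclass
class Kick:
    name: str
    reads: set = field(default_factory=set)   # buffer handles read
    writes: set = field(default_factory=set)  # buffer handles written

def plan(kicks):
    """For each kick, list the earlier kicks whose writes it must see."""
    last_writer = {}  # handle -> name of the kick that last wrote it
    deps = {}
    for k in kicks:
        deps[k.name] = sorted(
            {last_writer[h] for h in k.reads if h in last_writer}
        )
        for h in k.writes:
            last_writer[h] = k.name
    return deps

# Handles are just indices into a hypothetical buffer table.
POSITIONS, SHADOW_MAP, PARTICLES, COLOR = 0, 1, 2, 3

kicks = [
    Kick("depth", reads={POSITIONS}, writes={SHADOW_MAP}),
    Kick("particles", reads={PARTICLES}, writes={PARTICLES}),
    Kick("shade", reads={SHADOW_MAP}, writes={COLOR}),
    # Reads COLOR through a raw pointer but declares nothing, so the
    # scheduler cannot know it depends on "shade".
    Kick("raw_pointer_user"),
]

deps = plan(kicks)
for name, waits_for in deps.items():
    print(name, "waits for", waits_for)
```

Here "shade" is correctly ordered after "depth", and "particles" is free to overlap with both, while the undeclared reader ends up with an empty dependency list even though it needs the result of "shade".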

This is not an issue for Vulkan, or more precisely it's a DIFFERENT issue, because Vulkan has a different model of who is responsible for flushing L1 caches. Apple's solution is not wrong, it's just different from Vulkan's, and assumes different packaging of the information about who is responsible for flushing what data ranges when.
It's like Apple is Java or Swift, with memory management happening behind the scenes – but you don't use raw pointers – while Vulkan is like C with manual calls to malloc and free.

Corrections welcome if I got anything wrong!
 

Attachments

  • Screenshot 2023-12-14 at 10.38.59 AM.png
    57.6 KB · Views: 118