
Xiao_Xi

macrumors 68000
Oct 27, 2021
1,627
1,101
What would that be used for? Denoising? Are tensor cores currently used for denoising?
Intel Open Image Denoise library uses tensor cores.
Intel Open Image Denoise exploits modern instruction sets like SSE4, AVX2, AVX-512, and NEON on CPUs, Intel® Xe Matrix Extensions (Intel® XMX) on Intel GPUs, and tensor cores on NVIDIA GPUs to achieve high denoising performance.

It also works on Apple Silicon GPUs, but not on the NPU.
Intel Open Image Denoise supports a wide variety of CPUs and GPUs from different vendors:
  • ARM64 (AArch64) architecture CPUs (e.g. Apple silicon CPUs)
  • Apple silicon GPUs (M1 and newer)
 
  • Like
Reactions: aeronatis

aeronatis

macrumors regular
Sep 9, 2015
198
152
What would that be used for? Denoising? Are tensor cores currently used for denoising?

I would think so, yes. OptiX uses the Tensor cores in a similar way to DLSS. Right now, choosing OptiX as the denoiser almost halves the render time on Nvidia cards; however, Apple Silicon does not have anything corresponding to that. Anyway, I still think it is quite good for the M3 Max to be similar to an RTX 4070 Laptop with OptiX enabled, especially considering the overall app experience is much more stable on the M3 Max.
 

M4pro

macrumors member
May 15, 2024
67
109
It’s only for one scene (and Blender benchmark scenes vary wildly in complexity) - still this cross-platform chart of denoising times is interesting to me.

[attached chart: cross-platform denoising times]



Perhaps the unbinned M4 Max GPU will approach the M2 Ultra GPU's denoising time results (a fraction of a second when rendered at 2000 x 1000 pixels)
 
Last edited:
  • Like
Reactions: komuh

aeronatis

macrumors regular
Sep 9, 2015
198
152
It’s only for one scene (and Blender benchmark scenes vary wildly in complexity) - still this cross-platform chart of denoising times is interesting to me.

View attachment 2448547


Perhaps the unbinned M4 Max GPU will approach the M2 Ultra GPU's results (a fraction of a second when rendered at 2000 x 1000 pixels)

The M3 Max and M4 Max have hardware RT, which dramatically increases performance in Blender. Let me share times for the same scene I mentioned above (Scanlands):

M1 Max: 06:55
M2 Max: 04:09
M3 Max: 01:04

My RTX 4080 Desktop finishes the scene in 00:28 with OptiX and in 01:16 without OptiX, which means the M4 Max lands roughly between RTX 4080 Desktop CUDA and RTX 4080 Desktop OptiX if it completes the scene around 20% faster than my M3 Max.
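Out of curiosity, that estimate can be sanity-checked in a few lines. This is just a sketch of the arithmetic above, assuming "20% faster" means the hypothetical M4 Max render takes 1/1.2 of the M3 Max time:

```python
# Scanlands render times from the post above, converted mm:ss -> seconds
m3_max = 1 * 60 + 4          # 01:04
rtx4080_optix = 28           # 00:28
rtx4080_cuda = 1 * 60 + 16   # 01:16

# Hypothetical M4 Max time, ~20% faster than the M3 Max
m4_max_est = m3_max / 1.2
print(f"estimated M4 Max: {m4_max_est:.1f} s")  # ~53.3 s

# It would indeed fall between the 4080's OptiX and CUDA times
assert rtx4080_optix < m4_max_est < rtx4080_cuda
```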
 
Last edited:
  • Like
Reactions: Homy

M4pro

macrumors member
May 15, 2024
67
109
The M3 Max and M4 Max have hardware RT, which dramatically increases performance in Blender. Let me share times for the same scene I mentioned above (Scanlands):

M1 Max: 06:55
M2 Max: 04:09
M3 Max: 01:04

My RTX 4080 Desktop finishes the scene in 00:28 with OptiX and in 01:16 without OptiX, which means the M4 Max lands roughly between RTX 4080 Desktop CUDA and RTX 4080 Desktop OptiX if it completes the scene around 20% faster than my M3 Max.


If M4 Max completes this scene around 00:50, it means it is already close to
Uh huh, how would you say the resulting image quality is using Apple Silicon GPU denoising in Cycles?

I’d have to go back to look at some images, but the way I remember things, the quality of results I was getting from Cycles denoising (Metal) on an Intel CPU / AMD GPU Mac was not bad.
 
Last edited:

jujoje

macrumors regular
May 17, 2009
247
288
Cycles denoising (Metal)
Isn't Cycles denoising either the Nvidia denoiser or Intel's? Or is there an actual Cycles-only denoiser now?

If it's the former: quality-wise, the Nvidia one is fast but poor quality, while OIDN is reasonable quality but can struggle with high-frequency detail and hair. All of the denoisers suck at volumes as far as I can tell (they pretty much just blur them; that's the first thing comp does anyway, so 🤷‍♂️).

From what I've heard, the RenderMan denoiser is a long way ahead of the other options, particularly where animation is concerned (it's much more temporally consistent).
 
  • Like
Reactions: M4pro

jujoje

macrumors regular
May 17, 2009
247
288
Why such a short render? Are they afraid of posting completion times for something longer like Barbershop?
I agree, the BMW scene is not the best. It would be nice to get the Moana Island or ALab scene, but sadly youtubers want something that is fast to benchmark, not something representative of production use.

Pretty sure it would not give you the result you want, given that the last result was from back in the M1 days (Redshift):

3090 = 21m:45s
M1 Max = 28m:27s

The 4090 would be a fair bit faster (more memory and speed), but I'd be willing to bet that the M4 Max would be pretty close to it, or beat it, now that it has hardware ray tracing. But I guess we'll never know, as no one benchmarks those kinds of scenes.
 
  • Like
Reactions: M4pro

M4pro

macrumors member
May 15, 2024
67
109
Isn't Cycles denoising either the Nvidia denoiser or Intel's? Or is there an actual Cycles-only denoiser now?

If it's the former: quality-wise, the Nvidia one is fast but poor quality, while OIDN is reasonable quality but can struggle with high-frequency detail and hair. All of the denoisers suck at volumes as far as I can tell (they pretty much just blur them; that's the first thing comp does anyway, so 🤷‍♂️).

From what I've heard, the RenderMan denoiser is a long way ahead of the other options, particularly where animation is concerned (it's much more temporally consistent).
For Metal, Cycles uses Intel's denoiser, but it's now the newer 2024 version that supports Apple Silicon.
 
  • Like
Reactions: Xiao_Xi

Homy

macrumors 68030
Jan 14, 2006
2,502
2,450
Sweden
I agree, the BMW scene is not the best. It would be nice to get the Moana Island or ALab scene, but sadly youtubers want something that is fast to benchmark, not something representative of production use.

Pretty sure it would not give you the result you want, given that the last result was from back in the M1 days (Redshift):

3090 = 21m:45s
M1 Max = 28m:27s

The 4090 would be a fair bit faster (more memory and speed), but I'd be willing to bet that the M4 Max would be pretty close to it, or beat it, now that it has hardware ray tracing. But I guess we'll never know, as no one benchmarks those kinds of scenes.

M3 Max is 2.4 - 2.8x faster than M1 Max.

 

sirio76

macrumors 6502a
Mar 28, 2013
578
416
But like you said, the Blender Benchmark sums the scores for all the scenes, and the score database on their site doesn't list the individual scene scores. The fact that Apple isn't sharing the full results so that one can easily compare them to the PC scores in the result browser speaks volumes, I would assume.

EDIT: I just ran the Blender Benchmark to see the individual scene scores on my 4090...just as I expected.

Monster:

M4 Max: 237
Nvidia 4090: 5,393

Junk Shop:
M4 Max: 152
Nvidia 4090: 2376

Classroom:
M4 Max: 102
Nvidia 4090: 2621

That is actually shockingly bad performance on the M4 Max in comparison to the Nvidia hardware, TBH. The M4 Max, according to that benchmark, is only eking out ~4% of the performance of a 4090...
Edit: I just saw others already noted that.
I think there's some confusion here; according to the results at the top of the page, the M4 Max reaches about half of your 4090's performance:
https://forums.macrumors.com/proxy.php?image=https%3A%2F%2Fpbs.twimg.com%2Fmedia%2FGbyeTQhWwAMOmaf%3Fformat%3Djpg%26name%3D4096x4096&hash=219e21d7d664a6b980c59b33412b2cb4
 
Last edited:

Pressure

macrumors 603
May 30, 2006
5,178
1,544
Denmark
I agree, the BMW scene is not the best. It would be nice to get the Moana Island or ALab scene, but sadly youtubers want something that is fast to benchmark, not something representative of production use.

Pretty sure it would not give you the result you want, given that the last result was from back in the M1 days (Redshift):

3090 = 21m:45s
M1 Max = 28m:27s

The 4090 would be a fair bit faster (more memory and speed), but I'd be willing to bet that the M4 Max would be pretty close to it, or beat it, now that it has hardware ray tracing. But I guess we'll never know, as no one benchmarks those kinds of scenes.
Just revisited that Redshift forum thread, and it was kind of sad going down memory lane reading them lusting after the M3 Ultra, when now we are at M4 already and there's still no new Mac Studio or Mac Pro.
 

jujoje

macrumors regular
May 17, 2009
247
288
Just revisited that Redshift forum thread, and it was kind of sad going down memory lane reading them lusting after the M3 Ultra, when now we are at M4 already and there's still no new Mac Studio or Mac Pro.

I'm just hoping that the reason it's been so delayed is that they've got something a bit more exciting than two Maxes glued together (Gurman's Hidra rumors).

Not entirely counting on it though; I had hoped that moving to AS would stabilize Apple's release schedule for the high end, but it still seems as undependable as always (I don't mind if it's not yearly, but at least have some kind of roadmap for the pro market).
 

Macintosh IIcx

macrumors 6502a
Jul 3, 2014
625
612
Denmark
Device Name                        | Median Score | Number of Benchmarks
NVIDIA GeForce RTX 4090            | 10885.57     | 635
NVIDIA GeForce RTX 3090            | 5333.05      | 284
NVIDIA GeForce RTX 4070            | 5126.74      | 281
Apple M4 Max (GPU - 40 cores)      | 5083.33      | 1
Apple M3 Max (GPU - 40 cores)      | 4257.46      | 86
NVIDIA GeForce RTX 4070 Laptop GPU | 3457.44      | 260
Apple M4 Pro (GPU - 20 cores)      | 2530.47      | 6
Apple M3 Pro (GPU - 18 cores)      | 1768.64      | 43
I have to say that I'm fairly impressed that an integrated GPU can be about as fast as an RTX 3090, as is the case with the M4 Max! (There are 11 numbers online now, not just 1.)
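For what it's worth, the ratios implied by the median scores quoted above work out like this (numbers copied from the table; just a quick check, not an exhaustive comparison):

```python
# Median Blender Open Data scores from the table above
scores = {
    "RTX 4090": 10885.57,
    "RTX 3090": 5333.05,
    "M4 Max (40-core GPU)": 5083.33,
}

# M4 Max relative to the two Nvidia cards
ratio_3090 = scores["M4 Max (40-core GPU)"] / scores["RTX 3090"]
ratio_4090 = scores["M4 Max (40-core GPU)"] / scores["RTX 4090"]
print(f"M4 Max vs 3090: {ratio_3090:.0%}")  # ~95%
print(f"M4 Max vs 4090: {ratio_4090:.0%}")  # ~47%
```

So "about as fast as a 3090" and "about half a 4090" are both consistent with the table.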

Also, I did have an RTX 3090 but moved up to the wonderful RTX 4090, so I'm not saying this as an Apple fanboy.
 
  • Like
Reactions: jujoje and Homy

mi7chy

macrumors G4
Oct 24, 2014
10,619
11,292
TSMC 3nm is pretty amazing compared to the Samsung 8nm on the 3090. Hopefully the Nvidia 5000 series will be on 3nm for a fairer comparison.
 

leman

macrumors Core
Oct 14, 2008
19,516
19,662
I have to say that I'm fairly impressed that an integrated GPU can be about as fast as an RTX 3090 as the case is with the M4 Max! (there are 11 numbers online now, not just 1)

What's so surprising about it? A large GPU is a large GPU. Fast unified memory is simply more expensive, which is why nobody except Apple bothers with it at the high end. What I find more impressive is that the M4 Max manages to be competitive with the 3090 despite featuring considerably fewer shader cores. Apple's hardware utilization is something else.
 
  • Like
Reactions: name99 and Homy

aeronatis

macrumors regular
Sep 9, 2015
198
152
Uh huh, how would you say the resulting image quality is using Apple Silicon GPU denoising in Cycles?

I’d have to go back to look at some images, but the way I remember things, the quality of results I was getting from Cycles denoising (Metal) on an Intel CPU / AMD GPU Mac was not bad.

I'll have to do them again to check, as I rendered on the Mac without denoising after realizing it did not shorten the render time at all.
 

Macintosh IIcx

macrumors 6502a
Jul 3, 2014
625
612
Denmark
What's so surprising about it? A large GPU is a large GPU. Fast unified memory is simply more expensive, which is why nobody except Apple bothers with it at the high end. What I find more impressive is that the M4 Max manages to be competitive with the 3090 despite featuring considerably fewer shader cores. Apple's hardware utilization is something else.
Sounds like we are more in agreement than you think. How can it not be impressive that an SoC with CPU, NPU, GPU and whatnot integrated into one chip can compete with a dedicated GPU card with a die size of 628 mm² and roughly 28 billion transistors? :)

I for one didn't see that coming; we are talking about competing with Nvidia here, not some random wannabe chip manufacturer.
 
  • Like
Reactions: name99 and Homy

singhs.apps

macrumors 6502a
Oct 27, 2016
660
400
Slightly OT, but this might be of interest to a few people in this thread, as they've used it in the past; it looks like Modo is being discontinued: Foundry announces strategic decision to wind down development of Modo

I think the last version I used was 501 or 601; it was really great for modeling and rendering, although it lost its way after Foundry purchased it and Brad Peebler left (I still wonder what he's up to at Apple).
 

crazy dave

macrumors 65816
Sep 9, 2010
1,450
1,219
Sounds like we are more in agreement than you think. How can it not be impressive that an SoC with CPU, NPU, GPU and whatnot integrated into one chip can compete with a dedicated GPU card with a die size of 628 mm² and roughly 28 billion transistors? :)

I for one didn't see that coming; we are talking about competing with Nvidia here, not some random wannabe chip manufacturer.

I just realized that I don't think I have seen any information about how many transistors are used to build the M4 Max chip.

Probably around 100 billion, if not more.
The die area devoted to the GPU is why I've sometimes seen people, only half-jokingly, refer to the Max and Ultra as a GPU with an integrated CPU ;).

The M3 Max has about 92 billion transistors and the GPU is roughly 36% of that, for about 33 billion transistors. The M4 Max is likely bigger and thus the GPU, in terms of raw transistor count, is likely quite a bit bigger than the 3090 (of course how you use those transistors is even more important).


Don't get me wrong, it's still very impressive! The 3090 packs more SMs at similar/slightly higher clocks (and thus has more raw TFLOPs).
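Spelling out that estimate (using the ~92 billion total and ~36% GPU share quoted above; rough figures, not official numbers):

```python
# M3 Max: ~92 billion transistors total, GPU roughly 36% of the die
m3_max_total = 92e9
gpu_share = 0.36

gpu_transistors = m3_max_total * gpu_share
print(f"M3 Max GPU: ~{gpu_transistors / 1e9:.0f}B transistors")  # ~33B

# The RTX 3090 (GA102) is roughly 28 billion transistors per the earlier post,
# so the M3 Max GPU alone is already in the same ballpark or above
assert gpu_transistors > 28e9
```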
 
Last edited:
  • Like
Reactions: komuh and M4pro

novagamer

macrumors regular
May 13, 2006
231
312
UE 5.4.4 running on the base Mac Mini M4. Lumen and Nanite work.

Did they finally add HWRT support? Their developer forum said no, but that was a few weeks ago, and Nanite wasn't mentioned then either.
 
Last edited: