
Xiao_Xi

macrumors 68000
Oct 27, 2021
1,627
1,101
What would that be used for? Denoising? Are tensor cores currently used for denoising?
Intel Open Image Denoise library uses tensor cores.
Intel Open Image Denoise exploits modern instruction sets like SSE4, AVX2, AVX-512, and NEON on CPUs, Intel® Xe Matrix Extensions (Intel® XMX) on Intel GPUs, and tensor cores on NVIDIA GPUs to achieve high denoising performance.

It also works on Apple Silicon GPUs, but not on the NPU.
Intel Open Image Denoise supports a wide variety of CPUs and GPUs from different vendors:
  • ARM64 (AArch64) architecture CPUs (e.g. Apple silicon CPUs)
  • Apple silicon GPUs (M1 and newer)
 
  • Like
Reactions: aeronatis

aeronatis

macrumors regular
Sep 9, 2015
198
152
What would that be used for? Denoising? Are tensor cores currently used for denoising?

I would think so, yes. OptiX uses the Tensor cores in a similar way to DLSS. Right now, choosing OptiX as the denoiser almost halves the render time on Nvidia cards; however, Apple Silicon does not have anything corresponding to that. Anyway, I still think it is quite good for the M3 Max to be similar to an RTX 4070 Laptop with OptiX enabled, especially considering the overall app experience is much more stable on the M3 Max.
 

M4pro

macrumors member
May 15, 2024
67
109
It’s only for one scene (and Blender benchmark scenes vary wildly in complexity) - still this cross-platform chart of denoising times is interesting to me.

[attached chart: cross-platform denoising times]



Perhaps the unbinned M4 Max GPU will approach the M2 Ultra GPU's denoising time results (a fraction of a second when rendered at 2000 x 1000 pixels)
 
Last edited:
  • Like
Reactions: komuh

aeronatis

macrumors regular
Sep 9, 2015
198
152
It’s only for one scene (and Blender benchmark scenes vary wildly in complexity) - still this cross-platform chart of denoising times is interesting to me.

View attachment 2448547


Perhaps the unbinned M4 Max GPU will approach the M2 Ultra GPU's results (a fraction of a second when rendered at 2000 x 1000 pixels)

The M3 Max and M4 Max have hardware RT, which dramatically increases performance in Blender. Let me share times for the same scene I mentioned above (Scanlands):

M1 Max: 06:55
M2 Max: 04:09
M3 Max: 01:04

My RTX 4080 Desktop finishes the scene in 00:28 with OptiX and in 01:16 without OptiX, which means the M4 Max lands roughly between RTX 4080 Desktop CUDA and RTX 4080 Desktop OptiX if it completes the scene around 20% faster than my M3 Max.
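Out of curiosity, that estimate can be sanity-checked in a few lines. This is just a sketch of the arithmetic above, assuming "20% faster" means the hypothetical M4 Max render takes 1/1.2 of the M3 Max time:

```python
# Scanlands render times from the post above, converted mm:ss -> seconds
m3_max = 1 * 60 + 4          # 01:04
rtx4080_optix = 28           # 00:28
rtx4080_cuda = 1 * 60 + 16   # 01:16

# Hypothetical M4 Max time, ~20% faster than the M3 Max
m4_max_est = m3_max / 1.2
print(f"estimated M4 Max: {m4_max_est:.1f} s")  # ~53.3 s

# It would indeed fall between the 4080's OptiX and CUDA times
assert rtx4080_optix < m4_max_est < rtx4080_cuda
```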
 
Last edited:
  • Like
Reactions: Homy

M4pro

macrumors member
May 15, 2024
67
109
The M3 Max and M4 Max have hardware RT, which dramatically increases performance in Blender. Let me share times for the same scene I mentioned above (Scanlands):

M1 Max: 06:55
M2 Max: 04:09
M3 Max: 01:04

My RTX 4080 Desktop finishes the scene in 00:28 with OptiX and in 01:16 without OptiX, which means the M4 Max lands roughly between RTX 4080 Desktop CUDA and RTX 4080 Desktop OptiX if it completes the scene around 20% faster than my M3 Max.


If M4 Max completes this scene around 00:50, it means it is already close to
Uh huh, how would you say the resulting image quality is using Apple Silicon GPU denoising in Cycles?

I’d have to go back to look at some images, but the way I remember things, the quality of results I was getting from Cycles denoising (Metal) on an Intel CPU / AMD GPU Mac was not bad.
 
Last edited:

jujoje

macrumors regular
May 17, 2009
247
288
Cycles denoising (Metal)
Isn't Cycles denoising either the Nvidia denoiser or Intel's? Or is there an actual Cycles-only denoiser now?

If it's the former: quality-wise, the Nvidia one is fast but poor quality, while OIDN is reasonable quality but can struggle with high-frequency detail and hair. All of the denoisers suck at volumes as far as I can tell (they pretty much just blur them; that's the first thing comp does anyway, so 🤷‍♂️).

From what I've heard, the RenderMan denoiser is a long way ahead of the other options, particularly where animation is concerned (it's much more temporally consistent).
 
  • Like
Reactions: M4pro

jujoje

macrumors regular
May 17, 2009
247
288
Why such a short render? Are they afraid of posting completion times for something longer like Barbershop?
I agree, the BMW scene is not the best. It would be nice to get the Moana Island or ALab scene, but sadly youtubers want something that is fast to benchmark, not something representative of production use.

Pretty sure it would not give you the result you want, given that the last result was from back in the M1 days (Redshift):

3090 = 21m:45s
M1 Max = 28m:27s

The 4090 would be a fair bit faster (more memory and speed), but I'd be willing to bet that the M4 Max would be pretty close to it, or beat it, now that it has hardware ray tracing. But I guess we'll never know, as no one benchmarks those kinds of scenes.
 
  • Like
Reactions: M4pro

M4pro

macrumors member
May 15, 2024
67
109
Isn't Cycles denoising either the Nvidia denoiser or Intel's? Or is there an actual Cycles-only denoiser now?

If it's the former: quality-wise, the Nvidia one is fast but poor quality, while OIDN is reasonable quality but can struggle with high-frequency detail and hair. All of the denoisers suck at volumes as far as I can tell (they pretty much just blur them; that's the first thing comp does anyway, so 🤷‍♂️).

From what I've heard, the RenderMan denoiser is a long way ahead of the other options, particularly where animation is concerned (it's much more temporally consistent).
For Metal, Cycles uses Intel's denoiser, but it's now the newer 2024 version that supports Apple Silicon.
 
  • Like
Reactions: Xiao_Xi

Homy

macrumors 68030
Jan 14, 2006
2,502
2,450
Sweden
I agree, the BMW scene is not the best. It would be nice to get the Moana Island or ALab scene, but sadly youtubers want something that is fast to benchmark, not something representative of production use.

Pretty sure it would not give you the result you want, given that the last result was from back in the M1 days (Redshift):

3090 = 21m:45s
M1 Max = 28m:27s

The 4090 would be a fair bit faster (more memory and speed), but I'd be willing to bet that the M4 Max would be pretty close to it, or beat it, now that it has hardware ray tracing. But I guess we'll never know, as no one benchmarks those kinds of scenes.

M3 Max is 2.4 - 2.8x faster than M1 Max.

 

sirio76

macrumors 6502a
Mar 28, 2013
578
416
But like you said, the Blender Benchmark sums the scores for all the scenes, and the score database on their site doesn't list the individual scene scores. The fact that Apple isn't sharing the full results so that one can easily compare them to the PC scores in the result browser speaks volumes, I would assume.

EDIT: I just ran the Blender Benchmark to see the individual scene scores on my 4090...just as I expected.

Monster:

M4 Max: 237
Nvidia 4090: 5,393

Junk Shop:
M4 Max: 152
Nvidia 4090: 2376

Classroom:
M4 Max: 102
Nvidia 4090: 2621

That is actually shockingly bad performance on the M4 Max in comparison to the Nvidia hardware, TBH. The M4 Max, according to that benchmark, is only eking out ~4% of the performance of a 4090...
Edit: I just saw others already noted that.
I think there's some confusion here; according to the results at the top of the page, the M4 Max reaches about half of your 4090's performance:
https://forums.macrumors.com/proxy.php?image=https%3A%2F%2Fpbs.twimg.com%2Fmedia%2FGbyeTQhWwAMOmaf%3Fformat%3Djpg%26name%3D4096x4096&hash=219e21d7d664a6b980c59b33412b2cb4
 
Last edited:

Pressure

macrumors 603
May 30, 2006
5,178
1,544
Denmark
I agree, the BMW scene is not the best. It would be nice to get the Moana Island or ALab scene, but sadly youtubers want something that is fast to benchmark, not something representative of production use.

Pretty sure it would not give you the result you want, given that the last result was from back in the M1 days (Redshift):

3090 = 21m:45s
M1 Max = 28m:27s

The 4090 would be a fair bit faster (more memory and speed), but I'd be willing to bet that the M4 Max would be pretty close to it, or beat it, now that it has hardware ray tracing. But I guess we'll never know, as no one benchmarks those kinds of scenes.
Just revisited that Redshift forum thread, and it was kind of sad going down memory lane reading them lusting after the M3 Ultra, when now we are at M4 already and there's still no new Mac Studio or Mac Pro.
 

jujoje

macrumors regular
May 17, 2009
247
288
Just revisited that Redshift forum thread, and it was kind of sad going down memory lane reading them lusting after the M3 Ultra, when now we are at M4 already and there's still no new Mac Studio or Mac Pro.

I'm just hoping that the reason it's been so delayed is that they've got something a bit more exciting than two Maxes glued together (Gurman's Hidra rumors).

Not entirely counting on it though; I had hoped that moving to AS would stabilize Apple's release schedule for the high end, but it still seems as undependable as always (I don't mind if it's not yearly, but at least have some kind of roadmap for the pro market).
 

Macintosh IIcx

macrumors 6502a
Jul 3, 2014
625
612
Denmark
Device Name                        | Median Score | Number of Benchmarks
NVIDIA GeForce RTX 4090            | 10885.57     | 635
NVIDIA GeForce RTX 3090            | 5333.05      | 284
NVIDIA GeForce RTX 4070            | 5126.74      | 281
Apple M4 Max (GPU - 40 cores)      | 5083.33      | 1
Apple M3 Max (GPU - 40 cores)      | 4257.46      | 86
NVIDIA GeForce RTX 4070 Laptop GPU | 3457.44      | 260
Apple M4 Pro (GPU - 20 cores)      | 2530.47      | 6
Apple M3 Pro (GPU - 18 cores)      | 1768.64      | 43
I have to say that I'm fairly impressed that an integrated GPU can be about as fast as an RTX 3090, as is the case with the M4 Max! (There are 11 numbers online now, not just 1.)
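For what it's worth, the ratios implied by the median scores quoted above work out like this (numbers copied from the table; just a quick check, not an exhaustive comparison):

```python
# Median Blender Open Data scores from the table above
scores = {
    "RTX 4090": 10885.57,
    "RTX 3090": 5333.05,
    "M4 Max (40-core GPU)": 5083.33,
}

# M4 Max relative to the two Nvidia cards
ratio_3090 = scores["M4 Max (40-core GPU)"] / scores["RTX 3090"]
ratio_4090 = scores["M4 Max (40-core GPU)"] / scores["RTX 4090"]
print(f"M4 Max vs 3090: {ratio_3090:.0%}")  # ~95%
print(f"M4 Max vs 4090: {ratio_4090:.0%}")  # ~47%
```

So "about as fast as a 3090" and "about half a 4090" are both consistent with the table.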

Also, I did have an RTX 3090 but moved up to the wonderful RTX 4090, so I'm not saying this as an Apple fanboy.
 
  • Like
Reactions: jujoje and Homy

mi7chy

macrumors G4
Oct 24, 2014
10,619
11,292
TSMC 3nm is pretty amazing compared to the Samsung 8nm on the 3090. Hopefully the Nvidia 5000 series will be on 3nm for a fairer comparison.
 

leman

macrumors Core
Oct 14, 2008
19,516
19,662
I have to say that I'm fairly impressed that an integrated GPU can be about as fast as an RTX 3090 as the case is with the M4 Max! (there are 11 numbers online now, not just 1)

What's so surprising about it? A large GPU is a large GPU. Fast unified memory is simply more expensive, which is why nobody except Apple bothers with it at the high end. What I find more impressive is that the M4 Max manages to be competitive with the 3090 despite featuring considerably fewer shader cores. Apple's hardware utilization is something else.
 
  • Like
Reactions: name99 and Homy

aeronatis

macrumors regular
Sep 9, 2015
198
152
Uh huh, how would you say the resulting image quality is using Apple Silicon GPU denoising in Cycles?

I’d have to go back to look at some images, but the way I remember things, the quality of results I was getting from Cycles denoising (Metal) on an Intel CPU / AMD GPU Mac was not bad.

I'll have to do them again to check, as I rendered on the Mac without denoising after realizing it did not shorten the render time at all.
 

Macintosh IIcx

macrumors 6502a
Jul 3, 2014
625
612
Denmark
What's so surprising about it? A large GPU is a large GPU. Fast unified memory is simply more expensive, which is why nobody except Apple bothers with it at the high end. What I find more impressive is that the M4 Max manages to be competitive with the 3090 despite featuring considerably fewer shader cores. Apple's hardware utilization is something else.
Sounds like we are more in agreement than you think. How can it not be impressive that an SoC with CPU, NPU, GPU and whatnot integrated into one chip can compete with a dedicated GPU card with a die size of 628 mm² and roughly 28 billion transistors? :)

I for one didn't see that coming; we are talking about competing with Nvidia here, not some random wannabe chip manufacturer.
 
  • Like
Reactions: name99 and Homy

singhs.apps

macrumors 6502a
Oct 27, 2016
660
400
Slightly OT, but this might be of interest to a few people in this thread, as they've used it in the past; it looks like Modo is being discontinued: Foundry announces strategic decision to wind down development of Modo

I think the last version I used was 501 or 601; it was really great for modeling and rendering, although it lost its way after Foundry purchased it and Brad Peebler left (I still wonder what he's up to at Apple).
 

crazy dave

macrumors 65816
Sep 9, 2010
1,450
1,219
Sounds like we are more in agreement than you think. How can it not be impressive that an SoC with CPU, NPU, GPU and whatnot integrated into one chip can compete with a dedicated GPU card with a die size of 628 mm² and roughly 28 billion transistors? :)

I for one didn't see that coming; we are talking about competing with Nvidia here, not some random wannabe chip manufacturer.

I just realized that I don't think I have seen any information about how many transistors are used to build the M4 Max chip.

Probably around 100 billion, if not more.
The die area devoted to the GPU is why I've sometimes seen people, only half-jokingly, refer to the Max and Ultra as a GPU with an integrated CPU ;).

The M3 Max has about 92 billion transistors and the GPU is roughly 36% of that, for about 33 billion transistors. The M4 Max is likely bigger and thus the GPU, in terms of raw transistor count, is likely quite a bit bigger than the 3090 (of course how you use those transistors is even more important).


Don't get me wrong, it's still very impressive! The 3090 packs more SMs at similar/slightly higher clocks (and thus has more raw TFLOPs).
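Spelling out that estimate (using the ~92 billion total and ~36% GPU share quoted above; rough figures, not official numbers):

```python
# M3 Max: ~92 billion transistors total, GPU roughly 36% of the die
m3_max_total = 92e9
gpu_share = 0.36

gpu_transistors = m3_max_total * gpu_share
print(f"M3 Max GPU: ~{gpu_transistors / 1e9:.0f}B transistors")  # ~33B

# The RTX 3090 (GA102) is roughly 28 billion transistors per the earlier post,
# so the M3 Max GPU alone is already in the same ballpark or above
assert gpu_transistors > 28e9
```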
 
Last edited:
  • Like
Reactions: komuh and M4pro

novagamer

macrumors regular
May 13, 2006
231
312
UE 5.4.4 running on the base Mac Mini M4. Lumen and Nanite work.

Did they finally add HWRT support? Their developer forum said no, but that was a few weeks ago, and Nanite wasn't mentioned then either.
 
Last edited: