
leman

macrumors Core
Oct 14, 2008
19,319
19,336
I think the GPU is getting pretty good for the M-series, but it's still nowhere near close to NV, especially in desktop form.

Nvidia has a massive advantage in the number of compute clusters thanks to their separate chip design + more compact hierarchy. Apple has better hardware utilization. I think if Apple wants to compete they will need to invest in more expensive packaging technologies (which wouldn’t be cost effective for Nvidia) as well as make their clusters denser. I think there is some evidence that they are building the foundation for future iterations.
 

Xiao_Xi

macrumors 68000
Oct 27, 2021
1,529
955
The Nvidia 4x series introduced the parallel FP pipeline, effectively doubling peak FP FLOPS, along with significant improvements to ray tracing.
Does Blender take advantage of it? Nvidia GPU scores use CUDA, not OptiX.
 

leman

macrumors Core
Oct 14, 2008
19,319
19,336
Does Blender take advantage of it? Nvidia GPU scores use CUDA, not OptiX.

Anything that uses floating point operations takes advantage of it. Blender supports both OptiX and CUDA backends. Here you can inspect the scores by backend type:


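For anyone who wants to run the comparison themselves rather than rely on published scores, the Cycles backend can be switched from a script. A minimal sketch, assuming Blender's bundled Python (bpy) API and a scene already loaded (run it with blender --background yourfile.blend --python script.py):

Code:
import bpy

# Select the Cycles compute backend: "CUDA", "OPTIX", "METAL", "HIP", ...
prefs = bpy.context.preferences.addons["cycles"].preferences
prefs.compute_device_type = "OPTIX"
prefs.get_devices()                 # refresh the detected device list
for dev in prefs.devices:
    dev.use = True                  # enable every detected device

bpy.context.scene.cycles.device = "GPU"
bpy.ops.render.render(write_still=True)   # render the loaded scene once

Running the same scene once per backend gives a rough per-backend comparison on your own hardware.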
 

APCX

Suspended
Sep 19, 2023
262
337
Anything that uses floating point operations takes advantage of it. Blender supports both OptiX and CUDA backends. Here you can inspect the scores by backend type:


What is interesting (to me anyway) is how little increase some of the Nvidia cards get from OptiX. The 4090 only gets 50%. I thought it was much more.
 

Xiao_Xi

macrumors 68000
Oct 27, 2021
1,529
955
Anything that uses floating point operations takes advantage of it. Blender supports both OptiX and CUDA backends. Here you can inspect the scores by backend type
To claim "M2 Ultra is faster than 3080 and a hair away from 4070", @bcortens used the scores of the M2 Ultra using Metal and the 3080 using CUDA. So I understood that @komuh was asking whether comparing the M2 Ultra using Metal and the 3080 using CUDA was fair or not.

Given Nvidia's focus on Blender's OptiX backend since 2019, I doubt Blender's CUDA backend will take good advantage of Nvidia's latest GPUs.
 

leman

macrumors Core
Oct 14, 2008
19,319
19,336
To claim "M2 Ultra is faster than 3080 and a hair away from 4070", @bcortens used the scores of the M2 Ultra using Metal and the 3080 using CUDA. So I understood that @komuh was asking whether comparing the M2 Ultra using Metal and the 3080 using CUDA was fair or not.

Given Nvidia's focus on Blender's OptiX backend since 2019, I doubt Blender's CUDA backend will take good advantage of Nvidia's latest GPUs.

Ah, I see. Thanks for explaining the background better.

I’d say that the CUDA backend is mature enough to be used for comparisons like these. I would be surprised if it needed patches to take better advantage of the new GPUs. Note that Metal and CUDA use pretty much the same codebase too.
 

bcortens

macrumors 65816
Aug 16, 2007
1,294
1,671
Ontario Canada
Ah, I see. Thanks for explaining the background better.

I’d say that the CUDA backend is mature enough to be used for comparisons like these. I would be surprised if it needed patches to take better advantage of the new GPUs. Note that Metal and CUDA use pretty much the same codebase too.
I chose CUDA as my intention was to show raw compute capability comparisons, rather than have that masked by the fact that Apple didn’t yet have RT cores and NVIDIA did.
 
  • Like
Reactions: leman

dadadadaelel

macrumors newbie
Jan 22, 2024
4
0
Any idea of the dis/advantages of this M3/128GB/4TB vs an NVIDIA 4090 laptop that goes to a max of 36 GB, eventually 64 GB of RAM, when it comes to kohya GUI / Dreambooth image-based machine learning?
 

dadadadaelel

macrumors newbie
Jan 22, 2024
4
0
To the OP's question: no, the M3 Max will not be as fast as the 4090 laptop.
On CUDA applications Nvidia will probably have an even larger edge.
Do you mean a larger limit or only speed? And what about model training with Dreambooth if you do upgrade to 128 GB of RAM?
 

Sydde

macrumors 68030
Aug 17, 2009
2,557
7,059
IOKWARDI
A 16" M3 MBP with a top-of-the-line Max and 64GB will cost pretty close to a pair of 24GB desktop 4090s. In most cases, though it would take some extra effort, the kind of stuff that GPUs do will run just fine split between two cards, so it would probably be significantly faster most of the time. Cards often use onboard RAM that is faster than main-board RAM.

The open question is, what if you really do need that extra memory? A 48GB MBP would give the GPU 35-40GB of RAM to play with, which is quite a lot more than the card. If you cannot afford two cards, or your work fails to divide nicely, I would imagine that the slower UMA would be more efficient (and ultimately faster) than GDDRx embroiled in a starvation swapfest.
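To make the "split between two cards" idea concrete, here is a rough PyTorch sketch of a manual split across two discrete GPUs versus keeping everything on one unified-memory device. The two "halves" are hypothetical placeholders, not any particular workload:

Code:
import torch
import torch.nn as nn

def first_half():
    return nn.Sequential(nn.Linear(4096, 4096), nn.ReLU())

def second_half():
    return nn.Sequential(nn.Linear(4096, 10))

if torch.cuda.device_count() >= 2:
    # Two discrete cards: each half lives in its own VRAM, and the
    # activations are copied across the bus between the two devices.
    a = first_half().to("cuda:0")
    b = second_half().to("cuda:1")
    run = lambda x: b(a(x.to("cuda:0")).to("cuda:1"))
else:
    # One unified-memory GPU (Apple Silicon "mps"): the whole model
    # shares a single pool, so nothing is copied or swapped out.
    dev = "mps" if torch.backends.mps.is_available() else "cpu"
    model = nn.Sequential(first_half(), second_half()).to(dev)
    run = lambda x: model(x.to(dev))

out = run(torch.randn(8, 4096))
print(out.shape)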
 

dadadadaelel

macrumors newbie
Jan 22, 2024
4
0
Ok… my main fear was the disadvantage of lacking access to the CUDA computing platform. 128 GB of RAM is an insane advantage for AI model training in the longer term (Stable Diffusion, upscaling, etc.), and for unified memory with the ambition to take part in AI machine learning (Dreambooth etc.) it plays quite a role, with the 40-core GPU being the deciding factor. Besides, I simultaneously run a couple of platforms such as Rhinoceros, Unreal Engine or Cinema and Photoshop, and I usually don't change laptops that often.
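To put a number on why the unified memory matters for training, here is a back-of-the-envelope sketch; the 7-billion-parameter figure and the fp16-plus-Adam assumptions are illustrative, not tied to any specific Dreambooth or kohya setup:

Code:
# Rough per-parameter cost when fine-tuning in fp16 with Adam:
# weights (2 B) + gradients (2 B) + two fp32 Adam moments (8 B).
params = 7e9                       # illustrative model size
bytes_per_param = 2 + 2 + 8
need_gb = params * bytes_per_param / 1024**3

print(f"~{need_gb:.0f} GB needed")                         # ~78 GB
print("fits in 128 GB unified memory:", need_gb < 128)     # True
print("fits in 16 GB of laptop-4090 VRAM:", need_gb < 16)  # False

Activations and the OS take more on top of that, so the real number is higher, but the shape of the comparison doesn't change.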
 

MacMore

Suspended
Jan 4, 2024
33
13
A good rule of thumb is about a 20-40% performance gain just from good CUDA code vs OpenCL, from the last time I wrote something there. And even 50%+ using cuBLAS/cuDNN and other Nvidia stuff baked into the NV ecosystem for more complex workloads.

I'm not sure about the whole Geekbench situation, but 15-20% should be the minimum gain from same-quality CUDA code compared to OpenCL, so the score could be around 200,000-240,000 for a mobile RTX 4090.
Yeah, the one problem is that OpenCL is actually open. CUDA is great, but it's not open...

If people would use open standards code could also be system agnostic....

In hacking, I believe in being system agnostic and that things should run on as many platforms as possible.
 

mr_roboto

macrumors 6502a
Sep 30, 2020
777
1,668
Yeah, the one problem is that OpenCL is actually open. CUDA is great, but it's not open...

If people would use open standards code could also be system agnostic....

In hacking, I believe in being system agnostic and that things should run on as many platforms as possible.
That's fine, but the reality is that OpenCL lost badly to CUDA, so it's not very relevant. Even Apple thinks so - despite being the company which invented OpenCL and donated it to Khronos to use as an open standard, they've deprecated OpenCL in macOS. "Deprecated" status is how Apple communicates that they plan to remove an API at some point in the future.

Much like OpenGL, OpenCL seems to be sticking around for years past the date Apple tagged it as deprecated, but you can't count on it being there forever, and they aren't updating these to keep up with new features released by Khronos.
 

ChrisA

macrumors G5
Jan 5, 2006
12,610
1,746
Redondo Beach, California
I wonder if the 40-core M3 Max GPU will be as fast as the 4090 Laptop.

It all depends on the software.

I spend a few hours every day using 3D CAD. Autodesk Fusion360 now runs natively on Apple Silicon, but they don't directly use Apple's APIs; they do their rendering using single-threaded OpenGL. I just did a ray-traced render of a new 3D printer I'm working on so I could get some comments. It took 6 minutes for one frame.

So we are not talking frames per second, but minutes per frame on an M2 Pro. I looked at Apple's Activity Monitor app and noticed that Fusion360 is using one of the 16 GPU cores and all ten CPU cores.

I don't mind much because I only do a few renders a week and I can do other things while I wait.

Only benchmarks take full advantage of the hardware. Most software doesn't. Fusion360 is an extreme example.
 

sunny5

macrumors 68000
Jun 11, 2021
1,712
1,581
I wonder if the 40-core M3 Max GPU will be as fast as the 4090 Laptop.

So far, there isn't much software to compare with, and even then it is all fundamentally optimized for CUDA and Nvidia software, as Nvidia dominates the GPU market. AMD sucks and isn't even close to competing with Nvidia, especially on software.

In GPU benchmarks the RTX 40 series is far better, but at this point it's hard to prove, since the Mac lacks a lot of software to test with compared to the PC, especially native versions, as almost all software is optimized for Nvidia.

Apple has no power to bring much software to the Mac, which is a huge problem. Besides, AS Macs' ceiling is too low due to the lack of a Mac Pro with 4x high-end GPUs. Mac software is too focused on 2D work such as video, audio, photo, illustration, and more. 3D, games, AI, research, and the rest aren't great on the Mac.
 

komuh

macrumors member
May 13, 2023
39
10
So we are not talking frames per second, but minutes per frame on an M2 Pro. I looked at Apple's Activity Monitor app and noticed that Fusion360 is using one of the 16 GPU cores and all ten CPU cores.
So they probably don't use the GPU for anything other than displaying stuff.
 
  • Like
Reactions: leman

magbarn

macrumors 68030
Oct 25, 2008
2,970
2,274
Not on HandBrake H.265 encode. I am amazed that the Apple VideoToolbox encoder beats NVENC on the Lenovo Legion Pro 7i with a 13900HX and 4090.
Is the compression level and quality equivalent? I'm still using software encoding on my M3 Max, as I'm still disappointed in VideoToolbox's compression level/quality ratio. Software encoding, while slower, is still much better quality-wise and in final file size.
 
  • Like
Reactions: komuh

sauria

macrumors 6502
Jul 2, 2001
319
31
Texas, USA
Is the compression level and quality equivalent? I'm still using software encoding on my M3 Max, as I'm still disappointed in VideoToolbox's compression level/quality ratio. Software encoding, while slower, is still much better quality-wise and in final file size.
Yes, I had to find the proper way to select VideoToolbox. It's in a different place than just pulling down the menu for the video encoder. If I just pull down the video encoder menu, the quality is lower, but if you also select it in the preset => hardware list, it is quite high.
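If you'd rather drive this from a script than dig through the GUI presets, the same encoder choice can be made on the command line. A minimal sketch assuming HandBrakeCLI is installed and the vt_h265 encoder name from recent macOS builds (nvenc_h265 being the Nvidia counterpart):

Code:
import subprocess

# Constant-quality H.265 encode through Apple's VideoToolbox.
# Swap "vt_h265" for "nvenc_h265" on an Nvidia machine, or "x265" for software.
subprocess.run([
    "HandBrakeCLI",
    "-i", "input.mov",
    "-o", "output.mp4",
    "-e", "vt_h265",   # hardware encoder selection
    "-q", "28",        # constant quality; lower numbers mean higher quality
], check=True)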
 

Kristain

macrumors member
Feb 15, 2022
32
45
Is the compression level and quality equivalent? I'm still using software encoding on my M3 Max, as I'm still disappointed in VideoToolbox's compression level/quality ratio. Software encoding, while slower, is still much better quality-wise and in final file size.
Nvidia's encode quality is sadly far superior to Apple's, even at lower bitrates.
 
  • Like
Reactions: komuh

Kristain

macrumors member
Feb 15, 2022
32
45
I don't see that. Intel QSV > Nvidia, too.
Interesting, I found similar results to the people commenting here:


I've had a lot of strange artefacts exporting on my M1 machine that my Nvidia machine just doesn't show. I still use it for quick exports etc., but find the quality definitely lacking (and I've tried multiple settings!). Oh, and agreed, Intel's implementation is excellent.
 
  • Like
Reactions: komuh and sauria

sauria

macrumors 6502
Jul 2, 2001
319
31
Texas, USA
Interesting, I found similar results to the people commenting here:


I've had a lot of strange artefacts exporting on my M1 machine that my Nvidia machine just doesn't show. I still use it for quick exports etc., but find the quality definitely lacking (and I've tried multiple settings!). Oh, and agreed, Intel's implementation is excellent.
Thanks. I wonder if the new M3 Max has fixed some of these things? Part of me thinks this is a settings issue.
 

poorcody

macrumors 65816
Jul 23, 2013
1,319
1,557
For ML developers, Alex Ziskind has done some testing comparing the M3 against Nvidia (laptop and otherwise) and other Apple chips:

 
  • Like
Reactions: sauria