
leman

macrumors Core
Oct 14, 2008
19,319
19,336
I think the GPU is getting pretty good for the M-series, but it's still nowhere near close to NV, especially in desktop form.

Nvidia has a massive advantage in the number of compute clusters thanks to their separate chip design + more compact hierarchy. Apple has better hardware utilization. I think if Apple wants to compete they will need to invest in more expensive packaging technologies (which wouldn’t be cost effective for Nvidia) as well as make their clusters denser. I think there is some evidence that they are building the foundation for future iterations.
 

Xiao_Xi

macrumors 68000
Oct 27, 2021
1,529
955
The Nvidia 4x series introduced the parallel FP pipeline, effectively doubling peak FP FLOPS, along with significant improvements to ray tracing.
Does Blender take advantage of it? Nvidia GPU scores use CUDA, not OptiX.
 

leman

macrumors Core
Oct 14, 2008
19,319
19,336
Does Blender take advantage of it? Nvidia GPU scores use CUDA, not OptiX.

Anything that uses floating point operations takes advantage of it. Blender supports both OptiX and CUDA backends. Here you can inspect the scores by backend type:


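For anyone who wants to run the comparison themselves rather than rely on published scores, the Cycles backend can be switched from a script. A minimal sketch, assuming Blender's bundled Python (bpy) API and a scene already loaded (run it with blender --background yourfile.blend --python script.py):

Code:
import bpy

# Select the Cycles compute backend: "CUDA", "OPTIX", "METAL", "HIP", ...
prefs = bpy.context.preferences.addons["cycles"].preferences
prefs.compute_device_type = "OPTIX"
prefs.get_devices()                 # refresh the detected device list
for dev in prefs.devices:
    dev.use = True                  # enable every detected device

bpy.context.scene.cycles.device = "GPU"
bpy.ops.render.render(write_still=True)   # render the loaded scene once

Running the same scene once per backend gives a rough per-backend comparison on your own hardware.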
 

APCX

Suspended
Sep 19, 2023
262
337
Anything that uses floating point operations takes advantage of it. Blender supports both OptiX and CUDA backends. Here you can inspect the scores by backend type:


What is interesting (to me anyway) is how little increase some of the Nvidia cards get from OptiX. The 4090 only gets 50%. I thought it was much more.
 

Xiao_Xi

macrumors 68000
Oct 27, 2021
1,529
955
Anything that uses floating point operations takes advantage of it. Blender supports both OptiX and CUDA backends. Here you can inspect the scores by backend type
To claim "M2 Ultra is faster than 3080 and a hair away from 4070", @bcortens used the scores of the M2 Ultra using Metal and the 3080 using CUDA. So I understood that @komuh was asking whether comparing the M2 Ultra using Metal and the 3080 using CUDA was fair or not.

Given Nvidia's focus on Blender's OptiX backend since 2019, I doubt Blender's CUDA backend will take good advantage of Nvidia's latest GPUs.
 

leman

macrumors Core
Oct 14, 2008
19,319
19,336
To claim "M2 Ultra is faster than 3080 and a hair away from 4070", @bcortens used the scores of the M2 Ultra using Metal and the 3080 using CUDA. So I understood that @komuh was asking whether comparing the M2 Ultra using Metal and the 3080 using CUDA was fair or not.

Given Nvidia's focus on Blender's OptiX backend since 2019, I doubt Blender's CUDA backend will take good advantage of Nvidia's latest GPUs.

Ah, I see. Thanks for explaining the background better.

I’d say that the CUDA backend is mature enough to be used for comparisons like these. I would be surprised if it needed patches to take better advantage of the new GPUs. Note that Metal and CUDA use pretty much the same codebase too.
 

bcortens

macrumors 65816
Aug 16, 2007
1,294
1,671
Ontario Canada
Ah, I see. Thanks for explaining the background better.

I’d say that the CUDA backend is mature enough to be used for comparisons like these. I would be surprised if it needed patches to take better advantage of the new GPUs. Note that Metal and CUDA use pretty much the same codebase too.
I chose CUDA as my intention was to show raw compute capability comparisons, rather than have that masked by the fact that Apple didn’t yet have RT cores and NVIDIA did.
 
  • Like
Reactions: leman

dadadadaelel

macrumors newbie
Jan 22, 2024
4
0
Any idea of the dis/advantages of this M3/128GB/4TB vs an NVIDIA 4090 laptop that goes to a max of 36 GB, eventually 64 GB of RAM, when it comes to kohya GUI / Dreambooth image-based machine learning?
 

dadadadaelel

macrumors newbie
Jan 22, 2024
4
0
To the OP's question: no, the M3 Max will not be as fast as the 4090 laptop.
On CUDA applications Nvidia will probably have an even larger edge.
Do you mean a larger limit or only speed? And what about model training with Dreambooth if you do upgrade to 128 GB of RAM?
 

Sydde

macrumors 68030
Aug 17, 2009
2,557
7,059
IOKWARDI
A 16" M3 MBP with a top-of-the-line Max and 64GB will cost pretty close to a pair of 24GB desktop 4090s. In most cases, though it would take some extra effort, the kind of stuff that GPUs do will run just fine split between two cards, so it would probably be significantly faster most of the time. Cards often use onboard RAM that is faster than main-board RAM.

The open question is, what if you really do need that extra memory? A 48GB MBP would give the GPU 35-40GB of RAM to play with, which is quite a lot more than the card. If you cannot afford two cards, or your work fails to divide nicely, I would imagine that the slower UMA would be more efficient (and ultimately faster) than GDDRx embroiled in a starvation swapfest.
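To make the "split between two cards" idea concrete, here is a rough PyTorch sketch of a manual split across two discrete GPUs versus keeping everything on one unified-memory device. The two "halves" are hypothetical placeholders, not any particular workload:

Code:
import torch
import torch.nn as nn

def first_half():
    return nn.Sequential(nn.Linear(4096, 4096), nn.ReLU())

def second_half():
    return nn.Sequential(nn.Linear(4096, 10))

if torch.cuda.device_count() >= 2:
    # Two discrete cards: each half lives in its own VRAM, and the
    # activations are copied across the bus between the two devices.
    a = first_half().to("cuda:0")
    b = second_half().to("cuda:1")
    run = lambda x: b(a(x.to("cuda:0")).to("cuda:1"))
else:
    # One unified-memory GPU (Apple Silicon "mps"): the whole model
    # shares a single pool, so nothing is copied or swapped out.
    dev = "mps" if torch.backends.mps.is_available() else "cpu"
    model = nn.Sequential(first_half(), second_half()).to(dev)
    run = lambda x: model(x.to(dev))

out = run(torch.randn(8, 4096))
print(out.shape)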
 

dadadadaelel

macrumors newbie
Jan 22, 2024
4
0
Ok… my main fear was the disadvantage of lacking access to the CUDA computing platform. 128 GB of RAM is an insane advantage for AI model training in the longer term (Stable Diffusion, upscaling, etc.), and for unified memory with the ambition to take part in AI machine learning (Dreambooth etc.) it plays quite a role, with the 40-core GPU being the deciding factor. Besides, I simultaneously run a couple of platforms such as Rhinoceros, Unreal Engine or Cinema and Photoshop, and I usually don't change laptops that often.
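To put a number on why the unified memory matters for training, here is a back-of-the-envelope sketch; the 7-billion-parameter figure and the fp16-plus-Adam assumptions are illustrative, not tied to any specific Dreambooth or kohya setup:

Code:
# Rough per-parameter cost when fine-tuning in fp16 with Adam:
# weights (2 B) + gradients (2 B) + two fp32 Adam moments (8 B).
params = 7e9                       # illustrative model size
bytes_per_param = 2 + 2 + 8
need_gb = params * bytes_per_param / 1024**3

print(f"~{need_gb:.0f} GB needed")                         # ~78 GB
print("fits in 128 GB unified memory:", need_gb < 128)     # True
print("fits in 16 GB of laptop-4090 VRAM:", need_gb < 16)  # False

Activations and the OS take more on top of that, so the real number is higher, but the shape of the comparison doesn't change.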
 

MacMore

Suspended
Jan 4, 2024
33
13
A good rule of thumb is about a 20-40% performance gain just from good CUDA code vs OpenCL, from the last time I wrote something there. And even 50%+ using cuBLAS/cuDNN and other Nvidia stuff baked into the NV ecosystem for more complex workloads.

I'm not sure about the whole Geekbench situation, but 15-20% should be the minimum gain from same-quality CUDA code compared to OpenCL, so the score could be around 200,000-240,000 for a mobile RTX 4090.
Yeah, the one problem is that OpenCL is actually open. CUDA is great, but it's not open...

If people would use open standards code could also be system agnostic....

In hacking, I believe in being system agnostic and that things should run on as many platforms as possible.
 

mr_roboto

macrumors 6502a
Sep 30, 2020
777
1,668
Yeah, the one problem is that OpenCL is actually open. CUDA is great, but it's not open...

If people would use open standards code could also be system agnostic....

In hacking, I believe in being system agnostic and that things should run on as many platforms as possible.
That's fine, but the reality is that OpenCL lost badly to CUDA, so it's not very relevant. Even Apple thinks so - despite being the company which invented OpenCL and donated it to Khronos to use as an open standard, they've deprecated OpenCL in macOS. "Deprecated" status is how Apple communicates that they plan to remove an API at some point in the future.

Much like OpenGL, OpenCL seems to be sticking around for years past the date Apple tagged it as deprecated, but you can't count on it being there forever, and they aren't updating these to keep up with new features released by Khronos.
 

ChrisA

macrumors G5
Jan 5, 2006
12,610
1,746
Redondo Beach, California
I wonder if the 40-core M3 Max GPU will be as fast as the 4090 Laptop.

It all depends on the software.

I spend a few hours every day using 3D CAD. Autodesk Fusion360 now runs natively on Apple Silicon, but they don't directly use Apple's APIs; they do their rendering using single-threaded OpenGL. I just did a ray-traced render of a new 3D printer I'm working on so I could get some comments. It took 6 minutes for one frame.

So we are not talking frames per second, but minutes per frame on an M2 Pro. I looked at Apple's Activity Monitor app and noticed that Fusion360 is using one of the 16 GPU cores and all ten CPU cores.

I don't mind much because I only do a few renders a week and I can do other things while I wait.

Only benchmarks take full advantage of the hardware. Most software doesn't. Fusion360 is an extreme example.
 

sunny5

macrumors 68000
Jun 11, 2021
1,712
1,581
I wonder if the 40-core M3 Max GPU will be as fast as the 4090 Laptop.

So far, there isn't much software to compare with, and even then it is all fundamentally optimized for CUDA and Nvidia software, as Nvidia dominates the GPU market. AMD sucks and isn't even close to competing with Nvidia, especially on software.

In GPU benchmarks the RTX 40 series is far better, but at this point it's hard to prove, since the Mac lacks a lot of software to test with compared to the PC, especially native versions, as almost all software is optimized for Nvidia.

Apple has no power to bring much software to the Mac, which is a huge problem. Besides, AS Macs' ceiling is too low due to the lack of a Mac Pro with 4x high-end GPUs. Mac software is too focused on 2D work such as video, audio, photo, illustration, and more. 3D, games, AI, research, and the rest aren't great on the Mac.
 

komuh

macrumors member
May 13, 2023
39
10
So we are not talking frames per second, but minutes per frame on an M2 Pro. I looked at Apple's Activity Monitor app and noticed that Fusion360 is using one of the 16 GPU cores and all ten CPU cores.
So they probably don't use the GPU for anything other than displaying stuff.
 
  • Like
Reactions: leman

magbarn

macrumors 68030
Oct 25, 2008
2,970
2,274
Not on HandBrake H.265 encode. I am amazed that the Apple VideoToolbox encoder beats NVENC on the Lenovo Legion Pro 7i with a 13900HX and 4090.
Is the compression level and quality equivalent? I'm still using software encoding on my M3 Max, as I'm still disappointed in VideoToolbox's compression level/quality ratio. Software encoding, while slower, is still much better quality-wise and in final file size.
 
  • Like
Reactions: komuh

sauria

macrumors 6502
Jul 2, 2001
319
31
Texas, USA
Is the compression level and quality equivalent? I'm still using software encoding on my M3 Max, as I'm still disappointed in VideoToolbox's compression level/quality ratio. Software encoding, while slower, is still much better quality-wise and in final file size.
Yes, I had to find the proper way to select VideoToolbox. It's in a different place than just pulling down the menu for the video encoder. If I just pull down the video encoder menu, the quality is lower, but if you also select it in the preset => hardware list, it is quite high.
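If you'd rather drive this from a script than dig through the GUI presets, the same encoder choice can be made on the command line. A minimal sketch assuming HandBrakeCLI is installed and the vt_h265 encoder name from recent macOS builds (nvenc_h265 being the Nvidia counterpart):

Code:
import subprocess

# Constant-quality H.265 encode through Apple's VideoToolbox.
# Swap "vt_h265" for "nvenc_h265" on an Nvidia machine, or "x265" for software.
subprocess.run([
    "HandBrakeCLI",
    "-i", "input.mov",
    "-o", "output.mp4",
    "-e", "vt_h265",   # hardware encoder selection
    "-q", "28",        # constant quality; lower numbers mean higher quality
], check=True)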
 

Kristain

macrumors member
Feb 15, 2022
32
45
Is the compression level and quality equivalent? I'm still using software encoding on my M3 Max, as I'm still disappointed in VideoToolbox's compression level/quality ratio. Software encoding, while slower, is still much better quality-wise and in final file size.
Nvidia's encode quality is sadly far superior to Apple's, even at lower bitrates.
 
  • Like
Reactions: komuh

Kristain

macrumors member
Feb 15, 2022
32
45
I don't see that. Intel QSV > Nvidia, too.
Interesting, I found similar results to the people commenting here:


I've had a lot of strange artefacts exporting on my M1 machine that my Nvidia machine just doesn't show. I still use it for quick exports etc., but find the quality definitely lacking (and I've tried multiple settings!). Oh, and agreed, Intel's implementation is excellent.
 
  • Like
Reactions: komuh and sauria

sauria

macrumors 6502
Jul 2, 2001
319
31
Texas, USA
Interesting, I found similar results to the people commenting here:


I've had a lot of strange artefacts exporting on my M1 machine that my Nvidia machine just doesn't show. I still use it for quick exports etc., but find the quality definitely lacking (and I've tried multiple settings!). Oh, and agreed, Intel's implementation is excellent.
Thanks. I wonder if the new M3 Max has fixed some of these things? Part of me thinks this is a settings issue.
 

poorcody

macrumors 65816
Jul 23, 2013
1,319
1,557
For ML developers, Alex Ziskind has done some testing comparing the M3 against Nvidia (laptop and otherwise) and other Apple chips:

 
  • Like
Reactions: sauria