
JimmyjamesEU

Suspended
Jun 28, 2018
397
426
Borrowed from the Ars Technica Mac forum:

"Plus it does look like there may be some falloff in Geekbench compute, so some not so perfectly parallel algorithms.

Example: RTX 3090 vs RTX 3060 Ti.
3090 is more than double 3060Ti in every respect, for instance TFLOPS = 29.28/13.72 = 2.13 = 113% more compute according to NVidia.

Yet in GB compute:
https://browser.geekbench.com/opencl-benchmarks

205635/120247 = 1.71 = 71% higher GB compute. You would expect 113% if the benchmark was perfectly scalable and perfectly parallel.

So it's very possible that some of the falloff in M1-Max in the Geekbench compute test, is from Geekbench itself."

It seems like there may be some issues with Geekbench.
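
As a quick sanity check on the quoted arithmetic, here's the same calculation as a Python sketch (the TFLOPS and score figures are just the ones cited above, not independently verified):

```python
# Sanity check of the quoted arithmetic: how much of the theoretical
# TFLOPS advantage actually shows up in the Geekbench OpenCL score.
# Figures are the ones cited in the quote, not independently verified.

specs = {
    "RTX 3090":    {"tflops": 29.28, "gb_opencl": 205635},
    "RTX 3060 Ti": {"tflops": 13.72, "gb_opencl": 120247},
}

tflops_ratio = specs["RTX 3090"]["tflops"] / specs["RTX 3060 Ti"]["tflops"]
score_ratio = specs["RTX 3090"]["gb_opencl"] / specs["RTX 3060 Ti"]["gb_opencl"]

print(f"TFLOPS ratio:       {tflops_ratio:.2f}x")   # ~2.13x
print(f"GB score ratio:     {score_ratio:.2f}x")    # ~1.71x
print(f"Scaling efficiency: {score_ratio / tflops_ratio:.0%}")  # ~80%
```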
 

ElfinHilon

macrumors regular
May 18, 2012
142
48
By the way, I don't know if it was mentioned before, but I would like to point out that these M1 Max scores are very similar to AMD's 6700M — another 10TFLOPS GPU. So again what we have here is that Nvidia GPUs somehow perform better in Geekbench, either because Ampere's dual ALUs give it a big advantage, or because there is something in Geekbench itself that favors Nvidia.
When will we get the mind-blowing exposé on how Geekbench is in bed with Nvidia???
 
  • Like
Reactions: madisonm and Boil

crazy dave

macrumors 65816
Sep 9, 2010
1,454
1,230
Borrowed from the Ars Technica Mac forum:

"Plus it does look like there may be some falloff in Geekbench compute, so some not so perfectly parallel algorithms.

Example: RTX 3090 vs RTX 3060 Ti.
3090 is more than double 3060Ti in every respect, for instance TFLOPS = 29.28/13.72 = 2.13 = 113% more compute according to NVidia.

Yet in GB compute:
https://browser.geekbench.com/opencl-benchmarks

205635/120247 = 1.71 = 71% higher GB compute. You would expect 113% if the benchmark was perfectly scalable and perfectly parallel.

So it's very possible that some of the falloff in M1-Max in the Geekbench compute test, is from Geekbench itself."

It seems like there may be some issues with Geekbench.


Interesting … though I found better scaling with CUDA (though maybe still not linear, I'd have to calculate):

So maybe the issue is less that the algorithms aren't parallel, and more that they aren't as efficiently coded outside of CUDA?
 

leman

macrumors Core
Oct 14, 2008
19,522
19,679
I think the fact that Apple is suddenly putting a lot of effort into getting Blender's Cycles renderer running on the Metal API is a very good sign that these new GPUs will be great for 3D creative work

What I can't understand is why they are doing it now. They could have had Cycles fully Metal-compatible by August if they had started last year. Apple's management is sleeping. They really need to up their developer support and FOSS game, because so far it has been one step forward, one step back, and then one step to the side.

For example, they released a Metal-enabled TensorFlow fork last year, then suddenly abandoned it and published a closed-source TensorFlow pluggable driver, which is buggy and seemingly has not been updated in a while. Why not make it open source so that the community can help identify and fix bugs? And why isn't Apple implementing a backend for PyTorch? These machines have such tremendous potential for breaking CUDA's hegemony, but Apple has to do more than just publish some badly formatted documentation.

Still a big jump no matter how you see it... the biggest jump from one generation to the next in history, I think

It's not a jump, it's a friggin leap of incredible magnitude. They literally took a graphics unit from a mobile phone and catapulted it to desktop class, while consuming a fraction of the power. For mobile workstations, neither Intel, Nvidia, nor AMD has any counter: nil, zero, nada. Having the power of an upper-mid-range desktop workstation in a 2 kg ultracompact laptop with almost twenty hours of battery life is simply unprecedented.

Did you order something?

Sure, a 16" with an M1 Max :)
 

EugW

macrumors G5
Jun 18, 2017
14,917
12,887
It seems like there may be some issues with Geekbench.
If it were a benchmark that scaled perfectly, it'd probably not be a very good representation of real-world workloads. Cuz most real-world workloads don't scale like that either.
 
  • Like
Reactions: Roode

leman

macrumors Core
Oct 14, 2008
19,522
19,679
When will we get the mind-blowing exposé on how Geekbench is in bed with Nvidia???

I don't think that Geekbench is in bed with Nvidia. Just that recent Nvidia GPUs seem to do better in Geekbench for some reason. There might be some trivial reasons for this.
 

JimmyjamesEU

Suspended
Jun 28, 2018
397
426
If it were a benchmark that scaled perfectly, it'd probably not be a very good representation of real-world workloads. Cuz most real-world workloads don't scale like that either.
It's supposed to be a test of hardware: CPUs and GPUs, so that people can compare different generations and models of computers, with an eye to finding what's suitable for their needs. It's not meant to mimic the imperfections of real-world applications. We have... real-world applications for that.
 

crazy dave

macrumors 65816
Sep 9, 2010
1,454
1,230
If it were a benchmark that scaled perfectly, it'd probably not be a very good representation of real-world workloads. Cuz most real-world workloads don't scale like that either.

Actually, in CUDA it does scale better:

(TFLOPS, CUDA, OpenCL)

3090 (35.8, 238937, 205635)
3070 (20.3, 149836, 135018)
3060 Ti (16.2, 128474, 120247)

Ratios: 3090 vs 3070 (1.76, 1.59, 1.52)
3090 vs 3060 Ti (2.2, 1.85, 1.71)
3070 vs 3060 Ti (1.25, 1.16, 1.12)

Obviously one takeaway is that the bigger the TFLOPS difference, the bigger the gap between it and the difference in GB score. Which makes sense if there are scaling issues in general. But those are less pronounced in CUDA.
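
For anyone who wants to check or extend the table, here it is as a small Python sketch (same figures as quoted above, not independently verified):

```python
# The ratio table above, reproduced in Python so it's easy to extend
# with more cards. TFLOPS and Geekbench figures are the ones quoted in
# this post, not independently verified.
from itertools import combinations

cards = {
    "3090":    (35.8, 238937, 205635),  # (TFLOPS, CUDA, OpenCL)
    "3070":    (20.3, 149836, 135018),
    "3060 Ti": (16.2, 128474, 120247),
}

print(f"{'pair':<18}{'TFLOPS':>8}{'CUDA':>8}{'OpenCL':>8}")
for a, b in combinations(cards, 2):
    ratios = (cards[a][i] / cards[b][i] for i in range(3))
    print(f"{f'{a} vs {b}':<18}" + "".join(f"{r:>8.2f}" for r in ratios))
```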
 

crazy dave

macrumors 65816
Sep 9, 2010
1,454
1,230
It's supposed to be a test of hardware: CPUs and GPUs, so that people can compare different generations and models of computers, with an eye to finding what's suitable for their needs. It's not meant to mimic the imperfections of real-world applications. We have... real-world applications for that.

Yes and no, much of compute does scale pretty linearly, buuuuut they do use real-world algorithms* underpinning their meta-benchmark, and if enough of them don't scale, that throws the average off. That's where the subtests are useful: you can see how those algorithms actually scale if they aren't in fact linear.

*The algorithms are real, but often run on simplified or fake data, or something that allows the benchmark to be completed in a reasonable amount of time, especially since they run multiple sub-benchmarks. This can also throw off the scaling of an algorithm that otherwise should scale.
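
To illustrate how one laggard "throws the average off": meta-benchmarks typically combine subtest scores with a geometric mean (I believe Geekbench does something similar), so even a single badly scaling subtest drags the headline number well below what the rest would suggest. A toy sketch with made-up numbers:

```python
# Toy illustration (made-up numbers): how one badly scaling subtest drags
# down a geometric-mean composite, even when everything else scales 2x.
from math import prod

def composite(scores):
    """Geometric mean: the usual way meta-benchmarks combine subtests."""
    return prod(scores) ** (1 / len(scores))

small = [100, 100, 100, 100, 100]   # hypothetical subtest scores
big   = [200, 200, 200, 200, 110]   # 2x the compute, one subtest barely moves

print(f"{composite(big) / composite(small):.2f}x")  # ~1.77x, not the 2x you'd expect
```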
 
  • Like
Reactions: EntropyQ3 and EugW

EugW

macrumors G5
Jun 18, 2017
14,917
12,887
It's supposed to be a test of hardware: CPUs and GPUs, so that people can compare different generations and models of computers, with an eye to finding what's suitable for their needs. It's not meant to mimic the imperfections of real-world applications. We have... real-world applications for that.
Actually it's the opposite of what you say. The purpose of Geekbench is to synthetically simulate what might be expected for application performance, and it uses simple but real-world workloads to do that. In fact, the criticism of earlier versions of Geekbench was that they were far too synthetic and had no basis in reality, so it was modified to compensate.

Now, whether or not Geekbench succeeds in that goal is another matter, but in this case it actually seems to, since what it shows for scaling is roughly what Apple reports for real-world applications in its own tests.
 

JimmyjamesEU

Suspended
Jun 28, 2018
397
426
Actually it's the opposite of what you say. The purpose of Geekbench is to synthetically simulate what might be expected for application performance, and it uses simple but real-world workloads to do that. In fact, the criticism of earlier versions of Geekbench was that they were far too synthetic and had no basis in reality, so it was modified to compensate.

Now, whether or not Geekbench succeeds in that goal is another matter, but in this case it actually seems to, since what it shows for scaling is roughly what Apple reports for real-world applications in its own tests.
It’s a test of CPU, memory, and GPU performance. It is not meant to simulate or imitate the faults or inadequacies of “real world” applications.
 

crazy dave

macrumors 65816
Sep 9, 2010
1,454
1,230
Actually, in CUDA it does scale better:

(TFLOPS, CUDA, OpenCL)

3090 (35.8, 238937, 205635)
3070 (20.3, 149836, 135018)
3060 Ti (16.2, 128474, 120247)

Ratios: 3090 vs 3070 (1.76, 1.59, 1.52)
3090 vs 3060 Ti (2.2, 1.85, 1.71)
3070 vs 3060 Ti (1.25, 1.16, 1.12)

Obviously one takeaway is that the bigger the TFLOPS difference, the bigger the gap between it and the difference in GB score. Which makes sense if there are scaling issues in general. But those are less pronounced in CUDA.

Here's the weird part though ... my previous post on this referenced two actual systems, and the scaling in CUDA between the 3090 and 3070 was 2.06x! More than the difference in reported TFLOPS!

If accurate, this tells me either the CPU or the thermals have a big effect on the benchmark. That could affect the scaling from the M1 to M1 Pro to the Max too. It should also be pointed out that this 3090-to-3070 scaling happens way above the GPU power of the Max, whose theoretical peak is 10 TFLOPS. So let's look at the scaling lower down.

GeForce RTX 2070 SUPER: 108002
NVIDIA GeForce RTX 2080 Super with Max-Q Design: 105981

So this is weird: the 2080 Super Max-Q (laptop) has an FP32 rating of only ~6 TFLOPS but matches the 2070 Super (desktop), which is rated at ~9! What the hell is going on with GB? Are these even all the same version of GB in this list?


Useless if not ...

Okay, moving on ... The 2060 also has 6.45 TFLOPS:

GeForce RTX 2060: 77808

From the 2060 to the 2070 Super: essentially perfect scaling in CUDA.

What about OpenCL?

NVIDIA GeForce RTX 2070 SUPER: 98973

GeForce RTX 2060: 70344

Again, essentially perfect scaling, in roughly the regime we'd expect the M1 Max & Pro to fall into. The falloff that could come from algorithms not scaling properly only shows up on GPUs much bigger than these. So something is still up.
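
One way to see how suspect that Max-Q entry is: normalize each score by the card's rated FP32 TFLOPS. A small sketch, using the rough figures above:

```python
# Geekbench CUDA score per rated FP32 TFLOP, using the rough figures
# quoted above. Treat the output as ballpark only.
cards = {
    "RTX 2070 Super (desktop)": (9.0,  108002),
    "RTX 2080 Super Max-Q":     (6.0,  105981),
    "RTX 2060 (desktop)":       (6.45, 77808),
}

for name, (tflops, score) in cards.items():
    print(f"{name:<26}{score / tflops:>9.0f} points/TFLOP")

# The 2070 Super (~12000) and 2060 (~12064) line up almost exactly;
# the Max-Q (~17664) is the outlier, which is why that entry looks
# suspect (different GB version? unusually high boost clocks?).
```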
 

ElfinHilon

macrumors regular
May 18, 2012
142
48
I don't think that Geekbench is in bed with Nvidia. Just that recent Nvidia GPUs seem to do better in Geekbench for some reason. There might be some trivial reasons for this.
Oh I was just joking about that lmao
 

crazy dave

macrumors 65816
Sep 9, 2010
1,454
1,230
It’s a test of CPU, memory, and GPU performance. It is not meant to simulate or imitate the faults or inadequacies of “real world” applications.
Except that in order to achieve that, they do use real-world algorithms underneath. So does SPEC. You can often look at the subtest names and track down which program an algorithm came from (and sometimes they'll just tell you: this is Blender).
 
  • Like
Reactions: Roode

JimmyjamesEU

Suspended
Jun 28, 2018
397
426
Yes and no, much of compute does scale pretty linearly, buuuuut they do use real-world algorithms* underpinning their meta-benchmark, and if enough of them don't scale, that throws the average off. That's where the subtests are useful: you can see how those algorithms actually scale if they aren't in fact linear.

*The algorithms are real, but often run on simplified or fake data, or something that allows the benchmark to be completed in a reasonable amount of time, especially since they run multiple sub-benchmarks. This can also throw off the scaling of an algorithm that otherwise should scale.
My point is, they aren’t mimicking the performance of real-world apps that don’t scale well in the hope of providing a proxy for those apps’ performance! They aim to provide a way to measure different systems’ performance.
 

EugW

macrumors G5
Jun 18, 2017
14,917
12,887
Except that in order to achieve that, they do use real-world algorithms underneath. So does SPEC. You can often look at the subtest names and track down which program an algorithm came from (and sometimes they'll just tell you: this is Blender).
Exactly. My guess is @JimmyjamesEU was not aware of all the criticism Geekbench received early on for being too far removed from real-world application performance. Geekbench was subsequently changed significantly, in version 4, to address those concerns.
 

JimmyjamesEU

Suspended
Jun 28, 2018
397
426
Except that in order to achieve that, they do use real-world algorithms underneath. So does SPEC. You can often look at the subtest names and track down which program an algorithm came from (and sometimes they'll just tell you: this is Blender).
Sure, but they aim to do so in a way that scales with hardware. Claiming that it’s an aim of Geekbench to scale in a poor way because real-world apps often do is not true.
 

JimmyjamesEU

Suspended
Jun 28, 2018
397
426
Exactly. My guess is @JimmyjamesEU was not aware of all the criticism Geekbench received early on for being too far removed from real-world application performance. Geekbench was subsequently changed significantly, in version 4, to address those concerns.
The criticism was about combining different numbers into one overall benchmark number, thereby hiding areas of concern. Not that they should aim to scale poorly because real-world apps do.
 

EugW

macrumors G5
Jun 18, 2017
14,917
12,887
Sure, but they aim to do so in a way that scales with hardware. Claiming that it’s an aim of Geekbench to scale in a poor way because real-world apps often do is not true.
Honestly, I don't know what you're on about, and my guess is that you don't either.

The fact is that these are real-world algorithms, and some do not scale as well as others, and because Geekbench is made up of these algorithms, you would expect it to reflect that.

To be honest, after all this, maybe some might even have more respect for Geekbench now than they might have had previously.
 

crazy dave

macrumors 65816
Sep 9, 2010
1,454
1,230
Sure, but they aim to do so in a way that scales with hardware. Claiming that it’s an aim of Geekbench to scale in a poor way because real-world apps often do is not true.

Yes, actually, it is. They don't change the underlying algorithms except in minor ways. They are absolutely supposed to be reflective of the applications they come from, and so is SPEC. People argue about how reflective they actually are, but that is indeed what you are supposed to glean from them.

I've written this multiple times in the past, but the top-line numbers for meta-benchmarks like SPEC and GB are a little silly. The subtests are where the meat is, and they're supposed to tell you how your particular workflow will be affected.

**** However, that's not what's happening here **** read my post above. The algorithms used in GB scale just fine on Nvidia, for both OpenCL and CUDA, in the TFLOPS regime where the M1 Max and Pro reside. It's only when those graphics cards get massive that they break down. They shouldn't be breaking down here, and not by this much. Something is still very wrong.
 
  • Like
Reactions: EugW

crazy dave

macrumors 65816
Sep 9, 2010
1,454
1,230
The fact is that these are real-world algorithms, and some do not scale as well as others, and because Geekbench is made up of these algorithms, you would expect it to reflect that.

Pithily put :)

But that's not what's happening here. It's only for the truly massive GPUs that these algorithms break down, either intrinsically or maybe because of the way GB encodes them. Between the 2060 and the 2070 Super, the algorithms scale just fine in GB. They don't for the M1 Max. So that's still weird. The algorithms shouldn't be mis-scaling *here*.
 

EugW

macrumors G5
Jun 18, 2017
14,917
12,887
Pithily put :)

But that's not what's happening here. It's only for the truly massive GPUs that these algorithms break down, either intrinsically or maybe because of the way GB encodes them. Between the 2060 and the 2070 Super, the algorithms scale just fine in GB. They don't for the M1 Max. So that's still weird. The algorithms shouldn't be mis-scaling *here*.
Perhaps. But then again the scaling Apple is seeing isn't great either.

Maybe much of the problem is on Apple's end, and if so, hopefully it can be addressed with driver updates. I mean, Monterey isn't even out yet.
 

crazy dave

macrumors 65816
Sep 9, 2010
1,454
1,230
Perhaps. But then again the scaling Apple is seeing isn't great either.
You misunderstand: down here it's *only* Apple not seeing great scaling. Nvidia is doing just fine; going from 6 to 9 TFLOPS results in perfect scaling for GB on Nvidia cards. The M1 Pro to Max GPU scaling should've been perfect too. Admittedly, I didn't check AMD.
 

JimmyjamesEU

Suspended
Jun 28, 2018
397
426
Honestly, I don't know what you're on about, and my guess is that you don't either.

The fact is that these are real-world algorithms, and some do not scale as well as others, and because Geekbench is made up of these algorithms, you would expect it to reflect that.

To be honest, after all this, maybe one should have more respect for Geekbench than one might have previously.
Jeez, more personal attacks.

No one disputes they are real-world algorithms. You originally stated it was OK for Geekbench to scale in a poor way because other applications do. My argument (backed by their website) is that they aim to provide a benchmark that reveals the performance of your system: CPU, memory, and GPU. The purpose is to enable you to compare your computer against other computers.

I don't know what else to tell you; their site tells you the aim of the benchmark. You can accept it or not.
 

crazy dave

macrumors 65816
Sep 9, 2010
1,454
1,230
Jeez, more personal attacks.

No one disputes they are real-world algorithms. You originally stated it was OK for Geekbench to scale in a poor way because other applications do. My argument (backed by their website) is that they aim to provide a benchmark that reveals the performance of your system: CPU, memory, and GPU. The purpose is to enable you to compare your computer against other computers.

I don't know what else to tell you; their site tells you the aim of the benchmark. You can accept it or not.

The purpose is to compare computers against other computers ***for those workloads and algorithms***, which are meant to be a common but diverse set of use cases. Thus the top-line number gives you a general sense of overall performance, while the subtests tell you how performant a machine is likely to be for a given workload. If the sub-benchmarks didn't mirror their underlying workloads, you might as well just compare TFLOPS; you wouldn't need to run benchmarks at all.

EDIT:
Think of it this way:

Let's say a real GPU algorithm scales between the 3060 and 3090 at 0.75x the TFLOPS ratio. Suppose you modify the algorithm enough that it scales linearly 1:1, but it no longer does anything real. What are you measuring? TFLOPS? You already have that measurement. Sure, you can confirm it, but there are algorithms known to scale at or near 1:1, and you could simply use those. So take your justification that people want to compare two GPUs to know which to get. To compare adequately ("do I really need a 3090 for my specific task, or will the 3060 give better perf per dollar than the TFLOPS alone would indicate?"), people need to know how that task actually scales across the two GPUs. GB aims to provide both: an overall comparison that averages the subtests, and the subtest scores themselves.

Again, though, that's not what is happening here. The algorithms and the overall GB score should be scaling linearly across the M1/Pro/Max GPUs. The Max isn't powerful enough to reach the regime where the linear scaling stops.

My guess? Thermals. GB compute must run long enough that the chip gets hot, and Apple's default is to slow the GPU down, throttling it to maintain quiet, cool operation. I'm betting the high power mode on the 16-inch instead spins the fans up really high to avoid that. Thus peak performance is unaffected, but what we're seeing is a mixture of sustained and peak. This is all just conjecture, but it fits the available data so far.
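
To make the conjecture concrete, here's a toy model (all numbers invented, purely illustrative) of what mixing peak and throttled throughput would do to a score:

```python
# Toy model of the throttling conjecture. All numbers are invented; this
# only illustrates the shape of the effect, not actual M1 behavior.

def effective_score(peak_score, throttle_factor, time_at_peak):
    """Average throughput over a run that spends part of its time throttled.

    throttle_factor: fraction of peak throughput while throttled (0..1)
    time_at_peak:    fraction of the run completed before throttling kicks in
    """
    return peak_score * (time_at_peak + (1 - time_at_peak) * throttle_factor)

# A smaller GPU that never throttles keeps its full score...
print(effective_score(30000, throttle_factor=1.0, time_at_peak=1.0))  # 30000.0

# ...while a chip with 2x the peak, throttled to 70% for most of the
# run, lands at ~1.5x in the benchmark instead of 2x.
print(effective_score(60000, throttle_factor=0.7, time_at_peak=0.2))  # 45600.0
```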
 
Last edited: