
Adult80HD

macrumors 6502a
Nov 19, 2019
695
825
Do you have a better explanation for the lack of GPU scaling? He provided his conclusions and how he arrived at those conclusions. Only a fool would consider them anything other than his opinion.


Can you provide specific examples where he has "trumpeted" his own view as the absolute truth (as opposed to informed opinion)? Video link and time within the video, please.

BTW, the blatantly obvious explanation is that the code isn't optimized. You don't even have to speculate on that: Apple themselves have made it clear that even their own product, FCP X, has not been fully optimized, and they have an optimized build in beta. It's ridiculous to speculate on throttling, power draw and all of the other stuff without knowing up front whether the code has been optimized. That's Computing 101 right there, and basic logic.
 

m1maverick

macrumors 65816
Nov 22, 2020
1,368
1,267
You really don't get it, do you? Open mind, rigghhhhttttt. All you are is a Max apologist on this thread. Either you're Max, his brother, or his biggest fan. He's the one presenting himself as the reviewer. The burden of proof is on him, period.
What a juvenile response. The reality is Max's reviews are out there for anyone to see. You, on the other hand, hide behind nebulous accusations.

I've provided my own analytical data on performance elsewhere, and when someone here made a legitimate request to run a real test, I did so. What I won't answer are your logical non sequiturs and ad hominem attacks.
You mean like those you used in your first paragraph?
 

m1maverick

macrumors 65816
Nov 22, 2020
1,368
1,267
BTW, the blatantly obvious explanation is that the code isn't optimized. You don't even have to speculate on that: Apple themselves have made it clear that even their own product, FCP X, has not been fully optimized, and they have an optimized build in beta. It's ridiculous to speculate on throttling, power draw and all of the other stuff without knowing up front whether the code has been optimized. That's Computing 101 right there, and basic logic.
Something he's mentioned in his reviews.
 

F-Train

macrumors 68020
Apr 22, 2015
2,272
1,762
NYC & Newfoundland
BTW, the blatantly obvious explanation is that the code isn't optimized. You don't even have to speculate on that: Apple themselves have made it clear that even their own product, FCP X, has not been fully optimized, and they have an optimized build in beta. It's ridiculous to speculate on throttling, power draw and all of the other stuff without knowing up front whether the code has been optimized. That's Computing 101 right there, and basic logic.

Which is also the point that filmmaker Parker Walbeck made in a quite useful video he posted yesterday on the performance of a built-to-order Ultra and a MacBook Pro/Max with Premiere Pro, DaVinci Resolve and Final Cut, although he doesn't talk about the Final Cut beta. Walbeck's video, which shows some odd numbers, led to some good discussion on the Blackmagic Design forum yesterday about optimisation.

In a video today, 9to5 Mac made the same point. Dave Lee, in his review, talks about the broader issue of single versus multi-threaded optimisation, an issue that affects both Mac and Windows computers.
 
Last edited:
  • Like
Reactions: FriendlyMackle

eddie_ducking

Suspended
Oct 18, 2021
95
118
Do you have a better explanation for the lack of GPU scaling? He provided his conclusions and how he arrived at those conclusions. Only a fool would consider them anything other than his opinion.

Potentially yes, actually… Geekbench results for the GPU have been widely accepted as under-representing true capability because the runs are not long/large enough to cause the GPUs to “spin” up to full power.

To parallelise a single task over numerous processing units you need to break it down into chunks. The task is 100%; you as a developer decide to break it down into, say, 10% units and then divide that 10% by the number of GPU cores. For an M1 that 10% works out at 1.25% of the total job per GPU core, but if you divide it by 64 rather than 8, it's 0.156% of the job. These numbers are certainly not realistic in the real world, but they do hint that the data sets being sent to each GPU core in an Ultra 64 may simply be too small to get them to “spin” up to full capacity, just because the size/64 is so much smaller than the size/8 that the software was “expecting”.

Maybe such small packets of data per GPU core (1/8th the size of those on the M1) aren't sufficient to cause clock speeds/GPUs to ramp up to full capacity (and hence power usage to do the same) on the Ultra 64, and all it would take is software optimisation to feed it the larger chunks of data necessary to exploit the capabilities available.
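To put rough numbers on that chunking idea, here's a quick sketch (purely illustrative; the 10% unit is the hypothetical figure from above, and none of us know Apple's real dispatch logic):

```swift
import Foundation

// Illustrative only: how a fixed 10% work unit shrinks per GPU core
// as the core count grows. The 10% "chunk" is the hypothetical unit
// from the example above, not anything measured.
let workUnitFraction = 0.10

for gpuCores in [8, 24, 48, 64] {
    let perCoreShare = workUnitFraction / Double(gpuCores) * 100
    print("\(gpuCores) GPU cores -> \(String(format: "%.3f", perCoreShare))% of the total job per core")
}
// 8 cores  -> 1.250% (M1)
// 64 cores -> 0.156% (Ultra 64): 1/8th the work per core, possibly too
// little to push clocks and power up to full speed.
```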

The software being used in Max's videos shows overall % GPU utilisation, clock speed and power usage. It doesn't show whether 8 cores are maxed and 40 are doing nothing, or whether all 64 cores are doing little because not enough is being asked of them.
 

m1maverick

macrumors 65816
Nov 22, 2020
1,368
1,267
Potentially yes, actually… Geekbench results for the GPU have been widely accepted as under-representing true capability because the runs are not long/large enough to cause the GPUs to “spin” up to full power.

To parallelise a single task over numerous processing units you need to break it down into chunks. The task is 100%; you as a developer decide to break it down into, say, 10% units and then divide that 10% by the number of GPU cores. For an M1 that 10% works out at 1.25% of the total job per GPU core, but if you divide it by 64 rather than 8, it's 0.156% of the job. These numbers are certainly not realistic in the real world, but they do hint that the data sets being sent to each GPU core in an Ultra 64 may simply be too small to get them to “spin” up to full capacity, just because the size/64 is so much smaller than the size/8 that the software was “expecting”.

Maybe such small packets of data per GPU core (1/8th the size of those on the M1) aren't sufficient to cause clock speeds/GPUs to ramp up to full capacity (and hence power usage to do the same) on the Ultra 64, and all it would take is software optimisation to feed it the larger chunks of data necessary to exploit the capabilities available.

The software being used in Max's videos shows overall % GPU utilisation, clock speed and power usage. It doesn't show whether 8 cores are maxed and 40 are doing nothing, or whether all 64 cores are doing little because not enough is being asked of them.
What software are you referring to? What video are you referring to? What statement are you referring to? What I have seen are comments to ambiguous claims he's made. Given the ambiguity it's next to impossible to have a reasoned discussion.

That said I will take a stab at bringing less ambiguity to this discussion.

In his Mac Studio ULTIMATE Comparison video he performs a CPU and GPU "torture test". In this test he provides benchmarks for the 24-core M1 Max, the 48-core M1 Ultra, and the 64-core M1 Ultra. The benchmarks consist of the time to complete the benchmark along with the GPU core frequencies (see the chart at time 22:44 in the video). In this chart you see the GPU frequencies decrease as the core count increases: 1,255MHz for the 24-core, 912MHz for the 48-core, and 752MHz for the 64-core GPU configurations. The corresponding times to complete the benchmark were 7:14, 5:47, and 5:34.

To address your argument that perhaps the tasks didn't take long enough to tax the GPUs: the shortest time was 5:34 which, IMO, seems reasonably long enough to ramp up to full utilization. If they didn't reach full utilization then why did the tasks take so long to complete? If they did, why did the GPU speed decrease as the number of cores increased? The answer to the latter has been seen before: limited power means reduced frequency in order to distribute that power across a greater number of cores. It's the basis for "turbo" boost processor technology. Is this the reason? We can't say for sure, but the explanation seems reasonable based on the data.
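For anyone who wants to sanity-check those figures, here's the back-of-the-envelope arithmetic (cores × clock is only a crude proxy for GPU throughput, so treat this as a rough sketch rather than a model of how the SoC actually behaves):

```swift
import Foundation

// Figures read off the chart at 22:44 of the video referenced above.
let configs: [(cores: Int, mhz: Double, seconds: Double)] = [
    (24, 1255, 7 * 60 + 14),   // 24-core Max:   7:14
    (48,  912, 5 * 60 + 47),   // 48-core Ultra: 5:47
    (64,  752, 5 * 60 + 34),   // 64-core Ultra: 5:34
]

let base = configs[0]
for c in configs {
    let aggregateClock = Double(c.cores) * c.mhz                  // crude throughput proxy
    let clockRatio = aggregateClock / (Double(base.cores) * base.mhz)
    let speedup = base.seconds / c.seconds
    print("\(c.cores) cores: cores x MHz = \(Int(aggregateClock)), "
          + String(format: "%.2fx the 24-core aggregate clock, actual speedup %.2fx", clockRatio, speedup))
}
// 48 cores: ~1.45x the aggregate clock but only ~1.25x faster.
// 64 cores: ~1.60x the aggregate clock but only ~1.30x faster.
```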

Now that we have a specific example I'd love to hear your, or anyone else's, thoughts as to why his conclusions are misleading.
 
Last edited:

eddie_ducking

Suspended
Oct 18, 2021
95
118
What software are you referring to? What video are you referring to? What statement are you referring to? What I have seen are comments to ambiguous claims he's made. Given the ambiguity it's next to impossible to have a reasoned discussion.

That said I will take a stab at bringing less ambiguity to this discussion.

In his Mac Studio ULTIMATE Comparison video he performs a CPU and GPU "torture test". In this test he provides benchmarks for the 24-core M1 Max, the 48-core M1 Ultra, and the 64-core M1 Ultra. The benchmarks consist of the time to complete the benchmark along with the GPU core frequencies (see the chart at time 22:44 in the video). In this chart you see the GPU frequencies decrease as the core count increases: 1,255MHz for the 24-core, 912MHz for the 48-core, and 752MHz for the 64-core GPU configurations. The corresponding times to complete the benchmark were 7:14, 5:47, and 5:34.

To address your argument that perhaps the tasks didn't take long enough to tax the GPUs: the shortest time was 5:34 which, IMO, seems reasonably long enough to ramp up to full utilization. If they didn't reach full utilization then why did the tasks take so long to complete? If they did, why did the GPU speed decrease as the number of cores increased? The answer to the latter has been seen before: limited power means reduced frequency in order to distribute that power across a greater number of cores. It's the basis for "turbo" boost processor technology. Is this the reason? We can't say for sure, but the explanation seems reasonable based on the data.

Now that we have a specific example I'd love to hear your, or anyone else's, thoughts as to why his conclusions are misleading.

I'm not sure if you're reinforcing my thoughts or not... I never said time was the sole factor in whether the GPUs spin up to full speed, and your numbers do imply that a set load has been spread evenly across cores (1.2GHz / 24 vs 750MHz / 64) to complete in roughly the same timeframe. I have no idea what the criteria are for the GPUs to spin up, but then neither do you. I do know that doubling/tripling core counts (be it CPU or GPU) has never had a 1:1 relationship with performance... ever. The fact that Apple (from a CPU perspective) have got close (way over 80%) is worthy of praise, not derision.
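To put a number on why it's never 1:1: this is essentially Amdahl's law. A minimal sketch, assuming a made-up 5% serial fraction purely for illustration (nobody outside Apple knows the real figure for these workloads):

```swift
import Foundation

// Amdahl's law: speedup(n) = 1 / (s + (1 - s) / n), where s is the
// fraction of the job that cannot be parallelised. The 5% below is a
// hypothetical value chosen only to show why adding cores never scales 1:1.
func amdahlSpeedup(cores: Int, serialFraction s: Double) -> Double {
    1.0 / (s + (1.0 - s) / Double(cores))
}

for cores in [8, 24, 48, 64] {
    let speedup = amdahlSpeedup(cores: cores, serialFraction: 0.05)
    let efficiency = speedup / Double(cores) * 100
    print("\(cores) cores: speedup \(String(format: "%.1f", speedup))x, "
          + "scaling efficiency \(String(format: "%.0f", efficiency))%")
}
// Even with only 5% serial work, 64 cores give roughly a 15.4x speedup
// (about 24% scaling efficiency), not 64x.
```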

I intentionally didn't specify software because, when it comes to parallelisation of tasks, in one respect they're all different in approach and in another it's not relevant. Each task/product/software package will deal with it in whatever way produces the best results/is most cost effective for itself.

I have no idea about the logic of the firmware, or how the GPUs and their controllers interpret existing compiled software and expected workload. I don't work for Apple (and if I did, I'm pretty sure it'd be against NDAs to comment), and I'm even more certain you don't either... so how do any of us really know the logic behind how the whole SoC detects and reacts to whole workloads? All we can do is observe recordable events and interpret/extrapolate as to the reason. Some thoughts will be right, some will be wrong, but at this point Max's well-voiced and public conjecture is just that, and quite potentially just not correct.
 
  • Like
Reactions: Adult80HD

ADGrant

macrumors 68000
Mar 26, 2018
1,689
1,059
That would explain it. I didn't watch many YouTube videos on the original M1 release so I likely missed such comments. I did read, and laughed a lot at, comments on this site which were making said claims.

Have you watched any of his Mac Studio reviews? I found them very helpful. The one I really enjoyed was the comparison of the fully loaded Mac Studio Ultra with the 12900K PC with a 3090 GPU (Luke Miani just did a similar video). Despite the Mac Studio's "issues" it was quite competitive with that system.
I watched all his reviews of the Studio. I particularly liked the teardown. Comparisons with the 12900k and 3090 are interesting but I am more interested in comparisons with higher end Intel desktop Macs (Mac Pro, iMac Pro and the 2020 27" iMac with the 5700XT and either the i7 or i9).
 

ADGrant

macrumors 68000
Mar 26, 2018
1,689
1,059
If there was ever a good example of why Geekbench is a lousy benchmark, I think the various Mac Studio reviews are it.
Most of his other benchmarks concerned video encoding, something I am not in the least bit interested in. Others concerned software that has not yet been optimized for ARM64 (e.g. Cinebench, which has been heavily optimized for Intel's SIMD instructions). In the case of his Final Cut review he is mostly testing the video encoders, and he wasn't using a version optimized for the Ultra (because Apple only gave that version to people they sent review units to).

He does have one Xcode benchmark, but that is not a realistic measure of the performance of developer workflows. It's one build of one product using one toolchain. Also, he seems to order units with small SSDs (512GB on the Studio Max, I recall). Smaller SSDs are often slower than larger models. When I ran his Xcode benchmark on my i9 iMac, I got a much better result than he did, probably because it has a 2TB SSD and his had 256GB or 512GB.
 

Adult80HD

macrumors 6502a
Nov 19, 2019
695
825
I watched all his reviews of the Studio. I particularly liked the teardown. Comparisons with the 12900k and 3090 are interesting but I am more interested in comparisons with higher end Intel desktop Macs (Mac Pro, iMac Pro and the 2020 27" iMac with the 5700XT and either the i7 or i9).
In terms of Photoshop benchmarking, the Ultras I have both beat my 2019 Mac Pro 28-core across the board. I used the DigLloyd Photoshop benchmarks, which take it through a large range of standard tasks inside Photoshop. On some tasks it's a pretty big difference, on most it's slight--but this is comparing a $25K top-of-the-line Mac Pro to a $6K Studio Ultra. Huge bang for the buck.

For Lightroom, imports, exports and preview rendering are all substantially faster with the Studio Ultra than the Max--in many of those tasks nearly 2x as fast. The key factor for those tasks is proper code optimization--the tasks that are fastest fully utilize all of the cores, and that's true on either Intel or AS. That said, the 20-core Ultra still handily beats the 28-core Mac Pro. A lot of things you can't benchmark, though--the speed and fluidity of the user interface in Lightroom is very noticeable in actual use, and makes a big difference in the productive use of the application, for example.

Much still depends on the software you use right now--some software is poorly optimized, and some tasks are simply not easily scaled by parallelization. In the photography world, CaptureOne is a big competitor to Lightroom, and they have done a poor job of optimizing for multi-core processors, so the benefits of an Ultra vs. a Max are negligible there--assuming it's the only app you are using.

That said, there are literally thousands of uses for these machines, so it's impossible to generalize why one might be better than another without specific use cases in mind.
 

m1maverick

macrumors 65816
Nov 22, 2020
1,368
1,267
I'm not sure if you're reinforcing my thoughts or not... I never said time was the sole factor in whether the GPUs spin up to full speed, and your numbers do imply that a set load has been spread evenly across cores (1.2GHz / 24 vs 750MHz / 64) to complete in roughly the same timeframe. I have no idea what the criteria are for the GPUs to spin up, but then neither do you. I do know that doubling/tripling core counts (be it CPU or GPU) has never had a 1:1 relationship with performance... ever. The fact that Apple (from a CPU perspective) have got close (way over 80%) is worthy of praise, not derision.
Here's the problem: I don't know either, because you haven't provided a specific example to discuss. All you've provided are nebulous and ambiguous talking points. So I am left to speculate as to your meaning.

Now that I have provided a specific benchmark to discuss, I am interested in hearing your thoughts as to why the results are the way they are and how that disproves his opinion as to why they are what they are.

The 1:1 response of yours is a strawman. Max has not claimed he expects a 1:1 scaling (can you show otherwise?). His opinion is that, in this specific benchmark, paying $1K for 33% more GPU cores doesn't seem reasonable when the results improve by 4%. In the Red 8K benchmark his opinion was that paying $1K for 33% more GPU cores may be reasonable as it results in a 23% improvement.
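For what it's worth, here's the arithmetic behind that judgement (a rough sketch using the figures quoted in this thread; whether the trade-off is worth it is obviously subjective):

```swift
import Foundation

// Rough cost-per-improvement comparison for the 48-core -> 64-core
// Ultra GPU upgrade, using the numbers discussed above.
let upgradeCost = 1_000.0                  // USD for the 48 -> 64 core step
let extraCores = 64.0 / 48.0 - 1.0         // ~33% more GPU cores

let tortureBefore = 5.0 * 60 + 47          // 5:47 on the 48-core
let tortureAfter  = 5.0 * 60 + 34          // 5:34 on the 64-core
let tortureGain = (tortureBefore - tortureAfter) / tortureBefore

let red8kGain = 0.23                       // ~23% improvement cited for the Red 8K test

print(String(format: "Torture test: %.0f%% faster for $%.0f (+%.0f%% cores)",
             tortureGain * 100, upgradeCost, extraCores * 100))
print(String(format: "Red 8K:       %.0f%% faster for the same $%.0f",
             red8kGain * 100, upgradeCost))
// Same $1,000 upgrade: ~4% faster in one workload, ~23% in another --
// which is the whole point about value depending on the workload.
```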

Finally I have not seen Max negatively comment about the CPU core scaling. Another strawman.

I intentionally didn't specify software because, when it comes to parallelisation of tasks, in one respect they're all different in approach and in another it's not relevant. Each task/product/software package will deal with it in whatever way produces the best results/is most cost effective for itself.

I have no idea about the logic of the firmware, or how the GPUs and their controllers interpret existing compiled software and expected workload. I don't work for Apple (and if I did, I'm pretty sure it'd be against NDAs to comment), and I'm even more certain you don't either... so how do any of us really know the logic behind how the whole SoC detects and reacts to whole workloads? All we can do is observe recordable events and interpret/extrapolate as to the reason. Some thoughts will be right, some will be wrong, but at this point Max's well-voiced and public conjecture is just that, and quite potentially just not correct.
We don't know and neither does he. However he has presented a plausible reason for the GPU results observed. I don't recall him saying "This is positively, absolutely, completely unequivocally why" but rather "this could be why".

But in the end it doesn't matter. IMO, when one pays $1K for 33% more GPU power, one would expect to see more than a 4% reduction in time. This disparity was observed in all but one (the Red 8K benchmark) of the benchmarks he performed. That, to me, is valuable information to have. In the end it doesn't matter why; the only thing that matters is that it is.
 
  • Like
Reactions: rkuo

ADGrant

macrumors 68000
Mar 26, 2018
1,689
1,059
Regarding the optimization: having been in this industry long enough, I have learned that you buy a system for what it can do today and not what it may (or is promised to) do in the future. I've fallen for that a few times. A reviewer needs to base their recommendation on observed behavior and not theoretical future benefit.
I agree with you up to a point. However, in the case of the Ultra, it's well known that there is a version of FinalCut in beta that is better optimized for the new SoC. Also, most people buying an Apple Silicon Mac right now could reasonably expect applications built for Apple Silicon in the future. There are still many applications that rely on Rosetta 2 to run and some that don't run at all.
 

m1maverick

macrumors 65816
Nov 22, 2020
1,368
1,267
I agree with you up to a point. However, in the case of the Ultra, it's well known that there is a version of FinalCut in beta that is better optimized for the new SoC. Also, most people buying an Apple Silicon Mac right now could reasonably expect applications built for Apple Silicon in the future. There are still many applications that rely on Rosetta 2 to run and some that don't run at all.
You mean just like Max says at minute 16:20 of the video I previously referenced? Where he qualifies his findings (at 16:45) with the comment "right now"?
 

ADGrant

macrumors 68000
Mar 26, 2018
1,689
1,059
You mean just like Max says at minute 16:20 of the video I previously referenced? Where he qualifies his findings (at 16:45) with the comment "right now"?
Yes, right when he presents findings he knows aren‘t really valid.
 

m1maverick

macrumors 65816
Nov 22, 2020
1,368
1,267
Yes, right when he presents findings he knows aren‘t really valid.
Huh? In your opinion, if he doesn't present findings for a version of FCP that is currently unavailable, that somehow invalidates his findings for the currently available version of FCP?
 
Last edited:

ADGrant

macrumors 68000
Mar 26, 2018
1,689
1,059
What do you mean? The findings he presents are based on actual tests.
If I was a FinalCut user trying to decide which configuration to order, those findings are of no use. I need the timings from the next version of FinalCut.
 

m1maverick

macrumors 65816
Nov 22, 2020
1,368
1,267
If I was a FinalCut user trying to decide which configuration to order, those findings are of no use. I need the timings from the next version of FinalCut.
Why not?

Furthermore, that does not invalidate his findings on the current version of FCP.
 

Chancha

macrumors 68020
Mar 19, 2014
2,245
2,042
As if it was anyone's fault that Apple hasn't thrown out that magical pre-release version... like a month after the event.

This is the one part of modern Apple I don't get. They decided the timing of that event all by themselves, and the prime focus was the Mac Studio. How the hell could the OS or the software not be optimized yet, weeks after users are getting their hands on it?
 

ADGrant

macrumors 68000
Mar 26, 2018
1,689
1,059
As if it was anyone's fault that Apple hasn't thrown out that magical pre-release version... like a month after the event.

This is the one part of modern Apple I don't get. They decided the timing of that event all by themselves, and the prime focus was the Mac Studio. How the hell could the OS or the software not be optimized yet, weeks after users are getting their hands on it?
Software development time lines aren't always predictable and not everyone buying a Studio is interested in FinalCut.
 

m1maverick

macrumors 65816
Nov 22, 2020
1,368
1,267
Software development time lines aren't always predictable and not everyone buying a Studio is interested in FinalCut.
Yet, according to you, we're supposed to invalidate findings for the current version because a better optimized version will be released....when?
 

ADGrant

macrumors 68000
Mar 26, 2018
1,689
1,059
Yet, according to you, we're supposed to invalidate findings for the current version because a better optimized version will be released....when?
No idea, but if FinalCut performance was important to me I would hold off any purchasing decision.
 

smithdr

macrumors regular
Aug 17, 2021
207
127
Hi All:

David Harry has posted an excellent video comparing the H.265 export speed on both a 16" M1 Max and a Mac Studio Ultra base model. David goes into significant detail on how he made his comparisons between these two computers using DaVinci Resolve.

Don Barar

 