
JimmyjamesEU

Suspended
Jun 28, 2018
397
426
From the Ars Technica review:


Interesting that even though Cinebench is far from optimised for the M1 or ARM in general, the M1 Ultra is very close to the i9-12900... at a fraction of the power. One can only imagine how badly it will beat it once more of that NEON optimisation lands!

From the same place we can see that when an industry-standard benchmark is used (Geekbench), Intel's best is humiliated.

People are gonna need to search for more irrelevant benchmarks and cherry-picked stats to keep the x86 crowd happy.
 

JimmyjamesEU

Suspended
Jun 28, 2018
397
426
Does M1 Ultra scale well?
It's an interesting question. As far as I can tell, the answer is: it depends. If the test runs for long enough, or moves enough data, then it does seem to scale. If it relies on short bursts of activity, or the test is unoptimised for the M1, it doesn't.
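To make that concrete, here's a rough sketch of the effect in plain Python (CPU multiprocessing standing in for GPU cores; the work sizes are arbitrary). With tiny bursts of work, pool startup and dispatch overhead swamps the parallelism and the "speedup" collapses; give each worker enough sustained work and the scaling reappears:

import time
from multiprocessing import Pool

def tile(n):
    # A stand-in "tile" of work: a simple arithmetic loop.
    s = 0
    for i in range(n):
        s += i * i
    return s

def run(workers, tiles, n):
    # Time how long `workers` processes take to chew through
    # `tiles` chunks of size `n`.
    t0 = time.perf_counter()
    with Pool(workers) as p:
        p.map(tile, [n] * tiles)
    return time.perf_counter() - t0

if __name__ == "__main__":
    for n in (10_000, 10_000_000):   # short burst vs sustained work
        t1 = run(1, 8, n)
        t8 = run(8, 8, n)
        print(f"work={n:>11,}  1 worker: {t1:.3f}s  "
              f"8 workers: {t8:.3f}s  speedup: {t1 / t8:.1f}x")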
 

ader42

macrumors 6502
Jun 30, 2012
436
390
One in each hand.

If you're talking about tile size, Blender did away with the manual setting in version 3.0, since it's obsolete with Cycles X.
Thanks, yes, some software calls them buckets and others call them tiles. Didn't know that about Cycles X - and it could explain a lot…

I found this thread, where a poster said:
Thanks everyone for making this a reality. Version 3.1.0 2021-12-16 is already a solid performer for me (MBA M1).

Some observations are raising questions though.

With tile sizes smaller than the output image, only one tile renders at a time (as opposed to as many as there are threads), and rendering times become longer. Take the BMW scene:
Tile size 2160 > 1 min 03
Tile size 512 > 1 min 54
Tile size 64 > 2 min 40


I know that in previous Cycles iterations (before Cycles X) one could specify bucket/tile size and this affected render speed, as did the progressive render option - which I assume is also on by default. I know non-Apple Silicon GPUs often prefer small buckets, so I suspect this might be one issue / area for optimisation.
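If anyone wants to reproduce this, here's a rough benchmark sketch using Blender's Python API (property names as in the Blender 3.x docs; I'm assuming the standard bmw27_gpu.blend demo file and a GPU compute device already configured in Preferences):

# Run as: blender -b bmw27_gpu.blend --python benchmark_tiles.py
import time
import bpy

scene = bpy.context.scene
scene.cycles.device = 'GPU'             # use the GPU backend (Metal on Apple Silicon)

for size in (2048, 512, 64):
    scene.cycles.use_auto_tile = True   # Cycles X tiling on
    scene.cycles.tile_size = size
    t0 = time.time()
    bpy.ops.render.render(write_still=False)
    print(f"tile_size={size}: {time.time() - t0:.1f}s")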
 

jmho

macrumors 6502a
Jun 11, 2021
502
996
If you're talking about tile size, Blender did away with the manual setting in version 3.0, since it's obsolete with Cycles X.
I believe Nvidia renders progressively without tiles in Cycles X, but the Metal backend doesn't. The default tile size is set to 2048 in the BMW GPU scene.
 

ader42

macrumors 6502
Jun 30, 2012
436
390
It's a little odd that the above poster got 1 min 03 for the BMW render on an M1 MBA with an 8-core GPU, but people are getting 33 seconds on an M1 Ultra with a 64-core GPU…
 

Homy

macrumors 68030
Jan 14, 2006
2,507
2,459
Sweden
Does M1 Ultra scale well?

Not as well as we'd want, depending on the application. AnandTech said, for example, that Geekbench is a short-burst benchmark and the M1 Max/Ultra GPU doesn't get the chance to ramp up to higher clock rates before the test is done. The M1 GPU seems to need more time than Intel/AMD to speed up. If you go to gfxbench.com, the M1 Ultra is faster than an RTX 3080 and close to a 3090, but the Ultra is not 2x faster than the Max. Also, AnandTech explained when reviewing the M1 Max/Pro that the M1 seems to be CPU-bound in games and can't use all the memory bandwidth to feed its fast GPU.
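As a rough illustration of the methodology problem, here's a sketch in plain NumPy (CPU-side, so it mostly captures cache and clock warm-up rather than GPU ramp, but the principle is the same): a short-burst benchmark effectively averages the first few "frames", while a sustained one discards the warm-up.

import time
import numpy as np

def frame_times(n_frames=200, n=1024):
    # Time repeated fixed-size "frames" of work.
    a = np.random.rand(n, n)
    b = np.random.rand(n, n)
    times = []
    for _ in range(n_frames):
        t0 = time.perf_counter()
        a @ b
        times.append(time.perf_counter() - t0)
    return times

t = frame_times()
burst = sum(t[:10]) / 10          # what a short-burst benchmark sees
sustained = sum(t[-100:]) / 100   # steady state after ramp-up
print(f"first 10 frames: {burst * 1e3:.2f} ms/frame")
print(f"last 100 frames: {sustained * 1e3:.2f} ms/frame")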

 

vladi

macrumors 65816
Jan 30, 2010
1,008
617
Here's an interesting tidbit about performance and Apple Silicon. It's from the Ars Technica forum (with the original post from a Redshift page on Facebook, and also from the Maxon forum).

also

A very high-end render scene, 'Moana'.

The M1 Max completes the scene faster than all but a 3090. Faster than a 3080, faster than 2x 2080 Ti.
It's not video editing and it can be (with optimisation) very fast indeed.

Edited to correct the mistake that the M1 Max is faster than a 3090. It should have said “faster than all but a 3090”.

This is a completely irrelevant workflow, because high-end production such as this "Moana" scene will always be put on a render farm. There will be no Mac Studio GPU render farms unless your company buys a Studio for every cubicle and then runs night-time render sessions. Users with a Mac Studio or single/dual RTX cards will never be put in such an intensive situation. So this, much like synthetic benchmarks, is useless.
 

jujoje

macrumors regular
May 17, 2009
247
288
This is a completely irrelevant workflow, because high-end production such as this "Moana" scene will always be put on a render farm. There will be no Mac Studio GPU render farms unless your company buys a Studio for every cubicle and then runs night-time render sessions. Users with a Mac Studio or single/dual RTX cards will never be put in such an intensive situation. So this, much like synthetic benchmarks, is useless.

I'm going to have to vaguely disagree on this one as well. The Moana data set is useful as it's indicative of the kind of data set used in the film and, increasingly, TV industry. Totally agree that there would be no Mac Studio GPU render farms, but that's not what makes the Moana dataset / benchmark interesting from a high-end production point of view.

While it is primarily designed as an offline render test, it's interesting to see how performant it is for interactive rendering; that's an area the Mac Studio is more designed for, and a workflow that film studios seem to be heading towards. The more representative and accurate a scene you can load locally, the better for lighting, shading, and general lookdev and layout tasks on your workstation. This is very much where the discussion on that Ars Technica thread was leaning.

Take, for example, the Coco train scene that Pixar use to sell the idea of XPU and USD; you can load the entire set, do set dressing, define shot cameras, do lookdev and lighting, and switch between different render delegates, all in one file (with no need to split out sets, worry about continuity, or publish things to multiple shots). This seems to be where things are heading, and having a GPU on your workstation that can handle that sort of workflow is obviously going to be a massive benefit, particularly as using GPU render delegates for lookdev seems to be the goal, at least as far as RenderMan XPU and Karma XPU go.
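As a hypothetical sketch of that single-file workflow using Pixar's USD Python API (the shot filename and prim paths here are made up for illustration):

from pxr import Usd, UsdGeom

# Open the stage without pulling in payloads -- you see the scene
# hierarchy immediately, before any heavy geometry loads.
stage = Usd.Stage.Open("shot_010.usda", Usd.Stage.LoadNone)

# Load only the part of the set you're dressing or lighting right now.
stage.Load("/World/Set/TrainInterior")

# Cameras, lookdev and lighting live in the same file as composition
# arcs, so nothing needs splitting out or republishing per shot.
for prim in stage.Traverse():
    if prim.IsA(UsdGeom.Camera):
        print("shot camera:", prim.GetPath())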

But even if you're not aboard the USD / Hydra delegate hype train, take a typical FX shot where you have, say, some high-resolution explosions, some destruction, and a high-res set. Your scene data is going to be, what, 10 GB a frame, and that's just geometry caches; then you've got dicing, displacement, subframe motion blur and so forth, so let's say it's around 40 GB to render. With a Mac Studio you can load that data onto the GPU and do lookdev, getting feedback in real time, with minimal scene prep and no out-of-core caching, because unified memory. Final frames can go to the farm, because farm time is cheaper than artist time, but you're maximising your artist time and getting faster time to first pixel and quicker iteration.
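Back-of-the-envelope, the "does it fit in-core" question looks like this (the 4x render-time multiplier and the 75% headroom factor are my guesses, not measured figures):

geo_per_frame_gb = 10      # geometry caches alone, per the estimate above
render_multiplier = 4      # dicing, displacement, subframe motion blur...
working_set_gb = geo_per_frame_gb * render_multiplier   # ~40 GB

for unified_gb in (32, 64, 128):                 # Mac Studio memory options
    fits = working_set_gb <= unified_gb * 0.75   # leave headroom for OS/app
    print(f"{unified_gb:3d} GB unified memory: "
          f"{'in-core lookdev' if fits else 'needs out-of-core caching'}")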

To a certain extent, all these benchmarks of how long it takes to get to the final frame on the Blender BMW benchmark somewhat miss the point in terms of this kind of workflow. You're not going to have artists waiting 20 min staring at a render bar for that last 10% of the render - it's the time to first pixel, interactivity, and the first 10-20% of the render time that matter.

In terms of pro 3D workflows, the GPU architecture seems to me to be a bet on a vision of the future, not entirely dissimilar to the vision of the trash can Mac Pro, with its dual graphics cards and compute power. Let's hope it pans out better.
 

Sopel

macrumors member
Nov 30, 2021
41
85
https://gfxbench.com/result.jsp?ben...rch-unknown=true&arch-x86=true&base=device

Not as well as we'd want, depending on the application. AnandTech said, for example, that Geekbench is a short-burst benchmark and the M1 Max/Ultra GPU doesn't get the chance to ramp up to higher clock rates before the test is done. The M1 GPU seems to need more time than Intel/AMD to speed up. If you go to gfxbench.com, the M1 Ultra is faster than an RTX 3080 and close to a 3090, but the Ultra is not 2x faster than the Max. Also, AnandTech explained when reviewing the M1 Max/Pro that the M1 seems to be CPU-bound in games and can't use all the memory bandwidth to feed its fast GPU.

You cherry-picked a CPU-bound result?




Or maybe this website is completely *******? Can you spot what's wrong? There's also a bunch of 60 fps results due to vsync. This "benchmark" should never have been shared.
 

Homy

macrumors 68030
Jan 14, 2006
2,507
2,459
Sweden
You cherry-picked a CPU-bound result?



Or maybe this website is completely *******? Can you spot what's wrong? There's also a bunch of 60 fps results due to vsync. This "benchmark" should never have been shared.

What do you mean? Aztec Ruins High Tier Offscreen is used by many reviewers. Do you mean that the 3090 has higher scores in that test? Which test are you showing? The M1 Ultra has mixed results in GFXBench; in some tests it's slower than the M1 Max for some reason.
 

Sopel

macrumors member
Nov 30, 2021
41
85
What do you mean? Aztech Ruins High Tier Offscreen is used by many reviewers. Do you mean that 3090 has higher scores in that test? Which test are you showing? M1 Ultra has mixed results in GFXBench. In some tests it's slower than M1 Max for some reason.
At fps this high you're basically testing only the CPU and driver overhead.
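For illustration, here's a rough filter for those suspect entries (the fps figures below are invented, not real GFXBench data):

# Flag results that look vsync-capped or CPU/driver-bound.
results = [
    ("GPU A", 59.9), ("GPU B", 60.0),    # pinned at 60 -> vsync cap
    ("GPU C", 412.0), ("GPU D", 405.0),  # ~2 ms frames -> overhead-bound
]

VSYNC_RATES = (60.0, 120.0, 144.0)

for name, fps in results:
    if any(abs(fps - v) < 1.0 for v in VSYNC_RATES):
        verdict = "vsync-capped, not a GPU measurement"
    elif fps > 300:
        verdict = "likely CPU/driver-bound at this frame rate"
    else:
        verdict = "plausibly GPU-bound"
    print(f"{name}: {fps:6.1f} fps -> {verdict}")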
 