
Appletoni

Suspended
Mar 26, 2021
443
177
That is sad to hear. Other than Anandtech, are there any other sites that do proper reviews for this kind of thing? I.e. not just running random benchmarks and extrapolating wildly.



While the CPUs are probably going to be fast regardless, I think the GPUs are going to require a fair bit of optimisation to hit that performance and really curious to see what we get with something that's fully optimised for the Apple hardware.

It'd be great if Apple released a reference renderer / Metal viewport (perhaps a Hydra delegate for USD, as the one in Preview is a bit meh), just to see the kind of performance you can get from a proper implementation of TBDR. I'm guessing a lot of apps will go for the MoltenVK approach for cross-platform compatibility, which is a bit disappointing.



Speaking of optimisations, I suspect that things on the CPU side are not optimal in some cases. Testing an admittedly trivial scene on my M1 MacBook Air, Karma in Houdini was running slower on Apple Silicon than under Rosetta (1m 54s vs 1m 04s). It's a super trivial scene, so I wouldn't draw too much from it, but I thought it was mildly interesting.
Yes Houdini is a great chess engine and much slower on Apple devices than on Intel or AMD CPUs.
 

Analog Kid

macrumors G3
Mar 4, 2003
9,360
12,603
It's interesting to me that even with that massive cooling system (at least it seems massive in the images), they're still running the M1 Max at the same performance levels.

I'd read an article once upon a time that suggested that among the optimizations in the M1 was that it was only designed to run at one clock rate. I only remember seeing it once, and I can't find it now, so that leads me to believe it was merely speculation.

In general, though, if you limit the number of variables you're designing for you can make a thing simpler and more optimized. Apple doesn't really have a need to bin their processors for performance in the way Intel does, they're differentiating by complexity. So the idea of optimizing for one frequency doesn't seem absurd...
 
Last edited:

ahurst

macrumors 6502
Oct 12, 2021
410
815
Because Geekbench is garbage compared to real workloads.

Throw a properly multithreaded workload at both and the 64-core is closer to 2x scaling.

https://openbenchmarking.org/test/pts/blender
https://openbenchmarking.org/test/pts/stockfish
Hasn't Geekbench been verified as correlating extremely well with the industry-standard SPEC benchmarks that have been used for decades?

A little searching brings up a Medium post from NUVIA, where they tested SPECint 2006 and 2017 against Geekbench 5 on a bunch of CPUs and found a near-perfect linear correlation for both single-threaded and multi-threaded scores (less than 1% error):

[Charts from the NUVIA post: SPECint vs Geekbench 5 correlation plots]
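For anyone curious what that correlation check actually involves, here's a minimal sketch in Python of the kind of fit NUVIA describes: pair up each CPU's Geekbench 5 score with its SPEC score, then compute the correlation coefficient and a least-squares line. The score pairs below are made-up placeholders, not NUVIA's data.

    # Hypothetical (Geekbench 5 single-core, SPECint 2017 rate-1) pairs per CPU.
    pairs = [(1100, 5.9), (1250, 6.8), (1380, 7.5), (1520, 8.3), (1720, 9.4)]

    n = len(pairs)
    mean_gb = sum(g for g, _ in pairs) / n
    mean_sp = sum(s for _, s in pairs) / n

    # Pearson correlation and a least-squares fit: SPEC ~= slope * GB5 + intercept.
    cov    = sum((g - mean_gb) * (s - mean_sp) for g, s in pairs)
    var_gb = sum((g - mean_gb) ** 2 for g, _ in pairs)
    var_sp = sum((s - mean_sp) ** 2 for _, s in pairs)

    r         = cov / (var_gb ** 0.5 * var_sp ** 0.5)
    slope     = cov / var_gb
    intercept = mean_sp - slope * mean_gb

    print(f"r = {r:.4f}; SPECint ~= {slope:.5f} * GB5 + {intercept:.3f}")

A near-perfect linear relationship would show up as r very close to 1, which is what the NUVIA charts are illustrating.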
 

Ethosik

Contributor
Oct 21, 2009
8,142
7,120
Demonstration of what?

Their capability? Performance is irrelevant. The Switch gets plenty of ports, including The Witcher 3, Doom, etc.

It's an economic issue. The number of Mac owners using Steam is tiny; no matter how good the hardware is, they wouldn't sell enough to be worth their while.

Apple would have to pay several developers to port games for a sustained period of time to increase the numbers to a level that would sustain developers doing it themselves.
Yep. As I have been saying in many threads as a game developer: it's about market share. The hottest game right now (Elden Ring) runs great on my GTX 1080 at 1080p resolution. Steam reports the most used GPUs on Windows are the 1060 and 1650. The M1 Ultra is at least better than those.
 

Ethosik

Contributor
Oct 21, 2009
8,142
7,120
The Switch, on the other hand, started out with mostly Nintendo first-party games, and very quickly the customers came, so developers were happy to develop for it.
And the Switch had outdated hardware when it launched, too!
 

Ethosik

Contributor
Oct 21, 2009
8,142
7,120
It's interesting to me that even with that massive cooling system (at least it seems massive in the images), they're still running the M1 Max at the same performance levels.

I'd read an article once upon a time that suggested that among the optimizations in the M1 was that it was only designed to run at one clock rate. I only remember seeing it once, and I can't find it now, so that leads me to believe it was merely speculation.

In general, if you limit the number of variables you're designing for you can make a thing simpler and more optimized. Apple doesn't really have a need to bin their processors for performance in the way Intel does, they're differentiating by complexity. So the idea doesn't seem absurd...
I wonder if this is partially a chip shortage issue: pumping out M1 Max chips that can go into several products vs. only the Mac Studio.
 

JimmyjamesEU

Suspended
Jun 28, 2018
397
426
Hasn't Geekbench been verified as correlating extremely well with the industry-standard SPEC benchmarks that have been used for decades?

A little searching brings up a Medium post from NUVIA, where they tested SPECint 2006 and 2017 against Geekbench 5 on a bunch of CPUs and found a near-perfect linear correlation for both single-threaded and multi-threaded scores (less than 1% error):

[Charts from the NUVIA post: SPECint vs Geekbench 5 correlation plots]
Great post. Of course it will be ignored by those looking to post chess benchmarks and software that is unoptimised for Apple Silicon.
 

mi7chy

macrumors G4
Oct 24, 2014
10,623
11,296
There is an ASRock server board that can take a 64-core Epyc 7763.


Too bad the single-socket Epyc 7763 is slightly slower and double the cost compared to the Threadripper 3990X. Where the 7763 really shines is its 1.87x performance scaling (closer to a perfect 2x) across dual sockets, compared to the 1.4x scaling from M1 to M1 Pro/Max when doubling the CPU performance cores on the same chip.

https://openbenchmarking.org/test/pts/blender
https://openbenchmarking.org/test/pts/stockfish
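For what it's worth, here's a rough sketch of how scaling factors like those are derived, with placeholder times rather than the actual openbenchmarking.org results: divide the baseline time by the doubled-up time to get the speedup, then compare it to the ideal 2x for doubling the cores.

    def speedup(time_base, time_doubled):
        """Speedup from doubling sockets/cores, e.g. 100 s -> 53.5 s is ~1.87x."""
        return time_base / time_doubled

    def efficiency(sp, ideal=2.0):
        """Fraction of the ideal scaling achieved (1.87x of 2x -> ~94%)."""
        return sp / ideal

    # Placeholder render times chosen only to reproduce the quoted factors.
    for label, t1, t2 in [("dual-socket Epyc 7763", 100.0, 53.5),
                          ("M1 -> M1 Pro/Max P-cores", 100.0, 71.4)]:
        sp = speedup(t1, t2)
        print(f"{label}: {sp:.2f}x speedup, {efficiency(sp):.0%} of ideal 2x")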
 
Last edited:

deconstruct60

macrumors G5
Mar 10, 2009
12,493
4,053
IIRC I saw an article saying the upcoming Threadripper Pro, or whatever they call them, is going to be OEM-only and won't be available for consumers to buy.

You’re only going to get them through buying a whole workstation machine.

Threadripper isn't a high-volume product. A 16-core Ryzen 9 is good enough for a wide range of folks. As folks are pointing out, it is a $2-6K processor. The size of the market really isn't that large.

The binding is to an OEM vendor's firmware, so Lenovo buyers could sell to someone wanting to upgrade their Lenovo. That will keep the 'used' prices high, but it won't stop them completely.

Long, long term AMD could do an unlocked, "non-Pro" version, but I suspect there probably won't be much demand for one. (At that point the mainstream Ryzen 9 would be Zen 4 or 5 and more cost effective.)


AMD, like they did last time they had the advantage over Intel, and like Intel did for years recently, are resting on their success and taking advantage of it instead of pushing on with what got them ahead.

The Threadripper is a refactored Epyc SoC. The point of "getting ahead" is making a better Epyc, not a better Threadripper. When there is a large excess of materials for Epyc packages, AMD can get another "side hustle" product out of the 'golden sample' chiplets that clock higher, as fallout from doing the Epyc work.

It's a two-fold problem at the moment. First, there aren't enough wafer starts to go around, so it makes about zero sense to sell a slightly lower-margin Threadripper when AMD can make more money selling the same silicon as an Epyc. AMD has been holding Threadripper 5000 back for almost a year. It isn't that they didn't prepare the product; they just didn't have enough product to sell.

When the wafer supply chain is back to running in excess of demand and AMD is still throttling Threadripper releases, then we'll have much more evidence of a 'lazy' problem.

Second, while Threadripper has been on hold, AMD has deployed stuff like V-Cache on the top end of Ryzen and on Epyc. Dell/HP/Lenovo sell top-end workstations based on dual-socket Xeon SP. Other than the scarcity of Epycs, there isn't much to stop them from doing a 'workstation' by taking an Epyc board and putting it into a tower. If you had some HPC code you wanted to run 'deskside', that would be viable with an Epyc V-Cache solution while you waited to see if a V-Cache Threadripper ever showed up.

AMD has been moving on Epyc update progression; it is the trickle-down to Threadripper that is slow. But that isn't all that necessary either, because AMD isn't blocking the mainstream Ryzen line from steady progression. Threadripper just sits in a small Goldilocks zone between them, and given that AMD also has an Nvidia GPU 'war' to fight, it isn't a strategically important Goldilocks zone. Even so, Apple is still behind AMD in workstation SoCs, without AMD even putting in tons of effort.
 
  • Like
Reactions: Ulfric

leman

macrumors Core
Oct 14, 2008
19,521
19,678
It's interesting to me that even with that massive cooling system (at least it seems massive in the images), they're still running the M1 Max at the same performance levels.

I'd read an article once upon a time that suggested that among the optimizations in the M1 was that it was only designed to run at one clock rate. I only remember seeing it once, and I can't find it now, so that leads me to believe it was merely speculation.

In general, though, if you limit the number of variables you're designing for you can make a thing simpler and more optimized. Apple doesn't really have a need to bin their processors for performance in the way Intel does, they're differentiating by complexity. So the idea of optimizing for one frequency doesn't seem absurd...

Yes, M1 only scales horizontally, by increasing the number of clusters. The clusters themselves are “locked” and will perform the same in all M1 chips. Some folks familiar with chip design speculated that this is how Apple achieves their superior power efficiency: by optimizing the very layout to max out at a relatively low speed. These chips can scale down very well (I remember seeing that M1 has a power state at something really ridiculous like 50MHz), but they can't be pushed higher even if there is enough thermal headroom. This is very different to x86 designs.
 

Analog Kid

macrumors G3
Mar 4, 2003
9,360
12,603
something really ridiculous like 50MHz
I get it, 50MHz seems silly because static power probably dominates dynamic power at that point on a modern chip, but it just made me remember how stupidly fast our current systems are. The original Mac ran a single-core 68000 with a pitiful instructions-per-clock metric at a clock rate just under 8MHz, and it seemed to run Excel just fine in its day.

The original Mac had 128K of RAM; the new Mac Studio has 128GB of RAM, more than a million times more.

What on earth are we doing with all that extra computing power? Refreshing Facebook...

Turns out though that some things remain the same. When I went to double check the clock rates, I found this little tidbit from Burrell that I'd forgotten about the original Mac:

"The RAM is triple-ported; this means that the 68000, screen-displaying hardware, and sound-output hardware have periodic access to the address and data buses, so that the video, the sound, and the current 68000 task appear to execute concurrently."​

Turns out the unified memory architecture is taking us back to the future...
 
Last edited:

jujoje

macrumors regular
May 17, 2009
247
288
Yes Houdini is a great chess engine and much slower on Apple devices than on Intel or AMD CPUs.

Initially I thought this was a sardonic reference to the legendary Stockfish benchmark, but it turns out there's a Houdini chess engine that is influenced by Stockfish. Learn something new every day :)

But yeah, the 3D programme, not the other one.
 

Homy

macrumors 68030
Jan 14, 2006
2,509
2,460
Sweden
@JimmyjamesEU posted this in another thread, but it seems that the M1 Ultra will outperform the 3090 as Apple claimed. Here are some results from rendering Disney's Moana in Redshift:

2x 2080ti = 34m:17s
M1 Max = 28m:27s
Single 3090 = 21m:45s
2x 3090 = 12m:44s

My guess for M1 Ultra 48c is 18m:58s and for 64c 14m:13s.
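Those guesses just assume Redshift render time scales linearly with GPU core count from the 32-core M1 Max result, which is a best case since real scaling is rarely perfect. A quick sketch of the arithmetic:

    # Extrapolate M1 Ultra render times from the 32-core M1 Max result,
    # assuming perfectly linear scaling with GPU core count (best case).
    m1_max_seconds = 28 * 60 + 27   # 28m:27s on the 32-core M1 Max

    def extrapolate(base_seconds, base_cores, target_cores):
        return base_seconds * base_cores / target_cores

    for cores in (48, 64):
        est = extrapolate(m1_max_seconds, 32, cores)
        print(f"M1 Ultra {cores}c: ~{int(est // 60)}m:{int(est % 60):02d}s")
        # -> 48c: ~18m:58s, 64c: ~14m:13s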

 
  • Love
Reactions: ahurst and ader42

jujoje

macrumors regular
May 17, 2009
247
288
@JimmyjamesEU posted this in another thread, but it seems that the M1 Ultra will outperform the 3090 as Apple claimed. Here are some results from rendering Disney's Moana in Redshift:

2x 2080ti = 34m:17s
M1 Max = 28m:27s
Single 3090 = 21m:45s
2x 3090 = 12m:44s

My guess for M1 Ultra 48c is 18m:58s and for 64c 14m:13s.


From the thread pretty much, but this is one of the situations where the memory makes a significant difference. It does kind of raise the question of how often you are going to be trying to render something as large as the Moana set locally*, but also points to how things like the Blender BMW or classroom benchmarks don't really reflect the advantages of all that sweet sweet memory. The M1 Ultra will be great for rendering large scenes and, more importantly, the time to first pixel should be much faster as it doesn't have to move things from RAM to GPU memory.

Looking forward to seeing some benchmarks in the next few days (but suspect we'll all be arguing over unrepresentative Blender benchmarks :D)

* Possibly; as USD becomes a thing, splitting scenes up into individual shots becomes less of a necessity and you'd just have one USD file with all the things in it, so displaying and rendering large scenes would become more common, at least as far as film and TV go (feature animation would probably benefit most from this approach, hence Pixar creating USD).
 

Gerdi

macrumors 6502
Apr 25, 2020
449
301
These chips can scale down very well (I remember seeing that M1 has a power state at something really ridiculous like 50MHz), but they can't be pushed higher even if there is enough thermal headroom. This is very different to x86 designs.

That doesn't make much sense. "Pushing higher" is a process and physical-design property, not an architecture property, in the first place; that is, unless you assume that the M1 is already operating at the maximum performance corner (max voltage, lowest VT, maximum buffering, highest-performance cells), which is not the case.
 

leman

macrumors Core
Oct 14, 2008
19,521
19,678
That makes not much sense. The "pushing higher" is a process and physical design property not an architecture property in the first place - this is unless you assume that the M1 is already operating at the maximum performance corner - max Voltage, lowest VT, maximum buffered, highest performance cells - which is not the case.

I am not a CPU engineer, so I don't know how all this works on the technical level, but it is a fact that the M1 Firestorm cores are limited to 3.2GHz (which they achieve running at approx 5W). This upper limit is the same for every M1-based product, be it the passively cooled Air, an iPad, or a high-end workstation like the Studio. If Apple had the technical capability to push these cores past 3.2GHz on their desktop systems, they would likely have done so. Another striking observation is that M1 does not use frequency-based binning at all, in stark contrast to x86 CPUs. All M1 cluster performance characteristics are identical no matter which product we look at. The only binning occurs, again, at the horizontal level, by tweaking the cluster size.

I have no idea how Apple does it. The 3.2GHz limit could be a physical property of this CPU design, or it could merely be a statistical common ground most manufactured chips can sustain, or it could be a business decision on Apple's side (a weird one). Regardless of the technical reason, this is what we have, and this is what I meant in my post: M1 relies exclusively on horizontal scaling; the cores themselves do not exhibit any performance scaling/binning across products. This is true for the CPU and the GPU equally.
 

crazy dave

macrumors 65816
Sep 9, 2010
1,453
1,229
I am not a CPU engineer, so I don't know how all this works on the technical level, but it is a fact that the M1 Firestorm cores are limited to 3.2GHz (which they achieve running at approx 5W). This upper limit is the same for every M1-based product, be it the passively cooled Air, an iPad, or a high-end workstation like the Studio. If Apple had the technical capability to push these cores past 3.2GHz on their desktop systems, they would likely have done so. Another striking observation is that M1 does not use frequency-based binning at all, in stark contrast to x86 CPUs. All M1 cluster performance characteristics are identical no matter which product we look at. The only binning occurs, again, at the horizontal level, by tweaking the cluster size.

I have no idea how Apple does it. The 3.2GHz limit could be a physical property of this CPU design, or it could merely be a statistical common ground most manufactured chips can sustain, or it could be a business decision on Apple's side (a weird one). Regardless of the technical reason, this is what we have, and this is what I meant in my post: M1 relies exclusively on horizontal scaling; the cores themselves do not exhibit any performance scaling/binning across products. This is true for the CPU and the GPU equally.

I think it might be a design choice. As you say, it allows them to easily scale horizontally while maintaining power efficiency, and they get to reuse the same basic cluster design over and over again: just cut and paste (quite literally in the case of Max -> Ultra). I know there is probably more to it than that, but it makes sense; it simplifies the design. Maybe there are other considerations, like not having to bin for tight frequency tolerances increasing yields, but I don't know if they're anywhere near TSMC's limits on that anyway. Maybe?
 

leman

macrumors Core
Oct 14, 2008
19,521
19,678
I think it might be a design choice. As you say, it allows them to easily scale horizontally while maintaining power efficiency, and they get to reuse the same basic cluster design over and over again: just cut and paste (quite literally in the case of Max -> Ultra). I know there is probably more to it than that, but it makes sense; it simplifies the design. Maybe there are other considerations, like not having to bin for tight frequency tolerances increasing yields, but I don't know if they're anywhere near TSMC's limits on that anyway. Maybe?

Intel and AMD also use the same cores (and even the same dies) for different products, by allowing different configurations at various performance and power levels. Apple's approach is obviously a design choice, but it's different enough from mainstream practice to ask why they do it this way. Is it an arbitrary restriction (the chips can run faster but Apple limits them), a physical restriction of the design, or a statistical one? At any rate, it's more sane and transparent than what happens elsewhere.
 