
JimmyjamesEU

Suspended
Jun 28, 2018
397
426
None of the news sources I follow include Geekbench garbage, so it's just habit that I see the lesser garbage that is Cinebench R23. I'd actually prefer the industry standardize on Blender.
Probably more to do with you following terrible news sources than anything to do with Geekbench. After all, you tried to pass off hashcat as a valid metric, and also a video where someone disabled hardware acceleration on a Mac while comparing against a PC with hardware acceleration enabled.
 

oz_rkie

macrumors regular
Apr 16, 2021
177
165
Remember how much the Mac Pro costs... with Afterburner... and then look at the Mac Studio.
Amen
Well, I come from the Windows world and only recently bought into Macs (I have two M1s at the moment, for work and personal use), so I was mainly comparing against the cost of building a high-end workstation, where I can pick a high-end Threadripper and an entry-level GPU if I'd like, to suit my use case. It would be amazing if I had similar options on the Mac side as well (powerful CPU and entry-level GPU).

The Mac Pro was a ridiculously priced machine anyway. Apart from workplaces/studios that had a very specific need to buy a macOS-based machine with those specs, no one in their right mind would buy it unless money was literally not a concern for them. I would put the Mac Pro very much in the professional space in terms of pricing, whereas the just-announced Mac Studio falls into a prosumer space (at least IMO).
 

januarydrive7

macrumors 6502a
Oct 23, 2020
537
578
The Mac Pro was a ridiculously priced machine anyway. Apart from workplaces/studios that had a very specific need to buy a macOS-based machine with those specs, no one in their right mind would buy it unless money was literally not a concern for them. I would put the Mac Pro very much in the professional space in terms of pricing, whereas the just-announced Mac Studio falls into a prosumer space (at least IMO).
On the contrary, those who purchase these high-priced machines do so specifically because money is a concern for them -- they make a lot more money with better machines, so it makes perfect sense to spend the money to make more.
 
  • Like
Reactions: windowsblowsass

Gnattu

macrumors 65816
Sep 18, 2020
1,107
1,671
There is still no evidence that this causes the rather big discrepancy. Apple is active in the Embree project, so nothing prevents them from "hand-crafting" optimizations for the M1.
Oh, come on, the Embree library is open source; there is no native NEON implementation inside it.

I think it's more likely that Apple's SIMD instructions (which are only half as wide as Intel's AVX2) are simply not as fast, perhaps because Apple preferred to spend the chip real estate on other things (such as their highly specialized video encoders).
Wrong and wrong. Apple's SIMD is not narrower than Intel's in the hardware: Intel has 2 AVX2 units and Apple has 4 NEON 128-bit units, so each adds up to 512 bits of SIMD width.
 
Last edited:

oz_rkie

macrumors regular
Apr 16, 2021
177
165
On the contrary, those who purchase these high-priced machines do so specifically because money is a concern for them -- they make a lot more money with better machines, so it makes perfect sense to spend the money to make more.
I already mentioned that, if you read the part before the one you highlighted:
I mean apart from workplaces/studios that had a very specific need to buy a macOS based machine with those specs
I was responding to MayaUser to say that I would not compare the pricing model of the Mac Pro to the Mac Studio, simply because the Mac Pro was a professional device that only people making a serious ROI from the machine would buy, whereas the Mac Studio is (relatively) cheaper, so consumers/prosumers who might not make a real ROI from the device might still want to get it. My point was that in such cases it is a bit of a shame that, if you are looking to buy the highest-end CPU Mac has to offer, you also NEED to buy the GPU even if you don't need it.
 

januarydrive7

macrumors 6502a
Oct 23, 2020
537
578
I already mentioned that, if you read the part before the one you highlighted:
I mean apart from workplaces/studios that had a very specific need to buy a macOS based machine with those specs
I was responding to MayaUser to say that I would not compare the pricing model of the Mac Pro to the Mac Studio, simply because the Mac Pro was a professional device that only people making a serious ROI from the machine would buy, whereas the Mac Studio is (relatively) cheaper, so consumers/prosumers who might not make a real ROI from the device might still want to get it. My point was that in such cases it is a bit of a shame that, if you are looking to buy the highest-end CPU Mac has to offer, you also NEED to buy the GPU even if you don't need it.
Good point, although the base M1 does provide the highest-end CPU (ST) Mac has to offer. If you're getting Pro/Max/Ultra, then presumably you're looking for a machine with either a higher-end GPU or better MT, due to workflows that would benefit your bottom line.

Your point stands, for enthusiasts who want a lot of MT for the sake of a lot of MT.
 

Gnattu

macrumors 65816
Sep 18, 2020
1,107
1,671
There is still no evidence that this causes the rather big discrepancy.
If you want evidence, let me give you the evidence.

Below is from the patch that implements Apple silicon support. This is only a CMake config and not the implementation, so it is readable enough for those without a programming background. When building for arm64 on Mac, Embree emulates AVX2 on top of NEON instead of implementing a native NEON code path.
[Attachment: Screen Shot 2022-03-09 at 05.06.32.png (Embree CMake configuration for arm64 macOS)]
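To make concrete what "emulating AVX2 on top of NEON" means, here is a minimal sketch (not Embree's actual code; the struct and function names are hypothetical): NEON registers are 128 bits wide, so an emulated 256-bit AVX2 vector has to be stored as a pair of NEON registers, and every 256-bit operation turns into two native 128-bit operations.

#include <arm_neon.h>  // AArch64 only

// Hypothetical illustration: one emulated 256-bit vector is a pair of
// 128-bit NEON registers.
struct emulated_m256 {
    float32x4_t lo;  // elements 0..3
    float32x4_t hi;  // elements 4..7
};

// One emulated 256-bit add costs two native 128-bit NEON adds.
static inline emulated_m256 emulated_mm256_add_ps(emulated_m256 a, emulated_m256 b) {
    return { vaddq_f32(a.lo, b.lo), vaddq_f32(a.hi, b.hi) };
}

The emulated path still executes correct NEON instructions, but the code is organized the way AVX2 code is organized, not the way a native NEON path would be.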
 

leman

macrumors Core
Original poster
Oct 14, 2008
19,521
19,677
Presumably for the same reason that Apple fans simply discount it. :p

There are good reasons to discount it, or at least not to take it too seriously. The fact that it doesn't properly take advantage of the M1's SIMD hardware is one big red flag.

There is still no evidence that this causes the rather big discrepancy. Apple is active in the Embree project, so nothing prevents them from "hand-crafting" optimizations for the M1. I think it's more likely that Apple's SIMD instructions (which are only half as wide as Intel's AVX2) are simply not as fast, perhaps because Apple preferred to spend the chip real estate on other things (such as their highly specialized video encoders).

Apple's SIMD units are only half as wide, but Apple has twice as many of them. Apple runs at a lower clock, so on small problem sizes and very data-parallel code Intel will have a slight edge due to higher clocks, but Apple has much larger caches, so they will perform better on more complex problems.

The big issue with Embree is that the M1 path is currently coded using x86 SSE on top of an SSE-to-NEON emulation layer. Apple can do little tweaks here and there, but they are still limited by the overall approach. And rewriting everything in NEON on a project owned by a competitor is an unreasonable waste of resources.
 

Andropov

macrumors 6502a
May 3, 2012
746
990
Spain
Apple is active in the Embree project, so nothing prevents them from "hand-crafting" optimizations for the M1.
I have a pile of issues of a certain journal on my desk that I intend to read. I actively read them from time to time. Certainly nothing prevents me from reading all of them. And yet, the pile grows larger every week. I wonder how that's possible.
 

Gnattu

macrumors 65816
Sep 18, 2020
1,107
1,671
And rewriting everything in NEON on a project owned by a competitor is an unreasonable waste of resources.
This is not necessarily the case, though. Embree is an open-source project licensed under Apache License 2.0; Apple can use this library too if they want, and they can even make a fork if Intel really doesn't want to cooperate. If it has real-world benefit beyond making the benchmark numbers look prettier, they can, and in my opinion should, provide an optimized NEON implementation. Vendors contribute their code to many math and crypto libraries, where platform-dependent code paths are all over the place and the generic implementation is usually too slow.
 

januarydrive7

macrumors 6502a
Oct 23, 2020
537
578
This is not necessarily the case, though. Embree is an open-source project licensed under Apache License 2.0; Apple can use this library too if they want, and they can even make a fork if Intel really doesn't want to cooperate. If it has real-world benefit beyond making the benchmark numbers look prettier, they can, and in my opinion should, provide an optimized NEON implementation. Vendors contribute their code to many math and crypto libraries, where platform-dependent code paths are all over the place and the generic implementation is usually too slow.
The "gotcha" here is that the anti-Apple crowd would then say that Apple is somehow cheating in benchmarks by optimizing for their platform. As usual, it's a lose-lose situation.
 
  • Like
Reactions: JMacHack

Gnattu

macrumors 65816
Sep 18, 2020
1,107
1,671
The "gotcha" here is that the anti-Apple crowd would then say that Apple is somehow cheating in benchmarks by optimizing for their platform. As usual, it's a lose-lose situation.
I don't think "optimization" is "cheating" if the workload can indeed be optimized this way. I have a faster method to solve a math problem and get the right answer, would my math teacher call me cheating? Not at all.
 
  • Like
Reactions: bobcomer

januarydrive7

macrumors 6502a
Oct 23, 2020
537
578
I don't think "optimization" is "cheating" if the workload can indeed be optimized this way. I have a faster method to solve a math problem and get the right answer, would my math teacher call me cheating? Not at all.
I don't disagree with you, I'm just extrapolating the arguments I've seen in the past; e.g., arguments along the lines of: "M1-based machines are only better at ProRes workflows because they're cheating."

Also, your math teacher would likely be reasonable, unlike a lot of the rhetoric we see here.
 

Gnattu

macrumors 65816
Sep 18, 2020
1,107
1,671
I don't disagree with you, I'm just extrapolating the arguments I've seen in the past; e.g., arguments along the lines of: "M1-based machines are only better at ProRes workflows because they're cheating."

Also, your math teacher would likely be reasonable, unlike a lot of the rhetoric we see here.
Yes, for all the workloads they perform well in, they are cheating. Only the ones they suck at can reflect the real performance. The M1 series is simply that bad 😆.
 

Rigby

macrumors 603
Aug 5, 2008
6,257
10,215
San Jose, CA
Oh, come on, the Embree library is open source; there is no native NEON implementation inside it.
Embree uses the SSE2NEON macro library, which converts SSE intrinsics to NEON intrinsics at compile time. There is no emulation as long as you don't use vectors wider than 128 bits (which NEON does not support).
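For a rough idea of what that compile-time conversion looks like (a simplified sketch, not SSE2NEON's actual source), a simple SSE intrinsic can be expressed directly as the corresponding NEON intrinsic:

#include <arm_neon.h>

// Sketch of the SSE2NEON idea: the SSE type and intrinsics are redefined
// in terms of NEON, so the compiler emits ordinary NEON instructions and
// no runtime emulation is involved.
typedef float32x4_t __m128;

static inline __m128 _mm_add_ps(__m128 a, __m128 b) {
    return vaddq_f32(a, b);  // 4 x float add, one native NEON instruction
}

static inline __m128 _mm_mul_ps(__m128 a, __m128 b) {
    return vmulq_f32(a, b);  // 4 x float multiply, also a direct 1:1 mapping
}

For simple element-wise operations like these the mapping is indeed one-to-one; the interesting cases are the instructions with no direct NEON counterpart.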

Wrong and wrong. Apple's SIMD is not narrower than Intel's in the hardware: Intel has 2 AVX2 units and Apple has 4 NEON 128-bit units, so each adds up to 512 bits of SIMD width.
Recent Intel CPUs (including Alder Lake) actually have 3 ports with vector ALUs. But even if what you wrote were accurate, you can't just equate 2x256-wide with 4x128-wide, since there are many other factors that determine SIMD performance (such as load throughput).
 

leman

macrumors Core
Original poster
Oct 14, 2008
19,521
19,677
This is not necessarily the case, though. Embree is an open-source project licensed under Apache License 2.0; Apple can use this library too if they want, and they can even make a fork if Intel really doesn't want to cooperate. If it has real-world benefit beyond making the benchmark numbers look prettier, they can, and in my opinion should, provide an optimized NEON implementation. Vendors contribute their code to many math and crypto libraries, where platform-dependent code paths are all over the place and the generic implementation is usually too slow.

Forking is out of the question, as nobody would use Apple's fork. And while I am sure that Apple could submit a NEON backend to the official Embree… what would be the point, really? Apple already has an official ray tracing framework on their platforms - Metal. And that's where their focus is going forward.
 

Rigby

macrumors 603
Aug 5, 2008
6,257
10,215
San Jose, CA
If you want evidence, let me give you the evidence.

Below is from the patch that implements Apple silicon support. This is only a CMake config and not the implementation, so it is readable enough for those without a programming background. When building for arm64 on Mac, Embree emulates AVX2 on top of NEON instead of implementing a native NEON code path.
[Attachment: Embree CMake configuration for arm64 macOS]
All that shows is that the NEON SIMD instructions are more limited. Obviously AVX2 (256-bit) intrinsics need to be emulated, because NEON is limited to 128-bit vectors. But you can also choose SSE (128-bit) instructions as the target during compilation, in which case they are converted at compile time and no emulation takes place.
 

leman

macrumors Core
Original poster
Oct 14, 2008
19,521
19,677
Embree uses the SSE2NEON macro library, which converts SSE intrinsics to NEON intrinsics at compile time. There is no emulation as long as you don't use vectors wider than 128 bits (which NEON does not support).

Of course there is emulation. SSE2NEON emulates ISA semantics. Not all things have a one-to-one mapping, and many algorithms would be developed differently for SSE and NEON to achieve optimal results. For example, my SSE-optimized geometry primitive code heavily relies on PMOVMSKB, but that approach doesn't work at all for ARM, which lacks this instruction family. NEON, however, has very fast horizontal operations one can use instead.
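As a concrete illustration of that point (a simplified sketch, not my actual geometry code): an SSE "does any byte match?" test naturally uses PMOVMSKB, while the idiomatic AArch64 version reaches for a horizontal reduction instead.

#if defined(__SSE2__)
#include <emmintrin.h>
// SSE version: compare 16 bytes against a needle and collapse the result
// into a 16-bit mask with PMOVMSKB (_mm_movemask_epi8).
static inline bool any_byte_equals(__m128i data, __m128i needle) {
    __m128i eq = _mm_cmpeq_epi8(data, needle);
    return _mm_movemask_epi8(eq) != 0;
}
#elif defined(__aarch64__)
#include <arm_neon.h>
// NEON version: there is no movemask instruction, but AArch64 has fast
// horizontal reductions, so the natural formulation is different.
static inline bool any_byte_equals(uint8x16_t data, uint8x16_t needle) {
    uint8x16_t eq = vceqq_u8(data, needle);  // 0xFF in lanes where bytes match
    return vmaxvq_u8(eq) != 0;               // horizontal max across all 16 lanes
}
#endif

An SSE-to-NEON translation layer has to reproduce the movemask result bit for bit, which takes several NEON instructions, whereas code written for NEON from the start can use the reduction directly.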


Recent Intel CPUs (including Alder Lake) actually have 3 ports with vector ALUs.

Not all of which have the same capabilities.
 

Rigby

macrumors 603
Aug 5, 2008
6,257
10,215
San Jose, CA
Of course there is emulation. SSE2NEON emulates ISA semantics. Not all things have a one-to-one mapping, and many algorithms would be developed differently for SSE and NEON to achieve optimal results. For example, my SSE-optimized geometry primitive code heavily relies on PMOVMSKB, but that approach doesn't work at all for ARM, which lacks this instruction family. NEON, however, has very fast horizontal operations one can use instead.
The vast majority of SSE intrinsics are converted 1:1:


Not all of which have the same capabilities.
Do the M1's SIMD units all have the same capabilities?
 

leman

macrumors Core
Original poster
Oct 14, 2008
19,521
19,677
The vast majority of SSE intrinsics are converted 1:1
Dubious statement at best. Can you quantify how many intrinsics have a one-to-one correspondence? Because just browsing through, I definitely see a lot of complex mappings. Also, what about things SSE does not have and therefore does not use, but NEON does?

Do the M1's SIMD units all have the same capabilities?

Yes, with the exception of division and special functions, which are limited to a single port. But for mul, add, and FMA, Firestorm has double the throughput of any x86 CPU, which compensates for the narrower SIMD width. Of course, the situation is complicated by the fact that Intel can do addition and multiplication on different ports, so you can do two 256-bit adds and one 256-bit mul (or vice versa) in parallel under certain circumstances, and there are differences in clocks, latencies and cache performance, but yeah… in general, I would expect that on a straightforward, well-optimized SIMD workload (AVX2 vs. NEON) the difference in performance will more or less mirror the difference in clock rate, as throughput per clock is very similar between Firestorm and modern x86 architectures.
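For a back-of-the-envelope version of that claim (using the unit counts discussed in this thread and rough clock figures, so treat the numbers as purely illustrative):

#include <cstdio>

int main() {
    // Assumed unit counts from this thread: 4 x 128-bit FMA pipes on Firestorm,
    // 2 x 256-bit FMA pipes on a modern AVX2 core.
    const double m1_bits_per_cycle  = 4 * 128;  // 512 bits of FMA per cycle
    const double x86_bits_per_cycle = 2 * 256;  // 512 bits of FMA per cycle
    const double m1_clock_ghz  = 3.2;           // approximate Firestorm clock
    const double x86_clock_ghz = 5.0;           // approximate high-end x86 boost clock

    // Equal width per cycle, so peak SIMD throughput scales with clock alone.
    std::printf("per-cycle ratio: %.2f\n", m1_bits_per_cycle / x86_bits_per_cycle);
    std::printf("peak ratio with clocks: %.2f\n",
                (m1_bits_per_cycle * m1_clock_ghz) / (x86_bits_per_cycle * x86_clock_ghz));
}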
 