> Same also applies to CPU to utilize 20 threads.

No, it doesn't.
> All the news sources I follow don't include Geekbench garbage, so it's just a habit to see the lesser garbage Cinebench R23. I'd actually prefer the industry standardize on Blender.

Probably more to do with you following terrible news sources than anything to do with Geekbench. After all, you tried to pass off hashcat as a valid metric, and also a video where someone excluded hardware acceleration on a Mac vs. a PC with hardware acceleration.
> Remember how much the Mac Pro costs... with Afterburner. And then look at the Mac Studio.

Well, I come from the Windows world and only recently bought into Macs (I have two M1s at the moment, for work and personal use), so I was mainly comparing to the cost of building a high-end workstation, where I can pick a high-end Threadripper and an entry-level GPU if I'd like, to suit my use case. It would be amazing if I had similar options on the Mac side as well (powerful CPU and entry-level GPU).
Amen
> Mac Pro was a ridiculously priced machine anyway. I mean, apart from workplaces/studios that had a very specific need to buy a macOS-based machine with those specs, no one in their right mind would buy it unless money was literally not a concern for them. I would put the Mac Pro very much in the professional space in terms of pricing, whereas the just-announced Mac Studio falls into a prosumer space (at least IMO).

On the contrary, those who purchase these high-priced machines do so specifically because money is a concern for them: they make a lot more money with better machines, so it makes perfect sense to spend the money to make more.
> There is still no evidence that this causes the rather big discrepancy. Apple is active in the Embree project, so nothing prevents them from "hand-crafting" optimizations for the M1.

Oh, come on, the Embree library is open source; there is no native NEON implementation inside it.
> I think it's more likely that Apple's SIMD instructions (which are only half as wide as Intel's AVX2) are simply not as fast, perhaps because Apple preferred to spend the chip real estate on other things (such as their highly specialized video encoders).

Wrong and wrong. Apple's SIMD is not narrower than Intel's in hardware: Intel has 2 x 256-bit AVX2 units and Apple has 4 x 128-bit NEON units, so each adds up to 512 bits of SIMD width.
> On the contrary, those who purchase these high-priced machines do so specifically because money is a concern for them: they make a lot more money with better machines, so it makes perfect sense to spend the money to make more.

I already mentioned that, if you read the part before the other part you highlighted.
> I already mentioned that, if you read the part before the other part you highlighted.

Good point, although the base M1 does provide the highest-end CPU (single-threaded) Mac has to offer. If you're getting Pro/Max/Ultra, then presumably you're looking for a machine with either a higher-end GPU or better multi-threaded performance, due to workflows that would benefit your bottom line.
> I mean, apart from workplaces/studios that had a very specific need to buy a macOS-based machine with those specs

I was responding to MayaUser to say that I would not compare the pricing model of the Mac Pro to the Mac Studio, simply because the Mac Pro was a professional device where only people making a serious ROI from the machine would buy it, whereas the Mac Studio is (relatively) cheap enough that consumers/prosumers who might not make a real ROI from the device might still want to get it. My point was that in such cases, it is a bit of a shame that if you are looking to buy the highest-end CPU Mac has to offer, you also NEED to buy the GPU even if you don't need it.
> There is still no evidence that this causes the rather big discrepancy.

If you want evidence, let me give you the evidence.
Presumably for the same reason that Apple fans simply discount it.
There is still no evidence that this causes the rather big discrepancy. Apple is active in the Embree project, so nothing prevents them from "hand-crafting" optimizations for the M1. I think it's more likely that Apple's SIMD instructions (which are only half as wide as Intel's AVX2) are simply not as fast, perhaps because Apple preferred to spend the chip real estate on other things (such as their highly specialized video encoders).
> Apple is active in the Embree project, so nothing prevents them from "hand-crafting" optimizations for the M1.

I have a pile of issues of a certain journal on my desk that I intend to read. I actively read them from time to time. Certainly nothing prevents me from reading all of them. And yet the pile grows larger every week. I wonder how that's possible.
> And rewriting everything in NEON on a project owned by a competitor is an unreasonable waste of resources.

This is not necessarily the case, though. Embree is an open-source project licensed under the Apache License 2.0; Apple can also use this library if they want, and they can even make a fork if Intel really doesn't want to cooperate. If it has real-world benefit beyond making benchmark numbers look prettier, they can, and in my opinion should, provide an optimized NEON implementation. Vendors contribute their code to many math and crypto libraries, where platform-dependent code paths are all over the place and the generic implementation is usually too slow.
> This is not necessarily the case, though. Embree is an open-source project licensed under the Apache License 2.0; Apple can also use this library if they want, and they can even make a fork if Intel really doesn't want to cooperate. If it has real-world benefit beyond making benchmark numbers look prettier, they can, and in my opinion should, provide an optimized NEON implementation. Vendors contribute their code to many math and crypto libraries, where platform-dependent code paths are all over the place and the generic implementation is usually too slow.

The "gotcha" here is that the anti-Apple crowd would then say that Apple is somehow cheating in benchmarks by optimizing for their platform. As usual, it's a lose-lose situation.
> The "gotcha" here is that the anti-Apple crowd would then say that Apple is somehow cheating in benchmarks by optimizing for their platform. As usual, it's a lose-lose situation.

I don't think "optimization" is "cheating" if the workload can indeed be optimized this way. If I have a faster method to solve a math problem and get the right answer, would my math teacher call it cheating? Not at all.
> Mac13,2 - Geekbench: Benchmark results for a Mac13,2 with an Apple M1 Ultra processor. (browser.geekbench.com)

If true, all Alder Lake benchmark wars will switch to single-core perf.
> If true, all Alder Lake benchmark wars will switch to single-core perf.

Which is expected, as the Firestorm core design does not allow it to be clocked significantly higher.
> I don't think "optimization" is "cheating" if the workload can indeed be optimized this way. If I have a faster method to solve a math problem and get the right answer, would my math teacher call it cheating? Not at all.

I don't disagree with you; I'm just extrapolating the arguments I've seen in the past, e.g. arguments along the lines of: "M1-based machines are only better at ProRes workflows because they're cheating."
> I don't disagree with you; I'm just extrapolating the arguments I've seen in the past, e.g. arguments along the lines of: "M1-based machines are only better at ProRes workflows because they're cheating."

Yes, for all workloads they perform well at, they are cheating. Only the ones they suck at reflect the real performance. The M1 series is simply that bad. 😆
Also, your math teacher would likely be reasonable, unlike a lot of the rhetoric we see here.
> Oh, come on, the Embree library is open source; there is no native NEON implementation inside it.

Embree uses the SSE2NEON macro library, which converts SSE intrinsics to NEON intrinsics at compile time. There is no emulation as long as you don't use vectors wider than 128 bits (which NEON does not support).
> Wrong and wrong. Apple's SIMD is not narrower than Intel's in hardware: Intel has 2 x 256-bit AVX2 units and Apple has 4 x 128-bit NEON units, so each adds up to 512 bits of SIMD width.

Recent Intel CPUs (including Alder Lake) actually have 3 ports with vector ALUs. But even if what you wrote were accurate, you can't just equate 2 x 256-wide with 4 x 128-wide, since many other factors determine SIMD performance (such as load throughput).
> If you want evidence, let me give you the evidence.

All that shows is that the NEON SIMD instructions are more limited. Obviously AVX2 (256-bit) intrinsics need to be emulated, because NEON is limited to 128-bit vectors. But you can also choose SSE (128-bit) instructions as the target during compilation, in which case they are converted at compile time and no emulation takes place.
Below is from the patch that implements Apple silicon support. This is only the CMake config, not the implementation, so it is readable enough for those without a programming background. When building for arm64 on Mac, Embree emulates AVX2 on top of NEON instead of implementing a native NEON code path.
[Attachment 1969913: Embree CMake configuration]
> Of course there is emulation. SSE2NEON emulates ISA semantics. Not all things have a one-to-one mapping, and many algorithms would be developed differently for SSE and NEON to achieve optimal results. For example, my SSE-optimized geometry primitive code heavily relies on PMOVMSKB, but that approach doesn't work at all for ARM, which lacks this instruction family. NEON, however, has very fast horizontal operations one can use instead.

The vast majority of SSE intrinsics are converted 1:1:
> Not all of which have the same capabilities.

Do the M1's SIMD units all have the same capabilities?