> I feel like I got ripped off buying an M3 Pro MBP last November.

Why? Still extremely good value; an M4 Pro will not make too big of a difference.
> I feel like I got ripped off buying an M3 Pro MBP last November.

It's a never-ending cycle. My 64 GB M1 Max still runs great, and barring Apple getting the M4 Max or M5 Max to 256 GB RAM, my plan is to upgrade to the M6 or M5 if it's worth it.
> Weight storage format does not have to be the same as the internal ALU precision. You can still use bandwidth-saving quantized models and unpack the weights in the NPU. It's as you say, the NPU will be bandwidth-limited on larger models. Increasing the ALU rate is probably not very helpful in this context.

Yes, you can still unpack the weights if you have access on the NPU. It isn't free, but it would reduce latency for sure. Also, smaller models can be even more bandwidth-limited depending on architecture.
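A minimal sketch of the idea (hypothetical code; per-tensor symmetric quantization assumed): the weights live in memory as int8 plus one float scale and get "unpacked" to float right before the multiply, so you pay a little extra ALU work to move roughly a quarter of the bytes that fp32 weights would need.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical sketch: weights stored as int8 plus a single float scale
 * (symmetric, per-tensor quantization). The matrix-vector product
 * "unpacks" each weight to float right before the multiply, so memory
 * traffic is ~4x smaller than fp32 while the ALU still works in float.
 * The unpack is one extra multiply per weight, so it is not free, but
 * it trades cheap ALU work for scarce memory bandwidth. */
void matvec_int8_weights(const int8_t *w,   /* rows*cols quantized weights */
                         float scale,       /* dequantization scale        */
                         const float *x,    /* input vector, length cols   */
                         float *y,          /* output vector, length rows  */
                         size_t rows, size_t cols)
{
    for (size_t r = 0; r < rows; r++) {
        float acc = 0.0f;
        for (size_t c = 0; c < cols; c++) {
            float wf = (float)w[r * cols + c] * scale;  /* unpack to float */
            acc += wf * x[c];
        }
        y[r] = acc;
    }
}
```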
Managed to get a preliminary benchmark on this. The new "AI coprocessor" really fools the current version of Geekbench 6 and makes the single-core score unreasonably high, to the point that I doubt whether Geekbench 6 even means anything for the M4 and beyond anymore, since the composite score is biased far too much toward "AI workloads" like object detection. This might also explain why Apple's performance claim for the M4 iPad is not based on a Geekbench score: probably because the gain would look misleadingly higher than it actually is for most workloads.
Also, the first benchmark reporting 3.9 GHz is probably using an early sample or hitting some kind of detection error. The M4 in the new iPad Pro can go up to 4.4 GHz.
Oh, just found there is an entry in the public database:
Quick note on this: the GB6 CPU test does not use the ML coprocessor. That's just the CPU. If that result is the real deal, Apple just built the fastest CPU core in a consumer device.
> I feel like I got ripped off buying an M3 Pro MBP last November.

If it makes you feel better, the M2 Pro/Max buyers feel more ripped off, since the M3s came out in the same year.
> If you look at the sub-items you will notice that the object detection perf is more than doubled compared with the M3, while other items stay within the expected and more reasonable 10%-20% range. At least for the object detection workload, some magic coprocessor is being activated. I think it is inside the new AMX.

All assuming the result is legit.
> If you look at the sub-items you will notice that the object detection perf is more than doubled compared with the M3, while other items stay within the expected and more reasonable 10%-20% range. At least for the object detection workload, some magic coprocessor is being activated. I think it is inside the new AMX.

That isn't how this works. Yes, GB6 Object Detection and Background Blur are based on ML technology, but Primate Labs doesn't write any part of GB's CPU benchmarks to call into system frameworks which may then use a coprocessor to do the actual work. GB CPU is explicitly aimed at testing only the CPU, so Primate compiles open-source implementations of the algorithms they want to test for the CPU.
GB6 uses geometric mean across all the sub-benchmarks to create the final composite score.
> Yes, you can still unpack the weights if you have access on the NPU. It isn't free, but it would reduce latency for sure. Also, smaller models can be even more bandwidth-limited depending on architecture.

Uh, unpacking the weights IS free!
Still, without software this has limited use for anyone that is not Apple.
> So how much of this is GB deficiencies or being outdated for ML characterization, vs. the manufacturers (including but not limited to Apple) “gaming” their implementations to optimize their benchmark scores without caring about the resulting impact on actual performance in the field? I’ll admit that I’m prejudiced and I believe the latter is the real issue.

For the M4's case, it is the former. Adding optimizations for Int8 matrix multiply is aimed specifically at AI workloads and makes a lot of sense, because such workloads are expected to become more common.
> So how much of this is GB deficiencies or being outdated for ML characterization, vs. the manufacturers (including but not limited to Apple) “gaming” their implementations to optimize their benchmark scores without caring about the resulting impact on actual performance in the field? I’ll admit that I’m prejudiced and I believe the latter is the real issue.

So your analysis is that Apple makes deliberate changes to their HW designs to achieve better scores on GB6-ML and other benchmarks, and then…
> But GB does use the `I8MM` extension on Arm CPUs, and that feature is used for quantized machine learning workloads. This extension includes Int8 matrix multiply instructions which can be implemented in the AMX and be exposed as Arm instructions under this extension.

Another theory is that, with this generation, they really exposed the AMX instructions as SME and they are being picked up by Geekbench 6.
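For context, a minimal sketch of what FEAT_I8MM exposes at the instruction level (assuming a compiler target with the extension enabled, e.g. `-march=armv8.6-a+i8mm`; the wrapper function name is just illustrative):

```c
#if defined(__ARM_FEATURE_MATMUL_INT8)
#include <arm_neon.h>

/* Illustrative only: FEAT_I8MM adds SMMLA/UMMLA/USMMLA instructions.
 * vmmlaq_s32 multiplies a 2x8 int8 block (a) by an 8x2 int8 block
 * (b, stored as its 2x8 row-major transpose) and accumulates the
 * 2x2 int32 result into acc. A benchmark compiled with this extension
 * can run quantized int8 matmuls far faster on cores that implement it. */
int32x4_t int8_block_mma(int32x4_t acc, int8x16_t a, int8x16_t b)
{
    return vmmlaq_s32(acc, a, b);
}
#endif
```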
> Geekbench results for M4:
> Better than I thought, especially with single core.

And this is in the iPad. I wonder if there will be any differences in the MacBook Pro.
> [...] If my guessing about the integer/float test classification is right, then the final composite score will be closer to 10% over the M3 instead of the current almost 20% over the M3 if we remove this outlier test.

No, it doesn't look like it. Just eyeball the rest of the individual scores. You're still looking at >15%, assuming my eyeballs can do math well enough.
> So how much of this is GB deficiencies or being outdated for ML characterization, vs. the manufacturers (including but not limited to Apple) “gaming” their implementations to optimize their benchmark scores without caring about the resulting impact on actual performance in the field? I’ll admit that I’m prejudiced and I believe the latter is the real issue.

Apple has never been known to game benchmarks. But they are consistently working smart on top of working hard when it comes to their silicon. The marketing dept. is not above cherry-picking data on occasion (less often than you might think, though), but the engineers are laser-focused on maximizing real performance. Or rather, usually, perf/power.
We'll see. But in the meantime, why is everyone fixated on IPC?
This Twitter user seems to be saying the same thing as speculated by @Gnattu
That it's really the SME that's helping here. Granted, this person is comparing an M3 Max with fans to the fanless M4, and adjusting for frequency puts it at about a 3% IPC increase.
> But GB does use the `I8MM` extension on Arm CPUs, and that feature is used for quantized machine learning workloads. This extension includes Int8 matrix multiply instructions which can be implemented in the AMX and be exposed as Arm instructions under this extension.

I think you are leaping to extremely shaky conclusions.
No, it is not a geometric mean; it is a weighted arithmetic mean in which the integer tests have more weight. One interesting thing is, it seems like object detection is classified as an integer score, which has more weight, because it is doing "int8" arithmetic. If my guessing about the integer/float test classification is right, then the final composite score will be closer to 10% over the M3 instead of the current almost 20% over the M3 if we remove this outlier test.
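To make the aggregation argument concrete, here is a toy calculation (all numbers invented, and the 65/35 integer/float weighting is only an assumption, not a confirmed GB6 detail) showing how much one doubled sub-score moves a plain geometric mean, and how a weighted composite of section scores would be formed:

```c
#include <stdio.h>
#include <math.h>

/* Toy illustration (made-up numbers, not real GB6 sub-scores or weights):
 * how one outlier sub-score shifts a geometric mean, vs. a weighted
 * arithmetic mean of section scores. */
static double geomean(const double *v, int n)
{
    double log_sum = 0.0;
    for (int i = 0; i < n; i++)
        log_sum += log(v[i]);
    return exp(log_sum / n);
}

int main(void)
{
    /* Eight sub-scores roughly 15% above an ~3000-point baseline,
     * plus one outlier that more than doubled. */
    double with_outlier[9]    = {3450, 3400, 3500, 3480, 3420,
                                 3460, 3440, 3470, 6200};
    double without_outlier[8] = {3450, 3400, 3500, 3480, 3420,
                                 3460, 3440, 3470};

    printf("geomean with outlier:    %.0f\n", geomean(with_outlier, 9));
    printf("geomean without outlier: %.0f\n", geomean(without_outlier, 8));

    /* Hypothetical 65/35 integer/float weighting of two section scores. */
    double integer_section = 3800.0, float_section = 3450.0;
    printf("weighted composite:      %.0f\n",
           0.65 * integer_section + 0.35 * float_section);
    return 0;
}
```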
> Another theory is that, with this generation, they really exposed the AMX instructions as SME and they are being picked up by Geekbench 6.

For this to be true, GB6 would need to already be shipping alternate code paths which use SME instructions in their software. Why would they be doing this prior to the availability of any CPUs which can execute SME on the platforms they support? How would they have debugged it and made sure it worked? Look at this giant Arm ISA feature matrix:
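One way to check what a given chip actually advertises, at least on macOS (the `hw.optional.arm.*` sysctl keys below are an assumption based on recent macOS releases; `FEAT_SME` in particular may not be listed on every OS build or on iPadOS):

```c
#include <stdio.h>
#include <sys/sysctl.h>

/* Query whether the running chip advertises a CPU feature through the
 * hw.optional.arm.* sysctl namespace. Returns 1 if present, 0 if
 * reported absent, or -1 if the key does not exist on this OS. */
static int has_feature(const char *name)
{
    int value = 0;
    size_t size = sizeof(value);
    if (sysctlbyname(name, &value, &size, NULL, 0) != 0)
        return -1;
    return value ? 1 : 0;
}

int main(void)
{
    printf("FEAT_I8MM: %d\n", has_feature("hw.optional.arm.FEAT_I8MM"));
    printf("FEAT_SME:  %d\n", has_feature("hw.optional.arm.FEAT_SME"));
    return 0;
}
```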
> It will draw a fair bit less power (good for MacBooks) and be cheaper to produce, so prices should hold the line, and we might see more base RAM or storage since Apple would have more margin to work with.

Great, so this will work just dandy in an ever-luvin' 27" iMac.
> Geekbench results for M4:
> Better than I thought, especially with single core.

If true, that smokes my M1 Max on single/multi-core CPU. For my workloads, the M2 Ultra is still the king; the M3 Max is about 40% better than the M1 Max. Can't wait to see what they will do with the high-end M4 chips.