GB6's multi-core score doesn't scale linearly with core count, so for those of us with tasks that do scale well across cores it isn't as good a measure as measuring our real workloads. I'm waiting to see the large-codebase compilation tests.
Sure, loads that scale linearly will be different - from the numbers already posted I'd expect pretty close to even performance between M2 Pro and M3 Pro. But it's the loads that don't scale linearly that are more interesting, from an architectural standpoint. (Obviously, if you're trying to buy something, your own use case is the most interesting.) They're also a lot more common overall.
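To make the non-linear bit concrete, here's a toy Amdahl's-law sketch (the 15% serial fraction is purely an assumed illustrative number, not anything measured from GB6):

```python
# Toy Amdahl's-law sketch: why adding cores stops paying off linearly.
# serial_fraction is an assumed illustrative value, not a GB6 figure.
def speedup(cores: int, serial_fraction: float = 0.15) -> float:
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / cores)

for n in (2, 6, 12):
    print(f"{n:2d} cores -> {speedup(n):.2f}x")  # 12 cores gives only ~4.5x
```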
 
Does Cinebench 2024 measure your workload better?
Not likely - as I said explicitly in my post, I am waiting to see if compilation shows any changes (as I know the Pro was a developer favourite at the last company I worked for) before I decide that these multi-core scores are vindication for the "Apple is always right" brigade.
 
Sure, loads that scale linearly will be different - from the numbers already posted I'd expect pretty close to even performance between M2 Pro and M3 Pro. But it's the loads that don't scale linearly that are more interesting, from an architectural standpoint. (Obviously, if you're trying to buy something, your own use case is the most interesting.) They're also a lot more common overall.
Even performance is a loss; your new generation should be unequivocally faster than the last, or why did you bother? I.e., why bother with N3B and the M3 Pro if it is such a bad process that they can't build anything better than the M2 Pro? Of course I don't believe N3B is bad - I think it is a fairly good process - I just think Apple made a bad, user-hostile set of choices with the M3 Pro CPU cores.
 
It is already confirmed that this time even the 14” M3 Max has a High Power Mode option. Either the chassis has seen some change, or the thermal behaviour of these chips has improved?
 
I am waiting to see if compilation shows any changes (as I know the Pro was a developer favourite at the last company I worked for) before I decide that these multi-core scores are vindication for the "Apple is always right" brigade.
If you are concerned about compilation, wouldn't Geekbench Clang be the benchmark that best shows your workload?

From Geekbench 6 CPU Workloads:
The Clang workload uses the Clang compiler to compile the Lua interpreter, a popular open-source language interpreter.
 
@sv8 wondered about the Ultra (and possible quad-chip M3) being on N3E. Their reasoning was incorrect, but I noticed something just a little while ago that bears on that.

Looking at the Max's die shot, where is UltraFusion?? Did they just hide it (IIRC they did in some of the early pictures of the M1 Max)? If so it would have to be at the bottom, under the GPUs and SLC/RAM controllers, which is reasonable.


Apple has been somewhat duplicitous about the Mn Max die twice before. At the introduction of the M2 Max ....



[Attachment: M2 Max die shot screenshot]




The "substantively less than truthful" pattern is pretty clear now. They can't "delight and surprise" later if they reveal it so they 'hide it' because it is a 'secret'. Doing it once that works well, Doing it twice is a 'seen that trick before' event , so a not really surprising non-trick. Doing it a third time smacks of a magican that lost their skill to do anything different. It is just a boring part of the 'act'.



But what if there's another explanation?

They've already changed their pattern (if two generations make a pattern) - the Pro and Max are clearly quite different from each other, not sharing nearly as much of their design as in previous generations.

But .... the basic floor plans of the M2 Max and M3 Max do share tons of overlap. Yes, the Pro went in a slightly different direction from its previous iteration, but the Max is sticking to the same 'formula'.

Not sharing is a double-edged sword. It means the Max loses a 'share some R&D overhead' partner. So would Apple also cut it off from the Studio deployments? If they don't (Max variants of the Studio are shared with the laptops), then the balloon squeeze pushes the 'lost my R&D overhead sharing partner' problem onto just the Ultra systems, to sort out all by their substantively lower-volume selves.

If the changes to the Pro push up monolithic-only Max sales, then dropping it from the Mac Studio could work.


Perhaps the chip that will be the basis of the Ultra is NOT the Max this generation. Maybe there's no UltraFusion connector in the M3 Max die shot because it doesn't exist.

This would be pretty surprising - I wouldn't expect them to do a whole new floorplan for chips making up an Ultra,

Not too surprising. The Pro is a smaller die (more affordable). Losing the connector would make the Max a smaller die also. (Not going to make the Max 'cheap', but the Max has more transistors than Nvidia's H100. Smaller because it's on N3B, but being on N3B costs more too.)

They don't necessarily need a "whole new floorplan". Slice the I/O interfaces off and put UltraFusion on both sides. (The 'messy' part is likely which side of the 'cut' the Display Engines land on; that would probably be the biggest 'reshuffling', so not a trivial cut. But the further away from the I/O section, the less perturbed things would get.) You could have one, two, or three of the compute-core dies, with one or two I/O dies bracketing on either side.
Just disaggregate what they already worked out in the base Max design. The center of gravity around the connectivity of the GPU cores, memory + system cache, NPU, and CPU cores could all stay the same. Only the 'top' and 'bottom' edges need to become more chiplet-friendly.


The bigger external problem for Apple, though, is that there is no access to 'bigger than two' packaging, since the AI boom has led Nvidia, AMD, and others to basically buy up all the CoWoS capacity for years. With InFO-LSI limited in size, they really can only economically get access to '2x Max-class size' die packaging. So if they can only make something in the one-reticle-size zone, single-sided UltraFusion is the cheapest, less risky path. It keeps the Max laptops + Studio + Mac Pro bundled into a larger group to aggregate costs over.

The even bigger problem is that these larger SoCs are a 'one and done' product. Once the next-gen MBP 14/16, Studio, and/or Mac Pro shows up, demand for the now-'old' M(n-1) craters like a rock. There are no 'hand me down' products to stuff these into. So if they replace them every 18-20 months ... that's it ... there are only 18-20 months to get a 100% return on investment. It isn't just the relatively low volume of the products but the relatively short lifespan also. (E.g., the M2 Max died off in the MBP 14/16 in less than a year. That isn't a sustainable thing over the long term.) The 'quick death' thing is a deep problem that seems to spur the odd chase for a Rube Goldberg 4-way system.



as it would seem to be way too low volume for that. But... they obviously know a lot more than I do; maybe they see a good reason to do this. Maybe they count it as another learning step towards a future full of high-density chiplet interconnects, and therefore worthwhile just for that.

The Mac Pro is certainly a 'way too low volume' product to support a highly forked die design. But is the Mac Studio also in that category? The Mac Studio very likely is not big enough to chop into 'two' and give one half overlap with the laptops and the other half overlap with the Mac Pro.

But if disaggregating for a good chiplet design, things should still be largely organized as if there were a magical 2x-4x bigger reticle limit and one big die. Subdividing then lets you build it from more-practical-to-make chunks and just stitch the virtual single-die network back together again. You should not be disaggregating ('slicing') across boundaries that the design made very tight and extremely highly coupled.

If it is mostly just about 'practice' for the future, then the somewhat superfluous, 'tacked on at the end' UltraFusion already was practice for two iterations. It is time to do some real disaggregation. It doesn't have to be chasing 4-way ... better disaggregation at just 2-way would be an incremental step forward.




Maybe they need to do more work because they want it to go 4-way as well as 2-way?

There is no TSMC production capacity to do a 'single unified image' 4-way ... so it is curious why it keeps popping up on the radar as a plausible factor. (The 'freight train' heading toward this CoWoS bottleneck should have been pretty visible years ago to anyone paying attention to the evolution path the AI/ML market was on.)
 
If you are concerned about compilation, wouldn't Geekbench Clang be the benchmark that best shows your workload?

From Geekbench 6 CPU Workloads:

I'm not sure how large that codebase is. In my own work I see all-core loading in bursts, so I don't know how reflective that test will be.
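Honestly, the only measure I'd trust is timing the real thing. A quick sketch (the path and build command below are placeholders for whatever your project actually uses):

```python
# Time a real parallel build instead of relying on a synthetic compile test.
# The working directory and build command below are placeholders.
import os
import subprocess
import time

cmd = ["make", f"-j{os.cpu_count()}"]  # substitute your real build command
start = time.perf_counter()
subprocess.run(cmd, cwd="/path/to/your/codebase", check=True)
elapsed = time.perf_counter() - start
print(f"full build: {elapsed:.1f}s across {os.cpu_count()} hardware threads")
```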
 
I think we should wait for benchmarks before concluding that.
I suppose - it still feels like the M3 “Pro” is less of a successor to the M2 Pro and more of an M3 Plus (from a CPU perspective).

Edit: meant this to sound more conciliatory.
 
So a MacBook with the base M3 (8 cores at 4.05 GHz and 5 W) with a single-core score of 3076 is close to the best Intel desktop system, the i9-14900K (24 cores at 6 GHz and 35 W to 78 W), at 3409? That's 7-15 times less power usage. Like Borat said: Naaajjss!

[Screenshots: Geekbench 6 results for the M3 and i9-14900K]
 
So base M3 with 8 cores at 4.05 GHz and 15W has almost the same single-core score (3076) as Intel's i9-14900K with 24 cores at 6 GHz and 125W (3121)? Like Borat said Naaajjss!
But you're comparing power usage for multi-core with scores on single-core?

Also, this run is kinda weird, as average 14900K scores seem to be around 3250-3300 single-core and 24000-25000 multi-core.
[Screenshot: Geekbench 6 results for the 14900K]
 
So base M3 with 8 cores at 4.05 GHz and 15W has almost the same single-core score (3076) as Intel's i9-14900K with 24 cores at 6 GHz and 125W (3121)? Like Borat said Naaajjss!

What I find even more impressive, though, is that the 14900K is only 60% faster for 8x more power... of course, it would be more interesting to look at a benchmark that scales well with multiple cores, like CB...
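Back-of-the-envelope with the numbers quoted above (both power figures are rough, assumed package numbers; the Intel one is a rated TDP rather than a measured draw):

```python
# Rough points-per-watt comparison using the scores and power figures
# quoted in this thread; both wattages are assumed, not measured.
chips = {
    "M3":     {"score": 3076, "watts": 15},   # base M3, quoted figure
    "14900K": {"score": 3121, "watts": 125},  # single-core score, rated TDP
}
for name, c in chips.items():
    print(f"{name}: {c['score'] / c['watts']:.0f} points/W")
# -> M3 ~205 points/W vs 14900K ~25 points/W, roughly an 8x efficiency gap
```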
 
But you're comparing power usage for multi-core with scores on single-core?

Also, this run is kinda weird, as average 14900K scores seem to be around 3250-3300 single-core and 24000-25000 multi-core.

What is the power usage of the i9-14900K in single-core, and the M3's power usage in multi-core? Even then, Intel uses at least 2x more power. Where is your link to the Geekbench scores? I took what I found reported by news media, since it's hard to find anything at the Geekbench site. That could be an overclocked system, since we don't see the CPU frequency.
 
If this is accurate, that's an absolutely insane result for a laptop.
Yeah, be really careful with this one. I posted it earlier and was informed the score is literally just made up. Now, that doesn’t mean the person who told me is correct either. It might be wise to wait for actual results.
 
What is the power usage of the i9-14900K in single-core, and the M3's power usage in multi-core? Even then, Intel uses at least 2x more power. Where is your link to the Geekbench scores? I took what I found reported by news media, since it's hard to find anything at the Geekbench site. That could be an overclocked system, since we don't see the CPU frequency.
According to AnandTech's recent review of the i9-14900K: between 50 and 70 watts on a loaded single-threaded task, with one observation of 78 watts.
[Chart: i9-14900K single-threaded power consumption, from AnandTech]

 
According to AnandTech's recent review of the i9-14900K: between 50 and 70 watts on a loaded single-threaded task, with one observation of 78 watts.

Oh wow, that's absolutely insane. One core of a modern Intel CPU uses more power than a high-end desktop multi-core CPU of 10 years ago? This doesn't make any sense.

So 3-5 times more than M3.

You mean 10 times more. Single-core power consumption of the M3 will be 5-6 watts.
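(Quick arithmetic with the AnandTech range above and that assumed 5-6 W for the M3:)

```python
# Ratio of the 14900K's loaded single-thread power (50-70 W per AnandTech)
# to an assumed 5-6 W single-core draw for the M3 (not a measured value).
intel_watts = (50, 70)
m3_watts = (5, 6)

low = intel_watts[0] / m3_watts[1]   # 50 / 6 ~ 8.3x
high = intel_watts[1] / m3_watts[0]  # 70 / 5 = 14x
print(f"Intel single-core draws roughly {low:.0f}x-{high:.0f}x the power")
```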
 
Oh wow, that's absolutely insane. One core of a modern Intel CPU uses more power than a high-end desktop multi-core CPU of 10 years ago? This doesn't make any sense.



You mean 10 times more. Single-core power consumption of the M3 will be 5-6 watts.
Crazy, isn't it? That the M series is even in the same vicinity is astounding.
 
What is the power usage of the i9-14900K in single-core, and the M3's power usage in multi-core? Even then, Intel uses at least 2x more power. Where is your link to the Geekbench scores? I took what I found reported by news media, since it's hard to find anything at the Geekbench site. That could be an overclocked system, since we don't see the CPU frequency.
https://browser.geekbench.com/search?utf8=✓&q=14900k Also, no one is saying M-series CPU power usage isn't fantastic; you are just comparing the maximum power usage stated by the producer (which is false for Intel) to scores that aren't even pushing a single core to its limit in a lot of cases.
 
Remember every stupid lawsuit about SSD or PCIe performance is one more reason they don't talk about this stuff.

There's no upside in saying that SSDs are faster if the press story for the next month is "SCANDAL! SSDs in cheap Macs are slower than in expensive Macs!" or "Experts say SSDs in Macs, even though they start fast, will slow down over five years"...

Remember all those claims that M1 SSDs were going to die soon because of supposed massive amounts of swapping? Have we seen ANY M1 SSDs die in this way? But that doesn't stop the same people who squawked loudly then from squawking just as loudly today.
Oh come on, stop being such an angry offended fanboy. Was the excess write issue overhyped? Perhaps. Was there a real problem? Yes, Apple had to fix it in an update. If they hadn't, Macs absolutely would have killed their own SSDs.

They wouldn't have all been M1 Macs either. Some of the more technical people who investigated the problem discovered that Intel Macs running the same Big Sur versions were also accumulating excessive write cycles. It was some kind of CPU-architecture-neutral logic bug in the VM pager which caused it to swap excessively even under relatively low memory pressure. The reason this got reported as an "M1 problem" was that all eyes were on the just-released M1 Macs, the first reports were from M1 users, and the tech press ran with the juicy headline, sensationalizing it as a possible problem with Apple's fancy new silicon, and didn't bother investigating further.

Does it suck that we have such terrible tech reporting? Yeah, but that doesn't mean you get to pretend there was no problem at all.

The new Max design raises some interesting questions.

Suppose the following statements are both true:
- building and testing an EUV mask set is extremely expensive (we know this!)
- it is fairly easy, in a modern fab, to set a machine to only use PART of a mask set, and when stepping, to move the wafer based on the subarea of the mask that is used, not the whole mask.
Why should we suppose the latter statement is true? I don't think step-and-scan machines work quite the way you want them to, and I don't think you thought through all the likely issues.

Then imagine we do the following. The full Max mask set includes
- a Fusion area at the very bottom of the die
- two GPU+memory areas at the bottom
- an IO area (and similar "one time" stuff: display controller, ISP, Secure Enclave, etc.) at the top.

Now we can use this single mask set to make multiple different Maxes.
- Max Ultra1 has Fusion and IO section
- Max Ultra2 has Fusion but no IO section
- Max Normal has no Fusion
- Max Minus has no Fusion AND is missing a stripe of GPU from the bottom

The details are unclear (and maybe the Max Minus does not exist as I describe it, it's always a fused or yield-salvaged Max Normal) but the geometry seems to lend itself to this idea.
If your way-out-there speculation were true, I'd expect to observe straight lines separating these segments of the Max die. There's nothing like that in the M3 Max die photo. For example, if you try to draw a horizontal line at the place where it cuts off 1/4 of the memory controller channels, not only is there no gap to be found, the line you'd draw based on memory controller channels slices right through the middle of 4 GPU cores and a bunch of GPU uncore, and makes a mess out of the SLC too.

On M1 and M2 Max, you can easily see how Apple laid out the die with a clean horizontal cut line in the artwork separating the M1/M2 Pro part of the chip from the Max extras. You should've looked for such structural clues on the M3 Max die before going off into your fantasy land.
 