
name99

macrumors 68020
Jun 21, 2004
2,407
2,309
What would be the mechanism by which IPC is traded for higher clocks? Dougall Johnson reported that latency on some operations actually decreased rather than increased.

I would rather speculate that there is some other bottleneck in the core, maybe cache bandwidth or the number of load/store units. Should this be the case, we might see some IPC improvements in the future if Apple redesigns these parts. Another explanation is that IPC is already approaching the limits imposed by the code itself, and that higher core utilization will only be noticeable in some corner cases.



To be honest I’m getting less and less optimistic about this. But let’s see. Maybe the larger desktops will feature overclocked “Pro/Max Plus” chips 😅

It's hard to say anything informed because no one is providing the sort of information that would be required.
Why might a CPU not run at full width every cycle?
Obvious issues are:
- I-cache misses
- D-cache misses
- similarly, TLB misses
- branch mispredictions.

Those are all (in principle) easy to track; even so, I'm unaware of any serious investigation of them for Apple's designs. You can get some info on the Intel side (for example, the paper I have referred to occasionally on how a few hard-to-predict branches dominate branch mispredictions, and should be handled via other means).
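
To make "easy to track" concrete, here is a rough sketch (Python, with invented counter names and numbers rather than Apple's or Intel's actual PMU events) of the kind of slot accounting such counters enable:

Code:
# Hypothetical per-interval counters; real PMU event names differ by vendor.
counters = {
    "cycles":              1_000_000,
    "slots_per_cycle":     8,          # nominal machine width
    "uops_issued":         3_600_000,  # what actually went down the pipe
    "icache_miss_slots":     400_000,  # slots lost waiting on the I-cache
    "dcache_tlb_slots":    1_200_000,  # slots lost waiting on D-cache / TLB
    "branch_flush_slots":    900_000,  # slots thrown away on mispredicts
}

total_slots = counters["cycles"] * counters["slots_per_cycle"]
print(f"width utilization: {counters['uops_issued'] / total_slots:.1%}")
for name in ("icache_miss_slots", "dcache_tlb_slots", "branch_flush_slots"):
    print(f"{name}: {counters[name] / total_slots:.1%} of slots")

With real counters in place of the made-up numbers, this is essentially the "top-down" style of accounting Intel documents for its own chips; nothing comparable has been published for Apple's cores.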

But there are successor issues, in particular:
- there are runs of dependent instructions BUT
- these are interleaved with other independent runs of dependent instructions
In other words, the extent to which you can get these independent runs executing together (and thus utilizing your full width) depends on how far you look ahead in the instruction stream relative to current execution. This is basically a question of how deep your dispatch/issue scheduling queues are (as opposed to everything else like ROB depth).
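
To illustrate the lookahead point, here is a toy Python model (all parameters invented; nothing here pretends to resemble a real scheduler). The program is a series of dependent runs, one after another in program order, and the fraction of the machine's width you can actually use depends almost entirely on how many of the oldest un-issued instructions the scheduler can see at once:

Code:
def simulate(num_chains, chain_len, width, window, latency=3):
    """Toy OoO core. The program is `num_chains` back-to-back runs of
    `chain_len` dependent instructions (each needs the previous result in
    its run, available `latency` cycles after issue). Each cycle we issue
    up to `width` ready instructions, chosen only from the `window` oldest
    not-yet-issued ones -- a crude stand-in for scheduling-queue depth."""
    prog = [(c, i) for c in range(num_chains) for i in range(chain_len)]
    ready_at, issued, cycle = {}, set(), 0
    while len(issued) < len(prog):
        pending = [ins for ins in prog if ins not in issued][:window]
        ready = [(c, i) for (c, i) in pending
                 if i == 0 or ready_at.get((c, i - 1), float("inf")) <= cycle]
        for ins in ready[:width]:
            issued.add(ins)
            ready_at[ins] = cycle + latency
        cycle += 1
    return len(prog) / cycle

for window in (16, 32, 64, 128, 256):
    print(f"window={window:3d}  IPC={simulate(16, 16, 8, window):.2f}")

On this toy, IPC climbs steadily with window size even though the machine width and the code never change, which is the whole point: the parallelism is there, you just have to be able to see it.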

We know that, IN THEORY, there is a lot of such parallelism, but
- the last serious limit studies were done 30 years ago on designs that you wouldn't want to put into a lightbulb today
- those limit studies did a poor job of investigating THIS particular issue.


A different way to tackle the problem is by tracking instruction criticality and the consequences that flow from it. If you DO track instruction criticality, you can then start to do things like ensure that only the critical instructions sit in expensive high-performance scheduling queues, while the bulk of non-critical instructions sit in cheaper buffers. As far as I can tell (which is limited, as I say, because there do not seem to be any good studies tracking the issues that really matter), this is the next frontier in wide OoO, given that Apple has already implemented all the lower-hanging fruit.
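
As a cartoon of what "segregate by criticality" could mean, here is a small Python sketch. It uses longest-remaining-dependency-chain as the criticality metric and a two-tier structure (small fast queue, big cheap buffer); real hardware would have to estimate criticality dynamically and cheaply, and nothing here is based on any disclosed design.

Code:
from collections import defaultdict

def criticality(consumers):
    """consumers[i] = instructions that consume the result of i.
    Criticality here = length of the longest dependent chain hanging off i."""
    memo = {}
    def depth(i):
        if i not in memo:
            memo[i] = 1 + max((depth(j) for j in consumers[i]), default=0)
        return memo[i]
    return {i: depth(i) for i in consumers}

# Tiny invented dependency graph: 0 -> 1 -> 2 -> 3 is a long chain,
# 4..7 are independent one-off instructions.
consumers = defaultdict(list, {0: [1], 1: [2], 2: [3], 3: [],
                               4: [], 5: [], 6: [], 7: []})
crit = criticality(consumers)

FAST_QUEUE_SLOTS = 3
ranked = sorted(crit, key=crit.get, reverse=True)
fast_queue   = ranked[:FAST_QUEUE_SLOTS]   # expensive, aggressively scheduled
cheap_buffer = ranked[FAST_QUEUE_SLOTS:]   # larger, slower, lower power
print("critical  ->", fast_queue)
print("deferred  ->", cheap_buffer)

The interesting hardware problem, of course, is doing something like this on the fly with a few bits per instruction rather than with a whole dependency graph in hand.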

In other words, IMHO (and of course I could be wrong, but this is informed speculation, not wishful thinking)
- there remains a substantial degree of IPC improvement available
- some of this is achievable in the obvious ways (more execution units, wider decode, etc) but only with very low returns to that hardware. What's needed is not more hardware but better algorithms to utilize that hardware.
- the most important of those algorithms will track critical instructions, and segregate them from the rest of the instruction stream
- some academic work on this has been done, but far too little

Do Apple know about this? As always, who knows? There IS a very recent (as in, past month) patent for tracking critical D-cache lines (i.e. lines whose absence resulted in a substantial build-up of instructions that could not progress). Once these lines are detected, they are held onto more tightly in L1, and the criticality bit is preserved if they are moved out to L2 and SLC, so that they are preferentially retained there as well.
This suggests that Apple in fact has a first round of criticality tracking hardware implemented (or at least being designed), and the obvious next step after that is as I suggested above.
I don't think they would blindly have pursued the widening they have without being aware that, by itself, it's not enough.
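
For what it's worth, here is a toy Python version of the retention idea as I read it, with entirely invented structure: a plain LRU cache per level, a criticality bit per line, non-critical lines victimized first, and the bit travelling with the line when it is evicted to the next level.

Code:
from collections import OrderedDict

class CriticalityAwareCache:
    """Toy cache level: LRU, but prefers evicting non-critical lines, and
    forwards the criticality bit to the next level on eviction."""

    def __init__(self, capacity, next_level=None):
        self.capacity = capacity
        self.next_level = next_level
        self.lines = OrderedDict()            # addr -> is_critical, LRU order

    def insert(self, addr, critical=False):
        if addr in self.lines:
            self.lines[addr] = self.lines[addr] or critical
            self.lines.move_to_end(addr)      # refresh LRU position
            return
        if len(self.lines) >= self.capacity:
            self._evict_one()
        self.lines[addr] = critical

    def _evict_one(self):
        # Oldest non-critical line if there is one, else plain LRU.
        victim = next((a for a, c in self.lines.items() if not c),
                      next(iter(self.lines)))
        critical = self.lines.pop(victim)
        if self.next_level is not None:       # the bit travels with the line
            self.next_level.insert(victim, critical)

# Example: a 4-line "L1" backed by an 8-line "L2".
l2 = CriticalityAwareCache(capacity=8)
l1 = CriticalityAwareCache(capacity=4, next_level=l2)
l1.insert(0x100, critical=True)
for addr in (0x140, 0x180, 0x1c0, 0x200, 0x240):
    l1.insert(addr)
print("L1 keeps:", [hex(a) for a in l1.lines])   # the critical line survives
print("L2 holds:", [hex(a) for a in l2.lines])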

Is GW3 aware of this? Who knows...?

Apart from criticality, what else limits going wider?
Decode is easy, and scheduling is as easy as it currently is IF you can segregate many more pending instructions into lower-power queues. What is hard is single-cycle rename (i.e. resource allocation). My suggestion for this is a decoupling queue between Decode and Rename. This allows for, say, 10-wide decode with 8-wide allocation, given that a substantial number of decode slots fuse two operations into one.
I was surprised that Apple don't seem to have already done this with the A17/M3, but perhaps they have, and the tentative A17 explorations did not pick it up because they did not know what they were looking for?
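
Here is that decoupling in the most cartoonish possible form (Python; the widths and the fusion probability are illustrative numbers chosen so the arithmetic works out, not anything disclosed):

Code:
import random

def simulate(cycles=10_000, decode_width=10, rename_width=8,
             fuse_rate=0.4, seed=0):
    """Toy decoupling queue between decode and rename. Each cycle, up to
    `decode_width` instructions are decoded; with probability `fuse_rate`
    a decode slot fuses the next two instructions into a single rename
    slot. Rename then drains up to `rename_width` slots per cycle."""
    rng = random.Random(seed)
    queue = decoded = renamed = 0
    for _ in range(cycles):
        i = 0
        while i < decode_width:                   # decode side
            if rng.random() < fuse_rate and i + 1 < decode_width:
                queue += 1                        # pair fused into one slot
                decoded += 2
                i += 2
            else:
                queue += 1
                decoded += 1
                i += 1
        drained = min(queue, rename_width)        # rename/allocate side
        queue -= drained
        renamed += drained
    print(f"decoded {decoded / cycles:.1f} instrs/cycle, "
          f"renamed {renamed / cycles:.1f} slots/cycle, backlog {queue}")

simulate()

With fusion happening often enough, 10 decoded instructions collapse to roughly 7 rename slots per cycle in this run, so an 8-wide rename keeps up with a 10-wide decode and the queue never builds a backlog.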

So my view is that IPC is not yet mined out. But the EASY stuff is mined out: there were two decades' worth of good ideas just sitting out there, ignored by everyone (totally by Intel and AMD, and mostly by ARM) until Apple lit a fire under everyone. Going forward will require years to design, simulate, and perfect each new big idea (like criticality), and while that is going on, the best you can hope for in the intermediate designs is small tweaks, things that don't look valuable until the big picture all comes together.
 

Chancha

macrumors 68020
Mar 19, 2014
2,307
2,134
There is a newly appearing Mac15,8 with 64GB RAM (the 15,9s have 48GB); I assume it is the 14”?
 

altaic

macrumors 6502a
Jan 26, 2004
711
484
Courtesy of dhinakg on the birdsite:

Code:
CPID:8122 BORD:22 J504AP    Mac15,3    MacBook Pro (14-inch, Nov 2023) [M3]
CPID:8122 BORD:28 J433AP    Mac15,4    iMac (24-inch, 2023) [M3]
CPID:8122 BORD:2A J434AP    Mac15,5    iMac (24-inch, 2023) [M3]
CPID:6030 BORD:04 J514sAP   Mac15,6    MacBook Pro (14-inch, Nov 2023) [M3 Pro]
CPID:6030 BORD:06 J516sAP   Mac15,7    MacBook Pro (16-inch, Nov 2023) [M3 Pro]
CPID:6031 BORD:44 J514cAP   Mac15,8    MacBook Pro (14-inch, Nov 2023) [M3 Max]
CPID:6031 BORD:46 J516cAP   Mac15,9    MacBook Pro (16-inch, Nov 2023) [M3 Max]
CPID:6034 BORD:44 J514mAP   Mac15,10   MacBook Pro (14-inch, Nov 2023) [M3 Max]
CPID:6034 BORD:46 J516mAP   Mac15,11   MacBook Pro (16-inch, Nov 2023) [M3 Max]
 

Pressure

macrumors 603
May 30, 2006
5,178
1,544
Denmark
No, none of the review units come with base RAM.

Apple knows how to game this. lol.
It also won't matter with the benchmarks those influencers put out the first few days. They can barely run a full suite of Geekbench and 3DMark or explain what the scores represent.

They will talk about the color, fingerprints or how it looks as an accessory at their local Starbucks rather than doing anything serious with these overpowered machines that will be relegated to simple video editing tasks and social media posts in their hands.

People are already posting their definitive "buyers guides" without having access to the underlying hardware, it's outright pathetic that they even get clicks and exposure for posting **** and only around the time Apple releases new hardware.
 

Chancha

macrumors 68020
Mar 19, 2014
2,307
2,134
It also won't matter with the benchmarks those influencers put out the first few days. They can barely run a full suite of Geekbench and 3DMark or explain what the scores represent.

They will talk about the color, fingerprints or how it looks as an accessory at their local Starbucks rather than doing anything serious with these overpowered machines that will be relegated to simple video editing tasks and social media posts in their hands.

People are already posting their definitive "buyers guides" without having access to the underlying hardware, it's outright pathetic that they even get clicks and exposure for posting **** and only around the time Apple releases new hardware.
Even then, not giving out any base configs is Apple's way of covering itself in case some reviewer decides to speak some truth this time.

I don't know what their NDA says. It used to be that even benchmark-type apps were forbidden, but it seems they have loosened up a bit, precisely because of what you described: these general benchmarks are quite useless.
 

APCX

Suspended
Original poster
Sep 19, 2023
262
337
For anyone interested in Geekbench 5 (for the higher multi-core scaling):
Single-core: 2355, multi-core: 23013
 

leman

macrumors Core
Oct 14, 2008
19,518
19,666
Don't know if this has been posted yet, but we have a base M3 breaking 3150 in GB6 ST

 

dgdosen

macrumors 68030
Dec 13, 2003
2,817
1,463
Seattle
Still no M3 Pro benchmarks (if 15,6 and 15,7 are the identifiers)

Waiting to see them... As it stands, as a developer (and maybe one week of the year as a sound/video creator, but who am I kidding), I'm not sure if I could tell the difference in performance between:

- M2 MacBook Air 24GB/1TB
- M3 MacBook Air 24GB/1TB (future)
- M3 MacBook Pro 24GB/1TB
- M3 Pro MacBook Pro 36GB/1TB
- M2 Pro MacBook Pro 32GB/1TB
 

Kronsteen

macrumors member
Nov 18, 2019
76
66
M3 Max Metal 158466


Well, that's a tad over the M1 Ultra. I mean, not bad that in two generations the Max surpasses the Ultra.

M1 Ultra = 154110
M2 Ultra = 208621

Good to see these figures now appearing.

Assuming a similar Ultra/Max ratio to the M2, that implies an M3 Ultra score of somewhere in the region of 240,000 (assuming 80 GPU cores for the M3 Ultra).

The average of the M3 Max Metal scores currently showing on GB is around 157,000. These are all 16-CPU-core Macs, so 40 GPU cores. That's around 9% higher than the 38-core M2 Max (or about 3% adjusting for the number of cores).

For me, that's a perfectly respectable increase, given the additional functionality that has been shoehorned into the M3's GPUs. Although the GB figures are of some interest, I'll be a lot more interested to know how it performs with some real workload (and with my own code, although I really need to get round to converting from OpenCL to Metal ... 😬).

Incidentally, I think this figure of 208,621 for the M2 Ultra is a "blended" figure based on both the 60- and 76-core versions. There are distinct values for each version in the GB 'Mac benchmarks' section (220,000 for 76 cores).
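
The sums behind those estimates are trivial (Python; the 144,000 M2 Max figure is inferred from the "around 9%" comparison above rather than read off GB directly, so treat all of this as back-of-the-envelope):

Code:
m3_max_metal = 157_000   # average of M3 Max (40 GPU cores) Metal results so far
m2_max_metal = 144_000   # implied M2 Max, 38 GPU cores
m2_ultra_76  = 220_000   # GB 'Mac benchmarks' figure for the 76-core M2 Ultra

print(f"M3 Max vs M2 Max:       {m3_max_metal / m2_max_metal - 1:+.1%}")
print(f"per GPU core:           {(m3_max_metal / 40) / (m2_max_metal / 38) - 1:+.1%}")
print(f"implied M3 Ultra (80c): {m3_max_metal * m2_ultra_76 / m2_max_metal:,.0f}")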

I now just need to decide whether to succumb to temptation 👹 and upgrade my 2019 16" Intel MacBook Pro to an M3 Max or be patient 😇 and wait for an M3 Studio ....
 

bcortens

macrumors 65816
Aug 16, 2007
1,324
1,796
Canada
All evidence (of which there is very little) points to A17 and M3 sharing the same CPU and GPU cores. There are no sources saying that M3 was based on A16, just some folks making random unsubstantiated claims.
Steve Troughton-Smith is using the device family tree to speculate that it is A16 family rather than A17 (Mastodon)

Edit: to be fair he doesn't claim that the GPU cores are A16 class.
 

thunng8

macrumors 65816
Feb 8, 2006
1,032
417
Don't know if this has been posted yet, but we have a base M3 breaking 3150 in GB6 ST

New one for the base 8-core: 3162 single-core and 12k multi-core


And a new high for the Max, with 21560 in multi-core


Overall, I am really impressed with the M3 Max multi-core. I wasn't expecting such a big increase. Eagerly awaiting some Blender 4 results that take advantage of ray tracing.
 

leman

macrumors Core
Oct 14, 2008
19,518
19,666
Steve Troughton-Smith is using the device family tree to speculate that it is A16 family rather than A17 (Mastodon)

Edit: to be fair he doesn't claim that the GPU cores are A16 class.

The decisive argument against any such speculation is in the info Apple has provided about M3:

[Attached image: Apple's published M3 CPU feature summary]


The A16 CPU does not have a wider execution engine or improved branch prediction. In fact, it’s another iteration of Firestorm, tuned for higher frequencies.

M3 uses new CPU cores, and the information released from Apple is consistent with their description of A17. Therefore, I have little doubt that these are the same cores.
 

APCX

Suspended
Original poster
Sep 19, 2023
262
337
The decisive argument against any such speculation is in the info Apple has provided about M3:

View attachment 2306806

The A16 CPU does not have a wider execution engine or improved branch prediction. In fact, it’s another iteration of Firestorm, tuned for higher frequencies.

M3 uses new CPU cores, and the information released from Apple is consistent with their description of A17. Therefore, I have little doubt that these are the same cores.
Steve Troughton-Smith is using the device family tree to speculate that it is A16 family rather than A17 (Mastodon)

Edit: to be fair he doesn't claim that the GPU cores are A16 class.
If he’s claiming this, it’s pretty embarrassing tbh.

Edit: Read more of his toots and he can’t be embarrassed. Oof.
 

bcortens

macrumors 65816
Aug 16, 2007
1,324
1,796
Canada
The decisive argument against any such speculation is in the info Apple has provided about M3:

View attachment 2306806

The A16 CPU does not have a wider execution engine or improved branch prediction. In fact, it’s another iteration of Firestorm, tuned for higher frequencies.

M3 uses new CPU cores, and the information released from Apple is consistent with their description of A17. Therefore, I have little doubt that these are the same cores.
Yeah, he seems to be putting a lot of faith in the internal code name numbering scheme.
 

cpnotebook80

macrumors 65816
Feb 4, 2007
1,228
550
Toronto
I noticed that no one has mentioned the OpenCL scores on Geekbench, unless I missed it on here. I do see an OpenCL score if you search for Apple M3, and it's 92855, close to the RX 6800 XT. It is still below the M2 Ultra at 118830, so roughly 22% behind the Ultra and about 16% higher than the M2 Max score of 80000 on that same page.

For the Metal scores, I do not see any yet, but from the site the M2 Ultra is at 208621 and the M2 Max at 131851, while the leaked Metal score shows 158466, roughly 24% behind the M2 Ultra and about 20% ahead of the M2 Max. Of course we are comparing desktop versus notebook performance, but overall the GPU looks roughly 16-20% faster than the M2 Max in OpenCL and Metal, and that is presumably against the higher-end M3 Max variant.
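
The percentages come straight from the quoted scores; a quick check (Python):

Code:
# Leaked M3 Max scores versus the M2 figures quoted above.
scores = {
    "OpenCL": {"M3 Max": 92_855,  "M2 Ultra": 118_830, "M2 Max": 80_000},
    "Metal":  {"M3 Max": 158_466, "M2 Ultra": 208_621, "M2 Max": 131_851},
}
for api, s in scores.items():
    m3 = s["M3 Max"]
    print(f"{api}: {1 - m3 / s['M2 Ultra']:.0%} behind the M2 Ultra, "
          f"{m3 / s['M2 Max'] - 1:.0%} ahead of the M2 Max")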
 

Confused-User

macrumors 6502a
Oct 14, 2014
850
984
As it stands, as a developer (and maybe one week of the year as a sound/video creator, but who am I kidding), I'm not sure if I could tell the difference in performance between:

- M2 MacBook Air 24GB/1TB
- M3 MacBook Air 24GB/1TB (future)
- M3 MacBook Pro 24GB/1TB
- M3 Pro MacBook Pro 36GB/1TB
- M2 Pro MacBook Pro 32GB/1TB
Perhaps you couldn't tell the difference between a ten minute compile and an eight minute one. But if you're doing builds ten times a day, that's twenty more minutes you're not staring at a wall, playing solitaire, catching up on slack, etc. That's a roughly 4% increase in productivity, and the person paying your salary can see that...

(Yes, the numbers are made up, but the argument stands.)
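
A minimal version of that back-of-the-envelope sum, with the same made-up numbers:

Code:
builds_per_day  = 10
minutes_saved   = 10 - 8            # per build, old vs new compile time
workday_minutes = 8 * 60

saved = builds_per_day * minutes_saved
print(f"{saved} min/day saved -> {saved / workday_minutes:.1%} of the workday")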
 