
nquinn

macrumors 6502a
Jun 25, 2020
829
621
I happen to have a relevant graph at hand...



Single-core performance has been improving at a very stable rate.

Interesting. I was glancing through some quick searches on geekbench and saw some 40% numbers, but maybe those aren't the best samples.

+20-25% this round would be nice but obviously isn't a massive game changer. I mostly just want something cool/quiet!
 

leman

macrumors Core
Original poster
Oct 14, 2008
19,521
19,677
+20-25% this round would be nice but obviously isn't a massive game changer. I mostly just want something cool/quiet!

It would be for me. 20% more ST performance on top of a faster memory interface will translate to at least 10-15% faster runtime for my work, which is huge.
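A quick Amdahl-style sketch of where a 10-15% figure like that can come from (the 70/30 workload split below is an assumption purely for illustration, not a measurement):

```python
# Back-of-envelope runtime estimate. The workload split is an assumed
# number for illustration only.

cpu_fraction = 0.7   # assumed share of runtime that scales with ST speed
st_speedup = 1.20    # the +20% single-thread gain discussed above

# Only the CPU-bound fraction shrinks; the rest (memory/IO-bound) does not.
new_time = cpu_fraction / st_speedup + (1 - cpu_fraction)
print(f"{(1 - new_time) * 100:.0f}% faster runtime")  # about 12%
```

If the memory-bound remainder also speeds up with the faster memory interface, the total gain climbs further, which is why 10-15% is a conservative floor.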
 

Fomalhaut

macrumors 68000
Oct 6, 2020
1,993
1,724
Interesting. I was glancing through some quick searches on geekbench and saw some 40% numbers, but maybe those aren't the best samples.

+20-25% this round would be nice but obviously isn't a massive game changer. I mostly just want something cool/quiet!
Bear in mind that this is 20-25% in single-core performance on top of what is already one of the best single-core performance benchmarks of *any* current CPU.

We could expect an 80-95% improvement in multi-core performance if there were 8 performance cores, even if they use the same first-generation M1 core microarchitecture. If we get 8 cores of a second-generation design, then we could easily have double the multi-core performance of current M1 models, which is a big deal for heavily multi-threaded applications.

That could put an 8+2 core MBP into the same performance bracket as a 16-core 2019 Mac Pro (c. 15,000 GB5) - which would be astonishing! https://browser.geekbench.com/mac-benchmarks
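The core-count arithmetic behind estimates like this can be sketched as follows. Every number here is an assumption for illustration (including the E-core weighting), not real benchmark data:

```python
# Rough multi-core scaling sketch — all figures are assumptions.

m1_multicore = 7500                  # assumed M1 GB5 multi-core ballpark
e_core_weight = 0.25                 # assume an E core ~ 1/4 of a P core

p_equiv_m1 = 4 + 4 * e_core_weight   # M1: 4P + 4E -> 5.0 P-core equivalents
per_p_core = m1_multicore / p_equiv_m1

p_equiv_new = 8 + 2 * e_core_weight  # hypothetical 8P + 2E chip -> 8.5
estimated_new = per_p_core * p_equiv_new

print(round(estimated_new))          # 12750, i.e. +70% over the assumed M1
```

With a lower E-core weight, or with per-core gains from a newer design, the same arithmetic lands in the 80-100% range discussed above.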
 
  • Like
Reactions: Roode

Jorbanead

macrumors 65816
Aug 31, 2018
1,209
1,438
I think that Apple would want more single core performance for their pro machines, and it is questionable whether Firestorm can be reliably clocked higher.
So the only solution to this then would be using 5nm+ cores, which means if these are going into MBP soon they would have already hit production earlier this year. It just seems odd we haven’t heard anything about 5nm+ chips until very recently. They would also be the very first 5nm+ chips anywhere.
 

leman

macrumors Core
Original poster
Oct 14, 2008
19,521
19,677
So the only solution to this then would be using 5nm+ cores, which means if these are going into MBP soon they would have already hit production earlier this year. It just seems odd we haven’t heard anything about 5nm+ chips until very recently. They would also be the very first 5nm+ chips anywhere.

No, another solution would be to use a different prosumer-oriented microarchitecture that delivers more performance at the expense of a small increase in power usage. That way they can have faster, more power-hungry chips for larger machines and slower (but still very fast) and more efficient chips for mobile and entry-level. That’s the core of my speculation.
 

cmaier

Suspended
Jul 25, 2007
25,405
33,474
California
So the only solution to this then would be using 5nm+ cores, which means if these are going into MBP soon they would have already hit production earlier this year. It just seems odd we haven’t heard anything about 5nm+ chips until very recently. They would also be the very first 5nm+ chips anywhere.

They are in production on 5nm+, but that’s beside the point. You don’t need to move to a new node (and 5nm+ isn’t really even much of a difference) in order to create a faster core. Each time I’ve worked on the design of a new core, what we did was start with the SPICE models for the old node and design the chip with an aim to be 10%-20% faster even if the node didn’t change. That way the new node is all bonus.

You do things like figure out where bottlenecks are by profiling real code running on real machines, then figure out how to solve them. So maybe you redesign multipliers to take 1 fewer cycle. Or you add a bigger reorder buffer, or deeper reservation stations. Or a new branch prediction algorithm. Or you fetch a bigger block of instructions in order pre-decode more at once so you can issue more in parallel. Maybe you add an extra pipeline to each core. Or figure out a way to hide pipeline bubbles of a certain type. There are literally a million things you can do, with the overall goal of increasing IPC, and reducing critical paths to allow increased frequency as well. Not to mention things you can do to further reduce power (which can then be dedicated to performance).
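As one concrete illustration of the branch-prediction point, here is a toy 2-bit saturating-counter predictor — a textbook scheme, not Apple's actual design:

```python
# Toy 2-bit saturating-counter branch predictor (textbook scheme,
# shown purely to illustrate the technique mentioned above).

class TwoBitPredictor:
    # States: 0,1 predict not-taken; 2,3 predict taken.
    def __init__(self):
        self.state = 1  # start weakly not-taken

    def predict(self):
        return self.state >= 2

    def update(self, taken):
        if taken:
            self.state = min(3, self.state + 1)
        else:
            self.state = max(0, self.state - 1)

# A loop branch: taken 9 times, then falls through once at loop exit.
p = TwoBitPredictor()
history = [True] * 9 + [False]
mispredicts = 0
for actual in history:
    if p.predict() != actual:
        mispredicts += 1
    p.update(actual)
print(mispredicts)  # 2: the first taken branch and the loop exit
```

The hysteresis is the point: a 1-bit scheme would also mispredict on re-entering the loop, while the 2-bit counter only pays for the exit.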
 

EntropyQ3

macrumors 6502a
Mar 20, 2009
718
824
They are in production on 5nm+, but that’s beside the point. You don’t need to move to a new node (and 5nm+ isn’t really even much of a difference) in order to create a faster core. Each time I’ve worked on the design of a new core, what we did was start with the SPICE models for the old node and design the chip with an aim to be 10%-20% faster even if the node didn’t change. That way the new node is all bonus.

You do things like figure out where bottlenecks are by profiling real code running on real machines, then figure out how to solve them. So maybe you redesign multipliers to take 1 fewer cycle. Or you add a bigger reorder buffer, or deeper reservation stations. Or a new branch prediction algorithm. Or you fetch a bigger block of instructions in order pre-decode more at once so you can issue more in parallel. Maybe you add an extra pipeline to each core. Or figure out a way to hide pipeline bubbles of a certain type. There are literally a million things you can do, with the overall goal of increasing IPC, and reducing critical paths to allow increased frequency as well. Not to mention things you can do to further reduce power (which can then be dedicated to performance).
While the discussion on forums is often about the core in isolation, it would seem that Apple is pretty accomplished when it comes to connecting the greater whole. Not just the cache and memory hierarchy as seen from the cores, but dealing with all the different functional units and their demands on common resources, differences in access patterns, avoiding/mitigating interlocks, communication between cores (and other units), ... the list of arcana is long.

What knobs might Apple tweak here, and how might increasing the number of cores and other functional units shift the optimum of the compromises involved?
 

pshufd

macrumors G4
Oct 24, 2013
10,149
14,574
New Hampshire
The M1 is enough for the vast majority of Apple customers right now, and their sales records confirm that. What I would like is an M1X that has enough CPU and GPU for my needs, probably for another five years unless I take on some new hobby that uses a lot of CPU and GPU. I think that the range discussed by the leakers will keep their high-end customers happy for a while. Maybe not the highest end that use the Mac Pro. But most of the people around here want more, and I think that the M1X will provide that.

AMD and Intel will take some time to catch up so I don't think that Apple has to be in a rush to keep cranking it up.

But they will anyways. And that's why I think that Apple will be a monster in the PC world.
 
  • Like
Reactions: ader42

Spindel

macrumors 6502a
Oct 5, 2020
521
655
I'm late to the party and haven't read the entire thread so some of the things I say are probably already mentioned.

For the higher-tier machines that are coming I don't think we will see a meaningful IPC increase, but we might see somewhat higher clocks that improve single-threaded performance a bit at the same IPC as the M1.

If we're talking higher-tier MBPs and presumably a larger iMac, they will have 6-8 high-powered cores and 4 low-powered cores. These machines will also get 12-16 GPU cores.

Probably 32 GB and 64 GB RAM options.
 

cmaier

Suspended
Jul 25, 2007
25,405
33,474
California
While the discussion on forums are often about the core in isolation, it would seem that Apple is pretty accomplished when it comes to connecting the greater whole. Not just the cache and memory hierarchy as seen from the cores, but in dealing with all the different functional units and their demands on common resources, differences in access patterns, avoiding/mitigating interlocks, communication between cores (and other units), ... the list of arcana is long.

What knobs might Apple tweak here, and how might increasing the number of cores and other functional units shift the optimum of the compromises involved?

Again, many possibilities. I spent four years of my life on optimizing cache memory hierarchies for one simple chip, and we didn’t even have to think about GPUs, AI cores, etc. Assuming they stick with UMA, they can play around with the size, organization, and bus characteristics of the shared system cache, state synchronization techniques for non-shared caches, techniques for shifting computations between cores to avoid local hot-spots, separate engines for specific types of computations to relieve the cores, etc.
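The kind of parameter sweep described here can be sketched with a toy direct-mapped cache model. The sizes and the access trace are made up for illustration:

```python
# Minimal direct-mapped cache model, to illustrate sweeping size/
# organization parameters. All numbers are illustrative.

def hit_rate(addresses, num_lines, line_size=64):
    cache = [None] * num_lines           # one tag slot per line
    hits = 0
    for addr in addresses:
        block = addr // line_size
        index = block % num_lines        # direct-mapped placement
        tag = block // num_lines
        if cache[index] == tag:
            hits += 1
        else:
            cache[index] = tag           # fill on miss
    return hits / len(addresses)

# Sequential scan over 32 KB (512 lines of 64 B), done twice.
trace = [i * 64 for i in range(512)] * 2

small = hit_rate(trace, num_lines=256)   # 16 KB: the scan thrashes it -> 0%
large = hit_rate(trace, num_lines=512)   # 32 KB: second pass all hits -> 50%
```

Run thousands of traces against thousands of such configurations (plus associativity, line size, replacement policy, ...) and you get exactly the kind of design-space exploration described above.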
 

quarkysg

macrumors 65816
Oct 12, 2019
1,247
841
Again, many possibilities. I spent four years of my life on optimizing cache memory hierarchies for one simple chip, and we didn’t even have to think about GPUs, AI cores, etc. Assuming they stick with UMA, they can play around with the size, organization, and bus characteristics of the shared system cache, state synchronization techniques for non-shared caches, techniques for shifting computations between cores to avoid local hot-spots, separate engines for specific types of computations to relieve the cores, etc.
I would assume Apple has probably invested a fair chunk of the die real estate towards synchronizing the cache between all the processing cores. Maintaining coherency between the 8 CPU cores probably is already challenging. Having to do it for the GPU, NPU and ISP cores as well would be a nightmare while guaranteeing low latency.
 

cmaier

Suspended
Jul 25, 2007
25,405
33,474
California
I would assume Apple has probably invested a fair chunk of the die real estate towards synchronizing the cache between all the processing cores. Maintaining coherency between the 8 CPU cores probably is already challenging. Having to do it for the GPU, NPU and ISP cores as well would be a nightmare while guaranteeing low latency.

Cache coherency protocols are fun. Lots of ways to do it. Write-locks, write-throughs, broadcasts, etc. Not sure if the auxiliary units (NPU, etc.) have direct memory access - they may have to transfer their operands and results via a bus to a core which directs them. Again, millions of design choices.
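To make the broadcast-invalidate idea concrete, here is a drastically simplified MSI-style sketch for two snooping caches. This is illustrative only; real protocols (MESI, MOESI, directory-based) add more states and an actual bus:

```python
# Toy MSI coherence sketch: two caches tracking one cache line,
# with broadcast invalidation. Purely illustrative.

M, S, I = "Modified", "Shared", "Invalid"

class Cache:
    def __init__(self):
        self.state = I

    def read(self, others):
        if self.state == I:
            for o in others:             # snoop: a dirty owner must demote
                if o.state == M:
                    o.state = S          # (write-back elided in this toy)
            self.state = S

    def write(self, others):
        for o in others:                 # broadcast invalidate to peers
            o.state = I
        self.state = M

c0, c1 = Cache(), Cache()
c0.write([c1])   # c0: Modified, c1: Invalid
c1.read([c0])    # c0 demoted to Shared, c1: Shared
```

Even this toy shows the tension: every write costs a broadcast, which is why the design choices (write-locks, write-through, directories) multiply as core counts grow.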
 

macOS Lynx

macrumors 6502
Jun 3, 2019
386
555
Here are my personal thoughts:

Every computer that has an M1 processor replaced a previous model variant. The Air replaced the Air (obviously), the Mac mini replaced the quad-core Mac mini, but most importantly, the 13" Pro only replaced the 2 port 13" Pro model.

So, we have the 27" iMac, 6-core Mac mini, 4-port 13" Pro, 16" Pro, and Mac Pro left to be updated. Apple also said during the announcement of the M1 that they were making a family of SoCs. Considering the emphasis on how the M1 is "low-power" silicon, I think we're going to see two more chips - the M1X/M2/M1 Pro serves as a high-performance version of the M1, and there will be a second chip that's even higher performance.

I think it's pretty obvious the 4 port 13" and the 16" will get replaced by the upcoming 14" and 16" using the mid-tier chip. I also think it's pretty safe to say the Mac Pro will use the highest end chip.

For the 27" iMac, I think we have two options. Apple is either going to do what they used to do with the iMac G4 and G5, and announce a new screen size mid-cycle, or there will be a dedicated redesign for a 27" replacement (maybe 32"?), that uses either the M1 or M1X.

The Mac mini I think will also have two possibilities. The first is it's killed off, with only the regular M1 mini available, or there's a new Mac mini "pro" (Maybe Mac Pro Cube?).
 

EntropyQ3

macrumors 6502a
Mar 20, 2009
718
824
Again, many possibilities. I spent four years of my life on optimizing cache memory hierarchies for one simple chip, and we didn’t even have to think about GPUs, AI cores, etc. Assuming they stick with UMA, they can play around with the size, organization, and bus characteristics of the shared system cache, state synchronization techniques for non-shared caches, techniques for shifting computations between cores to avoid local hot-spots, separate engines for specific types of computations to relieve the cores, etc.
Yeah. I already gathered that complex stuff is complex. Or as you put it re: cache coherency schemes - “fun”.

Thing is, while these messy interconnect parts are one hell of a lot harder than core performance to quantify or evaluate with a benchmark run, I also have the feeling that they are quite important. It’s just that they are difficult to test and too complex to reduce to a single figure of merit to be argued about. Maybe these issues really are just for those in the trade. But I would like to gain a better understanding. (While still doing my job and raising my kids. So - after I’m pensioned?)
 
  • Haha
Reactions: BigSplash

cmaier

Suspended
Jul 25, 2007
25,405
33,474
California
Yeah. I already gathered that complex stuff is complex. Or as you put it re: cache coherency schemes - “fun”.

Thing is, while these messy interconnect parts are one hell of a lot harder than core performance to quantify or evaluate with a benchmark run, I also have the feeling that they are quite important. It’s just that they are difficult to test and too complex to reduce to a single figure of merit to be argued about. Maybe these issues really are just for those in the trade. But I would like to gain a better understanding. (While still doing my job and raising my kids. So - after I’m pensioned?)

Yeah, it’s hard to reduce to a single figure. Way back in the 90’s, when I was doing my PhD, I think I used Harvard Graphics to create these sorts of graphs, using one “metric”:

[attached: two graphs of simulation results]

Thousands of simulations, varying all sorts of parameters, running all sorts of benchmarks, etc. No better way to get your head around these issues than to actually sit down with a blank sheet of paper and try and design a system. I had no idea how all sorts of design choices interacted until I actually had to solve the problem.

 

cmaier

Suspended
Jul 25, 2007
25,405
33,474
California
M1X or maybe some other letter or naming scheme only Apple knows. Zero chance it’s M2.
My sources say that all M1 variations (i.e. anything with Firestorm/Icestorm, for Macs and iPads) are just M1. And the chips for the new Macs announced next week, supposedly, do not have Firestorm/Icestorm. So M2. And in September, Apple will announce new iPhones with “the same powerful processor technology found in Macs!”
 

leman

macrumors Core
Original poster
Oct 14, 2008
19,521
19,677
So, I've tried to add everyone's predictions to the first post, please ping me if I have left you out!
 

diamond.g

macrumors G4
Mar 20, 2007
11,438
2,665
OBX
No, there is not. There are actually major advantages in keeping CPU and GPU closely integrated (which is why consoles are designed that way).
The consoles use an APU cause they get the CPU and GPU from the same manufacturer (cause AMD is giving them a sweet deal). Prior to the 8th gen the CPU and GPU came from separate vendors and thus were separate chips.
 

thenewperson

macrumors 6502a
Mar 27, 2011
992
912
We have iPad Pro with an M1, so next an iPhone Pro with an M2...
I think that'd be a waste. It'd likely have to be downclocked to fit within the power profile of the iPhone. Except if M2 = renamed A15, but that seems unnecessary.
 

senttoschool

macrumors 68030
Nov 2, 2017
2,626
5,482
The consoles use an APU cause they get the CPU and GPU from the same manufacturer (cause AMD is giving them a sweet deal). Prior to the 8th gen the CPU and GPU came from separate vendors and thus were separate chips.
I think what @leman is trying to say is that AMD could have chosen to separate out the CPU and GPU in PS4/PS5/Xbox One/Xbox Series X. But AMD didn't. AMD chose to create an APU instead of separating out the two.
 

leman

macrumors Core
Original poster
Oct 14, 2008
19,521
19,677
The consoles use an APU cause they get the CPU and GPU from the same manufacturer (cause AMD is giving them a sweet deal). Prior to the 8th gen the CPU and GPU came from separate vendors and thus were separate chips.

I am not talking about APUs. I am talking about unified memory systems. Look at Xbox 360 for example. Unified memory simplifies the programming model and enables better cooperation between the CPU and the GPU.
 

senttoschool

macrumors 68030
Nov 2, 2017
2,626
5,482
Some are saying it's taken them too long to go from M1 to M1X, so therefore it must be a new chip not based off of M1, but Apple hasn't always released an X-variant of their chips within months of its sibling. For example:

A5 - March 11, 2011
A5X - March 16, 2012 (1 year later)
A10 - Sept 16, 2016
A10X - June 13, 2017 (7 months later)
The thing is... the M1 is likely the A14X already.

The SoCs going into the upcoming MBPs are likely to be wholly different.
 
  • Like
Reactions: AutisticGuy

diamond.g

macrumors G4
Mar 20, 2007
11,438
2,665
OBX
I am not talking about APUs. I am talking about unified memory systems. Look at Xbox 360 for example. Unified memory simplifies the programming model and enables better cooperation between the CPU and the GPU.
The main (slow) RAM was unified, but there was a separate pool of eDRAM for the 360 as well. IIRC the One S brought eDRAM back to improve performance.

Yes unified memory is ideal.


Is Apple going to go with LPDDR5 for the M2/M1X or will they switch to GDDR6X (sweet sweet bandwidth, cheaper than HBM2)?
 