
EntropyQ3

macrumors 6502a
Mar 20, 2009
718
824
Yeah, it’s hard to reduce to a single figure. Way back in the ’90s, when I was doing my PhD, I think I used Harvard Graphics to create these sorts of graphs, using one “metric”:

View attachment 1784622 View attachment 1784621

Thousands of simulations, varying all sorts of parameters, running all sorts of benchmarks, etc. No better way to get your head around these issues than to actually sit down with a blank sheet of paper and try to design a system. I had no idea how all sorts of design choices interacted until I actually had to solve the problem.

Awesome!
Skimmed a bit, and realised that it required a warm day in the shade with some beer and a cigar, and in paper form. I promise not to pester you with questions.
 

leman

macrumors Core
Original poster
Oct 14, 2008
19,521
19,677
Is Apple going to go with LPDDR5 for the M2/M1X or will they switch to GDDR6X (sweet sweet bandwidth, cheaper than HBM2)?

I doubt we will see GDDR6. It’s too hot, latency is too high, it’s just not the RAM for the job. My bet is on LPDDR and I sure hope it’s LPDDR5 this time.
 
  • Like
Reactions: jdb8167

diamond.g

macrumors G4
Mar 20, 2007
11,438
2,665
OBX
I doubt we will see GDDR6. It’s too hot, latency is too high, it’s just not the RAM for the job. My bet is on LPDDR and I sure hope it’s LPDDR5 this time.
With its giant cache, is Apple Silicon really that latency sensitive?
 

leman

macrumors Core
Original poster
Oct 14, 2008
19,521
19,677
With its giant cache, is Apple Silicon really that latency sensitive?

No idea. But the CPU has limited means of hiding latency (unlike the GPU), so significantly increasing the latency is probably not the best thing…
 

cmaier

Suspended
Jul 25, 2007
25,405
33,474
California
Awesome!
Skimmed a bit, and realised that it required a warm day in the shade with some beer and a cigar, and in paper form. I promise not to pester you with questions.
Hah! Nothing too exciting in it - this was for a very simple CPU, back before we had to worry about multiple cores, GPUs sharing memory, etc. But it does show that even for a very simple machine there are lots of things to think about.
 

cmaier

Suspended
Jul 25, 2007
25,405
33,474
California
No idea. But the CPU has limited means of hiding latency (unlike the GPU), so significantly increasing the latency is probably not the best thing…

Well, except of course that a giant cache means you don’t pay the price of that latency very often. It turns out you can afford a pretty high latency if you have a high enough cache hit rate (and trace simulations will show you that, for example, doubling the latency of RAM when you have a 90+% cache hit rate has little effect on the overall average cycles-per-instruction).

Of course, if your workload is one that is super sensitive to latency, and has a highly stochastic memory access pattern, then all of that theory does you no good.
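To put rough numbers on that point, here is the standard average-memory-access-time arithmetic as a short sketch. All cycle counts are made up for illustration, not measurements of any real chip:

```python
# Average memory access time (AMAT) for a single cache level:
#   AMAT = hit_rate * hit_time + (1 - hit_rate) * miss_penalty
# All cycle counts below are illustrative, not measurements.

def amat(hit_rate, hit_cycles, miss_penalty_cycles):
    """Average memory access time, in cycles."""
    return hit_rate * hit_cycles + (1 - hit_rate) * miss_penalty_cycles

base = amat(0.95, 4, 100)     # 0.95*4 + 0.05*100 = 8.8 cycles
doubled = amat(0.95, 4, 200)  # 0.95*4 + 0.05*200 = 13.8 cycles

# Doubling RAM latency raises the average by ~1.6x, not 2x -- and since
# only a fraction of instructions touch memory at all, the effect on
# overall cycles-per-instruction is smaller still.
print(base, doubled)
```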
 

dogslobber

macrumors 601
Oct 19, 2014
4,670
7,809
Apple Campus, Cupertino CA
There will be a new chip which can move mountains. The fact Apple hasn't updated the 16" MBP and still sells the Intel 27" iMac means they are segmenting those markets in anticipation of a new chip. Or not. They may simply be bolting two of these M1 chips together with a fast backplane technology to go multi-chip. Nothing says these machines need to have only one M1 chip.
 

cmaier

Suspended
Jul 25, 2007
25,405
33,474
California
There will be a new chip which can move mountains. The fact Apple hasn't updated the 16" MBP and still sells the Intel 27" iMac means they are segmenting those markets in anticipation of a new chip. Or not. They may simply be bolting two of these M1 chips together with a fast backplane technology to go multi-chip. Nothing says these machines need to have only one M1 chip.
Yeah, they’re not doing that.
 
  • Like
Reactions: AutisticGuy

diamond.g

macrumors G4
Mar 20, 2007
11,438
2,665
OBX
Well, except of course that a giant cache means you don’t pay the price of that latency very often. It turns out you can afford a pretty high latency if you have a high enough cache hit rate (and trace simulations will show you that, for example, doubling the latency of RAM when you have a 90+% cache hit rate has little effect on the overall average cycles-per-instruction).

Of course, if your workload is one that is super sensitive to latency, and has a highly stochastic memory access pattern, then all of that theory does you no good.
Are there any current apps that are actually latency sensitive on Apple Silicon? And would upping the clock rate of GDDR x help any?
 

cmaier

Suspended
Jul 25, 2007
25,405
33,474
California
Are there any current apps that are actually latency sensitive on Apple Silicon? And would upping the clock rate of GDDR x help any?

I don’t know. You’d have to have a pretty random memory access pattern, or one that aliases with the cache memory replacement algorithm in such a way that the cache has the wrong addresses a lot. The whole goal of cache design is to avoid that. But there are always outlier cases.
 
  • Like
Reactions: jdb8167

leman

macrumors Core
Original poster
Oct 14, 2008
19,521
19,677
And would upping the clock rate of GDDR x help any?

The simple reason why you won’t see GDDR6 in a Mac laptop is because GDDR6 uses insane amounts of power and lacks any of the advanced power-saving features LPDDR has. It’s simply not going to happen. They are not going to ship a MacBook Pro with sub 5 hours battery life.
 

Fomalhaut

macrumors 68000
Oct 6, 2020
1,993
1,724
M1X or maybe some other letter or naming scheme only Apple knows. Zero chance it’s M2.
Why do you think so? If the next Apple Silicon for Macs (presumably already in pre-production) is using a next-generation microarchitecture with common elements to the A15, then it would make sense to name it in such a way that demonstrates an incremental improvement. As A14->A15, so M1->M2.

Naming it M1X or M1 Pro or similar would imply that the next AS SoCs are a variant of the M1. This may be the case, but it seems about as likely that the next Macs will use a second-generation core architecture.

@cmaier has real-world experience with designing and releasing CPUs, and is of the opinion that the next release could realistically be a second-generation microarchitecture. It's by no means guaranteed, but I'm optimistic.
 
  • Like
Reactions: Realityck

EntropyQ3

macrumors 6502a
Mar 20, 2009
718
824
Are there any current apps that are actually latency sensitive on Apple Silicon? And would upping the clock rate of GDDR x help any?
The difficulty from a user perspective is - how would you know? The app does its thing and normally nothing outwardly demonstrates its access patterns. You need to use analytics tools when you write the code to determine these things.
I’ll say this though - when I was active writing scientific code (you have a problem and need to solve it, but typically don’t need to care about “the user experience”), I was effectively always limited by memory somehow: my more intelligent/creative code by latency, my more brute-force code by bandwidth, and speeding things up when needed always involved trying to optimize data access patterns/flow. Rarely if ever was I constrained by ALU resources; keeping the beasts fed was the problem.
(This may also be a reason you always see the same old benchmark applications used for CPU benchmarking….but that’s its own discussion.)
Which is the background to my desire to go beyond CPU core analysis for assessments of SoCs.
 
Last edited:

JouniS

macrumors 6502a
Nov 22, 2020
638
399
Are there any current apps that are actually latency sensitive on Apple Silicon? And would upping the clock rate of GDDR x help any?
As a rule of thumb, anything that requires a lot of memory is sensitive to memory latency. If caching helps on one level of the memory hierarchy, it probably helps on multiple levels. Then your data might as well reside on disk, and caches will probably handle the rest.
 

cmaier

Suspended
Jul 25, 2007
25,405
33,474
California
As a rule of thumb, anything that requires a lot of memory is sensitive to memory latency. If caching helps on one level of the memory hierarchy, it probably helps on multiple levels. Then your data might as well reside on disk, and caches will probably handle the rest.

Since most caches use a Least Recently Used cache replacement policy, it all comes down to what your memory access pattern is. If you are constantly trying to read memory addresses that you haven’t read in a long time, then you end up with a lot of costly memory accesses.
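That failure mode is easy to demonstrate with a toy LRU cache (a hypothetical sketch, not how any real hardware cache is organized): cyclically scanning just one more key than the cache holds makes every access miss.

```python
from collections import OrderedDict

class LRUCache:
    """Toy cache that evicts the least recently used key when full."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()
        self.hits = self.misses = 0

    def access(self, key):
        if key in self.data:
            self.hits += 1
            self.data.move_to_end(key)         # mark as most recently used
        else:
            self.misses += 1
            if len(self.data) >= self.capacity:
                self.data.popitem(last=False)  # evict the LRU entry
            self.data[key] = True

# Pathological pattern: cyclically scanning capacity+1 distinct keys
# defeats LRU -- each key is evicted just before it is needed again.
cache = LRUCache(4)
for _ in range(10):
    for key in range(5):   # 5 keys, capacity 4
        cache.access(key)
print(cache.hits, cache.misses)  # -> 0 50
```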
 

thenewperson

macrumors 6502a
Mar 27, 2011
992
912
It will most likely be LPDDR5
Theoretically. But as @Kung gu wrote above, LPDDR is the most likely option. It's cheaper, more flexible and supports advanced power management.
True, it's what I expect as well. I was just wondering what they'd do for bandwidth. The previous highs were HBM2 at 400 GB/s in the MBP16 and 512 GB/s in the iMac Pro. They'd probably go for quad-channel LPDDR5 to get to 64GB and ~200 GB/s, right? I'm just wondering if they'd be okay with half the max bandwidth they used to have.
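As a back-of-envelope check on those figures (the bus widths and LPDDR5/LPDDR5X speed grades below are standard JEDEC grades used as assumptions, not anything Apple has announced):

```python
# Peak theoretical bandwidth: bus width in bytes * transfer rate.
# Speed grades are standard LPDDR5 (6400 MT/s) and LPDDR5X (8533 MT/s);
# the bus widths are assumptions for the sake of the arithmetic.

def peak_bw_gbps(bus_width_bits, mt_per_s):
    """Peak theoretical bandwidth in GB/s."""
    return bus_width_bits / 8 * mt_per_s / 1000

print(peak_bw_gbps(256, 6400))  # 256-bit LPDDR5-6400   -> 204.8 GB/s
print(peak_bw_gbps(512, 8533))  # 512-bit LPDDR5X-8533  -> ~546 GB/s
```

The second line matches the 546 GB/s figure quoted for Grace below, which suggests that number assumes a 512-bit (8 x 64-bit, or 32 x 16-bit) LPDDR5X interface.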
 

leman

macrumors Core
Original poster
Oct 14, 2008
19,521
19,677
True, it's what I expect as well. I was just wondering what they'd do for bandwidth. The previous highs were HBM2 at 400 GB/s in the MBP16 and 512 GB/s in the iMac Pro. They'd probably go for quad-channel LPDDR5 to get to 64GB and ~200 GB/s, right? I'm just wondering if they'd be okay with half the max bandwidth they used to have.

It is true that on paper 200GB/s looks significantly lower than the HBM2 bandwidth used in high-end Mac GPUs, but Apple Silicon is simply less reliant on bandwidth. Apple uses large caches (even the M1 has a 16MB LLC, whereas the 5700XT has only a 4MB GPU L2 cache), compute data compression and TBDR technology to optimize memory access and bandwidth utilization. The M1 shows that they don't need ridiculous RAM bandwidth to achieve respectable results, and I am sure that the same will be true for prosumer chips, which will likely come with even more cache and advanced technology.
 
  • Like
Reactions: senttoschool

senttoschool

macrumors 68030
Nov 2, 2017
2,626
5,482
It is true that on paper 200GB/s looks significantly lower than the HBM2 bandwidth used in high-end Mac GPUs, but Apple Silicon is simply less reliant on bandwidth. Apple uses large caches (even the M1 has a 16MB LLC, whereas the 5700XT has only a 4MB GPU L2 cache), compute data compression and TBDR technology to optimize memory access and bandwidth utilization. The M1 shows that they don't need ridiculous RAM bandwidth to achieve respectable results, and I am sure that the same will be true for prosumer chips, which will likely come with even more cache and advanced technology.
This person says that Nvidia's Grace has 8 channels of LPDDR5X, which adds up to 546GB/s of bandwidth. I could see Apple going this route for Mac Pros but reducing the number of channels to 4 for MacBook Pros. Not sure how memory channels affect power usage. Eight channels of memory must use a lot of power, right?

 

quarkysg

macrumors 65816
Oct 12, 2019
1,247
841
This person says that Nvidia's Grace has 8 channels of LPDDR5X, which adds up to 546GB/s of bandwidth. I could see Apple going this route for Mac Pros but reducing the number of channels to 4 for MacBook Pros. Not sure how memory channels affect power usage. Eight channels of memory must use a lot of power, right?
The 2019 Mac Pro already sports a 6-channel ECC DDR4 memory bus, so going to 8 channels is definitely possible for Apple.
 

leman

macrumors Core
Original poster
Oct 14, 2008
19,521
19,677
This person says that Nvidia's Grace has 8 channels of LPDDR5X which adds up to 546GB/s of bandwidth. I could see Apple going this route for Mac Pros but reduce the number of channels to 4 for Macbook Pros. Not sure how memory channels affect power usage. Eight channels of memory must use a lot of power, right?

More memory channels simply mean more independent RAM chips. I assume that RAM power usage scales linearly. The M1's RAM is remarkably efficient, usually drawing only around 0.2 watts in everyday tasks. For gaming benchmarks it's around 0.7 watts. In benchmarks that really hammer the memory subsystem it can get up to 1.5 watts.

There is good reason to assume that doubling the memory channels will simply double the energy consumption, so instead of RAM drawing 0.2 watts you'd have RAM drawing 0.4 watts on average. Not a big deal on a machine with a larger battery.
 
  • Like
Reactions: EntropyQ3

diamond.g

macrumors G4
Mar 20, 2007
11,438
2,665
OBX
Oh and maybe Apple will support PCIe Gen 5 with its onboard controller in the M2? Or do we think that they will stick with Gen 3? If they stick with 3, will they update their storage controller to match current NVMe speeds on Gen 4?
 

leman

macrumors Core
Original poster
Oct 14, 2008
19,521
19,677
Oh and maybe Apple will support PCIe Gen 5 with its onboard controller in the M2? Or do we think that they will stick with Gen 3? If they stick with 3, will they update their storage controller to match current NVMe speeds on Gen 4?

Who cares, really? Apple does not use any PCIe devices, so it’s mostly about Thunderbolt, which is limited anyway. Their SSDs use a custom communication channel that’s hooked directly into the SoC, and they can make it as fast or as slow as they want. They are not limited by PCIe in this area.
 