Is there something credible out there to support this? Genuinely curious. I'd love to see Apple put out such awesome GPU power, but I have my doubts that they can run circles around Nvidia the way they have around Intel.
You mean other than benchmarks and GPU specs? Not much. My post was based on available graphical benchmarks, in particular the new 3DMark Wild Life, which runs natively on both M1 machines and Windows PCs. The M1 scores around 5000 points in that benchmark, while a desktop RTX 3060 scores about 18000. Quadruple the GPU cores, assume linear performance scaling (as GPUs usually do, especially when they are this power efficient), and that's roughly what you get.
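As a back-of-the-envelope check of that extrapolation (just a sketch, using the approximate scores quoted above rather than fresh runs):

```swift
// Rough version of the scaling argument, using the approximate Wild Life scores above.
let m1Score = 5_000.0          // 3DMark Wild Life, 8-core M1 GPU (approximate)
let rtx3060Score = 18_000.0    // 3DMark Wild Life, desktop RTX 3060 (approximate)

// Quadrupling the M1's 8 GPU cores to 32, assuming perfectly linear scaling.
let projected32Core = m1Score * (32.0 / 8.0)
print(projected32Core, projected32Core / rtx3060Score)   // 20000.0, ~1.1x the RTX 3060 score
```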
What would be their fundamental design change that would allow such a step up?
No design change needed, you just use more GPU cores. Graphics (and compute) is a parallelizable task: you can shade more triangles and process more compute threads in parallel by using more cores (this is an oversimplification, but it gives you the idea).
A single M1-based G13 GPU core consists of four 32-wide FP32 ALUs, or 128 "compute units" per GPU core; the M1 has 8 such cores, for 1024 units in total. Apple's shader units are capable of a single floating-point or integer operation per clock. The GPU in the M1 runs at approximately 1.26 GHz, which gives it a peak compute throughput of 2.6 TFLOPS (1024 compute units * 1.26 GHz * 2, where the 2 accounts for fused multiply-add, a single operation that counts as two FLOPs).
To compare this with Nvidia Ampere: its "compute unit" (CUDA core) can do either an FP32+INT32 pair or two FP32 operations per clock. Take an RTX 3060, which has somewhere around 1920 of these units (it was never really clear to me), with a boost frequency of 1.77 GHz, and you get a peak throughput of 1920 * 2 (two FP32 per clock) * 1.77 GHz * 2 (again, fused multiply-add) ~ 13.5 TFLOPS, which is the number you can see in Nvidia's marketing material.
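Spelling that arithmetic out (a sketch using only the figures above; the 1920-unit count for the RTX 3060 is the rough estimate mentioned, not an official spec):

```swift
// Peak FP32 throughput = lanes * FP32 ops per lane per clock * clock (GHz) * 2 (FMA counts as two FLOPs).
func peakTFLOPS(lanes: Double, fp32OpsPerClock: Double, clockGHz: Double) -> Double {
    lanes * fp32OpsPerClock * clockGHz * 2.0 / 1_000.0   // GFLOPS -> TFLOPS
}

// M1 / G13: 8 GPU cores * 4 ALUs * 32 lanes = 1024 lanes, one FP32 op per lane per clock, ~1.26 GHz.
let m1Peak = peakTFLOPS(lanes: 8 * 4 * 32, fp32OpsPerClock: 1, clockGHz: 1.26)   // ~2.58

// RTX 3060 (Ampere), per the figures above: ~1920 units that can dual-issue FP32, ~1.77 GHz boost.
let rtxPeak = peakTFLOPS(lanes: 1920, fp32OpsPerClock: 2, clockGHz: 1.77)        // ~13.6

print(m1Peak, rtxPeak)
```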
Of course, these numbers should be taken with a big grain of salt, as they represent theoretical maximums that will never be reached under normal conditions. I did reach the declared 2.6 TFLOPS on the Apple GPU using a specially written compute shader that did nothing but a long chain of multiply-adds, but that's not what real-world code looks like.
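A microbenchmark along those lines can be sketched like this (this is not my exact shader; the kernel body, thread count, and iteration counts here are illustrative, and error handling is reduced to `try!`):

```swift
import Metal

// Each thread runs a long dependent chain of fused multiply-adds.
// Writing the result out prevents the compiler from removing the loop.
let source = """
#include <metal_stdlib>
using namespace metal;

kernel void fma_chain(device float *out [[buffer(0)]],
                      uint gid [[thread_position_in_grid]])
{
    float x = 1.0f + float(gid) * 1e-9f;
    const float a = 1.0000001f;
    const float b = 1e-7f;
    for (uint i = 0; i < 4096; ++i) {
        x = fma(x, a, b);
        x = fma(x, a, b);
        x = fma(x, a, b);
        x = fma(x, a, b);
    }
    out[gid] = x;
}
"""

let device = MTLCreateSystemDefaultDevice()!
let library = try! device.makeLibrary(source: source, options: nil)
let pipeline = try! device.makeComputePipelineState(function: library.makeFunction(name: "fma_chain")!)
let queue = device.makeCommandQueue()!

let threadCount = 1 << 20
let buffer = device.makeBuffer(length: threadCount * MemoryLayout<Float>.stride,
                               options: .storageModeShared)!

let commands = queue.makeCommandBuffer()!
let encoder = commands.makeComputeCommandEncoder()!
encoder.setComputePipelineState(pipeline)
encoder.setBuffer(buffer, offset: 0, index: 0)
encoder.dispatchThreads(MTLSize(width: threadCount, height: 1, depth: 1),
                        threadsPerThreadgroup: MTLSize(width: 256, height: 1, depth: 1))
encoder.endEncoding()
commands.commit()
commands.waitUntilCompleted()

// 4096 iterations * 4 FMAs * 2 FLOPs per FMA, per thread.
let flops = Double(threadCount) * 4096 * 4 * 2
let seconds = commands.gpuEndTime - commands.gpuStartTime
print("\(flops / seconds / 1e12) TFLOPS")
```

With nothing but FMAs in flight and enough threads to hide latency, the reported number approaches the theoretical peak, which is exactly why it says little about real workloads.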
One thing about Apple GPUs is that they are just incredibly power efficient. The M1 (total power consumption including memory, around 12 watts) needs just 4.6 watts per TFLOPS, while the RTX 3060 needs about 13.5 watts (this number will be similar for all Ampere GPUs). So a hypothetical 32-core Apple GPU using G13 technology would be a ~10 TFLOPS part that uses just around 40-45 watts of power.
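In numbers (again just a sketch; the power figures are the approximate ones above, and how much of the M1's 12 W goes to memory versus the GPU cores is an assumption):

```swift
// Perf-per-watt figures quoted above.
let m1Watts = 12.0                          // whole M1 package, including memory (approximate)
let m1TFLOPS = 2.6
let m1WattsPerTFLOPS = m1Watts / m1TFLOPS   // ~4.6 W per TFLOPS
let ampereWattsPerTFLOPS = 13.5             // the figure quoted above for the RTX 3060 / Ampere

// Hypothetical 32-core G13 GPU: 4x the M1's 8 GPU cores, assuming linear scaling.
let scaledTFLOPS = m1TFLOPS * 4.0                  // ~10.4 TFLOPS
let scaledWatts = scaledTFLOPS * m1WattsPerTFLOPS  // ~48 W if the full 12 W scaled with the cores;
                                                   // the 40-45 W estimate assumes part of that 12 W
                                                   // is memory/uncore rather than GPU cores
print(scaledTFLOPS, scaledWatts)
```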