
Antony Newman

macrumors member
May 26, 2014
55
46
UK
I'm afraid your posts are a maze of bad information and poor reasoning based on it.
..

You make several good points; I had not factored in the enclosure's need to power TB/USB devices.

If Mac Studio sales are >> Mac Pro sales, can you think of a reason why Gurman believed the Hidra would only be coming to the Mac Pro?
 

crazy dave

macrumors 65816
Sep 9, 2010
1,450
1,221
You make several good points; I had not factored in the enclosure's need to power TB/USB devices.

If Mac Studio sales are >> Mac Pro sales, can you think of a reason why Gurman believed the Hidra would only be coming to the Mac Pro?
Hmmm ... I hadn't thought he had specified which device the Hidra was to go into, but I'm probably wrong about that and I can't reread the Bloomberg article in question since I must've used my last free article with them. However, if that's the case, and operating under the assumption that he is correct*, then it's probably a question of both economies of scale and, yes, thermals.

Reusing the Brava chips in as many devices as possible lowers the per-unit cost and keeps the Studio affordable (at least as far as Macs go). The Studio doesn't need all the extra PCIe lanes a Hidra die would come with, and its emphasis is on a smaller footprint and quiet operation. The M2 Mac Pro, meanwhile, was (correctly) castigated for being overpriced relative to the Studio and woefully underpowered for its chassis (both thermals and PCIe), with too little to differentiate itself from the Studio.

The Hidra could solve those issues, with price mitigated by Apple being its own best customer of the Hidra-based chips. Apple wants those chips for itself anyway, so it might as well sell them externally too and get more mileage out of their development costs. The Hidra-based chip differentiates the Pro from the Studio and, frankly, gives a reason to have such a large chassis with all the internal PCIe lanes and thermal capacity.

*He isn't always - a recent case in point: he said the Mini would launch with the Macs, and instead it was updated silently. A small mistake perhaps, but he's made other, larger ones, like saying that there would be no M2 Studios and that the M2 would be a stopgap generation (which turned out to be much more true of the M3).
 
  • Like
Reactions: Antony Newman

cassmr

macrumors member
Apr 12, 2021
58
62
If Mac Studio sales are >> Mac Pro sales, can you think of a reason why Gurman believed the Hidra would only be coming to the Mac Pro?

Yes. One, it has better thermals. Two, the Mac Pro is unpopular because there is little reason to get it: it doesn't offer performance benefits over the Studio.

I would actually be very surprised if it went into the Studio, because it really makes the Mac Pro seem redundant. Why push the envelope of the Studio's thermals, or redesign it as you had proposed, when you have a design ready to go?

Forcing people who need the power to buy the more expensive version to get it seems very Apple to me. Also, this would make sense on paper for the lineup:

Mac Mini - M4, M4 Pro
Mac Studio - M4 Max, M4 Ultra
Mac Pro - M4 Ultra, M4 Hidra

To be honest I haven't followed the Hidra rumors closely, so maybe it would be only the M4 Hidra in the Mac Pro going forward, just with different configs.
 

Chuckeee

macrumors 68040
Aug 18, 2023
3,060
8,722
Southern California
Apple wants those chips for itself anyway, so it might as well sell them externally too and get more mileage out of their development costs.
Are we referring to the same Apple? The Mac Pro is already a small market segment, and additional fragmentation would be a death knell. And Apple is incredibly stingy* about selling or licensing what it views as its unique technology.

* [That is not a bad thing, but it is a distinct contrast to some other companies whose policy is to develop and license technologies]
 
  • Like
Reactions: MRMSFC

Antony Newman

macrumors member
May 26, 2014
55
46
UK
Hmmm ... I hadn't thought he had specified which device the Hidra was to go into, but I'm probably wrong about that and I can't reread the Bloomberg article in question …


“The M4 chip line includes an entry-level version dubbed Donan, more powerful models named Brava and a top-end processor codenamed Hidra. The company is planning to highlight the AI processing capabilities of the components and how they’ll integrate with the next version of macOS, which will be announced in June at Apple’s annual developer conference.

The Donan chip is coming to the entry-level MacBook Pro, the new MacBook Airs and a low-end version of the Mac mini, while the Brava chips will run the high-end MacBook Pros and a pricier version of the Mac mini. For the Mac Studio, Apple is testing versions with both a still-unreleased M3-era chip and a variation of the M4 Brava processor.

The highest-end Apple desktop, the Mac Pro, is set to get the new Hidra chip. The Mac Pro remains the lower-selling model in the company’s computer lineup, but it has a vocal fan base. After some customers complained about the specifications of Apple’s in-house chips, the company is looking to beef up that machine next year.

As part of the upgrades, Apple is considering allowing its highest-end Mac desktops to support as much as a half-terabyte of memory. The current Mac Studio and Mac Pro top out at 192 gigabytes — far less capacity than on Apple’s previous Mac Pro, which used an Intel Corp. processor. The earlier machine worked with off-the-shelf memory that could be added later and handle as much as 1.5 terabytes. With Apple’s in-house chips, the memory is more deeply integrated into the main processor, making it harder to add more.”
 
  • Like
Reactions: crazy dave and EugW

Boil

macrumors 68040
Oct 23, 2018
3,477
3,173
Stargate Command
Donan = M4 SoC
  • 11" iPad Pro
  • 13" iPad Pro
  • 13" MacBook Air
  • 15" MacBook Air
  • 14" MacBook Pro
  • 24" iMac
  • Mac mini

Brava = M4 Pro SoC
  • 14" MacBook Pro
  • 16" MacBook Pro
  • Mac mini

Brava x2 = M4 Max (laptop) SoC
  • 14" MacBook Pro
  • 16" MacBook Pro

Hidra = M4 Max (desktop) SoC
  • Mac Studio

Hidra x2 = M4 Ultra SoC
  • Mac Studio
  • Mac Pro

Hidra x4 = M4 Extreme SoC
  • Mac Pro Cube
  • Mac Pro
 

crazy dave

macrumors 65816
Sep 9, 2010
1,450
1,221
Are we referring to the same Apple? The Mac Pro is already a small market segment, and additional fragmentation would be a death knell. And Apple is incredibly stingy* about selling or licensing what it views as its unique technology.

* [That is not a bad thing, but it is a distinct contrast to some other companies whose policy is to develop and license technologies]
By "externally" I meant selling to Apple customers in the Mac Pro, as opposed to using the chips internally for the Apple Intelligence clusters known as PCC (Private Cloud Compute). I was not suggesting that they would sell the chips to third parties.

The Mac Pro is already a separate line of device from the Studio; adding the Hidra processor to the Mac Pro only wouldn't be additional fragmentation, but rather would give the Mac Pro something to justify its existence - something it struggles with right now. Again, the costs would be defrayed by Apple reusing the dies in its own internal PCC.
 
Last edited:
  • Like
Reactions: Chuckeee

mr_roboto

macrumors 6502a
Sep 30, 2020
856
1,866
You make several good points; I had not factored in the enclosure's need to power TB/USB devices.

If Mac Studio sales are >> Mac Pro sales, can you think of a reason why Gurman believed the Hidra would only be coming to the Mac Pro?
I mean, I'm not a huge believer in Gurman. It usually seems like he just kinda throws lots of guesses out, then narrows down to the right thing a few weeks before launch, when lots more people are seeing the product (because it's being mass produced and shipped out).

But I agree with the idea that a very high power SoC might show up only in the Mac Pro. It's not about which one currently has higher sales, it's that right now if you buy a Mac Pro you're basically buying a M2 Ultra Mac Studio built into a PCIe expansion chassis. An extremely well-built and high performance chassis, higher performance than any you can attach to an actual Mac Studio, but you just aren't getting any more CPU and GPU.

In the Intel Mac era, the big boxy towers always had lots more CPU and GPU power available than any other Mac. Apple may be interested in bringing that back. There's only so much you can do inside a box the size of the Studio; the performance ceiling of the Mac Pro chassis is much higher.
 

cassmr

macrumors member
Apr 12, 2021
58
62
I recall there were fairly accurate leaks early on in the M1 days. I can't recall if it was prior to launch, or prior to the Pro/Max chip launch. That, in addition to fairly accurately guessing the Pro, Max and Ultra chip strategy (in terms of core counts etc., not naming) and that there was going to be a configuration beyond the Ultra. However, that final level never appeared.

I think there was discussion that they were having trouble pulling off a 2x Ultra, or that they just didn't believe the demand was there, or that they needed to focus more engineering resources on their core market. I mean, it made sense to go ham on laptops in 2020: not only are they their biggest segment, but it was a booming market at the time.

Anyway, my expectation prior to the launch of the Mac Pro was that it would have a higher-than-Ultra option, even if it was just 2x Ultras in the old-school parallel method, and I think it was kind of shocking it didn't have some higher-performance option. My guess is they weren't able to deliver that in time for the Mac Pro launch, for whatever reason, but had committed to transitioning in a set time and had already done the engineering for the other parts of the Mac Pro. It also wouldn't have been a good look to pull the prior Mac Pro from sale, but it would also have been bad to leave it on sale much longer when in many tasks it was being outperformed by a much cheaper laptop. Apple does not seem overly concerned about this perception, but it may have been another factor in releasing the Mac Pro with the chip they had rather than the chip they had planned to have.

The move to more P and fewer E cores on the Max chips shows, I think, a bit more thinking from them about what is needed in a pro desktop. So it will be interesting to see if these Hidra chips are really more desktop/server focused, even if I still expect them to retain perf/watt as a key focus.

Actually, when the M1 launched, I expected they would differentiate the higher-end chips with different letters (i.e. M = mobile, maybe a D or P for desktop or performance/pro).

Anyway, we'll probably have to wait almost a full year to see what happens on that front.
 
  • Like
Reactions: Antony Newman

tenthousandthings

Contributor
May 14, 2012
275
319
New Haven, CT
The story about iPhone 18, Apple A20, TSMC N2, and WMCM (“Wafer-level Multi-Chip Module”) packaging has been overrun by RAM trolling; not a single mention of the new packaging in 119 comments.

The question I have is, does this change from InFO-PoP (package on package) packaging in the iPhone to this new WMCM packaging also herald a shift in the M6 generation?

Note that I don’t know how “new” WMCM is, I’m guessing it is an industry term, but I don’t see any sign of it on TSMC’s website.
 
  • Like
Reactions: Chuckeee

tenthousandthings

Contributor
May 14, 2012
275
319
New Haven, CT
It would appear to be especially tempting for the larger chips (Max, Ultra, Hidra?)

Also possibly low-level integration of a custom Apple modem?
Yes, the latter (the modem chip) seems like the obvious thing driving Apple’s investment in the change, so maybe it will just be limited to iPhones and iPads, but I don’t know; it isn’t hard to imagine it for other chips in the M-series.
 

tenthousandthings

Contributor
May 14, 2012
275
319
New Haven, CT
The story about iPhone 18, Apple A20, TSMC N2, and WMCM (“Wafer-level Multi-Chip Module”) packaging […] Note that I don’t know how “new” WMCM is, I’m guessing it is an industry term, but I don’t see any sign of it on TSMC’s website.
I take that back. TSMC wafer-level integration has been in the news fairly recently, see here (RIP Anandtech):


This doesn’t seem quite the same, but maybe related?
 
Last edited:

name99

macrumors 68020
Jun 21, 2004
2,407
2,309
The story about iPhone 18, Apple A20, TSMC N2, and WMCM (“Wafer-level Multi-Chip Module”) packaging has been overrun by RAM trolling; not a single mention of the new packaging in 119 comments.

The question I have is, does this change from InFO-PoP (package on package) packaging in the iPhone to this new WMCM packaging also herald a shift in the M6 generation?

Note that I don’t know how “new” WMCM is, I’m guessing it is an industry term, but I don’t see any sign of it on TSMC’s website.
Packaging is really difficult to say anything about. Even more so than the SoC, it's driven by so many details (e.g. costs, factory capacity) that we know nothing about. We can think through the tech options, but that only suggests what will happen over the next decade, not what will happen next year...

But think about it. The iPhone has limited area, so right from the start the A-series has used some sort of vertical mounting, first PoP, then InFO-PoP, to mount the DRAM above the SoC. It takes less area and probably limits heat dissipation, but in a phone, if you want to dissipate that much heat, you probably have bigger problems...

Meanwhile, M-series devices (even the smallest of them, probably the new Mac mini) aren't so constrained by area. So what's the advantage in putting the DRAM above, rather than beside, the SoC? Probably none. So this is probably not interesting to the M-series, at least not DIRECTLY in how the A-series does things.

But DRAM isn't all there is. One could IMAGINE other things stacked on an SoC. Obviously something like V-cache, less obviously something like an MRAM storage (combination of L4 cache with persistence). The MRAM case is more interesting in the sense that
- Apple, more so than anyone else, could update the OS and HW simultaneously to exploit this new functionality (unlike eg Optane, where Intel was unable to ever co-ordinate with MS and dozens of vendors to actually create real value)
- stacking is probably necessary if you want this functionality, because MRAM is only available on non-leading edge processes, and that will probably remain the case for many years.
 
  • Like
Reactions: tenthousandthings

name99

macrumors 68020
Jun 21, 2004
2,407
2,309
I take that back. TSMC wafer-level integration has been in the news fairly recently, see here (RIP Anandtech):


This doesn’t seem quite the same, but maybe related?
System on wafer is for huge designs, things like Cerebras.

But generically, one way to package separate chips together (I want to stack A on B) is to place one A chip on one B chip at a time. Another way to do it is to "glue" wafer B on top of wafer A before you even separate the chips, then dice the paired wafers.
The advantage of the first scheme is that you only pair known-good dies; you have already binned the dies before packaging. The advantage of the second scheme is that you get a lot more packaging done in one step, and if your yields (for at least one of A or B) are good, then the issue of mismatched pairs (a good A with a bad B, or vice versa) is an acceptable cost. Of course it's harder to align (to a micrometer scale or smaller) large wafers rather than small chips... but this sort of aligning has been solved over the past few years anyway, for other reasons.
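To put hypothetical numbers on that trade-off: with known-good-die stacking you only ever pair dies that have already passed test, so (test escapes aside) nothing good gets thrown away. With blind wafer-on-wafer bonding, if wafer A yields 95% and wafer B yields 90%, only about 0.95 × 0.90 ≈ 85% of the stacked pairs are fully good, i.e. you sacrifice roughly one good A die in ten (and about one good B in twenty) to a bad partner. Whether that loss is cheaper than the extra handling of per-die stacking depends entirely on the relative die costs and the packaging throughput you gain.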

So my guess is that WMCM is yet another variant of stacking wafer A on wafer B. Obviously the details matter immensely to the engineers involved (in this scheme A and B communicate by matching copper pads, in that scheme they communicate by TSVs, in this third scheme we build an RDL on top of wafer A that matches the pads on wafer B, or whatever), which is why we get all these different names.
But the outsider view I've given covers, I think, everything that actually matters to the rest of us.
 
  • Like
Reactions: tenthousandthings

Boil

macrumors 68040
Oct 23, 2018
3,477
3,173
Stargate Command
New packaging options could be the way to get more GPU cores for the larger M-series products; 256, 512, 1024 core GPUs in the Mac Studio, Mac Pro Cube, Mac Pro; yes, please...! ;^p
 

Confused-User

macrumors 6502a
Oct 14, 2014
850
984
But DRAM isn't all there is. One could IMAGINE other things stacked on an SoC. Obviously something like V-cache, less obviously something like an MRAM storage (combination of L4 cache with persistence). The MRAM case is more interesting in the sense that
- Apple, more so than anyone else, could update the OS and HW simultaneously to exploit this new functionality (unlike eg Optane, where Intel was unable to ever co-ordinate with MS and dozens of vendors to actually create real value) [...]
Much like HBM - assuming they don't use *only* HBM, which would be really pricey even for Apple - this would require them to build memory tiering into the OS. Not a small task, but not that gigantic either.

What's really interesting about that is that if they bother to do that... well, then, it's not so hard to have another tier - say, a couple TB of DDR5 DIMMs, in your Mac Pro.

I know there are other possibilities for huge RAM, some of which they have patented, but this one comes not-quite-but-almost free if they do MRAM or HBM.
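Purely as a sketch of what "building memory tiering into the OS" would look like at its simplest - none of this corresponds to any real macOS code, it's just the shape of a two-tier page-placement policy:

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum tier { TIER_FAST, TIER_SLOW };

struct page_info {
    uint64_t access_count;   /* decayed access counter, e.g. from HW sampling */
    enum tier current_tier;
};

/* Decide where a page should live: hot pages go to the fast tier
 * (HBM / on-package DRAM) while it has room, everything else to the
 * slow tier (e.g. DDR5 DIMMs). A real policy also needs hysteresis,
 * migration-cost accounting, etc. - this is just the skeleton. */
enum tier place_page(const struct page_info *p,
                     uint64_t hot_threshold,
                     size_t fast_pages_free)
{
    bool hot = p->access_count >= hot_threshold;
    if (hot && fast_pages_free > 0)
        return TIER_FAST;
    return TIER_SLOW;
}

The hard parts are everything this leaves out: when to migrate, how to stop pages ping-ponging between tiers, and how GPU/NPU traffic counts toward "hot".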
 

leman

macrumors Core
Oct 14, 2008
19,516
19,664
Much like HBM - assuming they don't use *only* HBM, which would be really pricey even for Apple - this would require them to build memory tiering into the OS. Not a small task, but not that gigantic either.

Does this have to be done at the OS level? Can’t this kind of configuration work like any other cache?

BTW, Apple has long had patents describing using cache memory as RAM, so it is likely they already have this kind of functionality built in.
 
  • Like
Reactions: altaic

Confused-User

macrumors 6502a
Oct 14, 2014
850
984
Does this have to be done at the OS level? Can’t this kind of configuration work like any other cache?

BTW, Apple has long had patents describing using cache memory as RAM, so it is likely they already have this kind of functionality built in.
Sure, you could use HBM as cache, but is it cost-effective? I have no idea. I mean, it will obviously depend on whether there's a significant category of work where you have a working set that's way too big for the existing caches but fits in, probably, 8 or 12 GB - maybe double that, but that's a lot of shoreline on the chip for a cache unless you're doing some sort of stacking. Is that a thing? I honestly have no idea.

As for the patents, unless they explicitly cover the OS aspects of the tech, I wouldn't assume that the code was done, at least at that point.

I think that the ideal implementation allows for at least three different memory models:
1) Use HBM as cache
2) Allow OS to automatically assign HBM/regular memory to processes as best it can, possibly with hinting - even something as basic as a file flag or xattr saying "prefer HBM" or "prefer normal".
3) Allow specifying use of memory types at the app layer

I don't know if #3 has been done in this way before. But there are precedents of sorts. Without giving it too much thought I imagine you'd at least have variations and extensions of mmap(), madvise(), malloc(), etc.
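To make #2/#3 concrete, something like the following - and to be clear, these flag names are pure invention, not anything in a real SDK; they just illustrate what "variations and extensions of mmap()/madvise()" could look like:

#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical flags - these do NOT exist in any real SDK. */
#define MAP_PREFER_FAST_TIER   0x01000000   /* mmap(): place in HBM if possible */
#define MADV_PREFER_FAST_TIER  100          /* madvise(): migrate toward HBM    */

static void *alloc_hot_buffer(size_t len)
{
    /* Ask for anonymous memory, hinting that it should land in the fast
     * tier if available; a real implementation would have to define what
     * happens when it isn't (fail, or silently fall back to DRAM). */
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_ANON | MAP_PRIVATE | MAP_PREFER_FAST_TIER, -1, 0);
    if (p == MAP_FAILED)
        return NULL;

    /* The madvise()-style spelling of the same hint. */
    (void)madvise(p, len, MADV_PREFER_FAST_TIER);
    return p;
}

Option #2 would just be the same placement decision made automatically by the OS, with the per-call flag replaced by a per-process or per-file hint.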
 

leman

macrumors Core
Oct 14, 2008
19,516
19,664
Sure, you could use HBM as cache, but is it cost-effective?

That's why I don't believe Apple will use HBM. I don't see it as technology that makes much sense for them. Rather, I can imagine them using a dedicated cache die. Something like this (from this patent)

[attached patent figure]


I don't know if #3 has been done in this way before. But there are precedents of sorts. Without giving it too much thought I imagine you'd at least have variations and extensions of mmap(), madvise(), malloc(), etc.

Or it would work entirely transparently, just like every Apple Silicon system until now. They don't even give you APIs for controlling thread and memory residency across their multi-SoC computers; I doubt that they would expose the details of their tiered memory, if it ever comes.

Also, we already have MADV_WILLNEED. Do you think you need some additional hint or setting?
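For reference, the existing hint is just this - madvise() on a mapped range, telling the kernel you expect to touch it soon. Whether a hypothetical tiered-memory system would also use it to steer placement is pure speculation on my part:

#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map a file read-only and hint that the whole range will be needed soon.
 * Caller munmap()s the returned pointer when done. */
void *map_and_prefetch(const char *path, size_t len)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;

    void *p = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                              /* the mapping outlives the fd */
    if (p == MAP_FAILED)
        return NULL;

    (void)madvise(p, len, MADV_WILLNEED);   /* ask the kernel to prepare the range */
    return p;
}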
 

name99

macrumors 68020
Jun 21, 2004
2,407
2,309

Apple pencil that allows you to tune into and watch broadcast TV!
(This is a truly weird one, solving a problem I was unaware anyone was even complaining about! How many people who buy Apple Pencils even still watch ATSC, rather than using Netflix or Prime or whatever?)

Let the wild speculation begin...

Oh, for extra yucks, Apple Ring!
 

cassmr

macrumors member
Apr 12, 2021
58
62

Apple pencil that allows you to tune into and watch broadcast TV!
(This is a truly weird one, solving a problem I was unaware anyone was even complaining about! How many people who buy Apple Pencils even still watch ATSC, rather than using Netflix or Prime or whatever?)

Let the wild speculation begin...

Oh, for extra yucks, Apple Ring!
That's not what they mean by broadcast. They basically just mean transmitting radio signals, i.e. just communicating. This appears to just be some updated patents related to the pencil, covering communication between tablet and pencil. The reference to broadcast of a video signal between the stylus and other devices is unclear to me; that may relate to how they process stylus input, or perhaps they intend to allow you to use the stylus on, say, your phone/iPad surface but do the actual drawing on a different display or Mac, like a Wacom. But it's probably just some optimisation to how the pencil works.
 

Confused-User

macrumors 6502a
Oct 14, 2014
850
984
Or it would work entirely transparently, just like every Apple Silicon system until now. They don't even give you APIs for controlling thread and memory residency across their multi-SoC computers; I doubt that they would expose the details of their tiered memory, if it ever comes.

Also, we already have MADV_WILLNEED. Do you think you need some additional hint or setting?
...maybe?

I don't know anything about academic research on hierarchical memories. So I don't know if certain schemes have already been modeled or tested. I do have some understanding of the related problem in storage, but I don't even know if that understanding is portable - does the relationship between disk and SSD match that between RAM and HBM? No idea.

But I can at least imagine a few different ways you might want to approach the problem of the OS supporting some level of memory-tier awareness. Obviously you could support direct allocations from HBM or RAM, with varying failure modes when HBM isn't available. You could have priority levels for determining who gets evicted to DRAM. This might need to interact with OS-level permissions.

When you talk about a "dedicated cache die", I assume you're talking about something like AMD's X3D. Does that make sense for Apple? I mean, what does it buy them, in their target markets? So far, X3D has proven useful mostly (though not exclusively) in games. That's not exactly a motivating factor for Apple.

Of course Apple would have a minor advantage with a cache die, in that they wouldn't have to reduce their clock speed to take advantage of it the way AMD does, since they clock lower anyway.
 

leman

macrumors Core
Oct 14, 2008
19,516
19,664
...maybe?

I don't know anything about academic research on hierarchical memories. So I don't know if certain schemes have already been modeled or tested. I do have some understanding of the related problem in storage, but I don't even know if that understanding is portable - does the relationship between disk and SSD match that between RAM and HBM? No idea.

But I can at least imagine a few different ways you might want to approach the problem of the OS supporting some level of memory-tier awareness. Obviously you could support direct allocations from HBM or RAM, with varying failure modes when HBM isn't available. You could have priority levels for determining who gets evicted to DRAM. This might need to interact with OS-level permissions.

I agree that there are probably different ways to model memory hierarchy in APIs, and furthermore that the optimal APIs for different use cases can differ as well. However, here I am mainly concerned with the question "what would Apple do?" Historically, they seem to prefer solutions that hide this kind of complexity from the user, going to great lengths to smooth things out for the client software. NUMA and tiered-memory programming is fun on supercomputers, not so much on consumer hardware. Apple wants their APIs to be scalable across the different types of software. They don't even give you APIs for pinning threads to CPU cores; their CPU hierarchy abstraction is shared L2 cache, and that's it. And even then those APIs don't work reliably - I am experimenting with SME currently and I can't get the work to execute on the E-cluster, for example.
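What they do give you is QoS, which is a hint rather than a pin; a minimal example (the API is real, but the E-cluster placement is just the usual observed behaviour, not a guarantee):

#include <pthread.h>
#include <pthread/qos.h>   /* pthread_set_qos_class_self_np, QOS_CLASS_* */
#include <stdio.h>

static void *background_work(void *arg)
{
    (void)arg;
    /* Request "background" QoS for this thread. On Apple Silicon this
     * usually ends up scheduled on the E-cluster, but it is only a hint -
     * there is no API to pin the thread there. */
    if (pthread_set_qos_class_self_np(QOS_CLASS_BACKGROUND, 0) != 0)
        perror("pthread_set_qos_class_self_np");

    /* ... low-priority work goes here ... */
    return NULL;
}

int main(void)
{
    pthread_t t;
    if (pthread_create(&t, NULL, background_work, NULL) != 0)
        return 1;
    return pthread_join(t, NULL);
}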

Apple invested who knows how much $$$ designing a NUMA system (M1/M2 Ultra) that would appear to have uniform access properties. And their patents go into great detail describing systems that would copy data across memory controllers to optimize access patterns and power efficiency. So no, I don't expect them to expose more details to the user. Rather they will try to design systems that do the "right thing" automatically. Whether it will be successful or not is a different question.

When you talk about a "dedicated cache die", I assume you're talking about something like AMD's X3D. Does that make sense for Apple? I mean, what does it buy them, in their target markets? So far, X3D has proven useful mostly (though not exclusively) in games. That's not exactly a motivating factor for Apple.

Of course Apple would have a minor advantage with a cache die, in that they wouldn't have to reduce their clock speed to take advantage of it the way AMD does, since they clock lower anyway.

A number of important differences come to mind. First, Apple is feeding a bunch of bandwidth-reliant IP blocks off the same memory interface, not just the CPU cores. Second, Apple's CPU cores have access to much higher bandwidth than the AMD cores. Third, a large cache (I am talking about hundreds of MBs, not dozens) could potentially allow Apple to use a smaller memory interface to the main RAM, saving costs. That is the solution in Fig. 2/3 that I have posted in #1145.
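To put rough, entirely made-up numbers on that third point: if such a cache ended up servicing, say, half of the traffic that would otherwise hit DRAM, then to a first approximation the external interface only has to supply the remaining half of the bandwidth, so a 50% hit rate would let Apple get the same effective bandwidth from a memory interface half as wide (assuming the workload is bandwidth-bound and the hit rate holds across the workloads that matter). Whether a few hundred MB of cache actually achieves hit rates like that is exactly the open question.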
 
  • Like
Reactions: name99