> Using custom SoCs with large NPUs would indeed be the most reasonable way to build an Apple LLM server. Of course, it would also be a huge investment, and it is not clear whether the current-gen NPU will work well with future LLMs. This is why I find all these rumors rather dubious.

I’d tend to agree with you on this.
> I had another look at ARM's Architecture Manual, and the matter is indeed extremely confusing. They define a series of features as part of ARMv9, but pretty much all of them are marked as optional. In fact, ARMv9 is pretty much defined as equivalent to ARMv8.5.

It appears that the difference between an Armv8 core and an Armv9 core is which trace features they implement. According to the Arm Architecture Reference Manual Supplement Armv9 (page 25): "An implementation of the Armv9-A architecture cannot include an Embedded Trace Macrocell (ETM)."
Don't servers use ECC RAM? As far as I know this is not supported by current M processors.
What seems more likely is that Apple could have multiple M4 Ultras in a server, liquid cooled. Potentially even a higher-tier “binned” part that surpasses what Apple puts in Mac Studios.
optimizations around the fact that the GPU isn’t powering a display, freeing it to process more instructions
Could Apple be looking at building custom server SoCs long term? Sure, but it would need to reconcile priorities within its fabbing budget at TSMC. They can only fabricate so many chips.
How do you know that M-series are not using error-corrected RAM? Apple hasn't given us any information about this. They have a bunch of patents describing error correction with LPDDR5; whether these things are actually being used or not, we simply have no way of knowing.
They don't even need to liquid cool it. It's not like M2 Ultra draws that much power. I still don't really see the purpose of such a server. Certainly not for ML workloads. It seems to me that Nvidia systems would be better and more cost-effective for both training and inference.
The GPU doesn't really have anything to do with a display. It's a parallel processor with some graphics-related functionality. It takes data from memory, does some processing, and writes the results back to memory. What exactly happens to the result is not the GPU's business. Will the GPU have more time for compute if there is no display connected? Certainly. Will it meaningfully change the performance? Absolutely not.
If they use the older 5nm process, manufacturing won't be a problem. The bigger constraint is RAM.
> Don't servers use ECC RAM? As far as I know this is not supported by current M processors. Will this change, or is the integrated RAM less sensitive to errors?

Take a look at Amazon Graviton and Azure Arm chips; they have ECC memory built into the package.
> Thanks for the mention! With WWDC coming up, I'm just wondering if Apple will use it to say something more about M4. You never know.

My speculation was based on the (seemingly) more reliable rumours we’ve had so far, any or all of which could still be off or wrong.
What the rumours have speculated for the M4s is:
The core counts per chip tier in my post above were just intended to fit approximately, and comfortably, within Apple’s current M-chip die sizes, while not disrupting the steps in their current product stack. The additional bit of speculation (using Gurman’s M4 timeline) is an assumption based on the availability of Eliyan’s new interconnect, due in the third quarter.
- The previous three-chip building-block approach for Apple’s product stack will remain, but the focus / use of each chip tier may be changing.
- The mid-tier (Pro or Brava) chip is supposed to meet the requirements of the MacBook Pros, i.e. a mobile-focused chip that can fill both the current Pro and Max roles, presumably x1 Brava for a Pro and x2 Brava for a Max.
- The top-tier (Max or Hidra) chip is supposed to be switching away from mobile to become an x2 / x4 (Hidra) only chip. The concept of the Hidra chip being for x2 (Ultra) and x4 (Extreme) configurations only would suggest Apple can salvage some die space from any of the unnecessary duplication that occurs in a chip that also has to work in a standalone application.
Along with low power consumption and design flexibility, Eliyan’s interconnect provides the packaging approach Apple’s patent outlines as a lower cost, higher yielding, faster way to build multi-chip-modules. Good-all-round it seems.
@tenthousandthings follows this stuff pretty closely, he might know some, or ten thousand, things that could counter my assumptions on Eliyan’s interconnect as Apple’s next step for the M series of chips?
[...]
> The Apple Watch is SiP.

Thanks for the mention! With WWDC coming up, I'm just wondering if Apple will use it to say something more about M4. You never know.
In terms of your assumptions, I guess I'd point out that in 2022, TSMC spent (USD) $3.6 billion on advanced packaging capacity, Intel spent $4 billion, Samsung $2 billion. [Anandtech] Not to mention spending by members of TSMC's 3DFabric Alliance and other industry partners.
As I understand it, Eliyan has raised $120 million total in two rounds. One of their investors is Intel (both rounds), so it wouldn't be unreasonable to speculate (this is the speculation thread, after all!) that their activity on N3 at TSMC is related to Intel's upcoming products on N3 (Lunar Lake, etc.) The timing seems about right, but there are a lot of unknowns. It would, however, explain how a startup like Eliyan was able to access TSMC's N3 capacity at all.
I have no comment on the pending patent, but I'd be wary of reading anything into that. Keep in mind the massive scale of the industry-wide effort in this direction, for both standard packaging and advanced packaging. System-in-Package is a thing, but it's a really, really big thing. [!]
> That was sort of my point. If you're actually using CPU clusters (as opposed to GPU/NPU), then chances are, you're *not* doing the typical massively-parallel AI tasks. So in that case, it may be that you lose way too much in efficiency spreading the load widely for E cores to be as good as P cores.

Quite a lot of testing has been done on P- and E-cores. So far, the conclusion is that E-cores use considerably less power for every tested workload, so I expect this to generalize. The difference in power consumption is just too large. Maybe it is possible to craft an artificial workload that exploits architectural differences between P- and E-cores to make the P-core more efficient, but I doubt such a workload would be useful.
[...]
If we are talking about mainstream ML using CPU cores, that is always a massively parallel task (GEMM is trivially parallelizable, which is why we use GPUs to accelerate ML). But I think that in this day and age, building CPU clusters to accelerate ML is a nonsensical enterprise. Other hardware solutions (NPUs, GPUs) are simply better suited for the task.
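To make the "trivially parallelizable" point concrete, here is a minimal sketch of a row-blocked GEMM where each worker computes an independent slice of the output. NumPy and a thread pool stand in for real accelerator kernels; the function name and blocking scheme are purely illustrative.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def parallel_gemm(a: np.ndarray, b: np.ndarray, workers: int = 4) -> np.ndarray:
    """Row-blocked matrix multiply: each worker owns a disjoint slice of C = A @ B."""
    out = np.empty((a.shape[0], b.shape[1]), dtype=a.dtype)
    row_blocks = np.array_split(np.arange(a.shape[0]), workers)

    def compute(rows: np.ndarray) -> None:
        # No communication needed: this slice depends only on a[rows] and b
        out[rows] = a[rows] @ b

    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(compute, row_blocks))
    return out
```

The same decomposition is why GPUs and NPUs fit the task so well: the blocks are independent, so what matters is raw arithmetic throughput, not coordination between cores.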
Timothy Prickett Morgan says (March 29, 2024 at 8:37 PM):

> They have done their PHY in 5 nanometer and 3 nanometer processes as far as I know. And they have one large scale customer, someone not quite a hyperscaler but more than a typical service provider.
That was sort of my point. If you're actually using CPU clusters (as opposed to GPU/NPU), then chances are, you're *not* doing the typical massively-parallel AI tasks. So in that case, it may be that you lose way too much in efficiency spreading the load widely for E cores to be as good as P cores.
My other point (which I think was pretty clear) was that latency may be a significant factor for customer-facing work, in which case P cores may be a better choice even if less efficient for the task.
I think without knowing what they want to do with these servers, we can't really know what's better.
I am skeptical about all this talk of Eliyan. Apple has established pretty clearly that they want to own their stack from the bottom to the top. They're trying to bring everything in-house that isn't already there. The last major piece of the puzzle for them is RF stuff; being able to dump Qualcomm and Broadcom is extremely important for them. (There are other smaller pieces, much of it analog stuff. But they're working on that too, I expect.)
I don't see them taking on a new dependency now. That's exactly the opposite of their general direction as a company. The one way I can imagine this happening is if they have an option to acquire Eliyan completely - which is not impossible. But I do think it's much more likely they will continue to build things on their own. They've certainly had good success with that strategy so far.
Yes, Apple’s vertical stack is very much to their advantage. Unless there is a particular deficiency Apple needs addressed though, I’d venture the type of IP Eliyan has sitting under their PHY, enabling its bandwidth and size, is not the kind of IP Apple would have a particular interest in re-spinning. That's the assumption I'm working with, because it is IP packaged within the existing industry standards, and that can basically slot in everywhere existing interconnects already sit.
@tenthousandthings would have better knowledge on just how specific and/or idiosyncratic proprietary solutions might get in this space, but, for the most part, where standards are present, there’s generally benefits to broad adoption.
I am skeptical about Apple using Eliyan tech for two reasons. First, Apple never cared about UCIe and multi-chip-module standardization because they are not interested in integrating third-party dies into their solutions. Second, Apple already has comparable tech with their UltraFusion interconnect. If I understand it correctly, the main advantage of Eliyan's IP (besides standardization) is that it does not require a bridge and communicates via a standard substrate. This is a great cost saver if you need a few GB/s die-to-die interconnect and don't want to pay for advanced packaging. I am not convinced that a no-bridge solution is viable for Apple with their multi-TB/s requirements. They will likely still need a bridge. And if they need a bridge, why license some third-party tech if you already have a perfectly fine solution?
> Yes, Apple’s vertical stack is very much to their advantage. Unless there is a particular deficiency Apple needs addressed though, I’d venture the type of IP Eliyan has sitting under their PHY, enabling its bandwidth and size, is not the kind of IP Apple would have a particular interest in re-spinning. That's the assumption I'm working with, because it is IP packaged within the existing industry standards, and that can basically slot in everywhere existing interconnects already sit.
> @tenthousandthings would have better knowledge on just how specific and/or idiosyncratic proprietary solutions might get in this space, but, for the most part, where standards are present, there’s generally benefits to broad adoption.

It's true that Eliyan's "magical" interference cancellation techniques apply to both standard packaging and advanced packaging. [EE Times Europe] But they are hardly alone in making progress, and TSMC and others have also referenced major, orders-of-magnitude, breakthroughs in this area (substrates) in the past few years.
Thanks for the vote(s) of confidence, but I'm best thought of as a competent historian. That's my field of training. I think the semiconductor industry is high art, in every sense of that word. So I'm interested. I'm okay at not jumping to conclusions and I'm good at evaluating secondary sources, but I make mistakes (one recently with regard to what is a local silicon interconnect), and I'm very much NOT a trained engineer or scientist. I have one small advantage, in that I have Chinese language skills that enable me to be a little more confident in that world, when talking about China and/or Taiwan. I also participated in the AppleSeed program throughout the early years of OS X.
This is me: https://www.earlymacintosh.org/
This is also me (with friends): https://www.chinesemac.org/
> I am skeptical about Apple using Eliyan tech for two reasons. First, Apple never cared about UCIe and multi-chip-module standardization because they are not interested in integrating third-party dies into their solutions. Second, Apple already has comparable tech with their UltraFusion interconnect. If I understand it correctly, the main advantage of Eliyan's IP (besides standardization) is that it does not require a bridge and communicates via a standard substrate. This is a great cost saver if you need a few GB/s die-to-die interconnect and don't want to pay for advanced packaging. I am not convinced that a no-bridge solution is viable for Apple with their multi-TB/s requirements. They will likely still need a bridge. And if they need a bridge, why license some third-party tech if you already have a perfectly fine solution?

Isn't that exactly what I described in patents like https://patents.google.com/patent/US20220013504A1
> How do you know that M-series are not using error-corrected RAM? Apple hasn't given us any information about this. They have a bunch of patents describing error correction with LPDDR5; whether these things are actually being used or not, we simply have no way of knowing.

Apple do have a bunch of patents on a particular idea. All LPDDR5 has to have internal error correction, but normally that is handled internally and that's the end of the story. Apple's patents are about how to propagate the information out of the DRAM (I don't know if the details of how they do this can be done on all DRAM, or whether they ask for a slight tweak by Micron et al.) so that the OS can track the patterns of where error correction is needed and respond appropriately (which may range from giving that page of DRAM more frequent refreshes to masking it out and never using it again).
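Purely as an illustration of that "track and respond" idea, here is a toy sketch of such an OS-side policy. The class name, thresholds, and action labels are mine, not anything from Apple's patents: the OS counts corrected errors per DRAM page and escalates from extra refreshes to retiring the page.

```python
from collections import Counter

# Illustrative thresholds -- not from Apple's patents
EXTRA_REFRESH_AT = 2   # corrected errors before boosting a page's refresh rate
RETIRE_AT = 5          # corrected errors before masking the page out entirely

class EccPolicy:
    """Toy OS-side tracker for corrected-error reports propagated out of DRAM."""

    def __init__(self) -> None:
        self.corrected = Counter()   # page address -> corrected-error count
        self.retired = set()         # pages never to be handed out again

    def on_corrected_error(self, page: int) -> str:
        """Called when the memory subsystem reports a corrected bit error."""
        self.corrected[page] += 1
        if self.corrected[page] >= RETIRE_AT:
            self.retired.add(page)
            return "retire"
        if self.corrected[page] >= EXTRA_REFRESH_AT:
            return "extra-refresh"
        return "log"
```

The point of the sketch is only the escalation pattern: a single corrected error is noise, a repeat offender gets more frequent refreshes, and a persistent one gets masked out.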
> If they use the older 5nm process, manufacturing won't be a problem. The bigger constraint is RAM.

Remember that patent I pointed out regarding ranks? Even if we never see ranks in a consumer Mac...
> I read the article on nextplatform when you first posted it.

Horror vacui. When I posted the initial question there were a bunch of links included for anyone curious about the subject. The intention, and subsequent invitations for the interested, was to review the material and pick apart the speculation (objectively).
The forum’s a void for tossing out such queries, right? The compulsion, however, I feel too often leans towards five-second assertions without review. It’s the kind of thing, I think, that should make us pause on the concerns around AI. Because, while AI is built on the premise of review and assemble, if humans can’t review before they assemble, we’re in real trouble! We’ll end up a bunch of budgies blathering at ourselves in the mirror because we like the reflection.
Anyway, name99 and tenthousandthings have sustained my faith, for curiosity’s sake. So here's another stab at filling the current vacuum with a little more (reasoned and simplified) speculation. Probably with many technical considerations overlooked, so the caveat is: this is a long way from my own discipline, and I happily invite further objective scrutiny.
UltraFusion provides about 2.5 TB/s off of an approx. 20 mm shoreline. Please check my math and assumptions! This rounds to ~1.1 Tbps/mm (22 Terabits / 20 mm). From the linked article, the table below shows Eliyan’s interconnect against some of the other (current) protocols and puts UltraFusion’s capacity in context.
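For anyone checking the math, the conversion can be sketched as follows. Note this uses a plain 8-bits-per-byte conversion with no overhead assumptions, which yields 20 Tbps and 1.0 Tbps/mm rather than the 22 Tbit / ~1.1 figure above.

```python
def tbps_per_mm(tb_per_s: float, shoreline_mm: float) -> float:
    """Linear bandwidth density (Tbps/mm) from aggregate TB/s and die-edge length."""
    return tb_per_s * 8 / shoreline_mm  # 8 bits per byte, no overhead assumed

# UltraFusion: ~2.5 TB/s across a ~20 mm shoreline
print(tbps_per_mm(2.5, 20.0))  # → 1.0 Tbps/mm
```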
[Attachment 2387827: table comparing Eliyan’s interconnect bandwidth with other current protocols]
In line with @name99's post, when Apple cooked up UltraFusion they were pushing the envelope, and were able to market it as ‘4x the bandwidth of the leading . . . interconnect’ at the time, a claim they subsequently removed with the marketing of the M2 Ultra. By current standards, however, UltraFusion sits around Intel’s MDFIO in the table above, and is now a relatively anaemic interconnect on Advanced Packaging. The tech in UltraFusion was conceived on 5nm and quite possibly wasn't going to meet an M3 Ultra’s requirements across a 22 mm shoreline, but I’ll leave that to the boffins.
This may, however, put a more rational narrative around the lack of an interconnect on the M3 Max, and make sense of why Apple’s speculated M4 release schedule is more likely what Gurman has suggested.
For the M1 Ultra, Apple (with TSMC) needed to come up with a solution that was not readily available. That has now changed, and perhaps more significantly, it has changed for Standard Packaging with Eliyan’s interconnect arriving in the third quarter.
As sardonically implied (apologies), Eliyan's IP is a leap in particular for Standard Packaging, all things considered (FoM). Its bandwidth (2.0-4.0 Tbps/mm) will allow Apple to market their next step in this direction with the Hidra x2 and x4 Macs as UltraFusion 2 (at 5TB/s minimum to 10TB/s) off of a 22 mm chip shoreline. And they could also continue to market it as UltraFusion at 2.5TB/s min. (in Standard Packaging) off of an 11 mm shoreline for an x2 Brava configuration as the new Max.
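Sketching that arithmetic (the 2.0-4.0 Tbps/mm density is the range quoted for Eliyan above; "UltraFusion 2" is this post's hypothetical name, and the plain 8-bits-per-byte conversion is my assumption):

```python
def aggregate_tb_per_s(density_tbps_per_mm: float, shoreline_mm: float) -> float:
    """Aggregate link bandwidth (TB/s) from linear density and shoreline length."""
    return density_tbps_per_mm * shoreline_mm / 8  # 8 bits per byte

# Hypothetical "UltraFusion 2": Eliyan's 2.0-4.0 Tbps/mm on a 22 mm shoreline
print(aggregate_tb_per_s(2.0, 22.0))  # → 5.5 TB/s (low end)
print(aggregate_tb_per_s(4.0, 22.0))  # → 11.0 TB/s (high end)
# The same low-end density on an 11 mm shoreline for an x2 Brava "Max"
print(aggregate_tb_per_s(2.0, 11.0))  # → 2.75 TB/s
```

Which lands in the same ballpark as the 5-10 TB/s and 2.5 TB/s minimum figures above.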
So again, my speculation is that Apple wants to move to Standard Packaging for their MCMs and is waiting (in part at least) for a new interconnect (from Eliyan), because it will allow them to finally execute an x4 Mac, and because Standard Packaging is cheaper, higher yielding, and faster to manufacture (according to Apple’s patent and Eliyan’s articles). They'll likely maintain the economics of their existing three-chip strategy (as Gurman’s code names suggest) via the following format:
M4 (Donan)
M4 Pro (Brava)
M4 Max (2x Brava)
M4 Ultra (2x Hidra)
M4 Extreme (4x Hidra)
And the A/S transition will finally be over, Amen! (or not?)
I read the article on nextplatform when you first posted it.
Assuming Eliyan can actually deliver, they might be an attractive option for Apple... except, as I said yesterday, Apple has shown an *intense* desire to bring everything in-house. Will they look at this as a physical packaging thing (which they're happy to contract out, at least to TSMC), and so a plausible exception to that strategy? Maybe. I'm not convinced, though I don't think it's impossible.
A minor nitpick: 2.5 TB/s = 20 Tbps. I don't know if signaling overhead, FEC, etc. is included in that number, but it's the only number we've got. So you're looking at 1 Tbps/mm. I am not sure that there is a definitive answer about whether it's 20 Tbps in each direction, or summed.
I wonder why you think Apple is stuck with that performance level. They built that more than two years ago. In the meantime, nVidia has recently started shipping an interconnect that, at 10 TB/s, is twice as fast as Apple's (or four times, depending on whether or not that 20 Tbps figure is unidirectional). It's hard to imagine nVidia doing a substantially better job than Apple on this; they're both working with similar building blocks, iso-process, and Apple is in fact on a newer process, though it's not actually clear to me that the process matters much for TSMC's packaging tech.
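The "twice or four times" spread comes straight from that direction ambiguity; a trivial sketch (the 10 TB/s and 2.5 TB/s inputs are the figures discussed above):

```python
def nvlink_vs_ultrafusion(nv_tb_s: float = 10.0, uf_tb_s: float = 2.5) -> tuple:
    """Ratio of the 10 TB/s interconnect to UltraFusion's quoted 2.5 TB/s,
    depending on whether Apple's figure is per-direction or a summed total."""
    if_per_direction = nv_tb_s / (uf_tb_s * 2)  # effective 5 TB/s total
    if_summed = nv_tb_s / uf_tb_s
    return if_per_direction, if_summed

print(nvlink_vs_ultrafusion())  # → (2.0, 4.0)
```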
Given Apple's importance to TSMC, it would be pretty surprising if they *weren't* collaborating on newer versions of UltraFusion. So if I had to bet, I'd still bet that Apple's going to do their own thing, relying on TSMC to some extent.
Of course I've bet wrong a few times recently (I bet on new Studios at WWDC, though I was clear that was a low-confidence bet). Could be my streak will continue. :-/
> I read the article on nextplatform when you first posted it. [...]

(a) When I did the math on Blackwell's nvLink, I may have got it incorrect, BUT my conclusion is that an individual nvLink is behind UltraFusion. However, Blackwell implements a number of such links. That's how they hit a larger aggregate bandwidth.