
tenthousandthings

Contributor
May 14, 2012
276
323
New Haven, CT
If we get new Macs in October, they may as well be M5 series? And then M5 Ultra for Studio and Pro next summer?
No, really not possible. I didn’t mean to suggest that. N3P won’t be in volume production in time for anything like that. Maybe in time for the Ultra to skip M4 and go straight to M5, but that would be unprecedented, to say the least.

It seems just as likely that an M4 Ultra will arrive sooner (March 2025) rather than later, via InFO advanced packaging on N3E (so UltraFusion 1.1, not 2.0), and that this 3D stuff is still a couple of years away from appearing in a product.

(edited for clarity)
 
Last edited:

treehuggerpro

macrumors regular
Oct 21, 2021
111
124
Excellent find. Admirable restraint on your part. Easy to leap to conclusions, especially when combined with the two January 10 news releases, which appear to line up with this. Could just be science news, and not directly related to their customers, but the specifics, like you say, are "promising." . . .

Yes, sweet-sounding science / validation / quality-control processes, etc.


. . . UltraFusion 2.0, here we come?

Yes again. The speculation is that Apple will (most likely) take a forward-leaning position by jumping directly to the bandwidth benefits and interposer freedom of the UMI Standards.

The industry narrative is that the “chiplet economy” era has arrived. The underpinning for making this economy productive is the standardisation of inter-die / memory communications. Perhaps the best way to view these Standards is that the push for a “chiplet economy” needed coherence and needed the tech to evolve rapidly, with Eliyan stepping into the breach:

UCIe –> BoW –> UMI –> UMI-SBD

[Image: Standards.jpg]


• The UCIe protocol (gifted by Intel) formed the “unidirectional” base and is progressing with its own backward compatibility. Incidentally, the signal / data rate per lane doubling from 16 to 32 in the 2024/01/10 GUC press release you linked was, IIUC, Eliyan’s contribution to the UCIe standard.

• The BoW Standard (also created by the Eliyan founder Ramin Farjadrad) is the next (bidirectional) evolution and backward compatible with UCIe.

• And the UMI Standards, which are Eliyan's and described by them as a superset of UCIe and BoW, are backward compatible with both UCIe and BoW.

All of this, taken together, could mean that in 2025 we will once again be treated to Apple being a couple of years ahead of everyone else. . . .

And absolutely Yes. I believe Apple is getting out in front and future-proofing (ultimately) with a leap to the UMI Standards and CoWoS-R interposers. Well, here's hoping anyway!
 
Last edited:
  • Like
Reactions: tenthousandthings

mr_roboto

macrumors 6502a
Sep 30, 2020
856
1,866
• The UCIe protocol (gifted by Intel) formed the “unidirectional” base and is progressing with its own backward compatibility. Incidentally, the signal / data rate per lane doubling from 16 to 32 in the 2024/01/10 GUC press release you linked was, IIUC, Eliyan’s contribution to the UCIe standard.

• The BoW Standard (also created by the Eliyan founder Ramin Farjadrad) is the next (bidirectional) evolution and backward compatible with UCIe.

• And the UMI Standards, which are Eliyan's and described by them as a superset of UCIe and BoW, are backward compatible with both UCIe and BoW.



And absolutely Yes. I believe Apple is getting out in front and future-proofing (ultimately) with a leap to the UMI Standards and CoWoS-R interposers. Well, here's hoping anyway!
I want to caution you against reading too much into all the things you've been researching, because it honestly feels like you don't understand enough about these technologies to see that they don't fit as a successor to what Apple built in M1 and M2 Ultra.

Specifically, protocols like UCIe and Eliyan's technology are based on high speed SERDES (short for serializer-deserializer). That means wide and (relatively) slowly-clocked on-chip data links get gearboxed to narrow-and-fast for transmission across the chip-to-chip connection. This greatly improves bandwidth per pin, and can sometimes reduce energy per bit, but costs a lot of latency since everything has to flow through p2s and s2p gearboxes (parallel/serial conversions) on each end of a high speed serial link.
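If it helps to make the gearboxing point concrete, here's a toy back-of-the-envelope model. It's illustrative only: the 16-lane / 32 Gbps SERDES configuration and the ~1 ns per gearbox are assumptions I made up, and only the 10,000-signal / 2 Gbps figures come from the UltraFusion numbers discussed in this thread.

```python
# Toy comparison of "wide and slow" vs "narrow and fast with gearboxes".
# All latency numbers here are illustrative assumptions, not measurements.

def wide_slow_link(lanes, lane_rate_gbps):
    """One bit per lane per cycle: the whole word crosses in one slow bit time."""
    bandwidth_gbps = lanes * lane_rate_gbps
    latency_ns = 1.0 / lane_rate_gbps          # one bit time at the slow rate
    return bandwidth_gbps, latency_ns

def serdes_link(word_bits, lanes, serial_rate_gbps, gearbox_ns=1.0):
    """word_bits gearboxed onto a few fast lanes: p2s, serial shift, then s2p."""
    serialize_ns = (word_bits / lanes) / serial_rate_gbps
    bandwidth_gbps = lanes * serial_rate_gbps
    return bandwidth_gbps, serialize_ns + 2 * gearbox_ns   # p2s + s2p overhead

# UltraFusion-style link vs a hypothetical 16-lane, 32 Gbps SERDES link
# moving a 512-bit word.
print(wide_slow_link(10_000, 2))        # (20000 Gbps, 0.5 ns per bit time)
print(serdes_link(512, 16, 32))         # (512 Gbps, ~3 ns before the bits even land)
```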

Ultra Fusion is laser focused on keeping latency as low as possible. Apple's goal was to pair two SoCs together so tightly that, as far as software is concerned, it behaves almost as if they'd built it as a single monolithic device. They couldn't totally prevent non-uniform access latencies when crossing the Ultra Fusion link, but they wanted the latency penalty to be as low as possible. That's why Apple built the link as a ~10,000 pin connection spanning the entire width of one edge of the die - they just brute forced it, using absurdly huge link width to enable very high bandwidth without needing to gearbox to a higher bit rate.

Since Apple is uninterested in optimizing for bandwidth per pin if it costs them latency, to the point that they indirectly emphasized it in their Ultra Fusion PR, I think you're leading yourself down a blind alley. Don't get me wrong, it's possible that Apple will decide to reverse course on this, but it seems unlikely.
 

MayaUser

macrumors 68040
Nov 22, 2021
3,177
7,196
So it's almost for sure a smaller Mac mini with M4 and M4 Pro is coming.
So I guess we get M4 this year, and M5 next year . . . still, Apple is on the march.
 

treehuggerpro

macrumors regular
Oct 21, 2021
111
124
I want to caution you against reading too much into all the things you've been researching, because it honestly feels like you don't understand enough about these technologies to see that they don't fit as a successor to what Apple built in M1 and M2 Ultra.

Specifically, protocols like UCIe and Eliyan's technology are based on high speed SERDES (short for serializer-deserializer). That means wide and (relatively) slowly-clocked on-chip data links get gearboxed to narrow-and-fast for transmission across the chip-to-chip connection. This greatly improves bandwidth per pin, and can sometimes reduce energy per bit, but costs a lot of latency since everything has to flow through p2s and s2p gearboxes (parallel/serial conversions) on each end of a high speed serial link.

Ultra Fusion is laser focused on keeping latency as low as possible. Apple's goal was to pair two SoCs together so tightly that, as far as software is concerned, it behaves almost as if they'd built it as a single monolithic device. They couldn't totally prevent non-uniform access latencies when crossing the Ultra Fusion link, but they wanted the latency penalty to be as low as possible. That's why Apple built the link as a ~10,000 pin connection spanning the entire width of one edge of the die - they just brute forced it, using absurdly huge link width to enable very high bandwidth without needing to gearbox to a higher bit rate.

Since Apple is uninterested in optimizing for bandwidth per pin if it costs them latency, to the point that they indirectly emphasized it in their Ultra Fusion PR, I think you're leading yourself down a blind alley. Don't get me wrong, it's possible that Apple will decide to reverse course on this, but it seems unlikely.

Hi mr_roboto, in the end, whatever Apple gives us is what we get. The point of the discussion, the circumstance, tap-tap-tap, is that we are patiently awaiting that eventuality.

No question, latency, among the other specs discussed and documented throughout the links, is a critical consideration. There is a documented 1 ns (end-to-end) penalty involved in moving from a silicon to an RDL interposer, presented for the new Standards on the previous page. And, indeed, much of what you’ve just raised on µbumps, traces, etc., was addressed in recent links.

If you go back through the posts, however, you will find this discussion was initially posed as a question on the delays and relative timelines. My line of investigation has simply been to follow my own design heuristic: find the critical constraints, and work back in search of addressable parameters.

Apple’s manufacturing and design approaches are researchable. Eliyan is a startup with a good body of accessible material on their interconnect tech. This is the primary material my posts are based upon; just follow the links.

With the x4 Mac still mythical and the current x2 Mac MIA, why not review the material and give us your objective evaluation? Consider my ignorance a canvas for your findings . . .
 
Last edited:

treehuggerpro

macrumors regular
Oct 21, 2021
111
124
I assume you will not be giving us a review @mr_roboto?

Getting acquainted with the two key advances here, RDL interposers and much-improved PHYs, requires a small investment of time. These insanely small structures of insanely dense signal delivery are not as abstract as many might imagine, though. Think of them as well-serviced high-rise buildings collapsed into a µm-scale layered pancake. This is the limit of the subject matter: signal (service) delivery with ample overhead to connect four SoCs as arranged in Apple’s x4 Patent.

My previous post was nudging you to look at the realities of the current and incoming Standards for their latency and bandwidth specifications. The point of introducing Standards and standard specifications for the chiplet economy is that they set minimum performance requirements (as a baseline) for any product manufactured to meet the specification. The two relevant tables on the previous page provide the (theoretical) bandwidths and worst-case scenarios for latency (defined as <2 ns (Tx + Rx) in the UCIe Standard). Aside from the impact of ≤25 mm between dies (~1x die width), latency does not differ between silicon and RDL interposers in the spec.

It’s not clear what your proportional Latency / Bandwidth narrative was based upon. Perhaps you were extrapolating and/or scripting over the historical motherboard reference in Apple’s M1 Ultra statement?

To build M1 Ultra, the die of two M1 Max are connected using UltraFusion, Apple’s custom-built packaging architecture. The most common way to scale performance is to connect two chips through a motherboard, which typically brings significant trade-offs, including increased latency, reduced bandwidth and increased power consumption. However, Apple’s innovative UltraFusion uses a silicon interposer that connects the chips across more than 10,000 signals, providing a massive 2.5TB/s of low-latency, inter-processor bandwidth — more than 4x the bandwidth of the leading multi-chip interconnect technology. This enables M1 Ultra to behave and be recognised by software as one chip, so developers don’t need to rewrite code to take advantage of its performance.

Apple achieved 2 Gbps per lane (20,000 Gbps ÷ 10,000 signals) through the PHY used in UltraFusion. They state above that this was “4x the bandwidth of the leading multi-chip interconnect technology” of the M1 Ultra’s day / design timeframe. The reference rate Apple used would equate to 0.5 Gbps per lane. The available data rate per lane within the UCIe standard is now 32 Gbps (16x UltraFusion per lane; UMI = 32x UltraFusion per lane).

Just so there is no confusion: latency is latency and bandwidth is bandwidth under the Standards. If either one were to shift or fall outside these operational requirements, the PHY would not be considered compliant with the specification.
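For anyone who wants to check the per-lane arithmetic above, here's a trivial sketch. The 2.5 TB/s and 10,000-signal figures are from Apple's statement and the 32 Gbps/lane rate is the UCIe figure cited earlier; nothing else is assumed.

```python
# Per-lane arithmetic behind the UltraFusion vs UCIe comparison above.

ultrafusion_total_gbps = 2.5 * 8 * 1000          # 2.5 TB/s -> 20,000 Gb/s
ultrafusion_signals = 10_000
per_lane_gbps = ultrafusion_total_gbps / ultrafusion_signals
print(per_lane_gbps)                              # 2.0 Gb/s per signal

ucie_per_lane_gbps = 32                           # current UCIe data rate per lane
print(ucie_per_lane_gbps / per_lane_gbps)         # 16x UltraFusion's per-lane rate
```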
 
Last edited:

Confused-User

macrumors 6502a
Oct 14, 2014
852
986
So I know less about this than @mr_roboto. I haven't looked closely at UltraFusion, but as I understand it, Apple's not using any kind of SerDes, and really not anything you'd call a PHY at all. (I could easily be wrong about this, though, and welcome correction.) That makes for a fundamental difference with any standard that implements one, as it entirely eliminates most of the delay (the serialization, and then the reverse) you get with that solution. As I said, I really don't know what it looks like, but I'm guessing it's the minimal solution you need to get enough power to talk across that link. No link-specific error correction, etc.
 

treehuggerpro

macrumors regular
Oct 21, 2021
111
124
So I know less about this than @mr_roboto. I haven't looked closely at UltraFusion, but as I understand it, Apple's not using any kind of SerDes, and really not anything you'd call a PHY at all. (I could easily be wrong about this, though, and welcome correction.) That makes for a fundamental difference with any standard that implements one, as it entirely eliminates most of the delay (the serialization, and then the reverse) you get with that solution. As I said, I really don't know what it looks like, but I'm guessing it's the minimal solution you need to get enough power to talk across that link. No link-specific error correction, etc.

Hi Confused, I’ve put a dimension on the M1 Ultra’s µbump pitch in the image below. Apple’s x4 Patent also references a 25µm pitch, so TSMC did something amazing on that front for UltraFusion. This is a smaller pitch than I’ve seen documented anywhere else and I assume the PHY was also uniquely tailored to achieve this density.

PHY is short for physical layer; this image may help to explain the reality of that definition. Nowhere have I seen it suggested you can push an internal (nm conditions) signal off die like that; these would be very different signal conditions, mediums and/or requirements, I'd suggest, but that's the limit of my knowledge.


[Image: M1 XSection.jpg]
 

Confused-User

macrumors 6502a
Oct 14, 2014
852
986
Nowhere have I seen it suggested you can push an internal (nm conditions) signal off die like that
This is exactly what I was talking about - I was under the impression that they had done that, or very close, skipping the usual convolutions necessary to generate a signal strong enough to (for example) control DRAM or drive a PCIe bus.
 

mr_roboto

macrumors 6502a
Sep 30, 2020
856
1,866
I assume you will not be giving us a review @mr_roboto?
Not sure what that would accomplish. I feel no need to go off into the weeds analyzing UCIe or UMI in detail when my point is that if you're Apple, you probably don't want a PHY or SERDES at all.

Getting acquainted with the two key advances here, RDL interposers and much-improved PHYs, requires a small investment of time. These insanely small structures of insanely dense signal delivery are not as abstract as many might imagine, though. Think of them as well-serviced high-rise buildings collapsed into a µm-scale layered pancake. This is the limit of the subject matter: signal (service) delivery with ample overhead to connect four SoCs as arranged in Apple’s x4 Patent.

My previous post was nudging you to look at the realities of the current and incoming Standards for their latency and bandwidth specifications.
My post was trying to gently nudge you into understanding that while you're doing an admirable job of looking information up, your lack of background in these topics has caused you to go down a blind alley.

You linked to that Apple patent. You'd do well to carefully read through 0020 and 0021. 0020 describes why many large MCMs have been designed using high speed SERDES for die-to-die interconnect. 0021 contrasts that with what Apple is trying to patent, a different approach which instead focuses on high wire counts with reduced data rates on each wire.

Apple achieved 2 Gbps per lane (20,000 Gbps ÷ 10,000 signals) through the PHY used in UltraFusion. They state above that this was “4x the bandwidth of the leading multi-chip interconnect technology” of the M1 Ultra’s day / design timeframe. The reference rate Apple used would equate to 0.5 Gbps per lane. The available data rate per lane within the UCIe standard is now 32 Gbps (16x UltraFusion per lane; UMI = 32x UltraFusion per lane).
I think you've confused yourself again. Apple's M1 Ultra marketing copy says:

However, Apple’s innovative UltraFusion uses a silicon interposer that connects the chips across more than 10,000 signals, providing a massive 2.5TB/s of low latency, inter-processor bandwidth — more than 4x the bandwidth of the leading multi-chip interconnect technology.
4x refers to the performance of the interconnect as a whole, not individual signals. They're claiming that the best multi-chip interconnect prior to UltraFusion provided less than 0.625 TB/s of total interconnect bandwidth.

There's no way Apple was instead bragging about achieving 2 Gbps per wire, because that's not notable at all. After all, PCIe 2.0 hit 5 Gbps per pair (2.5 Gbps per wire) way back in 2007. UltraFusion's claim to fame is "more than 10,000 signals". That's a lot! I feel that you don't understand just how exceptional that number is, and how it affects everything else, such as Apple's decision to clock slow so they don't have to use complex PHYs or a SERDES.
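To spell out the arithmetic: read "more than 4x" as a claim about total interconnect bandwidth, and the comparison looks like this (the only outside number is PCIe 2.0's well-known 5 GT/s lane rate from 2007):

```python
# Apple's 4x claim only makes sense for total bandwidth, not per-wire rate.

ultrafusion_total_TBps = 2.5
implied_best_competitor_TBps = ultrafusion_total_TBps / 4
print(implied_best_competitor_TBps)        # < 0.625 TB/s total for the "leading" interconnect

# Per wire, ~2 Gbps is unremarkable: PCIe 2.0 (2007) already ran 5 Gbps
# per differential pair, i.e. 2.5 Gbps per wire.
ultrafusion_per_wire_gbps = (2.5 * 8 * 1000) / 10_000
pcie2_per_wire_gbps = 5 / 2
print(ultrafusion_per_wire_gbps, pcie2_per_wire_gbps)   # 2.0 vs 2.5
```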
 

mr_roboto

macrumors 6502a
Sep 30, 2020
856
1,866
PHY is short for physical layer, this image may help to explain the reality of that definition. Nowhere have I seen it suggested you can push an internal (nm conditions) signal off die like that, these would be very different signal conditions, mediums and/or requirements I'd suggest, but that’s the limit of my knowledge.
Chip designers routinely drive millimeter scale on-die wires with nothing more than ordinary logic gate output structures, and maybe the occasional repeater if the wire is really long or drives a lot of loads. Why would you need much more than a slightly upsized version of that (stronger drive) for off-die, if you're not pushing high frequencies and the channel consists of one transmitter and one receiver?
 

treehuggerpro

macrumors regular
Oct 21, 2021
111
124
. . . these technologies . . . don't fit as a successor to what Apple built in M1 and M2 Ultra.

I agree.

The question, and discussion, from the outset was around Apple moving on to new processes in order to meet their x4 ambitions. And it was stated early that this would not “necessarily” replace the Silicon bridge for x2 configs, but the argument goes: RDL interposers have lower costs, higher yields and reduced production times, so the maths would be in their favour.

There’s been no suggestion that Apple wouldn’t wish to maintain similar signal densities (µm pitch + traces) when moving to a new UltraFusion. As said, this thread topic was motivated by the RDL interposer in the x4 patent. CoWoS-R provides what Apple specified.

The second requirement for moving away from silicon interposers for larger MCM configs is signal integrity over distance. The UMI Standard / tech, discussed as a breakthrough for this reason, is the first PHY to provide low power, low latency and high bandwidth over RDL distances (≤25 mm).

Collectively, this is the capacity Apple needs for an x4, and to move ahead with their MCM requirements. If they use the UMI Standard, they will configure it however they require. The point is, UMI and CoWoS-R provide Apple enormous headroom to work with.

This is just a rehash, of which CoWoS-R is a recent part. Given your stated disinterest in reviewing the posts, subject matter or links, you chose, rather, to review a random poster instead. What use is this, without context or any informed reference to the subject?

This ardent, seemingly threatened response to the suggestion of Apple moving up and on from existing processes is hard to fathom; isn’t that what Apple does?

The Gbps-per-lane breakdown was simply because your narrative focused on UltraFusion's 10,000 pins: 2.5 TB/s = 20,000 Gb/s, and 20,000 Gb/s ÷ 10,000 = 2 Gb/s per lane. This allowed anyone reading to assess UltraFusion next to the current Standards in your 10,000-pin / lane terms.

And finally, the no-physical-layer-interface / no-PHY-needed argument, or theory, you guys are working on: what can we say . . . I suggest you patent it. PHYs are already a multi-billion-dollar industry and the chiplet economy is coming.

I stand corrected . . . Apply with Apple. 👀
 
Last edited:

mr_roboto

macrumors 6502a
Sep 30, 2020
856
1,866
The second requirement for moving away from silicon interposers for larger MCM configs is signal integrity over distance. The UMI Standard / tech, discussed as a breakthrough for this reason, is the first PHY to provide low power, low latency and high bandwidth over RDL distances (≤25 mm).
You didn't read the patent like I suggested, did you? It explicitly says that the idea is to avoid using complex PHY and SERDES even for relatively long distance connections.

Collectively, this is the capacity Apple needs for an x4, and to move ahead with their MCM requirements. If they use the UMI Standard, they will configure it however they require. The point is, UMI and CoWoS-R provide Apple enormous headroom to work with.
Look, part of the problem here is you just fundamentally don't understand much more than the surface level "standard X promises Y bandwidth per lane", and that leads you to hyperfocus on inappropriate technologies. UMI is a memory interface standard. It's not something Apple would use to build the interconnect used to glue multiple Apple-designed SoCs into a single virtual SoC. UMI wasn't designed to do that, it would be a very poor match. At best Apple might borrow the physical layer, if they were interested in using high speed SERDES, but as I keep trying to get across to you, they probably aren't interested in doing that.

Same comment on UCIe, which is essentially just "what if PCIe/CXL was tweaked for use in chiplets rather than boards". Please, please, please, if nothing else, stop thinking that Apple's ever going to be interested in replacing UltraFusion with these two technologies specifically. They weren't designed to do what UltraFusion does, they're a very poor match.

This is just a rehash, of which CoWoS-R is a recent part. Given your stated disinterest in reviewing the posts, subject matter or links, you chose, rather, to review a random poster instead. What use is this, without context or any informed reference to the subject?
Why am I obligated to do exactly what you want me to? I'm "reviewing" your posts because they're full of strange ideas that aren't likely to be fruitful, and I'm trying to push back on that.

This ardent, seemingly threatened response to the suggestion of Apple moving up and on from existing processes is hard to fathom; isn’t that what Apple does?
I am not threatened, I am annoyed. Wouldn't you be annoyed if you saw someone obsessively promoting ideas you know are wrong, using language that indicates they're not really familiar with the field?

The Gbps-per-lane breakdown was simply because your narrative focused on UltraFusion's 10,000 pins: 2.5 TB/s = 20,000 Gb/s, and 20,000 Gb/s ÷ 10,000 = 2 Gb/s per lane. This allowed anyone reading to assess UltraFusion next to the current Standards in your 10,000-pin / lane terms.

And finally, the no-physical-layer-interface / no-PHY-needed argument, or theory, you guys are working on: what can we say . . . I suggest you patent it. PHYs are already a multi-billion-dollar industry and the chiplet economy is coming.
My eyes are rolling so hard. "the chiplet economy is coming", LOL. Seriously?

SERDES+PHY is old technology which predates the rise of chiplets by decades. Apple understands it very well. I occasionally check their job postings to see if there's something I'm interested in and they often have a position or two related to SERDES or physical layer. That's not a recent development at all, they were doing it long before M1. If they had wanted to use SERDES for UltraFusion, they could have.

The reason why I keep bringing up UltraFusion's 10K pins and pointing you at the actual contents of Apple's patent is that these things tell you a lot about their design philosophy: they're willing to take the cost hit of advanced packaging with extreme wire density to achieve high bandwidth by using lots of lanes running at a very low speed per lane.

The need for a complex PHY is a function of speed, distance, and the characteristics of the transmission medium. If you keep the bit rate low and the distance short (and no, 25mm is not necessarily long) and the signal path clean, you don't need anything complicated. And if you keep the speed low enough, you don't need a SERDES either. Avoiding these things has benefits! Complex SERDES+PHY doesn't just add latency, it adds die area and power. Granted, Apple takes an area hit because they have to terminate 10K connections, but the UltraFusion beachfront visible in die photos doesn't seem too large, all things considered.
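Here's a crude way to picture the speed-times-distance part of that. The ~6.7 ps/mm propagation figure (roughly half the speed of light, typical of package dielectrics) and the example reaches are assumptions on my part, not measured UltraFusion or UMI numbers:

```python
# How many bits are "in flight" on the channel at once. When this number is
# tiny, a simple driver/receiver can work; when it gets large, you need a
# real PHY with equalization, training, and so on.

PS_PER_MM = 6.7    # assumed propagation delay, ~half the speed of light in dielectric

def bits_in_flight(rate_gbps, reach_mm):
    unit_interval_ps = 1000.0 / rate_gbps     # one bit time in picoseconds
    flight_time_ps = reach_mm * PS_PER_MM
    return flight_time_ps / unit_interval_ps

print(bits_in_flight(2, 3))      # ~0.04: short and slow, the wire barely matters
print(bits_in_flight(32, 25))    # ~5.4: several bits on the wire at once
```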

I'm not going to patent the idea because Apple already did! That's the patent you linked. Like many patents, it's not anything revolutionary, and probably ought to be denied as something that's obvious to anyone reasonably well versed in the field, but that seldom stops the USPTO from rubberstamping.
 

Confused-User

macrumors 6502a
Oct 14, 2014
852
986
There's no way Apple was instead bragging about achieving 2 Gbps per wire, because that's not notable at all. After all, PCIe 2.0 hit 5 Gbps per pair (2.5 Gbps per wire) way back in 2007. UltraFusion's claim to fame is "more than 10,000 signals". That's a lot! I feel that you don't understand just how exceptional that number is, and how it affects everything else, such as Apple's decision to clock slow so they don't have to use complex PHYs or a SERDES.
Isn’t it more than 2 Gbps, because a lot of those wires would be ground? (Not that I think that challenges your argument, I just keep seeing this figure and I think it's off.)

Anyway... my read on this all along has been that they spent shoreline to buy reduced latency and energy. If that's right, then going with Eliyan or similar tech would be a complete reversal. I could sorta-maybe imagine them doing that, if for example they needed to connect four chiplets together without some bridge chiplet (like the "I/O die" in AMD's world) and just ran out of shoreline entirely. But that seems like an unlikely choice for many reasons.
 
  • Like
Reactions: caribbeanblue

tenthousandthings

Contributor
May 14, 2012
276
323
New Haven, CT
So basically it looks like neither of the recently newsworthy Cadence Design Systems advanced-packaging collaborations with TSMC/GUC, for SerDes signaling (April 2023) and wafer-on-wafer 3D-IC stacking (January 2024), is a realistic candidate for having anything to do with "UltraFusion 2.0" (scare-quoting my earlier, speculative use of this), barring a major sea change by Apple away from chip-first packaging, toward chip-last packaging. Is that correct, in layman's terms?
 

mr_roboto

macrumors 6502a
Sep 30, 2020
856
1,866
Isn’t it more than 2 Gbps, because a lot of those wires would be ground? (Not that I think that challenges your argument, I just keep seeing this figure and I think it's off.)
Good point, as there's almost certainly some ground shielding to try to keep crosstalk under control. However, as Apple describes it as 10,000 signals, I've been interpreting it as 10K signal wires plus some unknown amount of ground. You wouldn't describe a ground wire as a signal wire, after all.
 

Confused-User

macrumors 6502a
Oct 14, 2014
852
986
Good point, as there's almost certainly some ground shielding to try to keep crosstalk under control. However, as Apple describes it as 10,000 signals, I've been interpreting it as 10K signal wires plus some unknown amount of ground. You wouldn't describe a ground wire as a signal wire, after all.
I wouldn't, but who knows what game of telephone gets played between engineers and marketers?

So since I have your attention :), next question... is differential signaling a thing, at that level? I was wondering: if you have the shoreline to burn (and it's not clear to me at all one way or the other - how many "wires" can come out per mm?), maybe that's one easy way to maintain signal integrity for a short trip off-chip. If so, maybe there are 20,000 signals! Or alternatively (again, think telephone games) the data rate is 4 Gbps and there are 5k different signals, each carried on two wires.
 

mr_roboto

macrumors 6502a
Sep 30, 2020
856
1,866
So since I have your attention :), next question... is differential signaling a thing, at that level? I was wondering: if you have the shoreline to burn (and it's not clear to me at all one way or the other - how many "wires" can come out per mm?), maybe that's one easy way to maintain signal integrity for a short trip off-chip. If so, maybe there are 20,000 signals! Or alternatively (again, think telephone games) the data rate is 4 Gbps and there are 5k different signals, each carried on two wires.
Differential signaling can be a thing anywhere you want it to be. :) It's just a question of whether it makes sense in that context.

Generally speaking the purpose of differential is to improve noise immunity. The basic idea is that by routing the diff pair's two wires together, always keeping them very close to each other and making sure that one side of the pair isn't longer than the other, any impinging electromagnetic noise should affect both the positive and negative signal equally. If you then construct the receiver on the other end of the channel to look at the difference between the two wires, it doesn't matter if noise makes both of them jump the same way - the receiver will still see the correct difference and correctly interpret it as a zero or one.
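A toy way to see the trick, completely divorced from any particular standard's electricals: noise that hits both wires equally cancels out at a receiver that only looks at the difference.

```python
# Conceptual sketch of common-mode noise rejection with differential signaling.
import random

def transmit(bit, common_mode_noise):
    p = (1.0 if bit else 0.0) + common_mode_noise   # noise couples onto both wires
    n = (0.0 if bit else 1.0) + common_mode_noise
    return p, n

def differential_rx(p, n):
    return 1 if (p - n) > 0 else 0                  # common-mode term cancels out

def single_ended_rx(p, threshold=0.5):
    return 1 if p > threshold else 0                # can be flipped by enough noise

for _ in range(5):
    bit = random.choice([0, 1])
    p, n = transmit(bit, common_mode_noise=random.uniform(-0.6, 0.6))
    print(bit, differential_rx(p, n), single_ended_rx(p))   # diff always matches; SE sometimes doesn't
```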

My guess is that Apple went single-ended on UltraFusion. My handwavy justification is that they kept the signal paths real short and had total control over the geometry of the connections, proximity of neighboring aggressor signals, amount of shielding, and so on. If line rate is low enough and signal integrity good enough to make single ended signaling work, it's the obvious best choice - half the connections.

Standards like UCIe and UMI are based on differential signaling. That's one of the prices you pay for extremely high speed: it becomes difficult or impossible to make single-ended signaling work.
 

Confused-User

macrumors 6502a
Oct 14, 2014
852
986
You know, thinking about this more... I'm not sure we're even close on this.

What exactly is going across all those links? I mean, it's going to be, I imagine, the NoC, and possibly large (I figure 512 bit, one cache line) direct paths between L2 caches, and... what else? Equally large paths between the GPUs - likely between the GPU common areas? I'm having a hard time figuring out what 10,000 or even 5,000 signals are for. And conversely, wouldn't those signals need to move well faster than 2 Gbps (or even 4 Gbps) to not lose cycles on every transaction?

That would suggest that there *is* a thing vaguely like a SerDes on each side, but used in the opposite manner to what you'd normally expect. Call it a ParDep :). That is, they're taking serial signals (each internally perhaps going at, for example, 6 GHz), and distributing them across multiple links round-robin every cycle. So if the Fusion links are in fact running at 2 GHz, then for 6 GHz intra-chip signalling you'd need 3 Fusion wires for each internal wire. You'd take a 2-cycle latency penalty (3 cycles total) per transit across UltraFusion, but you wouldn't ever stall, because there would always be an available wire. (Yes, I expect I'm oversimplifying.)
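Something like this, in toy form (just using the 6 GHz / 2 GHz example numbers above; obviously this says nothing about how the real link actually works):

```python
# Toy "ParDep": spread a faster internal bit stream round-robin across three
# slower link wires, then reassemble it on the far side. No bandwidth is lost;
# the cost is the extra transit latency described above.

INTERNAL_GHZ = 6
LINK_GHZ = 2
WIRES_PER_SIGNAL = INTERNAL_GHZ // LINK_GHZ        # 3 link wires per internal wire

def distribute(bits):
    lanes = [[] for _ in range(WIRES_PER_SIGNAL)]
    for i, b in enumerate(bits):
        lanes[i % WIRES_PER_SIGNAL].append(b)      # round-robin every cycle
    return lanes

def recombine(lanes):
    out = []
    for i in range(max(len(lane) for lane in lanes)):
        for lane in lanes:
            if i < len(lane):
                out.append(lane[i])
    return out

stream = [1, 0, 1, 1, 0, 0, 1, 0, 1]
assert recombine(distribute(stream)) == stream     # ordering survives the round trip
```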

Or am I wildly off base and missing something fundamental?
 