
leman

macrumors Core
Original poster
Oct 14, 2008
19,521
19,677
An interesting post on RWT (https://www.realworldtech.com/forum/?threadid=205830&curpostid=205830) discusses how Apple might be manufacturing the Ultra. Now, I have very little clue about chip technology, but from what I gather, a core challenge for modern multi-chip designs is packaging multiple chips together. What is commonly used is an interposer: an additional, larger, simpler chip that acts as a sort of communication network and on top of which the actual chips are mounted. This works, but is expensive and not very space-efficient. The post describes various state-of-the-art technologies in great detail and speculates that Apple might be using a different approach that has its roots in the packaging of mobile chips. If I understand it correctly, the argument is that they use some smart tricks to precisely position chips on a carrier wafer and lock them together, which essentially creates a new “fake wafer” with the chip innards exposed. They can then use more or less conventional techniques to put an additional interconnect layer over this “fake wafer”, just as is normally done for any chip (this is the back end of line, BEOL). The result is a tightly integrated, compact structure that has much better signal efficiency and electrical properties than interposer technology, while also being much cheaper, as you don’t need to manufacture the extra interposer chip. This is literally a “monster chip” built from multiple chips, bypassing yield issues and other problems.

If this speculation is correct, then it is very good news for Apple Silicon. This is a very impressive technology that is inherently very scalable. In the future Apple could probably use it more aggressively to build very flexible configurations from individual dies. The thing is, the chips connected this way don’t need to be identical. Imagine smaller building blocks (CPU/GPU clusters, cache blocks etc.) on smaller dies with high yields, packaged together on demand and glued into one big SoC. One could build some truly monstrous systems without incurring the high cost and low yields usually associated with them.

P.S. The author of the RWT post is probably Maynard Handley, who previously published a lot of in-depth information on Apple Silicon, most notably this insanely cool investigation. He is a member here on MR with the handle name99.
 
  • Like
Reactions: ASentientBot

leman

macrumors Core
Original poster
Oct 14, 2008
19,521
19,677
I wish Apple would explain their SOCs in as much detail as Nvidia did with the NVIDIA HGX H100.

To be honest, I don’t see any more information in the Nvidia product brief than what Apple tells us in their presentation. Sure, Nvidia’s material looks more detailed, but that is because the hardware has more complexity. Apple doesn’t need to connect multiple discrete GPUs together, they don’t support FP64, and their architecture is in general “flatter” and more streamlined. But they did offer some details on how the SoC works, including the basic logical architecture of the GPU cores. We do know, for example, that an Apple GPU core consists of four 1024-bit wide compute units, each capable of executing an independent instruction stream.
 

thenewperson

macrumors 6502a
Mar 27, 2011
992
912
Is there a way to view that forum better? The whole 1 post/comment per page thing is so wasteful.
 

repoman27

macrumors 6502
May 13, 2011
485
167
I'm pretty sure the M1 Ultra is made using TSMC CoWoS-L.

Apple clearly states that "UltraFusion uses a silicon interposer that connects the chips" and their illustration shows two M1 Max dies coming together over what certainly appears to be a Local Silicon Interconnect (LSI) bridge. TSMC has developed both InFO and CoWoS packaging technologies incorporating LSI. The key distinction between the two is that InFO is chip-first, and CoWoS is chip-last. InFO starts with building a reconstituted wafer by placing known good dies (KGDs) on a carrier and then adds redistribution layers (RDL) for fanout and optionally LSI bridges. With CoWoS-L, a reconstituted wafer is built up by placing the LSI bridges on the carrier before filling and adding RDL, then the KGDs are bonded to the top of the stack. Apple has been using InFO_PoP for the A series chips since the A10, but with two dies as big as the M1 Max (18.26 mm x 21.36 mm = 390 mm²), a chip-last technology is the only feasible option. This is true both from an economic / yields standpoint and because InFO-L doesn't scale beyond 1x reticle size yet (26 mm x 33 mm = 858 mm²).

TechInsights also discussed this a bit in a recent blog post which included some images showing the bump pitch for the UltraFusion link, which at 25 µm matches that of TSMC LSI.
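
The area figures above are easy to check. Here is a minimal Python sketch that uses only the dimensions quoted in this post, plus one assumption taken from later in the thread: the two dies sit side by side along their 18.26 mm edge. The `fits_in_field` helper is hypothetical and treats the reticle limit as a plain 26 mm x 33 mm rectangle, which is a simplification; the post does not say whether area or edge length is the binding constraint for a given packaging flow.

```python
# Dimensions quoted in the post, in millimetres.
M1_MAX_DIE = (18.26, 21.36)
RETICLE_FIELD = (26.0, 33.0)


def area(dims):
    """Area of a rectangle given as (width, height)."""
    width, height = dims
    return width * height


def fits_in_field(dims, field=RETICLE_FIELD):
    """True if the rectangle fits inside the field in either orientation.

    Assumption: the reticle limit is modelled as a simple rectangle.
    """
    short_side, long_side = sorted(dims)
    field_short, field_long = sorted(field)
    return short_side <= field_short and long_side <= field_long


# Assumption: two dies joined along the 18.26 mm edge.
TWO_DIES = (M1_MAX_DIE[0], 2 * M1_MAX_DIE[1])

if __name__ == "__main__":
    print(f"M1 Max die:    {area(M1_MAX_DIE):.1f} mm^2")
    print(f"Reticle field: {area(RETICLE_FIELD):.1f} mm^2")
    print(f"Two dies:      {area(TWO_DIES):.1f} mm^2")
    print("One die fits in a field: ", fits_in_field(M1_MAX_DIE))
    print("Two dies fit in a field: ", fits_in_field(TWO_DIES))
```

By area alone the pair stays under one reticle, but its long edge (42.72 mm) exceeds the 33 mm side of the field, so under this simplified model it does not fit in a single exposure.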
 

Gnattu

macrumors 65816
Sep 18, 2020
1,107
1,671
Makes sense. The post on RWT referenced an Apple patent and built a hypothesis on it. That patent describes InFO-based packaging (which has been used since the A10), so the author assumed that the M1 Ultra packaging is InFO-based as well. But that differs from how Apple describes the interposer, which sounds more like CoWoS packaging.
 
  • Like
Reactions: leman

leman

macrumors Core
Original poster
Oct 14, 2008
19,521
19,677
Thanks, yes, this sounds very plausible to me and more realistic than Maynard's post. Just to make sure I understand it correctly: the LSI approach seems to be a way to reduce costs and improve yields by reducing the size of the complex interposer to a minimum? So instead of using a large, expensive, complex interposer, they combine a cheaper, simpler carrier wafer with a smaller bridge-style interposer, right? Sorry if my take on this is naive, I am just trying to understand these things from my amateur perspective.

Ah, if you can find the time, I would very much appreciate it if you could post this on RWT as well. There are many knowledgeable people there who might have more interesting comments.
 

leman

macrumors Core
Original poster
Oct 14, 2008
19,521
19,677
P.S. What are the practical limits of the current CoWoS package? Wikichip mentions something about 1700 mm², no idea how up to date that site is. Does the inclusion of LSI bridges change this somehow?
 

repoman27

macrumors 6502
May 13, 2011
485
167
Thanks, yes, this sounds very plausible to me and more realistic than Maynard's post. Just to make sure I understand it correctly: the LSI approach seems to be a way to reduce costs and improve yields by reducing the size of the complex interposer to a minimum? So instead of using a large, expensive, complex interposer, they combine a cheaper, simpler carrier wafer with a smaller bridge-style interposer, right? Sorry if my take on this is naive, I am just trying to understand these things from my amateur perspective.

Ah, if you can find the time, I would very much appreciate it if you could post this on RWT as well. There are many knowledgeable people there who might have more interesting comments.
Yes, that's the gist of it.

With TSMC's traditional silicon interposer technology, CoWoS-S (sorry these names are ridiculous), you start with a standard silicon wafer, etch and fill to create through-silicon vias (TSVs), lay down all of your RDLs using standard photolithographic techniques, bond your active dies on top, grind down the back of the wafer to expose the TSVs, dice the wafer to singulate your interposers, and then attach each of those to a conventional organic substrate. Hence "Chip on Wafer on Substrate - Silicon interposer".

Processing silicon wafers is expensive, and although the interposers don't have active / polysilicon layers requiring the smallest achievable feature sizes, they do need TSVs and > 1x reticle lithographic techniques for the RDL / metal layers. Just to throw out a number (and I'm not sure I'm even in the ballpark here), let's say the finished interposer wafer costs $650. The M1 Ultra interposer would need to be a hair larger than the two M1 Max dies taken together, or > 18.26 mm x 42.72 mm. You can only cram 70 interposers that size on a standard 300 mm wafer, and might only yield 43 that were defect free due to their size, so each good interposer would end up costing around $15.12.

However, the M1 Ultra only requires the fine interconnect pitch of a silicon interposer along the relatively small area where the two dies meet—the UltraFusion part. A local silicon interconnect (LSI) bridge would only need to be ~18.26 mm x 3.8 mm, which is well within the reticle limit. You could fit 880 of those bridges on a 300 mm wafer and yield nearly 95% of them due to their smaller size. The cost of a silicon bridge would only be around $0.78.
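
To make the arithmetic in the last two paragraphs explicit, here is a small Python sketch. The cost calculation uses only the numbers from the post (a guessed $650 wafer, 70 interposer sites with 43 good, 880 bridge sites at 95% yield). The second half is an extra assumption that is not in the post: it applies a simple Poisson yield model, `Y = exp(-D * A)`, to check whether the two yield figures are roughly consistent with a single defect density.

```python
import math

WAFER_COST_USD = 650.0  # the post's admittedly rough guess


def cost_per_good_die(wafer_cost, candidates, yield_fraction):
    """Wafer cost spread over the dies that come out defect free."""
    return wafer_cost / (candidates * yield_fraction)


# Numbers taken directly from the post.
interposer_cost = cost_per_good_die(WAFER_COST_USD, 70, 43 / 70)
bridge_cost = cost_per_good_die(WAFER_COST_USD, 880, 0.95)


# Assumption (not from the post): Poisson yield model, Y = exp(-D * A).
def poisson_defect_density(yield_fraction, area_cm2):
    """Defect density (per cm^2) implied by a yield at a given die area."""
    return -math.log(yield_fraction) / area_cm2


def poisson_yield(defect_density, area_cm2):
    """Expected yield for a die of the given area."""
    return math.exp(-defect_density * area_cm2)


interposer_area_cm2 = 18.26 * 42.72 / 100.0  # mm^2 -> cm^2
bridge_area_cm2 = 18.26 * 3.8 / 100.0

implied_d0 = poisson_defect_density(43 / 70, interposer_area_cm2)
bridge_yield_estimate = poisson_yield(implied_d0, bridge_area_cm2)

if __name__ == "__main__":
    print(f"Full interposer: ${interposer_cost:.2f} per good part")
    print(f"LSI bridge:      ${bridge_cost:.2f} per good part")
    print(f"Implied defect density: {implied_d0:.3f} per cm^2")
    print(f"Poisson estimate of bridge yield: {bridge_yield_estimate:.1%}")
```

Under that model the bridge yield comes out in the mid-90s percent, in the same range as the "nearly 95%" used above, so the two figures hang together reasonably well even though the wafer cost itself is only a placeholder.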

With TSMC CoWoS-L, the LSI bridges would be placed on a carrier wafer, copper pillars would be built up around them, the empty space filled with resin, RDLs added using standard photolithographic techniques, active dies bonded on top, the carrier separated, the reconstituted wafer diced, and those assemblies would then be attached to a conventional organic substrate. Thus you have "Chip on Wafer on Substrate - Local silicon interconnect bridge".

P.S. What are the practical limits of the current CoWoS package? Wikichip mentions something about 1700 mm², no idea how up to date that site is. Does the inclusion of LSI bridges change this somehow?
5th generation CoWoS-S (CoWoS-S5) employs a novel 2-way lithographic stitching approach to achieve 3x reticle size (~2500 mm²) and was on track to achieve qualification by the end of 2021. CoWoS-L was undergoing qualification for 1.5x reticle size (~1250 mm²) last year, and a 3x reticle test vehicle was slated for Q2'21.
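
For reference, the "Nx reticle" figures convert to area as follows. This is a short Python sketch that assumes one reticle is the 26 mm x 33 mm field mentioned earlier in the thread; the function names are made up for illustration. The rounded values quoted above (~2500 mm² and ~1250 mm²) are slightly below the exact multiples.

```python
RETICLE_MM2 = 26 * 33  # 858 mm^2, the field size quoted earlier in the thread


def reticle_multiple_to_mm2(multiple):
    """Area budget in mm^2 for a given multiple of the reticle field."""
    return multiple * RETICLE_MM2


def mm2_to_reticle_multiple(area_mm2):
    """How many reticle fields a given area corresponds to."""
    return area_mm2 / RETICLE_MM2


if __name__ == "__main__":
    for multiple in (1, 1.5, 3):
        print(f"{multiple}x reticle = {reticle_multiple_to_mm2(multiple):.0f} mm^2")
    # The 1700 mm^2 figure attributed to Wikichip in the question:
    print(f"1700 mm^2 = {mm2_to_reticle_multiple(1700):.2f}x reticle")
```

On these numbers, the 1700 mm² figure from the question works out to roughly a 2x reticle interposer.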
 

deconstruct60

macrumors G5
Mar 10, 2009
12,493
4,053
Looks like Apple's package is more "stacked".

"... According to a presentation demonstrated by the foundry at the International Symposium on 3D IC and Heterogeneous Integration, Apple uses Integrated Fan-Out (InFO) with local silicon interconnect (LSI) and a redistribution layer (RDL). The slide was republished by Tom Wassick, a semiconductor packaging engineering professional. ..."
https://www.tomshardware.com/news/tsmc-clarifies-apple-ultrafusion-chip-to-chip-interconnect


So a "quad" Max-like die size solution isn't really possible with this tech. It is riding right at the reticle limit that comes with the packaging.

So the RAM is on a more traditional, lower-tech interposer (not minimized bump connections) and this package is stacked on top.

Unless there is a change in the InFO-LSI limit, that puts a kink in how much bloat they could add to the M2 series version of a Max die. Too big and they blow past the InFO-LSI limit even with two dies (e.g., two more E cores and some small tweaks would keep the overall die size approximately the same). Or it adds more pressure to jump to TSMC N3 (to stuff substantially more into the same size die).


For a "quad" Max-like die sized solution, CoWoS-L doesn't have the reticle constraint. There is a pretty good chance it is also more expensive: multiple UltraFusion redistribution chips and a larger set of chips to place and attach. Or Apple would need another variant of the interconnect that goes longer distances through bigger "bumps"/"pads" to a second InFO-LSI package (and spends more power doing that).
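
A rough way to put numbers on the "riding right at the reticle limit" argument, as a Python sketch. It assumes the limit is a pure area budget of one 26 mm x 33 mm reticle (the figure quoted earlier in the thread for InFO-L) and uses the M1 Max die size quoted there. It ignores the bridge, RDL margins and edge-length constraints, so treat it as an upper bound rather than a real design rule.

```python
RETICLE_MM2 = 26.0 * 33.0    # 858 mm^2, 1x reticle as quoted in the thread
M1_MAX_MM2 = 18.26 * 21.36   # about 390 mm^2, die size quoted in the thread


def utilisation(n_dies, die_mm2, budget_mm2=RETICLE_MM2):
    """Fraction of the area budget consumed by n identical dies."""
    return n_dies * die_mm2 / budget_mm2


def max_die_area(n_dies, budget_mm2=RETICLE_MM2):
    """Largest die area that lets n identical dies stay within the budget."""
    return budget_mm2 / n_dies


two_die_use = utilisation(2, M1_MAX_MM2)
four_die_use = utilisation(4, M1_MAX_MM2)
# Relative die growth a two-die package could absorb under this budget.
growth_headroom = max_die_area(2) / M1_MAX_MM2 - 1

if __name__ == "__main__":
    print(f"Two Max dies use  {two_die_use:.0%} of 1x reticle")
    print(f"Four Max dies use {four_die_use:.0%} of 1x reticle")
    print(f"Die growth headroom for a two-die package: {growth_headroom:.0%}")
```

Under these assumptions, two Max dies already take up most of a 1x reticle budget, four need nearly twice that, and a next-generation Max die could only grow by about a tenth before a two-die package runs out of room.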
 
  • Like
Reactions: Gnattu