MP All Models 40 core ASI with PCI-E slot is around the corner!!

Mago · Mar 14, 2023

I'm here to debunk a false leak (one I brought here myself) about the M2 Max UltraFusion, and to confirm another.

UltraFusion in the M2 Max sits on one side, just like on the M1 Max; this is confirmed by an X-ray of an M2 MacBook Pro system board.

That X-ray also confirms a doubled data path and what looks like a central bifurcation, as if it were a separation between two sockets.

It reminded me of a theory or leak from a Chinese source stating that the M2 Extreme dies are daisy-chained in an asymmetrical arrangement, which seems weird to me but is consistent.

Of that pair of UltraFusion links (call them primary and secondary), the primary is directed to a root SoC and the secondary connects to the primary UF of a child SoC.

How to arrange this across interposers is where the speculation comes in.

Someone suggested Apple could put a pair of M2 Ultras side by side in front of another pair in a ladder pattern. This keeps paths short but also blocks half of the RAM channels, so an M2 Extreme would have the same number of RAM channels as an M2 Ultra (8).

An alternative arrangement implies two longer interposers, a bit longer than the M2 Max, in an asymmetrical or ladder arrangement: the one at left connects to the preceding (parent) SoC and the one at right connects to the child SoC. It implies all M2 Max dies are placed in the same alignment (all up, not one up and one down as in the M1 Ultra). It also implies a slightly longer data path, but noise in silicon substrates is easy to handle, so it won't affect performance or stability. However, each M2 Extreme interposer would cost at least twice as much, and at least 3 are required to interface 4 M2 Max dies.

A diagram:

Code:

Proposed S arrangement:

   (Ram)|M2|M2#(Ram)
        Π  U
(Ram)#W2|W2|(Ram)

Stacked:
   (Ram)|M2#(Ram)
        ↑==↓ (UF below the SOC)
   (Ram)|M2|(Ram)
        ↑==↓
   (Ram)|M2|(Ram)
        ↑==↓
   (Ram) M2|(Ram)

# denotes PCIe5 UltraFusion I/O terminator.

Both arrangements have their own pros and cons:

S: half of the RAM channels blocked; all Thunderbolt and native M2 Max I/O available; more complicated cooling.

Stacked: only the master M2 Max I/O available; all memory channels available; easier cooling.
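To put rough numbers on that trade-off, here is a minimal sketch. It assumes the figures used in this thread (4 memory channels per M2 Max, hence 8 on an Ultra) and the speculative rules that the S layout blocks half of the channels while the stacked layout exposes native I/O only on the master die. The function names are made up for illustration; none of this is confirmed by Apple.

```python
# Back-of-the-envelope comparison of the two speculated layouts.
# Assumption (from this thread, not from Apple): 4 memory channels per M2 Max die.
CHANNELS_PER_DIE = 4


def usable_channels(dies: int, arrangement: str) -> int:
    """Memory channels that stay reachable for a given number of dies."""
    total = dies * CHANNELS_PER_DIE
    if arrangement == "S":
        return total // 2      # half of the channels are blocked
    if arrangement == "stacked":
        return total           # every channel stays reachable
    raise ValueError(f"unknown arrangement: {arrangement!r}")


def dies_with_native_io(dies: int, arrangement: str) -> int:
    """Dies whose Thunderbolt/native I/O remains usable."""
    if arrangement == "S":
        return dies            # every die keeps its I/O
    if arrangement == "stacked":
        return 1               # only the master/root die
    raise ValueError(f"unknown arrangement: {arrangement!r}")


if __name__ == "__main__":
    for layout in ("S", "stacked"):
        print(layout, usable_channels(4, layout), dies_with_native_io(4, layout))
```

With 4 dies, the S layout ends up with the same 8 channels as an Ultra, while the stacked layout keeps all 16 but only one die's worth of I/O.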

The S arrangement also allows a PCIe5 I/O terminator at each unused UltraFusion end, while the stacked arrangement allows only one (at the master/root). Given that each UF provides 2.5 TB/s, a single PCIe5 UltraFusion terminator is enough for 5 PCIe5 slots, so I bet Apple will deploy the M2 Extreme as stacked M2 Max dies, teaming 2, 3, or up to 4 SoCs, since only the master requires all of its I/O provisions.
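The 5-slot estimate can be sanity-checked with a short sketch. The assumptions are mine, not the poster's: x16 slots, one direction only, and 128b/130b line encoding as the only overhead. Apple's published UltraFusion figure is 2.5 TB/s, but the estimate only lands near 5 if the link is read as 2.5 Tbit/s, so both readings are computed.

```python
# Rough check of the "one terminator feeds 5 PCIe5 slots" estimate.
# Assumptions: x16 slots, one direction, 128b/130b encoding as the only overhead.
PCIE5_GT_PER_LANE = 32.0          # GT/s per lane for PCIe 5.0
ENCODING_EFFICIENCY = 128 / 130   # 128b/130b line code


def pcie5_gbytes_per_s(lanes: int = 16) -> float:
    """Usable one-direction bandwidth of a PCIe 5.0 link in GB/s."""
    return PCIE5_GT_PER_LANE * lanes * ENCODING_EFFICIENCY / 8


def slots_fed(link_gbytes_per_s: float, lanes: int = 16) -> float:
    """How many full-speed slots a link of the given bandwidth could feed."""
    return link_gbytes_per_s / pcie5_gbytes_per_s(lanes)


if __name__ == "__main__":
    uf_if_terabits = 2.5e3 / 8    # reading the link as 2.5 Tbit/s, in GB/s
    uf_if_terabytes = 2.5e3       # reading the link as 2.5 TB/s, in GB/s
    print(round(slots_fed(uf_if_terabits), 2))
    print(round(slots_fed(uf_if_terabytes), 2))
```

Under the Tbit/s reading the link covers just under 5 x16 slots; under the TB/s reading it would cover close to 40.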

Apple Knowledge Navigator · Mar 14, 2023

Mago said:
I'm here to debunk a false leak (one I brought here myself) about the M2 Max UltraFusion, and to confirm another.


Makes perfect sense. Having the additional UltraFusion connector on just one side would mean that one Ultra group (2 x Max dies) would face one orientation, the other group the opposite. So although from a parallel perspective it would appear to be 4 x dies next to each other, it would actually be two up and two down.

As for cooling, this would provide a nice (unintended) benefit of allowing a heatsink to be placed either side of the Extreme setup, much like a sandwich - which would effectively halve the distribution of heat on either side.

Boil · Mar 14, 2023

Mago said:
I'm here to debunk a false leak (one I brought here myself) about the M2 Max UltraFusion, and to confirm another.

So the whole double UF was a false rumor (from your inside source), and you are now back to confirm another false rumor...?

Mago said:
UltraFusion in the M2 Max sits on one side, just like on the M1 Max; this is confirmed by an X-ray of an M2 MacBook Pro system board.

That X-ray also confirms a doubled data path and what looks like a central bifurcation, as if it were a separation between two sockets.

Pics of these x-ray images, or at least a link...?

Mago said:
How to arrange this across interposers is where the speculation comes in.

A diagram:

Maybe we could get actual diagrams, drawn out on paper...?

I am still going to hope for the ASi Mac Pro to debut with N3B M3 Ultra & M3 Extreme SoCs, and for some sort of ASi (GP)GPU for pushing compute/render jobs to; iGPU for display output, (GP)GPU for compute/render tasks...

Oh, the ASi Mac Pro should also have LPDDR5X RAM & hardware ray-tracing...! ;^p

Mago · Mar 14, 2023

Boil said:
these x-ray images, or at least a link...?

My personal work is private.

Boil said:
double UF was a false rumor

No, it has always been double (as it is now), just not in the way I thought seemed logical based on my own knowledge of the area.

Boil said:
I am still going to hope for the ASi Mac Pro to debut with N3B M3 Ultra & M3 Extreme SoCs, and for some sort of ASi (GP)GPU for pushing compute/render jobs to; iGPU for display output, (GP)GPU for compute/render tasks...

Oh, the ASi Mac Pro should also have LPDDR5X RAM & hardware ray-tracing...! ;^p

As an enthusiast I'd love the Mac Pro to arrive on M3 ASi, even with the possibility of an AMD Epyc-like chiplet complex with at least 80 CPU cores (64+16 / 8x8+2x8), but I doubt Apple will go to extremes with its first N3B product.

Boil said:
Maybe we could get actual diagrams,

Ask Jon Prosser, he does a lot on toilet paper 🫢🤫😜

Mago · Mar 14, 2023

A secondary analysis in light of today's disclosure:
The M2 Max already includes an upgraded UltraFusion; while it is not a configuration I would propose, it is there. If said UF2 failed to scale to 4 ways, it's too late to change the M3 design. So if it turns out the M2 Extreme was cancelled because of something wrong with UF2, hopes for an M3 Extreme Mac Pro imply Apple following a completely new approach to multi-chiplet complexes; otherwise forget the M3 Extreme.

Don't worry: whatever it looks like, Apple's M2 Extreme is alive and well, stacking enough for ASi Mac Pro 14,17 14,19

Mago · Mar 14, 2023

Apple Knowledge Navigator said:
that one Ultra group (2 x Max dies) would face one orientation, the other group the opposite.

I also considered that it could be arranged in a diamond or X, with each M2 Max facing a single, irregular-octagon-shaped UF 4x4 bridge. That leaves exposed both the I/O and all 4 memory channels on each M2 Max.

Its cooling solution, though, would be quite complicated in both size and shape.

kiiso · Mar 15, 2023

What about these old leaks from Majin Bu?
Isn't it possible that contrary to the original leak, it's about M2 Extreme and not M1 Extreme? 🤔

https://twitter.com/x/status/1502675792886697985

https://twitter.com/x/status/1503774292651261955

Mago · Mar 15, 2023

kiiso said:
What about these old leaks from Majin Bu?
Isn't it possible that contrary to the original leak, it's about M2 Extreme and not M1 Extreme? 🤔

https://twitter.com/x/status/1502675792886697985

https://twitter.com/x/status/1503774292651261955

This "leak" is incoherent in two ways: the UF is on a single side, at the bottom; the M2 Max's left and right sides are for the RAM interface, and the top side is for I/O (Thunderbolt, PCIe, USB, etc.).
Nor is it consistent with the actual M2 Max proportions; it looks more like a concept by an undergraduate student.

DaisyXL reminds me of CXL 2

Mago · Mar 15, 2023

People here:
Understand: unified memory is not soldered memory. It just means sharing system RAM between the CPU and GPU; by definition, every AMD and Intel APU without dedicated VRAM uses unified memory. Apple marketing uses an obscure concept to excuse its design choice of soldered RAM. What makes ASi faster and more efficient than others is its monster L2 cache, as big as the AMD Ryzen 7950X3D's L3 (read: L3) cache, and L2 cache is much faster than L3. The ASi L2 cache is shared (unified) across all CPU and GPU cores, and system RAM is connected to it as middleware; no data ever goes directly from system RAM to the CPU or GPU.

It has advantages and disadvantages: monster L2 caches are crazy expensive and have to be on the same silicon, while L3 cache could sit on the interposer (as on some AMD Ryzen parts); Apple also discards L3 cache. Soldering RAM doesn't lower latency, nor does it allow a faster link (while it's true that a socket interface adds noise through capacitance, that is filtered by the RAM interface chips at no cost in data rate).

ASi GPUs will never rival dedicated GPUs on systems requiring huge GPU RAM, because their RAM interface runs at just 400GB/s, while a consumer RTX 4090 is at 1000GB/s. An M2 Ultra won't go beyond 800GB/s, and while an M2 Extreme could reach 1600GB/s, it's fair to then compare it with a proper workstation GPU: the RTX A6000 Ada is at 960GB/s on its own, and teamed through PCIe5 it's 242GB/s for data that is not GPU-local; the chances of an algorithm being throttled by teaming on a 242GB/s bus are low. But ASi memory also relies on the UltraFusion 2.5Tbit/s bus (312GByte/s); while that is a bit better than PCIe5, you need 4 of them where Nvidia needs only 2 in the same scenario. Oh, and not to mention that a single RTX Ada is twice as fast as a theoretical M2 Extreme.
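For anyone wanting to follow the arithmetic behind those figures, here is a minimal sketch. All inputs are the numbers quoted in this post rather than measurements, and the assumption that memory bandwidth scales linearly with the number of M2 Max dies is only the post's own premise; the function names are hypothetical.

```python
# Unit conversion and scaling behind the bandwidth figures quoted in the post.
# All inputs are the thread's own numbers, not measurements.
M2_MAX_UMA_GBS = 400.0    # GB/s per M2 Max, as quoted
RTX_ADA_GBS = 960.0       # GB/s for the workstation card, as quoted


def tbit_to_gbyte_per_s(tbit_per_s: float) -> float:
    """Convert Tbit/s to GB/s (decimal units, 8 bits per byte)."""
    return tbit_per_s * 1000 / 8


def uma_bandwidth(dies: int) -> float:
    """Aggregate memory bandwidth, assuming linear scaling with die count."""
    return dies * M2_MAX_UMA_GBS


if __name__ == "__main__":
    print("UltraFusion at 2.5 Tbit/s:", tbit_to_gbyte_per_s(2.5), "GB/s")
    for name, dies in (("Max", 1), ("Ultra", 2), ("Extreme", 4)):
        print(name, uma_bandwidth(dies), "GB/s")
    print("Extreme vs RTX Ada ratio:", uma_bandwidth(4) / RTX_ADA_GBS)
```

The conversion gives 312.5 GB/s, which is where the rounded 312GByte/s comes from.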

Indeed, if Apple is targeting the real workstation market (as with the MP 7,1), there is no way it can be done with soldered RAM or the ASi M2 GPU. The ASi Mac Pro necessarily needs both discrete RAM (4TB) and a GPU independent of the CPU and system RAM, i.e. what you believe about compute performance is marketing BS.

Thus the rumours of a dedicated discrete ASi GPU, like those about AMD (even Intel) dGPU support coming back, should not even be considered rumours but a pending goal.

Boil · Mar 15, 2023

Mago said:
People here:
Understand: unified memory is not soldered memory. It just means sharing system RAM between the CPU and GPU; by definition, every AMD and Intel APU without dedicated VRAM uses unified memory. Apple marketing uses an obscure concept to excuse its design choice of soldered RAM. What makes ASi faster and more efficient than others is its monster L2 cache, as big as the AMD Ryzen 7950X3D's L3 (read: L3) cache, and L2 cache is much faster than L3. The ASi L2 cache is shared (unified) across all CPU and GPU cores, and system RAM is connected to it as middleware; no data ever goes directly from system RAM to the CPU or GPU.


LPDDR5X would up the UMA bandwidth...?

Why does the ASi Mac Pro "need" 4TB of RAM...? Only reason it had 1.5TB with the 7,1 was because of the Xeons Apple was using, all other Power/Mac Pros never came near that capacity...

As for discrete GPU; I'll take the ASi variant, thanks...! ;^p

Looking forward to the M3 Ultra & M3 Extreme SoCs running at 4.2GHz in the ASi Mac Pro...

Mago · Mar 16, 2023

Boil said:
LPDDR5X would up the UMA bandwidth...?

Yes, but it would require at least either a specific M2 Max variant or overclocking the whole SoC; the latter is more likely but not optimal.

Boil said:
Why does the ASi Mac Pro "need" 4TB of RAM...?

Because those buying it mostly aren't gamers. (sarcasm)
Well, seriously: with the Mac Pro 6,1 Apple lost most physics research and AI workstation users. While physics is not as memory-demanding as AI, it demanded huge FP32/FP64/BigInt performance that at the time only CUDA GPUs delivered, and CUDA ruled TensorFlow for a long time. At the extreme, Apple (Siri) is among the biggest Nvidia customers in the USA (despite how much both sides hate each other). Apple invested a lot of money in enabling TensorFlow inference and training on Metal Performance Shaders. In general, the HPC ecosystem was practically ignored by Apple for 4-5 years, when executives insisted on sunsetting the Mac and focusing on mobile platforms. That changed, and now we have Metal Performance Shaders compute integration in Swift, Julia, Python, TensorFlow, and PyTorch (my favorite AI framework), but the Mac still lacks competitive GPUs. Developing a GPU capable of rivaling Nvidia is the holy grail of the high-tech industry; Apple is just a newcomer, and despite having a great GPU development team, the GPU stars are still signed with Nvidia, Intel, and AMD.

For AI, machine learning, etc., 4TB of RAM only buys the boarding ticket.

Apple is failing miserably at AI; huge developments there are long awaited.

Boil said:
for discrete GPU; I'll take the ASi variant, thanks...! ;^p

Don't expect to beat a $3000 PC with a single RTX 4090 using your $10000+ Mac Pro. In gaming, Nvidia will actually humiliate Apple for a long time; better chances with AMD.

ChrisA · Mar 16, 2023

exoticSpice said:
Right now the highest VRAM offered by AMD and Nvidia is 64GB and 48GB respectively. If the AS Mac Pro has 384GB of unified RAM, then there is no way AMD's and Nvidia's offerings go over 100GB of VRAM.

Apple WILL have the memory advantage for video RAM, but that is the ONLY advantage they will have, as Apple will lose on cost, speed, and the software and hardware stack. Apple's GPU arch is not suited for workstations.
Next-gen GPUs will be on TSMC 5nm or 4nm, and if you control the power limits, that is, put the next-gen cards at a certain TDP, they will still beat Apple's GPU, which will likely be on TSMC 5nm as well.

Nvidia's H100 GPU has 80GB of VRAM. The Apple M2 Ultra can have up to 192GB of unified RAM, so Apple "wins" if you count VRAM.

My guess is that Apple will place the M2 Ultra on a PCIe bus and then you can add more Ultras up to some limit. I'd guess maybe four of them in the first version; later they could make a larger chassis that fits 8 or 16 "Ultra" chips.

spaz8 · Mar 16, 2023

The H100 is a $36K USD GPU. To say it is only 10x faster than what an M2 Ultra's compute might be capable of is probably a huge disservice to the H100. These GPUs likely never run solo; if you can afford one you can afford 2, likely 8 in some DGX config. So ya, could you have an ASi Mac work on some large ML model that needs 100+ GB of RAM, because you can't make smaller batch sizes, if you can and are willing to turn your computer into a hot paperweight for a month? Sure. In the professional world, where time is money, someone buys 2 H100s and gets a result in a day and a half, and then improves the model 20 more times while you wait for your first result.

Kimmo · Mar 16, 2023

An interesting piece on Nvidia's A100 and the next generation H100.

Meet the $10,000 Nvidia chip powering the race for A.I.

The $10,000 Nvidia A100 has become one of the most critical tools in the artificial intelligence industry,

www.cnbc.com

Mago · Mar 17, 2023

exoticSpice said:
Right now the highest VRAM offered by AMD and Nvidia is 64GB and 48GB respectively. If the AS Mac Pro has 384GB of unified RAM, then there is no way AMD's and Nvidia's offerings go over 100GB of VRAM.

Apple WILL have the memory advantage for video RAM, but that is the ONLY advantage they will have, as Apple will lose on cost, speed, and the software and hardware stack.

Wrong. CUDA allows sharing system memory with its GPU through independent DMA access, and as with ASi it is limited by system bus speed, with the GPU being faster accessing memory on die (actually on die, not soldered on the same PCB) than system memory. (Figures of up to 100x are pre-PCIe3; with PCIe5, latency slows system RAM by about 5x. ASi GPU memory latency varies depending on which die the core requesting a memory location is on and whether said block of memory is in L2 cache; indeed it varies from 0-6x on ASi compared with Nvidia.)

Apple's unified memory has both advantages and disadvantages. It's mostly an efficient, performant model for most of today's consumer apps, but it's not a magic bullet that "can improve everything": for machine learning it falls far short; for other scientific computations such as finite element simulation it's really good; for NLP it depends hugely on model size, but it ranges from average to poor.

I believe the upcoming dedicated Apple ASi compute device will be a PCIe5 card loaded with one M1 Ultra or one M2 Extreme, either with some or all CPU cores disabled, and some local RAM, both shared with the main system. I wouldn't expect it to beat an RTX 4090; at most it could match an RTX 4070 Ti.

Mago · Mar 18, 2023

New tip:

Seemingly ComputeDevice is not exactly a new boot mode, but the boot for repurposed semi-defective silicon from the Mac Studio (with some CPU or GPU cores disabled) as an eGPU/compute device, which Apple would sell alongside or instead of the ASi Mac Pro. If Apple decides to sell it instead of an M2 ASi Mac Pro, that means the Mac Pro will be delayed until an M3 or M4 version is free of issues.

Its special iOS is actually a new boot firmware for the M1 Mac Studio, or for an eCompute device based on it, to be tethered over TB4 to any Mac.

Personally I don't give a $h1t about it, but in rumourland it sounds loud.

I expect a huge leak about the ASi Mac Pro this week or next; keep your eyes on @l0vetodream, Gurman, Prosser, and Ritchie/iMore.

Macintosh IIcx · Mar 18, 2023

Mago said:
I'm here to debunk a false leak (one I brought here myself) about the M2 Max UltraFusion, and to confirm another.


Sorry, but I don’t quite understand whether you are providing some form of leaked information here or this is just educated speculation?

ZombiePhysicist · Mar 18, 2023

Macintosh IIcx said:
Sorry, but I don’t quite understand whether you are providing some form of leaked information here or this is just educated speculation?

I think both.

Mago · Mar 18, 2023

ZombiePhysicist said:
I think both.

Gotcha

innerproduct · Mar 21, 2023

Just a friendly reminder that there is a lot of cool stuff being created with Arm cores. https://wccftech.com/nvidia-grace-c...ficiency-versus-latest-x86-data-center-chips/
Why couldn’t this be the next mac pro chip? Or something really similar

sirio76 · Mar 22, 2023

So you want to put a server-oriented CPU inside a workstation? It doesn’t seem smart.

mattspace · Mar 22, 2023

sirio76 said:
So you want to put a server-oriented CPU inside a workstation? It doesn’t seem smart.

As opposed to a cellphone oriented CPU?

PineappleCake · Mar 22, 2023

sirio76 said:
So you want to put a server-oriented CPU inside a workstation? It doesn’t seem smart.

It's not that crazy. The Nvidia SoC supports up to 1TB of RAM, which the Mac Pro needs to have at minimum. 192GB is not going to cut it.

sirio76 · Mar 22, 2023

That's hilarious; only a very, very small set of people need that much memory, even in a workstation.
You can be sure that Apple knows exactly the percentage of Mac Pro 7,1 users that have installed 1.5TB, and I would be extremely surprised to discover that more than 5% of users have done that.
You can build a workstation with less than 500GB of RAM and be sure that it will cover 95% of the workstation market's needs.
You can start a poll and ask how much RAM people have installed on the 7,1 or on their PC workstations.

sirio76 · Mar 22, 2023

mattspace said:
As opposed to a cellphone oriented CPU?

The fact that AS shares core technology with smaller SoCs doesn't mean that they are all equal; try to put an M1 Ultra in an iPhone and you will see what I mean.

And BTW, Grace still uses Arm cores derived from the mobile Arm cores, just like the ones from Apple.
For a workstation you want something in the middle; there is a reason why both Intel and AMD have specific CPUs for the workstation market and most OEMs do not use their server-grade CPUs.
