As I said, a compute/render card is something you send compute/render jobs to; the SoC (SoIC) handles whatever work one is doing in assorted apps, while the compute/render card(s) handle all compute/render jobs sent to them for processing...

So these theoretical compute/render cards are NOT linked into the UMA, but receive (and send back) compute/render jobs via the PCIe slot(s)...

NO display output from these compute/render cards...!
Yeah, that's exactly what I thought you meant—essentially these would be compute/render eGPU's, which would (as I mentioned) not be linked to the UMA, and thus not be integrated within Apple's unified architecture (which is why I don't think Apple would be interested).

Not sure why you wrote "NO display output from these compute/render cards...!" in response to my post, since I never mentioned a display output—we were discussing specialized GPU compute cards, which don't have video outs (at least not typically).

Note that "eGPU" simply means a GPU external to the main system, but connected to it. My use of that terminology implies nothing about whether they have video outs or not.
 
I think Apple should make PCIe-based ASi compute/render cards; use the SoC (SoIC) in the Mac Pro for real-time work & the compute/render card(s) for queued jobs...

I've posted here before, and maybe Apple wasn't interested in the market prior to AI as there wasn't anything to justify it, but I'm thinking of a backplane to slot 8 or so entire M4 Ultra SoC boards into.

It would certainly fit in the current Mac Pro case, and would probably fit within the power and cooling budgets. It might need to be limited to 4 or 6 slots. But the point remains...

Memory/CPU/GPU upgrades are just done all in one via M4 Ultra boards.

i.e., each slot has an M4 (or whatever) Ultra with (up to) 512 GB of RAM and 80 GPU cores PER SLOT.

If they can get a high-speed bus working between the slots, you're talking about a machine that would top out at 4 TB of RAM, 640 GPU cores, and 256 CPU cores (half those figures for 4 slots, obviously).
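If it helps to see the scaling, here's a rough back-of-envelope (just a sketch; the 32 CPU cores per board is implied by the 256-core total above, not an announced spec):

```python
# Back-of-envelope totals for a hypothetical multi-slot "M4 Ultra backplane" Mac Pro.
# Per-slot figures are the ones assumed above, not announced specs.
RAM_PER_SLOT_GB = 512
GPU_CORES_PER_SLOT = 80
CPU_CORES_PER_SLOT = 32  # implied by 256 CPU cores across 8 slots

for slots in (4, 6, 8):
    print(f"{slots} slots: {slots * RAM_PER_SLOT_GB / 1024:.0f} TB RAM, "
          f"{slots * GPU_CORES_PER_SLOT} GPU cores, "
          f"{slots * CPU_CORES_PER_SLOT} CPU cores")
# 8 slots -> 4 TB RAM, 640 GPU cores, 256 CPU cores
```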

Would be expensive as hell, but a bit of a flex, and appropriate to next-generation ML/AR development workloads, such as future Apple Vision and cloud app development.
 
I've posted here before, and maybe Apple wasn't interested in the market prior to AI as there wasn't anything to justify it, but I'm thinking of a backplane to slot 8 or so entire M4 Ultra SoC boards into.

It would certainly fit in the current Mac Pro case, and would probably fit within the power and cooling budgets. It might need to be limited to 4 or 6 slots. But the point remains...

Memory/CPU/GPU upgrades are just done all in one via M4 Ultra boards.

i.e., each slot has an M4 (or whatever) Ultra with (up to) 512 GB of RAM and 80 GPU cores PER SLOT.

If they can get a high-speed bus working between the slots, you're talking about a machine that would top out at 4 TB of RAM, 640 GPU cores, and 256 CPU cores (half those figures for 4 slots, obviously).

Would be expensive as hell, but a bit of a flex, and appropriate to next-generation ML/AR development workloads, such as future Apple Vision and cloud app development.
Yeah, people are already bridging Ultra Studios for AI, and IIRC one poster here said you can bridge 4. So this would just be creating a turnkey solution with a very fast interface.

How fast do you think the interface would need to be to avoid acting as a bottleneck?

As a reference, the (relatively expensive) DGX A100 offers a 100 Gb/s = 12.5 GB/s InfiniBand interface (https://www.fibermall.com/blog/dgx-...oQnZCEWOxYWfTw6PDFM_Jwtb9PPXBjY5vJXBm7-PYzmPm), which doesn't seem that impressive given that x16 PCIe 5.0 offers 60 GB/s. Indeed, it's not much more than TB5's 80 Gb/s.
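To put those three numbers in the same units (a quick sketch; it glosses over the different protocol overheads of each link):

```python
# Convert the link rates mentioned above into GB/s for a rough comparison.
# Ignores the differing protocol/encoding overheads of the three interconnects.
links_gbps = {
    "DGX A100 InfiniBand (per link)": 100,  # Gb/s
    "Thunderbolt 5 (data)": 80,             # Gb/s
    "x16 PCIe 5.0 (~post-overhead)": 484,   # Gb/s, i.e. the ~60 GB/s figure above
}
for name, gbps in links_gbps.items():
    print(f"{name}: {gbps} Gb/s = {gbps / 8:.1f} GB/s")
```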
 
which doesn't seem that impressive given that x16 PCIe 5.0 offers 60 GB/s. Indeed, it's not much more than TB5's 80 Gb/s.
x16 PCIe 5 has a theoretical 512 Gbps (64 GB/s) (ignoring overhead) while TB5 provides 80 Gbps (10 GB/s).

6x more bandwidth.
 
x16 PCIe 5 has a theoretical 512 Gbps (64 GB/s) (ignoring overhead) while TB5 provides 80 Gbps (10 GB/s).

6x more bandwidth.
Not sure what you're adding here. x16 PCIe 5.0 has 60.5 GB/s peak data transfer rate (i.e., including OH), which I rounded to 60 GB/s.

And I correctly indicated that the 100 Gb/s* provided by the DGX A100 is not only far less than x16 PCIe 5.0, it also isn't even much more than the 80 Gb/s (incl. OH) offered by a single TB5 link.

[*Don't know if that includes encoding OH, but it could be InfiniBand EDR, which is 100 Gb/s including OH.]

I think you got confused and thought I was comparing TB5 to x16 PCIe 5.0, which I wasn't; that wouldn't make sense there, since the purpose of that paragraph was to provide context for NVIDIA's connection speed by comparing it to both PCIe 5.0 and TB5.

The data transfer rate info. on TB5 and PCIe 5.0 was extracted from my more detailed post on p. 7 of this thread:

That limitation could provide a compelling use case for the next MP, even if it's only equipped with the same Ultra die as the contemporaneous Studio, because of its PCIe slots:

TB5's peak data transfer rate (after overhead) is 80 Gb/s = 10 GB/s (bidirectionally).

By comparison, using an x16 networking card on an MP would give you these bidirectional peak data transfer rates (after accounting for overhead from 242B/256B encoding):

x16 PCIe 5.0: 60.5 GB/s
x16 PCIe 6.0: 121 GB/s
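For anyone checking the arithmetic, those figures fall out of the per-lane rates like this (a sketch using the 242B/256B factor quoted above; strictly speaking PCIe 5.0 itself uses 128b/130b line coding, which would land closer to 63 GB/s):

```python
# Reproduce the peak-rate figures above from the per-lane signaling rates,
# applying the 242B/256B factor used in this post as the encoding overhead.
LANES = 16
ENCODING = 242 / 256

for gen, gt_per_lane in (("PCIe 5.0", 32), ("PCIe 6.0", 64)):
    raw_gbps = gt_per_lane * LANES      # raw signaling rate across 16 lanes, Gb/s
    data_gbs = raw_gbps * ENCODING / 8  # usable rate after encoding overhead, GB/s
    print(f"x16 {gen}: {raw_gbps} Gb/s raw -> ~{data_gbs:.1f} GB/s")
# x16 PCIe 5.0: 512 Gb/s raw -> ~60.5 GB/s
# x16 PCIe 6.0: 1024 Gb/s raw -> ~121.0 GB/s
```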

Or they could offer their own proprietary bridging solution for the MP. But PCIe is already part of the MP, so using that saves them on development time & costs.
 
I think Apple should make PCIe-based ASi compute/render cards; use the SoC (SoIC) in the Mac Pro for real-time work & the compute/render card(s) for queued jobs...

The issue with this approach is that it would require new APIs and specialized code paths to take advantage of. And experience shows that folks just don’t like writing code like that.
 
Not sure what you're adding here. x16 PCIe 5.0 has 60.5 GB/s peak data transfer rate (i.e., including OH), which I rounded to 60 GB/s.

And I correctly indicated that the 100 Gb/s* provided by the DGX A100 is not only far less than x16 PCIe 5.0, it also isn't even much more than the 80 Gb/s (incl. OH) offered by a single TB5 link.
You're right. I misunderstood your post, thinking you were comparing x16 PCIe 5 to TB5 bandwidth.
 
I'm not sure if the difference would be significant. If you're curious, with some searching you can probably find extended high-load performance comparisons of the top-end M2 Ultra Studio and top-end M2 Ultra Mac Pro to check this.

And if the M2 Ultra Studio doesn't suffer by comparison, the M3 Ultra Studio wouldn't either, as the latter has a lower max TDP than the former:

[Attachment: max TDP comparison]

I was making assumptions so you may be right. I was curious and watched a couple Max Tech videos (below). The Mac Pro did perform a bit better than the Studio in some tests, but maybe not significantly by most people’s definition of the word. In the tests, neither Mac could get to throttling temperatures. It seems the Ultra chip may just be too power efficient and/or the cooling systems may just be too effective to push temperatures high enough. But I wonder if that means these chips could have been clocked a lot higher (especially in the Mac Pro) if Apple didn’t care about fan noise, and what that would have done for performance.

 
Yeah, that's exactly what I thought you meant—essentially these would be compute/render eGPU's, which would (as I mentioned) not be linked to the UMA, and thus not be integrated within Apple's unified architecture (which is why I don't think Apple would be interested).

It would basically be like sending jobs to a render farm, but the farm was in the Mac Pro...?

Not sure why you wrote "NO display output from these compute/render cards...!" in response to my post, since I never mentioned a display output—we were discussing specialized GPU compute cards, which don't have video outs (at least not typically).

I mistakenly thought you were talking about the compute/render cards being tied into the UMA and boosting overall GPU performance...

Note that "eGPU" simply means a GPU external to the main system, but connected to it. My use of that terminology implies nothing about whether they have video outs or not.

My bad, when I see eGPU, I think of AMD/Nvidia GPUs with functional display outputs...
 
FWIW the lack of eGPU on Apple Silicon isn't a software problem, because it's not possible on Asahi Linux either. The IOMMU for the PCIe/Thunderbolt DMA prevents memory re-ordering, which is a requirement for AMD/Nvidia GPUs, IIRC. They're all set to Device-nGnRE.


Considering how they intend for the CPU+GPU to utilize 'dynamic caching', I doubt they're going to change it any time soon.
 
IIUC what you have in mind, a PCIe render card sounds like it would essentially be a slotted eGPU. And, as you know, Apple's not shown an interest in supporting those.

I'm no better at reading Apple's tea leaves than anyone else. But if I were to guess, I'd expect if Apple wanted to increase the rendering/GPU compute power of its MP, it would add a separate GPU die to the SoC (perhaps in a stacked configuration on top of the base die), giving it access to its UMA. That would be consistent with its unified architecture in a way that an eGPU would not.
I think that the best implementation of this wouldn't be a traditional GPU setup but would rather be kind of like having a cluster in a box. You basically have multiple ASi Macs within a single tower, and you have a primary Mac (the built-in SoC) which controls and dispatches work to the others.
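Something like this, conceptually (a purely illustrative sketch in Python; none of this is an Apple API): the primary SoC holds a job queue and the worker boards pull from it.

```python
# Purely illustrative sketch of the "cluster in a box" idea: a primary node
# queues render/compute jobs and worker SoC boards pull and run them.
# The board count and job names are made up; nothing here is an Apple API.
from queue import Queue
from threading import Thread

NUM_WORKER_BOARDS = 4  # hypothetical number of ASi boards behind the primary
jobs: Queue = Queue()

def worker(board_id: int) -> None:
    while True:
        job = jobs.get()
        if job is None:      # sentinel: shut this worker down
            break
        print(f"board {board_id}: processing {job}")

threads = [Thread(target=worker, args=(i,)) for i in range(NUM_WORKER_BOARDS)]
for t in threads:
    t.start()

for frame in range(8):       # the primary Mac dispatches queued work
    jobs.put(f"frame {frame:03d}")
for _ in threads:
    jobs.put(None)
for t in threads:
    t.join()
```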
 
It very well could.

It's often odd what makes it to total market exposure.

Maybe it was the AAPL Tax?
Not before USB-C. Anything lightning-like back then would have been frowned upon by the industry just because they knew it would be coming from Apple (it's been said that the group behind USB-C hid Apple's contributions to it, as they figured that was the only way to ensure it gained acceptance). Now that USB-C is behind us, a next generation lightning, developed and provided for free to the USB licensing body might just do the trick! :)
 
The problem that USB-C solved, which Lightning suffered from, is that with micro-miniaturised cable connections the male plug is inherently more robust than the female socket, which is prone to failure from debris getting into it.

USB-C puts the 'female' part of the design in the more easily replaceable plug, whilst the socket is more or less a larger Lightning male plug mounted behind a simple slot in an enclosure, with two sets of retaining springs clamping the slot-ends when inserted (one in each half).

So if debris clogs up the individual sprung connection pins in the plug, you just easily replace a relatively cheap cable.
The flat Lightning-like embedded connectors in the enclosure are much more robust.

IMO a " a next generation lightning, developed and provided for free to the USB licensing body might just do the trick!"
would look just like USB-C....
 
So if debris clogs up the individual sprung connection pins in the plug, you just easily replace a relatively cheap cable.
The flat Lightning-like embedded connectors in the enclosure are much more robust.
So you're saying that USB-C is more easily replaceable than Lightning if it fails (since the most likely failure point has been moved to the plug), but is less robust and thus more likely to fail.

That makes sense, except I've got USB-C ports on my iMac that have gotten looser with time (even with new cables), and I've seen reports online of people having to replace USB-C ports on their MBPs because they loosened. All of this suggests that USB-C also has a likely failure mode that affects the installed ports.

My guess is that the "hold" with USB-C is supposed to come from the spring-contact connection that is internal to the plug, and is relatively weak. This is supplemented by the friction/stabilization provided by the contact between the outside of the plug and the surface it contacts inside the device—this friction/stabilization may not be part of the USB standard, but it still helps. When this loosens, most of the hold is from the spring-contact connection alone, which is what makes the ports seem loose.

While all ports probably loosen with use, the USB-C ports are the only ones where the loosening has been enough for me to notice.
 
FWIW the lack of eGPU on Apple Silicon isn't a software problem, because it's not possible on Asahi Linux either. The IOMMU for the PCIe/Thunderbolt DMA prevents memory re-ordering, which is a requirement for AMD/Nvidia GPUs, IIRC. They're all set to Device-nGnRE.


Considering how they intend for the CPU+GPU to utilize 'dynamic caching', I doubt they're going to change it any time soon.
It’s something I’ve always “known” wasn’t a software problem (Apple’s WWDC videos make it clear, but good to better understand the technical limitations), but I’m sure people will still be talking about how eGPU’s are coming to Apple Silicon any day now. :)

The fellow that just left Asahi was adamant about it being possible… either he wasn't aware, or he was providing false hope to folks.
 
IMO a " a next generation lightning, developed and provided for free to the USB licensing body might just do the trick!"
would look just like USB-C....
USB-C is already too thick for some of the devices being produced today. I don’t know how much thinner it can be while maintaining the current robustness.
 
@theorist9 "My guess is that he "hold" with USB-C is supposed to come from the spring-contact connection that is internal to the plug, and is relatively weak. This is supplemented by the friction/stabilization provided by the contact between the outside of the plug and the surface it contacts inside the device—this friction/stabilization may not be part of the USB standard, but it still helps."

The aperture in the case of the Mac never has any part in retaining the cable, unless you take pliers to totally distort the plug, and so wreck the spring function.
The main, outer, locking spring is in the enclosure. The one inside the plug is by necessity smaller.

Four pics:
1. The simplicity of the hole in the Mac (actually a Studio Display).
2. The interior spring in the plug, showing more fragile contacts (3). Spring shown blue.
3. The weak 'ears' of the Lightning-like interior of the socket. Wear here is a failure point.
The main springs are shown, blue and green.

EDIT in answer to post #219 (two down):
4. However in Apple's Thunderbolt sockets (from an Intel iMac) the retaining lugs are much stronger, made of steel.


[Images: USB-Cspring.jpg, USB-CplugDiag.jpg, USB-CsocketCU.jpg, USB-CportInnerCU.jpg]
 
@Unregistered 4U "USB-C is already too thick for some of the devices being produced today."

Good luck getting this lot into anything thinner. That's Apple's TB4 cable; is TB5/6/7... likely to be thinner?
Carrying up to 48 V at up to 240 W.
[Image: TB4cablePCB.jpg]
 
@theorist9 "My guess is that he "hold" with USB-C is supposed to come from the spring-contact connection that is internal to the plug, and is relatively weak. This is supplemented by the friction/stabilization provided by the contact between the outside of the plug and the surface it contacts inside the device—this friction/stabilization may not be part of the USB standard, but it still helps."

The aperture in the case of the Mac never has any part in retaining the cable, unless you take pliers to totally distort the plug, and so wreck the spring function.
The main, outer, locking spring is in the enclosure. The one inside the plug is by necessity smaller.

Three pics:
1. The simplicity of the hole in the Mac (actually a Studio Display).
2. The interior spring in the plug, showing more fragile contacts (3). Spring shown blue.
3. The weak 'ears' of the Lightning-like interior of the socket. Wear here is a failure point.
The main springs are shown, blue and green.



[Attachments: three annotated photos]
Thanks for taking the time to annotate the photo!

So the physical strength of the connection is determined by the tension of the springs you've drawn in blue and green (which are part of the cable), and the integrity of the ears indicated by your red arrows (which are part of the port). Thus if the device port provides a looser connection over time (i.e., where the looseness is due to the port and not the cables), it would be due to wear on those ears. [Assuming the issue isn't lint/gunk inside the port.]

Since they wanted to offload the failure risk from the device to the cable, why didn't they make those ears from a hard metal or ceramic instead of the grey material seen in your picture? Then the wear would occur mostly on the springs, not the ears.

And why didn't they design USB-C to maintain a more robust and solid connection? Clearly there is a problem in that regard compared with other ports, as evidenced by the add-on cable-retention feature OWC felt compelled to offer specifically for USB-C ports.

Did they just drop the ball on this one, not realizing the deficiency in their design until the product had been out in the wild for some time and thus subject to a range of real-world uses?
 
(since the most likely failure point has been moved to the plug), but is less robust and thus more likely to fail.

The female Lightning port has the connectors on the outside wall, as opposed to USB-C, where the connectors are on the central tab.

The outside-wall connectors of Lightning on the female side are subjectively more robust, as this makes for a much simpler female slot (one central space, rather than USB-C's, which presents more surface area).

I still giggle (a little) at the irony of the IF in the USB-IF Specifications ;)
 
I am so on the fence about buying an M4 Max Mac Studio with Apple's extortionate 128GB RAM. Anybody running smaller LLMs on the desktop want to talk me out of it?
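FWIW, a rough rule of thumb for whether a local model fits in RAM: the weights take roughly parameter count × bits per weight ÷ 8 bytes, plus headroom for the KV cache and the OS (ballpark only, not a benchmark):

```python
# Ballpark memory needed just for an LLM's weights at a given quantization.
# Ignores KV-cache growth with context length and macOS/app overhead.
def weights_gb(params_billions: float, bits_per_weight: int) -> float:
    return params_billions * 1e9 * bits_per_weight / 8 / 1e9

for params, bits in [(8, 4), (32, 4), (70, 4), (70, 8)]:
    print(f"{params}B @ {bits}-bit: ~{weights_gb(params, bits):.0f} GB of weights")
# 70B @ 4-bit: ~35 GB -> plenty of room on 128 GB; 70B @ 8-bit: ~70 GB
```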
 
People can make all of the power consumption comparisons they want; in some cases, representing the power users Apple should want, performance is more important than power savings. This is why I've always believed Apple should take on an exclusive macOS partner to build Macs that will interoperate with more industry standards, so customers don't have to make OS-level decisions/changes. Nvidia and AMD GPUs already work with ARM/AS.
 