
goMac

macrumors 604
Apr 15, 2004
7,663
1,694
The other problem is consistency. Right now Apple has a consistent GPU architecture from the iPhone and the iPad all the way up through the Mac. On every Apple Silicon device you have an SoC with an integrated GPU that follows the same optimization patterns. Apple can issue one set of guidance for GPU software optimization that applies to their entire lineup.

After making everything so consistent, would Apple really be willing to have just the Mac Pro all by itself on a non-SoC or hybrid architecture? Or would Apple really be willing to throw 3rd party cards back into the mix?

It's not impossible. Infinity Fabric Link was a Mac Pro-only feature. But it makes third party cards look a lot less likely. And it would mean Apple putting in a ton of work to make expandable graphics look and perform as much like an SoC as possible.
 

ZombiePhysicist

Suspended
May 22, 2014
2,884
2,794
It doesn't. GPU devices are not even in the System Extensions framework. You are saying there is no instance of a category of device drivers... "so what". I'm saying there is not even a category for that type of "extending" driver at all. That is a much deeper "hole" to dig out of.

In modern macOS, the kernel is Apple-only land. If there is no way in from the outside, then there is no "exit ramp" for 3rd-party drivers to get off at "kernel land". [If there is a freeway that goes through a town and there is no exit, then no one is getting off. There is no easy fix to that after the freeway is constructed. It is also demonstrative of intent.]




Rosetta 2 was there at launch because they put effort into it. Apple had another whole year after launch to at least put up a GPU class of system extensions, even if they didn't sign any extension implementations. They did not.
They did say that "time is almost up" for USB and basic I/O kexts to switch over in 2021 (decent chance they'll turn off that class of kext in this or the next macOS upgrade cycle).






For non-GPUs it already works. There are 30+ cards in TB external PCIe card enclosures that work with the M-series Macs:

(PDF file.)




That work got done. There are lots more that are not, but those tend to be either old (not getting much new driver work done for them), not in the new System Extensions class hierarchy, or coupled to early-boot-environment issues (storage boot entanglements).






Says you. You are not Apple. The most consistent topic they drone on about in every M-series presentation so far is Perf/Watt. To claim it is not an Apple priority is a bit delusional. That priority isn't likely to change just for a product that is around 1% of the Mac product mix.

It is similar to the years between 2010-2019 when folks said Apple should dump Thunderbolt to make rolling a new cheesegrater easier... and yet Apple never did that. Probably not now either. There will be Thunderbolt controllers on the die in the package this next round too. And the system is going to put a priority on Perf/Watt; that is about the only way to closely pack 2-4 dies together and not take a single-thread performance hit.

Just as prioritizing Thunderbolt across the rest of the Mac product lineup led to it becoming a "basic Mac property", Apple is doing effectively the same thing with Perf/Watt. Apple is shifting to tighter thermal enclosures for the rest of the lineup and chucking dGPUs. The same core design constraint is going to leak into the Mac Pro M-series packages because the die implementations are all related in the same family. Apple is extremely unlikely to start over from scratch and/or on a different fab process for the Jade2C and Jade4C dies relative to the Jade-chop and Jade dies (M1 Pro and Max).

Apple is rumored to be coming out with a "half-sized" Mac Pro. Some power is going to need to be saved somewhere to chuck half the volume of the current Mac Pro (much of that internal volume is primarily for thermal dissipation; if the volume is going, then so is the thermal dissipation issue that drove the volume jump in the first place).

The Intel solutions rested upon a much bigger ecosystem than just the Macs. There was a broader set of CPUs to pick from there. It is not going to be anywhere close to the same breadth when Apple is doing them all themselves, even less so for the relatively very low volume products. There will be just enough adaptation to cover the difference, but the basic design infrastructure is likely going to be more similar to laptops than data center servers.

No, not "says me"; says all the pros, and it's why Apple did an apology tour. They could repeat their mistakes and ignore their own apology tour, but I'm going to give them the benefit of the doubt that they will learn from their own mistakes.

Thank you for conceding on Rosetta. And to that point, for kernel access, nothing stops them from writing some new IOKit-like API, or having some new signed-kernel-extension policy for GPUs. If they go to the substantial bother of putting in slots, I see no reason they wouldn't write some API to access those slots. Again, all you are talking about is will, not technical impediment.
 

deconstruct60

macrumors G5
Mar 10, 2009
12,493
4,053
Apple, a lot of the time, seems to be looking toward a multi-GPU future with the Mac Pro. The 2013 was multi-GPU, and the 2019 more gently so. So while I don't think Apple's just going to start throwing M1 Maxes on boards, it seems reasonable they could do linked GPUs. Just... probably not this year. Maybe Apple will surprise me though.

Linked GPUs aren't necessarily discrete GPUs. Linked on the same chip package is still linked. Very substantially lower power draw to do the linking also.

Linking at a substantial distance away is something I'm not buying for where Apple is going. It means higher power draw, which runs counter to the "Perf/Watt" dogma they have been following so far.

If you can make the link short enough, low-latency enough, and high-bandwidth enough, then you just have one very big GPU. I think that is the track they are on. If you go back to the April 2017 "dog ate my homework" meeting, they said that most Mac Pro customers wanted a bigger single GPU rather than multiple ones. So why they would not be driving toward that long term would be a head scratcher. They also said there was a significant number of folks who wanted multiples (not the majority, but enough). So over several iterations: "bigger" (higher ALU count) dies, but short-linked dies probably would not fade away for a long time (if ever).

The very-low-latency, short-link, power-efficient connection will make it so more "single big die" folks are happy enough on a multi-die setup while Apple iterates on getting the latency low enough.



I think this is the problem I see too. It's a proprietary form factor. Depending on the linkage, you might need to pair proprietary cards to a matching generation with a matching GPU host. At that point, why bother with cards? Just order your Mac Pro the way you want it. Apple's accounting department would certainly like that better.

Buried in the "GPUs on cards" demand is often an assumption that AMD/Nvidia/Intel should be options too. Even now, years after Nvidia drivers went off the table, you still have folks saying they want to buy a Mac Pro more as a container so they can stuff an Nvidia card in it (and run Windows). This time the SoC foundation that Apple is going to is not predominated by Windows. Once that is off the table as a design constraint, gyrations to plug into the Windows dGPU market ecosystem are not going to be a top priority driver.

If the choice is just one GPU vendor, just AMD or just Intel, that isn't particularly more broad than "just Apple". Once it is "just Apple", then it becomes something very similar to the rest of Apple's GPU implementations.
Not just Apple's accounting... for relatively very low volume dGPU dies and cards, the cost to the end user isn't going to be all that low either. The very high end MPX modules are all past where most Mac Pro users want to pay. For example, a single AMD 6900... $6K


That is $6K in part because almost nobody is going to buy it. (Apple punches up the price to cover the overhead of doing it; a "low-volume tax".)

Similar thing with the Mac Pro. From 2013-2019 it was "give me a modular Mac Pro"... then the new entry price got set to $6K, and for many it was "that's 'too much' for modularity". What many wanted was options to drive lower prices. Apple wasn't trying to do that in 2019 and probably still won't in 2022-25.


If I was going to be talked into cards, it would be an Infinity Fabric-ish meshy thing. But the simplest answer to me just seems like no GPU cards. You get what you get.

Again, I think some folks are into the long-term cheapest-cost thing. Five years down the road, what is the cheapest option to move my 5-year-old chassis forward? What is the most performance I can buy 3-4 years from now, when it will be cheaper?
For them, a commoditized "fabric" is a better path to cheaper future tech.

For Apple, I suspect their focus is more on short-term connectivity (adding storage, network, A/V I/O, etc.) now rather than 3-4 years from now. And delivering GPU at higher Perf/Watt than the competitors (because that is what they already have for the rest of the product mix; they don't need another GPU with another strategy).

They'll lose some long-term Mac Pro users, but as long as they get enough new ones that the volume sales are good enough to continue, that will work for them. (Apple doesn't need the Mac Pro to be profitable or to keep the lights on.) Apple's GPUs likely won't keep up with the more computationally focused high-end GPGPU. Probably won't win the "biggest and baddest" single GPU card either. But as long as they are competitive in enough of the upper half of the GPU spectrum, the Mac Pro will be competitive enough to continue.
 

JMacHack

Suspended
Mar 16, 2017
1,965
2,424
They'll lose some long-term Mac Pro users, but as long as they get enough new ones that the volume sales are good enough to continue, that will work for them. (Apple doesn't need the Mac Pro to be profitable or to keep the lights on.) Apple's GPUs likely won't keep up with the more computationally focused high-end GPGPU. Probably won't win the "biggest and baddest" single GPU card either. But as long as they are competitive in enough of the upper half of the GPU spectrum, the Mac Pro will be competitive enough to continue.
In terms of pure GPU grunt, we're seeing more moves toward specialized accelerators rather than raw number crunching.

So what really stands in the way of making MPX modules similar to their Afterburner card, with compute or possibly raytracing circuitry on them?

Naturally, having to jump over an interconnect would be slower than having it on the silicon die, but it could cut 3D rendering and complex computation times vs. not having them at all.
 

Boil

macrumors 68040
Oct 23, 2018
3,478
3,173
Stargate Command
In terms of pure GPU grunt, we're seeing more moves toward specialized accelerators rather than raw number crunching.

So what really stands in the way of making MPX modules similar to their Afterburner card, with compute or possibly raytracing circuitry on them?

Naturally, having to jump over an interconnect would be slower than having it on the silicon die, but it could cut 3D rendering and complex computation times vs. not having them at all.

The Afterburner card is not an MPX module...

This is an Afterburner card (MW682); note that it has just the PCIe connection...


Below is an MPX GPU (MJ073_AV1); note the additional MPX connection behind the PCIe connection...
 

goMac

macrumors 604
Apr 15, 2004
7,663
1,694
In terms of pure GPU grunt, we're seeing more moves toward specialized accelerators rather than raw number crunching.

So what really stands in the way of making MPX modules similar to their Afterburner card, with compute or possibly raytracing circuitry on them?

Naturally, having to jump over an interconnect would be slower than having it on the silicon die, but it could cut 3D rendering and complex computation times vs. not having them at all.

The problem with cards like Afterburner, and the reason for success of stuff like Apple Silicon, is that specialized accelerators usually need to work together on a problem. And PCIe itself is beginning to become a bottleneck for that. If you want to have your GPU/CPU/encoder working on a problem together, you need a shared set of memory with extremely short links. PCIe is starting to become too long of a link with too little bandwidth.

A machine full of specialized accelerators running on PCIe actually makes things worse for PCIe, not better. That's why for Apple Silicon, Apple started moving the specialized accelerators onto the chip.
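As a rough back-of-envelope check of that bandwidth gap (a sketch only: the PCIe figures are per-direction x16 link rates with 128b/130b encoding, and the unified-memory number is Apple's published M1 Max spec; real protocol overhead is ignored):

```python
# Compare raw PCIe x16 link bandwidth against Apple's unified-memory figure.
# PCIe 3.0 and later use 128b/130b encoding; rates here are per direction.

def pcie_x16_gbps(gt_per_s):
    """Per-direction GB/s for an x16 link at the given GT/s per lane."""
    return gt_per_s * 16 * (128 / 130) / 8  # GT/s -> GB/s after encoding

links = {
    "PCIe 3.0 x16": pcie_x16_gbps(8.0),    # ~15.8 GB/s
    "PCIe 4.0 x16": pcie_x16_gbps(16.0),   # ~31.5 GB/s
    "M1 Max unified memory": 400.0,        # Apple's spec-sheet figure
}

for name, gbs in links.items():
    print(f"{name:24s} {gbs:6.1f} GB/s")
```

Even doubling the link generation leaves an order-of-magnitude gap against an on-package shared memory pool, which is the point about PCIe being "too long of a link".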

Apple could keep making a machine with PCIe links between accelerators. Third-party GPU swapping is a good reason why. But MPX and PCIe don't stay performant as you add more specialized cards.

There is a path where Apple decides they are in a post-PCIe world because they have faster dedicated accelerators on their CPU than what you could add using PCIe anyway. And they could say that PCIe itself is the core bottleneck. Even if it's not true in all circumstances, it's certainly a differentiator they could still run with. And it is something you're hearing more and more from them on graphics.
 

deconstruct60

macrumors G5
Mar 10, 2009
12,493
4,053
In terms of pure GPU grunt, we're seeing more moves toward specialized accelerators rather than raw number crunching.

So what really stands in the way of making MPX modules similar to their Afterburner card, with compute or possibly raytracing circuitry on them?

The MPX connector solved two major issues: provisioning inputs for Thunderbolt controllers (PCIe inputs for a TB controller possibly on the MPX module, and routing of DisplayPort to the built-in TB controllers), and power to the module (Apple "hating" wires and "mess" inside their systems).

If you put the GPU and TB controllers inside the M-series SoC, then both issues are basically "dead". The power feed to the SoC solves the "no wires" power problem. The TB controller is on the same die as the GPU, so there are about zero problems getting PCIe and DP feeds.

The iPhone A15 has a small-scale ProRes de/encoder in it. That should be an indicator of what the long-term trendline is here. The M1 Pro/Max have full-scale ones. It is extremely likely that will be somewhat normalized across the whole M-series once the M2 ships (perhaps small scale as in the iPhone, but some basic support).

Afterburner is likely a dead end as a product. I would be very surprised if there was ever an "Afterburner 2". The next Mac Pro should have a slot for folks who already sank $2K into one, but new ones? Probably not.


Afterburner didn't require tons of power and worked off the 75W you can get from a standard PCIe slot. Lots of cards can work without "AUX" power. [There is no additional wire "mess" from adding slots targeted at those PCIe cards.]

Raytracing for interactive displays? Probably wouldn't work well. Batch jobs? Sure, but is that really a core "Mac" target market? It's doubtful we're ever going to get a pure "RTPU" (raytracing processing unit). Unless they're pushing output straight to long-term storage, most folks are going to want to see it; on a GUI system like the Mac, even more so.



Naturally, having to jump over an interconnect would be slower than having it on the silicon die, but it could cut 3D rendering and complex computation times vs. not having them at all.

They do have it with the general processors; there is no "not having it at all" in the present systems. Really we are talking about better scaling to larger workloads (and again drifting into batch-job territory).
 

ZombiePhysicist

Suspended
May 22, 2014
2,884
2,794
I dunno. PCIe 5 is pretty crazy fast, and for a GPU probably not much of a bottleneck even with memory access back and forth. Also, the PCIe 6 spec should double it again. Seems like a pretty valuable option.
 

deconstruct60

macrumors G5
Mar 10, 2009
12,493
4,053
The problem with cards like Afterburner, and the reason for success of stuff like Apple Silicon, is that specialized accelerators usually need to work together on a problem. And PCIe itself is beginning to become a bottleneck for that. If you want to have your GPU/CPU/encoder working on a problem together, you need a shared set of memory with extremely short links. PCIe is starting to become too long of a link with too little bandwidth.

Apple, especially in the Mac Pro instances, has been relatively stuck in the mud when it comes to PCIe: 2009-2019 on PCIe v2 (while the rest of the upper-end workstation market moved to PCIe v3 around 2012-13); 2019-now on PCIe v3 while 2022 systems are rolling out on v4 and v5.

The M1 Pro gets about 200GB/s memory bandwidth. So one of the "double" RAM packages is about 100GB/s (800Gb/s). The M1 CPU cores' bandwidth allocation is lower than that. Just how "bad, awfully slow" is modern PCIe x16?


"...
Interconnect                               Bandwidth                   Year
QPI (8.0 GT/s, 4.0 GHz)                    256.0 Gbit/s (32.0 GB/s)    2012
QPI (9.6 GT/s, 4.8 GHz)                    307.2 Gbit/s (38.4 GB/s)    2014
HyperTransport 3.0 (2.6 GHz, 32-pair)      332.8 Gbit/s (41.6 GB/s)    2006
HyperTransport 3.1 (3.2 GHz, 32-pair)      409.6 Gbit/s (51.2 GB/s)    2008
CXL Specification 1.x (x16 link)           512 Gbit/s (63.02 GB/s)     2019
PCI Express 5.0 (x16 link)                 512 Gbit/s (63.02 GB/s)     2019
NVLink 1.0                                 640 Gbit/s (80 GB/s)        2016
PCI Express 6.0 (x16 link)                 968 Gbit/s (121 GB/s)       2022
NVLink 2.0                                 1.2 Tbit/s (150 GB/s)       2017
..."

PCIe v5 and CXL are better than the QPI used to lash up multi-socket Xeon processors in 2014. Those Xeon systems worked OK to get productive work done with NUMA memory access. PCIe v5 is in the ballpark of NVLink 1.0 speeds... and again, a substantive number of Nvidia GPU cards with NVLink got lots of work done for folks over the last 2-3 years.

Apple is using PCIe v4 on the M1 series so far mainly to reduce the number of I/O pads coming off the SoC, as opposed to provisioning high-end bandwidth (you get x8 of PCIe v3 worth of bandwidth on an x4 provisioning; woo-hoo!).
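The arithmetic behind that x4-vs-x8 equivalence, for anyone checking (a sketch; per-direction rates with 128b/130b encoding):

```python
# PCIe per-lane bandwidth doubles each generation, so a v4 x4 link
# carries the same raw bandwidth as a v3 x8 link -- with half the pads.

def pcie_lane_gbs(gt_per_s):
    # 128b/130b encoding applies to PCIe 3.0 and later
    return gt_per_s * (128 / 130) / 8

v3_x8 = pcie_lane_gbs(8.0) * 8    # ~7.88 GB/s
v4_x4 = pcie_lane_gbs(16.0) * 4   # ~7.88 GB/s

print(f"v3 x8: {v3_x8:.2f} GB/s, v4 x4: {v4_x4:.2f} GB/s")
```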

The M1-series, and likely the M2-series, is probably stuck on PCIe v4 (and perhaps narrower than x16 provisioning). Longer term, with more mature CXL 2.0 support in future GPU cards, it wouldn't be as big of a gap as pretending that PCIe v3 x16 was the end of the road. The pre-2022 GPU cards have constraint problems, but that probably eases in 2023-25 if Apple does some work in the next iteration. For a fair number of workloads it is more of a short-term problem than a long-term, increasingly divergent one.

An implementation of PCIe v6 without CXL is probably missing a hefty chunk of its possible utility. It is mainly there to be an accelerator interconnect bus.


A machine full of specialized accelerators running on PCIe actually makes things worse for PCIe, not better. That's why for Apple Silicon, Apple started moving the specialized accelerators onto the chip.

Power-saving wise, yes. But Apple's approach has scale-out problems. As CXL rolls out, PCIe isn't really "stuck" on scale-out coverage. It will consume more power, but it also isn't capped.


Apple could keep making a machine with PCIe links between accelerators. Third party GPU swapping is a good reason why. But MPX and PCIe are not completely performant as you add more specialized cards.

If you point two Afterburner decodes at one GPU card, then yes. But with more parallel and balanced workload assignments it isn't as much of a problem if you have a decent PCIe backbone provisioning the slots.


There is a path where Apple decides they are in a post-PCIe world because they have faster dedicated accelerators on their CPU than what you could add using PCIe anyway. And they could say that PCIe itself is the core bottleneck. Even if it's not true in all circumstances, it's certainly a differentiator they could still run with. And it is something you're hearing more and more from them on graphics.

macOS is a GUI focused operating system, but there is more than just graphics work to do. Storage , network , non-graphics centric problems.

Apple's "war" on dGPUs... sure, they can clobber that with their "poor man's HBM" LPDDR memory system bandwidth metrics. But the huge iGPU is a bit of a double-edged sword. It needs (hogs) so much bandwidth that problems that don't tightly couple to maximum GPU horsepower don't really get bandwidth allocations with a big gap over PCIe. [But I can see why Apple is hesitant to put a very high bisection-bandwidth PCIe complex on the internal network to compete with the GPU. Not enough to go around.]
 

JMacHack

Suspended
Mar 16, 2017
1,965
2,424
Thanks for the replies, I was more spitballing ideas. I’m aware of the flaws and problems in my ideas.

Still, I imagine some form of expansion is necessary.
 

Boil

macrumors 68040
Oct 23, 2018
3,478
3,173
Stargate Command
I would think the following would be the add-in cards that need the most bandwidth:
  • Discrete GPUs
  • RAID cards (multiple NVMe M.2 SSDs)
  • Network cards (multiple 10Gb/40Gb Ethernet ports)
But of those, discrete GPUs are what really drive the need for increased PCIe bandwidth, and with discrete GPUs no longer part of the Apple silicon/macOS equation...?!?

Maybe "third time's a charm" will actually work here, and Apple can pull off a Mac Pro Cube with all expansion being handled via TB4/USB4 ports...!?!

Maybe we will get a peek at the performance of a Mac Pro Cube next week...? ;^p
 

edanuff

macrumors 6502a
Oct 30, 2008
578
259
If these latest rumors are true, seems likely we’ll see this all play out with Mac Studio as the so-called half height Apple Silicon Pro and then just let the Intel Pro stick around on the price list for the high end slotbox use cases. Makes more sense than trying to make Apple Silicon play catch-up in a game Apple doesn’t want to play. Deftly sidesteps the endless debate of what “Pro” means.
 

ZombiePhysicist

Suspended
May 22, 2014
2,884
2,794
I wouldn't call that deft. Death is more like it. The remaining pros sticking with Apple will say bye-bye to Apple.
 

Joe The Dragon

macrumors 65816
Jul 26, 2006
1,031
524
I would think the following would be the add-in cards that need the most bandwidth:
  • Discrete GPUs
  • RAID cards (multiple NVMe M.2 SSDs)
  • Network cards (multiple 10Gb/40Gb Ethernet ports)
But of those, discrete GPUs are what really drive the need for increased PCIe bandwidth, and with discrete GPUs no longer part of the Apple silicon/macOS equation...?!?

Maybe "third time's a charm" will actually work here, and Apple can pull off a Mac Pro Cube with all expansion being handled via TB4/USB4 ports...!?!

Maybe we will get a peek at the performance of a Mac Pro Cube next week...? ;^p
May need some DATA-only TB ports that don't share bandwidth with video out. Or have 4+ channels?
Needing to have video on ALL the TB buses does not really work too well when you need data channels, as one SSD can max out a full TB channel.

Also, PCIe aux power leads are needed for some non-GPU cards, and storage like SATA needs power delivered to the device.
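Rough numbers behind the "one SSD can max out a full TB channel" point (a sketch; the 32 Gb/s figure is the minimum PCIe-tunneling requirement in the TB4 spec, and the SSD rates are typical vendor figures, not measurements):

```python
# Thunderbolt 4 carries 40 Gb/s total, but PCIe data tunneled through it
# is around 32 Gb/s; a single fast NVMe SSD can saturate that.

TB4_PCIE_TUNNEL_GBPS = 32                  # Gb/s of PCIe data over a TB4 link
tb4_pcie_gbs = TB4_PCIE_TUNNEL_GBPS / 8    # -> 4.0 GB/s

nvme_ssd_gbs = {
    "PCIe 3.0 x4 NVMe (typical)": 3.5,
    "PCIe 4.0 x4 NVMe (typical)": 7.0,
}

for name, rate in nvme_ssd_gbs.items():
    verdict = "saturates" if rate >= tb4_pcie_gbs else "fits in"
    print(f"{name}: {rate} GB/s -> {verdict} a TB4 link")
```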
 

goMac

macrumors 604
Apr 15, 2004
7,663
1,694
If these latest rumors are true, seems likely we’ll see this all play out with Mac Studio as the so-called half height Apple Silicon Pro and then just let the Intel Pro stick around on the price list for the high end slotbox use cases. Makes more sense than trying to make Apple Silicon play catch-up in a game Apple doesn’t want to play. Deftly sidesteps the endless debate of what “Pro” means.

It's very possible that the Mac Pro goes Apple Silicon eventually. But Apple is likely internally having the same sort of discussion we're having in this thread.
 

goMac

macrumors 604
Apr 15, 2004
7,663
1,694
If you point two Afterburner decodes at one GPU card, then yes. But with more parallel and balanced workload assignments it isn't as much of a problem if you have a decent PCIe backbone provisioning the slots.

That's where things are going.

If you look at an Apple Silicon Mac, pro workflows may stress the GPU, Neural Engine, and Media Engine simultaneously. It sounds like the quad-chiplet M1 Max may be out... but for the sake of argument, imagine four GPUs, four Afterburner cards, and four media accelerator cards all trying to work on the same job at the same time. It's all technically going to work, but PCIe performance is going to be the weakness in the system.

Of course, sacrificing PCIe means sacrificing customizability, so it's not all great. But Apple could make a good argument that they're cramming the functionality of 8 to 16 cards into a single chip, and a PCIe backbone just can't handle that kind of load.
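To put a hypothetical number on that "8 to 16 cards" argument (an illustration only; the card count, per-card link width, and the idea that every card runs flat out are assumptions, not Apple figures):

```python
# If every accelerator were a separate PCIe 3.0 x16 card, the aggregate
# bandwidth demand quickly dwarfs what one host backbone can route.

PCIE3_X16_GBS = 8.0 * 16 * (128 / 130) / 8   # ~15.8 GB/s per card

num_cards = 12   # hypothetical: 4 GPUs + 4 Afterburner-style + 4 media cards
aggregate_demand = num_cards * PCIE3_X16_GBS  # ~189 GB/s if all run flat out

# Versus one on-package shared memory pool (Apple's M1 Max spec figure):
unified_memory_gbs = 400.0

print(f"aggregate PCIe demand: {aggregate_demand:.0f} GB/s")
print(f"one shared memory pool: {unified_memory_gbs:.0f} GB/s")
```

A real host would not provision twelve full x16 links in the first place, which is the point: the cards would contend for far fewer lanes than the sum of their link widths.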
 

mattspace

macrumors 68040
Jun 5, 2013
3,344
2,975
Australia
If these latest rumors are true, seems likely we’ll see this all play out with Mac Studio as the so-called half height Apple Silicon Pro and then just let the Intel Pro stick around on the price list for the high end slotbox use cases. Makes more sense than trying to make Apple Silicon play catch-up in a game Apple doesn’t want to play. Deftly sidesteps the endless debate of what “Pro” means.

The "Mac Studio" mentioned on the other thread sounds like a Workstation, in the same way the iPad has "Console Quality Graphics".

Note Intel just released the Dragon Canyon revision of their NUC, which advances the Beast Canyon compute element by having a socketed and user-replaceable CPU, as well as RAM, storage, etc. 6x USB-A, 2x TB4, 2x Ethernet, HDMI.
 

deconstruct60

macrumors G5
Mar 10, 2009
12,493
4,053
The "Mac Studio" mentioned on the other thread sounds like a Workstation, in the same way the iPad has "Console Quality Graphics".

For literal audio/video studio editing work, that is probably a woefully wrong characterization. There is already commentary on these forums from folks with a MBP 16" Max that a substantive amount of video work goes just fine on a single Max. A double Max would cover an even broader area of high utility. Similarly, several H.265 4:2:2 workloads gave some MP 2019 folks fits while working better on an M1 Mac.

Apple doesn't have to target the exact same set of folks doing exactly the same thing as they did with the 2012 Mac Pro. If they don't drop the Mac Pro, then they full well know it isn't the same group being addressed. A different name is actually a positive communication sign. It probably means they are not trying to "hand wave" their way to a 100% complete transition this year (or at least by June).




Note Intel just released the Dragon Canyon revision of their NUC, which advances the Beast Canyon compute element by having a socketed and user-replacable CPU, as well as ram, storage etc. 6x USBa, 2X TB4, 2x Ethernet, HDMI.

On CPU-bound, multithreaded workloads, a 20-core "Studio" will probably 'smoke' that Dragon Canyon. It will cost much more to purchase, but performance-wise there will be a gap. If you later put a Raptor Lake update in it... it probably still wins (8 Gracemont E-cores probably aren't going to reverse the tide by a substantial amount on a wide variety of workloads; 8 more Apple P-cores versus 8 Gracemont E-cores is an uneven 'exchange').

Dragon Canyon is capped at a 650W power supply.
"... The NUC12DCMi9 chassis has the same 'ease' of installation as the previous generation Extreme NUC. The dimensions allow the installation of dual slot GPUs up to 12" in length. The included 650W 80+ Gold internal PSU also supplies a 8-pin and a 2x6+2-pin connector for the GPU. ..."

A higher-power supply than a MP 2013 or iMac Pro 2017 (or any likely new iMac Pro M-series replacement). Still not Mac Pro class.


When a Raptor Lake module sucks more power out of the supply, there will be less for the dGPU card. And the future 450-600W cards won't fly in this case.

If Apple's "double Max" presents as two GPUs, then yes, there will be a bigger single-GPU 'instance' advantage here. However, if it presents well as one "big" GPU, then a 3080 cap won't be a huge gap for this over the Studio (if there is one on video-centric workloads).


It is a cheaper upgrade ladder to walk. You can start with a CPU that is slower than an M1 Max and go to something competitive later. Likewise, you can take a less heavy body blow from the currently overheated dGPU prices now and buy something after the cryptomania has perhaps subsided in a year or so.
 

deconstruct60

macrumors G5
Mar 10, 2009
12,493
4,053
The definition of insanity is trying the same thing over and over again and expecting different results.

It isn't exactly the same. There should be some substantive differences.


1. Thermally this will likely be substantively different from the Cube or Mac Pro 2013. Very high probability of at least one active fan (at least a 120mm one), perhaps two (in the 100+ mm range). So not like the Cube at all. And likely a large-air-inlet, front-to-back design (i.e., leaving behind the dogma of spending lots of effort hiding the air vents and/or chimney).

The 2013 also had two or more uneven heat sources on a shared thermal core. If you have two evenly heated dies, you don't have the imbalance problem. One die is even less of an imbalance problem (if they allow the M1 Max option to hit a lower price point).

If Apple comes out with another chimney or "hide the fan on the bottom" Mini-style solution, then yes, tag them with the "stuck on stupid and not getting off" label. One of the rumors said it took design hints from the Mac Pro... even if that is only being open to highly visible, large air vents, that is enough to get off the crazy road.


2. They are trading modularity for higher performance out of the soldered-down RAM. Apple has a "poor man's HBM" solution here. Yes, you lose the modularity, but you get performance out of the trade-off. That is in contrast to mainly just more money in Apple's pocket. (Apple is making more, but also getting some value out of the deep customization. Is it big enough to make hyper-modularity folks happy? Probably not. Will folks who just want to buy a box and get work done be happier? Probably yes, in most cases. So that is a difference.)

Apple will probably make the argument that 800GB/s memory is better than PCIe v4, or something like that. (Access to unified memory is faster than the PCIe that we, and most others, currently provide.)


3. The high amount of fixed-function logic is going to make lots of folks quite happy in workloads where there is a very good match. Getting most of an Afterburner for 'free' if you have a huge stack of ProRes video to cut. Again, Apple is bringing more value to the table than just being smaller here. In part it is smaller because they took the card and stuffed it onto the die.

Is that going to make folks who had no interest at all in buying Afterburner card(s) happy? No. But it is also not doing the 'same thing' either. If Apple has narrowed down the target user base, then it is not doing the same thing.

Apple has way better AI/ML/tensor offload libraries now than the OpenCL coverage they had when the MP 2013 launched. There are more Apple foundational libraries tapping into the "hidden magic units" in the systems now: AMX, NPU, A/V special units, etc. Software delivered before the 'better' hardware tends to get more traction.


There is also still a decent chance that:


a. There is optional storage inside (an M.2 slot, an optional 2.5" SSD bracket, maybe, if on the larger side, a slot). So you won't be stuck with a one-and-only-one internal drive constraint. [Which is kind of silly... since storage can be much 'smaller' than 10+ years ago. Apple won't ship a BTO configuration with a 3rd-party drive in it, but could optionally allow a 3rd-party bracket to use a provisioned (but completely empty) SATA socket. Same with an internal USB dongle for software locks. It shouldn't be too hard to provision what they did with the MP 2019 again here, even in a slotless box.]

Only one internal drive was a MP 2013 dead end; there is about zero reason to repeat it here if going bigger than a Mini or laptop case.


b. Thunderbolt 4 isn't going to 'die' like FireWire. It is more useful in the 2022 era than the expansion ports were in 2013 (and earlier). Four TBv4 ports and a 10GbE port would deliver better value. ( Apple should go 'cheaper' on the external ports. The Cube effectively did, and the 6 ports on the MP 2013 were an attempt to cover up that Thunderbolt wasn't as good as the hype at that point. More than four TB ports is like pouring more ketchup on something to make it taste better. It is likely the wrong solution. )

There is an even lower mandated reservation for video data traffic in TBv4. Does it solve eGPU? No. But it is nowhere near the same zone as the Cube, and significantly less restricted than TBv2.

AMD is going to release USB4 solutions on the next gen also ( and motherboard vendors can flesh out the TBv4 certification ). So it is not an "Intel only" thing.


c. Apple won't put last-century Ethernet on the device ( 1GbE ). 10GbE, or maybe 10GbE and a 1GbE. ( Again, minimal connection specs for a Mac Pro, as opposed to the "wireless is more wonderful" dogma Apple often falls into these days. ) In 2022 there is lots more data on NAS/SAN than in the pre-2013 era. Times have changed. What would be crazy is delivering a system without moving to a network standard that was invented in the current century.

Decent chance the power supply could be modular ( and Apple may put their iMac 24" 1GbE gimmick on it too, to get a second Ethernet port if they don't have enough edge space on the chassis itself. )
 

deconstruct60

macrumors G5
Mar 10, 2009
12,493
4,053
I wouldn’t call that deft. Death is more like it. For the remaining pros sticking with apple, they will say bye bye to apple.

Most of the folks who bought a 2019 MP over the last two years aren't likely going anywhere. If Apple doesn't have something in 2023-2024, that would be a bigger issue. However, most of those folks are likely on 4-7 year buying cycles.

If talking about moving folks off the Mac Pro 4,1 / 5,1 ( 2009-2012 ) 'cheese grater' systems they are still clinging to in 2022? Do they even want to move? The hard-core anti-T2 folks have run out of options.

The ones that already left the old 'cheese grater' are already gone.

iMac Pro 2017 and MP 2013 ... moving to a Mac Studio isn't a big jump for those that skipped the MP 2019.

Can Apple sell Intel Mac Pros in 2022-23? They can move more if they shift the prices around ( and get rid of the "> 1TB RAM" tax from Intel and Apple ). A major issue with the folks who haven't bought one yet is price.

Move the 12 core down to where the 8 core starts. Get rid of that RAM tax. Perhaps update to W-3300 options for AVX-512 and SIMD-optimized workloads ( Intel got rid of the tax with the W-3300 and has better base pricing ). Also do the RDNA2 speed bump coming from AMD for the 6800. [ They don't necessarily need new RDNA3 drivers to do that. ]


If Apple pushes the 40-core / quad-die option to the M3 era ( into 2023 ), a modest mod on the MP 2019 chassis would last another couple of years for die-hard macOS fans. [ With wafer-start constraints, it is probably better bang for the buck to take those 4 dies and make M1 Max chips with them than to try to goose sky-high prices out of a very narrow few. Apple needs more deployed M-series Macs out there, not necessarily more stratospherically priced ones. If there were no foundry shortages that would be a different trade-off, but there are. Apple is leaving money on the floor because it can't keep up with current demand. It needs more individual dies for the next year or so. That opens a window for Intel to hang on a while longer as an Apple client. ]
 

deconstruct60

macrumors G5
Mar 10, 2009
12,493
4,053
That's where things are going.

If you look at an Apple Silicon Mac, pro workflows may stress the GPU, Neural Engine, and Media Engine simultaneously. It sounds like the quad chiplet M1 Max may be out... but for the sake of argument... imagine 4 GPUs, four afterburner cards, and four media accelerator cards all trying to work on the same job at the same time. It's going to all technically work, but the PCIe performance is going to be the weakness in the system.


I'm dubious that the quad die will work well when pushed to the extremes. Picture pulling data out of the L3 cache of die 3 to send to a GPU core on die 2, a CPU core on die 1, and an NPU and ProRes decoder on die 4, with a display output processor on die 3. If it is NUMA for all 3-4 different types of function units, that is probably not going to work so well in the high-end edge cases either. PCI-e would be worse. But at least for the display output processor, that NUMA impact would likely show up as an issue under ProMotion (120Hz).

I suspect Apple is going to have to work on the scheduler so that where there is super-tight cooperation, at least the different function units are co-located and can leverage a 'local' L3 to send messages and handoffs back and forth. Perhaps the RAM and I/O are spread out over all the dies, but that local nexus stays close.
[ It looks like the NPU and ProRes video en/decode units are located near each other in pairs for similar reasons. ]
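To make the co-location argument concrete, here is a toy sketch (nothing to do with Apple's actual scheduler; the unit names and hop costs are made up for illustration) of why a scheduler would want a cooperating CPU/GPU/NPU/media pipeline placed on one die rather than scattered across four:

```python
# Toy model: a job hands data CPU -> GPU -> NPU -> media engine.
# Every handoff that crosses dies pays a NUMA-style penalty, so
# co-locating the whole group on one die minimizes total cost.
# Costs are arbitrary illustrative units, not real latencies.

CROSS_DIE_HOP_COST = 10
LOCAL_HOP_COST = 1

def handoff_cost(placement, chain):
    """Sum the cost of each handoff in `chain`, given a unit->die map."""
    cost = 0
    for src, dst in zip(chain, chain[1:]):
        same_die = placement[src] == placement[dst]
        cost += LOCAL_HOP_COST if same_die else CROSS_DIE_HOP_COST
    return cost

chain = ["cpu", "gpu", "npu", "media"]           # the handoff pipeline
scattered = {"cpu": 0, "gpu": 1, "npu": 2, "media": 3}  # one die each
colocated = {"cpu": 0, "gpu": 0, "npu": 0, "media": 0}  # all on die 0

print(handoff_cost(scattered, chain))  # 30: every handoff crosses dies
print(handoff_cost(colocated, chain))  # 3: all handoffs stay local
```

A real scheduler would be balancing this against load spreading and thermal limits, but the 10x gap in the toy numbers is the basic pressure pushing tightly coupled units onto the same die.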

Where the general market is going is that there are "back end" memory busses and "front end" ones. NVLink, Infinity Fabric, QPI-next/Xe-Link aren't going away. Stuff like Optane DIMMs probably is. There are "scale up" memory capacity issues that Apple is likely going to miss delivering a solution for.

Apple doesn't have a file system that can deal with PB-sized storage and/or integrity issues either... so they punt on that too.

PCI-e allows for point-to-point DMA. As long as you pair up the units (accelerator-GPU), have them send data between themselves, and segregate them into different workgroups, it isn't a huge problem where there is low-to-modest back and forth, as long as the apps are built for that. ( e.g. the 'studies' where folks point out that you "don't need PCI-e v4" because old games X, Y, Z still work OK on a x8 PCI-e v3 link. ) I imagine there is some homogeneous and/or centralized security policy issue that Apple doesn't like there, though. ( Only having to deal with Apple MMU units is very appealing to them. )



Apple is pushing folks to pull their apps off of PCI-e ( unified-memory Apple SoCs ), so that does run in conflict. But in the large scope, most apps are not being pushed in that direction. Supercomputers are still mostly MPI-like systems with multiple independent nodes, without one single huge shared memory space. More computation will be pulled internal to the nodes with better local CPU-GPU connectivity, but the overall workload isn't going to collapse that way.

The problem with Apple's approach is that it doesn't really scale well 'up' or 'out'. The hyper-local RAM will get less constrained as memory density evolves, but will be limited by the speed of that evolutionary path. The hyper-local dies ... similar issues. Fab process density improvements will offer relief over time, but on the immediate-term horizon there are limitations.

The core issue, though, boils down to whether a substantial number of Mac Pro users are going to have acceleration workloads that are not covered by the stuff Apple wants to work on inside its M-series SoCs. Those folks will be lost long term if Apple doesn't also have an alternative way to link in non-Apple acceleration hardware more effectively.
Apple's NPUs are reasonable in the ML inferencing realm but nowhere near top-class in the training realm. Great for ProRes, but AV1 ... not so much right now. Raytracing ... not as good right now if you're not 100% shackled to macOS.

It comes down to whether Apple wants to dig deeper moats around the Mac ecosystem or grow it. The path they are on is a better moat digger. They have some new moats evolving around VR/AR and maybe car sensing/inferencing, but that is mainly an inside-out deployment (from Apple).

Apple's path here looks pretty much like "Cathedral vs Bazaar". ( Holy hardware-vendor Unix vs Linux. ) It has taken CXL a long time to "lift off" and get momentum, but it is moving now. This new effort being cheerleaded by Intel will probably take a while also.

https://www.anandtech.com/show/1728...d-setting-standards-for-the-chiplet-ecosystem

I have doubts Apple is going to stay out in front as long as they think they will. Dumping PCI-e standards, dumping OpenCL/Vulkan/OpenGL, etc. will bite them in the butt over the longer term.


Of course sacrificing PCIe means sacrificing customizability, so it's not all great. But Apple could make a good argument that they're cramming the functionality of 8 to 16 cards into a single chip, and a PCIe backbone just can't handle that kind of load.

But they aren't doing 8-16 cards. The ProRes stuff is basically just doing what the one Afterburner card did. This isn't anywhere even close to "boo hoo, I couldn't have three Afterburner cards" stuff. And if upgraded to PCI-e v5, an Afterburner card could do at least double the work as a single card. The MP 2019 is choked on PCI-e version 3, which was dated when the system shipped. ( PCI-e v4 had been out for years ... it just hadn't trickled down to Intel's level yet. ) In 2023, getting an FPGA card with PCI-e v5 on it isn't going to be a real tough problem to crack.
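The "at least double" claim follows directly from the published per-lane signaling rates; a quick sketch of theoretical x16 throughput per generation (line rates with 128b/130b encoding, not measured numbers):

```python
# Theoretical PCIe x16 throughput by generation. Rates double each
# generation, so a v5 slot carries 4x what a v3-choked MP 2019 slot does.
# 128b/130b encoding applies to PCIe 3.0 and later.

GENS = {3: 8, 4: 16, 5: 32}       # GT/s per lane by PCIe generation
ENCODING_EFFICIENCY = 128 / 130
LANES = 16

def x16_gbps(gen):
    """Theoretical GB/s for a x16 slot of the given PCIe generation."""
    return GENS[gen] * ENCODING_EFFICIENCY * LANES / 8

for g in GENS:
    print(f"PCIe {g}.0 x16: ~{x16_gbps(g):.1f} GB/s")
```

So even before any FPGA improvements, a hypothetical v5 version of the same card gets 4x the link bandwidth of the v3 slot it replaces.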

And the large number of DSPs on an Avid HD Pro Tools card (or multiple cards)? That's exterior inputs; that is the scale issue. Once chewed down, it isn't as big of a problem to pass onto the internal units inside the CPU-GPU package.
 

MarkC426

macrumors 68040
May 14, 2008
3,698
2,097
UK
Holy cow.....so many words dude (#94-97).....;)
So much information...:oops:

Let's hope Apple are reading this.
 

goMac

macrumors 604
Apr 15, 2004
7,663
1,694
But they aren't doing 8-16 cards. The ProRes stuff is basically just doing what the one Afterburner card did. This isn't anywhere even close to "boo hoo, I couldn't have three Afterburner cards" stuff.

A dual chiplet would have two media engines and two neural engines. If a rumored quad chip ever appears, that would be four neural engines and four media engines.

That's where the "Mac Pro full of cards" comparisons would come from. Apple could make the argument that a dual or quad M1 Max is a configuration that would otherwise require a lot of cards in a Mac Pro.

I don't disagree that abandoning PCIe would not be good for them (and now that the Mac Studio and Mac Pro could be unlinked, we'll see where they go). But it's easy to see where they are headed. They could try to make the argument that they're sacrificing expandability for performance that would be impossible in an expandable machine.
 