Given that compute is arguably the most important aspect of high-end workstations, a dGPU is the only way to carry the Mac Pro forward.
A discrete GPGPU doesn't necessarily have to be a video-output GPU. As noted several replies above, the Nvidia DGX Station uses a completely different class of GPU for driving the workstation's display than for compute. The notion that this all has to be a completely homogeneous mix of GPUs doesn't carry much weight.
If the Mac Pro has one x16 PCI-e v4 slot, that could be connected to a PCI-e expansion box (the same thing happened with the 2009-2012 models for folks who had lots of cards). For scaled compute in one or two boxes under the desk, that could work. It wouldn't cover all of the addressable market, but it would cover some of it.
I remain convinced that Afterburner is not a one-off product category, but an initial test toward a lasting product line of AS-designed, task-focused discrete compute hardware.
ProRes video acceleration being added next to the A15 iPhone SoC points to exactly the opposite. (Yes, the Pro/Max SoCs were likely also done at the time, but they shipped after the A15.) Afterburner looks more like a prototype of something to be matured before incorporating it into the SoC die (make it work first, then commit it to a die, where fixes are much harder).
ProRes was a "CPU offload" that ended up getting folded back onto the SoC die.
As Apple follows TSMC's N3 to N2 to 10A progression, the transistor budget on the SoC die is just going to get bigger. That likely means more compute in the same package, if not on the same die.
Modern advances in 3D packaging, with cost- and power-effective layered interposers, only reinforce that trend line. If you have 80 billion transistors on one die, then 2-4 dies get you budgets of 160-320 billion.
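The arithmetic is simple enough to sketch; the 80-billion figure is just the illustrative number above, not a shipping die:

```swift
// Back-of-the-envelope transistor budgets for a multi-die package.
// 80 billion per die is the illustrative figure from the post, not a shipping part.
let transistorsPerDie = 80.0  // in billions

for dieCount in [1, 2, 4] {
    print("\(dieCount) die(s): \(transistorsPerDie * Double(dieCount)) billion transistors")
}
```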
The M1 Ultra can handle more 8K streams than the Afterburner in part because the Afterburner is choked on x16 PCI-e v3. PCI-e slots are fixed in time on a normal system board: if Apple puts v4 on the baseline motherboard today, then 4-5 years from now there will be something much faster than v4. A slot buys modularity more than it buys lower latency, more bandwidth, and/or higher perf/watt. The intra-package bus in modern packages is far faster and more power-effective than PCI-e. Going out through PCI-e means taking a hit on bandwidth, latency, and power consumption.
PCI-e v5 and up is good enough for inter-module memory traffic on par with the older links used for NUMA connections between CPU packages. There will be more use cases where a dGPU and CPU couple in a more unified manner, with some latency overhead. But single unit/node performance is going to see more packaging over time from multiple vendors, not just Apple.
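For a rough sense of the numbers being compared here, this is a back-of-the-envelope sketch of one-direction x16 PCI-e bandwidth per generation next to an older NUMA-class CPU link; the Intel UPI figure is only an illustrative reference point, and real-world payload efficiency varies:

```swift
import Foundation

// Approximate one-direction bandwidth of an x16 PCI-e link per generation,
// versus an older NUMA-class CPU interconnect (Intel UPI at 10.4 GT/s,
// used here purely as an illustrative reference point).
let lanes = 16.0
let encodingEfficiency = 128.0 / 130.0   // PCI-e 3.0+ uses 128b/130b encoding

let generations: [(name: String, gtPerSecPerLane: Double)] = [
    ("PCI-e 3.0", 8.0),
    ("PCI-e 4.0", 16.0),
    ("PCI-e 5.0", 32.0)
]

for gen in generations {
    // GT/s -> GB/s: apply encoding overhead, divide by 8 bits per byte, scale by lane count
    let gbPerSec = gen.gtPerSecPerLane * encodingEfficiency / 8.0 * lanes
    print("\(gen.name) x16: ~\(String(format: "%.1f", gbPerSec)) GB/s per direction")
}

// Intel UPI link at 10.4 GT/s moving ~2 bytes of payload per transfer
let upiGBps = 10.4 * 2.0
print("UPI link:    ~\(String(format: "%.1f", upiGBps)) GB/s per direction")
```

Even v5 x16 is still well over an order of magnitude short of the ~2.5 TB/s Apple quotes for the M1 Ultra's UltraFusion die-to-die connection, which is the bandwidth/latency/power hit mentioned above.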
If Apple is to truly empower AR/VR creators and take a seat at the AI/ML table, it has no choice but to go beyond the SoC to deliver these experiences.
AR/VR that doesn't render on an Apple headset does what for Apple? Not a lot. There are folks who drift into AR being more "artificial reality" than "augmented reality". That is probably not where Apple is trying to go.
Some 3D creation tools are still being ported over to the M-series and optimized for Apple GPUs. It isn't just Apple that is in play here; the software has to catch up also.
With only Apple's GPUs as an optimization target, that will likely happen faster, because there is a tractable set of targets to optimize for. (A super broad set of widely diverging targets is one reason there is such a huge barrier to entry in the mainstream GPU market: between game/app + driver + hardware permutations, there is a never-ending pile of tweaks to do. Having to do everything for everybody acts as a barrier to entry.)
As far as AI/ML goes, on the inference side, if this is a "quad Max" die set-up then there would be 64 NPU cores. You'd be smoking a whole lot of something to position that as not taking a seat at the AI/ML inference table. It is. Even at 32 NPU cores it is past substantive. A similar evolutionary issue applies here as process shrinks come along: Apple isn't going to lose tons of ground on inference if it allocates decent transistor budget increases to bigger caches, more function units, etc.
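A quick sketch of that NPU math, assuming each Max-class die keeps the 16-core Neural Engine the M1 Max ships with (TOPS scaled linearly from Apple's M1-generation figure, purely as an approximation):

```swift
// NPU core counts for hypothetical multi-die packages, assuming each
// Max-class die keeps the 16-core Neural Engine found in the M1 Max.
// ~11 TOPS is Apple's M1-generation Neural Engine figure, scaled
// linearly here only as a rough approximation.
let npuCoresPerDie = 16
let topsPerDie = 11.0

for (label, dies) in [("Duo (Ultra-style)", 2), ("Quad Max", 4)] {
    print("\(label): \(npuCoresPerDie * dies) NPU cores, ~\(topsPerDie * Double(dies)) TOPS")
}
```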
Is Apple after the mega-trainer business? Probably not. Cerebras is throwing whole wafers at the mega-scale workloads. For reasonably sized models, Apple's four-die set-up is likely going to be decent enough to get some work done (between GPU cores, AMX "cores", and NPU cores, as appropriately delegated). Apple is more limited by the lack of a software foundation for training than by hardware. And training is more of a non-GUI endeavor. When it is racks of extremely tightly coupled nodes in a data center, then it really is outside of Apple's wheelhouse.