Exactly. Imagine that Apple made a separate GPU chiplet with 32 cores and then put a bunch of those together; wouldn't that be a scalable solution?
That would be scalable, but I thought the performance of their integrated CPU-GPU architecture relied on having both the CPU and GPU on the same die. So I imagined that, if they wanted to double the number of GPU cores in the Mac Pro, they would need to build the "Extreme" chip from four "M2 Max Pro" subunits instead of four M2 Maxes, where each "M2 Max Pro" would be an expanded version of the M2 Max with double the number of GPU cores.
I.e., if the M2 Max has X CPU cores and Y GPU cores, then the "M2 Max Pro" would have X CPU cores and 2Y GPU cores.
I estimate that would give them ~120 TFLOPs, as compared with ~80 TFLOPs for a 4090 and 90–100 TFLOPs for a 4090 Ti, i.e., roughly halfway between a single 4090 and dual 4090s for general GPU compute performance (we'll probably need to wait for the M3, which will likely be on 3 nm, to get hardware RT).
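For anyone who wants to sanity-check that figure, here's the rough back-of-the-envelope math as a quick Python sketch. The GPU core count, ALUs per core, and clock speed below are my own ballpark assumptions rather than confirmed specs, so treat the output as order-of-magnitude only (with these numbers the four-subunit, doubled-core config lands around 110 TFLOPs, in the same neighborhood as ~120).

```python
# Back-of-the-envelope FP32 throughput for the hypothetical "Extreme" built
# from four doubled-GPU "M2 Max Pro" subunits.
# All figures below are assumed ballpark values, not confirmed specs.

GPU_CORES_PER_M2_MAX = 38    # M2 Max top GPU configuration
ALUS_PER_GPU_CORE    = 128   # assumed FP32 ALUs per Apple GPU core
CLOCK_GHZ            = 1.4   # assumed GPU clock
FLOPS_PER_ALU_CYCLE  = 2     # one fused multiply-add counts as 2 FLOPs

def tflops(gpu_cores: int) -> float:
    """FP32 TFLOPs = cores * ALUs/core * FLOPs/cycle * clock (GHz) / 1000."""
    return gpu_cores * ALUS_PER_GPU_CORE * FLOPS_PER_ALU_CYCLE * CLOCK_GHZ / 1000

m2_max         = tflops(GPU_CORES_PER_M2_MAX)          # ~13.6 TFLOPs
m2_max_pro     = tflops(2 * GPU_CORES_PER_M2_MAX)      # doubled GPU cores, ~27 TFLOPs
extreme_4x_pro = tflops(4 * 2 * GPU_CORES_PER_M2_MAX)  # four subunits, ~109 TFLOPs

print(f"M2 Max:           {m2_max:5.1f} TFLOPs")
print(f"'M2 Max Pro':     {m2_max_pro:5.1f} TFLOPs")
print(f"4x 'M2 Max Pro':  {extreme_4x_pro:5.1f} TFLOPs")
```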
But creating this new design just for the Mac Pro seems resource-intensive, so I don't know if they'd do that.