
bcortens

macrumors 65816
Aug 16, 2007
1,324
1,796
Canada
So you think the apple silicon Mac Pro won’t support PCIe graphics cards? I think that is a massive point of the Mac Pro. We’ll see I guess
I think it depends: you might be able to have software that explicitly dispatches work to other accelerators, but I doubt the main (display-driving) GPU will be supported on a PCIe card.
 

killawat

macrumors 68000
Sep 11, 2014
1,961
3,609
Do people think eGPUs will ever be supported with apple silicon machines?
Nope. That ship pretty much sailed after Apple went a full year and then released the M1 MacBook Pros without eGPU support. I've actually gotten rid of my eGPU setup and repurposed it while GPU prices were through the roof, and my system uptime has increased considerably. On Intel Macs there are incredible gains to be had; on M1 the situation is obviously a bit different. Apple is still providing AMD drivers, but we'll see what happens once the Mac Pro is released. I strongly suspect it won't work with any GPU, PCIe or TB.
 

diamond.g

macrumors G4
Mar 20, 2007
11,437
2,666
OBX
So you think the apple silicon Mac Pro won’t support PCIe graphics cards? I think that is a massive point of the Mac Pro. We’ll see I guess
Pretty sure it won't support them. That still leaves audio, storage, and video capture cards.
 

l0stl0rd

macrumors 6502
Jul 25, 2009
483
420
Nope. That ship pretty much sailed after Apple went a full year and then released the M1 MacBook Pros without eGPU support. I've actually gotten rid of my eGPU setup and repurposed it while GPU prices were through the roof, and my system uptime has increased considerably. On Intel Macs there are incredible gains to be had; on M1 the situation is obviously a bit different. Apple is still providing AMD drivers, but we'll see what happens once the Mac Pro is released. I strongly suspect it won't work with any GPU, PCIe or TB.
I am not sure what to expect from the Pro, but if it has no expansion slots then we are nearly back to the trashcan days. Oh wait, they kind of did make that; it's called the Mac Studio.
 
  • Like
Reactions: shuto

killawat

macrumors 68000
Sep 11, 2014
1,961
3,609
I am not sure what to expect from the Pro, but if it has no expansion slots then we are nearly back to the trashcan days. Oh wait, they kind of did make that; it's called the Mac Studio.
LOL Oh for sure. I'd imagine they'll have PCIe for the great number of expansion cards that work in macOS (storage, network, AV). Not including GPUs seems like an oddity, but it's the current situation with all M1 Macs. I keep my eGPU enclosure around but am not sure what to stuff it with. Could use more USB ports... maybe a nice sound card?
 
  • Wow
Reactions: l0stl0rd

l0stl0rd

macrumors 6502
Jul 25, 2009
483
420
LOL Oh for sure. I'd imagine they'll have PCIe for the great number of expansion cards that work in macOS (storage, network, AV). Not including GPUs seems like an oddity, but it's the current situation with all M1 Macs. I keep my eGPU enclosure around but am not sure what to stuff it with. Could use more USB ports... maybe a nice sound card?
Yeah, I've still got mine too; it's collecting dust.
With the lackluster performance they've shown, however, I would still love to have eGPUs back.
The promised GPU performance gains have really been a letdown. It can't even match AMD's 6900 XT.
I had a Studio on order and canceled it because I am not confident.
Should we see the Pro at WWDC and it is about $10k, then I might as well go back to PC.
Perhaps I can be patient and see if the M2 brings ray-tracing cores; we'll see.
 

diamond.g

macrumors G4
Mar 20, 2007
11,437
2,666
OBX
Yeah, I've still got mine too; it's collecting dust.
With the lackluster performance they've shown, however, I would still love to have eGPUs back.
The promised GPU performance gains have really been a letdown. It can't even match AMD's 6900 XT.
I had a Studio on order and canceled it because I am not confident.
Should we see the Pro at WWDC and it is about $10k, then I might as well go back to PC.
Perhaps I can be patient and see if the M2 brings ray-tracing cores; we'll see.
I wouldn't be surprised if the AS MP has a bunch of x8 slots, since they wouldn't need x16 slots.
 

l0stl0rd

macrumors 6502
Jul 25, 2009
483
420
Actually, while the M1 was exciting at first, now that we see the desktops, this reflects quite nicely how I am feeling about Apple at the moment.
And not just about the storage modules, but also some of the other remarks he makes.

 

terminator-jq

macrumors 6502a
Nov 25, 2012
720
1,516
The new Mac Pro is probably the most mysterious of the Apple Silicon devices. After we got the M1, it was pretty easy to guess what was coming for the M1 Pro and Max even before the leaks. I think predictions for the M1 Ultra started within days after the M1 Max was announced (even the Ultra name).

For the Mac Pro it's hard to know what to expect. Apple themselves said the M1 Ultra was the last M1-family chip. A few leakers have said the Mac Pro won't use an M1- or M2-based chip… This leads me to believe we could see an "X1" chip or something that takes cues from the M1 and uses some of the coming M2 improvements, but will be its own chip line.

Here are some of my predictions, based on Apple's most recent Mac launches:

1. CPU-wise, the M1 chips have been phenomenal! However, Intel has caught up in both single-core and multi-core performance. I'm expecting this new chip to put Apple back in the lead in both categories.

2. GPU-wise, the M1 family has been good but not great. They compare well to AMD cards, but Nvidia has kept its leading position, especially for 3D. I'm expecting Apple to make significant improvements on the GPU side. Perhaps we see Apple's first implementation of hardware ray tracing.

3. When it comes to modularity, I think it's fair to assume that the RAM and SSD will be user-upgradable. The CPU and GPU will most likely be locked, but I do wonder if Apple could introduce a sort of boost kit similar to the Afterburner card.

One thing is for sure: this Mac Pro chip will be the first Apple Silicon Mac chip that can be designed with desktop-class wattage in mind. The M1, M1 Pro and M1 Max (and the M1 Ultra, which is just 2x the Max) have all needed to be efficient enough to run inside a laptop. Let's see what the Apple engineers can do when they don't have to worry about power efficiency as much!
 

mi7chy

macrumors G4
Oct 24, 2014
10,625
11,298
It is most likely a power demand that couldn’t be delivered by the hardware

Wonder if the updated PSU is a possible fix. I'd bet on the Delta PSU (better reputation for quality) on the right with more caps vs the Lite-On (known for cheaper quality) on the left.

[attached image: side-by-side photo of the two Mac Studio power supplies, Lite-On (left) and Delta (right)]
 
  • Like
Reactions: l0stl0rd

Lone Deranger

macrumors 68000
Apr 23, 2006
1,900
2,145
Tokyo, Japan
Here's the full text I pulled out of the Twitter feed. It's a bit of a read, but interesting nonetheless.

Problem: Apple shows that the M1 Ultra GPU can use up to 105W of power. However, the highest we could ever get it to reach was around 86W. No, the Mac Studio cooling wasn't a problem because the GPU stayed cool, around 55-58°C compared to in the past when Apple allowed 100°C. This makes it pretty clear that the Mac Studio cooling system is OVERKILL in most apps, which means that there was a disconnect between Apple's Mac Studio cooling system engineers and the M1 Ultra chip designers/engineers. Something has gone terribly wrong in terms of chip perf.

Culprit: Each cluster of GPU cores within an M1/M1 Pro/M1 Max/M1 Ultra chip comes with a 32MB TLB or Transaction Lookaside Buffer, which is a memory cache that stores the recent translations of virtual memory to physical memory, used to reduce user memory location access time.

Hishnash: "If an application has not been optimized for the M1 GPU architecture's tile memory, (not just Metal optimized) then every read/write needs to go all the way out to system memory. If the GPU compute task is issuing MANY little reads, then this will saturate the TLB. The issue is if GPU data hits the TLB and the page table being read/written to is not loaded, then that entire thread group on the GPU needs to pause while the page table is loaded into the TLB. If your application is using MANY reads/writes per second, this results in a lot of STALLED GPU thread groups. Unlike a CPU, when a GPU is waiting for data, it can't just switch to work on something else. So the GPU sits there and waits for the TLB buffer to clear in order to get more work to process."
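[Editor's note: hishnash's stall mechanism can be illustrated with a toy model. This is purely illustrative Python, not Metal; the page size matches Apple silicon's 16 KiB pages, but the entry count, access patterns, and LRU policy are invented for the sketch. The point is only that many small accesses scattered across many pages miss a fixed-capacity TLB far more often than the same number of accesses confined to a few pages.]

```python
from collections import OrderedDict

PAGE_SIZE = 16 * 1024  # 16 KiB pages, as on Apple silicon

def tlb_misses(addresses, tlb_entries):
    """Count TLB misses for a sequence of byte addresses.

    Models the TLB as an LRU cache of page-number -> translation
    entries; every access to a page not currently cached is a miss
    that would stall the issuing GPU thread group.
    """
    tlb = OrderedDict()
    misses = 0
    for addr in addresses:
        page = addr // PAGE_SIZE
        if page in tlb:
            tlb.move_to_end(page)          # hit: refresh LRU position
        else:
            misses += 1                    # miss: stall while the entry loads
            tlb[page] = True
            if len(tlb) > tlb_entries:
                tlb.popitem(last=False)    # evict least recently used
    return misses

# Many small reads confined to a handful of pages: almost all hits.
local = [i % (4 * PAGE_SIZE) for i in range(0, 100_000, 64)]
# The same number of reads scattered across thousands of distinct pages.
scattered = [(i * 7919 * PAGE_SIZE) % (10_000 * PAGE_SIZE)
             for i in range(len(local))]

few = tlb_misses(local, tlb_entries=1024)
many = tlb_misses(scattered, tlb_entries=1024)
print(few, many)  # the scattered pattern misses on every single access
```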

This is why we only saw 86W peak GPU usage in an app that was considered to be decently optimized. However, apps that CLAIM Apple Silicon support but have NOT been rewritten to take advantage of Apple's TBDR tile memory system will be severely limited by the 32MB TLB if there are many reads/writes.
The problem is that ALMOST ALL apps out there haven't been optimized for Apple's TBDR tile memory system. Many software developers simply get it to work using the traditional immediate-mode rendering model and call it good to go, being unaware of the 32MB TLB limitation that bottlenecks performance.

Hishnash: "What apps should be doing is loading as much data as possible into the tile mem and flushing it out in large chunks when needed. I bet a lot of the writes (over 95%) are for temporary values that could've been stored in tile mem and never needed to be written at all."
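[Editor's note: the batching idea hishnash describes can be sketched abstractly. Again a toy Python analogy rather than real GPU code; the tile capacity is an invented number. Instead of issuing one global write per temporary value, you accumulate values in a small on-chip "tile" buffer and flush to "system memory" in large chunks, cutting the number of global write transactions by the tile size.]

```python
TILE_CAPACITY = 1024  # pretend the on-chip tile buffer holds 1024 values

def naive_writes(values):
    """One global-memory write per value: every write may touch the TLB."""
    global_mem = []
    for v in values:
        global_mem.append(v)        # individual write out to system memory
    return global_mem, len(values)  # write transactions == value count

def tiled_writes(values):
    """Accumulate in tile memory, flush to global memory in large chunks."""
    global_mem, tile, flushes = [], [], 0
    for v in values:
        tile.append(v)              # cheap write to on-die tile memory
        if len(tile) == TILE_CAPACITY:
            global_mem.extend(tile) # one bulk write-out per full tile
            tile.clear()
            flushes += 1
    if tile:                        # flush the final partial tile
        global_mem.extend(tile)
        flushes += 1
    return global_mem, flushes

data = list(range(10_000))
mem_a, tx_a = naive_writes(data)
mem_b, tx_b = tiled_writes(data)
print(tx_a, tx_b)  # same data lands in memory, ~1000x fewer transactions
```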

Hishnash: "I expect that the people building the M1 family of chips didn't expect applications to be running on it that are not TBDR optimized. So they thought 32MB would be enough."

WRONG. Most apps aren't optimized for tile mem, even if they claim Apple Silicon support.

Keep in mind that in the time since Apple started engineering the M1 family 5-7 years ago, reliance on GPU performance has skyrocketed, so the chip designers probably didn't anticipate so many reads/writes hitting the 32MB TLB.

What does this mean? The M1 family of chips, including the M1 Ultra, has a major limitation that can't be fixed unless apps are properly optimized. Here's the problem.

Hishnash: "The effort needed to optimize for tile memory is MASSIVE. It requires going all the way back to the drawing board, re-considering everything, like the concept that there is a local on-die memory pool you can read/write from with very very low perf impact is unthinkable in the current desktop GPU space. It’s a matter of a complete rewrite at a concept/algorithmic level."

Why is this such a big problem for M1 Ultra?
With the M1 and M1 Pro chips, there wasn't enough GPU performance to hit that 32MB TLB limit. However, the M1 Max is where you see GPU scaling fall off a cliff due to the TLB, especially the 32-core GPU model.
This problem scales linearly, so if, for example, 26 cores is the sweet spot for the M1 Max, with the remaining 6 cores being bottlenecked by the TLB, then the M1 Ultra will have 12 GPU cores bottlenecked, because it features two 32-core M1 Max dies. No wonder it scales poorly.
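[Editor's note: under that hypothetical linear model, the claimed numbers work out as below. The 26-core "sweet spot" is the thread's assumption, not a measured figure.]

```python
def effective_cores(total_cores, sweet_spot_per_die=26, cores_per_die=32):
    """Toy linear model: each die contributes at most sweet_spot_per_die
    useful GPU cores before the per-die TLB becomes the bottleneck."""
    dies = total_cores // cores_per_die
    return dies * sweet_spot_per_die

max_effective = effective_cores(32)    # one M1 Max die
ultra_effective = effective_cores(64)  # two M1 Max dies

# Cores idled by the claimed bottleneck on each chip:
max_wasted = 32 - max_effective        # 6 cores on the Max
ultra_wasted = 64 - ultra_effective    # 12 cores on the Ultra
print(max_wasted, ultra_wasted)
```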

The solution from hishnash: "Increasing the TLB will help a lot for applications that are not optimized. This is important because many apps will NEVER be optimized, and even fewer games."
This is why gaming performance is so poor on M1 Ultra, apart from the Rosetta bottleneck.

Hishnash: "For game engines that are not TBDR aware/optimized, they might be currently bottlenecked on reads.. and depending on the post-processing effects, might have some large bottlenecks on writes if they're not using tile memory and tile compute shaders where possible."

The reason World of Warcraft runs so well and compares well to the RTX 3080/3090 is because APPLE helped them optimize the game PROPERLY to take advantage of the new TBDR tile-based architecture. (WoW Metal Update Released one week after M1 event proves they got help from Apple.)

The solution from our source: Future M-chip families (Hopefully and probably M2) will see a big increase in the TLB to solve this problem since developers are likely to be slow in optimizing apps. Apple will likely release white papers at WWDC on how to optimize apps properly.

This means that the M2 Ultra will see a HUGE boost in GPU performance over the M1 Ultra if the 32MB TLB bottleneck is removed. And that performance boost will be on top of higher clock speeds and potentially higher GPU core counts.

The only hope for the M1 Ultra is that developers finally decide to completely rethink and rewrite their apps to support the TBDR tile-based memory architecture. (Good luck)
Oh, and by the way, expect hardware ray-tracing support on future M-chip families. (Hopefully M2 Pro+)
 

killawat

macrumors 68000
Sep 11, 2014
1,961
3,609
The only hope for the M1 Ultra is that developers finally decide to completely rethink and rewrite their apps to support the TBDR tile-based memory architecture. (Good luck)
Oh, and by the way, expect hardware ray-tracing support on future M-chip families. (Hopefully M2 Pro+)
This is absolutely fascinating.
 
  • Like
Reactions: Irishman

Boil

macrumors 68040
Oct 23, 2018
3,478
3,174
Stargate Command
Maybe those wanting an ASi Mac for GPU intensive work should wait and see how future M2/M3 Max/Ultra SoCs handle this issue...?

Which could create even more demand for Mn Pro SoCs...?!?
 
  • Like
Reactions: Irishman

Lone Deranger

macrumors 68000
Apr 23, 2006
1,900
2,145
Tokyo, Japan
It is somewhat of a cold shower, isn't it? But there was also this tidbit:

The solution from our source: Future M-chip families (Hopefully and probably M2) will see a big increase in the TLB to solve this problem since developers are likely to be slow in optimizing apps. Apple will likely release white papers at WWDC on how to optimize apps properly.

I don't know how reliable their source is, but if true, it could obviate the need for 3rd-party app developers to re-write their apps from the ground up to have any chance of being competitive on macOS, which is something no 3D app developer in their right mind would consider doing.

I'm sure the reality is more complicated than just this. The battle rages on.
 
  • Like
Reactions: Irishman

quarkysg

macrumors 65816
Oct 12, 2019
1,247
841
IMHO, this is a sensationalist piece written for views. I won't be surprised if future Mx SoCs still ship with a 32MB TLB (which stands for Translation Lookaside Buffer, not Transaction). Software needs to be optimised for the hardware it is being run on.
 

jujoje

macrumors regular
May 17, 2009
247
288
And somewhat depressing at the same time.

At this stage it kind of feels like Apple should develop their own GPU renderer; Cycles and Redshift are both very much geared towards a certain approach to rendering, and it sounds like they'd probably need a significant rethink to take full advantage of TBDR.

Viewport-wise, I'm a little more optimistic, in that most 3D apps are rebuilding their viewports to some extent as they move away from OpenGL to Vulkan / Metal. Since this means they're discarding legacy code, there's an opportunity for them to take advantage of TBDR. This assumes they're not going to just use MoltenVK as the low-effort option. Apple's pro apps team has been doing a good job getting companies to adopt Metal and optimise for AS, so fingers crossed they provide the support needed to make this work...

Maybe those wanting an ASi Mac for GPU intensive work should wait and see how future M2/M3 Max/Ultra SoCs handle this issue...?

Which could create even more demand for Mn Pro SoCs...?!?

Hopefully the Mac Pro would address this, otherwise it's really not going to be very pro :)

IMHO, this is a sensationalist piece written for views. I won't be surprised if future Mx SoCs still ship with a 32MB TLB (which stands for Translation Lookaside Buffer, not Transaction). Software needs to be optimised for the hardware it is being run on.

The idea that Apple engineers overlooked something so obvious appears disingenuous and pretty silly in itself (besides, it's been clear that GPUs were the future for a fair few workloads, even 5-6 years ago).

That said, GPUs are the weak point of AS at the moment. On a 24-core-GPU Mac Studio I was getting pretty bad performance, and even if I went with a 48-core one I'd only be getting meh performance. The lack of scaling above 32 cores in current apps is making me hold off (I returned the 24-core Mac Studio). It may be a software issue or a lack of optimisation, but when you're getting 5fps in the viewport on a new machine, that doesn't bode well. Hopefully this doesn't turn into too much of a chicken-and-egg scenario...
 

BootLoxes

macrumors 6502a
Apr 15, 2019
749
897
So stick with the base Mac Studio unless you need more CPU cores. I will continue on with my M1 Air and RTX 3060 PC and wait for the M3 Studio that hopefully has this issue fixed, plus ray-tracing cores.
 
  • Like
Reactions: Irishman

quarkysg

macrumors 65816
Oct 12, 2019
1,247
841
The idea that Apple engineers overlooked something so obvious appears disingenuous and pretty silly in itself (besides, it's been clear that GPUs were the future for a fair few workloads, even 5-6 years ago).
I would agree.

32MB of TLB would allow (with a 16KB page size) many GBs of mapped physical memory. The main issue as I see it is that the software is not optimally designed for the hardware, as the GPU tile memory (which is the fastest for the GPU cores to access) is not properly used. It doesn't matter how large the TLB is; it will still thrash the cache and go to 'slow' RAM.

And we have way less cache than main memory, so a large TLB (which just means the OS does less work mapping virtual to physical) will just thrash the SoC's cache if the GPU's memory access patterns are spread randomly over the virtual address space.
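[Editor's note: quarkysg's point about TLB reach can be made concrete with back-of-envelope arithmetic. The 8-byte entry size is an assumption for illustration; real page-table-entry formats vary, so treat the exact figure as a sketch.]

```python
TLB_BYTES = 32 * 1024 * 1024   # the claimed 32 MiB of TLB per GPU cluster
PAGE_SIZE = 16 * 1024          # 16 KiB pages on Apple silicon
ENTRY_SIZE = 8                 # assumed bytes per cached translation entry

entries = TLB_BYTES // ENTRY_SIZE    # how many translations fit
reach_bytes = entries * PAGE_SIZE    # physical memory the TLB can cover
reach_gib = reach_bytes // (1024 ** 3)
print(entries, reach_gib)  # millions of entries, tens of GiB of reach
```

Under these assumptions the TLB covers far more memory than any M1-family machine ships with, which supports the argument that capacity alone is unlikely to be the whole story.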
 
  • Like
Reactions: Irishman and jujoje

altaic

Suspended
Jan 26, 2004
712
484
Each cluster of GPU cores within an M1/M1 Pro/M1 Max/M1 Ultra chip comes with a 32MB TLB

Why is this such a big problem for M1 Ultra?
With the M1 and M1 Pro chips, there wasn't enough GPU performance to hit that 32MB TLB limit. However, the M1 Max is where you see GPU scaling fall off a cliff due to the TLB, especially the 32-core GPU model.

I don't buy the Max Tech guy's scaling theory. He stated that each GPU cluster has a 32MB TLB. Looking at the die shots, each cluster is 8 cores, which lines up with the different M1/Pro/Max configurations. If scaling works with 3 clusters (with 3 x 32MB TLBs), it should also work with 4 clusters (with 4 x 32MB TLBs).
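[Editor's note: altaic's cluster arithmetic, taken at face value. The 8-cores-per-cluster figure is their reading of the die shots, not an Apple-confirmed number.]

```python
CORES_PER_CLUSTER = 8  # per altaic's reading of the M1-family die shots

def clusters(gpu_cores):
    """Each cluster reportedly carries its own 32MB TLB,
    so per-cluster TLB capacity is constant across configs."""
    return gpu_cores // CORES_PER_CLUSTER

counts = {chip: clusters(c) for chip, c in
          [("M1", 8), ("M1 Pro", 16), ("M1 Max", 32), ("M1 Ultra", 64)]}
print(counts)  # TLB count grows in lockstep with core count
```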

Clearly something else is the bottleneck for the underperforming benchmarks. It could certainly be a TBDR vs. immediate-rendering thing, but the TLB-thrashing explanation is all hand-wavy. And the headline that Apple engineers made a critical hardware design error 5-7 years ago… clickbait nonsense.

Edit: It's a shame, because I want to believe the M2 ray-tracing "exclusive" from them, but it seems they have a deeper understanding of clickbait than of the topic they imply they've deep-dived into.
 
Last edited:
  • Like
Reactions: ader42 and Irishman

iPadified

macrumors 68020
Apr 25, 2017
2,014
2,257
So developers need to use the hardware correctly for max performance? What a surprise! Motivating developers to do this will, however, be difficult. Given the history, developers usually just wait for improved hardware.
 
  • Like
Reactions: l0stl0rd

Xiao_Xi

macrumors 68000
Oct 27, 2021
1,628
1,101
@Lone Deranger Could you open a new thread with the video and tweets, or better yet, have some moderator move the latest posts from this thread to a new thread so more people can shed more light on this situation?
 
  • Like
Reactions: Irishman

altaic

Suspended
Jan 26, 2004
712
484
@Lone Deranger Could you open a new thread with the video and tweets, or better yet, have some moderator move the latest posts from this thread to a new thread so more people can shed more light on this situation?
Why? The original source (the tweet guy) contradicted himself. It's clickbait. I'm starting to worry about you, Xiao_Xi.
 
  • Haha
Reactions: Xiao_Xi