I fully agree. I am a bit pessimistic about Apple catering for third-party GPUs. They got literally burned by AMD with its very hot-running GPUs in iMacs and laptops. However, the Mac Pro is a very small market segment, and a dedicated Mac Pro chip that lets users choose CPU and GPU core counts independently of each other seems to be economic stupidity. This is a difficult equation to solve, which makes it the most interesting Mac to observe.
Oh yes, I remember the hot AMD GPUs in the 6,1 Mac Pro.
 
I fully agree. I am a bit pessimistic about Apple catering for third-party GPUs. They got literally burned by AMD with its very hot-running GPUs in iMacs and laptops. However, the Mac Pro is a very small market segment, and a dedicated Mac Pro chip that lets users choose CPU and GPU core counts independently of each other seems to be economic stupidity. This is a difficult equation to solve, which makes it the most interesting Mac to observe.
To be fair, what burned Apple was designing inadequate cooling solutions - the GPU is going to run the temperature it's going to run. Blaming it is like choosing to jump off a cliff and blaming gravity for what follows.

But you're right about the difficulty of designing an entire architecture to supply a niche of the business. It's almost as if they should build a lower-cost machine that can still fit a discrete, slot-based GPU to widen its appeal and amortise the costs, or else stick with Xeons.

Better yet, they could get out of the workstation business entirely, and licence macOS to HP for $1000/seat.
 
the GPU is going to run the temperature it's going to run.
idk, AMD and Nvidia have had defective GPUs before. Apple GPUs are efficient. The idle power draw of the GPU in the M1 Max is measured in milliwatts.

Apple has to design proper desktop GPUs. Right now the Apple GPU is too mobile-oriented.
 
Better yet, they could get out of the workstation business entirely, and licence macOS to HP for $1000/seat.
If they were going to quit the workstation market, the 2019 Mac Pro would never have been released.

Let's just see how the new Mac Pro specs are.
 
Apple should have gone with AMD Threadripper CPUs combined with RDNA2 or the upcoming RDNA3; then you would have a powerful workstation. Showing off the Apple M1 or the upcoming M2 in cherry-picked scenarios is Apple doing what they do best, B/S, but worst of all is being non-upgradable. Apple Tax at its worst.

The M1 and M2 might be good enough for laptops and portable devices, where the machines look cool but run hot because of Apple's insistence on poor cooling solutions to fit in their ultra-slim packages. But for true workstation machines, dropping Intel will be a mistake.
 
devices, where the machines look cool but run hot because of Apple's insistence on poor cooling solutions to fit in their ultra-slim packages
That was true for the 2016-era MacBook Pro, but the latest 2021 MBPs are much thicker to allow for better cooling. Both the 14" and 16" are function over form when it comes to cooling.

Apple Silicon MacBooks do not run hot like the Intel MacBooks under normal or even heavy loads. It's only under extreme loads that these M1 MacBooks get hot, but that is true for any laptop, even those with liquid metal and heavy copper cooling like gaming laptops.

As also seen with the overkill cooling on the Mac Studio, the obsession with thinness has ended for Apple.
But true workstation machines if they drop intel will be a mistake.
I don't care if it's Intel or M1, but Apple needs to keep PCIe slots.
 
Apple should have gone with AMD Threadripper CPUs combined with RDNA2 or the upcoming RDNA3; then you would have a powerful workstation. Showing off the Apple M1 or the upcoming M2 in cherry-picked scenarios is Apple doing what they do best, B/S, but worst of all is being non-upgradable. Apple Tax at its worst.

The M1 and M2 might be good enough for laptops and portable devices, where the machines look cool but run hot because of Apple's insistence on poor cooling solutions to fit in their ultra-slim packages. But for true workstation machines, dropping Intel will be a mistake.
This can only come from someone who's probably never tried to work with a Mac Studio or even a fully specced M1 MacBook Pro.
The performance of these machines is insane, and I doubt that a Threadripper with RDNA 2 can beat these in the majority of tasks that most Apple users do.
Just take the frustrating example of H.265 10-bit decoding, which is simply not possible in any real-time fashion on anything but the latest M1 Macs.
 
Just take the frustrating example of h265 10 bit decoding, which is simply not possible in any real time fashion on anything but the latest m1 macs..
Only on the 4:2:2 Sampling structure where Quicksync 11 and 12 work properly. 4:2:0 can be done in RDNA2 and Nvidia RTX and 4:4:4 can be done by RTX series. And Quicksync 11 and 12 can decode all HEVC from 8 to 12 bit.
My point is not only M1 Mac can decode those.
 
Only on the 4:2:2 Sampling structure where Quicksync 11 and 12 work properly. 4:2:0 can be done in RDNA2 and Nvidia RTX and 4:4:4 can be done by RTX series. And Quicksync 11 and 12 can decode all HEVC from 8 to 12 bit.
My point is not only M1 Mac can decode those.
Well. Is there any way to get quicksync, or an nvidia RTX in any modern mac? ;)
My point is, it sucks that there is no way to upgrade these features to the "user upgradable mac pro".
 
To be fair, what burned Apple was designing inadequate cooling solutions - the GPU is going to run the temperature it's going to run. Blaming it is like choosing to jump off a cliff and blaming gravity for what follows.

But you're right about the difficulty of designing an entire architecture to supply a niche of the business. It's almost as if they should build a lower-cost machine that can still fit a discrete, slot-based GPU to widen its appeal and amortise the costs, or else stick with Xeons.

Better yet, they could get out of the workstation business entirely, and licence macOS to HP for $1000/seat.
Intel, AMD and NVIDIA showed no drive to lower power consumption and expected others to solve their inefficiencies with bulky and noisy cooling solutions. On a Mac Pro that can be solved, but on all other computers it is/was a problem. No wonder Apple switched architecture.
 
Intel, AMD and NVIDIA showed no drive to lower power consumption and expected others to solve their inefficiencies with bulky and noisy cooling solutions. On a Mac Pro that can be solved, but on all other computers it is/was a problem. No wonder Apple switched architecture.
That's not all true. A water-cooled system can be very quiet, and the by-product heat can also warm a cold room. Otherwise you run an M-series chip and a heater as well, so which is more efficient then? The exception is a very hot climate, where even the M chip could get warm, but a water-cooled system could stay close to ambient if installed correctly.

Let's face it, AMD and Nvidia have spent many years getting graphics performance to where it is now. The new Mac Pro needs to be upgradable, simple as that, even if it's with Apple's own silicon-based graphics cards. Apple has tightened its grip on its ecosystem with soldered-in parts like CPUs, RAM and SSDs, and this path would be a killer for the new Mac Pro.

For instance, even today with soldered-in RAM and SSDs, if one fails after your warranty runs out, do you think Apple will care? Of course they won't. They will instead sell you a whole new board, with new soldered-in SSD, RAM and CPU, at an outrageous price, although it might be only one stick of memory or just the SSD that failed.

So before we start praising Apple as energy efficient and green, consider what a wasteful situation it is to replace the whole thing when only one part has failed. This is where the new Mac Pro must not end up. If it does, people with a brain and without deep pockets might look elsewhere for workstations, and perhaps Apple's big brother effect might backfire on them.

Hell, I would love that! Common-sense hardware with a clear upgrade path at last. :p
Licensing macOS to HP or any other workstation vendor might be a good idea, but Apple will never do it lol
 
Well. Is there any way to get quicksync, or an nvidia RTX in any modern mac? ;)

Possible? Yeah, on the Intel front. (On the Nvidia front ... that 'bridge' has been blown up, at least on macOS.)

".. The full-height card is the more powerful of the two with 32 compute units, but there is another low-profile model with two 8 compute unit GPUs onboard. ..."
These are the mobile Arc A370M and A350M (about 150 and 75 (2* 35) W respectively )

Intel is eyeball deep in driver support 'drama'. So they need 'yet another driver port' added to their software workload like they need another hole in the head. Apple probably isn't contributing to a solution here either.

So possible, just not very probable. There are no DisplayPort or HDMI ports on these cards. Stripped of GUI workload and virtual desktop workload, there is a tractable amount of work to do if both parties wanted to do something (even more so on the macOS-on-Intel side, where there is already Quicksync work in place).

"Quicksync" without the CPU package is possible. If Intel prices these like an Afterburner card they'll have problems. But if that low-end card is 1/2 or 1/3 the Afterburner price, there could be some utility there for non-server use cases that do tons of encode-decode at a single-user workstation.


My point is, it sucks that there is no way to upgrade these features to the "user upgradable mac pro".

If Apple provided some slots but no AUX power connectors, you could still use that 75W card.
 
I fully agree. I am a bit pessimistic about Apple catering for third-party GPUs. They got literally burned by AMD with its very hot-running GPUs in iMacs and laptops.

The biggest problem I see with AMD is driver stability. The Apple Silicon drivers aren't perfect, but they seem a lot more stable than the AMD drivers. As others have noted, Apple is still having to roll fixes into new versions of macOS despite selling only one machine with AMD GPUs.

AMD has generally been a good partner for Apple. But it's tough to see them continuing to put up with the overhead of dealing with AMD for limited cases.

Ideally they'd just open up third party GPU development and get out of distribution. That way AMD could do their thing without causing too much churn at Apple. But I don't think they will.
 
Intel, AMD and NVIDIA showed no drive to lower power consumption and expected others to solve their inefficiencies with bulky and noisy cooling solutions. On a Mac Pro that can be solved, but on all other computers it is/was a problem. No wonder Apple switched architecture.

This gives Apple more credit than it deserves, and unfairly discredits the other players in the industry.

Take the first M1 GPU. It was released in 2020 and is manufactured on TSMC 5nm. Its performance is equivalent to the AMD Radeon RX 560. The latter was released in 2017 and manufactured on GloFo 14nm. The GloFo 14nm process is roughly equivalent to TSMC 16/12nm from a similar timeframe. If you re-manufactured the Radeon RX 560 on TSMC 5nm, you would probably get similar power efficiency to the M1 GPU.

Now look at the M1 Ultra GPU, also manufactured on TSMC 5nm. Its performance is equivalent to the AMD Radeon Pro Vega VII (TSMC 7nm). How do their max power consumptions fare? According to anandtech's test [0] on the M1 Max GPU, under the most GPU-intensive task they could push (Aztec High Off), the M1 Max GPU reports about 46W. Let's assume for the moment that the M1 Ultra GPU would report 90W under the same test ('cos I can't find anyone who has done the same test). According to this fella [1], the maximum GPU power of the "Radeon Pro VII" is "way below 200W", so let's just put it at 150W. Now if you re-manufacture the Radeon Pro Vega VII on TSMC 5nm, which brings a 30% power saving at the same performance level, max GPU power is trimmed down to 105W (!!). That's comparable to the M1 Ultra GPU.
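
For anyone who wants to poke at the numbers, here is the back-of-envelope node-shrink estimate above as a small Python sketch. The function name `scale_power` is made up for illustration, and every input is a guess from this post (150W for the Radeon, 90W for the M1 Ultra, a 30% saving from 7nm to 5nm), not a measurement.

```python
def scale_power(power_watts: float, node_saving: float) -> float:
    """Estimate power at the same performance after a process-node shrink.

    node_saving is the fractional power reduction (0.30 means 30% less power).
    """
    if not 0.0 <= node_saving < 1.0:
        raise ValueError("node_saving must be in [0, 1)")
    return power_watts * (1.0 - node_saving)


# Figures taken from the post; all of them are the poster's guesses.
radeon_pro_vii_7nm = 150.0  # W, "way below 200W", put at 150 W
m1_ultra_assumed = 90.0     # W, assumed to be roughly twice the M1 Max's 46 W
n7_to_n5_saving = 0.30      # assumed 30% power saving at the same performance

radeon_on_5nm = scale_power(radeon_pro_vii_7nm, n7_to_n5_saving)
print(f"Radeon Pro VII re-made on 5nm: {radeon_on_5nm:.0f} W")
print(f"Ratio to assumed M1 Ultra GPU: {radeon_on_5nm / m1_ultra_assumed:.2f}x")
```

Changing any of the three inputs moves the result a lot, which is worth keeping in mind given how rough they are.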

So Apple GPUs seem to me not more power efficient than PC GPUs if vendors decide to stay with Apple performance level AND have access to Apple's manufacturing process node.

CPU and GPU microarchitectures are quite different species. Apple also doesn't seem to have much edge in its GPU microarchitecture. And unlike CPU microarchitecture, vendors could change their GPU microarchitecture and ISAs to suit their design goals.

[0] anandtech's test on M1 Max GPU
[1] one fella's test on a "Radeon Pro Vega VII"
 
So Apple GPUs seem to me not more power efficient than PC GPUs if vendors decide to stay with Apple performance level AND have access to Apple's manufacturing process node.

Power efficiency on Apple GPUs is a complicated topic. I tend to agree - GPU core to GPU core, they're probably not that much more efficient.

But there are other ways to shake out design efficiencies. Apple Silicon GPUs don't have VRAM. VRAM can be a significant power cost. They typically also have designs that induce much lower memory bandwidth. Another power savings.

People try to put Apple Silicon head to head with traditional GPU designs and that's really tough because Apple Silicon is not a traditional GPU design. For better or worse. And there are certainly things it's worse at. But it can be much more workflow dependent.
 
But there are other ways to shake out design efficiencies. Apple Silicon GPUs don't have VRAM. VRAM can be a significant power cost. They typically also have designs that induce much lower memory bandwidth. Another power savings.

Glad to see you pointed this out. I drilled down the rabbit hole a bit. Surprise. Turns out VRAM to VRAM, Apple GPUs aren't much more power efficient.

Apple GPUs still have "VRAM"; it's just that the CPU and GPU share the same system RAM. Apple juices up the system RAM's bandwidth a lot compared to a conventional PC. So it's much better system RAM and still comparably good VRAM (when compared to higher-end PC GPUs). Also note that DRAM energy cost mostly comes from transfers, measured in energy per bit transferred.

Different DRAM types have varying power efficiencies. Common technologies in ascending power efficiency: GDDR5 < LPDDR4X < GDDR6 < LPDDR5. LPDDR5 is about 20% more power efficient than LPDDR4X. GDDR6 is somewhat in between and closer towards LPDDR5. Also, typically DRAM consumes only about 10-15% of a GPU board's total power, e.g. one such estimate [0].

As an aside, the Radeon RX 560 comes with GDDR5, and you know what.. the M1 comes with LPDDR4X. RDNA & RDNA2 come with GDDR6...and the M1 Pro/Max/Ultra come with LPDDR5. Just a pure coincidence, people would think?

[0] some quick google result
 
The M1 and M2 might be good enough for laptops and portable devices
Not "might" but "are". The M series was designed to be used in iPads too, and they are fanless. Intel's i3/i5/i7 chips were never meant to be fanless because they get too hot.
 
Turns out VRAM to VRAM, Apple GPUs aren't much more power efficient.

I wouldn't be so sure about that. Sure, LPDDR5 is more efficient than GDDR6, but Apple's GPU is still more efficient not because it uses LPDDR5 but because of Apple's GPU architecture itself. The size of the difference shows in how little power Apple's GPU consumes at idle compared to, say, an AMD mobile GPU like the 5300M found in the 2019 16" MacBook Pro, which uses GDDR6. I have the 2019 16" and it idles at 5-6 watts. Compare that to the post quoted below, which shows a user's 2021 MacBook Pro at a shocking 15 milliwatts. The MBP was running apps but was idle.

Here is what the user had to say:

"Here's my GPU power with a bunch of Safari tabs, teams, music, activity monitor, Firefox and an external 4k display connected - when I'm just reading as opposed to moving things around...

**** GPU usage ****



GPU active frequency: 13 MHz

GPU active residency: 3.11% (389 MHz: 2.8% 486 MHz: .08% 648 MHz: .03% 778 MHz: .22% 972 MHz: .02% 1296 MHz: 0%)

GPU requested frequency: (389 MHz: 2.7% 486 MHz: .04% 648 MHz: .05% 778 MHz: .20% 972 MHz: .10% 1296 MHz: 0%)

GPU idle residency: 96.89%

GPU Power: 15 mW

15 MILLIWATTs"


source: https://talkedabout.com/threads/waiting-for-and-or-enjoying-my-m1-pro-max-mbp-thread….2114/page-16


Looking at this, we see that it's Apple's GPU arch that is very power efficient, and no RAM type can deliver energy usage this low.
So Apple GPUs seem to me not more power efficient than PC GPUs if vendors decide to stay with Apple performance level AND have access to Apple's manufacturing process node.
Let's continue. Keep in mind the 5300M is made on the TSMC 7nm process. A move to TSMC 5nm would mean 30% less power, so let's say 3.5 watts, and microarch improvements could save, say, 2 watts more. That's 1.5 watts at idle using hypothetical analysis, and I was very generous here: 1.5 watts at idle vs 15 milliwatts at idle on an Apple M1-based GPU.
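
The same hypothetical idle estimate, written out as a quick Python sketch. The helper name `hypothetical_idle_watts` is invented for illustration; the 5 W starting point is the low end of the 5-6 W reading mentioned earlier, and the 30% node saving and 2 W microarchitecture saving are the guesses used above, not measured values.

```python
def hypothetical_idle_watts(measured_idle_w: float,
                            node_saving: float,
                            uarch_saving_w: float) -> float:
    """Apply a fractional node-shrink saving, then subtract a flat saving in watts."""
    after_node = measured_idle_w * (1.0 - node_saving)
    return max(after_node - uarch_saving_w, 0.0)  # power cannot go negative


idle_5300m_w = 5.0  # W, low end of the 5-6 W idle reading on the 2019 16"
m1_idle_w = 0.015   # W, the 15 mW figure from the quoted powermetrics output

estimate_w = hypothetical_idle_watts(idle_5300m_w, node_saving=0.30, uarch_saving_w=2.0)
ratio = estimate_w / m1_idle_w

print(f"Hypothetical 5300M idle on 5nm: {estimate_w:.1f} W")
print(f"That is about {ratio:.0f}x the 15 mW idle figure")
```

Even with these generous assumptions, the gap at idle stays at roughly two orders of magnitude.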

When talking about efficiency, it's good to talk about low power states as well.

TLDR: Apple has industry leading idle states which no type of memory can achieve but it mainly depends on how Apple made their GPU cores and arch.
 
Apple GPUs still have "VRAM" just that its CPU and GPU share the same system RAM. Apple juices up the system RAM's bandwidth a lot when compared to a conventional PC.

Not really - depending on the app and how well it's been optimized. Apple Silicon GPUs have something called Tile Memory. That means they hit system memory not nearly as often as a traditional GPU would hit VRAM. Tile Memory acts like a per tile cache that fulfills the same role a typical system cache would.

An Apple Silicon GPU would definitely hit main system memory less than a discrete GPU touches its own VRAM. It might even touch system memory less than a discrete GPU would need to swap.

Tile Memory is how Apple gets away with using much slower memory. They just avoid hitting memory so they don't pay a cost for the lower bandwidth.

It's possible a badly optimized app doesn't fit its work well into tile memory. Then it becomes exposed to Apple Silicon's poor memory bandwidth and performance will suffer. That would also decrease power efficiency by driving so much traffic through system memory.
 
TLDR: Apple has industry leading idle states which no type of memory can achieve but it mainly depends on how Apple made their GPU cores and arch.

What you said about low idle power consumption doesn't invalidate my previous analysis. We're talking about GPUs for professional workstations in this thread and, by some stretch, data center types of workloads. Hence my focus was on GPUs under load. As with any machine, an idle machine is lost revenue. For machines of this nature, efficiency under load is also where most of the electricity savings are.

Not really - depending on the app and how well it's been optimized. Apple Silicon GPUs have something called Tile Memory. That means they hit system memory not nearly as often as a traditional GPU would hit VRAM. Tile Memory acts like a per tile cache that fulfills the same role a typical system cache would.

What you've described is so-called TBDR (tile-based deferred rendering), a rendering technique made popular by PowerVR, from which Apple GPUs inherit a lot of IP. TBDR contrasts with IMR (immediate mode rendering), which is the "norm" in AMD/Nvidia GPUs.

TBDR does reduce access to main memory. So less demand on bandwidth and potentially more power efficient. Hence, popular on mobile GPUs such as those in smartphones.

IMR in textbook speak depends on immediate read/write to VRAM for all operations. But don't forget over the years AMD/Nvidia had added L1, L2, L3...caches to their GPUs to reduce dependency on VRAM access. Similar effect to what you call "tile memory" in TBDR.
 
What you've described is so-called TBDR (tile-based deferred rendering), a rendering technique made popular by PowerVR, from which Apple GPUs inherit a lot of IP. TBDR contrasts with IMR (immediate mode rendering), which is the "norm" in AMD/Nvidia GPUs.

I'm aware.

TBDR does reduce access to main memory. So less demand on bandwidth and potentially more power efficient. Hence, popular on mobile GPUs such as those in smartphones.

IMR in textbook speak depends on immediate read/write to VRAM for all operations. But don't forget over the years AMD/Nvidia had added L1, L2, L3...caches to their GPUs to reduce dependency on VRAM access. Similar effect to what you call "tile memory" in TBDR.

Still not quite the same thing. AMD and Nvidia GPUs don't implement deferred rendering (the D part of TBDR) so they write a lot more data back out to VRAM when they don't need to. When you're rendering in 3D, the Apple GPU will try to only render the topmost pixels, while an AMD or Nvidia GPU will fully render every obscured pixel and write those pixel values back to VRAM. That's inefficient from a compute perspective (you're calculating the colors of pixels that will never be presented to the user) and a memory perspective (you're writing pixel values that will never be presented to the user). It's also inefficient from a power perspective if you're doing pixel computations and stores that don't end up as part of the final render. That's something you can do with tile memory that you can't do with a cache.
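
A toy illustration of the overdraw point, not a model of any real GPU: take one pixel covered by several opaque fragments and count how many get shaded. The sketch assumes an immediate-mode pipeline that depth-tests each fragment as it arrives, and a deferred pipeline that resolves visibility for the pixel before shading anything. The function names and depth values are made up, and real immediate-mode hardware has further tricks (hierarchical depth tests, for example) that this ignores.

```python
from typing import List


def shaded_immediate(depths: List[float]) -> int:
    """Count fragments shaded when each one is depth-tested on arrival.

    Smaller depth means closer to the camera. A fragment is shaded if it is
    closer than everything drawn to this pixel so far.
    """
    nearest = float("inf")
    shaded = 0
    for depth in depths:
        if depth < nearest:  # passes the depth test, so it is shaded and stored
            nearest = depth
            shaded += 1
    return shaded


def shaded_deferred(depths: List[float]) -> int:
    """Count fragments shaded when visibility is resolved before shading.

    With opaque geometry only the single nearest fragment is ever shaded.
    """
    return 1 if depths else 0


back_to_front = [4.0, 3.0, 2.0, 1.0]  # worst submission order for the depth test
front_to_back = [1.0, 2.0, 3.0, 4.0]  # best submission order for the depth test

for label, order in (("back-to-front", back_to_front), ("front-to-back", front_to_back)):
    print(f"{label}: immediate shades {shaded_immediate(order)}, "
          f"deferred shades {shaded_deferred(order)}")
```

In this toy model the immediate-mode count depends on submission order while the deferred count does not, which is one way of seeing why the comparison ends up being workload dependent.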

Tile memory also allows a render to be completed within a compute unit entirely - without leaving the core until the render pass is done. That's still way more efficient than what AMD is doing with cache. And it even requires some amount of software optimization to take advantage of - using an API like Metal.

What we're still talking about is it's very difficult to do a direct core-to-core power comparison of Apple Silicon and something like a Radeon. Very workflow and optimization dependent.
 
Still not quite the same thing. AMD and Nvidia GPUs don't implement deferred rendering (the D part of TBDR) so they write a lot more data back out to VRAM when they don't need to.
TBDR and IMR are different approaches to achieving the same end result (rendering geometry to a framebuffer). No one has argued or would argue they're the same methodology. What we care about is how much electricity the two approaches consume to get the work done. The amount of electricity is observable at the global chip level, which you agreed with a few posts back: "GPU core to GPU core, they're probably not that much more efficient."

When you're rendering in 3D, the Apple GPU will try to only render the topmost pixels, while an AMD or Nvidia GPU will fully render every obscured pixel and write those pixel values back to VRAM. That's inefficient from a compute perspective (you're calculating the colors of pixels that will never be presented to the user) and a memory perspective (you're writing pixel values that will never be presented to the user).

AMD/Nvidia have depth buffers and occlusion detection to avoid redundant work as well. They have probably had way more optimizations baked in over years of advancement.

It's also inefficient from a power perspective if you're doing pixel computations and stores that don't end up as part of the final render. That's something you can do with tile memory that you can't do with a cache.

Let's not lose track of our initial focus. Our point of interest is that, at the global chip level, Apple GPUs and AMD/Nvidia GPUs would have similar power efficiency at the same level of performance.

Tile memory also allows a render to be completed within a compute unit entirely - without leaving the core until the render pass is done. That's still way more efficient than what AMD is doing with cache. And it even requires some amount of software optimization to take advantage of - using an API like Metal.

Caches inside AMD/Nvidia are clustered to each compute unit as well. Saving VRAM access by exploiting data locality.

Again, I would remind us to focus on the global chip level. If two GPU chips burn a similar amount of electricity and achieve the same amount of work done (level of performance), then power efficiency is about the same between the two. It doesn't help to drill down, point out different approaches and insist that one will hence be way more efficient than the other.
 
TBDR and IMR are different approaches to achieving the same end result (rendering geometry to a framebuffer). No one has argued or would argue they're the same methodology. What we care about is how much electricity the two approaches consume to get the work done. The amount of electricity is observable at the global chip level, which you agreed with a few posts back: "GPU core to GPU core, they're probably not that much more efficient."

So, again, the idea behind TBDR is that sometimes the GPU can skip work. That makes a core to core power comparison irrelevant when a TBDR chip can shave off entire chunks of work. You're trying to compare power consumption in a one dimensional way when there are two to three dimensions to think about. It's not just a pure transistor to transistor comparison. A TBDR GPU basically computes a more efficient way to render. That goes around the work chunk vs work chunk comparison you're trying to do. The GPUs won't do the same amount of work for the same render.

AMD/Nvidia have depth buffers and occlusion detection to avoid redundant work as well. They have probably had way more optimizations baked in over years of advancement.

We've both agreed that AMD and Nvidia have immediate mode GPUs. That's fundamentally different from the work a deferred GPU can optimize out. AMD and Nvidia don't make deferred mode GPUs. Deferred mode is not one of the optimizations they've made.

Again, I would remind us to focus on the global chip level. If two GPU chips burn a similar amount of electricity and achieve the same amount of work done (level of performance), then power efficiency is about the same between the two. It doesn't help to drill down, point out different approaches and insist that one will hence be way more efficient than the other.

At what workload? Is it optimized for immediate mode? TBDR? It's not a one dimensional question where you just run a single benchmark and you get an answer. That's why your methodology isn't sufficient. There's no "global level" for comparison. You're trying to make something really complicated really simple, but it's not that simple.

It's been easy to compare AMD and Nvidia GPUs directly like this because they're fundamentally the same kind of GPUs. And I think the trap everyone is falling into is trying to do the same sort of comparisons against Apple Silicon. But Apple Silicon GPUs use a very different GPU design that's going to be better at some things and worse at other things. It's not a 1:1 comparison, which has befuddled the YouTuber crowd trying to judge it like it's a gaming drag race.
 
Your arguments appear to me to be getting nowhere. I'm not the sort who's interested in arguing for the sake of arguing. So this will be my last response on this.

What sort of workload? Any type of workload applicable to compute/graphical rendering (less media encoders/decoders which are traditionally ASICs stitched into a GPU chip).

So pick a workload. Input it into an Apple GPU. Measure power consumption. Input it into an AMD/Nvidia GPU. Measure power consumption. When they're similar, their power efficiency is about the same. Their underlying microarchitectures are irrelevant in determining their power efficiency.
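
The procedure above boils down to comparing work done per joule for the same workload. A minimal sketch of that comparison follows; the `Run` class, the `similar_efficiency` helper, the 10% tolerance and all the numbers are placeholders invented for illustration, not measurements of any GPU.

```python
from dataclasses import dataclass


@dataclass
class Run:
    """One measured run of a workload on one GPU."""
    name: str
    work_units: float   # e.g. frames rendered; must be the same workload on both GPUs
    seconds: float      # wall-clock time for the run
    avg_power_w: float  # average GPU power during the run, in watts

    @property
    def energy_joules(self) -> float:
        return self.avg_power_w * self.seconds

    @property
    def work_per_joule(self) -> float:
        return self.work_units / self.energy_joules


def similar_efficiency(a: Run, b: Run, tolerance: float = 0.10) -> bool:
    """True if the two runs' work-per-joule figures differ by at most `tolerance`."""
    better = max(a.work_per_joule, b.work_per_joule)
    worse = min(a.work_per_joule, b.work_per_joule)
    return (better - worse) / better <= tolerance


# Placeholder numbers only, not measurements.
gpu_a = Run("GPU A", work_units=1000, seconds=10.0, avg_power_w=90.0)
gpu_b = Run("GPU B", work_units=1000, seconds=10.0, avg_power_w=105.0)

for run in (gpu_a, gpu_b):
    print(f"{run.name}: {run.energy_joules:.0f} J, {run.work_per_joule:.3f} work/J")
print("Within 10%:", similar_efficiency(gpu_a, gpu_b))
print("Within 20%:", similar_efficiency(gpu_a, gpu_b, tolerance=0.20))
```

Whether two results count as "similar" depends entirely on the tolerance chosen and on which workload is fed in, which is the part the two sides of this thread disagree about.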
 