Exactly, Apple wins on performance per watt. But at the price of substantially worse raw performance. If you want/need top-level raw performance, Apple simply can't compete. That's not going to change until they come out with a non-power/thermally-constrained desktop-only chip. But they are clearly on a 'one chip to rule them all' architecture with a primary design goal of performance per watt, so the need to run the same chip in a laptop will continue to cripple the desktops.

There's not much point in buying a Mac Pro over a Studio with the current design, unless you need very specialized PCIe hardware. That's a shame - Apple's going to miss out on a lot of the LLM/AI/ML developer market because the machines are going to be a distant second to Linux/Windows Intel systems.
If the power bill is a concern and there's a whole system that consumes less than a third of the power but comes in 2nd place, then I'd go for that. There are many businesses looking to cut costs where they can. That includes the power bills.

This is exactly like high-efficiency LED lamps replacing incandescents. Sure, it does not have as high a CRI as traditional light bulbs, but unless that is the sole concern it is "good enough" for illumination.

But then again if I was looking for a gaming PC I'd not look at the Mac Pro.

The rumored M2 Extreme would have addressed the GPU core performance concern, as it would have had 2x the performance of the M2 Ultra. Sadly the yield appears to be insufficient to even offer it.

For what it is now, the actual target market for the Mac Pro will appreciate it and buy it even if it tops out at $12k, as it is a business expense.

The previous market for it will just keep their 2019 Mac Pro and wait for the M3 Extreme's 512GB(?) unified memory by Q1 2025, or buy a Dell/HP/Lenovo.
 
There's a lot of discussion on x86 vs Arm already, but in addition to the perf-per-watt war I'm not understanding the value of the Mac Pro vs Studio featuring the same M2 Ultra setup.
  1. Where's the value in the Mac Pro unless the clock speeds were pushed higher than the Studio's version?
  2. Do power limitations exist in the Arm architecture that keep it from competing with x86?
  3. If Arm chips begin clocking the same speeds as Intel chips wouldn't Arm pull ahead with a large lead?
No, the value of the ARM Mac Pro is that you get 2x the performance of an Intel 28-core Mac Pro as a one-to-one replacement.
 
Performance per GPU shader cluster is the same (in fact, Apple might even do better since their architecture seems to be less prone to stalls, but I’m still not 100% sure about the details), it’s just that Nvidia has much more die space to put compute logic on, plus, they are more aggressive about reusing shared hardware functionality which gives them more effective compute per mm2. For example, AD102 (the die that powers 4090) is just 20% larger than M2 Max, but Nvidia can fit almost 4x as many shader clusters on it!
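As a rough back-of-the-envelope check of that density point (the die areas and unit counts below are approximate public figures, not official numbers, so treat them as assumptions):

```python
# Rough density comparison between AD102 (the RTX 4090 die) and M2 Max.
# Die areas and unit counts are approximate public estimates.
ad102_area_mm2 = 608.0    # assumed ~608 mm^2
m2_max_area_mm2 = 510.0   # assumed ~510 mm^2
ad102_sms = 144           # shader clusters (SMs) on the full AD102 die
m2_max_gpu_cores = 38     # GPU cores on a full M2 Max

print(f"AD102 area vs M2 Max: {ad102_area_mm2 / m2_max_area_mm2:.2f}x")
print(f"Shader clusters: {ad102_sms / m2_max_gpu_cores:.1f}x as many on AD102")
```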
It isn't surprising that Nvidia has that many shaders in a surface area 20% larger than an M2 Max... it is a dedicated dGPU. 😊
 
The performance trajectory of Mac GPU cores will surpass that of flagship RTX GPUs before 2030.

That's a bold statement. I don't think there is any evidence to suggest that Apple GPU performance will continue growing like it has, nor is there evidence to suggest that Nvidia will slow down. At the end of the day, GPUs have quite a lot in common and operate using very similar principles. IPC for current GPUs is pretty much identical, for example, as is compute execution width. Apple did very well in catching up to big GPUs in speed and capability, but the SoC approach will always have the density problem. It's much more costly for Apple to put as much compute on a die than it is for Nvidia, since Nvidia doesn't have to share the die with other processors.

RTX 4090 draws 450W + power spikes. Mac Studio M2 Ultra non-binned has a CPU max power draw of 215W for the whole system.

That's mostly because the 4090 is clocked ridiculously high and uses very power-inefficient memory to cut costs. If you look at much more reasonably clocked products, like the SFF workstation series, you'll see that Nvidia's energy efficiency is very close to Apple's (e.g. the M2 Max delivers 13 TFLOPs at 45W with 38 shader clusters, the 4000 SFF delivers 19 TFLOPs at 70W with 48 shader clusters, and the Nvidia GPU is clocked marginally higher).
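For what it's worth, here's that comparison as simple TFLOPs-per-watt arithmetic, using the (approximate, vendor-stated) figures above:

```python
# Perf-per-watt comparison using the figures quoted above (approximate).
m2_max_tflops, m2_max_watts = 13.0, 45.0              # M2 Max GPU, 38 clusters
rtx_4000_sff_tflops, rtx_4000_sff_watts = 19.0, 70.0  # RTX 4000 SFF Ada, 48 clusters

print(f"M2 Max:       {m2_max_tflops / m2_max_watts:.2f} TFLOPs per watt")
print(f"RTX 4000 SFF: {rtx_4000_sff_tflops / rtx_4000_sff_watts:.2f} TFLOPs per watt")
```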
 
It isn't surprising that Nvidia has that many shaders in a surface area 20% larger than an M2 Max... it is a dedicated dGPU. 😊

Exactly. And that's a big advantage for them. It's much more expensive for Apple to make big GPUs using the SoC approach.
 
There's a lot of discussion on x86 vs Arm already, but in addition to the perf-per-watt war I'm not understanding the value of the Mac Pro vs Studio featuring the same M2 Ultra setup.
  1. Where's the value in the Mac Pro unless the clock speeds were pushed higher than the Studio's version?
  2. Do power limitations exist in the Arm architecture that keep it from competing with x86?
  3. If Arm chips begin clocking the same speeds as Intel chips wouldn't Arm pull ahead with a large lead?
Higher clock speed is a pseudo expression of power. Intel figured this out in the early 2000s. Pushing x86 beyond 4.2 GHz provided no real gains over just using a lower clock.

Then Apple's Jon Rubinstein and Bertrand Serlet shocked the world with real findings about RISC architecture versus the CISC in the x86 platforms. Apple transitioned to IBM PPC, and the world watched in shock as lower clocked IBM PPC RISC chips competed head to head with ease over Intel higher clocked CISC chips.

The problem was in the Northbridge, which Intel phased out in the early Core Duo line.

But with ARM, it is a SoC, with RAM/VRAM right on top of CPU/GPU. No bridges, south or north. It's all unified architecture, learned from the Cell and Xbox 360 processors.

In a unified architecture, clock speed provides no real speed or performance boost in PPW. Sure, a 2 hr process may be reduced to 1 hr, but that is at the cost of overheating or overworking the machine.

What Intel is now learning is that some people prefer a machine that lasts longer but does the job at a slightly lower clock speed because of the battery savings, energy efficiency, and longevity improvements.

This is also why a base MBA performs almost as well as a $6000 Intel Xeon Mac Pro. At a lower clock speed, with hardware/software optimization, Apple can effectively speed up the entire clock cycle without having to increase clock speed. Instead of 1 clock cycle at 4.2 GHz, the ARM can do it in 3 clock cycles at 2.9 GHz, but in the same amount of time.

By parsing data and removing bottlenecks in CPU/GPU/RAM, ARM effectively reduces the time needed for a clock cycle, so no need to increase the clock frequency. You keep heat low, and you accomplish the same task in roughly the same amount of time.

For a 2 hr process on an Intel Xeon, people were already expecting to wait 2 hours. On an M2 MBA, that process takes 2 hr 15 min. That is a bit longer, but not much longer, and on SERIOUSLY less expensive hardware. It essentially nukes the Xeon line as a feasible consumer product other than for hobbyists or as professional rack servers. Why spend $1200 just for the Xeon when you can get an ultra-portable full laptop?

Clock speed is not a sign of power. It is if you compare CISC Intel or other x86 lines. But Apple Silicon is an ARM SoC Unified Architecture light years ahead.
 
Higher clock speed is a pseudo expression of power. Intel figured this out in the early 2000s. Pushing x86 beyond 4.2 GHz provided no real gains over just using a lower clock. …
Just to add... the concept of clock speed as a measure of CPU performance was debunked by AMD as early as the 1990s.

This is also why Apple does not market their iPhone or Mac chips that way. It does not give buyers any useful sense of device performance.
 
Higher clock speed is a pseudo expression of power. Intel figured this out in the early 2000s. Pushing x86 beyond 4.2 GHz provided no real gains over just using a lower clock. …

Nothing you wrote here even remotely makes sense.
 
Sure performance is not only based on clock rate, but anyone saying higher clock rate doesn’t translate to better performance has no idea what they’re talking about. If there was no benefit to increasing clock rate then we would be using CPUs in the MHz (or lower) range for everything, but we aren’t. If we could reliably run 10GHz base clock CPUs, I guarantee you we would have them. The reason why Apple doesn’t increase frequency in their processors is simply because they can’t, and won’t be able to unless they make the SoC smaller (not impossible since TSMC somehow manages to consistently decrease transistor size), build separate transmission lines to the main CPU separate from everything else in the SoC, or perform borderline black magic (not unlikely either). The issue here (in Apple’s case) is not heat generation due to high frequencies.
 
So I've read everything and I'm still not understanding how clock speeds would not increase perf. I understand that operations per cycle are optimized by software, but it doesn't make sense that more of them wouldn't be faster.

Maybe I'm missing something obvious.

Very interesting what @leman mentioned about `AD102 being only 20% larger than the M2 Max`. Makes me think that an SoC arch could never beat any dedicated GPUs and the M3 certainly won't compete with the 50 series.
 
So I've read everything and I'm still not understanding how clock speeds would not increase perf. I understand that operations per cycle are optimized by software, but it doesn't make sense that more of them wouldn't be faster.

Maybe I'm missing something obvious.

Very interesting what @leman mentioned about `AD102 being only 20% larger than the M2 Max`. Makes me think that an SoC arch could never beat any dedicated GPUs and the M3 certainly won't compete with the 50 series.
I found the keyword: AMD vs the Megahertz Myth

- https://www.theguardian.com/technology/2002/feb/28/onlinesupplement3
- https://www.zdnet.com/article/amd-takes-on-myth-of-megahertz/
- https://rationalwiki.org/wiki/Megahertz_myth
- https://forums.macrumors.com/threads/amd-battles-megahertz-myth.327/
- https://en.wikipedia.org/wiki/Megahertz_myth

For this decade it should be called the Gigahertz Myth.

What AMD/Intel/Apple now push is core counts. How many cores does your Ryzen 3/5/7/9 or Core i3/5/7/9 or M/Pro/Max/Ultra have?

 
Just because video was not mentioned at WWDC does NOT mean video is permanently off the table. There's more than enough bandwidth across the PCIe bus in the Mac Pro to easily handle the bandwidth of a high-performance video card. The other consideration is that the majority of PC motherboards (that aren't built by OEMs like HP, Lenovo, or Dell) have at most two PCIe x16-capable and two x1 slots, whereas the Mac Pro has FOUR PCIe x8 Gen 4 slots and two PCIe x16 Gen 4 slots, in addition to the PCIe x4 Gen 3 slot used for the I/O card. The Mac Pro also has two SATA ports above the PCIe slots, along with one USB port and some other port I do not recognize, so there are multiple expansion options.

For heavy video production, users could add video capture cards, SSD cards (ASUS has one that can hold up to 21 SSDs on an x16 card) or anything else they feel would be needed.
GPUs are extremely unlikely, and they've said so. The PCIe slots also all share 16 PCIe Gen 4 lanes, and everything runs over PCIe switches. So while there are more slots than on a typical consumer ATX board, there are fewer lanes available and bottlenecking is more likely.

And for $7000 you've entered PC workstation territory where you can get something like an Intel Sapphire Rapids system with 112 PCIe Gen 5 lanes.
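As a rough sketch of what those shared lanes mean in practice (using the nominal ~2 GB/s per Gen 4 lane figure, before protocol overhead, and the 16 shared lanes mentioned above):

```python
# Rough shared-bandwidth sketch: all the Mac Pro slots sit behind switches
# sharing 16 PCIe Gen 4 lanes upstream (per the post above).
gen4_gb_per_s_per_lane = 2.0   # ~2 GB/s per lane, per direction, nominal
shared_lanes = 16

total_gb_per_s = gen4_gb_per_s_per_lane * shared_lanes
print(f"Shared upstream bandwidth: ~{total_gb_per_s:.0f} GB/s each way")
print(f"Two busy x16 cards would each get at most ~{total_gb_per_s / 2:.0f} GB/s")
```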
 
It's not that clock speed doesn't matter, it's that comparing ARM to PPC to x86 clock speeds is comparing apples to oranges to watermelons. An A14 chip running at 2 GHz is obviously slower than an A14 chip running at 3.2 GHz, but when you try to compare across platforms, it's meaningless. The architectures are too different, and what a given chip does in one cycle varies too much across platforms to make a direct comparison. The only way to really measure is to do practical testing: running Photoshop macros or After Effects renders or Blender projects.
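A minimal sketch of what that kind of practical testing boils down to, just wall-clock timing the same job on each machine (the workload function here is a hypothetical stand-in, not any particular app's workflow):

```python
import time

def run_workload():
    # Hypothetical stand-in; replace with the render/export/script you actually care about.
    sum(i * i for i in range(10_000_000))

start = time.perf_counter()
run_workload()
print(f"Workload took {time.perf_counter() - start:.2f} s on this machine")
```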
 
So I've read everything and I'm still not understanding how clock speeds would not increase perf. I understand that operations per cycle are optimized by software, but it doesn't make sense that more of them wouldn't be faster.

Of course clock speeds increase performance. But it’s about what happens per clock, not just the clock itself. You can have a CPU running at 10 GHz but needing 10 cycles to execute an instruction, and you can have a CPU running at 1GHz but executing 10 instructions per cycle.

I already gave you an example before. Apple CPUs have twice as many FP units as x86 CPUs. This is why a 3.2GHz Apple chip ends up faster in scalar FP code than a 5.5 GHz Intel chip.
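The arithmetic behind that, for anyone following along (the widths and clocks below are the illustrative values from the example above, not measured figures):

```python
# Throughput = IPC x clock. Values are the illustrative ones from the example above.
def instructions_per_second(ipc: float, clock_ghz: float) -> float:
    return ipc * clock_ghz * 1e9

narrow_fast = instructions_per_second(ipc=0.1, clock_ghz=10.0)  # 10 cycles per instruction
wide_slow = instructions_per_second(ipc=10.0, clock_ghz=1.0)    # 10 instructions per cycle

print(f"10 GHz, 1 instr per 10 cycles: {narrow_fast:.1e} instr/s")
print(f" 1 GHz, 10 instr per cycle:    {wide_slow:.1e} instr/s")
```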

Very interesting what @leman mentioned about `AD102 being only 20% larger than the M2 Max`. Makes me think that an SoC arch could never beat any dedicated GPUs and the M3 certainly won't compete with the 50 series.

I never said that. It's just that an SoC with the same amount of compute units is going to be more expensive. This is obviously a problem for Apple, but also an opportunity - they care much less about wafer prices than Nvidia, simply because they don't sell the chips. The M2 Max, for example, costs more to manufacture than a 4090, but Apple's margins are higher. And there are a lot of things Apple can do to improve their logic density, e.g. using 3D stacking (they've been working on relevant tech for years). Sure, Nvidia can do that too, but again, expenses.

And dGPUs have their own issues as well. Yes, it's easier to pack more compute into a dedicated die, but data interfacing and RAM capacity are a problem. Nvidia's solution to that is Grace/Hopper, but that's a million-dollar system.
 
Wow, if PCIe expansion (sans video) is the main benefit then the value is not there for the Mac Pro at all.
If you need expansion you can get that for hundreds of dollars on a PC motherboard, not thousands in a Mac Pro.
These facts even signal that there won't be value in its future either.

It really seems like the death of a product line to me, unless the Extreme chips are coming soon to renew it.
Now go put macOS on your PC motherboard.
Apple no longer sells Intel Macs, so the clock on Hackintosh is ticking down to when macOS drops Intel support.
Apple supported PPC for 3 years after the switch to Intel, and while we don't know the timescale for dropping support, it won't be forever.
Plus, do you want to be tinkering or simply using your system?
I say this as a person who ran a golden-build tonymacx86 Hackintosh, and you still end up spending time fixing rather than using.
Plus you're only delaying the inevitable for when Intel support is dropped.

The nearest you can get to adding PCIe slots to an ASi Mac is with TB chassis.
Two chassis with 1x x16 and 2x x8 will set you back about $2.2k, and then those slots sit at the end of a TB bus with extra latency compared with the internal slots.
Also two extra power sockets and extra cables, so the visual aspect of it comes into play.
They are also PCIe v3, and if people are complaining that the internal slots are not v5, then yes, being v3 MUST be an issue for those people.

Plus that big aluminium chassis isn't cheap to make.

That $3k suddenly isn't such bad value when you add all that up, IF you need the PCIe slots.
 
Apple can do four independent floating-point operations per cycle, x86 processors can do only two.
Are floating point operations limited by the number of schedulers or by the number of execution units?
Zen 4 is a 64-bit superscalar, out-of-order, 2-way SMT microarchitecture with advanced dynamic branch prediction, 4-way decoding of x86 instructions with a stack optimizer, multiple caches including an Op cache for decoded instructions and prefetchers for code and data, four integer/address and two floating point instruction schedulers, 3-way address generation, 5-way integer execution, 4-way 256-bit wide floating point execution, a speculative, out-of-order load/store unit capable of up to three loads or two stores per cycle with a 48/88-entry load and 64-entry store queue, write-combining, and 5-level paging with four TLBs and six hardware page table walkers.
 
Are floating point operations limited by the number of schedulers or by the number of execution units?


From what I understand, Zen 4 execution units have asymmetric capabilities. You have two MUL units (responsible for multiplication) and two ADD units (responsible for addition). So it can do two multiplications AND two additions simultaneously, but not four additions or four multiplications, for example, or say, three additions and one multiplication. Apple's units are more flexible and can do any mix of four basic operations (including full FMAs!) in one cycle.
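To put numbers on that flexibility, here's a rough per-cycle sketch based purely on the unit counts described above (counting an FMA as two FLOPs; treat the counts as this discussion's assumptions, not verified specs):

```python
# Per-cycle FP issue, using the unit counts from the post above (not verified figures).
zen4_units = {"add": 2, "mul": 2}   # asymmetric pipes: only this exact mix sustains 4/cycle
apple_fp_units = 4                  # flexible units: any mix of 4 ops, each FMA-capable

zen4_peak_ops = sum(zen4_units.values())   # 4, but only as 2 adds + 2 muls
apple_peak_ops = apple_fp_units            # 4, in any combination
apple_peak_flops = apple_fp_units * 2      # 8 FLOPs/cycle if all four issue FMAs

print(f"Zen 4: up to {zen4_peak_ops} FP ops/cycle, only as 2 adds + 2 muls")
print(f"Apple: up to {apple_peak_ops} FP ops/cycle in any mix, "
      f"{apple_peak_flops} FLOPs/cycle when all are FMAs")
```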
 
From what I understand, Zen 4 execution units have asymmetric capabilities. You have two MUL units (responsible for multiplication) and two ADD units (responsible for addition). So it can do two multiplications AND two additions simultaneously, but not four additions or four multiplications, for example, or say, three additions and one multiplication.
You're right.
[Image: AMD Zen 4 microarchitecture overview]


By contrast:
[Image: Arm Client Tech Days CPU presentation slide]


Apple's units are more flexible and can do any mix of four basic operations (including full FMAs!) in one cycle.
Do you have any proof of that? Would the Mx be the only CPU capable of 4 FMAs at the same time?
 
I think Cortex X1 and later also have 4x 128-bit FMA units
Doesn't the Cortex-X4 graph show that it has two MAC units?

Some companies use the CoreMark/MHz ratio to compare microcontrollers. Could it be useful to use a similar ratio, perhaps with GB6 scores to compare CPUs?
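Something like this, maybe (the scores and clocks below are made-up placeholder numbers, purely to show the calculation):

```python
# Hypothetical "score per GHz" ratio, analogous to CoreMark/MHz.
# Scores and clocks below are placeholder numbers, not real results.
cpus = {
    "CPU A": {"gb6_single": 2900, "clock_ghz": 3.7},
    "CPU B": {"gb6_single": 3100, "clock_ghz": 5.8},
}

for name, d in cpus.items():
    print(f"{name}: {d['gb6_single'] / d['clock_ghz']:.0f} GB6 points per GHz")
```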
 
Doesn't the Cortex-X4 graph show that it has two MAC units?

You’re right! Seems that Apple is ahead here.

Some companies use the CoreMark/MHz ratio to compare microcontrollers. Could it be useful to use a similar ratio, perhaps with GB6 scores to compare CPUs?

Useful for what? What is it that you want to compare? The efficiency of hardware implementation? IPC?
 
Just because video was not mentioned at WWDC does NOT mean video is permanently off the table.
Maybe not at ONE WWDC. But, in every WWDC since Apple Silicon, Apple has been communicating the same message to their developers. The GPU in Apple Silicon systems is integrated. Anything they said at the last WWDC was simply re-iterating what they’ve been saying for years.

Apple will be moving to whatever the next technology is, and folks will still be saying, “Any day now, they’re going to announce external GPU support for Apple Silicon!”
 
Maybe not at ONE WWDC. But, in every WWDC since Apple Silicon, Apple has been communicating the same message to their developers. The GPU in Apple Silicon systems is integrated. Anything they said at the last WWDC was simply re-iterating what they’ve been saying for years.

Apple will be moving to whatever the next technology is, and folks will still be saying, “Any day now, they’re going to announce external GPU support for Apple Silicon!”
Could happen if Mac gaming grew enough to support a $3k eGPU.

But by 2030 I could see a top-end Mac SoC having top-end RTX performance.
 
Maybe not at ONE WWDC. But, in every WWDC since Apple Silicon, Apple has been communicating the same message to their developers. The GPU in Apple Silicon systems is integrated. Anything they said at the last WWDC was simply re-iterating what they’ve been saying for years.

Apple will be moving to whatever the next technology is, and folks will still be saying, “Any day now, they’re going to announce external GPU support for Apple Silicon!”

Even in the Intel era, the only Mac with the capacity for upgrading the GPU was the Mac Pro. Prior to WWDC 2023 there were no Apple Silicon based Macs which even had internal expansion slots, so of course everything was integrated. All I'm saying is that the opportunity is there, not that there WILL be a discrete GPU solution.
 