It may come as a shock to many, but shipments of desktop dGPUs have been on a downward slope since as early as 2005. To me this indicates that "perfect" 4090-like performance is not selling all that well.

Use cases of decades past are now approaching niche status.

The numbers do not distinguish whether the dGPUs were used exclusively for crypto or for non-crypto use cases.

[Chart: desktop graphics card shipments]


Source: https://www.tomshardware.com/news/sales-of-desktop-graphics-cards-hit-20-year-low

Nvidia took advantage of the crypto craze to jack up their GPU pricing beyond any rational level. That's one big reason why AMD is content to keep most of its GPUs at $800 or less. It also doesn't help that they made the boneheaded decision to release the 4060ti at a price nobody wants to pay.
 
Nvidia took advantage of the crypto craze to jack up their GPU pricing beyond any rational level. That's one big reason why AMD is content to keep most of its GPUs at $800 or less. It also doesn't help that they made the boneheaded decision to release the 4060ti at a price nobody wants to pay.
I am fairly certain Nvidia accounted for that price point by limiting available supply.

But how many would have a PSU rated at 1+ kW? Not that many.
 
It may come as a shock to many, but shipments of desktop dGPUs have been on a downward slope since as early as 2005. To me this indicates that "perfect" 4090-like performance is not selling all that well.
….
The numbers do not distinguish whether the dGPUs were used exclusively for crypto or for non-crypto use cases.

[Chart: desktop graphics card shipments]


Source: https://www.tomshardware.com/news/sales-of-desktop-graphics-cards-hit-20-year-low

First, at the height of the crypto craze not all GPU cards even made it into the retail market that Peddie is counting. Trucks showed up at GPU card factories and just hauled pallets of boxes away.


Second, much of the general trend here is the overall trend of laptops (and, to a much smaller extent, SFF and all-in-ones) replacing desktops. If the number of desktops goes down, then the number of generic boxes with slots goes down. dGPUs went into more laptop sales. Laptops being dominant drove more transistor allocation to iGPUs (Intel kept CPU core counts constant and grew the iGPU budget).

The gamer high-end cards never were that high in number the whole time. The graph can't really surface their trendline. If there were a Pro-market GPU card graph you would have a shot, but this graph is relatively helpless in that respect. I think Peddie does have a graph somewhere of GPU card profits for consumer/gamer vs Pro cards. If I recall correctly, the consumer stuff is somewhat problematic at paying for itself. The Pro cards far more enabled long-term investments into better improvements over multiple generations.

There is not much of a downward trend in Pro cards. Decoupled from the server-only deployed cards, there isn't much huge growth either. The 4090, 5090, 6090 are likely on a trend of leaving the gamer crowd behind with increasing prices over time. Lots of folks are twisted up thinking Apple is using the x090 as some sort of stalking horse. I doubt they are. Nvidia is doing lots of stuff to make that uneconomical to match (and the product increasingly doesn't have to make them money… so they don't care if it stays in the ditch, as long as the rest of the business is going up to cover that up). It is a game of 'chicken' that really makes no sense for Apple (or AMD) to fanatically chase in an even more crazy fashion.
 
First, at the height of the crypto craze not all GPU cards even made it into the retail market that Peddie is counting. Trucks showed up at GPU card factories and just hauled pallets of boxes away.
So the demand for desktop dGPUs is actually lower than the chart shows?
Second, much of the general trend here is the overall trend of laptops (and, to a much smaller extent, SFF and all-in-ones) replacing desktops. If the number of desktops goes down, then the number of generic boxes with slots goes down. dGPUs went into more laptop sales. Laptops being dominant drove more transistor allocation to iGPUs (Intel kept CPU core counts constant and grew the iGPU budget).
I agree with you.

Where Apple goes... the industry follows... I look forward to Mac Studio clones sporting Xeon and Threadrippers soon.
 
This was thoroughly measured on the M2 Ultra already? The M1 had issues even as a single die. That is why the M2 got a substantively upgraded internal mesh network that delivers a major uptick in performance with no backhaul memory upgrade. And the 10,000 pads of the Ultra... I have doubts that UltraFusion really needed many adjustments for the 'new' M2 internal mesh changes. (For two dies in the M1 era that seemed to be overkill. Even the PCIe additions they made in M2... still a bit of overkill.)

Can Apple scale that 4-way? Yeah, there I'm skeptical. Can they do 3-way? That's probably enough.
So say there is a central SoC with an UltraFusion interconnect on two opposite sides; would the SoCs attached on either end of the central SoC have unused UltraFusion interconnects...?

Or maybe the three-way configuration is:

Regular SoC <UltraFusion> GPU SoC <UltraFusion> Regular SoC

The GPU SoC would be the only SoC with two UltraFusion interconnects, with the regular SoCs (Mn Max) having a single UltraFusion interconnect...?

@deconstruct60, truly curious as to how you see a 3-way SoC configuration working...?
 
@deconstruct60, truly curious as to how you see a 3-way SoC configuration working...?

Do not do 3 SoCs. Actually use UltraFusion to actually do a chiplet design. Stop pounding round pegs into square holes.


>4 Thunderbolt ports is suspect. >6 is definitely dubious. >8 is cuckoo for Cocoa Puffs; that is just really bad, fanatical (lack of) design. Slapping together parts that were thoroughly designed to be monolithic is goofy. So step one is decoupling I/O so they do not drag along stuff they do not need as they scale up (or at least try to). Same thing for the SSD controller, Secure Enclave, WiFi provisioning, etc.

[ I/O die ]
[ compute/memory cores/controllers die ]  C1
[ compute/memory cores/controllers die ]  C2
[ compute/memory cores/controllers die ]  C3
[ I/O die ]

Not a single one of those dies is an SoC. Nor do the two types need to be the same size (but both are likely smaller than an M2 Max).


They don't need to change the density of the core cluster dies at all (like Zen 4 -> Zen 4c)… it is more a different decomposition of the blocks so as not to waste die space on stuff they don't need, while allocating more die space to more UltraFusion links without mucking up the placement of the memory controllers and the required external memory packages.

Only want one compute tile in the package? That is OK. Only want one I/O die? That is OK. Only two of each? Just fine.

The two I/O dies probably don't need to transfer relatively large amounts of data to one another, so their distance apart probably would not incur much of a NUMA impact. Furthermore, the stuff on the 'other side' of the I/O dies is off-package anyway, so it was already in another NUMA access zone (much higher latencies, so it is not getting significantly slower even if it has to go off-on-off package to connect external points).

Apple has done two dies with negligible NUMA impact, so the C1-C2 and C2-C3 links are not covering new ground. Stripping some of the 'top' (relative to the current Max) die should make C1 and C3 closer, since all the C dies have effectively been made smaller. If they go to TSMC N3, then there is an even better chance of making them smaller still.

This keeps the memory controllers coupled closely to their respective GPU cores, so they don't have to make changes to Apple's "poor man's" HBM subsystem; no major changes needed there. To increase bandwidth they need a bigger package to wrap more memory packages down each side of the whole SoC on the cheaper PCB-like interposer. The C dies are smaller than a Max die, but not made dramatically smaller. They still need some substantive bulk to talk to the RAM.
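
For what it's worth, here is a tiny sketch of that hypothetical five-tile stack, just to make the NUMA point concrete. The tile names, the link list, and the hop counting are all mine and purely illustrative; nothing here is a real Apple design.

```python
from collections import deque

# Hypothetical tile stack from the layout above (not an actual Apple design):
# two I/O dies book-ending three compute/memory dies, linked die-to-die.
links = {
    "IO-top": ["C1"],
    "C1": ["IO-top", "C2"],
    "C2": ["C1", "C3"],
    "C3": ["C2", "IO-bottom"],
    "IO-bottom": ["C3"],
}

def hops(src: str, dst: str) -> int:
    """Minimum number of die-to-die (UltraFusion-style) crossings between tiles."""
    seen, queue = {src}, deque([(src, 0)])
    while queue:
        tile, dist = queue.popleft()
        if tile == dst:
            return dist
        for nxt in links[tile]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    raise ValueError("unreachable tile")

# C1<->C2 and C2<->C3 are single hops, i.e. the same situation as today's
# two-die Ultra; only C1<->C3 (2 hops) is new NUMA territory.
for pair in [("C1", "C2"), ("C2", "C3"), ("C1", "C3")]:
    print(pair, "->", hops(*pair), "hop(s)")
```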



It is similar to how Apple shared lots of overlap between the M2 Pro and M2 Max. But instead of making a bigger monolithic die (for the Max), they would just couple 'more cores' on with UltraFusion. The 'problem' isn't UltraFusion, it is the chunky die they hook it to currently.


It also has the side effect that the laptop Max die can stop wasting space on UltraFusion connection dead space that provides no value whatsoever for end users there.


If Apple refuses to decouple from the laptop Max, then I don't see them going past two dies. If they are controlling costs to minimal levels, then they may just make the Max die a bit bigger, make a "bigger" Ultra, and just cover what they can. They will just ride fab improvements until they hit the wall.

There are other far more exotic and far more expensive options. The Mac market just isn't that big.
 
So the demand for desktop dGPUs is actually lower than the chart shows?

Not really. A chunk of those cards really were not going into desktops in any normal sense of the word. I'm saying that chart never fully captured the crypto craze. Folks did lots of crazy stuff.





The fact that the chart only shows a relatively mild blip during 2019-2022 is dubious given the rampant shortages.

Folks doing crazy stuff isn’t going to show on a meter that is trying to measure the normal market.


I agree with you.


Where Apple goes... the industry follows...


Errr. One of the major reasons Apple jumped onto the x86 train was that they didn't have a viable upper-end laptop chip, while Intel had the roadmap. Apple didn't invent the aggregate PC market's shift to laptops. That is some Cupertino Kool-Aid. Apple has been a bit more nimble in following where CUSTOMERS are going (demand). They are not shaping major demand trends for the whole market.

The shift to laptops was not Apple. Apple tossed aside substantive sections of the overall PC market, but that is not shaping trends. Apple only picks a profitable subset of the market to operate in. Apple tossed away more of the desktop market than the major players did. That isn't shaping market trends. That is filling the Scrooge McDuck money pit fuller (smaller staffing focused on a fixed-size scope of products).




I look forward to Mac Studio clones sporting Xeon and Threadrippers soon.

Don't hold your breath. The MI300 isn't a clone; it uses rich man's, real HBM3.
AMD and Intel have different folks to sell to.

There is also a much bigger uptick in CXL adoption in that space.

What we may see in about two years is a big multi-die package from Intel that is a variant of the top end of the consumer CPU/GPU dies and overlaps much more with Apple's Studio and Mini Pro.
 
I am fairly certain Nvidia accounted for that price point by limiting available supply.

But how many would have a PSU rated at 1+ kW? Not that many.
I started at 1 kW with the 3090 and had to go to 1600 W for the 4090. That thing easily draws 600 W, unless I power-limit it to 70% and underclock by 30%.
I see 15-20 4090s sitting on the shelf at Micro Center; gone are the days of people lining up an hour before the store opened to buy one.
 
Every single thread ends up with GPU discussions, as if 4090 GPU compute is everything for the majority of users - it is not. Perhaps we can get back on topic.

Would it make sense for Apple to make a higher-clocked SoC for the MP/Studio as a cheaper method of increasing performance compared to a new chip? If increased clocks required redesigned chips, the chips would be very expensive and hence it would not be done.

It is actually much more interesting how much can be done in the Air and MBP form factors than at the very top end. Fingers crossed for dedicated 3D ray tracing. We will know in September at the iPhone event.
 
Just because video was not mentioned at WWDC does NOT mean video is permanently off the table.
Apple did more than merely stay quiet about external GPUs at WWDC. John Ternus, Apple's senior vice president of Hardware Engineering, said this in an interview with John Gruber:

Gruber: Are there technical barriers to having expandable graphic through PCI that would be only used for compute as opposed to video? Or is that just a design choice?

Ternus: I think, I mean, fundamentally, we've built our architecture around this shared memory model and that optimization. And so it's not entirely clear to me how you'd bring in another GPU and do so in a way that is optimized for our systems. It just hasn't been, it hasn't been a direction that we wanted to pursue.

Translation: "Never say never, but I don't see us doing this."

As others have mentioned, if they want to add GPU power, they would most likely do this by instead having modular AS CPU and GPU SoC components, such that you could add extra AS GPU chips to customize the Mac Pro for GPU-heavy needs. But this would be too expensive to do for the Mac Pro only, so it would need to be something they would want to do across the Mac line. I've no idea what the chance of this would be.

 
Apple did more than merely stay quiet about external GPUs at WWDC. John Ternus, Apple's senior vice president of Hardware Engineering, said this in an interview with John Gruber:

Gruber: Are there technical barriers to having expandable graphic through PCI that would be only used for compute as opposed to video? Or is that just a design choice?

Ternus: I think, I mean, fundamentally, we've built our architecture around this shared memory model and that optimization. And so it's not entirely clear to me how you'd bring in another GPU and do so in a way that is optimized for our systems. It just hasn't been, it hasn't been a direction that we wanted to pursue.

As others have mentioned, if they want to add GPU power, they would most likely do this by instead having modular CPU and GPU SoC components, such that you could add extra AS GPU chips to customize the Mac Pro for GPU-heavy needs. But this would be too expensive to do for the Mac Pro only, so it would need to be something they would want to do across the Mac line. I've no idea what the chance of this would be.

If Apple moved ~7.5 million Ultras annually then I can see Apple spending a bit more on GPU performance & eGPU.

But then again, desktop dGPUs are at their two-decade low. Mind you, GPU mining started in 2010, so not all of those desktop dGPUs were going to consumers or other non-mining uses.

Before anyone says that Nvidia's pricing has something to do with it, consider that lowered consumer demand has impacted economies of scale, which may necessitate Nvidia et al. jacking up prices to cover increasing per-unit costs.
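
To put the economies-of-scale point in rough numbers, here is a toy amortization sketch; the fixed-cost and per-card figures are invented placeholders, only the trend matters.

```python
# Hypothetical numbers only: spread a fixed design/mask/validation cost over
# unit volume to see how per-card cost rises as shipments fall.
FIXED_COST = 500_000_000   # assumed up-front cost for a GPU generation, in dollars
VARIABLE_COST = 250        # assumed per-card manufacturing cost, in dollars

def unit_cost(units_shipped: int) -> float:
    """Per-card cost once the fixed cost is amortized over the volume."""
    return FIXED_COST / units_shipped + VARIABLE_COST

for units in (40_000_000, 20_000_000, 10_000_000):
    print(f"{units:>11,} cards -> ${unit_cost(units):,.2f} per card")
# Each halving of volume doubles the amortized overhead per card, which is
# the pricing pressure the post describes.
```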

This is why Intel, AMD and Nvidia are focusing on A.I. chips.

[Chart: desktop graphics card shipments]
 
Gruber: Are there technical barriers to having expandable graphic through PCI that would be only used for compute as opposed to video? Or is that just a design choice?

Ternus: I think, I mean, fundamentally, we've built our architecture around this shared memory model and that optimization. And so it's not entirely clear to me how you'd bring in another GPU and do so in a way that is optimized for our systems. It just hasn't been, it hasn't been a direction that we wanted to pursue.

Seems like a finely crafted non-answer, deflecting around the actual question of possible Compute/Render add-in ASi GPGPUs...

I would bet Apple has something in the works and it is not ready yet, because it is another thing waiting on the 3nm process...

M3 Extreme = more iGPU horsepower (meets/beats RTX 4090/5090)

ASi GPGPUs = massive amount of compute/render horsepower (to drive ASi Mac Pro & ASi eGPGPU sales)
 
Every single thread ends up with GPU discussions, as if 4090 GPU compute is everything for the majority of users - it is not. Perhaps we can get back on topic.

Given that Apple pragmatically dropped Nvidia back in 2018, the highest-end consumer Nvidia card is always going to be the 'bogeyman'. Even when the Intel-based Mac Pro 2019, which still runs Windows on 'raw iron', was introduced, the x090 was the 'bogeyman' that Apple was letting get away. There is a faction that thinks that if they complain loudly enough for long enough, Apple will let Nvidia back into the macOS ecosystem. Probably delusional (Nvidia burned lots of bridges behind them on their way out and don't really care that they did), but also likely not going to give up on the delusion either. It is the convenient 'hand grenade' that the more modular-focused folks can throw.


Would it make sense for Apple to make a higher-clocked SoC for the MP/Studio as a cheaper method of increasing performance compared to a new chip? If increased clocks required redesigned chips, the chips would be very expensive and hence it would not be done.

The Mac Studio still has many of the Mini's thermal limitations (only one air inlet on the bottom of the device). It isn't an enclosure where you're going to say "let's just throw thermal limits out the window... crank the clocks as high as they'll go".

Clock speed is a substantive input into the core design requirements (how closely they adhere to the target may be an open question). The question is just how big of a delta around the targeted speed we are really talking about here. Is Apple supposed to be doing a "keeping up with the Joneses" kind of clock speed increase (an attempt at 'king of the benchmark chart')? Or is this some kind of 'elitist' feature where 2-5% faster than the other options is good enough (so the Mac Pro is not exactly the same as some other Mac product)? In other words, just generate a different number on average.

Note that clocks don't appear in any of Apple's modern 'tech spec' pages for cores. AMD/Intel CPUs are explicitly marketed on clocks because they sell such a broad variety of CPU packages. If you have 2-3 eight-core packages, then how do you sell them to different folks? In that context, clocks are a useful market segmentation tool. Those CPU vendors have three or four orders of magnitude more products to place chips into than Apple's five (or six) Mac products. How many different 10-CPU-core M2 Mini Pros does Apple sell? One. Would two or three 'help' them much (and what are the additional supply chain and inventory costs)?

Apple's silicon strategy has largely been based on doing as few SoCs as they can get away with (lots of 'hand-me-down' usage across products). They are not trying to build as large and complicated an SoC portfolio as possible, because the silicon group really only has one customer (Apple systems) to sell into.

Large clock increases while keeping the memory subsystem exactly the same aren't necessarily going to help a lot, especially if you're already heavily leaning on cache to mitigate memory access times at the 'lower' clock speeds. So there are possible impacts outside the die design also.


It is actually much more interesting how much can be done in the Air and MBP form factors than at the very top end. Fingers crossed for dedicated 3D ray tracing. We will know in September at the iPhone event.

We may or may not know. It depends upon how much hw 3D ray tracing 'costs' in die space. It doesn't have to be enormous, but the A-series die size has gotten bloated over the last two iterations and Apple is likely looking to make it smaller (as TSMC N3 costs substantively more). Not smaller than it has historically averaged, but back onto the average (~90mm^2).

Incrementally faster CPU/GPU cores and memory would bring incrementally faster 3D ray tracing even without hw assist. The hw assist would be a 'nice to have', not a 'you can't ship without it'.

The plain M3 would be larger, so the relative size-increase impact should be smaller. (It also has bloat problems, but it also doesn't sell in the 100M-unit range (with the higher accumulated wafer cost overhead that brings), and it generally goes into more expensive products than a plain entry-model iPhone in the first year.)
 
Just because video was not mentioned at WWDC does NOT mean video is permanently off the table.
Apple did more than merely stay quiet about external GPUs at WWDC. John Ternus, Apple's senior vice president of Hardware Engineering, said this in an interview with John Gruber:

Gruber: Are there technical barriers to having expandable graphic through PCI that would be only used for compute as opposed to video? Or is that just a design choice?

Ternus: I think, I mean, fundamentally, we've built our architecture around this shared memory model and that optimization. And so it's not entirely clear to me how you'd bring in another GPU and do so in a way that is optimized for our systems. It just hasn't been, it hasn't been a direction that we wanted to pursue.

Translation: "Never say never, but I don't see us doing this."

Multiple out-of-context 'words' ('video', 'compute', 'graphics', 'GPU') all supposedly pointing at the same 'card' really doesn't help much.

The full context of Gruber's question is this:

"Obviously Industry Wide the theme of the year is AI and AI training , and that whole area it seems that all of the compute takes place on graphics card. And .... users looking to do their own AI training"


[ at 22:40 ]


Here Gruber is basically equating 'AI compute' and 'graphics card' (he is heavily entangling 'AI' with 'graphics' to the extent that he later uses them as essentially interchangeable words for the same topic). GPU cards and AI training really aren't that necessarily entangled. You don't need a GPU card to do AI training (or inference). And 'graphics' is dubious since the vast majority of these "in the cloud" data center cards don't have any video out whatsoever (so direct graphics output isn't the focus). Same issue for the contextually quoted comment, where 'video' and 'graphics processor' are being used as completely interchangeable terms when that is not really true.


So when Gruber is asking "are there technical barriers to expandable graphics through PCI that would only be used for compute, as opposed to video"... that is essentially, in his context, "expandable graphics through PCI that would only be used for AI compute...."

First, Gruber is explicitly leaving video out of the loop altogether. So Apple's response really isn't aimed directly at a 'display GPU', but it will have very similar issues.

So we get Apple's responses of essentially "go buy your AI compute in the cloud" along with this "our optimizations are not aimed there". Which shouldn't be much of a surprise. Apple probably isn't going to be a yapping dog chasing after the large(st) language model problems.

Apple has a wide enough number of markets they are trying to penetrate with well-funded competitors: AMD/Intel/Qualcomm in the PC space, Qualcomm in the mobile and cellular modem space, Nvidia/AMD/Intel in the consumer GPU space, Nvidia/AMD in the non-AI-focused Pro card space, Qualcomm and others in the VR headset space. They don't need to chase another 'red hot' area that is drawing gobs of both VC money and funding from entrenched players.

Apple's 'optimizations' probably have as much to do with their software as with the hardware. Tossing CUDA and anything Khronos compute (OpenCL, SYCL) out the window basically completely detaches them from all of Nvidia's, AMD's, and/or Intel's efforts in compute GPU cards. Apple is pretty much hostile to the software stack those cards are primarily aimed at. Apple has 'optimized' for Metal only, which helps them find deeper synergies inside their own products behind their 'moat', but at the same time means worse portability outside the 'moat'. That optimization isn't 'free' (it is double-edged).

With Apple ejecting the three 8-pin AUX power connectors out the window from the Mac Pro, a 'heavy compute' card is not an option they are heavily considering. Probably a mistake long term, but in the midst of doubling down on marshaling the transition it is understandable. (It is a mistake to forcibly drum the vast majority of folks trying to do something portable off your system, especially when it largely doesn't have a GUI interface anyway.)


As others have mentioned, if they want to add GPU power, they would most likely do this by instead having modular AS CPU and GPU SoC components, such that you could add extra AS GPU chips to customize the Mac Pro for GPU-heavy needs.

Apple's comment doesn't surface anything modular, in terms of multiple modular SoCs, at all in the response. People outside of Apple keep trying to 'invent' that, but the 'optimizations' he refers to and the pragmatic 'unified, uniform' == 'shared' memory approach they have taken don't really lead to modularity where two SoCs present in the same system align with the optimizations that they did. Trying to couple multiple large GPU core clusters together does have very real NUMA issues to get around; even more so if you have video out assigned to the same set of GPU clusters and the associated display controllers hanging on the same internal bus and memory systems.

Could the whole SoC subassembly (SoC plus hyper-coupled other components) be on a swappable card? Maybe. I don't think that will get folks what they want, though (a generic, long-term, longitudinal slot/socket), or at easily affordable prices.

The optimizations that Apple has done would push more toward perhaps having multiple macOS instances run inside the same system. [That could perhaps cluster instances without tons of external container overhead on the 2nd or 3rd system instance, but it would not be an application-transparent unified instance.]



But this would be too expensive to do for the Mac Pro only, so it would need to be something they would want to do across the Mac line. I've no idea what the chance of this would be.

Once an M-series SoC has a decent PCI-e interface (> x16 PCI-e v4 in bandwidth), making a "Mac on a Card" would be neither hyper expensive nor blocked from interacting with other Macs that only have TBv4 ports on them. If it is a 'remote' computer that you access over virtual Ethernet-over-PCI-e, then it is just another Mac on the network. TBv4 would be a lot slower as a cluster interlink network than x16 PCI-e v4, but faster than most 1-10GbE implementations (and relatively cheaper too, since no additional wires or Ethernet switches are needed).
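
For a rough sense of scale, here is a back-of-the-envelope comparison of those three links, using commonly cited peak rates (rounded, ignoring most protocol overhead; the ~32 Gb/s PCIe tunnel figure for TBv4 is the usual ballpark assumption, not a measured number).

```python
# Rough peak data rates in GB/s, line encoding included, other overhead ignored.
pcie4_x16 = 16 * (16 * 128 / 130) / 8   # 16 lanes x 16 GT/s, 128b/130b -> ~31.5 GB/s
tb4_pcie  = 32 / 8                      # TBv4 tunnels roughly 32 Gb/s of PCIe data -> ~4 GB/s
ten_gbe   = 10 / 8                      # 10GbE line rate -> ~1.25 GB/s

print(f"x16 PCIe 4.0 : ~{pcie4_x16:.1f} GB/s")
print(f"Thunderbolt 4: ~{tb4_pcie:.1f} GB/s (PCIe tunnel)")
print(f"10GbE        : ~{ten_gbe:.2f} GB/s")
# A TBv4 'cluster link' is roughly 1/8 of x16 PCIe 4.0, but still ~3x a 10GbE
# port, which is the ordering the post argues for.
```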

The pretty big hurdle there is trying to get Industrial Design to sign off on not fully enclosing it in a container of their own design, and possibly on using an AUX power cable to power it. But it is completely possible to put a whole computer onto a card with Apple SoCs (at least small to mid-sized ones; monster large ones probably not).
 
Apple could mount a dart board with potential Mac Pro features, throw dice at it, which bounce off and land in a tub of tea… THEN read those leaves (in HEXadecimal, of course) to figure out what features to put in/leave out, and produce that. And it'd have the same impact on their yearly revenue. :)
 
The Mac Studio still has many of the Mini's thermal limitations (only one air inlet on the bottom of the device). It isn't an enclosure where you're going to say "let's just throw thermal limits out the window... crank the clocks as high as they'll go".

Apple's silicon strategy has largely been based on doing as few SoCs as they can get away with.

It depends upon how much hw 3D ray tracing 'costs' in die space. It doesn't have to be enormous, but the A-series die size has gotten bloated over the last two iterations and Apple is likely looking to make it smaller (as TSMC N3 costs substantively more). Not smaller than it has historically averaged, but back onto the average (~90mm^2).

Incrementally faster CPU/GPU cores and memory would bring incrementally faster 3D ray tracing even without hw assist. The hw assist would be a 'nice to have', not a 'you can't ship without it'.
If done properly, the same SoC could be overclocked for the 400W of cooling in the Mac Pro and would therefore be a cheap way to increase performance. Yes, the chips will be smaller again, especially for the M3. The Max chip? Not so sure, as ASi is competitive in mid-range and lower high-range compute applications, and I think Apple wants to stay competitive in that market.

I agree that HW 3D ray tracing is a special case, just like video encoders, so these are optional and dependent on the markets Apple wants to address. Apple is strong on video, so video encoders are included. Given the M2 in the Vision Pro, hw ray tracing in a future M3 is possible because of the need for energy efficiency.
 
If done properly, the same SoC could be overclocked for the 400W of cooling in the Mac Pro and would therefore be a cheap way to increase performance. Yes, the chips will be smaller again, especially for the M3. The Max chip? Not so sure, as ASi is competitive in mid-range and lower high-range compute applications, and I think Apple wants to stay competitive in that market.

It sounds like you are pointing at faster clocks in the 'king of single-thread benchmark' sense. Competitive with what, though? The M1/M2 Max is mainly deployed as a laptop SoC. Is Apple not competitive within general PC-market laptop thermal constraints?

Are the AMD and Intel CPU P-cores 'properly' designed for maximum effectiveness in the laptop space? Nope. So should Apple throw that away? The AMD/Intel P-cores are 'greedy': greedy on die area consumed (relative to their cohort cores) and greedy on power consumed (again relative to cohorts on die). Apple's are 'properly' designed to better get along with their die-area cohorts, which leads to greater, more well-rounded performance rather than a single-thread drag-racing focus.

If winning the drag race crown means that the GPU core count goes down, is that really an overall win? 'Properly' is a balancing act that extends beyond just the Arm P-cores.

Apple is in a bit of an in-between zone.


The Ultra does better than the very high-count Xeon/Epyc dies on single thread and hangs with the 'mid range' (if you include the relatively more expensive workstation packages) 13900K on multithreading (behind on ST crypto vs. the 13900K, but ahead of it on MT crypto). As you add P-core clusters, because they get along with other clusters (including other P-core clusters), it scales up better. If 'good neighbor' is a design criterion, that is 'properly'.

The M2 implementations are being held back by the fab process (using a relatively old one). TSMC N3 should give Apple more 'slop' around a specifically sized P-core (they can upclock it without driving the footprint substantively higher). But when competitors are on the exact same fab process and are just willing to allocate very much larger footprints to their P-cores... I think Apple is just going to take the 'hit' and try to get to the next fab process first to leapfrog a bit so the gap is smaller.


For the Mac Pro (which will probably only have multi-die SoCs), the more chips and UltraFusion interfaces they use, the more power is going to go into the overhead for just that. Just trying to keep everything 'fused' is going to cost overhead. They are probably not going to throw everything at one single hot-rod core. Again, the overall system would take priority in the design goals. If there is opportunity left under the constraints for a P-core to push those boundaries, then fine. But not vice versa.


I agree that HW 3D ray tracing is a special case, just like video encoders, so these are optional and dependent on the markets Apple wants to address. Apple is strong on video, so video encoders are included. Given the M2 in the Vision Pro, hw ray tracing in a future M3 is possible because of the need for energy efficiency.

I concur that HW ray tracing will be more efficient, and a performance increase will 'fall out of that', as opposed to HW RT being some kind of 'brute force' thing with performance (beating the 'foo XYZ' GPU benchmark) as the primary objective.
 
I am fairly certain Nvidia accounted for that price point by limiting available supply.
I read this myth/conspiracy theory on the internet all the time. Do you have a source for that? Because limiting supply to get a higher price will usually lead to less revenue and less profit.

You can do the math yourself. Elementary school equation.

Assuming $250 to manufacture each GPU.
Sell 20 GPUs at $1,000 each = $20k in revenue. $15,000 in profit.
Sell 100 GPUs at $500 each = $50k in revenue. $25,000 in profit.
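
Same arithmetic as a throwaway snippet (the $250 unit cost, the prices, and the volumes are just the illustrative assumptions above, not real figures).

```python
UNIT_COST = 250  # assumed manufacturing cost per GPU (illustrative only)

def profit(units: int, price: int) -> int:
    """Revenue minus manufacturing cost for a given volume and sale price."""
    return units * price - units * UNIT_COST

# Restricting supply at a high price vs. selling more units at a lower price.
print(profit(20, 1_000))   # 20 cards at $1,000 -> $15,000 profit
print(profit(100, 500))    # 100 cards at $500  -> $25,000 profit
```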

Not only that, Nvidia would not want to lose market share to AMD by artificially limiting their own supply.

The simplest explanation was that there was a huge supply and demand imbalance. Covid and crypto made Nvidia GPUs boom in demand. Supply was being increased as fast as possible but still couldn't meet demand because companies usually have to order chips from Samsung/TSMC many months in advance and manufacturing nearly ground to a halt.

GPUs aren't diamonds - where artificial scarcity is the only way to increase profit.
 
It may come as a shock to many, but shipments of desktop dGPUs have been on a downward slope since as early as 2005. To me this indicates that "perfect" 4090-like performance is not selling all that well.

Use cases of decades past are now approaching niche status.

The numbers do not distinguish whether the dGPUs were used exclusively for crypto or for non-crypto use cases.

[Chart: desktop graphics card shipments]


Source: https://www.tomshardware.com/news/sales-of-desktop-graphics-cards-hit-20-year-low
It doesn't shock me at all.

The DIY computer market has been in slow decline ever since its peak in the 2000s. Basically, people now buy gaming laptops instead of building their own desktops.[0] So Nvidia is selling more discrete laptop GPUs but fewer discrete desktop GPUs. Without crypto, the decline would have been even faster.

Anecdotally, I used to build PCs myself and buy discrete GPUs. Now I only want to game on my laptop and own one computer.

[0]https://web.archive.org/web/20220628032503/https://www.idc.com/getdoc.jsp?containerId=prUS47570121
 
I read this myth/conspiracy theory on the internet all the time. Do you have a source for that? Because limiting supply to get a higher price will usually lead to less revenue and less profit.

You can do the math yourself. Elementary school equation.

Assuming $250 to manufacture each GPU.
Sell 20 GPUs at $1,000 each = $20k in revenue. $15,000 in profit.
Sell 100 GPUs at $500 each = $50k in revenue. $25,000 in profit.

Not only that, Nvidia would not want to lose market share to AMD by artificially limiting their own supply.

The simplest explanation was that there was a huge supply and demand imbalance. Covid and crypto made Nvidia GPUs boom in demand. Supply was being increased as fast as possible but still couldn't meet demand because companies usually have to order chips from Samsung/TSMC many months in advance and manufacturing nearly ground to a halt.

GPUs aren't diamonds - where artificial scarcity is the only way to increase profit.
My apologies for the confusion.

What I meant was that Nvidia saw the projections of desktop dGPU sales and adjusted prices to cover for the falling demand.

Similar to how Apple jacked up the 2023 Mac Pro by $1k knowing that demand is softening due to changing use cases, with most preferring the Mac Studio for a pro desktop.

Mining ain't as profitable as before, and China banned it, so that impacted desktop dGPU sales.
 
Signs that desktops & desktop dGPUs are in decline: x86 is getting fresh new competition from ARM SoCs. In 2022, x86 shipped 263.7 million units while ARM SoCs shipped ~1 billion.

Less than a month ago, MediaTek announced it will be using Nvidia GPUs in its SoCs for the automotive industry.



It would not be too far-fetched for MediaTek to eventually use them in smartphone SoCs.


ARM SoC PCs too


Samsung's smartphone SoC is furthering its use of AMD GPUs


Qualcomm is releasing an ARM SoC meant for PC laptops that was engineered by ex-Apple engineers who worked on Apple Silicon.


Windows 11 on ARM will help in this transition


To counter this, Intel is proposing stripping legacy parts of the x86 architecture with Intel x86-S. This frees up space on the chip die for architecture relevant to the next half-century, not the past half-century.


This will anger legacy x86 hardware/software users, as it means higher prices on future legacy x86 products due to lower economies of scale. Reactions will be similar to those of Intel Mac Pro users when the M2 Ultra came out.

Legacy x86 will become as relevant as mainframes are today. Mainframes may be superior in ~1% of all use cases today, but ~99% of the time people will prefer the "good enough" solution that SoCs bring.

If Intel is not careful, this may be their Kodak moment™.

 
It sounds like you are pointing at faster clocks in the 'king of single-thread benchmark' sense.
Not really, and not to compete with others to be the compute king, but to squeeze out a little more from existing SoCs in cases where there are fewer thermal restrictions (Max in the Studio and Ultra in the Mac Pro). Is it worth overclocking a modest 10-30%, assuming perfect scaling on all subsystems such as RAM? Marketing for sure would love it.
I read the post but can't find references to clocking existing SoCs higher. The putative Extreme chip is a 4X Max, not an overclocked Ultra. The hypothesis is that an overclocked Ultra would be cheaper than a completely new 4X Max design.
 
I read the post but can't find references to clocking existing SoCs higher. The putative Extreme chip is a 4X Max, not an overclocked Ultra. The hypothesis is that an overclocked Ultra would be cheaper than a completely new 4X Max design.
Apple's chip design has them already at their peak performance per watt. Any higher and the scaling is not linear.

Apple has historically downclocked their chips but not upclocked them.

Part of the art of designing a microarchitecture is striking the proper balance of complexity between IPC (which is usually associated with design complexity) and speed (which is usually associated with process complexity).

x86 cores have traditionally been relatively narrow and have aimed towards the highest clocks allowed by the process.

This has been going on for ages, BTW. Even back in the 80s this debate was still raging, and it was framed as "Speed Demons" (fast narrow cores, usually in-order back then) vs "Brainiacs" (wide superscalar cores, introducing out-of-order execution).

Back then the debate was driven by levels of integration; speed demons were smaller and could be implemented in a single chip, and thus run faster, whereas brainiacs needed several chips to implement the architecture, and thus they had to run at slower clocks. So each approach tried to balance having clocks as fast as possible vs doing as much per clock as possible.

So far, in terms of efficiency, Apple has demonstrated that their very wide but slower-clocked ARM cores (which aim to get the best IPC possible) are more efficient than the competing narrower x86 cores that boost higher but are less aggressive in their speculation/parallelism.
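
A toy model of that wide-and-slow vs narrow-and-fast trade-off; the IPC, clock, and power numbers below are invented placeholders, and the cubic power-vs-frequency scaling is only a rough rule of thumb, not a measurement of any real core.

```python
# Toy numbers, not measurements: performance ~ IPC * clock, while dynamic power
# grows roughly with f * V^2 and V tends to rise with f, so ~f**3 is used here
# as a crude approximation.

def perf(ipc: float, ghz: float) -> float:
    return ipc * ghz

def core_power(ghz: float, base_ghz: float = 3.5, base_w: float = 5.0) -> float:
    """Crude per-core power estimate scaling cubically with clock."""
    return base_w * (ghz / base_ghz) ** 3

wide_slow   = {"ipc": 8.0, "ghz": 3.5}   # hypothetical wide, lower-clocked core
narrow_fast = {"ipc": 5.0, "ghz": 5.6}   # hypothetical narrow, boost-happy core

for name, c in (("wide/slow", wide_slow), ("narrow/fast", narrow_fast)):
    p, w = perf(c["ipc"], c["ghz"]), core_power(c["ghz"])
    print(f"{name:11s}: perf={p:5.1f}  power~{w:5.1f} W  perf/W~{p / w:4.2f}")
# Both toy cores land on the same raw performance, but the high-clock core
# burns roughly 4x the power to get there.
```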

But you also need to understand that there is a big "business" part in these decisions.

Apple can afford to have larger cores, because they are providing a vertically integrated product to the end consumer. So although their chips are more "efficient" in terms of their power/performance envelope, they are less "efficient" in terms of their SoC die area. But that is OK, because they are using other parts of the product to subsidize the cost of the whole. In fact, Apple subsidizes the cost of their cores through their entire product line, including smartphones.

Whereas the customers for AMD/Intel are their OEMs, who end up creating the whole end product that is sold to the end consumer. Therefore, x86 vendors have to be more efficient in terms of area, since their profit/margin comes from the chip itself, not the end product.

In this regard, AMD has to be even more aggressive in terms of area than Intel, since they don't own their fabs. So AMD currently tries to pack as many cores as possible per chiplet in order to maximize their profits, which is why they tend to favor smaller cores that can be clocked relatively high.

So although their chips may not be as efficient in terms of their power/performance envelope once they clock as high as they have to in order to match the performance of Apple (for example), they're still competitive enough that AMD can make a profit off the chips.

Hope this makes sense.
 
Not really, and not to compete with others to be the compute king, but to squeeze out a little more from existing SoCs in cases where there are fewer thermal restrictions (Max in the Studio and Ultra in the Mac Pro). Is it worth overclocking a modest 10-30%, assuming perfect scaling on all subsystems such as RAM? Marketing for sure would love it.

We should remember that a key selling point for Apple is how quiet the Mac is even under stressed operation. So even if the Studio's cooling system could dissipate far more heat, doing so would make the machine's fans louder; consider how many M1 Studio owners on this forum and elsewhere complained about fan noise, to the point that Apple explicitly moved to address it with the M2 cooling system...

While it is true the 2023 Mac Pro is ridiculously over-cooled, due to it being originally designed to handle Intel and AMD space heaters installed inside, there is still only one fan (the top one) cooling the SoC package. So, again, while Apple could spin that fan (at least) harder, that would generate noise, which runs counter not just to what Apple wants, but to what many Mac Pro users who have their machine next to them want.
 