
cmaier

Suspended
Jul 25, 2007
25,405
33,474
California
Can't explain others? I'm done then. Good luck.
Ok. Let’s do it with photogrammetry.

Tell me which card you need four of.

Now, imagine a single card with the same performance as four of that card.

Now you are able to do the same job with 1 card instead of four.



What you seem to be missing is that the mere fact that there are four of something doesn’t mean it’s better than a single thing that is four times as fast.
 

diamond.g

macrumors G4
Mar 20, 2007
11,437
2,665
OBX
The way things are going [M1 in iPads, laptops and desktops], I think the choice on offer is going to get reduced even further in the next pro machines. Apple have shown us their direction, and I doubt it will change that much moving forward. That's where the big questions about the Mac Pro come in, as it will need something totally different, and would it be worth the effort?
Well, it would be a good time to bring back the “trash can” Mac Pro form factor.
 

diamond.g

macrumors G4
Mar 20, 2007
11,437
2,665
OBX
Ok. Let’s do it with photogrammetry.

Tell me which card you need four of.

Now, imagine a single card with the same performance as four of that card.

Now you are able to do the same job with 1 card instead of four.



What you seem to be missing is that the mere fact that there are four of something doesn’t mean it’s better than a single thing that is four times as fast.
I think that if a task can be accelerated with multiples of something, having multiples will always be faster than just having one. So if you get 100 TFLOPS with 1 ASi GPU, then you would get 100 * x with multiple units, right?
 

richinaus

macrumors 68020
Oct 26, 2014
2,432
2,186
I have to be patient or they’ll ban me again :)
I was nearly banned for lack of patience the other week :)

Machine learning, photogrammetry, 3D graphics, video (DaVinci), CG, and more. You are not convincing, especially since you are not mentioning the scale of the work. Most people don't need multi-GPU.

I am in 3D graphics, CG etc.
It is better to offload the more hardcore visualisation to the cloud or a cluster these days, and not rely on one desktop to do everything.
The hardware these days is getting to a point where I am close to saying my 16” MBP is good enough [it isn’t yet, but I am pretty sure the next M variant will be].
That said, as a mid-level power user, I think the need for lots of CUDA cores and many-core processors among the limited number of people who require them will most likely be satisfied by PCs. I simply cannot believe Apple will stay with large desktop computers given the direction already shown.

The Cube / trash can will be the future Mac Pro.
 

cmaier

Suspended
Jul 25, 2007
25,405
33,474
California
I think that if a task can be accelerated with multiples of something, having multiples will always be faster than just having one. So if you get 100 TFLOPS with 1 ASi GPU, then you would get 100 * x with multiple units, right?

Sure. But the statement was that certain tasks *always* require four GPUs. And my point is that if it requires four GPUs, that means it has a specific performance requirement.

And logically, if a single GPU meets that performance requirement, then you don’t need four of them.


If the original statement was “multiple cards is better because you can increase performance,” that would be a different thing.
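
To put some made-up numbers on it (a toy sketch in Python; every figure here is hypothetical and implies no real hardware):

# Toy comparison: the job has a throughput requirement; what matters is meeting it,
# not how many cards it takes to get there. All numbers are invented.
requirement_tflops = 90                     # hypothetical performance the job needs
per_card_tflops = 25                        # hypothetical discrete card
cards = 4
scaling_efficiency = 0.9                    # multi-GPU jobs rarely scale perfectly
multi_gpu = per_card_tflops * cards * scaling_efficiency    # ~90 effective TFLOPS
single_gpu = 100                            # hypothetical single fast GPU
for label, tflops in [("4x discrete cards", multi_gpu), ("1x big GPU", single_gpu)]:
    print(f"{label}: {tflops:.0f} TFLOPS, meets requirement: {tflops >= requirement_tflops}")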
 

diamond.g

macrumors G4
Mar 20, 2007
11,437
2,665
OBX
Sure. But the statement was that certain tasks *always* require four GPUs. And my point is that if it requires four GPUs, that means it has a specific performance requirement.

And logically, if a single GPU meets that performance requirement, then you don’t need four of them.


If the original statement was “multiple cards is better because you can increase performance,” that would be a different thing.
That is fair. It would be interesting to see if Apple decides to continue with a *TX full-tower case (is it ATX or BTX?) or switch to either the trash can shape or maybe the G4 Cube shape.
 

sunny5

macrumors 68000
Jun 11, 2021
1,838
1,706
Sure. But the statement was that certain tasks *always* require four GPUs. And my point is that if it requires four GPUs, that means it has a specific performance requirement.

And logically, if a single GPU meets that performance requirement, then you don’t need four of them.


If the original statement was “multiple cards is better because you can increase performance,” that would be a different thing.
Did I even say always? I said there are several tasks that require multiple GPUs. You have to admit the fact that multi-GPU setups still exist.
 


cmaier

Suspended
Jul 25, 2007
25,405
33,474
California
Did I even say always? I said there are several tasks that require multiple GPUs. You have to admit the fact that multi-GPU setups still exist.

You also didn’t say “given current GPU performance.” You said that there are tasks that require multiple GPUs, even future Apple GPUs whose performance is unknown.

So, yes, you did say always.

Anyway, I’ve been repeating myself, so I’m going to go ahead and turn my stereo to 11 (which is louder than your stereo which only goes to 10), and let someone else explain it to you.


Just remember: 4x < y if y > 4x
 
  • Haha
Reactions: sunny5

leman

macrumors Core
Oct 14, 2008
19,522
19,679
So, what if the “GPUs” are not GPUs as we know them right now, but a new kind of dedicated, massively parallel computing device accelerating specialist applications?

Well, that's what a modern GPU is in a nutshell: a massively parallel processor. Since a GPU is expected to do all kinds of tasks, you can't make it too specialized, though. In the end, Apple is taking a redundant approach. They have the GPU as a general-purpose parallel processor, the NPU as a specialized ML accelerator, and the AMX coprocessor as a more flexible (but slower) matrix multiplication unit. I think it makes a lot of sense.
 

richinaus

macrumors 68020
Oct 26, 2014
2,432
2,186
You also didn’t say “given current GPU performance.” You said that there are tasks that require multiple GPUs, even future Apple GPUs whose performance is unknown.

So, yes, you did say always.

Anyway, I’ve been repeating myself, so I’m going to go ahead and turn my stereo to 11 (which is louder than your stereo which only goes to 10), and let someone else explain it to you.


Just remember: 4x < y if y > 4x

The issue on these forums is that we all have our own specific needs.
Apple often doesn’t meet these needs so we get angry and come on macrumors to complain.

Apple are a consumer company, and will develop systems for the majority of users.

The majority of Apple users, I would suggest, want a simple box they can just turn on and not worry about it.

A single GPU in a smaller box that is quick enough for most people will make the machine attractive. Just like the trash can - this is the Apple way really, and was the way they were going until the hardware [that they couldn't control] went in a different direction.

Now that they control the hardware, they can develop a GPU that performs as well as four if they want, or develop GPUs that work for most pro users, in an attractive compact package.
 
  • Like
Reactions: Fawkesguyy

bobcomer

macrumors 601
May 18, 2015
4,949
3,699
I certainly see Apple pushing a smaller desktop box that is unexpandable, rather than continuing with the Mac Pro [except for some vanity reasons and because they can].
That's pretty much the way low- and medium-end Windows business desktops are going -- the last bunch I bought were just that. It's easier to buy them that way...
 

09872738

Cancelled
Feb 12, 2005
1,270
2,125
Well, that's what a modern GPU is in a nutshell: a massively parallel processor. Since a GPU is expected to do all kinds of tasks, you can't make it too specialized, though. In the end, Apple is taking a redundant approach. They have the GPU as a general-purpose parallel processor, the NPU as a specialized ML accelerator, and the AMX coprocessor as a more flexible (but slower) matrix multiplication unit. I think it makes a lot of sense.
I agree. I did not look into the NPU, however; ML tends to be massively parallel by nature (which is why GPUs are used in the first place), so it seems likely the NPU is massively parallel as well?
 
Last edited:

dgdosen

macrumors 68030
Dec 13, 2003
2,817
1,463
Seattle
Then that's the problem. The Mac Pro can use up to 4 GPUs. 1 GPU for the Mac Pro isn't enough.
Are you mocking users who use multiple GPUs? You can't deny the fact that some software takes advantage of multiple GPUs. The Mac Pro 2019 itself already proves you wrong. If the Mac does not require multiple GPUs, then you are justifying the Mac's limitations. Do you really think 1 GPU can outperform 4 GPUs for machine learning, 3D graphics, and more? Gosh.
This looks like a suspension trigger… Makes me think of Spinal Tap and the amplifier going up to 11.

Edit - I swear I wrote this before reading reply #42
 

Flint Ironstag

macrumors 65816
Dec 1, 2013
1,334
744
Houston, TX USA
@cmaier, sometimes having multiple GPUs in a box offers a degree of flexibility that a single card cannot. Here is one example:

say I have (4) 25 TFLOPS GPUs with 32 GB of VRAM each in one Mac Pro

and (1) 100 TFLOPS Apple GPU with, I don't know - pick a reasonable amount of VRAM if it's going to be affordable to anyone.

The task is cracking a password hash (or several). In certain situations, it is beneficial to run multiple attacks on the same data set simultaneously, as data recovered in one attack often aids recovery in the others.

With a single GPU, you're doing one attack at a time.

Plus, if one is good, 4 is better as long as your task scales, and many do. I don't know what industry you're in, but I'm constantly amazed by the diverse array of gear my clients connect to their Macs.

Modularity exists for a reason in the workstation arena. Cheers
 

cmaier

Suspended
Jul 25, 2007
25,405
33,474
California
@cmaier, sometimes having multiple GPUs in a box offers a degree of flexibility that a single card cannot. Here is one example:

say I have (4) 25 TFLOPS GPUs with 32 GB of VRAM each in one Mac Pro

and (1) 100 TFLOPS Apple GPU with, I don't know - pick a reasonable amount of VRAM if it's going to be affordable to anyone.

The task is cracking a password hash (or several). In certain situations, it is beneficial to run multiple attacks on the same data set simultaneously, as data recovered in one attack often aids recovery in the others.

With a single GPU, you're doing one attack at a time.

Plus, if one is good, 4 is better as long as your task scales, and many do. I don't know what industry you're in, but I'm constantly amazed by the diverse array of gear my clients connect to their Macs.

Modularity exists for a reason in the workstation arena. Cheers

But that‘s assuming a false premise. Why do you think that if you have one GPU you can only do one attack at a time? Once again, it’s perfectly possible for one Apple GPU to have identical performance characteristics to four [fill in blank] GPUs, including the ability to process in parallel.

As for the rest, that’s a non sequitur. I admit that for *any* GPU, if 1 is good, 4 is likely better (putting aside power usage/heat). But that’s never been what this discussion is about.
 
  • Like
Reactions: secretbum

altaic

Suspended
Jan 26, 2004
712
484
I think the desire for 4x branded GPUs stems from the desire to upgrade and customize. Most people don’t know the trade-offs, though: if you’ve got 64 PCIe lanes and 48 of them go unused, that’s a heck of a lot of wasted silicon and other infrastructure that could have otherwise gone toward more on-die GPU cores.

For that matter, some of that silicon area could go toward some kind of clever NUMA fabric to allow for a different kind of expandability. That’s what I’m not so secretly hoping for, anyway.
 

leman

macrumors Core
Oct 14, 2008
19,522
19,679
@cmaier, sometimes having multiple GPUs in a box offers a degree of flexibility that a single card cannot. Here is one example:

say I have (4) 25 TFLOPS GPUs with 32 GB of VRAM each in one Mac Pro

and (1) 100 TFLOPS Apple GPU with, I don't know - pick a reasonable amount of VRAM if it's going to be affordable to anyone.

The task is cracking a password hash (or several). In certain situations, it is beneficial to run multiple attacks on the same data set simultaneously, as data recovered in one attack often aids recovery in the others.

With a single GPU, you're doing one attack at a time.

Plus, if one is good, 4 is better as long as your task scales, and many do. I don't know what industry you're in, but I'm constantly amazed by the diverse array of gear my clients connect to their Macs.

Modularity exists for a reason in the workstation arena. Cheers

Who is limiting you to running a single attack at a time? I don’t want to sound condescending, but have you ever programmed a GPU? These are massively parallel processors and it’s exceedingly difficult (and inefficient) to run a single task on them to begin with. If you want good performance, you will be running tens of thousands of hash attempts at a time on a single large GPU.
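
A rough CPU-side sketch of the idea in Python (standing in for a real GPU kernel; the target hash and candidate list are made up): every attempt is independent, so nothing about the algorithm forces one attack at a time.

import hashlib

# Hypothetical target: sha256("hunter2"), purely for illustration.
TARGET = hashlib.sha256(b"hunter2").hexdigest()

def try_batch(candidates):
    # Hash every candidate in the batch; each attempt is independent of the others,
    # which is exactly what makes this trivially parallel on a single large GPU.
    return [c for c in candidates if hashlib.sha256(c.encode()).hexdigest() == TARGET]

print(try_batch(["password", "letmein", "hunter2", "qwerty123"]))   # -> ['hunter2']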
 

Boil

macrumors 68040
Oct 23, 2018
3,478
3,173
Stargate Command
I want my original thought for a Mac Pro Cube...

Mac mini footprint, and extend the height to the same dims as said footprint; you know, so it's a Cube...

"Beefy" (for ASi needs) integrated PSU, basic Mx? APU on main logic board with SSD(s), MLB connects to ultra-high-speed backplane; options for add-in cards (CPU core cards, GPU core cards, Neural Engine core cards, SSD cards, maybe even an A/V I/O card)...

The new personal workstation, customized to meet your assorted professional needs...! ;^p
 

deconstruct60

macrumors G5
Mar 10, 2009
12,493
4,053
Based on information Apple has released so far, I’m fairly confident that there won’t be any traditional dGPU system in any of the new Macs. Apple's approach is more flexible and more efficient, so there is really no reason for them to use dGPUs.

The energy efficiency wins are pretty broad, but more flexible? That is a reach. A discrete GPU chip can be used on an add-in card or embedded onto the motherboard. That is two deployment profiles. An iGPU doesn't have that flexibility, since it shares the same memory controller with the CPU (and other I/O function units in the package).

dGPUs can use DRAM, GDDR, or HBM relatively easily. CPUs... not so much (at least not for a wide set of general-purpose workloads with mid-to-high workload concurrency).

Apple is getting some 'wins' here, but those are trade-offs (they aren't "free" wins; Apple is paying something for them, one of which is flexibility).

Unified memory doesn't require homogeneity (one type of memory) or a single package (SoC).

[Image: slide from the CXL 2.0 press briefing deck]



or

"... AMD announced that future Epyc+Radeon generations will include shared memory/cache coherency between the GPU and CPU over the Infinity Fabric, ..."



Is spreading control and access to memory over different packages "free"? No; most design decisions come with trade-offs. It has higher latencies, but it is more flexible. IBM and Nvidia have been doing something similar with their more proprietary memory coherence links between Power 8/9 and Nvidia GPUs. To a lesser extent, so have 4-16 socket CPU setups from a variety of implementers earlier (IBM Z (and its predecessors), Power, Intel, etc.).


If you throw on constraints like "must fit in a smaller area", "must consume the lowest power", "must have the lowest overall latency", etc., that is something different.


There are some slightly warped takes on "flexibility", where being able to natively run iPhone apps and Mac apps on the same GPU with no semantic gaps in GPU code execution... yeah, that is more flexibility. However, bending "flexibility" to fit the largely self-imposed design constraints Apple has chosen (as opposed to primarily external, customer-driven ones) is a bit of hand waving.

Certainly not flexible in time. 3-4 years later, a faster GPU comes out and... you're stuck with the exact same GPU you started out with. That is not particularly flexible in the slightest, even in an embedded context. Let's say the SoC package is overdue for an update, but a dGPU from another source has substantively iterated. If the SoC could couple to a dGPU, then the system could iterate, so the system delivery schedule is more flexible. (Apple could render that moot by taking the position that they won't do any update unless all major subcomponents move... but that is yet again a self-imposed, not market-driven, constraint.)


Apple also has probably layered a security-constraint aspect over this. So not only is cache coherency simplified by running it through a single memory controller hierarchy, but there is also a single place to set the security policies (which process can touch which pages). Yet again, that is more "flexibility" to implement the requirement on the table, rather than broader implementation options.


Looking at the performance projections, I‘d expect high-end Apple laptops to have GPU performance roughly comparable to a mobile RTX 3060.

Since Apple sells 70+% laptops, that is very nice for that subset of the Mac product lineup. It doesn't present much breadth of performance coverage for the desktop side, though.

Apple probably will drive more future customers into laptops. There's a pretty good chance, though, that this isn't a 'free lunch' at the top end of the desktop space. They probably won't lose much sleep, if any, over that outcome.
 
Last edited:

leman

macrumors Core
Oct 14, 2008
19,522
19,679
The energy efficiency wins are pretty broad, but more flexible?

Yeah, I’m wondering myself what I meant by „more flexible“? I suppose what I was trying to say is that Apple's architecture allows true heterogeneous computing. “Flexibility” here refers to software and heterogeneous work scheduling, not configurability.

As to CXL and friends, I want to see how it will work in practice. So far it sounds to me like they are simply talking about increasing the link bandwidth.

Since Apple sells 70+% laptops, that is very nice for that subset of the Mac product lineup. It doesn't present much breadth of performance coverage for the desktop side, though.

Apple probably will drive more future customers into laptops. There's a pretty good chance, though, that this isn't a 'free lunch' at the top end of the desktop space. They probably won't lose much sleep, if any, over that outcome.

I agree. It was a bit disappointing that the iMac does not have a faster GPU. It’s not that critical for a home computer maybe, but larger iMacs always had better graphics than the laptops. If they standardize the performance to the laptop level, much of the reason to get an iMac will go away.
 

deconstruct60

macrumors G5
Mar 10, 2009
12,493
4,053
The task is cracking a password hash (or several). In certain situations, it is beneficial to run multiple attacks on the same data set simultaneously, as data recovered in one attack often aids recovery in the others.
....
Who is limiting you to running a single attack at a time? I don’t want to sound condescending, but have you ever programmed a GPU? These are massively parallel processors and it’s exceedingly difficult (and inefficient) to run a single task on them to begin with. If you want good performance, you will be running tens of thousands of hash attempts at a time on a single large GPU.

Errr, brute-force password cracking isn't a single task. If you are trying millions of different, independent permutations, there is little to no data coupling between the intermediate computations being done on each "try". That means there are little to no limits to making this extremely embarrassingly parallel code. There is extremely little need for any coherence locks here at all.


The only "hard" part here is deciding how many chunks to "chop up" the tries into so that you don't oversaturate the wavefront limits of the GPU.

There are other examples that are not embarrassingly parallel (where one round and/or try of the computation has to feed some data to a "neighboring" one, and you might have some "hard to code" or "too many, too frequent" data synchronization/locking points), but this one isn't it.
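
A rough sketch of the chunking in Python (processes standing in for GPU wavefronts; in a real cracker each chunk would be a kernel dispatch, and the chunk size and candidate list are invented placeholders):

import hashlib
from concurrent.futures import ProcessPoolExecutor

TARGET = hashlib.sha256(b"hunter2").hexdigest()    # hypothetical target hash

def crack_chunk(chunk):
    # Each chunk is fully independent: no shared state, no coherence locks.
    return [c for c in chunk if hashlib.sha256(c.encode()).hexdigest() == TARGET]

def chunks(seq, size):
    # "Chop up" the tries so each dispatch stays within the device's limits.
    for i in range(0, len(seq), size):
        yield seq[i:i + size]

if __name__ == "__main__":
    candidates = ["password", "letmein", "hunter2", "qwerty123", "trustno1", "dragon"]
    with ProcessPoolExecutor() as pool:
        hits = [h for found in pool.map(crack_chunk, chunks(candidates, 2)) for h in found]
    print(hits)   # -> ['hunter2']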
 

leman

macrumors Core
Oct 14, 2008
19,522
19,679
Errr, brute-force password cracking isn't a single task. If you are trying millions of different, independent permutations, there is little to no data coupling between the intermediate computations being done on each "try". That means there are little to no limits to making this extremely embarrassingly parallel code. There is extremely little need for any coherence locks here at all.


The only "hard" part here is deciding how many chunks to "chop up" the tries into so that you don't oversaturate the wavefront limits of the GPU.

That’s exactly what I wrote.
 

09872738

Cancelled
Feb 12, 2005
1,270
2,125
I wonder where the term „embarrassingly parallel“ comes from. Why „embarrassingly“?
 