Based on information Apple has released so far, I'm fairly confident that there won't be any traditional dGPU system in any of the new Macs. Apple's approach is more flexible and more efficient, so there is really no reason for them to use dGPUs.
The energy-efficiency wins are pretty broad, but more flexible? That is a reach. A discrete GPU chip can be used on an add-in card or embedded onto the motherboard. That is two deployment profiles. iGPUs don't have that flexibility, since they share the same memory controller with the CPU (and other I/O function units in the package).
dGPUs can use DRAM, GDDR, or HBM relatively easily. CPUs... not so much (at least not for a wide set of general-purpose workloads with mid-to-high workload concurrency).
Apple is getting some 'wins' here, but they are trade-offs; those aren't "free" wins. Apple is paying something for them, and one of the costs is flexibility.
Unified Memory doesn't require homogeneity (one type of memory) or a single package (SoC).
www.anandtech.com
or
"... AMD announced that future Epyc+Radeon generations will include shared memory/cache coherency between the GPU and CPU over the Infinity Fabric, ..."
Don't move the data.
www.tomshardware.com
Is spreading control of and access to memory over different packages "free"? No; most design decisions come with trade-offs. It has higher latencies, but it is more flexible. IBM and Nvidia have been doing something similar with their more proprietary memory coherence links (NVLink) between POWER8/POWER9 and Nvidia GPUs. To a lesser extent, so have the 4-16 socket CPU set-ups from a variety of implementors earlier (IBM Z (and previous), POWER, Intel, etc.).
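To make the "unified memory without a single package" point concrete, here is a minimal sketch using CUDA managed memory, assuming a CUDA-capable discrete GPU (the kernel, sizes, and names are illustrative, not any vendor's canonical example). The CPU and a dGPU sitting in separate packages share one allocation through one pointer, and the runtime handles migration and coherence instead of explicit copies:

[CODE]
// Minimal sketch: unified memory across two packages (CPU + discrete GPU).
// cudaMallocManaged gives both sides one pointer into the same allocation;
// the runtime migrates/maps pages on demand instead of requiring cudaMemcpy.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void scale(float *data, int n, float k) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= k;
}

int main() {
    const int n = 1 << 20;
    float *data = nullptr;
    cudaMallocManaged(&data, n * sizeof(float)); // one address space, two packages

    for (int i = 0; i < n; ++i) data[i] = 1.0f;  // CPU writes directly

    scale<<<(n + 255) / 256, 256>>>(data, n, 2.0f); // dGPU reads/writes same pages
    cudaDeviceSynchronize();                        // then CPU can read the results

    printf("data[0] = %f\n", data[0]);
    cudaFree(data);
    return 0;
}
[/CODE]

On POWER9 + NVLink systems that coherence is done in hardware; over plain PCIe the runtime approximates it with page migration. Either way it is exactly the latency-for-flexibility trade-off described above.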
If you throw on constraints like "must fit in a smaller area", "must consume the lowest power", "must have the lowest overall latency", etc., that is something different.
There are some slightly warped takes on "flexibility" where being able to natively run iPhone apps and Mac apps on the same GPU with no semantic gaps in GPU code execution... yeah, that is more flexibility. However, bending "flexibility" to fit the largely self-imposed design constraints Apple has placed on itself (as opposed to constraints primarily driven by external customers) is a bit of hand waving.
It is certainly not flexible in time. 3-4 years later, a faster GPU comes out and... you are stuck with the exact same GPU you started out with. That is not particularly flexible in the slightest, even in an embedded context. Let's say the SoC package is overdue for an update, but a dGPU from another source has substantively iterated. If the SoC could couple to the dGPU, then the system could iterate, so the system delivery schedule is more flexible. (Apple could render that moot by taking the position that it won't do any update unless all major subcomponents move... but that is yet again a self-imposed, not market-driven, constraint.)
Apple also probably has layered a security constraint aspect over this. Not only is cache coherency simplified by running it through a single memory controller hierarchy, but there is also a single place to set the security policies (which process can touch which pages). Yet again, that is more "flexibility" to implement the requirements on the table rather than broader implementation options.
Looking at the performance projections, I'd expect high-end Apple laptops to have GPU performance roughly comparable to a mobile RTX 3060.
Since laptops are 70+% of Apple's Mac sales, that is very nice for that subset of the Mac product line-up. It doesn't promise much breadth of performance coverage for the desktop side, though.
Apple probably will drive more future customers into laptops. There's a pretty good chance, though, that this isn't a "free lunch" at the top end of the desktop space. They probably won't lose much sleep, if any, over that outcome.