
Kpjoslee

macrumors 6502
Sep 11, 2007
417
269
LOL. It’s faster than any integrated graphics solution, and as fast as many mid-range discrete graphics solutions.

Do you have some sort of personal grudge here?
Uh.....which mid-range discrete GPU? The M1 GPU is at the top of the game for an IGP, but it definitely isn't as fast as a mid-range discrete graphics card.
 

Jorbanead

macrumors 65816
Aug 31, 2018
1,209
1,438
Apple M1 GPU speed is very disappointing. And we can be sure that this is not only for chess the case.
In what way? What are you comparing it to? Sure, it’s disappointing if your comparison is to an RTX 3080, but we both know that’s a bit extreme. Just as extreme as comparing the M1 to a Threadripper.
 

Kpjoslee

macrumors 6502
Sep 11, 2007
417
269
In some benchmarks, M1 beats GeForce GTX 1050 Ti and Radeon RX 560, for example.
Well, those are more than four years old and barely mid-range cards. The current entry mid-range card, the 1650, is in a different class compared to the M1 GPU.
 

leman

macrumors Core
Oct 14, 2008
19,521
19,679
Well, those are more than four years old and barely mid-range cards. The current entry mid-range card, the 1650, is in a different class compared to the M1 GPU.

Not really. In gaming-related tests, the M1 is roughly equivalent to the Max-Q version of the 1650, or around 20-25% slower than the full 1650. That's not too bad when you consider the power consumption figures of these GPUs.

In Geekbench compute, dGPUs do much better, that's true, but I have this suspicion that Geekbench strongly favors GPUs with dedicated memory. I have a hunch that it only measures the time it takes to run a kernel and excludes the time needed to actually upload the data into GPU memory. Besides, Geekbench compute only uses fairly light computations that rely heavily on moving data to and from memory, which again puts dGPUs at a massive advantage while completely negating the benefits of M1's humongous GPU cache. Real-world workloads rarely work that way.
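
To make that suspicion concrete, here is a minimal Metal sketch (my own illustration, buffer size arbitrary) of the upload blit into a private, dedicated-memory-style buffer that a kernel-only clock would never see:

Code:
import Metal
import QuartzCore

// Illustrative sketch: time the blit that uploads data into a
// private buffer. A benchmark that starts its clock after this
// point reports kernel time only; the end-to-end cost is
// kernel time + upload time.
let device = MTLCreateSystemDefaultDevice()!
let queue = device.makeCommandQueue()!
let length = (1 << 24) * MemoryLayout<Float>.stride  // 64 MiB of floats

let shared = device.makeBuffer(length: length, options: .storageModeShared)!
let priv = device.makeBuffer(length: length, options: .storageModePrivate)!

let t0 = CACurrentMediaTime()
let cb = queue.makeCommandBuffer()!
let blit = cb.makeBlitCommandEncoder()!
blit.copy(from: shared, sourceOffset: 0,
          to: priv, destinationOffset: 0, size: length)
blit.endEncoding()
cb.commit()
cb.waitUntilCompleted()
print("upload time: \(CACurrentMediaTime() - t0) s")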
 

Andropov

macrumors 6502a
May 3, 2012
746
990
Spain
Basically, every compute shader that needs to be executed only once (so the data needs to go from RAM -> GPU -> RAM) benefits massively from UMA, right?
 
  • Like
Reactions: cmaier

leman

macrumors Core
Oct 14, 2008
19,521
19,679
Basically, every compute shader that needs to be executed only once (so the data needs to go from RAM -> GPU -> RAM) benefits massively from UMA, right?

In a nutshell, yes, it should. There still might be a memory copy, of course, but the copy itself is much faster. There is also the question of latency (time from submission to completion), which is rarely measured but ends up being quite important for user experience. And more complex tasks will benefit from Apple‘s large last-level cache.
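
As a rough sketch of what that one-shot RAM -> GPU -> RAM path looks like with unified memory (the kernel and sizes here are invented for illustration), the CPU and GPU touch the same .storageModeShared buffer, so there is no upload or readback pass at all:

Code:
import Metal

// Invented example: one-shot compute on a shared (UMA) buffer.
// The CPU writes, the GPU transforms, the CPU reads back; no blit either way.
let src = """
kernel void doubleIt(device float *data [[buffer(0)]],
                     uint id [[thread_position_in_grid]]) {
    data[id] *= 2.0f;
}
"""
let device = MTLCreateSystemDefaultDevice()!
let library = try! device.makeLibrary(source: src, options: nil)
let pipeline = try! device.makeComputePipelineState(
    function: library.makeFunction(name: "doubleIt")!)
let queue = device.makeCommandQueue()!

let count = 1 << 20
let buffer = device.makeBuffer(length: count * MemoryLayout<Float>.stride,
                               options: .storageModeShared)!
let values = buffer.contents().bindMemory(to: Float.self, capacity: count)
for i in 0..<count { values[i] = Float(i) }  // CPU fills the buffer in place

let cb = queue.makeCommandBuffer()!
let enc = cb.makeComputeCommandEncoder()!
enc.setComputePipelineState(pipeline)
enc.setBuffer(buffer, offset: 0, index: 0)
enc.dispatchThreads(MTLSize(width: count, height: 1, depth: 1),
                    threadsPerThreadgroup: MTLSize(width: 256, height: 1, depth: 1))
enc.endEncoding()
cb.commit()
cb.waitUntilCompleted()
print(values[1])  // 2.0, read straight from the same memory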
 

Andropov

macrumors 6502a
May 3, 2012
746
990
Spain
In a nutshell, yes, it should. There still might be a memory copy, of course, but the copy itself is much faster. There is also the question of latency (time from submission to completion), which is rarely measured but ends up being quite important for user experience. And more complex tasks will benefit from Apple‘s large last-level cache.
For compute workflows (I'm thinking numerical computing for scientific applications) you don't even need to copy the data, since you're likely already storing values in contiguous arrays. Apple offers makeBuffer(bytesNoCopy:length:options:deallocator:) for that.
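
A minimal sketch of that zero-copy path (my illustration; the API requires a page-aligned allocation whose length is a multiple of the page size):

Code:
import Metal
import Darwin

// Sketch: wrap an existing page-aligned allocation in an MTLBuffer
// without copying. The GPU reads/writes the very memory the CPU
// arrays live in.
let device = MTLCreateSystemDefaultDevice()!
let pageSize = Int(getpagesize())
let length = 16 * pageSize

var raw: UnsafeMutableRawPointer? = nil
posix_memalign(&raw, pageSize, length)  // page-aligned backing store

let buffer = device.makeBuffer(bytesNoCopy: raw!,
                               length: length,
                               options: .storageModeShared,
                               deallocator: { ptr, _ in free(ptr) })!
print(buffer.length)  // same bytes, now visible to the GPU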

I have never delved too deep into it, since the Metal documentation for compute shaders is scarce, so I'm not sure if I'm writing my compute kernels the right way, but for data that needs to be synchronized with the CPU after every step (i.e. can't be kept in GPU VRAM, or is only processed once) my iPad Pro (A12X) always beats my 16" MacBook Pro. And it works pretty fast on either device, so my implementation can't be that bad.
 

leman

macrumors Core
Oct 14, 2008
19,521
19,679
For compute workflows (I'm thinking numerical computing for scientific applications) you don't even need to copy the data, since you're likely already storing values in contiguous arrays. Apple offers makeBuffer(bytesNoCopy:length:options:deallocator:) for that.

Oh yes, if the implementation is optimized for Apple Silicon, you can definitely get tons of savings. I was thinking more about “normal” software that is not specifically written for the Apple platform.

I have never delved too deep into it, since the Metal documentation for compute shaders is scarce, so I'm not sure if I'm writing my compute kernels the right way, but for data that needs to be synchronized with the CPU after every step (i.e. can't be kept in GPU VRAM, or is only processed once) my iPad Pro (A12X) always beats my 16" MacBook Pro. And it works pretty fast on either device, so my implementation can't be that bad.

I’m not surprised. Memory copies to a dGPU are ridiculously expensive compared to its compute capability. And yeah, that’s a critical flaw of most benchmarks.
 

Appletoni

Suspended
Original poster
Mar 26, 2021
443
177
What were your expectations for a GPU that consumes 10W?
(Benchmark: LC0 (chess) speed

M1 GPU = 300 nps

Other mobile GPUs = 50,000 - 100,000 nps)
——————————————————————

150W - 300W → 50,000 - 100,000 nps, i.e. roughly 333 nps per watt
10W → ~3,333 nps at the same efficiency, but the M1 gets only 300 nps

Apple, because it’s Apple, at 10W should have at least 16,665 nps (3,333 × 5)
 

leman

macrumors Core
Oct 14, 2008
19,521
19,679
(Benchmark: LC0 (chess) speed

M1 GPU = 300 nps

Other mobile GPUs = 50,000 - 100,000 nps)
——————————————————————

150W - 300W → 50,000 - 100,000 nps, i.e. roughly 333 nps per watt
10W → ~3,333 nps at the same efficiency, but the M1 gets only 300 nps

Apple, because it’s Apple, at 10W should have at least 16,665 nps (3,333 × 5)

Well, maybe you should submit a software patch that fixes the software to utilize M1 properly, then it will certainly run better.
 

jeanlain

macrumors 68020
Mar 14, 2009
2,463
958
(Benchmark: LC0 (chess) speed
You claim that "we can be sure that this is not only for chess the case" (sic), but you keep posting results from chess engines.

Show me a Metal app for which the M1 GPU is "disappointing".

Ah yes, you can use Mafia III. It crashes on M1 Macs, so it yields 0 fps. I suppose the M1 is disappointing in games as well.
 
  • Like
Reactions: JMacHack

Andropov

macrumors 6502a
May 3, 2012
746
990
Spain
(Benchmark: LC0 (chess) speed

M1 GPU = 300 nps

Other mobile GPUs = 50,000 - 100,000 nps)
——————————————————————

150W - 300W → 50,000 - 100,000 nps, i.e. roughly 333 nps per watt
10W → ~3,333 nps at the same efficiency, but the M1 gets only 300 nps

Apple, because it’s Apple, at 10W should have at least 16,665 nps (3,333 × 5)

So the M1 GPU scores an order of magnitude (or more) below what it should in this test. Surely it's more plausible that the test is not properly using the M1's capabilities than that the M1 GPU is secretly TEN TIMES worse than any other GPU on the market (in performance per watt, no less)?
 
  • Like
Reactions: JMacHack

cmaier

Suspended
Jul 25, 2007
25,405
33,474
California
So the M1 GPU scores an order of magnitude (or more) below what it should in this test. Surely it's more plausible that the test is not properly using the M1's capabilities than that the M1 GPU is secretly TEN TIMES worse than any other GPU on the market (in performance per watt, no less)?
There seems to be some sort of weird thing going on here by someone highly motivated to discredit the M1. The cherry-picking has gotten worse as the thread has lived on.
 

cmaier

Suspended
Jul 25, 2007
25,405
33,474
California
It went south as soon as it was demonstrated that Stockfish was run with the wrong settings and that with the right settings the M1 dominated - but instead of conceding the point, the cherry-picking and misrepresentations got worse (doubling down, as it were).

Oh Yeah, well, um, what about this benchmark? M1 sucks again!

Chess engine (LC0) - fan
——————————————
2020 MBP 16” - 105dB
2021 MBP 13” - 0dB
 

leman

macrumors Core
Oct 14, 2008
19,521
19,679

Yes, AMD booked 5nm. In the second half of 2022. Why do you think TSMC has capacity in 2022? Because Apple will have transitioned to a lower node by then. Apple can pay more, so they get priority access. And of course, TSMC is working hard on expanding capacity, so hopefully there will be enough for everyone once 3nm reaches volume. Apple is still likely to be at least a year ahead of the others here, though.
 
  • Like
Reactions: dmccloud

crazy dave

macrumors 65816
Sep 9, 2010
1,453
1,229
Yes, AMD booked 5nm. In the second half of 2022. Why do you think TSMC has capacity in 2022? Because Apple will have transitioned to a lower node by then. Apple can pay more, so they get priority access. And of course, TSMC is working hard on expanding capacity, so hopefully there will be enough for everyone once 3nm reaches volume. Apple is still likely to be at least a year ahead of the others here, though.

According to the reports, rumors, and speculation, AMD and Apple will both be on the 5nm+ node next year. Apple will still be there first, primarily for the upcoming iPhone, but it will likely be the case that when Zen 4 comes out next year, it will be on the same node as the competing AS Mac chips. That hardwaretimes article has it pegged later than I’d heard (though it may be more accurate, and if accurate it may mean Apple will have a node edge).

EDIT: I should state that he is still wrong in general: Apple gets priority access to nodes because they help pay for them and ship waaay higher volumes of chips than AMD. Naturally most of those are iPhone chips, but that’s neither here nor there for the purposes of manufacturing, and the volumes don’t compare. AMD is getting bigger, though, and more able to buy volume earlier.

EDIT2: Okay, it looks like more people are saying end of 2022 for Zen 4 on 5nm+, which will probably still overlap with M-series 5nm+ chips, but less so than I thought.
 
Last edited:

Appletoni

Suspended
Original poster
Mar 26, 2021
443
177
Let's get real there...

Special compiles of Stockfish, cFish, etc. already optimize for the M1's NEON, etc. But they are still less than half the speed of similarly priced, similarly sized computers. The fact of the matter is that the M1 isn't as fast as the usual influencer types (fanbois) make it out to be. A similarly priced modern CPU from AMD runs circles around it. For certain use cases it may be "ok" for its wattage, but let's keep it real: the CPU is faster in Apple PowerPoint presentations than it is in real-life performance. It's more or less a glorified standard ARM big-little phone CPU with a focus mainly on the low-power slow "little" cores, and it relies heavily on optimized code to even be comparable to Intel/AMD these days. The CPU is overrated and underperforming (not only for chess). Just compare it with amazing new stuff like the AMD 5700G and realize that anyone looking for "real" CPU+GPU performance for the dollar should look elsewhere than the fruity un-open company these days.


No AVX2
No AVX512 (VNNI)
No Hyperthreading




How to compensate for the losses?
 
  • Haha
  • Like
Reactions: 09872738 and mi7chy