
Taz Mangus

macrumors 604
Mar 10, 2011
7,815
3,504
It's the other one who's talking about 40 core laptop CPUs. Leifi is slightly more grounded in reality, but only slightly.
They both just appear to ramble on as if it proves anything about anything. Very strange thread; it started that way and continues that way after 32 pages. They are both single-track, on auto-repeat.
 

ddhhddhh2

macrumors regular
Jun 2, 2021
242
374
Taipei
They both just appear to ramble on as if it proves anything about anything. Very strange thread; it started that way and continues that way after 32 pages. They are both single-track, on auto-repeat.

So far, his responses have been annoying. It's like he's on a different radio channel, just saying what he wants to say.

There is already plenty of evidence, from plenty of people, that chess engines are not optimized for the M1, and yet the speakers still only play their own noise.

The only reason I haven't ignored that noise so far is that I want to see how shameless people can be; the troll has actually been treated well by everyone.
 
  • Like
Reactions: JMacHack

leman

macrumors Core
Oct 14, 2008
19,521
19,678
So far, his responses have been annoying. It's like he's on a different radio channel, just saying what he wants to say.

There is already plenty of evidence, from plenty of people, that chess engines are not optimized for the M1, and yet the speakers still only play their own noise.

One problem I have with this thread is not just the very annoying attitude displayed by the main actors but also how folks have been replying to them. The main claims were that a) the M1 is bad at running demanding SIMD code and b) it is only good on content creation benchmarks. To which people reply a) by stating that chess engines should start using Apple APIs to get good performance and b) by showing ... content creation benchmarks. This directly validates their position and reaffirms their belief that Apple Silicon is "fanboy hype".

The story in the end is very simple. Yes, Apple Silicon underperforms on a bunch of software suites that have not been properly tested on the platform or that we know run suboptimal code paths (at some point they were even bringing up Blender to aid their arguments, never mind that that particular version of Blender did not run natively...). So what? Nobody cares. This is like me claiming that Windows is crap for data science because some obscure piece of software I use only supports Unix.
 

Boil

macrumors 68040
Oct 23, 2018
3,478
3,173
Stargate Command
This thread is a ****show of epic proportions, a literal Two Troll Circus...!

When do we move the conversation to Checkers benchmarks...?!? ;^p

And I do believe the whole gist of Apple silicon is overall end user experience, especially direct onscreen feedback while working in DCC software suites, not crushing some random non-optimized chess benchmark...

I doubt many of those out there looking for a new machine to do any sort of DCC work are asking how it does on said random chess benchmark...?

Madness...!
 
  • Like
Reactions: JMacHack and uller6

Sopel

macrumors member
Nov 30, 2021
41
85
What kind of macOS specific APIs are you using?

Accelerate and ML Compute?
Metal?
Grand Central Dispatch?
None. We expect Apple to expose a native solution to us, like x86 intrinsics, because our network is very non-standard and requires manually written inference code from the ground up. We will not use any dependencies like this. We currently use NEON because that's the only thing natively accessible on Macs. To be honest, we also don't use any Windows/Linux-specific APIs for inference.

On a side note, the APIs might be too limiting for LC0 too. They have some custom stuff coded in, and AFAIK you can't get the flexibility of CUDA on Macs. At least for the "neural engine".
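
To give an idea of what that hand-written code looks like, here is a minimal sketch of an int8 dot-product kernel in NEON intrinsics (the ARM counterpart to x86 intrinsics). This is illustrative only, not our actual Stockfish code:

```cpp
// Illustrative sketch of a hand-written int8 dot-product kernel using NEON
// intrinsics. Names and layout are made up for the example, not Stockfish's.
#include <arm_neon.h>
#include <cstdint>

// Dot product of two int8 vectors; n must be a multiple of 16.
int32_t dot_i8_neon(const int8_t* a, const int8_t* b, int n) {
    int32x4_t acc = vdupq_n_s32(0);
    for (int i = 0; i < n; i += 16) {
        int8x16_t va = vld1q_s8(a + i);  // load 16 bytes from each input
        int8x16_t vb = vld1q_s8(b + i);
        // Widening multiply to int16, then pairwise add-accumulate to int32.
        acc = vpadalq_s16(acc, vmull_s8(vget_low_s8(va),  vget_low_s8(vb)));
        acc = vpadalq_s16(acc, vmull_s8(vget_high_s8(va), vget_high_s8(vb)));
    }
    return vaddvq_s32(acc);  // horizontal sum of the four int32 lanes
}
```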
 
Last edited:
  • Like
Reactions: Appletoni

mr_roboto

macrumors 6502a
Sep 30, 2020
856
1,866
I received a message from an lc0 dev that clarifies some things. I quote:

> 1. Lc0 does use Accelerate.
> 2. An experiment to use Core ML was underwhelming: https://github.com/daylen/leela_ane
> 3. Patches for Metal support or similar are welcome.
Not merely underwhelming - daylen reports "For reasons I don't understand yet, inference on ANE and GPU is slower than inference on CPU". That's a performance bug in someone's code (I don't pretend to know whether it's Apple's or daylen's problem).

(hey guy who keeps insisting optimization can't possibly make a huge difference - here, a 5.5 TFLOPS (FP16) inferencing-optimized engine is losing to the M1 general purpose CPUs. Does that mean Apple's ANE is bad? Nah. Just means there's some problem which, once solved, should unlock much higher performance.)
 

quarkysg

macrumors 65816
Oct 12, 2019
1,247
841
None. We expect Apple to expose a native solution to us, like x86 intrinsics, because our network is very non-standard and requires manually written inference code from the ground up. We will not use any dependencies like this. We currently use NEON because that's the only thing natively accessible on Macs. To be honest, we also don't use any Windows/Linux-specific APIs for inference.

On a side note, the APIs might be too limiting for LC0 too. They have some custom stuff coded in, and AFAIK you can't get the flexibility of CUDA on Macs. At least for the "neural engine".
??
 
  • Like
Reactions: Leifi

Boil

macrumors 68040
Oct 23, 2018
3,478
3,173
Stargate Command
On a side note, the APIs might be too limiting for LC0 too. They have some custom stuff coded in, and AFAIK you can't get the flexibility of CUDA on Macs. At least for the "neural engine".


I actually think that what @Sopel says is very reasonable. If they use custom inference code, they can't really use Accelerate; it's not a drop-in replacement, for many reasons. Instead, efforts should be directed towards identifying potential issues with the CPU code on Apple ARM.
I think @quarkysg was questioning the "CUDA on macOS" part of the statement...?
 

quarkysg

macrumors 65816
Oct 12, 2019
1,247
841
I actually think that what @Sopel says is very reasonable. If they use custom inference code, they can't really use Accelerate; it's not a drop-in replacement, for many reasons. Instead, efforts should be directed towards identifying potential issues with the CPU code on Apple ARM.
Maybe I don't understand the intricacies, but refusing to use OS-dependent APIs on principle while relying on a closed API seems like a double standard to me.
 

leman

macrumors Core
Oct 14, 2008
19,521
19,678
Maybe I don't understand the intricacies, but refusing to use OS-dependent APIs on principle while relying on a closed API seems like a double standard to me.

Ah, I see what you mean. But Stockfish (what @Sopel referred to as „we“) does not use CUDA - it's CPU-only. The CUDA comment was regarding LC0, a different codebase that seems to rely on a more standard ML approach (they also use Apple Accelerate).
 
  • Like
Reactions: Appletoni and Sopel

the8thark

macrumors 601
Apr 18, 2011
4,628
1,735
This thread is a ****show of epic proportions, a literal Two Troll Circus...!
I agree totally.
What matters is how your workflows are improved. Benchmark numbers are not always an indicator of better workflow performance. Also, bragging about benchmark numbers and results is no different from a pissing contest.

If people want real-world data, they should follow or ask the people they trust about how similar workflows have improved, and maybe get some numbers for those workflows as quantifiable data for a comparison.
 
  • Like
Reactions: Boil

quarkysg

macrumors 65816
Oct 12, 2019
1,247
841
Ah, I see what you mean. But Stockfish (what @Sopel referred to as „we“) does not use CUDA - it's CPU-only. The CUDA comment was regarding LC0, a different codebase that seems to rely on a more standard ML approach (they also use Apple Accelerate).
I see. An apology is in order then. @Sopel, very sorry for not fully understanding your post.
 
  • Like
Reactions: Appletoni and Leifi

Appletoni

Suspended
Original poster
Mar 26, 2021
443
177
What do you base this belief on? What are you smoking?

40 performance cores of similar design, with similar overhead for buses, etc., would make the die... physically the largest desktop/laptop die on the market (like... 1100-1200 mm^2 plus?) with super-bad yield rates (thus a much higher price), much higher power consumption, and the requirement to down-clock it (impacting single-thread performance - which does matter) to fit within the thermal/power envelope of the target devices.

Quite likely the on-die fabric would be overwhelmed by bandwidth contention with that many cores on board, and also quite likely there wouldn't be sufficient DRAM bandwidth to feed it in any case.

And at the end of the day... they're winning with 8 P cores.


And surely, if Apple can do it, Intel or AMD with their far greater CPU manufacturing experience could do it also. Where's the 40-core Intel laptop CPU (or desktop, for that matter)?

?
Everything is possible ;)
I remember when people said a 2-core CPU was enough and 4 cores inside a notebook wasn't possible.
Intel has plans to go to 16-core CPUs inside notebooks, not only 8 cores.
For a long time now you have been able to buy notebooks with an AMD 3950 (16-core) CPU: https://www.xmg.gg/en/xmg-apex-15-e20/
And AMD has plans for a 32-core CPU inside notebooks, and much bigger Threadrippers for desktop/workstation, with more than 64 cores.
 

JMacHack

Suspended
Mar 16, 2017
1,965
2,424
One problem I have with this thread is not just the very annoying attitude displayed by the main actors but also how folks have been replying to them. The main claims were that a) the M1 is bad at running demanding SIMD code and b) it is only good on content creation benchmarks. To which people reply a) by stating that chess engines should start using Apple APIs to get good performance and b) by showing ... content creation benchmarks. This directly validates their position and reaffirms their belief that Apple Silicon is "fanboy hype".

The story in the end is very simple. Yes, Apple Silicon underperforms on a bunch of software suites that have not been properly tested on the platform or that we know run suboptimal code paths (at some point they were even bringing up Blender to aid their arguments, never mind that that particular version of Blender did not run natively...). So what? Nobody cares. This is like me claiming that Windows is crap for data science because some obscure piece of software I use only supports Unix.
I'd be more inclined toward the argument that "ASi is just bad at SIMD" if the bench results weren't an order of magnitude lower. If it were beaten by the competition by 50 or even 75%, then yeah, it might be right. But when the results are that bad, something's way off.

It doesn't help that the matter is deliberately brought up by a troll.

Everything is possible ;)
I remember when people said a 2-core CPU was enough and 4 cores inside a notebook wasn't possible.
Intel has plans to go to 16-core CPUs inside notebooks, not only 8 cores.
For a long time now you have been able to buy notebooks with an AMD 3950 (16-core) CPU: https://www.xmg.gg/en/xmg-apex-15-e20/
And AMD has plans for a 32-core CPU inside notebooks, and much bigger Threadrippers for desktop/workstation, with more than 64 cores.
If their power draw is increasing, as rumored, I have a difficult time seeing the point of shoving more cores down our throats. I'm of the sentiment that a laptop shouldn't roast my lap or need to be tethered to an outlet.

Also, the era you refer to had a similar problem: you could theoretically throw as many cores into a laptop as you like, but there's a point of diminishing returns. See: no G5 PowerBook.

On the topic of diminishing returns, having that many cores seems to only benefit video and 3D rendering in the benchmarks I've seen. This may change in the future, but 32 cores in a laptop seems beyond the pale. As other programmers have said to me: "1 woman can make a baby in 9 months, but 9 women can't make a baby in 1 month."
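
That baby-making line is basically Amdahl's law. Here is a minimal sketch of the math (the parallel fraction is illustrative, not a measurement of any real chip):

```cpp
// Amdahl's law: the formal version of the diminishing-returns point above.
// The 90% parallel fraction is an assumption for illustration only.
#include <cstdio>

// Speedup on `cores` cores for a workload whose parallel fraction is `p`.
double amdahl(double p, int cores) {
    return 1.0 / ((1.0 - p) + p / cores);
}

int main() {
    // Even at 90% parallel, 32 cores gets you ~7.8x, not 32x (ceiling: 10x).
    const int counts[] = {1, 8, 16, 32, 64};
    for (int cores : counts)
        std::printf("%2d cores -> %5.2fx\n", cores, amdahl(0.9, cores));
}
```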

And as an aside, Apple Silicon is very competitive with 16-core, 32-thread CPUs using only 10 cores (only 8 of them performance cores). So why would Apple need to reach core-count parity at all when they manage with nearly half the cores?
 

Taz Mangus

macrumors 604
Mar 10, 2011
7,815
3,504
I'd be more inclined toward the argument that "ASi is just bad at SIMD" if the bench results weren't an order of magnitude lower. If it were beaten by the competition by 50 or even 75%, then yeah, it might be right. But when the results are that bad, something's way off.

It doesn't help that the matter is deliberately brought up by a troll.


If their power draw is increasing, as rumored, I have a difficult time seeing the point of shoving more cores down our throats. I'm of the sentiment that a laptop shouldn't roast my lap or need to be tethered to an outlet.

Also, the era you refer to had a similar problem: you could theoretically throw as many cores into a laptop as you like, but there's a point of diminishing returns. See: no G5 PowerBook.

On the topic of diminishing returns, having that many cores seems to only benefit video and 3D rendering in the benchmarks I've seen. This may change in the future, but 32 cores in a laptop seems beyond the pale. As other programmers have said to me: "1 woman can make a baby in 9 months, but 9 women can't make a baby in 1 month."

And as an aside, Apple Silicon is very competitive with 16-core, 32-thread CPUs using only 10 cores (only 8 of them performance cores). So why would Apple need to reach core-count parity at all when they manage with nearly half the cores?
The power draw from the Intel and AMD CPUs will only climb as more CPU and GPU cores are added. More power draw means more heat, more fan noise, and the inability to achieve maximum performance running on battery alone. It seems to me that those laptops become desktops because they need to be connected to power most of the time.

Who would have imagined having a high-performance laptop that is quiet, runs cool, and can run on battery alone without needing to be plugged into a power outlet to achieve sustained performance?
 

jeanlain

macrumors 68020
Mar 14, 2009
2,462
956
Ah, I see what you mean. But Stockfish (what @Sopel referred to as „we“) does not use CUDA - it's CPU-only. The CUDA comment was regarding LC0, a different codebase that seems to rely on a more standard ML approach (they also use Apple Accelerate).
What all this also shows is that these chess engines are coded for certain platforms and that, at best, they will be ported to the M1/ARM, with performance penalties.
If these engines had initially been coded for ARM/NEON/Metal/Accelerate/Core ML, then porting them to CUDA and such might prove suboptimal.
That's what it costs to have a smaller market share.
 

bcortens

macrumors 65816
Aug 16, 2007
1,324
1,796
Canada
What all this also shows is that these chess engines are coded for certain platforms and that, at best, they will be ported to the M1/ARM, with performance penalties.
If these engines had initially been coded for ARM/NEON/Metal/Accelerate/Core ML, then porting them to CUDA and such might prove suboptimal.
That's what it costs to have a smaller market share.
I poked around the Stockfish code today, and the use of NEON is very basic while the code path for AVX is much more optimized. I've been tempted to play with the NEON algorithms and was also considering trying to link in Apple's Accelerate to use the simd_ instructions instead, but I haven't had time to do so.
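
To sketch what that experiment could look like: vDSP_dotpr is real Accelerate API, but the example below is illustrative only, not Stockfish code.

```cpp
// Minimal sketch of swapping a hand-rolled loop for an Accelerate call.
// vDSP_dotpr is the real Accelerate/vDSP API; the rest is illustrative.
// Build on macOS with: clang++ example.cpp -framework Accelerate
#include <Accelerate/Accelerate.h>
#include <cstdio>
#include <vector>

int main() {
    std::vector<float> a(1024, 0.5f), b(1024, 2.0f);
    float dot = 0.0f;
    // Single-precision dot product; the 1s are the strides through a and b.
    vDSP_dotpr(a.data(), 1, b.data(), 1, &dot, a.size());
    std::printf("dot = %.1f\n", dot);  // 1024 * (0.5 * 2.0) = 1024.0
}
```

Part of the appeal is that Accelerate routes to kernels tuned for the host CPU, including Apple's undocumented matrix hardware, which a hand-rolled NEON loop can't reach.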
 

cbum

macrumors member
Jun 16, 2015
57
42
Baltimore
I understand the frustration with some of the argumentation, but would like to point out that the discussion has been informative for a non-programmer like me.
 
  • Like
Reactions: Appletoni

throAU

macrumors G3
Feb 13, 2012
9,204
7,354
Perth, Western Australia
Everything is possible ;)
I remember when people said a 2-core CPU was enough and 4 cores inside a notebook wasn't possible.

Eventually, but we're talking about complaints with today's machines using today's hardware.

With today's leading-edge 5nm manufacturing tech, 40 cores would be roughly the size I described. Unless Intel were doing it, in which case multiply the power consumption by 1.5-2x.
 

Leifi

macrumors regular
Nov 6, 2021
128
121
Not merely underwhelming - daylen reports "For reasons I don't understand yet, inference on ANE and GPU is slower than inference on CPU". That's a performance bug in someone's code (I don't pretend to know whether it's Apple's or daylen's problem).

(hey guy who keeps insisting optimization can't possibly make a huge difference - here, a 5.5 TFLOPS (FP16) inferencing-optimized engine is losing to the M1 general purpose CPUs. Does that mean Apple's ANE is bad? Nah. Just means there's some problem which, once solved, should unlock much higher performance.)

Bugs and issues on proprietary, un-open devices can perhaps be fixed or improved over time, or Apple could get its s*t together and open up and document things better... but from there it's all guesses about "potential"... And for someone who cares about performance today, and not next year when other hardware exists... the M1 is a dud (for anything chess-related)...
 
  • Like
Reactions: Appletoni

Leifi

macrumors regular
Nov 6, 2021
128
121
Love this idea of finding an irrelevant benchmark where the Apple chip is not tops in performance and then declaring it inferior for all purposes. I mostly come here for the laughs.

I have not seen anything in this thread remotely like "declaring it inferior for all purposes"... on the contrary... single-threaded performance is great, TDP as well, and compared to many older CPUs it flies...

The OP was probably simply disappointed that, after all the hype, for some of the most demanding things you can put your CPU/GPU through (like Stockfish and AlphaZero-type chess engines), it is way below lower-priced competitors.

I think most people expected the M1, M1 Max, etc. to perform closer to the fanboy "hype factor" when running natively compiled versions of these kinds of high-performance apps.

For people who just use apps like Numbers, Pages, and Safari, this is of course completely irrelevant, but if you are into chess it's sad that Apple currently underperforms big-time!
 
  • Like
Reactions: Appletoni