
leman

macrumors Core
Oct 14, 2008
19,521
19,675
1- Why is the GPU of the MacBook Pro/Max more efficient than the Nvidia GPU for some tasks and worse for others? Is the Apple GPU similar to ASIC instead of a general-purpose GPU?

My guess is suboptimal code in the TF backend. GEMM should perform fairly well on the M1 Max, especially if one uses the new matrix SIMD intrinsics.

2 - Should matrix multiplication run faster on the AMX coprocessor or the GPU?

Depends on the matrix and the hardware config. The M1 Max GPU should be faster for FP32 GEMM, but the AMX units might be faster for GEMM on the base M1, for example.
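As a rough way to put numbers on this, one can time a square FP32 GEMM from Python: NumPy dispatches the multiply to whatever BLAS it was built against (Accelerate routes large SGEMM calls through the AMX units on Apple silicon). A minimal sketch, with the matrix size chosen arbitrarily:

```python
# Rough FP32 GEMM timing sketch. The measured throughput reflects
# whichever BLAS backend this NumPy build is linked against.
import time
import numpy as np

n = 1024
a = np.random.rand(n, n).astype(np.float32)
b = np.random.rand(n, n).astype(np.float32)

a @ b  # warm-up so one-time initialization doesn't skew the timing

start = time.perf_counter()
c = a @ b
elapsed = time.perf_counter() - start

# A square GEMM performs 2*n^3 floating-point operations
gflops = (2 * n**3) / elapsed / 1e9
print(f"{n}x{n} SGEMM: {elapsed * 1000:.1f} ms, ~{gflops:.0f} GFLOPS")
```

Comparing the resulting GFLOPS figure across machines (or backends) gives a crude but useful picture of where the GEMM actually runs fastest.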
 
  • Like
Reactions: Xiao_Xi

Gnattu

macrumors 65816
Sep 18, 2020
1,106
1,668
similar to ASIC instead of a general-purpose GPU
It is hard to make ASIC-like hardware behave like a general-purpose GPU. So it is indeed a general-purpose GPU.

My guess is suboptimal code in the TF backend.
It is. The M1 Max only uses 12 watts in the test case included in that link. Such low power consumption indicates very low GPU pipeline utilization (a huge part of the GPU is actually idling, despite the reported 100% utilization).
 
  • Like
Reactions: Xiao_Xi

Bodhitree

macrumors 68020
Apr 5, 2021
2,085
2,216
Netherlands
Developing software that ships on a different architecture is actually a standard situation.
No, it is not. It is only typical when developing software for devices that can't themselves be used for software development (like smartphones).

Actually, in almost all situations where you are shipping a cross-platform product, most of the development is done on a single lead platform. It’s not unusual; many applications are developed this way.
 

Xiao_Xi

macrumors 68000
Oct 27, 2021
1,627
1,101
It seems Julia has some multithreading issues on M1. How about Dask or Modin?
 
Last edited:

project_2501

macrumors 6502a
Original poster
Jul 1, 2017
676
792
Aside from the number of GPU cores, are the M1 Pro and M1 Max equivalent for numerical computing workloads?

... do they have the same number of AMX matrix multipliers, for example?
 

leman

macrumors Core
Oct 14, 2008
19,521
19,675
Aside from the number of GPU cores, are the M1 Pro and M1 Max equivalent for numerical computing workloads?

... do they have the same number of AMX matrix multipliers, for example?

If you don’t do any GPU compute, they should be identical. From what we know, AMX is per CPU cluster (comprised of 4x P cores), so that should be identical between the two.
 
  • Like
Reactions: project_2501

ahurst

macrumors 6502
Oct 12, 2021
410
815
Surprised there hasn't been any discussion yet of the problems with Apple's Accelerate.framework and Numpy/Scipy, specifically that both projects have dropped official support for it due to buggy and very out-of-date BLAS and LAPACK implementations (Accelerate.framework supports the LAPACK 3.2.1 API, which was released in 2009). According to the maintainers they've had no luck getting Apple to take a look at the problems.

As someone who relies heavily on both packages (both directly and indirectly as dependencies of others) this sucks, because without a workaround it means we can't leverage the blazing speeds of the AMX to speed up matrix math in most Python code. The single-core performance is still fast enough that it's not a show-stopper, but it's something I really wish Apple would work with the devs to fix (like they did with TensorFlow and Blender).

Leman's right though that R and Stan performance are off the charts; the huge cache and single-core speed on these chips make for some big reductions in model run time (my M1 Pro smokes our lab's ~4-year-old dual-Xeon server by a factor of 2 on 6-chain models). I needed to install the experimental branch of RStan/StanHeaders from GitHub to get around a weird bug that broke adapt_delta, but other than that it's been smooth and fast sailing!
 

project_2501

macrumors 6502a
Original poster
Jul 1, 2017
676
792
Surprised there hasn't been any discussion yet of the problems with Apple's Accelerate.framework and Numpy/Scipy, specifically that both projects have dropped official support for it due to buggy and very out-of-date BLAS and LAPACK implementations (Accelerate.framework supports the LAPACK 3.2.1 API, which was released in 2009). According to the maintainers they've had no luck getting Apple to take a look at the problems.

As someone who relies heavily on both packages (both directly and indirectly as dependencies of others) this sucks, because without a workaround it means we can't leverage the blazing speeds of the AMX to speed up matrix math in most Python code. The single-core performance is still fast enough that it's not a show-stopper, but it's something I really wish Apple would work with the devs to fix (like they did with TensorFlow and Blender).

Leman's right though that R and Stan performance are off the charts; the huge cache and single-core speed on these chips make for some big reductions in model run time (my M1 Pro smokes our lab's ~4-year-old dual-Xeon server by a factor of 2 on 6-chain models). I needed to install the experimental branch of RStan/StanHeaders from GitHub to get around a weird bug that broke adapt_delta, but other than that it's been smooth and fast sailing!

It would be a real shame if Apple spoiled their advantage in hardware by being difficult to work with for open source projects such as scipy/numpy.

It would also be Apple shooting itself in the foot, because the open source movement is increasingly where the centre of mass for numerical work is - not with proprietary tools like MATLAB or with corporate-backed open source like Google's TensorFlow or Facebook's PyTorch.
 
  • Like
Reactions: ahurst

ahurst

macrumors 6502
Oct 12, 2021
410
815
It would be a real shame if Apple spoiled their advantage in hardware by being difficult to work with for open source projects such as scipy/numpy.

It would also be Apple shooting itself in the foot, because the open source movement is increasingly where the centre of mass for numerical work is - not with proprietary tools like MATLAB or with corporate-backed open source like Google's TensorFlow or Facebook's PyTorch.
Literally minutes after making that post, I actually found a pull request Apple made to NumPy earlier this year re-enabling Accelerate support, claiming that all the major show-stopping bugs have been fixed as of macOS 11.3. Glad to see them stepping up! Of course the LAPACK API version is still more than a decade old, but hopefully that's fixed soon so we can use Accelerate with SciPy as well.

Agreed that open source is the future of science, especially with an increasing (and important) emphasis on reproducibility of methods and analyses. Much of the shift away from SPSS to R in my field (cognitive science) has been due to the replication crisis and journals demanding reproducible code for analyses.

Also, I get that it has its place, but gosh do I ever hate MATLAB. A lot of the big fMRI/EEG/MEG analysis suites are written in that language for legacy reasons, and it's just so obtuse: it breaks so many basic conventions common to most modern languages and lacks any real package-management solution. Looking forward to a future where we don't have to deal with it anymore.
 

Xiao_Xi

macrumors 68000
Oct 27, 2021
1,627
1,101
It would be a real shame if Apple spoiled their advantage in hardware by being difficult to work for open source projects such as scipy/numpy.
Indeed, Apple "hampers" the development of open-source projects by imposing strict requirements on cloud service providers that want to offer macOS-based instances.

Open-source projects will move faster when GitHub offers cheap Apple silicon runners.
 
  • Like
Reactions: project_2501

leman

macrumors Core
Oct 14, 2008
19,521
19,675
Literally minutes after making that post, I actually found a pull request Apple made to Numpy earlier this year re-enabling Accelerate support, claiming that all the major show-stopping bugs have been fixed as of macOS 11.3. Glad to see them stepping up! Of course the LAPACK API version is still more than a decade old. but hopefully that's fixed soon so we can use Accelerate with SciPy as well.

Yes, it seems that Apple has finally started pouring some resources into these much-neglected areas. Hopefully they pick up the pace and retake the lead. They have some of the best numerical programmers in the world; it should not be too difficult for them to offer industry-leading libraries for scientific computation.
 
  • Like
Reactions: project_2501

Xiao_Xi

macrumors 68000
Oct 27, 2021
1,627
1,101
Yes, it seems that Apple has finally started pouring some resources into these much-neglected areas. Hopefully they pick up the pace and retake the lead. They have some of the best numerical programmers in the world; it should not be too difficult for them to offer industry-leading libraries for scientific computation.
Let's hope so. It seems some popular numerical libraries will not use Apple's Accelerate framework unless Apple fixes its LAPACK and BLAS libraries.

 

Xiao_Xi

macrumors 68000
Oct 27, 2021
1,627
1,101
General-purpose GPUs are ASICs.
It seems I don't understand well what an ASIC is or the differences between an ASIC and a GPU.

For me, an ASIC is specialized hardware that is very good at only one task. For instance, I would consider the media engine an ASIC if it weren't inside the M1.

For me, GPUs are general-purpose hardware because they are good at several tasks: gaming, scientific computing or 3D rendering.

@cmaier Can you explain why you think GPUs are ASICs?
 

mi7chy

macrumors G4
Oct 24, 2014
10,622
11,294
When it comes to mining, for example, an ASIC is a significant jump in performance but is specialized for a specific cryptocurrency, versus a general-purpose GPU that can mine multiple currencies. The ideal is a system with CPU+GPU+FPGA, which is where Intel and AMD are heading with the Altera and Xilinx acquisitions.
 
Last edited:

cmaier

Suspended
Jul 25, 2007
25,405
33,474
California
It seems I don't understand well what an ASIC is or the differences between an ASIC and a GPU.

For me, an ASIC is specialized hardware that is very good at only one task. For instance, I would consider the media engine an ASIC if it weren't inside the M1.

For me, GPUs are general-purpose hardware because they are good at several tasks: gaming, scientific computing or 3D rendering.

@cmaier Can you explain why you think GPUs are ASICs?

ASICS are chips that are “application specific” in the sense that they are designed for a particular use, sure. Historically, that just meant “it’s not a gate array. It’s not a CPU. It’s not a RAM. So it’s an ASIC!” Since ”application specific” can mean anything (it’s not a CPU, all it can do is handle graphics!), in the industry we use “ASIC” to refer to chips that are designed using a certain set of methodologies.
 
  • Like
Reactions: Xiao_Xi

Xiao_Xi

macrumors 68000
Oct 27, 2021
1,627
1,101
@leman How is the performance of Stan in M1?

Is Stan the standard in the industry? Are Python's PyMC3 or Julia's Turing viable alternatives?
 

leman

macrumors Core
Oct 14, 2008
19,521
19,675
@leman How is the performance of Stan in M1?

I don’t work with Stan myself but I’ve been told by my colleagues that it’s very good. A base M1 performs on par with (or slightly faster than) the Intel i9 16” MBP - I imagine the M1 Pro will be considerably faster.

Is Stan the standard in the industry? Are Python's PyMC3 or Julia's Turing viable alternatives?

Can’t really comment, not my area of expertise.

It seems NumPy doesn't use Apple's Accelerate framework by default, so you need to tweak the source code and compile it yourself. The last post explains how to do it.

Yeah, it’s a shame they don’t default to building with Accelerate. I hope this changes in the future. Of course, much of the fault is on Apple for shipping a buggy version of Accelerate for years. Still, it's surprising how low the performance of OpenBLAS is…
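For anyone unsure which backend their NumPy build actually links against, a quick check is to print the build configuration and look for the linked libraries in the output:

```python
# Print the build configuration of the installed NumPy: the
# BLAS/LAPACK section names the linked backend (e.g. "accelerate"
# or "openblas"), which tells you where matrix math really runs.
import numpy as np

np.show_config()
```

This is worth doing after any from-source build, since a missing Accelerate link silently falls back to a slower default.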
 
  • Like
Reactions: Xiao_Xi

ADGrant

macrumors 68000
Mar 26, 2018
1,689
1,059
You missed the part where supercomputers running Linux (and all sorts of workstations, servers, server farms, cloud services, etc.) are x86-based. Developing software for them on machines with a different architecture is amateurish at best (if possible at all).
Actually, developing software that only supports one platform could be considered amateurish. Professional software developers try not to tie their software to a single platform (though this does not always apply to UI code).
 

jinnyman

macrumors 6502a
Sep 2, 2011
762
671
Lincolnshire, IL
As much as I like my M1 Max, I'd use x64 with Nvidia CUDA for even simple prototyping.
I'm not an expert in data science, but I do my own personal project evaluating certain work-related data for better presentation at my job. I don't need huge performance or hardware, but I do need access to the vast array of community and reference information that CUDA has in abundance. I respect the effort behind the M1 and whatnot, but waiting while navigating uncharted waters is, imo, too inefficient.
 
  • Like
Reactions: Xiao_Xi

haginile

macrumors regular
Dec 13, 2006
102
74

Xiao_Xi

macrumors 68000
Oct 27, 2021
1,627
1,101
I've successfully compiled numpy against Accelerate
What about SciPy? It seems to be much more complex to compile SciPy against Accelerate.

performance is much better than openblas
Could you benchmark NumPy-Accelerate against NumPy-OpenBLAS and check whether it has better multithreading performance? OpenBLAS seems to have poor multithreading performance.
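One rough way to run such a comparison within a single install is to cap the BLAS thread count via environment variables and time the same GEMM at different caps. The variable names below are the ones OpenBLAS and Accelerate builds commonly honor, and they must be set before NumPy is first imported; treat this as a sketch, not a rigorous benchmark:

```python
# Hedged sketch: limit the BLAS backend's thread pool before
# importing NumPy, then time a GEMM. Re-run with different caps
# (or unset) to see how well the backend scales with threads.
import os

os.environ.setdefault("OPENBLAS_NUM_THREADS", "1")    # honored by OpenBLAS builds
os.environ.setdefault("VECLIB_MAXIMUM_THREADS", "1")  # honored by Accelerate builds

import time
import numpy as np

n = 1024
a = np.random.rand(n, n).astype(np.float32)
b = np.random.rand(n, n).astype(np.float32)
a @ b  # warm-up

start = time.perf_counter()
c = a @ b
elapsed = time.perf_counter() - start
print(f"GEMM with thread cap: {elapsed * 1000:.1f} ms")
```

Because the env vars are read at import time, each thread-count configuration needs a fresh Python process.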
 
Last edited: