
leman

macrumors Core
Oct 14, 2008
19,521
19,675
1- Why is the GPU of the MacBook Pro/Max more efficient than the Nvidia GPU for some tasks and worse for others? Is the Apple GPU similar to ASIC instead of a general-purpose GPU?

My guess is suboptimal code in the TF backend. GEMM should perform fairly well on the M1 Max, especially if one uses the new matrix SIMD intrinsics.

2 - Should matrix multiplication run faster on the AMX coprocessor or the GPU?

Depends on the matrix and the hardware config. The M1 Max GPU should be faster for FP32 GEMM, but the AMX units might be faster for GEMM on the base M1, for example.
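As a rough way to put numbers on this, one can time a square FP32 GEMM from Python: NumPy dispatches the multiply to whatever BLAS it was built against (Accelerate routes large SGEMM calls through the AMX units on Apple silicon). A minimal sketch, with the matrix size chosen arbitrarily:

```python
# Rough FP32 GEMM timing sketch. The measured throughput reflects
# whichever BLAS backend this NumPy build is linked against.
import time
import numpy as np

n = 1024
a = np.random.rand(n, n).astype(np.float32)
b = np.random.rand(n, n).astype(np.float32)

a @ b  # warm-up so one-time initialization doesn't skew the timing

start = time.perf_counter()
c = a @ b
elapsed = time.perf_counter() - start

# A square GEMM performs 2*n^3 floating-point operations
gflops = (2 * n**3) / elapsed / 1e9
print(f"{n}x{n} SGEMM: {elapsed * 1000:.1f} ms, ~{gflops:.0f} GFLOPS")
```

Comparing the resulting GFLOPS figure across machines (or backends) gives a crude but useful picture of where the GEMM actually runs fastest.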
 
  • Like
Reactions: Xiao_Xi

Gnattu

macrumors 65816
Sep 18, 2020
1,106
1,668
similar to ASIC instead of a general-purpose GPU
It is hard to make ASIC-like hardware behave like a general-purpose GPU. So it is indeed a general-purpose GPU.

My guess is suboptimal code in the TF backend.
It is. The M1 Max only uses 12 watts in the test case included in that link. Such low power consumption indicates very low GPU pipeline utilization (a huge part of the GPU is actually idling, despite the reported 100% utilization).
 
  • Like
Reactions: Xiao_Xi

Bodhitree

macrumors 68020
Apr 5, 2021
2,085
2,216
Netherlands
Developing software that ships on a different architecture is actually a standard situation.
No, it is not. It is only typical when developing software for devices that can't themselves be used for software development (like smartphones).

Actually, in almost all situations where you are shipping a cross-platform product, most of the development is done on a single lead platform. It’s not unusual; many applications are developed this way.
 

Xiao_Xi

macrumors 68000
Oct 27, 2021
1,627
1,101
It seems Julia has some multithreading issues on M1. How about Dask or Modin?
 
Last edited:

project_2501

macrumors 6502a
Original poster
Jul 1, 2017
676
792
Aside from the number of GPU cores, are the M1 Pro and M1 Max equivalent for numerical computing workloads?

... do they have the same number of AMX matrix multipliers, for example?
 

leman

macrumors Core
Oct 14, 2008
19,521
19,675
Aside from the number of GPU cores, are the M1 Pro and M1 Max equivalent for numerical computing workloads?

... do they have the same number of AMX matrix multipliers, for example?

If you don’t do any GPU compute, they should be identical. From what we know, AMX is per CPU cluster (comprised of 4x P cores), so that should be identical between the two.
 
  • Like
Reactions: project_2501

ahurst

macrumors 6502
Oct 12, 2021
410
815
Surprised there hasn't been any discussion yet of the problems with Apple's Accelerate.framework and Numpy/Scipy, specifically that both projects have dropped official support for it due to buggy and very out-of-date BLAS and LAPACK implementations (Accelerate.framework supports the LAPACK 3.2.1 API, which was released in 2009). According to the maintainers they've had no luck getting Apple to take a look at the problems.

As someone who relies heavily on both packages (both directly and indirectly as dependencies of others) this sucks, because without a workaround it means we can't leverage the blazing speeds of the AMX to speed up matrix math in most Python code. The single-core performance is still fast enough that it's not a show-stopper, but it's something I really wish Apple would work with the devs to fix (like they did with TensorFlow and Blender).

Leman's right though that R and Stan performance are off the charts; the huge cache and single-core speed on these chips make for some big reductions in model run time (my M1 Pro smokes our lab's ~4-year-old dual-Xeon server by a factor of 2 on 6-chain models). I needed to install the experimental branch of RStan/StanHeaders from GitHub to get around a weird bug that broke adapt_delta, but other than that it's been smooth and fast sailing!
 

project_2501

macrumors 6502a
Original poster
Jul 1, 2017
676
792
Surprised there hasn't been any discussion yet of the problems with Apple's Accelerate.framework and Numpy/Scipy, specifically that both projects have dropped official support for it due to buggy and very out-of-date BLAS and LAPACK implementations (Accelerate.framework supports the LAPACK 3.2.1 API, which was released in 2009). According to the maintainers they've had no luck getting Apple to take a look at the problems.

As someone who relies heavily on both packages (both directly and indirectly as dependencies of others) this sucks, because without a workaround it means we can't leverage the blazing speeds of the AMX to speed up matrix math in most Python code. The single-core performance is still fast enough that it's not a show-stopper, but it's something I really wish Apple would work with the devs to fix (like they did with TensorFlow and Blender).

Leman's right though that R and Stan performance are off the charts; the huge cache and single-core speed on these chips make for some big reductions in model run time (my M1 Pro smokes our lab's ~4-year-old dual-Xeon server by a factor of 2 on 6-chain models). I needed to install the experimental branch of RStan/StanHeaders from GitHub to get around a weird bug that broke adapt_delta, but other than that it's been smooth and fast sailing!

It would be a real shame if Apple spoiled their advantage in hardware by being difficult to work with for open source projects such as scipy/numpy.

It would also be Apple shooting itself in the foot, because the open source movement is increasingly where the centre of mass for numerical work is - not with proprietary tools like MATLAB or with corporate-backed open source like Google's TensorFlow or Facebook's PyTorch.
 
  • Like
Reactions: ahurst

ahurst

macrumors 6502
Oct 12, 2021
410
815
It would be a real shame if Apple spoiled their advantage in hardware by being difficult to work with for open source projects such as scipy/numpy.

It would also be Apple shooting itself in the foot, because the open source movement is increasingly where the centre of mass for numerical work is - not with proprietary tools like MATLAB or with corporate-backed open source like Google's TensorFlow or Facebook's PyTorch.
Literally minutes after making that post, I actually found a pull request Apple made to NumPy earlier this year re-enabling Accelerate support, claiming that all the major show-stopping bugs have been fixed as of macOS 11.3. Glad to see them stepping up! Of course the LAPACK API version is still more than a decade old, but hopefully that's fixed soon so we can use Accelerate with SciPy as well.

Agreed that open source is the future of science, especially with an increasing (and important) emphasis on reproducibility of methods and analyses. Much of the shift away from SPSS to R in my field (cognitive science) has been due to the replication crisis and journals demanding reproducible code for analyses.

Also, I get that it has its place, but gosh do I ever hate MATLAB. A lot of the big fMRI/EEG/MEG analysis suites are written in that language for legacy reasons, and it's just so obtuse: it breaks so many basic conventions common to most modern languages and lacks any real package-management solution. Looking forward to a future where we don't have to deal with it anymore.
 

Xiao_Xi

macrumors 68000
Oct 27, 2021
1,627
1,101
It would be a real shame if Apple spoiled their advantage in hardware by being difficult to work for open source projects such as scipy/numpy.
Indeed, Apple "hampers" the development of open-source projects by imposing strict requirements on cloud service providers that want to offer macOS-based instances.

Open-source projects will move faster when GitHub offers cheap Apple silicon runners.
 
  • Like
Reactions: project_2501

leman

macrumors Core
Oct 14, 2008
19,521
19,675
Literally minutes after making that post, I actually found a pull request Apple made to Numpy earlier this year re-enabling Accelerate support, claiming that all the major show-stopping bugs have been fixed as of macOS 11.3. Glad to see them stepping up! Of course the LAPACK API version is still more than a decade old. but hopefully that's fixed soon so we can use Accelerate with SciPy as well.

Yes, it seems that Apple has finally started pouring some resources into these much-neglected areas. Hopefully they pick up the pace and retake the lead. They have some of the best numerical programmers in the world; it should not be too difficult for them to offer industry-leading libraries for scientific computation.
 
  • Like
Reactions: project_2501

Xiao_Xi

macrumors 68000
Oct 27, 2021
1,627
1,101
Yes, it seems that Apple has finally started pouring some resources into these much-neglected areas. Hopefully they pick up the pace and retake the lead. They have some of the best numerical programmers in the world; it should not be too difficult for them to offer industry-leading libraries for scientific computation.
Let's hope so. It seems some popular numerical libraries will not use Apple's Accelerate framework unless Apple fixes its LAPACK and BLAS libraries.

 

Xiao_Xi

macrumors 68000
Oct 27, 2021
1,627
1,101
General-purpose GPUs are ASICs.
It seems I don't understand well what an ASIC is or the differences between an ASIC and a GPU.

For me, an ASIC is specialized hardware that is very good at only one task. For instance, I would consider the media engine an ASIC if it weren't inside the M1.

For me, GPUs are general-purpose hardware because they are good at several tasks: gaming, scientific computing or 3D rendering.

@cmaier Can you explain why you think GPUs are ASICs?
 

mi7chy

macrumors G4
Oct 24, 2014
10,622
11,294
When it comes to mining, for example, an ASIC is a significant jump in performance but is specialized for a specific cryptocurrency, versus a general-purpose GPU that can mine multiple currencies. The ideal is a system with CPU+GPU+FPGA, which is where Intel and AMD are heading with the Altera and Xilinx acquisitions.
 
Last edited:

cmaier

Suspended
Jul 25, 2007
25,405
33,474
California
It seems I don't understand well what an ASIC is or the differences between an ASIC and a GPU.

For me, an ASIC is specialized hardware that is very good at only one task. For instance, I would consider the media engine an ASIC if it weren't inside the M1.

For me, GPUs are general-purpose hardware because they are good at several tasks: gaming, scientific computing or 3D rendering.

@cmaier Can you explain why you think GPUs are ASICs?

ASICS are chips that are “application specific” in the sense that they are designed for a particular use, sure. Historically, that just meant “it’s not a gate array. It’s not a CPU. It’s not a RAM. So it’s an ASIC!” Since ”application specific” can mean anything (it’s not a CPU, all it can do is handle graphics!), in the industry we use “ASIC” to refer to chips that are designed using a certain set of methodologies.
 
  • Like
Reactions: Xiao_Xi

Xiao_Xi

macrumors 68000
Oct 27, 2021
1,627
1,101
@leman How is the performance of Stan in M1?

Is Stan the standard in the industry? Are Python's PyMC3 or Julia's Turing viable alternatives?
 

leman

macrumors Core
Oct 14, 2008
19,521
19,675
@leman How is the performance of Stan in M1?

I don’t work with Stan myself but I’ve been told by my colleagues that it’s very good. A base M1 performs on par with (or slightly faster than) the Intel i9 16” MBP - I imagine the M1 Pro will be considerably faster.

Is Stan the standard in the industry? Are Python's PyMC3 or Julia's Turing viable alternatives?

Can’t really comment, not my area of expertise.

It seems NumPy doesn't use Apple's Accelerate framework by default, so you need to tweak the source code and compile it yourself. The last post explains how to do it.

Yeah, it’s a shame they don’t default to building with Accelerate. I hope this changes in the future. Of course, much of the fault is on Apple for shipping a buggy version of Accelerate for years. Still, it's surprising how low the performance of OpenBLAS is…
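For anyone unsure which backend their NumPy build actually links against, a quick check is to print the build configuration and look for the linked libraries in the output:

```python
# Print the build configuration of the installed NumPy: the
# BLAS/LAPACK section names the linked backend (e.g. "accelerate"
# or "openblas"), which tells you where matrix math really runs.
import numpy as np

np.show_config()
```

This is worth doing after any from-source build, since a missing Accelerate link silently falls back to a slower default.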
 
  • Like
Reactions: Xiao_Xi

ADGrant

macrumors 68000
Mar 26, 2018
1,689
1,059
You missed the part where supercomputers running Linux (and all sorts of workstations, servers, server farms, cloud services, etc.) are x86-based. Developing software for them on machines with a different architecture is amateurish at best (if possible at all).
Actually, developing software that only supports one platform could be considered amateurish. Professional software developers try not to tie their software to a single platform (though this does not always apply to UI code).
 

jinnyman

macrumors 6502a
Sep 2, 2011
762
671
Lincolnshire, IL
As much as I like my M1 Max, I'd use x64 with Nvidia CUDA for even simple prototyping.
I'm not an expert in data science, but I do my own personal project evaluating certain work-related data for better presentation at my job. I don't need huge performance or hardware, but I do need access to the vast array of community and reference information that CUDA has in abundance. I respect the effort behind the M1 and whatnot, but waiting while navigating uncharted waters is, imo, too inefficient.
 
  • Like
Reactions: Xiao_Xi

haginile

macrumors regular
Dec 13, 2006
102
74

Xiao_Xi

macrumors 68000
Oct 27, 2021
1,627
1,101
I've successfully compiled numpy against Accelerate
What about SciPy? It seems to be much more complex to compile SciPy against Accelerate.

performance is much better than openblas
Could you benchmark NumPy-Accelerate against NumPy-OpenBLAS and check whether it has better multithreading performance? OpenBLAS seems to have poor multithreading performance.
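One rough way to run such a comparison within a single install is to cap the BLAS thread count via environment variables and time the same GEMM at different caps. The variable names below are the ones OpenBLAS and Accelerate builds commonly honor, and they must be set before NumPy is first imported; treat this as a sketch, not a rigorous benchmark:

```python
# Hedged sketch: limit the BLAS backend's thread pool before
# importing NumPy, then time a GEMM. Re-run with different caps
# (or unset) to see how well the backend scales with threads.
import os

os.environ.setdefault("OPENBLAS_NUM_THREADS", "1")    # honored by OpenBLAS builds
os.environ.setdefault("VECLIB_MAXIMUM_THREADS", "1")  # honored by Accelerate builds

import time
import numpy as np

n = 1024
a = np.random.rand(n, n).astype(np.float32)
b = np.random.rand(n, n).astype(np.float32)
a @ b  # warm-up

start = time.perf_counter()
c = a @ b
elapsed = time.perf_counter() - start
print(f"GEMM with thread cap: {elapsed * 1000:.1f} ms")
```

Because the env vars are read at import time, each thread-count configuration needs a fresh Python process.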
 
Last edited: