any intel-M1 comparisons on data science workloads?

haginile · Jan 31, 2022

Xiao_Xi said:
What about SciPy? It seems to be much more complex to compile SciPy against Accelerate.

Could you benchmark the performance of NumPy-Accelerate with NumPy-OpenBLAS and check if it has better multithreading performance? It seems that OpenBLAS has a bad multithreading performance.

I posted this on reddit a while back:

Using OpenBLAS (single-threaded):

Dotted two 4096x4096 matrices in 2.90 s.
Dotted two vectors of length 524288 in 0.25 ms.
SVD of a 2048x1024 matrix in 0.81 s.
Cholesky decomposition of a 2048x2048 matrix in 0.11 s.
Eigendecomposition of a 2048x2048 matrix in 5.36 s.

Using OpenBLAS (multi-threaded):

Dotted two 4096x4096 matrices in 0.56 s.
Dotted two vectors of length 524288 in 0.26 ms.
SVD of a 2048x1024 matrix in 3.15 s.
Cholesky decomposition of a 2048x2048 matrix in 0.08 s.
Eigendecomposition of a 2048x2048 matrix in 10.17 s.

Using Accelerate:

Dotted two 4096x4096 matrices in 0.28 s.
Dotted two vectors of length 524288 in 0.11 ms.
SVD of a 2048x1024 matrix in 0.47 s.
Cholesky decomposition of a 2048x2048 matrix in 0.06 s.
Eigendecomposition of a 2048x2048 matrix in 4.98 s.
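For anyone who wants to reproduce numbers like these, here is a minimal sketch of a timing harness. It is not the script that produced the results above; the function names (`time_call`, `run_benchmarks`), the random inputs, and the best-of-three timing are my own assumptions. Only the matrix and vector sizes are taken from the post.

```python
import time

import numpy as np


def time_call(fn, repeats=3):
    """Return the best wall-clock time in seconds over `repeats` calls."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best


def run_benchmarks(n=4096, vec_len=524288, seed=0, repeats=3):
    """Time the five operations listed in the post (sizes scale with n)."""
    rng = np.random.default_rng(seed)
    a = rng.standard_normal((n, n))
    b = rng.standard_normal((n, n))
    u = rng.standard_normal(vec_len)
    v = rng.standard_normal(vec_len)
    half = n // 2
    rect = rng.standard_normal((half, half // 2))  # 2048x1024 when n=4096
    sq = rng.standard_normal((half, half))         # 2048x2048 when n=4096
    # Cholesky needs a symmetric positive definite input.
    spd = sq @ sq.T + half * np.eye(half)

    return {
        "matmul": time_call(lambda: a @ b, repeats),
        "dot": time_call(lambda: u @ v, repeats),
        "svd": time_call(lambda: np.linalg.svd(rect, full_matrices=False), repeats),
        "cholesky": time_call(lambda: np.linalg.cholesky(spd), repeats),
        "eig": time_call(lambda: np.linalg.eig(sq), repeats),
    }


if __name__ == "__main__":
    for name, seconds in run_benchmarks().items():
        print(f"{name:>8}: {seconds:.4f} s")
```

With an OpenBLAS build, setting the `OPENBLAS_NUM_THREADS=1` environment variable before starting Python should give the single-threaded case. Absolute timings will depend on the machine and on the NumPy build, so treat the output as a relative comparison only.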

haginile · Feb 22, 2022

This article might be of interest. The power consumption part is super interesting as well.

crazy dave · Feb 22, 2022

haginile said:
This article might be of interest. The power consumption part is super interesting as well.

URL broken for me

Andropov · Feb 22, 2022

crazy dave said:
URL broken for me

Probably meant this article

Andropov · Feb 22, 2022

falainber said:
You missed the part where supercomputers running Linux (and all sorts of workstations, servers, server farms, cloud services etc.) are x86 based. Developing software for them on machines with a different architecture is amateurish at best (if possible at all).

The top 4 fastest supercomputers in the world are not x86/amd64 based.

Sydde · Feb 22, 2022

Andropov said:
The top 4 fastest supercomputers in the world are not x86/amd64 based.

In fact, the most powerful one seems to be some sort of ARM cluster. However, the #6-10 are x86. Getting time on one of these beasts is a major challenge, and the one you can actually use will most likely be EPYC or Xeon.

Still, compilers run on the machine they run on and do not really care if their output will not. Building for a SC is like building for anything else. Good luck with the debug cycle though, even if you are targeting the same architecture – once you have a zillion threads running at the same time, you damn well better have them properly synchronized or you will end up with an impressive heap of garbage really fast.

leman · Feb 23, 2022

Sydde said:
In fact, the most powerful one seems to be some sort of ARM cluster. However, the #6-10 are x86. Getting time on one of these beasts is a major challenge, and the one you can actually use will most likely be EPYC or Xeon.

Fugaku is a weird one, as it is often mentioned as some sort of proof of ARM's technological superiority and advancement, but the simple fact is that the A64FX is a fairly crappy CPU even by the crappy CPU standards. It is a niche product with 512-bit vector ALUs, more of a GPU than a CPU really, and is optimised for data-parallel scientific computation. Now pack thousands of those things together and you get the fastest supercomputer ever. The rest of the features follow suit: HBM2 RAM, advanced reliability features etc. In short, it's not something you'd want in your desktop computer (not even a workstation one), and it's not a technology that can be meaningfully applied to everyday prosumer computing. A modern iPhone will run circles around the A64FX for most things people do on their computers.

Andropov · Feb 23, 2022

leman said:
Fugaku is a weird one, as it is often mentioned as some sort of proof of ARM's technological superiority and advancement, but the simple fact is that the A64FX is a fairly crappy CPU even by the crappy CPU standards. It is a niche product with 512-bit vector ALUs, more of a GPU than a CPU really, and is optimised for data-parallel scientific computation.

Saying it's more of a GPU than a CPU is a bit too extreme. It's true that it's optimised for SIMD-heavy scientific computation, but that's actually fine. A lot of scientific simulations are parallel enough that massive SIMD capabilities are a big plus, but they don't comply with the strict control flow requirements a GPU needs to be performant (branching is the main reason that comes to mind), nor are they parallelizable for the entire routine (e.g. a tight loop that is trivially vectorized, followed by some operation on the result of the loop that isn't).

(Not useful for an everyday computer, for sure).
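A toy illustration of that pattern, as a rough sketch only: the function name `simulate`, its parameters, and the formulas are all made up for the example and do not come from any real simulation code. Stage 1 is element-wise with a per-element branch, the kind of work wide SIMD units handle well. Stage 2 is a nonlinear recurrence in which every step needs the previous result, so it runs serially on the same core.

```python
import math

import numpy as np


def simulate(forcing, threshold=0.0, damping=0.9):
    """Two-stage toy kernel: a data-parallel stage, then a serial stage."""
    forcing = np.asarray(forcing, dtype=float)

    # Stage 1: every element is independent. np.where computes both
    # candidate results and selects per element, which is the usual way a
    # branch is expressed in vectorized code.
    response = np.where(
        forcing > threshold,
        np.sqrt(np.abs(forcing)),
        -(forcing ** 2),
    )

    # Stage 2: each step depends on the previous accumulator value, so the
    # loop cannot simply be spread across vector lanes.
    state = np.empty_like(response)
    acc = 0.0
    for i, r in enumerate(response):
        acc = math.tanh(damping * acc + r)
        state[i] = acc
    return response, state


if __name__ == "__main__":
    response, state = simulate([4.0, -2.0, 0.5])
    print(response)
    print(state)
```

On a CPU with wide vector units both stages stay on the same core and share its caches. On a GPU the second stage would either run poorly or force a round trip back to the host.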

januarydrive7 · Feb 23, 2022

leman said:
Fugaku is a weird one, as it is often mentioned as some sort of proof of ARM's technological superiority and advancement, but the simple fact is that the A64FX is a fairly crappy CPU even by the crappy CPU standards. It is a niche product with 512-bit vector ALUs, more of a GPU than a CPU really, and is optimised for data-parallel scientific computation. Now pack thousands of those things together and you get the fastest supercomputer ever. The rest of the features follow suit: HBM2 RAM, advanced reliability features etc. In short, it's not something you'd want in your desktop computer (not even a workstation one), and it's not a technology that can be meaningfully applied to everyday prosumer computing. A modern iPhone will run circles around the A64FX for most things people do on their computers.

To be fair, M1 Pro/Max are more GPU than CPU 😅

Xiao_Xi · Jul 18, 2022

Is Apple's Accelerate framework still based on an old LAPACK and BLAS library?
Does it affect the performance of SciPy and NumPy?
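One way to at least see which BLAS/LAPACK a given NumPy build is linked against is `np.show_config()`. A small sketch follows; the helper name `blas_lapack_report` is my own, and the exact format of the output differs between NumPy versions, so read it by eye rather than parsing it.

```python
import contextlib
import io

import numpy as np


def blas_lapack_report():
    """Capture the build information that np.show_config() prints."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        np.show_config()
    return buffer.getvalue()


if __name__ == "__main__":
    print("NumPy", np.__version__)
    # Look for names such as "openblas" or "accelerate" in the report.
    print(blas_lapack_report())
```

This only tells you what the build was linked against, not how recent the underlying LAPACK is.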

Sydde · Jul 26, 2022

januarydrive7 said:
To be fair, M1 Pro/Max are more GPU than CPU 😅

But not really by all that much. The Pro Core complex is more than two-thirds the size (die area) of the GPU complex. And if the M-series cores implement SVE (theoretically, up to 2048-bit vectors), that is some major CPU.

Xiao_Xi · Jul 28, 2022

How does the Julia programming language compare to Python and Matlab for data science/numerical computation?

It seems that libraries can access Apple's GPU through Metal.jl as they do with Nvidia GPUs through CUDA.jl.

Sterkenburg · Jul 28, 2022

Xiao_Xi said:
How does the Julia programming language compare to Python and Matlab for data science/numerical computation?

It seems that libraries can access Apple's GPU through Metal.jl as they do with Nvidia GPUs through CUDA.jl.

Interesting. I dabbled with Julia years ago, in my previous life as a computing enthusiast working in academia. We had invited Karpinski and Bezanson to give a workshop at our department. At the time the language was still on version 0.4 or so, and they were mostly marketing it as a better alternative to Matlab for scientists (i.e. equally easy to write but faster).

But then I switched to the industry and nowadays mostly use Python for ML and data science work. I think the language has evolved considerably since then, so it would be interesting to hear the perspective of someone who uses it in actual projects.

Xiao_Xi · Aug 18, 2022

The Julia developers seem to have fixed the issues on Apple silicon, and Julia 1.8 runs smoothly.

Julia 1.8 Highlights

Highlights of the Julia 1.8 release.

julialang.org
