
haginile

macrumors regular
Dec 13, 2006
102
74
What about SciPy? It seems much harder to compile SciPy against Accelerate.


Could you benchmark the performance of NumPy-Accelerate against NumPy-OpenBLAS and check whether it has better multithreading performance? OpenBLAS seems to have poor multithreaded performance.
I posted this on reddit a while back:


Using OpenBLAS (Single-Threaded):

Dotted two 4096x4096 matrices in 2.90 s.
Dotted two vectors of length 524288 in 0.25 ms.
SVD of a 2048x1024 matrix in 0.81 s.
Cholesky decomposition of a 2048x2048 matrix in 0.11 s.
Eigendecomposition of a 2048x2048 matrix in 5.36 s.

Using OpenBLAS (Multi-Threaded):

Dotted two 4096x4096 matrices in 0.56 s.
Dotted two vectors of length 524288 in 0.26 ms.
SVD of a 2048x1024 matrix in 3.15 s.
Cholesky decomposition of a 2048x2048 matrix in 0.08 s.
Eigendecomposition of a 2048x2048 matrix in 10.17 s.

Using Accelerate:

Dotted two 4096x4096 matrices in 0.28 s.
Dotted two vectors of length 524288 in 0.11 ms.
SVD of a 2048x1024 matrix in 0.47 s.
Cholesky decomposition of a 2048x2048 matrix in 0.06 s.
Eigendecomposition of a 2048x2048 matrix in 4.98 s.
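
For reference, here is a minimal sketch of the kind of script that produces output in this format (the matrix and vector sizes are taken from the results above; the iteration counts and the positive-definite construction for the Cholesky input are my own choices). numpy.show_config() reports which BLAS/LAPACK the build is linked against.

Code:
import time
import numpy as np

np.show_config()  # which BLAS/LAPACK is this NumPy linked against?

rng = np.random.default_rng(0)
A = rng.random((4096, 4096))
B = rng.random((4096, 4096))
v = rng.random(524288)
w = rng.random(524288)
C = rng.random((2048, 1024))
S = rng.random((2048, 2048))
P = S @ S.T + 2048 * np.eye(2048)  # symmetric positive definite, for Cholesky

def bench(label, fn, n=3, unit='s'):
    t = time.perf_counter()
    for _ in range(n):
        fn()
    dt = (time.perf_counter() - t) / n
    scale, suffix = (1e3, 'ms') if unit == 'ms' else (1.0, 's')
    print('%s in %.2f %s.' % (label, dt * scale, suffix))

bench('Dotted two 4096x4096 matrices', lambda: A @ B)
bench('Dotted two vectors of length 524288', lambda: v @ w, n=1000, unit='ms')
bench('SVD of a 2048x1024 matrix', lambda: np.linalg.svd(C, full_matrices=False))
bench('Cholesky decomposition of a 2048x2048 matrix', lambda: np.linalg.cholesky(P))
bench('Eigendecomposition of a 2048x2048 matrix', lambda: np.linalg.eig(S))

For the single- vs multi-threaded OpenBLAS comparison, run it once with OPENBLAS_NUM_THREADS=1 set in the environment and once without.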
 

Sydde

macrumors 68030
Aug 17, 2009
2,563
7,061
IOKWARDI
In fact, the most powerful one seems to be some sort of ARM cluster. However, the #6-10 are x86. Getting time on one of these beasts is a major challenge, and the one you can actually use will most likely be EPYC or Xeon.

Still, compilers run on whatever machine they run on and do not really care whether their output will run on that same machine. Building for a SC is like building for anything else. Good luck with the debug cycle, though, even if you are targeting the same architecture – once you have a zillion threads running at the same time, you had damn well better have them properly synchronized or you will end up with an impressive heap of garbage really fast.
 
  • Like
Reactions: bobcomer

leman

macrumors Core
Oct 14, 2008
19,521
19,675
In fact, the most powerful one seems to be some sort of ARM cluster. However, the #6-10 are x86. Getting time on one of these beasts is a major challenge, and the one you can actually use will most likely be EPYC or Xeon.

Fugaku is a weird one, as it is often mentioned as some sort of proof of ARM's technological superiority and advancement, but the simple fact is that the A64FX is a fairly crappy CPU even by crappy CPU standards. It is a niche product with 512-bit vector ALUs, more of a GPU than a CPU really, and is optimised for data-parallel scientific computation. Now pack thousands of those things together and you get the fastest supercomputer ever. The rest of the features follow suit: HBM2 RAM, advanced reliability features, etc. In short, it's not something you'd want in your desktop computer (not even a workstation), and it's not a technology that can be meaningfully applied to everyday prosumer computing. A modern iPhone will run circles around the A64FX for most things people do on their computers.
 
  • Like
Reactions: bobcomer

Andropov

macrumors 6502a
May 3, 2012
746
990
Spain
Fugaku is a weird one, as it is often mentioned as some sort of proof of ARM's technological superiority and advancement, but the simple fact is that the A64FX is a fairly crappy CPU even by crappy CPU standards. It is a niche product with 512-bit vector ALUs, more of a GPU than a CPU really, and is optimised for data-parallel scientific computation.
Saying it's more of a GPU than a CPU is a bit too extreme. It's true that it's optimised for SIMD-heavy scientific computation, but that's actually fine. A lot of scientific simulations are parallel enough that massive SIMD capability is a big plus, but they don't comply with the strict control-flow requirements a GPU needs to be performant (branching is the main issue that comes to mind), nor is the entire routine parallelizable (i.e. sections with a tight loop that is trivially vectorized, followed by some operation on the loop's result that isn't).

(Not useful for an everyday computer, for sure).
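
A toy NumPy sketch of that pattern (the arithmetic is made up purely for illustration; only the shape of the computation matters): the array sweep is trivially vectorized and benefits from wide SIMD units, while the step-size control that follows is scalar, branchy, and data-dependent, which is exactly the kind of control flow GPUs dislike.

Code:
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(1_000_000)  # simulation state
v = np.zeros_like(x)
dt = 1e-3

for step in range(100):
    # Tight, trivially vectorized loop: one SIMD-friendly sweep over the data.
    f = -x + 0.1 * np.sin(x)

    # Sequential, data-dependent control flow on the result of that sweep --
    # cheap on a CPU, but it breaks the uniform control flow a GPU wants.
    err = np.abs(f).max()
    if err > 10.0:
        dt *= 0.5   # unstable: halve the step and retry
        continue
    if err < 0.1:
        dt *= 1.1   # smooth region: grow the step

    v += dt * f
    x += dt * v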
 
  • Like
Reactions: crazy dave

januarydrive7

macrumors 6502a
Oct 23, 2020
537
578
Fugaku is a weird one, as it is often mentioned as some sort of proof of ARM's technological superiority and advancement, but the simple fact is that the A64FX is a fairly crappy CPU even by crappy CPU standards. It is a niche product with 512-bit vector ALUs, more of a GPU than a CPU really, and is optimised for data-parallel scientific computation. Now pack thousands of those things together and you get the fastest supercomputer ever. The rest of the features follow suit: HBM2 RAM, advanced reliability features, etc. In short, it's not something you'd want in your desktop computer (not even a workstation), and it's not a technology that can be meaningfully applied to everyday prosumer computing. A modern iPhone will run circles around the A64FX for most things people do on their computers.
To be fair, M1 Pro/Max are more GPU than CPU?
 

Xiao_Xi

macrumors 68000
Oct 27, 2021
1,627
1,101
Is Apple's Accelerate framework still based on old LAPACK and BLAS versions?
Does it affect the performance of SciPy and NumPy?
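
For what it's worth, you can ask the linked LAPACK for its version at runtime. A small sketch using LAPACK's ILAVER routine, which SciPy exposes as scipy.linalg.lapack.ilaver:

Code:
import numpy as np
from scipy.linalg import lapack

np.show_config()  # which BLAS/LAPACK was this NumPy built against?

# ILAVER is LAPACK's own version-query routine.
major, minor, patch = lapack.ilaver()
print('Linked LAPACK version: %d.%d.%d' % (major, minor, patch))

As far as I know, Accelerate reported LAPACK 3.2.x for years, which is a big part of why SciPy dropped Accelerate support; NumPy mainly needs BLAS, so it was less affected.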
 

Sydde

macrumors 68030
Aug 17, 2009
2,563
7,061
IOKWARDI
To be fair, M1 Pro/Max are more GPU than CPU?

But not really by all that much. On the Pro, the CPU core complex is more than two-thirds the size (die area) of the GPU complex. And if the M-series cores implement SVE (theoretically, up to 2048-bit vectors), that is some major CPU.
 

Xiao_Xi

macrumors 68000
Oct 27, 2021
1,627
1,101
How does the Julia programming language compare to Python and Matlab for data science/numerical computation?

It seems that libraries can access Apple's GPU through Metal.jl as they do with Nvidia GPUs through CUDA.jl.

 

Sterkenburg

macrumors 6502a
Oct 27, 2016
556
553
Japan
How does the Julia programming language compare to Python and Matlab for data science/numerical computation?

It seems that libraries can access Apple's GPU through Metal.jl as they do with Nvidia GPUs through CUDA.jl.
Interesting. I dabbled with Julia years ago, in my previous life as a computing enthusiast working in academia. We had invited Karpinski and Bezanson to give a workshop at our department; at the time the language was still on version 0.4 or so, and they were mostly marketing it as a better alternative to Matlab for scientists (i.e. equally easy to write but faster).

But then I moved to industry and nowadays mostly use Python for ML and data science work. Julia has evolved considerably since then, so it would be interesting to hear the perspective of someone who uses it in actual projects.
 