I posted this on Reddit a while back:
What about SciPy? It seems to be much more complex to compile SciPy against Accelerate.
Could you benchmark NumPy-Accelerate against NumPy-OpenBLAS and check whether it has better multithreading performance? OpenBLAS seems to have poor multithreading performance.
Using OpenBLAS (Single-Threaded):
Dotted two 4096x4096 matrices in 2.90 s.
Dotted two vectors of length 524288 in 0.25 ms.
SVD of a 2048x1024 matrix in 0.81 s.
Cholesky decomposition of a 2048x2048 matrix in 0.11 s.
Eigendecomposition of a 2048x2048 matrix in 5.36 s.
Using OpenBLAS (Multi-Threaded):
Dotted two 4096x4096 matrices in 0.56 s.
Dotted two vectors of length 524288 in 0.26 ms.
SVD of a 2048x1024 matrix in 3.15 s.
Cholesky decomposition of a 2048x2048 matrix in 0.08 s.
Eigendecomposition of a 2048x2048 matrix in 10.17 s.
Using Accelerate:
Dotted two 4096x4096 matrices in 0.28 s.
Dotted two vectors of length 524288 in 0.11 ms.
SVD of a 2048x1024 matrix in 0.47 s.
Cholesky decomposition of a 2048x2048 matrix in 0.06 s.
Eigendecomposition of a 2048x2048 matrix in 4.98 s.
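To make the comparison easier to read, here is a small sketch that turns the timings above into speedup ratios. The numbers are copied straight from the three runs; the dictionary keys and the `speedup` helper are just names I made up for illustration, not part of any benchmark script.

```python
# Timings copied from the results above.
# Units: seconds, except "vecdot", which is in milliseconds.
# The short operation names are hypothetical labels, chosen for this sketch.
timings = {
    "openblas_single": {
        "matmul": 2.90, "vecdot": 0.25, "svd": 0.81,
        "cholesky": 0.11, "eig": 5.36,
    },
    "openblas_multi": {
        "matmul": 0.56, "vecdot": 0.26, "svd": 3.15,
        "cholesky": 0.08, "eig": 10.17,
    },
    "accelerate": {
        "matmul": 0.28, "vecdot": 0.11, "svd": 0.47,
        "cholesky": 0.06, "eig": 4.98,
    },
}


def speedup(baseline, other):
    """Return baseline_time / other_time per operation.

    A value above 1 means `other` is faster than `baseline`;
    a value below 1 means it is slower.
    """
    return {op: baseline[op] / other[op] for op in baseline}


if __name__ == "__main__":
    base = timings["openblas_single"]
    for name in ("openblas_multi", "accelerate"):
        ratios = speedup(base, timings[name])
        print(name, "vs openblas_single:")
        for op, ratio in ratios.items():
            print(f"  {op:<9} {ratio:5.2f}x")
```

Read this way, multi-threaded OpenBLAS is roughly 5x faster than single-threaded on the 4096x4096 matrix product, but roughly 3.9x slower on the SVD and 1.9x slower on the eigendecomposition, while Accelerate is at least as fast as single-threaded OpenBLAS on every operation listed.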