
istvan60

macrumors newbie
Original poster
Nov 17, 2020
I know there are tons of posts with benchmarks and tests of whether a given piece of software works, but I would like to dedicate this thread to data science apps, e.g. R, RStudio, SPSS, Python (I know it will work natively), QGIS, plus additional software that researchers use to create illustrations, e.g. CorelDraw, GIMP...

Have any of you tried R or SPSS on a new M1 MacBook? Does either of them work fine under Rosetta 2 (faster / slower / not at all)? I suppose neither has a native ARM version yet. Also, has anyone tried CorelDraw?

Don't hesitate to post new questions and answers, but please focus on data science apps.

There are useful discussions related to the topic here, but no answers yet...
 
According to this article at The Register, R is awaiting a Fortran compiler before it will work on Apple silicon. The Julia language is tracking its porting work here.
 
I know there are tons of posts with benchmarks and tests whether a given software works, but I would like to dedicate this thread to data science apps e.g. R, RStudio, SPSS, Python (I know it will work natively), QGIS and additional software that researchers use to create illustrations as e.g. CorelDraw, Gimp....

R works great for me under Rosetta. I also managed to build it natively, but some packages are blocked (especially geoscience stuff).

Hi!
A native Fortran compiler for Apple silicon is out.
https://www.nag.com/news/first-fortran-compiler-apple-silicon-macs

However, it would be very nice to see tests of R running under Rosetta 2.


Commercial compiler availability doesn't really help. We need gcc (nobody bothered to port it until now because there was little interest in running it on iOS) or Flang (which will still take at least half a year). Preliminary versions of gcc are available, though, and they seem to work well enough.
 
R works great for me under Rosetta. I also managed to build it natively, but some packages are blocked (especially geoscience stuff).




Commercial compiler availability doesn't really help. We need gcc (nobody bothered to port it until now because there was little interest in running it on iOS) or Flang (which will still take at least half a year). Preliminary versions of gcc are available, though, and they seem to work well enough.
Glad to hear that! And did you notice any change in performance? I usually run RAM-hungry scripts with tons of iterations, so both processing speed and RAM handling are an issue. I have doubts about performance under Rosetta 2...

Any opinion on that?
 
Glad to hear that! And did you notice any change in performance? I usually run RAM-hungry scripts with tons of iterations, so both processing speed and RAM handling are an issue. I have doubts about performance under Rosetta 2...

Any opinion on that?

R on M1 under Rosetta is faster than R running on my 16" MBP. But I suppose it will depend on what you do. I have not tested it under circumstances that will require very high amounts of RAM. If you have any tests or scripts you want to try out, I can gladly do it, just send me a message.
 

Under:
R version 4.0.3 (2020-10-10)
Platform: x86_64-apple-darwin17.0 (64-bit)


install.packages("benchmarkme")
install.packages("benchmarkmeData")
library(benchmarkme)

get_cpu()
#"Intel(R) Core(TM) i5-7360U CPU @ 2.30GHz"

get_ram()
#17.2 GB

res = benchmark_std()

# Programming benchmarks (5 tests):
3,500,000 Fibonacci numbers calculation (vector calc): 0.263 (sec).
Grand common divisors of 1,000,000 pairs (recursion): 0.682 (sec).
Creation of a 3,500 x 3,500 Hilbert matrix (matrix calc): 0.338 (sec).
Creation of a 3,000 x 3,000 Toeplitz matrix (loops): 1.17 (sec).
Escoufier's method on a 60 x 60 matrix (mixed): 1.21 (sec).

# Matrix calculation benchmarks (5 tests):
Creation, transp., deformation of a 5,000 x 5,000 matrix: 0.629 (sec).
2,500 x 2,500 normal distributed random matrix^1,000: 0.149 (sec).
Sorting of 7,000,000 random values: 0.706 (sec).
2,500 x 2,500 cross-product matrix (b = a' * a): 10.8 (sec).
Linear regr. over a 5,000 x 500 matrix (c = a \ b'): 0.837 (sec).

# Matrix function benchmarks (5 tests):
Cholesky decomposition of a 3,000 x 3,000 matrix: 5.47 (sec).
Determinant of a 2,500 x 2,500 random matrix: 6.64 (sec).
Eigenvalues of a 640 x 640 random matrix: 0.916 (sec).
FFT over 2,500,000 random values: 0.322 (sec).
Inverse of a 1,600 x 1,600 random matrix: 5.4 (sec).

upload_results(res)
Creating temporary file
Getting system specs. This can take a while on Macs
Uploading results
Upload complete
Tracking id: 2020-11-29-22273922
[1] "2020-11-29-22273922"

plot(res)
You are ranked 183 out of 749 machines.
Press return to get next plot
You are ranked 467 out of 747 machines.
Press return to get next plot
You are ranked 660 out of 747 machines.




I am really curious how yours will turn out.
 

Here are the results running under the native R

Code:
R version 4.0.3 (2020-10-10) -- "Bunny-Wunnies Freak Out"
Copyright (C) 2020 The R Foundation for Statistical Computing
Platform: arm-apple-darwin20.2.0 (64-bit)

> res <- benchmarkme::benchmark_std()
# Programming benchmarks (5 tests):
    3,500,000 Fibonacci numbers calculation (vector calc): 0.112 (sec).
    Grand common divisors of 1,000,000 pairs (recursion): 0.26 (sec).
    Creation of a 3,500 x 3,500 Hilbert matrix (matrix calc): 0.146 (sec).
    Creation of a 3,000 x 3,000 Toeplitz matrix (loops): 0.597 (sec).
    Escoufier's method on a 60 x 60 matrix (mixed): 0.598 (sec).
# Matrix calculation benchmarks (5 tests):
    Creation, transp., deformation of a 5,000 x 5,000 matrix: 0.245 (sec).
    2,500 x 2,500 normal distributed random matrix^1,000: 0.113 (sec).
    Sorting of 7,000,000 random values: 0.606 (sec).
    2,500 x 2,500 cross-product matrix (b = a' * a): 9.54 (sec).
    Linear regr. over a 5,000 x 500 matrix (c = a \ b'): 0.799 (sec).
# Matrix function benchmarks (5 tests):
    Cholesky decomposition of a 3,000 x 3,000 matrix: 5.21 (sec).
    Determinant of a 2,500 x 2,500 random matrix: 1.78 (sec).
    Eigenvalues of a 640 x 640 random matrix: 0.476 (sec).
    FFT over 2,500,000 random values: 0.113 (sec).
    Inverse of a 1,600 x 1,600 random matrix: 1.45 (sec).

I have the feeling that matrix operations need a bit more optimisation. I'm sure they can go faster.
 

P.S. And here are the results running under Rosetta. It seems that Rosetta has some problems with the R interpreter (doesn't surprise me), so if you are relying on a lot of algorithms implemented in pure R, you might run into problems. I don't think it's a common scenario though.

Another observation is that some matrix operations run faster under Rosetta than native. I would speculate that this is because the Intel codegen emits SIMD code while the ARM build does not. It is either a problem with the early Fortran port, or maybe the source is optimised for Intel but not for ARM. I am sure that once this is fixed, we should get a healthy boost running natively as well.

Code:
R version 4.0.3 (2020-10-10) -- "Bunny-Wunnies Freak Out"
Copyright (C) 2020 The R Foundation for Statistical Computing
Platform: x86_64-apple-darwin17.0 (64-bit)

benchmarkme::get_cpu()
sysctl: unknown oid 'machdep.cpu.vendor'
$vendor_id
character(0)
attr(,"status")
[1] 1

$model_name
[1] "VirtualApple @ 2.50GHz processor"

$no_of_cores
[1] 8



# Programming benchmarks (5 tests):
    3,500,000 Fibonacci numbers calculation (vector calc): 0.177 (sec).
    Grand common divisors of 1,000,000 pairs (recursion): 4.67 (sec).
    Creation of a 3,500 x 3,500 Hilbert matrix (matrix calc): 0.205 (sec).
    Creation of a 3,000 x 3,000 Toeplitz matrix (loops): 1.21 (sec).
    Escoufier's method on a 60 x 60 matrix (mixed): 66.5 (sec).
# Matrix calculation benchmarks (5 tests):
    Creation, transp., deformation of a 5,000 x 5,000 matrix: 0.416 (sec).
    2,500 x 2,500 normal distributed random matrix^1,000: 0.15 (sec).
    Sorting of 7,000,000 random values: 0.62 (sec).
    2,500 x 2,500 cross-product matrix (b = a' * a): 7.24 (sec).
    Linear regr. over a 5,000 x 500 matrix (c = a \ b'): 0.609 (sec).
# Matrix function benchmarks (5 tests):
    Cholesky decomposition of a 3,000 x 3,000 matrix: 4 (sec).
    Determinant of a 2,500 x 2,500 random matrix: 1.99 (sec).
    Eigenvalues of a 640 x 640 random matrix: 0.446 (sec).
    FFT over 2,500,000 random values: 0.12 (sec).
    Inverse of a 1,600 x 1,600 random matrix: 1.61 (sec).
 
leman, thanks very much for your report.

Are you using R configured with the reference BLAS/LAPACK linear algebra libraries? Much faster matrix operations are possible by switching to the libraries from Apple's Accelerate framework, shipped as part of macOS.

See here: https://mpopov.com/blog/2019/06/04/faster-matrix-math-in-r-on-macos/

It would be very informative to see benchmarks from your M1 machine using the Apple Accelerate BLAS/LAPACK libraries.
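For anyone who wants to try this, the switch amounts to repointing one symlink. A sketch, assuming the standard CRAN R 4.0 framework install (the blog post linked above walks through the same steps; adjust the version directory if yours differs):

```shell
# Point R's BLAS at the vecLib (Accelerate-backed) build that CRAN's
# macOS distribution ships alongside the reference BLAS.
cd /Library/Frameworks/R.framework/Resources/lib
ln -sf libRblas.vecLib.dylib libRblas.dylib

# To revert to the reference BLAS later:
# ln -sf libRblas.0.dylib libRblas.dylib
```

Restart R afterwards and check the `BLAS:` line in `sessionInfo()` to confirm which library is in use.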

Here are the same benchmarks from my computer, a mid-2014 MacBook Pro, using the accelerate libraries. Some of the matrix operations are more than an order of magnitude faster than the results posted above for the M1 or the 2017 MBP.


> sessionInfo()
R version 4.0.2 (2020-06-22)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Catalina 10.15.7

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib

...

> get_cpu()
$vendor_id
[1] "GenuineIntel"

$model_name
[1] "Intel(R) Core(TM) i7-4870HQ CPU @ 2.50GHz"

$no_of_cores
[1] 8

> get_ram()
17.2 GB

> res = benchmark_std()
# Programming benchmarks (5 tests):
3,500,000 Fibonacci numbers calculation (vector calc): 0.277 (sec).
Grand common divisors of 1,000,000 pairs (recursion): 0.616 (sec).
Creation of a 3,500 x 3,500 Hilbert matrix (matrix calc): 0.407 (sec).
Creation of a 3,000 x 3,000 Toeplitz matrix (loops): 1.35 (sec).
Escoufier's method on a 60 x 60 matrix (mixed): 0.902 (sec).
# Matrix calculation benchmarks (5 tests):
Creation, transp., deformation of a 5,000 x 5,000 matrix: 0.702 (sec).
2,500 x 2,500 normal distributed random matrix^1,000: 0.157 (sec).
Sorting of 7,000,000 random values: 0.639 (sec).
2,500 x 2,500 cross-product matrix (b = a' * a): 0.188 (sec).
Linear regr. over a 5,000 x 500 matrix (c = a \ b'): 0.0213 (sec).
# Matrix function benchmarks (5 tests):
Cholesky decomposition of a 3,000 x 3,000 matrix: 0.22 (sec).
Determinant of a 2,500 x 2,500 random matrix: 0.164 (sec).
Eigenvalues of a 640 x 640 random matrix: 0.344 (sec).
FFT over 2,500,000 random values: 0.25 (sec).
Inverse of a 1,600 x 1,600 random matrix: 0.318 (sec).
 
leman, thanks very much for your report.

Are you using R configured with the reference BLAS/LAPACK linear algebra libraries? Much faster matrix operations are possible by switching to the libraries from Apple's Accelerate framework, shipped as part of macOS.

I tried both back then and didn't get much difference except in some isolated tests. Another thing to consider is that the Apple-provided BLAS/LAPACK are outdated and have known issues; the R team actively discourages their use. If you are in the business of multiplying very large matrices and you need to do it fast and often, you are better off using a custom package that offloads the task to the GPU (the M1 has dedicated support for that). There is also AMX, but that's a very strange beast, and I don't know how general-purpose it is.
 
Using Rosetta on a base M1 MacBook Air, I got almost the same results as leman, except that it correctly detects the model_name.

> get_cpu()
sysctl: unknown oid 'machdep.cpu.vendor'
$vendor_id
character(0)
attr(,"status")
[1] 1

$model_name
[1] "Apple M1"

$no_of_cores
[1] 8

Warning message:
In system("sysctl -n machdep.cpu.vendor", intern = TRUE) :
running command 'sysctl -n machdep.cpu.vendor' had status 1
> get_ram()
8.59 GB
> res = benchmark_std()
# Programming benchmarks (5 tests):
3,500,000 Fibonacci numbers calculation (vector calc): 0.184 (sec).
Grand common divisors of 1,000,000 pairs (recursion): 4.74 (sec).
Creation of a 3,500 x 3,500 Hilbert matrix (matrix calc): 0.222 (sec).
Creation of a 3,000 x 3,000 Toeplitz matrix (loops): 1.26 (sec).
Escoufier's method on a 60 x 60 matrix (mixed): 66.5 (sec).
# Matrix calculation benchmarks (5 tests):
Creation, transp., deformation of a 5,000 x 5,000 matrix: 0.418 (sec).
2,500 x 2,500 normal distributed random matrix^1,000: 0.147 (sec).
Sorting of 7,000,000 random values: 0.625 (sec).
2,500 x 2,500 cross-product matrix (b = a' * a): 7.18 (sec).
Linear regr. over a 5,000 x 500 matrix (c = a \ b'): 0.605 (sec).
# Matrix function benchmarks (5 tests):
Cholesky decomposition of a 3,000 x 3,000 matrix: 3.99 (sec).
Determinant of a 2,500 x 2,500 random matrix: 1.99 (sec).
Eigenvalues of a 640 x 640 random matrix: 0.441 (sec).
FFT over 2,500,000 random values: 0.129 (sec).
Inverse of a 1,600 x 1,600 random matrix: 1.63 (sec).

For the Apple BLAS, I can't find libBLAS.dylib in either /Library or /System/Library.
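That is expected on Big Sur: system dylibs were merged into the dyld shared cache, so the individual files no longer exist on disk, even though linking against the framework paths still works. A quick way to see this (using the usual Accelerate vecLib path):

```shell
# On macOS 11+, system libraries live in the dyld shared cache, not on disk:
ls /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
# The listing fails with "No such file or directory", yet linking or
# dlopen() against that exact path still succeeds, because dyld resolves
# it from the shared cache rather than the filesystem.
```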
 
And here is the benchmark running from native R:

> library(benchmarkme)
> get_cpu()
sysctl: unknown oid 'machdep.cpu.vendor'
$vendor_id
character(0)
attr(,"status")
[1] 1

$model_name
[1] "Apple M1"

$no_of_cores
[1] 8

Warning message:
In system("sysctl -n machdep.cpu.vendor", intern = TRUE) :
running command 'sysctl -n machdep.cpu.vendor' had status 1
> get_ram()
8.59 GB
> res = benchmark_std()
# Programming benchmarks (5 tests):
3,500,000 Fibonacci numbers calculation (vector calc): 0.111 (sec).
Grand common divisors of 1,000,000 pairs (recursion): 0.282 (sec).
Creation of a 3,500 x 3,500 Hilbert matrix (matrix calc): 0.162 (sec).
Creation of a 3,000 x 3,000 Toeplitz matrix (loops): 0.583 (sec).
Escoufier's method on a 60 x 60 matrix (mixed): 0.413 (sec).
# Matrix calculation benchmarks (5 tests):
Creation, transp., deformation of a 5,000 x 5,000 matrix: 0.248 (sec).
2,500 x 2,500 normal distributed random matrix^1,000: 0.107 (sec).
Sorting of 7,000,000 random values: 0.607 (sec).
2,500 x 2,500 cross-product matrix (b = a' * a): 0.135 (sec).
Linear regr. over a 5,000 x 500 matrix (c = a \ b'): 0.045 (sec).
# Matrix function benchmarks (5 tests):
Cholesky decomposition of a 3,000 x 3,000 matrix: 0.233 (sec).
Determinant of a 2,500 x 2,500 random matrix: 0.409 (sec).
Eigenvalues of a 640 x 640 random matrix: 0.618 (sec).
FFT over 2,500,000 random values: 0.0953 (sec).
Inverse of a 1,600 x 1,600 random matrix: 0.363 (sec).
> plot(res)
You are ranked 1 out of 93 machines.
Press return to get next plot
You are ranked 1 out of 93 machines.
Press return to get next plot
You are ranked 69 out of 93 machines.
>
 