Apple Silicon in Sciences

HiddenPaul · Dec 22, 2022

I made this thread because I thought sites regarding Apple Silicon Macs regarding science is lacking and I thought this thread would be a helpful place for any future scientists or current ones to get info regarding Apple Silicon Macs.

Please post only things regarding the usage of Apple Silicon Macs in science. Post articles, github repositories, sites, etc. regarding this topic.

I'm gonna start with this:
Apple Silicon Performance in Scientific Computing

PDF version:

https://arxiv.org/pdf/2211.00720.pdf

theorist9 · Dec 22, 2022

I created two Mathematica benchmarks, and sent them to two posters on another forum who have M1 Max's. These calculate the %difference in wall clock runtime between whatever they're run on and my 2019 i9 iMac (see config details below). They were run using Mathematica 13.0.1 (current version is 13.1).

The tables below show the results from one M1 Max user; the results for the other user didn't differ significantly.

It's been opined that, to the extent Mathematica doesn't do well on AS vs. Intel, it's because AS's math libraries aren't as well optimized as Intel's MKL. Thus my upper table consists entirely of symbolic tasks and, indeed, all of these are faster on AS than my i9 iMac. However, they are not that much faster. You can see the M1 Max averages only 2% to 18% faster for these suites of symbolic computations.

The lower table features a graphing suite, where AS was 21% faster. I also gave it an image-processing task, and it was 46% slower, possibly because it uses numeric computations.

Details:

Symbolic benchmark: Consists of six suites of tests: Three integration suites, a simplify suite, a solve suite, and a miscellaneous suite. There are a total of 58 calculations. On my iMac, this takes 37 min, so an average of ~40s/calculation. It produces a summary table at the end, which shows the percentage difference in run time between my iMac whatever device it's run on. Most of these calculations appear to be are single-core only (Wolfram Kernel shows ~100% CPU in Activity Monitor). However, the last one (polynomial expansion) appears to be multi-core (CPU ~ 500%).

Graphing and image processing benchmark: Consists of five graphs (2D and 3D) and one set of image processing tasks (processing an image taken by JunoCam, which is the public-outreach wide-field visible-light camera on NASA’s Juno Jupiter orbiter). It takes 2 min. on my 2019 i9 iMac. As with the above, it produces a summary table at the end. The four graphing tasks appear to be single-core only (Wolfram Kernel shows ~100% CPU in Activity Monitor). However, the imaging processing task appears to be multi-core (CPU ~ 250% – 400%).

Here's how the percent differences in the summary tables are calculated (ASD = Apple Silicon Device, or whatever computer it's run on):

% difference = (ASD time/(average of ASD time and iMac time) – 1)*100.

Thus if the iMac takes 100 s, and the ASD takes 50 s, the ASD would get a value of –33, meaning the ASD is 33% faster; if the ASD takes 200 s, it would get a value of 33, meaning it is 33% slower. By dividing by the average of the iMac and ASD times, we get the same absolute percentage difference regardless of whether the two-fold difference goes in one direction or the other. For instance, If we instead divided by the iMac time, we'd get 50% faster and 100% slower, respectively, for the above two examples.

I also provide a mean and standard deviation for the percentages from each suite of tests. I decided to average the percentages rather than the times so that all processes within a test suite are weighted equally, i.e., so that processes with long run times don't dominate.

iMac details:
2019 27" iMac (19,1), i9-9900K (8 cores, Coffee Lake, 3.6 GHz/5.0 GHz), 32 GB DDR4-2666 RAM, Radeon Pro 580X (8 GB GDDR5)
Mathematica 13.0.1
MacOS Monterey 12.4

Realityck · Dec 22, 2022

A stack of Apple laptops could work as a powerful supercomputer

Supercomputers are expensive to buy and run, but connecting up lots of consumer Apple computers that contain M1 chips may be able to do the same job for less

www.newscientist.com

Researchers who need powerful computers to run their experiments have been moving away from monolithic supercomputers, which are expensive to build and maintain. Instead, they have often preferred clusters of devices originally designed as graphics cards for desktop computers.

theorist9 · Dec 22, 2022

Realityck said:
A stack of Apple laptops could work as a powerful supercomputer

Supercomputers are expensive to buy and run, but connecting up lots of consumer Apple computers that contain M1 chips may be able to do the same job for less

www.newscientist.com

Do you have a version of that article that's not behind a paywall?

theorist9 · Dec 22, 2022

HiddenPaul said:
I made this thread because I thought sites regarding Apple Silicon Macs regarding science is lacking and I thought this thread would be a helpful place for any future scientists or current ones to get info regarding Apple Silicon Macs.

Please post only things regarding the usage of Apple Silicon Macs in science. Post articles, github repositories, sites, etc. regarding this topic.

I'm gonna start with this:
Apple Silicon Performance in Scientific Computing

PDF version:

https://arxiv.org/pdf/2211.00720.pdf

Suggestion: Instead of just linking articles, it would be useful if posters could summarize their essential content.

Xiao_Xi · Dec 22, 2022

Can anyone explain why the M1 using OpenCL outperforms Nvidia data center GPUs using CUDA? Are those Nvidia GPUs that weak in single precision?

leman · Dec 22, 2022

I am alarmed about how such low quality research is being published and disseminated. Not only are the authors using ancient benchmarks and get basic hardware specs horribly wrong, but their results are entirely nonsensical as well. There is no way in hell that an M1 with its 2.6TFLOPs will outperform an A100 on GEMM or FFT by a factor of over 10.

Please, delete this thread and don’t even discuss this stuff. This is just embarrassing.

richmlow · Dec 22, 2022

leman said:
I am alarmed about how such low quality research is being published and disseminated. Not only are the authors using ancient benchmarks and get basic hardware step horribly wrong, but their results are entirely nonsensical as well. There is no way in hell that an M1 with its 2.6TFLOPs will outperform an A100 on GEMM or FFT by a factor of over 10.

Please, delete this thread and don’t even discuss this stuff. This is just embarrassing.

Indeed. Preprints (including the paper on AS and scientific computing) on arXiv typically have NOT been officially
peer-reviewed and published in reputable academic journals.

As such, I view preprints with a healthy dose of skepticism.

In my discipline (mathematics), I've seen outlandish (and wrong) preprints on arXiv purporting to have "simple" proofs of Fermat's Last Theorem, the Goldbach conjecture, the Collatz conjecture, 4-color Map theorem, etc.

richmlow

Zest28 · Dec 22, 2022

Someone did a test running Linux on a PC with a 12th gen Intel CPU, and it was able to beat a M1 Max in performance with Python.

And I don't see M1 competing against heavy computation that is usually done on cloud services.

iPadified · Dec 23, 2022

leman said:
I am alarmed about how such low quality research is being published and disseminated. Not only are the authors using ancient benchmarks and get basic hardware specs horribly wrong, but their results are entirely nonsensical as well. There is no way in hell that an M1 with its 2.6TFLOPs will outperform an A100 on GEMM or FFT by a factor of over 10.

Please, delete this thread and don’t even discuss this stuff. This is just embarrassing.

Is that your scientifically funded argument? Please find the articles that shows the opposite to the preprint.

leman · Dec 23, 2022

iPadified said:
Is that your scientifically funded argument? Please find the articles that shows the opposite to the preprint.

Are you honestly telling me that the critical thinking ability of an average MR poster is so low that they would eat up any kind of drivel as long as it’s neatly wrapped and made serious-looking? Come on, we both know that you can do better.

Regarding the article. Just look at the graphs. They claim that (base) M1 achieves over 10 TFLOPs in GEMM and 5000 TFLOPs in FFT. That’s a GPU with 2.6 TFLOPs peak theoretical throughput we are talking about. And it’s supposed to be 10 to 1000 times faster than the beefy A100 with its 20TFLOPs? What? And they even report numbers like 3x10^17 FLOPs for M1, what??? You know that the fastest supercomputer in the world is slower than that? And then they go like “oh, it’s shared memory and zero-copy”. Utterly ridiculous.

I have no idea whether the article is supposed to be a joke or the authors were simply incredibly lazy and turned off their brains when writing this. Zero research, zero effort, maximum nonsense. At any rate, this is beyond embarrassing. Especially since they put their names on it.

P.S. By the way, results for Nvidia GPUs are plausible and in line with other benchmarks and tests. The results for Apple Silicon are massively overblown. My completely random guess: the OpenCL kernels bugged out and didn’t even run, hence reported arbitrary execution time numbers. I’ve seen this myself with some older OpenCL benchmarks. Poor quality software plus poor quality writing = recipe for disaster.

iPadified · Dec 23, 2022

leman said:
Are you honestly telling me that the critical thinking ability of an average MR poster is so low that they would eat up any kind of drivel as long as it’s neatly wrapped and made serious-looking? Come on, we both know that you can do better.

Regarding the article. Just look at the graphs. They claim that (base) M1 achieves over 10 TFLOPs in GEMM and 5000 TFLOPs in FFT. That’s a GPU with 2.6 TFLOPs peak theoretical throughput we are talking about. And it’s supposed to be 10 to 1000 times faster than the beefy A100 with its 20TFLOPs? What? And they even report numbers like 3x10^17 FLOPs for M1, what??? You know that the fastest supercomputer in the world is slower than that? And then they go like “oh, it’s shared memory and zero-copy”. Utterly ridiculous.

I have no idea whether the article is supposed to be a joke or the authors were simply incredibly lazy and turned off their brains when writing this. Zero research, zero effort, maximum nonsense. At any rate, this is beyond embarrassing. Especially since they put their names on it.

P.S. By the way, results for Nvidia GPUs are plausible and in line with other benchmarks and tests. The results for Apple Silicon are massively overblown. My completely random guess: the OpenCL kernels bugged out and didn’t even run, hence reported arbitrary execution time numbers. I’ve seen this myself with some older OpenCL benchmarks. Poor quality software plus poor quality writing = recipe for disaster.

I agree the paper is of poor quality and results are questionable. That is beside the point. The scientific approach would be to show data that refutes their results and preferably explain their results such as a bug.

Assuming something is wrong because it goes against the knowledge or politics (of the time) is a very dangerous path to take. A recent example is global warming and a few hundred years ago you literally lost your head if you claimed the world is round. There are plenty of such examples in the literature.

leman · Dec 23, 2022

iPadified said:
I agree the paper is of poor quality and results are questionable. That is beside the point. The scientific approach would be to show data that refutes their results and preferably explain their results such as a bug.

Not my job. I have demonstrated that their results are nonsensical and that’s the end of it as far as I’m concerned. What is scientific about overanalyzing nonsensical result? I’m not their supervisor nor their teacher. If someone from my group would dare publish this kind of stuff they‘d probably be fired in an instant.

iPadified said:
Assuming something is wrong because it goes against the knowledge or politics (of the time) is a very dangerous path to take. A recent example is global warming and a few hundred years ago you literally lost your head if you claimed the world is round. There are plenty of such examples in the literature.

Isn’t that a bit of a straw-man argument though? I fully agree with what you say here, but, as old Soviet joke goes, “sometimes a banana is just a banana”. I am not attacking this “paper” because it claims that M1 is faster than Nvidia, I am attacking it because it claims that M1 is many orders of magnitude faster than it’s maximal theoretical performance (which I have personally measured and confirmed), in addition to messing up basic things like the GPU specs.

Lihp8270 · Dec 23, 2022

Zest28 said:
Someone did a test running Linux on a PC with a 12th gen Intel CPU, and it was able to beat a M1 Max in performance with Python.

And I don't see M1 competing against heavy computation that is usually done on cloud services.

Low power mobile chip loses against desktop cpu.

Low power mobile chip can’t compete against cloud computing.

Neither of these should be a surprise.

iPadified · Dec 23, 2022

leman said:
Not my job. I have demonstrated that their results are nonsensical and that’s the end of it as far as I’m concerned. What is scientific about overanalyzing nonsensical result? I’m not their supervisor nor their teacher. If someone from my group would dare publish this kind of stuff they‘d probably be fired in an instant.

Isn’t that a bit of a straw-man argument though? I fully agree with what you say here, but, as old Soviet joke goes, “sometimes a banana is just a banana”. I am not attacking this “paper” because it claims that M1 is faster than Nvidia, I am attacking it because it claims that M1 is many orders of magnitude faster than it’s maximal theoretical performance (which I have personally measured and confirmed), in addition to messing up basic things like the GPU specs.

You asked to delete the thread because the paper went agains your opinion or experience. By all means tear it to pieces but the thread should not be deleted. By the way there might be other papers about ASi and scientific computing that might be interesting. Nice counterpoint to video editing and 3D GPU rendering.

In my group it would not been sent in before we could explained what was going on as the results are too good to be true. First question I would ask if the axes really are correct. Even linear scale would be too good to be true but more plausible.

leman · Dec 23, 2022

iPadified said:
You asked to delete the thread because the paper went agains your opinion or experience.

I asked to delete this thread because threads like these will be used to make fun of Mac users. Do we really need another discussion to fuel the "clueless Mac user" meme?

iPadified said:
In my group it would not been sent in before we could explained what was going on as the results are too good to be true. First question I would ask if the axes really are correct. Even linear scale would be too good to be true but more plausible.

Exactly! If your benchmark tells you that a passively cooled laptop is ten times faster than the fastest supercomputer in the world, it's time to put the keyboard away and look very very hard at your methodology. And not proceed to write these numbers in your paper and call them "impressive, but significantly larger than plausible". What were they even thinking?

Philip Turner · Dec 23, 2022

I've been benchmarking the M1 GPU family in scientific computational chemistry software, and it's very bad. An Nvidia 2080 has 10 TFLOPS and 400 GB/s, same as the M1 Max. Yet on OpenMM, the 2080 runs 200 ns/day and the M1 Max runs 50 ns/day. The same issue is happening with GROMACS and other software in that field. 4x performance drop with the same theoretical power.

I have to reverse-engineer the M1 GPU architecture just to start debugging these performance issues. However, with enough work an M1 GPU could at least have equal performance to an equivalent Nvidia GPU. A bigger issue is that many high-performance computing frameworks use CUDA, HIP, and sometimes SYCL, but never OpenCL. On top of that, Apple's OpenCL driver is severely under optimized because they want you using Metal. The hardware also doesn't support 64-bit atomics, although I'm working on an emulation library to fix that. Hopefully it has performance competitive to CPU FP64.

MayaUser · Dec 23, 2022

I think miss-leading topics should be erased because some people read this and buy because of this

Realityck · Dec 23, 2022

Zest28 said:
Someone did a test running Linux on a PC with a 12th gen Intel CPU, and it was able to beat a M1 Max in performance with Python.

And I don't see M1 competing against heavy computation that is usually done on cloud services.

Few years back I remember people stopped using python on Intel platform because of some issues. If you wanted to code something to run the fastest speeds would you dare use python which is a high level interpreter language on a ARM platform versus intel platform in the first place? There are lots of speed examples that would be considered more reliable than running two interpreter results on different processor platforms.

Xiao_Xi · Dec 23, 2022

MayaUser said:
I think miss-leading topics should be erased because some people read this and buy because of this

I would keep the thread as criticism of the paper has value. Anyone can read the paper, so reasonable criticism can help people understand the paper better and decide accordingly.

Anyway, I doubt anyone would buy hardware that can't do double precision to do scientific calculations.

jdb8167 · Dec 23, 2022

Realityck said:
Few years back I remember people stopped using python on Intel platform because of some issues. If you wanted to code something to run the fastest speeds would you dare use python which is a high level interpreter language on a ARM platform versus intel platform in the first place? There are lots of speed examples that would be considered more reliable than running two interpreter results on different processor platforms.

Python is fine as a way of running compute on a GPU. The Python code is slow but that speed is lost in the overall time it takes to do the computation on the GPU.

leman · Dec 23, 2022

Philip Turner said:
I've been benchmarking the M1 GPU family in scientific computational chemistry software, and it's very bad. An Nvidia 2080 has 10 TFLOPS and 400 GB/s, same as the M1 Max. Yet on OpenMM, the 2080 runs 200 ns/day and the M1 Max runs 50 ns/day. The same issue is happening with GROMACS and other software in that field. 4x performance drop with the same theoretical power.

I have to reverse-engineer the M1 GPU architecture just to start debugging these performance issues. However, with enough work an M1 GPU could at least have equal performance to an equivalent Nvidia GPU. A bigger issue is that many high-performance computing frameworks use CUDA, HIP, and sometimes SYCL, but never OpenCL. On top of that, Apple's OpenCL driver is severely under optimized because they want you using Metal. The hardware also doesn't support 64-bit atomics, although I'm working on an emulation library to fix that. Hopefully it has performance competitive to CPU FP64.

Hi Philipp, nice to see you here! I’ve been reading your posts on the Apple forums for a while. Your work is very impressive!

If you have CUDA kernels, the best path is probably using preprocessor macros and C++ features of Metal to run original CUDA code directly on Metal. That’s what Apple does with Blender from what I understand. Regarding atomic, the documents say that 64-bit int atomic are supported, but I haven’t tried it out. Are you saying it doesn’t work?

Xiao_Xi said:
Anyway, I doubt anyone would buy hardware that can't do double precision to do scientific calculations.

Depending what scientific calculations. If you need more precision, software solutions work and are fairly fast. It’s not like consumer GPUs are good at double precision anyway.

jeanlain · Dec 23, 2022

Apple makes exactly zero effort toward science. In fact, there isn't even a "scientific" category of App Store apps. You have to choose between "education" and "health".

Who in their right mind would buy a Mac to develop scientific apps? I mean, beside me? 😅 (I develop a toy scientific app as a hobby, but I won't use a Mac for any kind of serious analysis).

theorist9 · Dec 23, 2022

jeanlain said:
Apple makes exactly zero effort toward science. In fact, there isn't even a "scientific" category of App Store apps. You have to choose between "education" and "health".

Who in their right mind would buy a Mac to develop scientific apps? I mean, beside me? 😅 (I develop a toy scientific app as a hobby, but I won't use a Mac for any kind of serious analysis).

I did development work in computational biophysics on my Mac, and the Mac was bought with that in mind, because I needed both a native Unix interface and MS Office (the Mac is the only system on which you can get both). When the program was finished, I used our university's computer clusters to scale it up. But even after that I continued to use the Mac for exploratory work on new classes of organisms.

Plus I wanted a sophisticated GUI for when I wasn't in Terminal—in particular, I wanted powerful windows management so I could easily move among multiple open windows in multiple programs—and at the time MacOS far outstripped both Linux and Windows in that regard (and still does).

And my dual-PPC G5 was super-quiet and completely reliable. Its one downside is that, for the same money, I could have gotten an x86 box that was faster. But having my runs finish faster was less important than having a nice system to interact with.

jeanlain · Dec 23, 2022

theorist9 said:
I did development work in computational biophysics on my Mac, and the Mac was bought with that in mind, because I needed both a native Unix interface and MS Office

So the advantage of the Mac over a linux/unix workstation was Office, not the hardware.

Apple Silicon in Sciences

macrumors newbie

macrumors 601

macrumors G5

macrumors 601

macrumors 601

macrumors 68000

macrumors Core

macrumors 6502

macrumors 68030

macrumors 68020

macrumors Core

macrumors 68020

macrumors Core

macrumors 65816

macrumors 68020

macrumors Core

macrumors regular

macrumors 68040

macrumors G5

macrumors 68000

macrumors 601

macrumors Core

macrumors 68020

macrumors 601

macrumors 68020

Our Staff