
project_2501

macrumors 6502a
Original poster
Jul 1, 2017
676
792
I'm interested in how the M1 performs on data science workloads, specifically the Python ecosystem of numerical computing.

Currently Intel provides libraries like the Intel MKL, which help software like Python take advantage of Intel CPU features for things like matrix multiplication, FFTs, neural networks, etc.

Is there something like this for Apple M1 that open source software like Python can take advantage of?

How do Python's numpy and scipy perform on the M1?
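For anyone comparing, here is a minimal sketch of how to check which BLAS/LAPACK a given NumPy build is linked against and get a rough matmul timing (what np.show_config() reports depends on how the particular wheel was built):

import time
import numpy as np

# Show which BLAS/LAPACK libraries this NumPy build is linked against.
# Depending on how the wheel was built, an M1 Mac may report Apple's
# Accelerate (vecLib) or OpenBLAS; Intel builds often report MKL.
np.show_config()

# Rough matrix-multiplication timing to compare backends.
n = 2000
a = np.random.rand(n, n)
b = np.random.rand(n, n)

t0 = time.perf_counter()
c = a @ b
print(f"{n}x{n} matmul: {time.perf_counter() - t0:.3f} s")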
 

Gnattu

macrumors 65816
Sep 18, 2020
1,106
1,668

falainber

macrumors 68040
Mar 16, 2016
3,539
4,136
Wild West
Thanks - that is a useful page.

I wonder if Python will take advantage of Apple's Accelerate?
Why would anyone want to use Macs for data science? College project? Maybe. Real work? No way. The ecosystem is too limited. When your best hardware option is a laptop (or even a desktop at some point) it's just too limiting.
 

theorist9

macrumors 68040
May 28, 2015
3,880
3,060
If you're interested in the deep learning subset of data science, you might find this thread interesting:
 

project_2501

macrumors 6502a
Original poster
Jul 1, 2017
676
792
If you're interested in the deep learning subset of data science, you might find this thread interesting:
thanks - that thread is very interesting
 

leman

macrumors Core
Oct 14, 2008
19,521
19,675
I'm interested in how the M1 performs on data science workloads, specifically the Python ecosystem of numerical computing.

I don’t know about Python, but M1 is absolutely ridiculous in R and Stan. Also, if you work with matrices and your software takes advantage of Accelerate, you get the benefit of Apple’s dedicated matrix units.

Why would anyone want to use Macs for data science? College project? Maybe. Real work? No way. The ecosystem is too limited. When your best hardware option is a laptop (or even a desktop at some point) it's just too limiting.

Because they are the fastest portable hardware around for this type of workload? And what do you mean „ecosystem too limited“? You use the laptop for development and prototyping; the real work happens on a supercomputer running Linux. It also depends on the scale of your data. Not everyone doing data science works with TBs of data. Our datasets are much smaller, and using a laptop to process them is very feasible. Especially if that laptop is as fast as a large desktop workstation.
 

leman

macrumors Core
Oct 14, 2008
19,521
19,675
And herein lies the problem: The supercomputer most likely runs CUDA, the Mac doesn‘t

Why is this a problem? For most tasks, the API will choose the appropriate backend. I mean, we have folks prototyping with PyTorch and Tensorflow on their Macs and then deploying to the cluster — the code uses the CPU on the local machine and CUDA on the cluster.

Of course, if you rely on low-level programming via CUDA directly, then sure, Mac is probably not the best platform.
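As a minimal sketch of that kind of device-agnostic PyTorch code (the MPS branch assumes a newer PyTorch build that ships the Metal backend; otherwise it simply falls back to CPU locally and picks CUDA on the cluster):

import torch

# Pick whatever accelerator is present: CUDA on the cluster,
# Apple's MPS backend on an M1 Mac (newer PyTorch builds), else CPU.
if torch.cuda.is_available():
    device = torch.device("cuda")
elif hasattr(torch.backends, "mps") and torch.backends.mps.is_available():
    device = torch.device("mps")
else:
    device = torch.device("cpu")

model = torch.nn.Linear(128, 10).to(device)
x = torch.randn(32, 128, device=device)
print(device, model(x).shape)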
 

Romain_H

macrumors 6502a
Sep 20, 2021
520
438
Why is this a problem? For most tasks, the API will choose the appropriate backend. I mean, we have folks prototyping with PyTorch and Tensorflow on their Macs and then deploying to the cluster — the code uses the CPU on the local machine and CUDA on the cluster.

Of course, if you rely on low-level programming via CUDA directly, then sure, Mac is probably not the best platform.
Indeed. In my case… no luck. Still probably porting to Metal, since overall the dev experience is superior, so research and development may proceed faster. Once the algo is stable I may have to port back to CUDA.
 

leman

macrumors Core
Oct 14, 2008
19,521
19,675
Indeed. In my case… no luck. Still probably porting to Metal, since overall the dev experience is superior, so research and development may proceed faster. Once the algo is stable I may have to port back to CUDA.

CUDA and MSL are similar enough that using some macros and strategic planning might allow you to use the same kernel code for both. BTW that’s how Apple is porting Blender Cycles to Metal.
 

Romain_H

macrumors 6502a
Sep 20, 2021
520
438
CUDA and MSL are similar enough that using some macros and strategic planning might allow you to use the same kernel code for both. BTW that’s how Apple is porting Blender Cycles to Metal.
Well, it's not that easy. Plus it's not only the GPGPU code; there's quite a bit of CPU code around… hitherto I used Qt for that, but I'm not sure if I'll continue down that path. Embedding CUDA / Metal in Qt projects is not necessarily straightforward.
 

leman

macrumors Core
Oct 14, 2008
19,521
19,675
Well, it's not that easy. Plus it's not only the GPGPU code; there's quite a bit of CPU code around… hitherto I used Qt for that, but I'm not sure if I'll continue down that path. Embedding CUDA / Metal in Qt projects is not necessarily straightforward.

No, it’s definitely not. Although I must admire Nvidia’s evil marketing genius a bit - by making CUDA so “easy” to use and locking people into the NVCC mixed-code paradigm, they made properly disentangling CPU/GPU code very painful, which locks people even more into their platform.
 

falainber

macrumors 68040
Mar 16, 2016
3,539
4,136
Wild West
I don’t know about Python, but M1 is absolutely ridiculous in R and Stan. Also, if you work with matrices and your software takes advantage of Accelerate, you get the benefit of Apple’s dedicated matrix units.



Because they are the fastest portable hardware around for this type of workload? And what do you mean „ecosystem too limited“? You use the laptop for development and prototyping; the real work happens on a supercomputer running Linux. It also depends on the scale of your data. Not everyone doing data science works with TBs of data. Our datasets are much smaller, and using a laptop to process them is very feasible. Especially if that laptop is as fast as a large desktop workstation.
You missed the part where supercomputers running Linux (and all sorts of workstations, servers, server farms, cloud services, etc.) are x86-based. Developing software for them on machines with a different architecture is amateurish at best (if possible at all).
 

falainber

macrumors 68040
Mar 16, 2016
3,539
4,136
Wild West
CUDA and MSL are similar enough that using some macros and strategic planning might allow you to use the same kernel code for both. BTW that’s how Apple is porting Blender Cycles to Metal.
And why would anyone want to go through all these hassles in the first place? There are tons of x86 and CUDA hardware options available, with stable software stacks, excellent developer tools, and vast developer communities. Compare this to one forum thread on MR for M1-based Macs and the choice must be clear to anyone.
 

jerryk

macrumors 604
Nov 3, 2011
7,421
4,208
SF Bay Area
And why would anyone want to go through all these hassles in the first place? There are tons of x86 and CUDA hardware options available, with stable software stacks, excellent developer tools, and vast developer communities. Compare this to one forum thread on MR for M1-based Macs and the choice must be clear to anyone.
That is the conclusion I have come to. I do most of my ML work on a Windows deskside machine with Nvidia RTX GPUs. I can load the CUDA toolkit and frameworks like TensorFlow or PyTorch and just go. If I get stuck on some issue, there are a lot of other people running the same software and hardware stack. And when I finish doing basic model configuration on my desktop, I can push to a cloud environment as required with minimal changes.

With this said, I am finding I don't use the desktop machine as much as I once did. I now do a lot of preliminary work in Colab in the cloud. It's free even with GPU support. And I can design and train models anywhere that I have internet access.
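A quick way to confirm what TensorFlow actually sees in either environment (a minimal sketch, nothing Colab-specific):

import tensorflow as tf

# List the accelerators visible to TensorFlow: the RTX cards on the desktop,
# the assigned GPU in a Colab GPU runtime, or an empty list on CPU-only setups.
gpus = tf.config.list_physical_devices("GPU")
print(gpus if gpus else "no GPU visible to TensorFlow")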
 

leman

macrumors Core
Oct 14, 2008
19,521
19,675
You missed the part where supercomputers running Linux (and all sorts of workstations, servers, server farms, cloud services, etc.) are x86-based. Developing software for them on machines with a different architecture is amateurish at best (if possible at all).

Why would you say that? Developing software that ships on a different architecture is actually a standard situation. Especially if you are talking about something as implementation-dependent as data science libraries or GPGPU. There are only a few relevant architectural differences between x86-64 and Aarch64, which are all very well documented and can be easily taken care of with some basic planning. Not that it matters for most people doing data science, as they are going to use abstractions provided by high-level languages and APIs in the first place.

And sure, you might call it amateurish, but that's how stuff works in real life. All software I wrote in the last ten years or so (using x86 as my dev platform) compiles and works without fail for x86-32, x86-64 and Aarch64.
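To illustrate with a trivial sketch: from high-level code, about the only architecture details you can even observe are the machine name and byte order, and the same script runs unchanged on both platforms.

import platform
import sys

# The same script runs unmodified on x86-64 and Aarch64; the visible
# differences from this level are essentially the machine name and the
# byte order (both platforms are little-endian).
print(platform.machine())   # e.g. 'x86_64' on Intel, 'arm64' on an M1 Mac
print(sys.byteorder)        # 'little' on both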

And why would anyone want to go through all these hassles in the first place? There are tons of x86 and CUDA hardware options available, with stable software stacks, excellent developer tools, and vast developer communities. Compare this to one forum thread on MR for M1-based Macs and the choice must be clear to anyone.

I never claimed that one would. I certainly would not. If my job were to develop CUDA software (and I hope I never get there), I would get myself a laptop with an Nvidia GPU. I was merely commenting on a specific post.

Of course, since I don't work with CUDA and none of the tools I use rely on Nvidia's tech, M1 Macs are pretty much the best hardware platform on the market for me right now. They're extremely portable with excellent battery life, unmatched usability, and performance that rivals large desktop workstations (in the workflows I care about), which makes them, as you say, a choice clear to anyone. I mean, why would I choose an x86 platform that ends up being 30-40% slower for my work and has half the usable battery life?
 

falainber

macrumors 68040
Mar 16, 2016
3,539
4,136
Wild West
Why would you say that? Developing software that ships on a different architecture is actually a standard situation. Especially if you are talking about something as implementation-dependent as data science libraries or GPGPU. There are only a few relevant architectural differences between x86-64 and Aarch64, which are all very well documented and can be easily taken care of with some basic planning. Not that it matters for most people doing data science, as they are going to use abstractions provided by high-level languages and APIs in the first place.

And sure, you might call it amateurish, but that's how stuff works in real life. All software I wrote in the last ten years or so (using x86 as my dev platform) compiles and works without fail for x86-32, x86-64 and Aarch64.



I never claimed that one would. I certainly would not. If my job were to develop CUDA software (and I hope I never get there), I would get myself a laptop with an Nvidia GPU. I was merely commenting on a specific post.

Of course, since I don't work with CUDA and none of the tools I use rely on Nvidia's tech, M1 Macs are pretty much the best hardware platform on the market for me right now. They're extremely portable with excellent battery life, unmatched usability, and performance that rivals large desktop workstations (in the workflows I care about), which makes them, as you say, a choice clear to anyone. I mean, why would I choose an x86 platform that ends up being 30-40% slower for my work and has half the usable battery life?
Developing software that ships on a different architecture is actually a standard situation.
No, it is not. It is only typical for developing software for devices that can't themselves be used for software development (like smartphones).
 

leman

macrumors Core
Oct 14, 2008
19,521
19,675
Developing software that ships on a different architecture is actually a standard situation.
No, it is not. It is only typical for developing software for devices that can't themselves be used for software development (like smartphones).

Well, duh. And that's exactly why it's a standard situation. Much of the software developed in recent years has been for smartphones.

Anyway, are you developing your software on the supercomputer directly? Or are you developing it on a local workstation that uses a different CPU and OS? How do you think software for all these PowerPC and ARM supercomputers is developed?
 

GrumpyCoder

macrumors 68020
Nov 15, 2016
2,126
2,706
If I get stuck on some issue, there are a lot of other people running the same software and hardware stack. And when I finish doing basic model configuration on my desktop, I can push to a cloud environment as required with minimal changes.
Two interesting things here:

1. It's not just people using the same software/hardware that can help. I do research, so I'm not shipping products, and I regularly have to check what other research groups in the world do. Being able to check out a repo, build it, and run it two minutes later is a huge factor. On macOS I have to fiddle around to get things going, and it's not a small task.

2. For pushing off to the cloud/clusters, Apple doesn't scale (yet?). I can easily deploy to 500 or 1,000 GPUs and more. Not only can I deploy, but I can easily buy these systems and have them ready to go in no time. And then there's the whole Nvidia software stack: no matter whether I want to do autonomous driving, robotics, physical sensor development, medicine, genetics, biology, whatever... Nvidia has a tool for everything that makes life so much easier and saves a ton of time (and I'm not talking about only a few days here).

Nvidia isn't different from Apple; both are trying to lock you into their ecosystem and keep you there. And while Apple is leading for video/photo/music work, anything for gaming, science, and simulation is Nvidia's turf. Apple could hire 50,000 world-leading engineers and wouldn't get close in the next decade. That ship has sailed.
 

mi7chy

macrumors G4
Oct 24, 2014
10,622
11,294
Cross-platform software repository support is still iffy on M1. I had hashcat installed via Homebrew, and although it was an older version (6.1.1 vs. the current 6.2.4), it was working. Found out Homebrew finally updated it to 6.2.4, but after updating, it no longer runs, failing with "no devices found/left".

Update: Successfully downgraded to 6.1.1 and it's semi-working again, using the comment suggestions at the link below.

https://dae.me/blog/2516/downgrade-any-homebrew-package-easily/

Now getting "clCreateKernel(): CL_INVALID_KERNEL" after hashmode 1800 when running the benchmark, so I probably need to downgrade from Monterey to Big Sur.

Update 2: hashcat 6.1.1 fully working with Big Sur 11.6.1. Hopefully this unbreaks other OpenCL apps.
 

senttoschool

macrumors 68030
Nov 2, 2017
2,626
5,482
Why would anyone want to use Macs for data science? College project? Maybe. Real work? No way. The ecosystem is too limited. When your best hardware option is a laptop (or even a desktop at some point) it's just too limiting.
In my experience, Windows sucks for a lot of development because a lot of libraries and packages assume you're running some kind of Unix shell, which works perfectly fine on Linux and macOS. Yes, there's WSL, but it's a pain to set up and not "native".

Then there's Linux, which sucks as a general operating system.

For me, the only choice for development is macOS.
 