
Appletoni

Suspended
Mar 26, 2021
443
177
In case anyone is interested, I ran a fairly simple MNIST benchmark (proposed here: https://github.com/apple/tensorflow_macos/issues/25) on my recently acquired M1 Pro MBP (16-core GPU, 16GB RAM). I installed TensorFlow using the following guide (https://developer.apple.com/metal/tensorflow-plugin/).
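For anyone who wants to reproduce it, this is roughly the kind of script involved; a minimal sketch of the MNIST CNN from that issue (not the exact code), assuming tensorflow-macos and tensorflow-metal are installed per the guide above.

```python
# Minimal sketch of the MNIST CNN benchmark discussed in the linked issue
# (not the exact script). Assumes tensorflow-macos + tensorflow-metal are installed.
import tensorflow as tf

(x_train, y_train), _ = tf.keras.datasets.mnist.load_data()
x_train = x_train[..., None].astype("float32") / 255.0

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, 3, activation="relu", input_shape=(28, 28, 1)),
    tf.keras.layers.Conv2D(64, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Keras reports ms/step during training; that is the number quoted below.
model.fit(x_train, y_train, batch_size=128, epochs=5)
```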

For reference, this benchmark seems to run at around 24ms/step on M1 GPU.

On the M1 Pro, the benchmark runs at between 11 and 12 ms/step (twice the TFLOPS of the M1, and twice as fast).

The same benchmark gives 6 ms/step on an RTX-2080 (13.5 fp32 TFLOPS) and 8 ms/step on a GeForce GTX Titan X (6.7 fp32 TFLOPS). A similar level of performance should also be expected from the M1 Max GPU (which should run twice as fast as the M1 Pro).

Of course, this benchmark runs a fairly simple CNN model, but it already gives an idea. Also keep in mind that RTX-generation cards can run faster at fp16 precision; I am not sure whether the same applies to Apple Silicon.

I would be happy to run any other benchmark if suggested (or help someone to run the benchmark on a M1 Max chip), even if I am more of a PyTorch guy. ;-)

[edit] Makes me wonder whether I should have gone for the M1 Max chip... probably not.

Apple Silicon deep learning performance is terrible.

Take a look at the KataGo benchmarks and the LC0 benchmarks.
 

Xiao_Xi

macrumors 68000
Oct 27, 2021
1,627
1,101
Apple Silicon deep learning performance is terrible
Honestly, deep learning training on Apple Silicon remains unreliable, but inference (CoreML) seems to be surprisingly good.

Results

YOLOv5 v6.1-25-gcaf7ad0, torch 1.11.0 CPU

YOLOv5s inference time (640x640 image, batch size 1):
PyTorch 1.11.0 CPU: 344 ms
CoreML 5.2.0: 27 ms
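For context, here is roughly how such a comparison can be run. This is only a sketch: it assumes a yolov5s.pt checkpoint and a Core ML model exported with the YOLOv5 export script, and the input name "image" comes from the default export (check your model's spec if it differs).

```python
# Rough timing sketch (not the exact benchmark above). Assumes:
#   - yolov5s.mlmodel was produced by: python export.py --weights yolov5s.pt --include coreml
#   - the Core ML input is named "image" (the YOLOv5 export default)
import time
import numpy as np
import torch
import coremltools as ct
from PIL import Image

img = Image.fromarray(np.zeros((640, 640, 3), dtype=np.uint8))  # dummy 640x640 frame

# PyTorch CPU path
pt_model = torch.hub.load("ultralytics/yolov5", "yolov5s", device="cpu")
t0 = time.perf_counter()
for _ in range(10):
    pt_model(img, size=640)
print(f"PyTorch CPU: {(time.perf_counter() - t0) / 10 * 1000:.1f} ms/image")

# Core ML path (dispatched to the ANE/GPU where possible)
ml_model = ct.models.MLModel("yolov5s.mlmodel")
t0 = time.perf_counter()
for _ in range(10):
    ml_model.predict({"image": img})
print(f"Core ML: {(time.perf_counter() - t0) / 10 * 1000:.1f} ms/image")
```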
 

buckwheet

macrumors 6502
Mar 30, 2014
460
509
I'm really hoping Apple announces some significant move forward with ML at WWDC. Things have been weirdly quiet from them on the ML software support front. The Mac Studio machines could be great for local ML work, if optimized for the job, and Swift has been differentiable for a while but not much gets said about it... (I mean, I realize it's already there for us to use, but it seems like the kind of thing Apple could wrap into some more dev-friendly, higher-level API, to me—e.g., tools for doing RL-like tasks with differentiable programming in pure Swift).

So there seem to be plenty of reasons for excitement, but nothing is really coalescing into useful tools or exciting announcements... weird... Maybe this year? I expected a lot more last year, but maybe they were waiting to have more of their own silicon out before committing heavily?

Of course, I do get that between Nvidia+CUDA and Google+TPU the market is pretty much cornered for enterprise ML stuff, but I do think there's still room for providing better support for end-users to train and test/develop on their local machines. Fingers crossed for WWDC to announce something worth getting excited about...
 
  • Like
Reactions: TiggrToo

TiggrToo

macrumors 601
Aug 24, 2017
4,205
8,838
I'm really hoping Apple announces some significant move forward with ML at WWDC. Things have been weirdly quiet from them on the ML software support front. The Mac Studio machines could be great for local ML work, if optimized for the job, and Swift has been differentiable for a while but not much gets said about it... (I mean, I realize it's already there for us to use, but it seems like the kind of thing Apple could wrap into some more dev-friendly, higher-level API, to me—e.g., tools for doing RL-like tasks with differentiable programming in pure Swift).

So there seem to be plenty of reasons for excitement, but nothing is really coalescing into useful tools or exciting announcements... weird... Maybe this year? I expected a lot more last year, but maybe they were waiting to have more of their own silicon out before committing heavily?

Of course, I do get that between Nvidia+CUDA and Google+TPU the market is pretty much cornered for enterprise ML stuff, but I do think there's still room for providing better support for end-users to train and test/develop on their local machines. Fingers crossed for WWDC to announce something worth getting excited about...

Just as an FYI: https://developer.apple.com/forums/thread/700083
 

jerryk

macrumors 604
Nov 3, 2011
7,421
4,208
SF Bay Area
I'm really hoping Apple announces some significant move forward with ML at WWDC. Things have been weirdly quiet from them on the ML software support front. The Mac Studio machines could be great for local ML work, if optimized for the job, and Swift has been differentiable for a while but not much gets said about it... (I mean, I realize it's already there for us to use, but it seems like the kind of thing Apple could wrap into some more dev-friendly, higher-level API, to me—e.g., tools for doing RL-like tasks with differentiable programming in pure Swift).

So there seem to be plenty of reasons for excitement, but nothing is really coalescing into useful tools or exciting announcements... weird... Maybe this year? I expected a lot more last year, but maybe they were waiting to have more of their own silicon out before committing heavily?

Of course, I do get that between Nvidia+CUDA and Google+TPU the market is pretty much cornered for enterprise ML stuff, but I do think there's still room for providing better support for end-users to train and test/develop on their local machines. Fingers crossed for WWDC to announce something worth getting excited about...
It would be nice if Apple played more in this space, but Nvidia+CUDA is so entrenched, with all the major frameworks (TensorFlow, PyTorch, etc.) supporting it. And those frameworks play well with the Nvidia 3070 in my deskside machine at a relatively low cost.
 
  • Like
Reactions: Xiao_Xi

Xiao_Xi

macrumors 68000
Oct 27, 2021
1,627
1,101
Swift has been differentiable for a while but not much gets said about it
What could Apple do that Google hasn't tried? Although Julia and Swift are better equipped for ML and RL than Python, all the major libraries in ML and RL use Python, so I doubt anything will change soon.

I do get that between Nvidia+CUDA and Google+TPU the market is pretty much cornered for enterprise ML stuff, but I do think there's still room for providing better support for end-users to train and test/develop on their local machines.
Unless Apple offers a solution for personal computers and servers, who is going to learn a solution that is not scalable?
 
  • Like
Reactions: jerryk

buckwheet

macrumors 6502
Mar 30, 2014
460
509
What could Apple do that Google hasn't tried? Although Julia and Swift are better equipped for ML and RL than Python, all the major libraries in ML and RL use Python, so I doubt anything will change soon.


Unless Apple offers a solution for personal computers and servers, who is going to learn a solution that is not scalable?
Well, broadly speaking, the "what could Apple do that Google hasn't tried" philosophy is pretty much a non-starter for tech, so I'll let that question go as a non sequitur. I mean, presumably Google is still willing to try things that Google hasn't tried... But, in my understanding, differentiable programming brings online learning to the table, which offers a lot of potential—different applications, perhaps, but lots of potential.

On the second question: anything that can export a graph to ONNX (for onnxruntime) is pretty much scalable, no?
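A minimal sketch of what I mean, with a toy model and placeholder file names:

```python
# Minimal sketch: train wherever you like, then export the graph to ONNX so
# onnxruntime can serve it on basically any backend. Toy model, placeholder names.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10)).eval()
dummy = torch.randn(1, 128)

torch.onnx.export(
    model, dummy, "model.onnx",
    input_names=["input"], output_names=["logits"],
    dynamic_axes={"input": {0: "batch"}, "logits": {0: "batch"}},
)

# Inference via onnxruntime (CPU provider shown; swap in CUDA or CoreML providers as available)
import onnxruntime as ort
sess = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])
print(sess.run(None, {"input": dummy.numpy()})[0].shape)
```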
 

buckwheet

macrumors 6502
Mar 30, 2014
460
509
It would be nice if Apple played more in this space, but Nvidia+CUDA is so entrenched, with all the major frameworks (TensorFlow, PyTorch, etc.) supporting it. And those frameworks play well with the Nvidia 3070 in my deskside machine at a relatively low cost.
Yeah, I run a Linux box with a 2070 in it... can't really afford a 30x0 at the moment, and can rarely find one to buy anyway!

PS - In case anyone suspects I'm going to engage in some idiotic platform war here, I'm not. Nvidia+CUDA is obviously a no-brainer. I just think it makes sense for Apple to leverage the horsepower of their new machines for this purpose, and I'd love to have a machine that could tackle big music projects and train ML models the rest of the time.
 

Xiao_Xi

macrumors 68000
Oct 27, 2021
1,627
1,101
differentiable programming brings online learning to the table, which offers a lot of potential—different applications, perhaps, but lots of potential.
Google also thought Swift could be a great programming language for ML and created Swift for TensorFlow. They dropped it because everyone in the ML world uses Python.
 

buckwheet

macrumors 6502
Mar 30, 2014
460
509
Google also thought Swift could be a great programming language for ML and created Swift for TensorFlow. They dropped it because everyone in the ML world uses Python.
I'm very much aware of S4TF.
I've heard that Chris Lattner's underlying objective was to get first-class support for differentiable programming in Swift and that S4TF was a good way to do that. If that's true, then mission accomplished. If it isn't true, Swift got differentiable programming anyway. But this isn't about Python vs Swift, so I'm not sure why you bring it up. Apple could just as well announce a partnership with PyTorch at WWDC to give us up-to-the-minute Metal support for all new releases. That would keep us using Python, but it would still be super cool. And changing .device("cuda") to .device("metal") would be a pretty scalable way to work on a Mac Studio, no? That wouldn't offend any Nvidia+CUDA sensibilities, would it?
 

Xiao_Xi

macrumors 68000
Oct 27, 2021
1,627
1,101
I seem to have misunderstood you. I thought you wanted Apple to promote Swift in the ML world when you wrote:
Apple could wrap into some more dev-friendly, higher-level API, to me—e.g., tools for doing RL-like tasks with differentiable programming in pure Swift

Apple could just as well announce a partnership with PyTorch at WWDC to give us up-to-the-minute Metal support for all new releases
That would be very cool!
 

Xiao_Xi

macrumors 68000
Oct 27, 2021
1,627
1,101
Apple could just as well announce a partnership with PyTorch at WWDC to give us up-to-the-minute Metal support for all new releases.
You may be right.
 
  • Like
Reactions: theorist9

buckwheet

macrumors 6502
Mar 30, 2014
460
509
You may be right.
Haha! Yup, I just downloaded and gave it a quick spin. My 16" M1 Pro MBP, base GPU, just beat my RTX-2070 on a simple MNIST test (3.2s vs 5.7s per epoch). Of course, the 2070 is a mid-level, previous gen GPU, but I'm still pleasantly surprised. I'd imagine I'll see different results on different tests, of course. Still, it means I now have another option for running jobs while the 2070 is busy. Super cool.

I'm looking forward to seeing some user benchmarks of M1 Ultras against higher-end cards like the 3090. It might make a Mac Studio look a bit more interesting if performance is good (as I mentioned elsewhere, it would be great to have a music production machine that could run ML jobs as well).
Of course, the 4000 series are just around the corner, and if they manage to keep prices in reason, and availability isn't a total fiasco, then the Mac Studio might lose some of its appeal again... heh...

PS — As I hoped, switching platforms is as simple as swapping "cuda" for "mps". Perfect.
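Something along these lines is all it takes; a minimal sketch, assuming a PyTorch build that ships the MPS backend:

```python
# Minimal sketch of the "cuda" -> "mps" swap (assumes a PyTorch build with the MPS backend).
import torch

if torch.backends.mps.is_available():
    device = torch.device("mps")
elif torch.cuda.is_available():
    device = torch.device("cuda")
else:
    device = torch.device("cpu")

model = torch.nn.Linear(784, 10).to(device)   # stand-in for the real MNIST model
x = torch.randn(64, 784, device=device)
print(device, model(x).shape)
```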
 

Xiao_Xi

macrumors 68000
Oct 27, 2021
1,627
1,101
Could someone run some of the benchmark tests that Apple ran for the PyTorch blog post?
Tested with macOS Monterey 12.3, prerelease PyTorch 1.12, ResNet50 (batch size=128), HuggingFace BERT (batch size=64), and VGG16 (batch size=64)
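For reference, a rough sketch of what the ResNet50 part of that setup might look like (synthetic data, and my own guess at the loop rather than Apple's actual script):

```python
# Rough guess at a ResNet50 (batch size 128) training-step benchmark; synthetic data,
# not Apple's actual script. Timings are approximate without an explicit device sync.
import time
import torch
import torchvision

device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")
model = torchvision.models.resnet50().to(device)
opt = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = torch.nn.CrossEntropyLoss()

x = torch.randn(128, 3, 224, 224, device=device)
y = torch.randint(0, 1000, (128,), device=device)

for step in range(12):
    t0 = time.perf_counter()
    opt.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    opt.step()
    if step >= 2:  # skip warm-up steps
        print(f"step {step}: {(time.perf_counter() - t0) * 1000:.0f} ms/step")
```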
 

Boomhowler

macrumors 6502
Feb 23, 2008
324
19
Haha! Yup, I just downloaded and gave it a quick spin. My 16" M1 Pro MBP, base GPU, just beat my RTX-2070 on a simple MNIST test (3.2s vs 5.7s per epoch). Of course, the 2070 is a mid-level, previous gen GPU, but I'm still pleasantly surprised. I'd imagine I'll see different results on different tests, of course. Still, it means I now have another option for running jobs while the 2070 is busy. Super cool.

I'm looking forward to seeing some user benchmarks of M1 Ultras against higher-end cards like the 3090. It might make a Mac Studio look a bit more interesting if performance is good (as I mentioned elsewhere, it would be great to have a music production machine that could run ML jobs as well).
Of course, the 4000 series are just around the corner, and if they manage to keep prices in reason, and availability isn't a total fiasco, then the Mac Studio might lose some of its appeal again... heh...

PS — As I hoped, switching platforms is as simple as swapping "cuda" for "mps". Perfect.
Did you find the exact benchmarks that they used in the blog post, or did you create something yourself?
 

senttoschool

macrumors 68030
Nov 2, 2017
2,626
5,482
I've started benchmarking the M1 Max with PyTorch here: https://github.com/lucadiliello/pytorch-apple-silicon-benchmarks
Does increasing the memory size requirement of the project make the M series look better?

V100 only has 16GB of VRAM. But Apple Silicon can go up to 128GB of VRAM at a relatively cheap price. Perhaps this is where it can shine today?

For example, V100 16GB retailed for ~$10,000 when it first launched. An M1 Ultra with 64-core GPU and 128GB of RAM is $5800.
 

Xiao_Xi

macrumors 68000
Oct 27, 2021
1,627
1,101
For example, V100 16GB retailed for ~$10,000 when it first launched. An M1 Ultra with 64-core GPU and 128GB of RAM is $5800.
V100 is the previous generation of Nvidia GPUs. Wouldn't M1 Ultra vs. A100 be a fairer comparison?

V100 only has 16GB of VRAM. But Apple Silicon can go up to 128GB of VRAM at a relatively cheap price. Perhaps this is where it can shine today?
Can Apple's GPU be a serious alternative to Nvidia's GPU in deep learning without fp16 and bfloat16 support?

Does increasing the memory size requirement of the project make the M series look better?
Does PyTorch have a profiler like TensorFlow does?

It seems that the current version of PyTorch is faster than the new version on CPU. Does anyone have a good explanation for this?
 
Last edited:
  • Like
Reactions: jerryk

jerryk

macrumors 604
Nov 3, 2011
7,421
4,208
SF Bay Area
One of the ML researchers I follow has started posting some benchmarks...not bad!

Thanks for posting the link. Looks like Apple has a ways to go. The numbers were a bit shocking, especially compared to old GPU cards like the Nvidia 1080. That is 3 to 4 generations old.
 

leman

macrumors Core
Oct 14, 2008
19,516
19,664
Thanks for posting the link. Looks like Apple has a ways to go. The numbers were a bit shocking, especially compared to old GPU cards like the Nvidia 1080. That is 3 to 4 generations old.

That's a 250W desktop GPU with dedicated ML accelerator hardware vs. a 20W general-purpose laptop GPU. Perf per watt is comparable.

Once Apple releases more capable matrix coprocessors, the gap will shrink tremendously.
 