
jinnyman

macrumors 6502a
Sep 2, 2011
762
671
Lincolnshire, IL
Well, out of curiosity, I ran the code on my 13" MBP with Rosetta 2, CPU only. The fan hit over 7,000 RPM for the first time!

Each epoch took 65 sec on average.
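Per-epoch timings like this can be collected with a small stdlib-only harness; the `time.sleep` call below is a hypothetical stand-in for whatever training step you actually run (e.g. one `model.fit(..., epochs=1)` call):

```python
import time
from contextlib import contextmanager

@contextmanager
def epoch_timer(times):
    """Append the wall-clock duration of the enclosed block to `times`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        times.append(time.perf_counter() - start)

times = []
for epoch in range(3):
    with epoch_timer(times):
        time.sleep(0.05)  # stand-in for one training epoch

avg = sum(times) / len(times)
print(f"avg epoch time: {avg:.2f} s")
```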
 

white7561

macrumors 6502a
Jun 28, 2016
934
386
World
It should be quite comparable. Even with Colab Pro, you're not guaranteed a faster GPU, and the slower Colab GPUs are in practice quite comparable to the M1 Pro GPU. It hugely depends on your specific models. People have benchmarked standard MNIST (I believe) models running slightly (~20%) faster; I've found some of my personal stuff running slightly slower.

On Colab Pro, the fastest GPUs offer about 9 TFLOPS, compared to roughly 5 for the M1 Pro. So if you get an instance with one of those, the M1 Pro will be about half the speed.

However, code optimizations for the M1 architecture might make things a bit faster in the future.

Also, some models might not benefit from a GPU at all, and of course M1 has a pretty fast CPU.
I see. Do you think we will ever be able to utilize the Neural Engine in our Macs to increase the speed further in future updates?

I'm really new to AI and about to start a course on it, so pardon me if questions like comparing it to Google Colab are newbie questions :)
 

Xiao_Xi

macrumors 68000
Oct 27, 2021
1,627
1,101
Do you think we will ever be able to utilize the Neural Engine in our Macs to increase the speed further in future updates?
After you train the model, you should convert your Tensorflow/Pytorch model to CoreML to use the Neural Engine and speed up predictions.
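For example, a conversion along these lines lets Core ML route supported layers to the Neural Engine at prediction time. This is only a sketch using the public coremltools API (`ct.convert`, `ComputeUnit`); it assumes you already have a trained Keras or similar model in hand:

```python
def convert_for_ane(trained_model):
    """Convert a trained model to Core ML, allowing ANE execution.

    Sketch only: requires `pip install coremltools`, so the import
    lives inside the function.
    """
    import coremltools as ct

    # ComputeUnit.ALL lets Core ML pick CPU, GPU, or the Neural Engine
    # per layer; CPU_AND_NE biases execution toward the ANE.
    return ct.convert(trained_model, compute_units=ct.ComputeUnit.ALL)
```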


Philip Turner in the Apple dev forum claims that "The neural engine can't be used for training. It uses only 16-bit half precision, not 16-bit bfloat16. That means gradients can't propagate through it for ML, but ANE can be used for inference."
Source: https://developer.apple.com/forums/thread/697983

Can someone confirm it?
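The float16 vs. bfloat16 point is about numeric range: IEEE half precision tops out at 65504, so large loss or gradient values overflow during training. A quick generic numpy check (not ANE-specific; numpy has no bfloat16, so its range is only noted in a comment):

```python
import numpy as np

# IEEE float16 has a tiny dynamic range:
print(np.finfo(np.float16).max)   # 65504.0

# Values beyond that overflow to inf, which breaks gradient propagation:
overflowed = np.float16(1e6)
print(overflowed)                 # inf

# bfloat16 keeps float32's 8-bit exponent (max ~3.4e38), trading mantissa
# precision for range, which is why training stacks prefer it.
```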
 
  • Like
Reactions: white7561

project_2501

macrumors 6502a
Jul 1, 2017
676
792
I see. Do you think we will ever be able to utilize the Neural Engine in our Macs to increase the speed further in future updates?

I'm really new to AI and about to start a course on it, so pardon me if questions like comparing it to Google Colab are newbie questions :)

You can run MNIST on a Raspberry Pi Zero. It costs about £4: 500 times cheaper than my laptop, and only 25 times slower.

 

mi7chy

macrumors G4
Oct 24, 2014
10,619
11,292
You can run MNIST on a Raspberry Pi Zero. It costs about £4: 500 times cheaper than my laptop, and only 25 times slower.


Any comparisons to the new $15 Pi Zero 2 W, base $35 Pi 4 or Coral?
 

Xiao_Xi

macrumors 68000
Oct 27, 2021
1,627
1,101
Eloquent Arduino has run benchmarks with:
  • Arduino Nano 33 BLE Sense (Cortex M4 @ 64 MHz)
  • ESP32 (Xtensa dual-core @ 240 MHz)
  • Feather M4 Express (Cortex M4F @ 200 MHz)
  • STM32 Nucleo H743ZI2 (Cortex M7 @ 480 MHz)
  • Arduino Portenta (Cortex M7 @ 480 MHz)
  • Teensy 4.0 (Cortex M7 @ 600 MHz)
  • Raspberry Pi Pico (RP2040 / Cortex M0+ @ 125 MHz)

I hope the benchmark helps, although those aren't the micros you're interested in.
 
  • Like
Reactions: project_2501

iDron

macrumors regular
Apr 6, 2010
219
252
I see. Do you think we will ever be able to utilize the Neural Engine in our Macs to increase the speed further in future updates?

I'm really new to AI and about to start a course on it, so pardon me if questions like comparing it to Google Colab are newbie questions :)
As others have said, I don’t think the Neural Engine will be very helpful for training. I’m actually wondering why Apple built it into all its chips: I’ve never seen it being used on my MacBook, and I don’t see where it would be beneficial in day-to-day iPhone apps.

Don’t plan on benefitting from it. It may be used for inference though.
 
  • Like
Reactions: white7561

leman

macrumors Core
Oct 14, 2008
19,516
19,664
As others have said, I don’t think the Neural Engine will be very helpful for training. I’m actually wondering why Apple built it into all its chips: I’ve never seen it being used on my MacBook, and I don’t see where it would be beneficial in day-to-day iPhone apps.

It’s used for image classification, maybe Touch ID, the built-in camera, possibly audio processing, Siri, and things like that. As to why Apple built it, or why it has the limitations it has, I think it’s pretty obvious: Apple wanted ultra-low-power inference hardware to power its OS-level ML features. The Neural Engine was never intended to be a general-purpose ML solution, just to run a certain common subset of networks with minimal energy expenditure. Apple’s general-purpose ML hardware consists of the AMX units.
 
  • Like
Reactions: iDron

white7561

macrumors 6502a
Jun 28, 2016
934
386
World
It’s used for image classification, maybe Touch ID, the built-in camera, possibly audio processing, Siri, and things like that. As to why Apple built it, or why it has the limitations it has, I think it’s pretty obvious: Apple wanted ultra-low-power inference hardware to power its OS-level ML features. The Neural Engine was never intended to be a general-purpose ML solution, just to run a certain common subset of networks with minimal energy expenditure. Apple’s general-purpose ML hardware consists of the AMX units.
AFAIK some AI features do use the Neural Engine. For example, if you open a picture with text in it, you can hover over the text and copy it, like OCR.
 

iDron

macrumors regular
Apr 6, 2010
219
252
It’s used for image classification, maybe Touch ID, the built-in camera, possibly audio processing, Siri, and things like that. As to why Apple built it, or why it has the limitations it has, I think it’s pretty obvious: Apple wanted ultra-low-power inference hardware to power its OS-level ML features. The Neural Engine was never intended to be a general-purpose ML solution, just to run a certain common subset of networks with minimal energy expenditure. Apple’s general-purpose ML hardware consists of the AMX units.
Okay. But is this so much better than using the CPU/GPU? Given that it's often in use for just seconds, or even fractions of a second, slightly better power efficiency doesn't really seem to matter.
 

leman

macrumors Core
Oct 14, 2008
19,516
19,664
Okay. But is this so much better than using the CPU/GPU? Given that it's often in use for just seconds, or even fractions of a second, slightly better power efficiency doesn't really seem to matter.

Apple obviously thinks it does matter, and I tend to agree. The NPU seems to be orders of magnitude more efficient than other matrix units, which means you can get a bunch of useful tricks (like the text recognition in Photos) or high-quality video calls without significantly impacting your battery. In the end, it's these small things that make a huge qualitative real-world difference when using Apple products. And given how little space the NPU takes on the chip, I'd say it's well worth it.
 
  • Like
Reactions: iDron

iDron

macrumors regular
Apr 6, 2010
219
252
Apple obviously thinks it does matter, and I tend to agree. The NPU seems to be orders of magnitude more efficient than other matrix units, which means you can get a bunch of useful tricks (like the text recognition in Photos) or high-quality video calls without significantly impacting your battery. In the end, it's these small things that make a huge qualitative real-world difference when using Apple products. And given how little space the NPU takes on the chip, I'd say it's well worth it.

Well, that's probably true, if you consider it would automatically OCR every photo taken, for example. So the ANE is actually meant to be used mostly by system processes then? Even for inference, it might not make sense for an individual developer, data scientist, or researcher to worry about it.
 

leman

macrumors Core
Oct 14, 2008
19,516
19,664
Well, that's probably true, if you consider it would automatically OCR every photo taken, for example. So the ANE is actually meant to be used mostly by system processes then? Even for inference, it might not make sense for an individual developer, data scientist, or researcher to worry about it.

It's meant as a very energy-efficient way to implement basic ML features in your apps and services without paying the usual energy cost associated with ML. It's about making ML integrate as naturally and seamlessly into the system as possible, without hurting the user experience through diminished battery life or a hot device.

As such, I think the NPU can be very valuable for app developers who want to add ML features while still offering the polished experience that users of Apple devices expect. But again, Apple's NPU is not about maximising performance; it's about providing a user-oriented service at the lowest energy cost. Or, to put it differently, it's about enabling a new class of user applications without compromising the system.

If you are a researcher, ANE is probably not very useful to you, as you need both high performance and a lot of flexibility (ANE offers neither).
 

Andropov

macrumors 6502a
May 3, 2012
746
990
Spain
Another cool thing about the NPU is that you could develop something GPU intensive (a game, a video editor...) and add machine learning features to it without dropping frames.

In fact, I remember that when the A12 was still new and shiny, some people said that if the GPU was idle, NPU-capable CoreML tasks would be scheduled to the GPU, but if the GPU was busy, they'd run on the NPU instead.

I don't know if this is still true or if it ever was (I never got around to trying it). But I always thought it was weird that CoreML would prefer to schedule tasks to the GPU instead of the NPU when both were capable.
 

Xiao_Xi

macrumors 68000
Oct 27, 2021
1,627
1,101
It seems that Apple needs less and less time to adapt tensorflow-metal to each new TensorFlow version: it took a month and a half for 2.6, one month for 2.7, and now it can release within two weeks of Google releasing TensorFlow.


I hope Apple can compile a compatible tensorflow-text soon.
 

Xiao_Xi

macrumors 68000
Oct 27, 2021
1,627
1,101
PyTorch seems to run faster than TensorFlow on M1 Macs.
 

Xiao_Xi

macrumors 68000
Oct 27, 2021
1,627
1,101
Is the node.ai shark proprietary?
I don't think so. You can check its repo.

They have shared better instructions for installing SHARK on an Apple M1 Mac.
 

Xiao_Xi

macrumors 68000
Oct 27, 2021
1,627
1,101
Apple seems to be working on a solution to the problems that arise when a package checks for tensorflow but finds tensorflow-macos instead.
[A] lot of the packages that check for Tensorflow do it against the package name reserved for the baseline TensorFlow called tensorflow, whereas the TensorFlow package on MacOS is tensorflow-macos. We are working on a generic solution that will fix this issue but it will take until TF2.9 timeline for that to take effect.
Source: https://developer.apple.com/forums/thread/701656
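Until TF 2.9 lands, a common application-side workaround is to check for either distribution name explicitly. A stdlib-only sketch (the helper name is made up):

```python
from importlib import metadata

def installed_tensorflow_dist():
    """Return whichever TensorFlow distribution is installed, else None.

    Packages that hard-code the name "tensorflow" miss "tensorflow-macos",
    even though both provide the same `import tensorflow` module.
    """
    for dist in ("tensorflow", "tensorflow-macos"):
        try:
            metadata.version(dist)
            return dist
        except metadata.PackageNotFoundError:
            continue
    return None

print(installed_tensorflow_dist())
```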
 
  • Like
Reactions: iDron