
patent10021

macrumors 68040
Original poster
Apr 23, 2004
3,530
809
Clarification on Metal/ANE and PT/TF for ML on Apple Silicon.

Reading various Apple pages and other blogs, here's what I understand.

PyTorch
To accelerate the training of ML models, PT takes advantage of the ANE's hardware acceleration, but any model you use needs to be translated/compiled into a Core ML version of the model.

PT also supports GPU-accelerated training on Mac via Metal.
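As a minimal sketch of what this looks like in practice (assuming PyTorch 1.12+ with the MPS backend; the model and shapes here are purely illustrative):

import torch

# Use the Metal-backed MPS device when available, otherwise fall back to the CPU.
device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")

model = torch.nn.Linear(128, 10).to(device)   # toy model, purely illustrative
x = torch.randn(32, 128, device=device)       # dummy batch
loss = model(x).sum()
loss.backward()                               # forward and backward both run via Metal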

If we don't intend to deploy transformers/models on Apple devices, we don't need to take into consideration how many ANE cores a machine has when choosing one for training, right? In that case, can we just look at the number of GPU cores when choosing a Mac?

Without deploying transformers/models on Apple devices can we still use ANE for training purposes only? Can we use both ANE and Metal for maximum training performance?

TensorFlow
TF uses the tensorflow-metal PluggableDevice to accelerate the training of machine learning models on Apple Silicon with Metal.
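For example, once the plugin is installed, the Apple GPU simply shows up as a TensorFlow device (a sketch; package names per Apple's tensorflow-metal instructions):

# pip install tensorflow-macos tensorflow-metal
import tensorflow as tf

# With the Metal PluggableDevice registered, the Apple GPU shows up as a "GPU"
# device and eligible ops are placed on it automatically.
print(tf.config.list_physical_devices("GPU"))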

TF models compiled as Core ML models also use the ANE.
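(A sketch of that conversion path with coremltools; the Keras model here is a stand-in for a real trained network:)

import tensorflow as tf
import coremltools as ct

# Toy Keras model standing in for a real trained network.
model = tf.keras.Sequential([tf.keras.layers.Dense(10, input_shape=(128,))])

# coremltools converts TF/Keras models directly; Core ML can then schedule
# the converted model across CPU, GPU and ANE at inference time.
mlmodel = ct.convert(model, convert_to="mlprogram")
mlmodel.save("model.mlpackage")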


Is this a correct summary?
 

Xiao_Xi

macrumors 68000
Oct 27, 2021
1,627
1,101
Yes.
When training ML models, developers benefit from accelerated training on GPUs with PyTorch and TensorFlow by leveraging the Metal Performance Shaders (MPS) back end. For deployment of trained models on Apple devices, they use coremltools, Apple’s open-source unified conversion tool, to convert their favorite PyTorch and TensorFlow models to the Core ML model package format. Core ML then seamlessly blends CPU, GPU, and ANE (if available) to create the most effective hybrid execution plan exploiting all available engines on a given device.
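For PyTorch, that pipeline looks roughly like this (a sketch, assuming coremltools 6+; the traced model is illustrative):

import torch
import coremltools as ct

model = torch.nn.Linear(128, 10).eval()            # stand-in for a trained model
traced = torch.jit.trace(model, torch.randn(1, 128))

mlmodel = ct.convert(
    traced,
    inputs=[ct.TensorType(shape=(1, 128))],
    compute_units=ct.ComputeUnit.ALL,  # let Core ML use CPU, GPU and ANE
    convert_to="mlprogram",
)
mlmodel.save("model.mlpackage")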
 

patent10021

macrumors 68040
Original poster
Apr 23, 2004
3,530
809
Yes.

Thanks. Are you training on Silicon? Cloud? Linux? I'm looking for a new machine. Mainly computer vision and analytics.

I'm still unclear on this question though.

Without deploying transformers/models on Apple devices can we still use ANE for training purposes only?

Can we use both ANE and Metal for maximum training performance?
 

Xiao_Xi

macrumors 68000
Oct 27, 2021
1,627
1,101
Without deploying transformers/models on Apple devices can we still use ANE for training purposes only?
So far, TensorFlow and PyTorch only use Apple's GPU.

I'm looking for a new machine. Mainly computer vision and analytics.
I would try Google Colab or Amazon SageMaker Studio Lab and see if either of them suits your needs. I wouldn't train a model on Apple Silicon because the Metal backends for TensorFlow and PyTorch are not mature enough.

You may find this presentation about how Stable Diffusion was trained on AWS interesting.
 

leman

macrumors Core
Oct 14, 2008
19,521
19,674
ANE is generally only used for inference. For training the GPU (and maybe AMX) is used.
 

patent10021

macrumors 68040
Original poster
Apr 23, 2004
3,530
809
ANE is generally only used for inference. For training the GPU (and maybe AMX) is used.
Right. Reading further,

When training ML models, developers benefit from accelerated training on GPUs with PyTorch and TensorFlow by leveraging Metal. Core ML seamlessly blends CPU, GPU, and ANE (if available) to create the most effective hybrid execution plan exploiting all available engines on a given device.
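One way to see that blending in action is to load the same converted model under different compute-unit restrictions and compare prediction times (a sketch, assuming coremltools 6+ and a hypothetical model.mlpackage):

import coremltools as ct

# Same model, three engine policies; compute_units restricts which engines
# Core ML may use when building its execution plan.
cpu_only = ct.models.MLModel("model.mlpackage", compute_units=ct.ComputeUnit.CPU_ONLY)
cpu_gpu = ct.models.MLModel("model.mlpackage", compute_units=ct.ComputeUnit.CPU_AND_GPU)
everything = ct.models.MLModel("model.mlpackage", compute_units=ct.ComputeUnit.ALL)
# Timing .predict() on each variant shows what the GPU and ANE add on a given machine.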

Therefore, since the 48-core GPU and 64-core GPU M1 Mac Studios both have a 32-core Neural Engine, there's no point in looking at the number of NE cores when shopping for a Mac, right?

Training on Apple Silicon appears to be on par with many discrete Nvidia GPUs. Moreover, WWDC 2022 videos showed that TensorFlow will soon support Metal 3, which according to Apple should provide a 16x increase in performance. Do you interpret this as meaning that we can now confidently do high-performance training on Macs and not have to think about training on systems with discrete Nvidia GPUs? Examining tests on GitHub, it appears that with Apple Silicon we can get performance on par with Colab's Tesla GPU accelerators.

Lastly, clearly a 64-core GPU is recommended if one can afford it. But what about memory? Do you really need to max out memory at 64GB or 128GB? For deep-learning training, are we still better off with a Windows system plus a Titan/RTX GPU?
 

Xiao_Xi

macrumors 68000
Oct 27, 2021
1,627
1,101
Training on Apple Silicon appears to be on par with Nvidia/RTX PC systems.
Examining tests on GitHub, it appears that with Apple Silicon we can get performance on par with Tesla GPU accelerators.
Do you have a link that confirms all this?

Moreover, WWDC 2022 videos showed that TensorFlow will soon support Metal 3, which according to Apple should provide a 16x increase in performance.
Can you share the link to the presentation?

I would be very impressed if Apple had managed to improve TensorFlow or PyTorch so much in such a short time. For reference, Apple claimed that PyTorch on the GPU was 8 times faster than PyTorch on the CPU.
[Image: pytorch.png — Apple's chart comparing PyTorch training performance on the GPU vs. the CPU]


Therefore, since the 48-core GPU and 64-core GPU M1 Mac Studios both have a 32-core Neural Engine, there's no point in looking at the number of NE cores when shopping for a Mac, right?
Clearly a 64-core GPU is recommended if one can afford it. But what about memory? Do you really need to max out memory at 64GB or 128GB?
It would be easier to help you if you explained what you want to do. Are you going to fine-tune a pre-trained model or train a model from scratch? What kind of models do you want to train? How large is your dataset? How much time do you have to train a model?
 

leman

macrumors Core
Oct 14, 2008
19,521
19,674
Training on Apple Silicon appears to be on par with many discrete Nvidia GPUs.

This is really not the case. Inference can be fairly fast when you use Core ML and utilise all the available accelerators, especially given the power consumption (e.g. Apple's Core ML implementation of Stable Diffusion is only two to three times slower than high-end Nvidia GPUs). But I haven't seen a single example where Apple Silicon outperforms an Nvidia GPU of comparable nominal performance in training.

Moreover, WWDC 2022 videos showed that TensorFlow will soon support Metal 3, which according to Apple should provide a 16x increase in performance. Do you interpret this as meaning that we can now confidently do high-performance training on Macs and not have to think about training on systems with discrete Nvidia GPUs?

I interpret this as you probably misunderstanding something. Maybe this 16x was about a specific corner case or something like that.
 

innominato5090

macrumors 6502
Sep 4, 2009
452
71
As much as I like my Apple Silicon, not sure why you'd want to train a model on it. Performance is not as good, support seems limited, and you're missing out on a lot of tools that are available in the CUDA ecosystem.
 

patent10021

macrumors 68040
Original poster
Apr 23, 2004
3,530
809
As much as I like my Apple Silicon, not sure why you'd want to train a model on it. Performance is not as good, support seems limited, and you're missing out on a lot of tools that are available in the CUDA ecosystem.

As an aside, truly big datasets (i.e. >25GB) require more memory than even the most expensive consumer NVIDIA/AMD GPUs offer. The advantage of Apple Silicon is that you can work with big datasets because they all load into 64GB/128GB of unified memory.

This is why, to me, the best of both worlds is using an M1 Ultra (64GB, 32- or 48-core GPU) for data science projects and Colab+CUDA when you need it for deep learning projects. I've been using university accelerators, so I didn't know until recently that CUDA could be used directly on Colab:
%load_ext nvcc_plugin
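(For reference, a sketch of the fuller setup I mean; the install URL and extension name come from the community nvcc4jupyter project and may differ between versions:)

# In a Colab notebook with a GPU runtime:
!pip install git+https://github.com/andreinechaev/nvcc4jupyter.git
%load_ext nvcc_plugin
# After loading, %%cu cells compile and run CUDA C directly in the notebook.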

To me at least, it seems like the cloud is the way forward for private workstations, since you will have access to whatever you need.

It's worth noting that probably no one in industry actually uses consumer (Apple Silicon/PC) systems for real-world production and training. I imagine most use Colab, SageMaker, multiple A6000s, etc., or wherever there is access to compute accelerators.

I'd love to hear about any other info/news you guys could tell me about since I am new to setting up my own home workstation and shopping around.
 

Xiao_Xi

macrumors 68000
Oct 27, 2021
1,627
1,101
It's worth noting that probably no one in industry actually uses single-GPU consumer (Apple Silicon/PC) systems for real-world production and training.
Keep in mind that cloud service providers offer not only the latest data center GPUs, but also best-in-class software for MLOps. MLOps can be difficult to do well on-premises.

[Image: ML-pipeline.png — diagram of an end-to-end MLOps pipeline]

 