Apple Silicon deep learning performance

Boomhowler · May 24, 2022

Sterkenburg said:
Hey Luca, nice work! However, when launching the benchmarking script on my M1 Max I am running into the issue described here (which I was indeed able to replicate): https://github.com/pytorch/pytorch/issues/78001.

Using exactly the same setup as in the repo. Did anyone else run into this?

Getting this error (which seems to be the same thing) regardless of sequence length. Running this on m1 max with 64GB

Code:

MPSNDArray.mm:782: failed assertion `[MPSNDArray, initWithBuffer:descriptor:] Error: buffer is not large enough. Must be 32768 bytes

Sterkenburg · May 25, 2022

Boomhowler said:
Getting this error (which seems to be the same thing) regardless of sequence length. Running this on m1 max with 64GB

Code:

MPSNDArray.mm:782: failed assertion `[MPSNDArray, initWithBuffer:descriptor:] Error: buffer is not large enough. Must be 32768 bytes

Yeah, sounds very similar, I used the same machine. Happens regardless of hyperparameter settings.

chengengaun · May 25, 2022

Boomhowler said:
Getting this error (which seems to be the same thing) regardless of sequence length. Running this on m1 max with 64GB

Code:

MPSNDArray.mm:782: failed assertion `[MPSNDArray, initWithBuffer:descriptor:] Error: buffer is not large enough. Must be 32768 bytes

Seems like a bug:

`MPSNDArray` or `MPSGraphTensorData` allocated with wrong size · Issue #77851 · pytorch/pytorch

🐛 Describe the bug The following code: import torch mps = torch.device("mps") size = 16 A = torch.rand(size, device=mps) F = torch.rand(size, size, size, device=mps) print(A@F) Produces the followi...

github.com

While I ran into this one:

buffer is not large enough when running pytorch on Mac M1 mps · Issue #77886 · pytorch/pytorch

🐛 Describe the bug The bug seems related to #77851 To reproduce the bug: from transformers import AutoModel from transformers import AutoTokenizer import torch model_ckpt = "distilbert-base-uncased...

github.com
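For anyone who wants to check whether their own build is affected, here is a minimal sketch based on the snippet quoted in issue #77851 above. It runs the same vector-times-3D-tensor product on the CPU and on MPS and compares the two. The helper names (`expected_result_shape`, `compare_cpu_and_mps`) are made up for illustration, and the script simply skips if torch or the MPS backend is missing. On an affected build the MPS call may abort the process with the assertion shown above instead of returning.

```python
import importlib.util

SIZE = 16  # same size as in the issue text


def expected_result_shape(size):
    """Shape of a (size,) vector matmul'd with a (size, size, size) tensor.

    torch.matmul treats the 1-D operand as a row vector, broadcasts it over
    the batch dimension, and then drops the added dimension again.
    """
    return (size, size)


def compare_cpu_and_mps(size=SIZE):
    """Return a short status string; hypothetical helper, not from the issue."""
    if importlib.util.find_spec("torch") is None:
        return "skipped: torch not installed"
    import torch

    mps_backend = getattr(torch.backends, "mps", None)
    if mps_backend is None or not mps_backend.is_available():
        return "skipped: MPS not available"

    a = torch.rand(size)
    f = torch.rand(size, size, size)
    cpu_result = a @ f
    # Same inputs, moved to the GPU; copy the result back for comparison.
    mps_result = (a.to("mps") @ f.to("mps")).cpu()

    if tuple(mps_result.shape) != expected_result_shape(size):
        return "mismatch: unexpected shape"
    if torch.allclose(cpu_result, mps_result, atol=1e-5):
        return "match"
    return "mismatch: values differ"


print(compare_cpu_and_mps())
```

Comparing against the CPU result is just one way to sanity-check the backend; it does not say anything about where the allocation goes wrong.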

Xiao_Xi · Jun 6, 2022

Apple will have a session on PyTorch and TensorFlow on Friday. 🤩

altaic · Jun 6, 2022

Xiao_Xi said:
Apple will have a session on PyTorch and TensorFlow on Friday. 🤩

Also:

Metal backend for PyTorch
The new Metal backend in PyTorch version 1.12 enables high-performance, GPU-accelerated training using MPS Graph and the Metal Performance Shaders primitives.

GrumpyCoder · Jun 6, 2022

Well, if they could give us reproducible results in TensorFlow, that would be a start.

buckwheet · Jun 7, 2022

jerryk said:
Thanks for posting the link. Looks like Apple has a ways to go. The numbers were a bit shocking, especially compared to old GPU cards like the Nvidia 1080. That is 3 to 4 generations old.

Yeah, CUDA is Nvidia's secret sauce, but according to the Platforms State of the Union video, Metal 3 will accelerate PyTorch MPS significantly. Looking forward to seeing the talk on Friday.

Xiao_Xi · Jun 10, 2022

How is it possible that Apple has not explained this before? Apple needs to learn to write changelogs.

Distributed training is very cool!

Accelerate machine learning with Metal - WWDC22 - Videos - Apple Developer

Discover how you can use Metal to accelerate your PyTorch model training on macOS. We'll take you through updates to TensorFlow training...

developer.apple.com

dgdosen · Jun 10, 2022

Xiao_Xi said:
How is it possible that Apple has not explained this before? Apple needs to learn to write changelogs.

Distributed training is very cool!

Accelerate machine learning with Metal - WWDC22 - Videos - Apple Developer

Discover how you can use Metal to accelerate your PyTorch model training on macOS. We'll take you through updates to TensorFlow training...

developer.apple.com

Does it matter if one is using macOS Monterey (or a "pre-Ventura OS" and Metal V2(?)) vs macOS Ventura and Metal V3?
Will these TensorFlow or PyTorch plugins work across different versions of Metal? Or is that all hidden behind the API surface of MPS Graph?

I'm assuming WWDC 22 is all Ventura and Metal 3 on Apple Silicon.

Xiao_Xi · Jun 15, 2022

dgdosen said:
Does it matter if one is using MacOS Monterey (or a "pre-Ventura OS" and Metal V2(?)) vs MacOS Ventura and Metal V3?

The minimum requirement is macOS 12.0 for TensorFlow and macOS 12.3 for PyTorch.

Get Started

Set up PyTorch easily with local installation or supported cloud platforms.

pytorch.org
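The version minimums above are easy to check from Python. A minimal sketch, assuming only the two thresholds stated in this post (12.0 for TensorFlow, 12.3 for PyTorch); the dictionary keys and function names are illustrative labels, not real package names or APIs:

```python
import platform

# Minimum macOS versions as stated in the post above (labels are illustrative).
MIN_MACOS = {
    "tensorflow": (12, 0),
    "pytorch": (12, 3),
}


def parse_version(text):
    """Turn '12.3.1' into (12, 3); missing parts count as 0."""
    numbers = [int(part) for part in text.split(".")[:2] if part.isdigit()]
    while len(numbers) < 2:
        numbers.append(0)
    return tuple(numbers)


def meets_minimum(version_text, framework):
    """True if the given macOS version is at least the stated minimum."""
    return parse_version(version_text) >= MIN_MACOS[framework]


current = platform.mac_ver()[0]  # empty string when not running on macOS
if current:
    for name in MIN_MACOS:
        status = "ok" if meets_minimum(current, name) else "too old"
        print(f"{name}: macOS {current} is {status}")
else:
    print("Not running on macOS")
```

This only checks the OS version; it says nothing about whether a given op is actually supported on the GPU.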

I think the performance (and reliability) of TensorFlow and PyTorch on macOS depends heavily on whether the op you want to use is well supported on the GPU, not on your version of Metal. People keep finding ops in TensorFlow that are not yet supported on the GPU.

how to fix this error when trainin… | Apple Developer Forums

developer.apple.com
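On the PyTorch side, one common way to cope with ops that are missing on the GPU is to pick the device at runtime and let unsupported ops fall back to the CPU. A rough sketch, under two assumptions: that your PyTorch build honors the `PYTORCH_ENABLE_MPS_FALLBACK` environment variable, and that it needs to be set before torch is imported. `pick_device` is a hypothetical helper name, and none of this applies to the TensorFlow plugin.

```python
import importlib.util
import os

# Assumption: set before importing torch so that ops without an MPS
# implementation run on the CPU instead of raising an error.
os.environ.setdefault("PYTORCH_ENABLE_MPS_FALLBACK", "1")


def pick_device():
    """Return 'mps' when the Metal backend is usable, otherwise 'cpu'."""
    if importlib.util.find_spec("torch") is None:
        return "cpu"  # torch is not installed at all
    import torch

    # torch.backends.mps is missing on older builds, hence the getattr.
    mps_backend = getattr(torch.backends, "mps", None)
    if mps_backend is not None and mps_backend.is_available():
        return "mps"
    return "cpu"


print(pick_device())
```

The returned string can be passed to `torch.device(...)`. Falling back to the CPU keeps a script running, but every op that falls back costs a round trip between devices, so it will not help benchmark numbers.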

Apple seems to have focused on improving 3D rendering and gaming with Metal 3.

Metal Overview - Apple Developer

Metal powers hardware-accelerated graphics on Apple platforms by providing a low-overhead API, rich shading language, tight integration between graphics and compute, and an unparalleled suite of GPU profiling and debugging tools.

developer.apple.com

widEyed · Jun 15, 2022

Sterkenburg said:
Not sure Apple will ever want to go the Cloud route but I agree that they need to up the ante for the Mac Pro platform and bring some feature parity on the GPU side, lest they want it to be just a "brand statement" computer confined to a niche of professional video producers. The potential is there with the AS architecture: lots of unified memory that can be accessed by the GPU, high bandwidth, low latency. But it needs software support.

I have always been disappointed at how the quarrel with Nvidia resulted in Apple just letting go of ML/AI computing without even trying anymore. It is even more perplexing when you consider that a majority of the scientists and engineers in the field use a Mac as a work machine... I really hope AS can be the trigger for things to turn around.

Does Apple shipping silicon with neural net cores count for anything? Is it useful to researchers, or only to Apple Store app developers who might make use of it?

Craig Federighi has said in interviews that he's always been fascinated with ML, and he hopes/expects Apple will pursue it more deeply over time.

Xiao_Xi · Jun 21, 2022

widEyed said:
Does Apple shipping silicon with neural net cores count for anything? Is it useful to researchers, or only to Apple Store app developers who might make use of it?

Apple hardware is very good at inference, but not so good at training. Apple is getting better at training, though, and distributed training across multiple Mac Studios is now possible.

Does the M1/M2 SoC support native bfloat16 arithmetic?

Are there any benchmarks comparing TensorFlow and PyTorch on macOS?
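For context on the bfloat16 question: bfloat16 is the upper 16 bits of an IEEE float32 (1 sign bit, 8 exponent bits, 7 mantissa bits), so it keeps float32's range but only about two to three decimal digits of precision. The sketch below emulates the format in plain Python by truncating a float32. It is only an illustration of the format and says nothing about whether M1/M2 support it natively. Real hardware usually rounds to nearest rather than truncating, and the function names are made up here.

```python
import struct


def float_to_bfloat16_bits(x):
    """Keep the top 16 bits of the float32 encoding (rounds toward zero)."""
    (bits32,) = struct.unpack(">I", struct.pack(">f", x))
    return bits32 >> 16


def bfloat16_bits_to_float(bits16):
    """Expand 16 bfloat16 bits back to a Python float by zero-padding."""
    (value,) = struct.unpack(">f", struct.pack(">I", (bits16 & 0xFFFF) << 16))
    return value


for x in (1.0, 3.14159, 1e30):
    bits = float_to_bfloat16_bits(x)
    print(f"{x!r:>10} -> 0x{bits:04X} -> {bfloat16_bits_to_float(bits)!r}")
```

The second value comes back as 3.140625, which shows how little mantissa is left, while 1e30 still fits because the exponent field is the same width as in float32.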

Xiao_Xi · Sep 12, 2022

PyTorch Foundation Formed By Meta, AMD, NVIDIA, & Others To Advance AI - Phoronix

www.phoronix.com

I wish Apple had joined in to show their commitment to the deep learning community.

name99 · Sep 12, 2022

Xiao_Xi said:
Apple hardware is very good at inference, but not so good at training. Apple is getting better at training, though, and distributed training across multiple Mac Studios is now possible.

Does the M1/M2 SoC support native bfloat16 arithmetic?

Are there any benchmarks comparing TensorFlow and PyTorch on macOS?

Would PiM improve the situation?
What do you think of my hypothesis here:
https://www.realworldtech.com/forum/?threadid=208595&curpostid=208595
?

It would be helpful if we had some serious M2 teardowns/cross sections, but we do not.
An A16 cross section will help, maybe we'll have one in a month or so.

leman · Sep 12, 2022

name99 said:
Would PiM improve the situation?
What do you think of my hypothesis here:
https://www.realworldtech.com/forum/?threadid=208595&curpostid=208595
?

It would be helpful if we had some serious M2 teardowns/cross sections, but we do not.
An A16 cross section will help, maybe we'll have one in a month or so.

I think that’s pretty far fetched 😅 but who knows? Could the mysterious die simply be the SoC cache or something like that?

name99 · Sep 13, 2022

leman said:
I think that’s pretty far fetched 😅 but who knows? Could the mysterious die simply be the SoC cache or something like that?

SoC cache is on the SoC die -- it's easily visible in die shots.

leman · Sep 13, 2022

name99 said:
SoC cache is on the SoC die -- it's easily visible in die shots.

Makes sense. Is there any evidence of a similar mystery component on M-series packages?

altaic · Sep 13, 2022

leman said:
Makes sense. Is there any evidence of a similar mystery component on M-series packages?

Not on the M1, referencing the SystemPlus sample report.

Edit: added direct wayback link

name99 · Sep 14, 2022

altaic said:
Not on the M1, referencing the SystemPlus sample report.

Edit: added direct wayback link

M1 is not the issue. M1 and A14 go together. The interesting case is M2 (and A16).

mrsavage1 · Sep 16, 2022

buckwheet said:
Yeah, CUDA is Nvidia's secret sauce, but according to the Platforms State of the Union video, Metal 3 will accelerate PyTorch MPS significantly. Looking forward to seeing the talk on Friday.

Any news on this now that Ventura has been released with Metal 3? I've been trying to find new benchmarks showing that Metal 3 significantly increases PyTorch MPS performance.

leman · Sep 16, 2022

mrsavage1 said:
Any news on this now that Ventura has been released with Metal 3? I've been trying to find new benchmarks showing that Metal 3 significantly increases PyTorch MPS performance.

Ventura hasn’t been released. And why would you expect Metal 3 to do anything for PyTorch at all? Most improvements in Metal 3 target raytracing and gaming.

Xiao_Xi · Sep 17, 2022

Should Apple adopt this format?

FP8 Formats For Deep Learning

FP8 is a natural progression for accelerating deep learning training inference beyond the 16-bit formats common in modern processors. In this paper we propose an 8-bit floating point (FP8) binary interchange format con…

www.arxiv-vanity.com

mrsavage1 · Sep 17, 2022

leman said:
Ventura hasn’t been released. And why would you expect Metal 3 to do anything for PyTorch at all? Most improvements in Metal 3 target raytracing and gaming.

Metal Overview - Apple Developer

Metal powers hardware-accelerated graphics on Apple platforms by providing a low-overhead API, rich shading language, tight integration between graphics and compute, and an unparalleled suite of GPU profiling and debugging tools.

developer.apple.com

Metal backend for PyTorch

The new Metal backend in PyTorch version 1.12 enables high-performance, GPU-accelerated training using MPS Graph and the Metal Performance Shaders primitives.

In the Metal 3 overview, PyTorch is mentioned as using Metal Performance Shaders, and in the shaders part of that page Apple says there's a performance boost:

Mesh shaders

This new geometry pipeline replaces vertex shaders with two new shader stages — object and mesh — that enable more flexible culling and LOD selection, and more efficient geometry shading and generation.

GrumpyCoder · Sep 17, 2022

mrsavage1 said:
Metal backend for PyTorch
The new Metal backend in PyTorch version 1.12 enables high-performance, GPU-accelerated training using MPS Graph and the Metal Performance Shaders primitives.

That refers to the PyTorch 1.12 backend which comes with MPS out of the box. It's been available in nightly releases before though, so there should be nothing new here: https://sebastianraschka.com/blog/2022/pytorch-m1-gpu.html

mrsavage1 · Sep 17, 2022

GrumpyCoder said:
That refers to the PyTorch 1.12 backend which comes with MPS out of the box. It's been available in nightly releases before though, so there should be nothing new here: https://sebastianraschka.com/blog/2022/pytorch-m1-gpu.html

How about the updates to the shaders in Metal 3, which PyTorch uses?

Mesh shaders

This new geometry pipeline replaces vertex shaders with two new shader stages — object and mesh — that enable more flexible culling and LOD selection, and more efficient geometry shading and generation.
