
Xiao_Xi

macrumors 68000
Oct 27, 2021
1,529
955

That website is misleading. Metal 3 and the Metal backend for PyTorch are two different things. It looks like the marketing people didn't know where to put the note that PyTorch can use Apple's GPU, so they filed it under the Metal enhancements for games.
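For reference, selecting the backend from Python is straightforward — a minimal sketch that falls back to the CPU on machines without Metal support:

```python
import torch

def pick_device() -> str:
    """Prefer the Metal (MPS) backend when this PyTorch build and the
    OS support it; otherwise fall back to the CPU."""
    return "mps" if torch.backends.mps.is_available() else "cpu"

device = torch.device(pick_device())
x = torch.rand(256, 256, device=device)
y = x @ x  # runs on the Apple GPU when device is "mps"
```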

apple_m1_eval.png

 

GrumpyCoder

macrumors 68020
Nov 15, 2016
2,074
2,654
how about the updates to the shaders in metal 3?
That is a potential way of improving performance, but I don't think anything changed for compute with MPS. The focus was more graphics-related. They do mention higher efficiency, but who knows what that means, since they refer to it in the context of ray tracing.

I think it's also too early to report on beta versions, as there are too many bugs and unexplained issues going on right now, which is unacceptable for productive work and only fine for trying a few things. See the latest issue here: https://github.com/pytorch/pytorch/issues/84936

It's frustrating. :(
 

GrumpyCoder

macrumors 68020
Nov 15, 2016
2,074
2,654
Should Apple adopt this format?
While I'm here, let me quickly comment on it. Eventually Apple will have to adopt it, once others do. This is all highly experimental for now, but the benefits can definitely be there for FP8, particularly when it comes to the whole green-AI thing. However, Bill Vass and his team at Amazon AWS have already played around with FP8 and also FP4, and so far there's no general verdict along the lines of "everyone should do this now". I think it's going to be some time before we see any real progress on "new" data formats. There's also some Canadian company working on new hardware tailored specifically to DL and designed with scaling in mind. Haven't heard anything new for a while now. I forget the name; it was started by a former AMD engineer IIRC.
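To make the FP8 point concrete: the common E4M3 layout only keeps 3 mantissa bits, so values get rounded hard. A rough stdlib-only sketch of that mantissa truncation — it deliberately ignores E4M3's exponent range, saturation, and NaN handling, so treat it as an illustration, not a spec-accurate converter:

```python
import struct

def quantize_e4m3_mantissa(x: float) -> float:
    """Round x to the nearest value with only 3 mantissa bits, as in
    FP8 E4M3 (round-half-up for simplicity). The exponent-range limits
    and special values of the real format are NOT modeled here."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    shift = 23 - 3                      # FP32 carries 23 mantissa bits
    bits = (bits + (1 << (shift - 1))) & ~((1 << shift) - 1)
    return struct.unpack(">f", struct.pack(">I", bits))[0]

print(quantize_e4m3_mantissa(0.3))  # 0.3125 -- only 8 steps per octave
```

That coarseness is why the AWS-style experiments matter: whether a given model tolerates 8 representable values per power of two is very workload-dependent.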
 
  • Like
Reactions: ingambe and Xiao_Xi

leman

macrumors Core
Oct 14, 2008
19,319
19,336
how about the updates to the shaders in metal 3 which PyTorch uses?

Mesh shaders

This new geometry pipeline replaces vertex shaders with two new shader stages — object and mesh — that enable more flexible culling and LOD selection, and more efficient geometry shading and generation.

Why would PyTorch be using mesh shaders? It’s not a gaming engine.
 

Xiao_Xi

macrumors 68000
Oct 27, 2021
1,529
955
I think it's also too early to report on beta versions, as there are too many bugs and unexplained issues going on right now, which is unacceptable for productive work and only fine for trying a few things. See the latest issue here: https://github.com/pytorch/pytorch/issues/84936

It's frustrating. :(
The TensorFlow Metal backend runs a little better, but not by much.

Apple is very committed to it, and it gets much better with each iteration. However, you never know which operations have been ported to the GPU or which bugs have been fixed, because Apple doesn't write changelogs.
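For anyone wanting to check whether the Metal plugin is actually active, TensorFlow exposes it as a regular GPU device — a quick sanity check (on machines without tensorflow-metal the list is simply empty):

```python
import tensorflow as tf

# With the tensorflow-metal plugin installed, the Apple GPU appears as
# a PluggableDevice in the regular GPU device list.
gpus = tf.config.list_physical_devices("GPU")
print("Metal GPU visible:", len(gpus) > 0)
```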
 

GrumpyCoder

macrumors 68020
Nov 15, 2016
2,074
2,654
The TensorFlow Metal backend runs a little better, but not by much.
Apple is very committed to it, and it gets much better with each iteration. However, you never know which operations have been ported to the GPU or which bugs have been fixed, because Apple doesn't write changelogs.
I tried TensorFlow on Metal before, but it also produced inconsistent results at the time. That's why I stopped fiddling with TF on Mac and switched to PyTorch. For more serious things I have to use both, but on Nvidia, and it's more PyTorch these days than TF. Maybe I should try TF on Mac again; it's been a while.

It would be nice to try a few smaller things on AS machines, especially MBPs. Maybe even inference only, for quickly hooking up some robots in the lab and testing some stuff without having to go via git and other machines. I don't really think it's TF- or PyTorch-related; it's Apple with Metal that's causing this. As you say, they don't have detailed changelogs, so it's always a ton of work to figure out what's going on. George Hotz did a bunch of reverse engineering in the early M1 days to figure out exactly how things work, but that's not really feasible.

It would be really nice to see a list of supported features that actually work as intended for both TF and PyTorch with Metal. That way, people would know what they get. And while we have seen progress in the past years, both frameworks make general, non-Metal-related progress as well. What we'd really need is a schedule and a timeframe for when the Metal backends catch up and when we can expect everything to work.
 
  • Like
Reactions: Xiao_Xi

Xiao_Xi

macrumors 68000
Oct 27, 2021
1,529
955
Apple is slowly improving the Tensorflow documentation.

However, the people writing the release notes seem to have learned marketing rather than how to write proper changelogs.
tensorflow-macos.png
 

Xiao_Xi

macrumors 68000
Oct 27, 2021
1,529
955
It would be really nice to see a list of supported features that actually work as intended for both TF and PyTorch with Metal.

There is no complete list yet, but at least the PyTorch developers list the new operations added in each release.
  • Added aten::index_add.out operator for MPS backend (#79935)
  • Added aten::prelu operator for MPS backend (#82401)
  • Added aten::bitwise_not operator native support for MPS backend (#83678)
  • Added aten::tensor::index_put operator for MPS backend (#85672)
  • Added aten::upsample_nearest1d operator for MPS backend (#81303)
  • Added aten::bitwise_{and|or|xor} operators for MPS backend (#82307)
  • Added aten::index.Tensor_out operator for MPS backend (#82507)
  • Added aten::masked_select operator for MPS backend (#85818)
  • Added aten::multinomial operator for MPS backend (#80760)
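As a sanity check, the listed ops can be exercised with the usual device-string dance — e.g. aten::multinomial from that list (the snippet falls back to CPU where MPS is unavailable, so it runs anywhere):

```python
import torch

device = "mps" if torch.backends.mps.is_available() else "cpu"

# torch.multinomial dispatches to the newly added aten::multinomial
# kernel when the tensor lives on the "mps" device.
weights = torch.tensor([0.1, 0.2, 0.3, 0.4], device=device)
samples = torch.multinomial(weights, num_samples=10, replacement=True)
print(samples.shape)  # torch.Size([10])
```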
 

GrumpyCoder

macrumors 68020
Nov 15, 2016
2,074
2,654
There is no complete list yet, but at least the PyTorch developers list the new operations added in each release.
That is at least something. But the number of problems these releases have is beyond crazy. There's almost always some sort of fixing required beyond setting the device to mps, especially when using public code from papers, while Linux just works. Plus, it is still very, very slow in some (most) cases. I'm currently trying RAFT for optical flow (from this paper: https://arxiv.org/pdf/2003.12039.pdf) on an M1 Max and it's painfully slow with mps. :(
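One partial workaround, for what it's worth: PyTorch honors a PYTORCH_ENABLE_MPS_FALLBACK environment variable that routes ops without an MPS kernel to the CPU instead of erroring out — slow, but it at least lets paper code run. It has to be set before torch is imported:

```python
import os

# Must be set before the first "import torch"; ops with no MPS kernel
# then fall back to the CPU instead of raising NotImplementedError.
os.environ["PYTORCH_ENABLE_MPS_FALLBACK"] = "1"

import torch  # noqa: E402

device = "mps" if torch.backends.mps.is_available() else "cpu"
```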
 
  • Like
Reactions: applesed

GrumpyCoder

macrumors 68020
Nov 15, 2016
2,074
2,654
Does it benefit Apple in any way?
Large scale? No. Software aside, Apple has a hardware problem: the M1 Ultra still doesn't beat a 1080 Ti, so in that sense it's years behind. They need to up their game and introduce workstation or server/enterprise-type hardware if they want to play this game. The competition is the RTX 6000/8000 and at least the A100. And only once they manage to get close does the software kick in. They can rely on third-party software like PyTorch, and that might get better over time, but Nvidia knows this and reacted years ago.

On a side note, scalability might be another issue. I recently spoke to someone running a 4000 node A100 cluster. It's easy to use, for everyone. How do we scale macOS "servers" to that level? Slurm, OpenMP, ...?

As for Nvidia, they know there are alternatives to CUDA/cuBLAS/cuDNN, but is that really their focus these days? For the past few years, Nvidia has worked on other software. Omniverse with Metaverse applications took over, and everyone who isn't focused on theoretical research will benefit from it. Those with robotics applications will use Isaac Sim. Autonomous driving? Nvidia Drive. Genomics? Nvidia Clara. Medical diagnosis? Maybe Nvidia Kaolin. General physically accurate synthetic data generation? Nvidia Replicator. Digital twins? A mix of all the Omniverse tools. There's literally something for everyone. Outside of that Nvidia ecosystem, there's nothing equivalent.

Sure, I could use CoppeliaSim instead of Isaac Sim for robotics, and I do for teaching, simply because it requires way less hardware and its learning curve isn't as steep as Isaac Sim's. Students can install it on their laptops for introductory-level courses. Research and advanced stuff? Not so much. The same argument can be made for Carla vs. Drive.

So in addition to much more powerful hardware and well-supported software, they need these tools, and I doubt they could rely on 3rd-party (maybe open-source) software for this.

For small home applications and playing around with very basic stuff like cats-vs-dogs classifiers, sure, that's an option.
 
  • Like
Reactions: Xiao_Xi

Xiao_Xi

macrumors 68000
Oct 27, 2021
1,529
955
Is it getting easier to use Tensorflow on macOS?
TensorFlow 2.13 is the first version to provide Apple Silicon wheels, which means that when you install TensorFlow on an Apple Silicon Mac, you will be able to use the latest version of TensorFlow. The nightly builds for Apple Silicon wheels were released in March 2023, and this new support will enable more fine-grained testing, thanks to a technical collaboration between Apple, MacStadium, and Google.

It appears that TensorFlow 2.13 supports FP16 and BF16 on the Apple GPU.
TensorFlow | tensorflow-metal | macOS | Features
v2.5       | v0.1.2           | 12.0+ | Pluggable device
v2.6       | v0.2.0           | 12.0+ | Variable seq. length RNN
v2.7       | v0.3.0           | 12.0+ | Custom op support
v2.8       | v0.4.0           | 12.0+ | RNN perf. improvements
v2.9       | v0.5.0           | 12.1+ | Distributed training
v2.10      | v0.6.0           | 12.1+ |
v2.11      | v0.7.0           | 12.1+ |
v2.12      | v0.8.0           | 12.1+ |
v2.13      | v1.0.0           | 12.1+ | FP16 and BF16 support
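If the FP16/BF16 support works as advertised, it should be reachable through Keras' standard mixed-precision API — a sketch (nothing here is Metal-specific, so it also runs on CPU, just without the speedup):

```python
import tensorflow as tf
from tensorflow.keras import layers, mixed_precision

# "mixed_float16" computes in FP16 while keeping variables in FP32,
# which is where the FP16 support in tensorflow-metal 1.0.0 should pay off.
mixed_precision.set_global_policy("mixed_float16")

model = tf.keras.Sequential([tf.keras.Input(shape=(4,)), layers.Dense(8)])
print(model.layers[0].compute_dtype)   # float16
print(model.layers[0].variable_dtype)  # float32
```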
 
  • Like
Reactions: jerryk

white7561

macrumors 6502a
Jun 28, 2016
934
385
World
How is AI training on Apple Silicon nowadays? I'm doing a final project at uni with my M1, and from what I've seen, training still bugs out for some models and isn't reliable for others. Does anyone have experience with training and such on the M1 Macs? Thanks!
 

Xiao_Xi

macrumors 68000
Oct 27, 2021
1,529
955
What about PyTorch?
PyTorch 2.1 was released this week. Unfortunately, the release notes don't indicate whether the MPS backend has stabilized or is still in beta.

  • Add support for MPSProfiler (#100635, #101002, #101692)
  • Enable saved models to be loaded directly to MPS through torch.jit.load (#102204)
  • Introduce torch.mps.Event() APIs (#102121)
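The torch.jit.load change is easy to demo: a scripted model can now be mapped straight onto MPS at load time. A small sketch that falls back to CPU where MPS is missing:

```python
import io
import torch

class Tiny(torch.nn.Module):
    def forward(self, x):
        return x * 2

# Script and save the model to an in-memory buffer...
buffer = io.BytesIO()
torch.jit.save(torch.jit.script(Tiny()), buffer)
buffer.seek(0)

# ...then load it directly onto the MPS device (or CPU as fallback).
device = "mps" if torch.backends.mps.is_available() else "cpu"
loaded = torch.jit.load(buffer, map_location=device)
print(loaded(torch.ones(3, device=device)))  # doubles the input
```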
 
  • Like
Reactions: applesed

leman

macrumors Core
Oct 14, 2008
19,319
19,336
Microsoft was preparing Macs for OpenAI employees. 🤣🤣

I don't find this surprising. An Ultra is one of the most cost-efficient computers for working with large model development. With models of that size RAM becomes the bottleneck and Apple Silicon has the biggest GPU RAM pools among workstations, while being considerably cheaper. Nvidia has much faster ML hardware, true, but the fastest hardware won't do much if you can't feed it with data.
 
  • Like
Reactions: altaic

senttoschool

macrumors 68030
Nov 2, 2017
2,575
5,338
I don't find this surprising. An Ultra is one of the most cost-efficient computers for working with large model development. With models of that size RAM becomes the bottleneck and Apple Silicon has the biggest GPU RAM pools among workstations, while being considerably cheaper. Nvidia has much faster ML hardware, true, but the fastest hardware won't do much if you can't feed it with data.
All San Francisco / SV software devs use Macs. It’s extremely rare to find one who doesn’t use a Mac.

It doesn’t mean they run any models locally. Most likely the models run in the cloud or on some sort of on-premises Nvidia setup.
 

GrumpyCoder

macrumors 68020
Nov 15, 2016
2,074
2,654
They certainly won't run anything but toy models on their Macs. These are desktop systems to work on and connect to larger clusters from. A MacBook Air would be totally fine for that type of work, and I doubt they have availability issues with Nvidia hardware.
 
  • Like
Reactions: jerryk

Appletoni

Suspended
Mar 26, 2021
443
177
I don't find this surprising. An Ultra is one of the most cost-efficient computers for working with large model development. With models of that size RAM becomes the bottleneck and Apple Silicon has the biggest GPU RAM pools among workstations, while being considerably cheaper. Nvidia has much faster ML hardware, true, but the fastest hardware won't do much if you can't feed it with data.
We are still waiting for a MacBook ULTRA with an 18- to 20-inch display and an M3 ULTRA chip.
 

Xiao_Xi

macrumors 68000
Oct 27, 2021
1,529
955
I'm highly doubtful that they run models on the Macs. Most likely they have clusters with some flavor of Linux and Nvidia GPUs. They probably just write code on the Macs and use Safari.
OpenAI's models require a lot of computing power, and Microsoft was going to provide it.
Our data shows one of OpenAI’s next training supercomputers in Arizona was going to have more than 75,000 GPUs in a single site by the middle of next year.

Our data also shows us that Microsoft is directly buying more than 400,000 GPUs next year for both training and copilot/API inference. Furthermore, Microsoft also has tens of thousands of GPUs coming in via cloud deals with CoreWeave, Lambda, and Oracle.
 
  • Like
Reactions: Queen6