
AirpodsNow

macrumors regular
Original poster
Aug 15, 2017
224
145
Can anyone who does this kind of work share some experiences with their Macs? For years I (a non-developer) have wondered what it actually means when Apple keynotes say the Neural Engine has improved yet again. This week I stumbled upon videos from machine learning developers that go into a bit more detail, like here:


(text version: https://www.mrdbourke.com/apple-m3-machine-learning-test/)


I guess I'm part of the crowd that watches popular tech YouTubers talk about how 'marginally' faster the M3s are. It has always felt to me that these YouTubers and reviewers can't really 'test' these machines beyond how many Chrome tabs they can run or how quickly they can encode their video. It was similar back when the Xeon cheese-grater Mac Pro was being reviewed. Now that their video demands are more than met by 4K/8K streaming and decoding, it seems each new generation of Apple silicon doesn't really do much more for those YouTubers.

So seeing the videos linked above was kind of revealing: it seems each iteration of Apple silicon has meant a significant performance boost for those use cases. Especially the comments about having 128GB available as VRAM on a laptop, and how that actually changes the way some workloads get done, were striking. Before, I had only heard of people needing that much 'raw' RAM for virtual machines, but machine learning work clearly benefits too. A desktop with unlimited power still has the highest performance, of course, but some people need it on the go.

Anyone in this line of work care to share? It's just a personal interest in understanding what it means for those who actually benefit directly from that kind of (V)RAM capacity and GPU core count.
 
  • Like
Reactions: hovscorpion12

whg

macrumors regular
Aug 2, 2012
236
153
Switzerland
I got interested in AI recently. I have also watched some YT videos and at first started using TensorFlow. Unfortunately, I discovered that Apple's Metal library for TensorFlow is very buggy and just doesn't produce reasonable results. Using the CPU with TensorFlow works well, but is very slow, about a factor of 10 slower than the GPU version (tested with PyTorch and the famous MNIST dataset).

After the bad experience with TensorFlow, I switched to PyTorch. Here the Metal support works very well. One has to use:
device = torch.device('mps' if torch.backends.mps.is_available() else 'cpu')
I also looked into MLX from Apple, which also works very well. On the M3 Max, performance comes close to 50% of a CUDA device for both MLX and PyTorch.
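
Not from the original post, but a minimal sketch of what that device selection looks like in an actual training step, assuming PyTorch is installed; the tiny model and the dummy batch are made up purely for illustration:

import torch
import torch.nn as nn

# Use the Metal (MPS) backend when available, otherwise fall back to the CPU.
device = torch.device('mps' if torch.backends.mps.is_available() else 'cpu')

# A toy MNIST-sized classifier and a random batch, just to show the device round trip.
model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10)).to(device)
x = torch.randn(64, 784, device=device)
y = torch.randint(0, 10, (64,), device=device)

loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

loss = loss_fn(model(x), y)   # forward pass runs on the GPU via Metal
optimizer.zero_grad()
loss.backward()
optimizer.step()
print(loss.item())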

I also tested some Llama models with Llama.cpp and GGUF quantization. I only have 36GB RAM and only about 2/3 of it can be used for GPU acceleration by default. With
sudo sysctl iogpu.wired_limit_mb=29000
one can override this limit to 29GB in the example shown.

Performance for the LLM models I tested ranged from 0.5 to 30 tokens/s, depending on model size and quantization. The best compromise between output quality and performance came from the Mixtral 8x7B model.
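
A rough sketch of how the Llama.cpp side can be driven from Python via the llama-cpp-python bindings; the bindings, the GGUF file name, and the settings below are assumptions for illustration, not necessarily what was used here:

from llama_cpp import Llama

# Hypothetical GGUF file; n_gpu_layers=-1 asks for every layer to be offloaded
# to the Metal GPU, subject to the wired-memory limit mentioned above.
llm = Llama(
    model_path='mixtral-8x7b-instruct.Q4_K_M.gguf',
    n_gpu_layers=-1,
    n_ctx=4096,
)

out = llm('Explain unified memory in one sentence.', max_tokens=128)
print(out['choices'][0]['text'])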

I hope that this information is helpful to you!
 

Zest28

macrumors 68030
Jul 11, 2022
2,581
3,931
If Apple were so much better than NVIDIA at machine learning, NVIDIA's stock price wouldn't be exploding.

So there is no need to look at benchmarks.

And VRAM is not a limitation, as ChatGPT runs on NVIDIA cards in the cloud with no problem.
 
Last edited:
  • Haha
  • Sad
Reactions: eas and AirpodsNow

T'hain Esh Kelch

macrumors 603
Aug 5, 2001
6,472
7,404
Denmark
If Apple were so much better than NVIDIA at machine learning, NVIDIA's stock price wouldn't be exploding.
Nvidia's stock price has exploded because they sell server AI hardware, not client hardware; the latter is what Apple also sells. We don't really know how good Apple's client AI hardware will be, because they haven't really gone all in yet, which is expected to happen at WWDC. In any event, most end users are not buying hardware for AI, so client sales are unaffected. This is really an apples-and-oranges comparison.

And VRAM is not a limitation, as ChatGPT runs on NVIDIA cards in the cloud with no problem.
You do realize that AI is quiiite a bit more than ChatGPT, right? Nvidia's stock price did not explode because of ChatGPT. VRAM is very much a limitation, which is why stocks like SK Hynix have also gone up drastically.
 

RainbowSWE

macrumors newbie
Jan 3, 2021
27
9
Anyone in this line of work care to share? It's just a personal interest in understanding what it means for those who actually benefit directly from that kind of (V)RAM capacity and GPU core count.

The 128GB of RAM available on an M3 Max is fantastic for running large language models (LLMs) that do not fit on a typical consumer GPU, which tops out at 24GB of VRAM. On my 7950X/4090 24GB, if an LLM does not fit I have to offload layers to system RAM, and at that point it's terribly slow. The 128GB on the M3 Max is awesome for running combinations of LLMs for specialized tasks in an AI pipeline. One use case I have is an LLM that consumes an ISO standard and removes the irrelevant portions, while another works on understanding the relevant portions and generates output that can be further analyzed and formatted to fine-tune another pre-trained model. Think AI generating output to train other AI.
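
As a rough illustration of that kind of pipeline (not the actual code), here is a sketch that chains two locally hosted GGUF models with the llama-cpp-python bindings; the model files, file names, and prompts are all hypothetical:

from llama_cpp import Llama

# Hypothetical local models: a smaller 'filter' model and a larger 'analyst' model,
# both fully offloaded to the GPU thanks to the large unified memory pool.
filter_llm = Llama(model_path='small-filter.Q4_K_M.gguf', n_gpu_layers=-1, n_ctx=8192)
analyst_llm = Llama(model_path='large-analyst.Q4_K_M.gguf', n_gpu_layers=-1, n_ctx=8192)

document = open('iso_standard_excerpt.txt').read()

# Stage 1: strip the irrelevant portions of the source document.
relevant = filter_llm(
    'Keep only the sections relevant to the requirements below:\n\n' + document,
    max_tokens=2048,
)['choices'][0]['text']

# Stage 2: turn the filtered text into structured examples for fine-tuning.
training_examples = analyst_llm(
    'Rewrite the following as question/answer pairs suitable for fine-tuning:\n\n' + relevant,
    max_tokens=2048,
)['choices'][0]['text']

print(training_examples)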

I know others prefer uncensored models and the privacy of running them locally. Some of these cloud services are cheap but not so private. The name of the game in AI (besides NVIDIA GPUs) is data; you need data to consume, train, and compete. I know some digital/video artists and professionals run AI models as part of their workflow and require something mobile; they don't have the option of cloud-based solutions. Imagine the mobile bandwidth cost of uploading thousands of images/videos and then waiting to get the output back, only to go through more rounds to get the right result.

One big con with the 128GB M3 Max, though, is portability. Personally, I do not consider such an expensive machine portable. This machine is not for taking to your local coffee shop, and the 16" MBP is awful on a plane trip versus my 14" M1 Max. You will also need to be plugged in if you're doing any kind of training for several hours.

Can't wait for the M3 Ultra though.
 

Zest28

macrumors 68030
Jul 11, 2022
2,581
3,931
Your input is so bad I actually had to log on and congratulate you on how bad it is.

Please, Apple sucks so much that their stock price managed to drop during this entire machine learning and AI boom.

That is all you need to know about Apple.

And NVIDIA is dominating in this field not because of the GPUs, but simply because CUDA is so far ahead of anything else.
 
Last edited:
  • Angry
Reactions: hovscorpion12

Flowstates

macrumors 6502
Aug 5, 2023
333
397
Unified memory gives you a unique market offering: extremely large memory pools that let you tinker with extremely large datasets normally walled off behind extremely pricey commercial hardware. The actual compute is an order of magnitude slower, nevertheless.

If you want to do anything with a certain degree of efficiency with only consumer hardware in hand, a cloud instance will prove unavoidable, unless you can get your org to pony up $100k+ for your homelab.
 
  • Like
Reactions: hovscorpion12

MRMSFC

macrumors 6502
Jul 6, 2023
371
381
Please, Apple sucks so much that their stock price managed to drop during this entire machine learning and AI boom.

That is all you need to know about Apple.

And NVIDIA is dominating in this field not because of the GPUs, but simply because CUDA is so far ahead of anything else.
The fact that NVidia sells enterprise accelerators at a massive scale and not discrete desktops seems to have been lost in your analysis.
 

Zest28

macrumors 68030
Jul 11, 2022
2,581
3,931
The fact that NVidia sells enterprise accelerators at a massive scale and not discrete desktops seems to have been lost in your analysis.

And why do you think NVIDIA cards are selling so well? It's not necessarily because they are the best GPUs. Google's custom chips are faster and more power efficient than NVIDIA's GPUs in some machine learning applications, but Google cannot get around CUDA, so they are still using NVIDIA cards. AMD might come out with faster GPUs than NVIDIA soon, but without CUDA it doesn't matter.

If Google, with all their machine learning expertise, cannot fight against CUDA, do you really think Apple with Metal is as good as CUDA? The market has priced in that Apple has no hope of fighting CUDA and NVIDIA, even if Apple were to launch their own datacenter chip. But MacRumors seems to know better than the market?

But have fun using machine learning with Apple and Metal if you believe it is superior to CUDA.
 
Last edited:

Zest28

macrumors 68030
Jul 11, 2022
2,581
3,931
Nvidia's stock price has exploded because they sell server AI hardware, not client hardware; the latter is what Apple also sells. We don't really know how good Apple's client AI hardware will be, because they haven't really gone all in yet, which is expected to happen at WWDC. In any event, most end users are not buying hardware for AI, so client sales are unaffected. This is really an apples-and-oranges comparison.


You do realize that AI is quiiite a bit more than ChatGPT, right? Nvidia's stock price did not explode because of ChatGPT. VRAM is very much a limitation, which is why stocks like SK Hynix have also gone up drastically.

Who is really using Apple with Metal for machine learning?

And no, NVIDIA has exploded because of CUDA, not because of their GPUs. Their GPUs are nothing special, as Google has custom chips that can outperform them. The magic is CUDA.

Without CUDA, NVIDIA wouldn't be where they are now.

And there is no way that Apple's Metal is as good as CUDA when even Google, one of the best companies in the world when it comes to machine learning, cannot manage it.

But have fun using machine learning with Metal and Apple if you people from MacRumors think it is superior to CUDA.
 
Last edited:
  • Angry
Reactions: hovscorpion12

Regulus67

macrumors 6502a
Aug 9, 2023
531
501
Värmland, Sweden
And no, NVIDIA has exploded because of CUDA, not because of their GPUs. Their GPUs are nothing special, as Google has custom chips that can outperform them. The magic is CUDA.

Without CUDA, NVIDIA wouldn't be where they are now.
Even though I know basically nothing about this subject, I have watched the Nvidia presentation of their new Blackwell GPU and DGX GB200 system. From what I saw in that two-hour-long presentation, I would suggest you go and watch it as well.
CUDA was/is the key, it seems.

Nvidia has built a full hardware ecosystem now.
DGX GB200

 

MRMSFC

macrumors 6502
Jul 6, 2023
371
381
If Google, with all their machine learning expertise, cannot fight against CUDA, do you really think Apple with Metal is as good as CUDA?
If you take the time to reread my post, you’ll notice that I didn’t address CUDA at all.

Nor did you refute my point about Apple and NVidia not competing in the same market at all.
The market has priced in that Apple has no hope of fighting CUDA and NVIDIA, even if Apple were to launch their own datacenter chip. But MacRumors seems to know better than the market?
Please,

If market trends predicted future success, Apple would have been dead in the '90s and GameStop would be the biggest retailer right now.

The fact is that market trends are just as susceptible to hype as anything.
But have fun using machine learning with Apple and Metal if you believe it is superior to CUDA.
Please reread my post and I’ll give you a cookie if you can find where I said anything about CUDA or Metal.
 