
LogicalApex

macrumors 65816
Original poster
Nov 13, 2015
1,460
2,316
PA, USA
I'm strongly considering upgrading my 2018 Intel MacBook Pro to the new 16" M4 Max MacBook Pro with 36GB of RAM. The Intel vs. Apple Silicon differences have been well discussed on the forum, and I'm aware of the tradeoffs from a Mac perspective (improved battery life, etc., with the loss of Boot Camp and x86 Windows virtualization). The main drivers for my upgrade are battery life and access to locally running LLMs.

Obviously, no one has the new MacBook Pro yet, so we can't speak to this model's real-world performance directly. We can look at the previous model, though.

For owners of the previous MacBook Pro: how does it handle running local LLMs compared to a desktop with a 3090?
 

komuh

macrumors regular
May 13, 2023
126
113
In my tests using koboldcpp, the M1 Ultra is significantly slower for inference than 2x RTX 3090s, ranging from 5 to 15 times slower depending on the model. It's important to note that the software is primarily optimized for NVIDIA GPUs, so if you prioritize performance above all, consider stacking multiple RTX 3090s or waiting for the upcoming RTX 5090 instead of upgrading your Mac. On a Mac, running LLMs is simply a convenient feature that offers good-enough performance, just slower. If you don't have any NVIDIA GPUs, the slower speed is easy to get used to.
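
If anyone wants to run the same kind of comparison on their own hardware, here's a minimal sketch using llama-cpp-python (the same llama.cpp engine family that koboldcpp builds on). The model path and quant are placeholders, not recommendations:

[CODE]
# Rough tokens/sec check with llama-cpp-python (pip install llama-cpp-python).
# n_gpu_layers=-1 offloads every layer to the GPU: Metal on Apple Silicon,
# CUDA on a 3090 if the package was built with CUDA support.
import time
from llama_cpp import Llama

llm = Llama(model_path="models/llama-3-8b-instruct.Q4_K_M.gguf",  # placeholder path
            n_gpu_layers=-1, n_ctx=2048, verbose=False)

start = time.perf_counter()
out = llm("Explain unified memory in one paragraph.", max_tokens=256)
elapsed = time.perf_counter() - start

generated = out["usage"]["completion_tokens"]
print(f"{generated} tokens in {elapsed:.1f}s -> {generated / elapsed:.1f} tok/s")
[/CODE]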
 

BeingFree

macrumors newbie
Nov 1, 2024
3
0
I'm kind of wondering the same thing. What's the likely inference speed difference between the M4 Pro and M4 Max? How large a model can you handle with 36 or 48GB? Is 1TB enough storage to carry around?

The M4 Pro with 48GB and 1TB storage looked like a good middle spec at about $2,600. How far can you go with that? Would a faster CPU be significantly better, or would more than 1TB of storage be needed? link.

Or get the Mac mini M4 Pro with 64GB for $2,200. That's a lot more RAM than the laptop for the price. I could get a smaller laptop and remote into it. I don't know enough to know how much RAM or CPU I'd need.
 

komuh

macrumors regular
May 13, 2023
126
113
BeingFree said:
What's the likely inference speed difference between the M4 Pro and M4 Max? How large a model can you handle with 36 or 48GB? Is 1TB enough storage to carry around?

36GB is a sufficient amount of memory for most AI applications. With 64GB and a low quant, you can easily run 60-70 billion parameter models.
With 128GB, you can go beyond 70 billion parameters. If you're a big AI enthusiast, I would recommend waiting for the M4 Ultra or buying the Max with 128GB as a future-proof option. However, the GPU is still quite slow if you want "real-time" interaction with models larger than 70 billion parameters, so 64GB can be the optimal choice.
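
As a quick back-of-envelope check on those numbers (the bits-per-weight values are approximate; Q4_K_M works out to roughly 4.8 bits in practice, and you still need headroom for the KV cache and the OS):

[CODE]
# Approximate GGUF model size: parameters x bits-per-weight / 8.
def model_gb(params_b: float, bits_per_weight: float) -> float:
    return params_b * bits_per_weight / 8  # params in billions -> size in GB

for params in (8, 34, 70):
    for name, bpw in (("Q4_K_M", 4.8), ("Q8_0", 8.5), ("FP16", 16.0)):
        print(f"{params:>3}B @ {name}: ~{model_gb(params, bpw):.0f} GB")

# 70B @ Q4_K_M comes out around 42 GB: fits in 64GB with headroom, not in 36GB.
[/CODE]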
 

AirpodsNow

macrumors regular
Aug 15, 2017
224
145
BeingFree said:
What's the likely inference speed difference between the M4 Pro and M4 Max? How large a model can you handle with 36 or 48GB? Is 1TB enough storage to carry around?

There are benchmarks available that give an idea of performance across all the Apple Silicon chips so far: https://github.com/ggerganov/llama.cpp/discussions/4167

The main determinants of speed are memory bandwidth and the GPU (with Apple Silicon the two are highly correlated; both increase as you move up the lineup).
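
To see why bandwidth dominates: generating each token streams the whole quantized model through memory once, so tokens/sec is capped at roughly bandwidth divided by model size. A sketch using published bandwidth specs (quoted from memory, so double-check the figures):

[CODE]
# Upper bound on decode speed for a memory-bandwidth-bound model:
#   tok/s <= bandwidth (GB/s) / model size (GB)
MODEL_GB = 42  # e.g. a 70B model at ~4.8 bits per weight

# Bandwidths are published specs, quoted from memory.
for name, bw_gbs in (("M4 Pro", 273), ("M4 Max", 546), ("RTX 3090", 936)):
    print(f"{name}: <= {bw_gbs / MODEL_GB:.0f} tok/s")
[/CODE]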

The size of the model depends on how much unified memory is available to the GPU. By default, macOS can assign about 75% of all unified memory to the GPU, and that sets the maximum size of the LLM you can load. The percentage can be changed with a terminal command. Since limited memory caps how large a model you can run anyway, the models you can actually use will easily fit on a 1TB SSD.
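
Spelled out, here's what that 75% default leaves you on the common configurations. The sysctl name below is how I remember it from Sonoma-era threads, so treat it as an assumption and verify before relying on it:

[CODE]
# Metal can wire roughly 75% of unified memory by default on Apple Silicon.
# Reportedly adjustable with something like (assumption, verify first):
#   sudo sysctl iogpu.wired_limit_mb=57344
for total_gb in (36, 48, 64, 128):
    print(f"{total_gb} GB unified -> ~{total_gb * 0.75:.0f} GB available to the GPU")
[/CODE]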

Exactly how fast the M4 series will run remains to be seen, but the benchmarks of the previous generations give a pretty clear picture.
 