
VitoBotta
macrumors 6502a · Original poster · Joined Dec 2, 2020 · Posts: 888 · Reactions: 346 · Espoo, Finland
I'm getting my new M4 Pro/64GB next week. Right now, with my M3 Pro/36GB, Qwen2.5-32b-Q4 is the best model I can run. The responses are solid and it's pretty fast too, so it’s working out well on this setup. Wondering if the M4 Pro will let me run an even better model?
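For reference, here's a minimal sketch of how a Q4 model like this can be run locally, assuming llama-cpp-python built with Metal support; the model path below is hypothetical and a GGUF build would need to be downloaded separately:

```python
# Minimal sketch: running a local Q4 GGUF model with llama-cpp-python.
# The model path is hypothetical; obtain a GGUF build separately.
from llama_cpp import Llama

llm = Llama(
    model_path="models/qwen2.5-32b-instruct-q4_k_m.gguf",  # hypothetical path
    n_gpu_layers=-1,  # offload every layer to the GPU (Metal on Apple silicon)
    n_ctx=8192,       # context window; larger values cost more unified memory
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Explain unified memory in one sentence."}],
    max_tokens=128,
)
print(out["choices"][0]["message"]["content"])
```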
 

Gnattu
macrumors 65816 · Joined Sep 18, 2020 · Posts: 1,105 · Reactions: 1,665
Theoretically you can run 70B models at Q4 on it, but token generation speed will ultimately be bottlenecked by memory bandwidth; whether that speed is usable to you is a matter of personal preference.
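To make the bandwidth bottleneck concrete, here's a back-of-envelope sketch (the bandwidth and model-size figures are assumptions, not measurements): generating a token requires streaming essentially all of the weights through memory, so throughput is capped at roughly bandwidth divided by model size.

```python
# Back-of-envelope ceiling on token generation speed: each generated token
# reads (nearly) all model weights once, so tokens/sec <= bandwidth / weights.
# Assumed figures: 273 GB/s is Apple's quoted M4 Pro bandwidth; the GGUF
# sizes are rough Q4 file sizes.

def tokens_per_sec_ceiling(bandwidth_gb_s: float, model_size_gb: float) -> float:
    return bandwidth_gb_s / model_size_gb

M4_PRO_BW = 273.0      # GB/s
QWEN_32B_Q4 = 20.0     # GB, approximate
LLAMA_70B_Q4 = 40.0    # GB, approximate

print(f"32B Q4 ceiling: ~{tokens_per_sec_ceiling(M4_PRO_BW, QWEN_32B_Q4):.0f} tok/s")
print(f"70B Q4 ceiling: ~{tokens_per_sec_ceiling(M4_PRO_BW, LLAMA_70B_Q4):.0f} tok/s")
# Real throughput lands well below these ceilings once compute, the KV cache,
# and memory-controller efficiency are accounted for.
```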
 

VitoBotta
macrumors 6502a · Original poster · Joined Dec 2, 2020 · Posts: 888 · Reactions: 346 · Espoo, Finland
Gnattu said:
Theoretically you can run 70B models at Q4 on it, but token generation speed will ultimately be bottlenecked by memory bandwidth; whether that speed is usable to you is a matter of personal preference.

Would running Llama 3.1 70B Q4 on my new Mac be better or worse than Qwen2.5-32B, given the heavy quantization? Which model do you use, and what are your machine specs? Just curious.
 

Gnattu
macrumors 65816 · Joined Sep 18, 2020 · Posts: 1,105 · Reactions: 1,665
VitoBotta said:
Would running Llama 3.1 70B Q4 on my new Mac be better or worse than Qwen2.5-32B, given the heavy quantization? Which model do you use, and what are your machine specs? Just curious.
The larger the model, the less noticeable the quantization loss, so Llama 3.1 70B would generally be better. I've been trying DeepSeek recently, but that one needs something like 128GB of RAM or more to run, so I can't recommend it to you.
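As a rough rule of thumb (a sketch with assumed bit widths, not figures from this thread), the weight footprint of a quantized model is about parameter count times bits per weight divided by 8, plus some margin for the KV cache and runtime overhead; 236B below is an assumed parameter count in the DeepSeek range, which shows why 128GB isn't enough:

```python
# Rough RAM estimate for a quantized model: weights ~= params * bits / 8,
# plus a margin for the KV cache and runtime overhead. 4.5 bits/weight is
# an assumed effective width for common Q4 GGUF quant types.

def est_ram_gb(params_billion: float, bits_per_weight: float = 4.5,
               overhead_gb: float = 4.0) -> float:
    return params_billion * bits_per_weight / 8 + overhead_gb

print(f"32B  @ ~4.5 bpw: ~{est_ram_gb(32):.0f} GB")   # comfortable in 64GB
print(f"70B  @ ~4.5 bpw: ~{est_ram_gb(70):.0f} GB")   # fits in 64GB, barely
print(f"236B @ ~4.5 bpw: ~{est_ram_gb(236):.0f} GB")  # well past 128GB
```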
 