
VitoBotta
macrumors 6502a · Original poster · Joined Dec 2, 2020 · Posts: 888 · Reactions: 346 · Espoo, Finland
I'm getting my new M4 Pro/64GB next week. Right now, with my M3 Pro/36GB, Qwen2.5-32b-Q4 is the best model I can run. The responses are solid and it's pretty fast too, so it’s working out well on this setup. Wondering if the M4 Pro will let me run an even better model?
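For reference, here's a minimal sketch of how a Q4 model like this can be run locally, assuming llama-cpp-python built with Metal support; the model path below is hypothetical and a GGUF build would need to be downloaded separately:

```python
# Minimal sketch: running a local Q4 GGUF model with llama-cpp-python.
# The model path is hypothetical; obtain a GGUF build separately.
from llama_cpp import Llama

llm = Llama(
    model_path="models/qwen2.5-32b-instruct-q4_k_m.gguf",  # hypothetical path
    n_gpu_layers=-1,  # offload every layer to the GPU (Metal on Apple silicon)
    n_ctx=8192,       # context window; larger values cost more unified memory
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Explain unified memory in one sentence."}],
    max_tokens=128,
)
print(out["choices"][0]["message"]["content"])
```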
 

Gnattu
macrumors 65816 · Joined Sep 18, 2020 · Posts: 1,105 · Reactions: 1,665
Theoretically you can run 70B models at Q4 on it, but token generation speed will ultimately be bottlenecked by memory bandwidth; whether that speed is usable to you is a matter of personal preference.
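To make the bandwidth bottleneck concrete, here's a back-of-envelope sketch (the bandwidth and model-size figures are assumptions, not measurements): generating a token requires streaming essentially all of the weights through memory, so throughput is capped at roughly bandwidth divided by model size.

```python
# Back-of-envelope ceiling on token generation speed: each generated token
# reads (nearly) all model weights once, so tokens/sec <= bandwidth / weights.
# Assumed figures: 273 GB/s is Apple's quoted M4 Pro bandwidth; the GGUF
# sizes are rough Q4 file sizes.

def tokens_per_sec_ceiling(bandwidth_gb_s: float, model_size_gb: float) -> float:
    return bandwidth_gb_s / model_size_gb

M4_PRO_BW = 273.0      # GB/s
QWEN_32B_Q4 = 20.0     # GB, approximate
LLAMA_70B_Q4 = 40.0    # GB, approximate

print(f"32B Q4 ceiling: ~{tokens_per_sec_ceiling(M4_PRO_BW, QWEN_32B_Q4):.0f} tok/s")
print(f"70B Q4 ceiling: ~{tokens_per_sec_ceiling(M4_PRO_BW, LLAMA_70B_Q4):.0f} tok/s")
# Real throughput lands well below these ceilings once compute, the KV cache,
# and memory-controller efficiency are accounted for.
```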
 

VitoBotta
macrumors 6502a · Original poster · Joined Dec 2, 2020 · Posts: 888 · Reactions: 346 · Espoo, Finland
Gnattu said:
Theoretically you can run 70B models at Q4 on it, but token generation speed will ultimately be bottlenecked by memory bandwidth; whether that speed is usable to you is a matter of personal preference.

Would running Llama 3.1 70B Q4 on my new Mac be better or worse than Qwen2.5-32B, given the heavy quantization? Which model do you use, and what are your machine specs? Just curious.
 

Gnattu
macrumors 65816 · Joined Sep 18, 2020 · Posts: 1,105 · Reactions: 1,665
VitoBotta said:
Would running Llama 3.1 70B Q4 on my new Mac be better or worse than Qwen2.5-32B, given the heavy quantization? Which model do you use, and what are your machine specs? Just curious.
The larger the model, the less noticeable the quantization loss, so Llama 3.1 70B would generally be better. I've been trying DeepSeek recently, but that one needs something like 128GB of RAM or more to run, so I can't recommend it to you.
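As a rough rule of thumb (a sketch with assumed bit widths, not figures from this thread), the weight footprint of a quantized model is about parameter count times bits per weight divided by 8, plus some margin for the KV cache and runtime overhead; 236B below is an assumed parameter count in the DeepSeek range, which shows why 128GB isn't enough:

```python
# Rough RAM estimate for a quantized model: weights ~= params * bits / 8,
# plus a margin for the KV cache and runtime overhead. 4.5 bits/weight is
# an assumed effective width for common Q4 GGUF quant types.

def est_ram_gb(params_billion: float, bits_per_weight: float = 4.5,
               overhead_gb: float = 4.0) -> float:
    return params_billion * bits_per_weight / 8 + overhead_gb

print(f"32B  @ ~4.5 bpw: ~{est_ram_gb(32):.0f} GB")   # comfortable in 64GB
print(f"70B  @ ~4.5 bpw: ~{est_ram_gb(70):.0f} GB")   # fits in 64GB, barely
print(f"236B @ ~4.5 bpw: ~{est_ram_gb(236):.0f} GB")  # well past 128GB
```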
 