
VitoBotta
macrumors 6502a · Original poster
I got the mini with 14 CPU cores, 20 GPU cores, 64 GB of RAM, and 2TB of storage. I'm really glad I didn't go for a model with less memory because I wouldn't have been able to run large language models locally.

Now I can use LM Studio to run both the standard and coder versions of Qwen2.5 at 32 billion parameters. Inference speed is pretty good, around 11–12 tokens per second, which is fast enough for real tasks. It's also great that I can keep both models loaded in memory all the time, so there's no delay before each response.
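For anyone curious what that looks like in practice, here's a minimal sketch of talking to LM Studio's OpenAI-compatible local server from Python. The base URL assumes LM Studio's default port, and the model identifier is a placeholder; it has to match whatever name LM Studio shows for the Qwen2.5 build you have loaded.

```python
from openai import OpenAI

# LM Studio exposes an OpenAI-compatible server; port 1234 is the default,
# and the API key can be any non-empty string for a local server.
client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

# Placeholder model id: use the identifier LM Studio shows for your
# loaded Qwen2.5 32B model.
response = client.chat.completions.create(
    model="qwen2.5-coder-32b-instruct",
    messages=[
        {"role": "user",
         "content": "Write a function that reverses a string without using the built-in reverse."}
    ],
    temperature=0.2,
)

print(response.choices[0].message.content)
```

Because both models stay resident in memory, the first token arrives almost immediately instead of waiting for a multi-gigabyte model to load.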

I use LLMs a lot, and I now prefer local models for both general tasks and coding; I even switched away from GitHub Copilot. I only fall back to paid models on OpenRouter when the Qwen models don't give me the results I want.
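To illustrate that fallback idea, here's a rough sketch that tries the local Qwen server first and only calls OpenRouter when the local request fails or comes back empty. The local model id, the OpenRouter model id, and the "good enough" check are all placeholders for whatever you actually use, not anything LM Studio or OpenRouter mandates.

```python
import os
from openai import OpenAI

# Local LM Studio server (assumed default port) and OpenRouter's
# OpenAI-compatible endpoint; both work with the same client library.
local = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")
remote = OpenAI(base_url="https://openrouter.ai/api/v1",
                api_key=os.environ["OPENROUTER_API_KEY"])

def ask(prompt: str) -> str:
    """Try the local Qwen model first, fall back to a paid model on OpenRouter."""
    try:
        reply = local.chat.completions.create(
            model="qwen2.5-coder-32b-instruct",  # placeholder local model id
            messages=[{"role": "user", "content": prompt}],
        ).choices[0].message.content
        if reply and reply.strip():
            return reply
    except Exception:
        pass  # local server down or model not loaded

    # Fallback: a paid model via OpenRouter (model id is illustrative).
    return remote.chat.completions.create(
        model="anthropic/claude-3.5-sonnet",
        messages=[{"role": "user", "content": prompt}],
    ).choices[0].message.content

print(ask("Explain what a closure is in two sentences."))
```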

Anyone else running LLMs locally on Apple Silicon Macs?
 