I got the mini with 14 CPU cores, 20 GPU cores, 64 GB of RAM, and 2 TB of storage. I'm really glad I didn't go for a model with less memory because I wouldn't have been able to run large language models locally.
Now I can use LM Studio to run both the standard and coder versions of Qwen2.5 32B. The inference speed is pretty good, around 11-12 tokens per second, which works well for real tasks. It's great that I can keep both models loaded in memory all the time, so there's no delay before each response.
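For anyone who wants to script against it rather than just chat in the UI: LM Studio can expose an OpenAI-compatible local server, so you can point the standard OpenAI client at it. Here's a minimal sketch; the port, placeholder API key, and model identifier are assumptions from a typical setup, so check what your own instance reports under /v1/models.

```python
# Minimal sketch: querying a model served by LM Studio's local server.
# Assumes the server is enabled on the default port (http://localhost:1234)
# and that the model identifier below matches what your instance lists
# under /v1/models -- adjust both to your setup.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:1234/v1",  # LM Studio's OpenAI-compatible endpoint
    api_key="lm-studio",                  # any non-empty string works for a local server
)

response = client.chat.completions.create(
    model="qwen2.5-coder-32b-instruct",   # hypothetical identifier; check /v1/models
    messages=[
        {"role": "user", "content": "Write a Python one-liner to reverse a string."},
    ],
    temperature=0.2,
)

print(response.choices[0].message.content)
```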
I use LLMs a lot, and I now prefer local models for general tasks and coding—I even switched away from GitHub Copilot. I only go to paid models on OpenRouter when the Qwen models don't give me the results I want.
Anyone else running LLMs locally on Apple Silicon Macs?