@JSRinUK -- so now that you've had your Studio for a bit, which models are you going back to again and again?
With the recent Gemma 3 and Mistral 3.1, I'm wondering if I should just grab a 64GB Mac mini and call it a day, or keep my options open and get the Studio that you've got.
I haven’t done a lot of model comparison just yet. My aim for this machine was to run the largest model I can with a decent context length and without heavy quantisation (hence Q6, not Q4), and that’s what I’ve been focussing on.
At the moment, my 128GB Studio hits its limit with the 104B model quantised to Q6 and a max context length of 32,000 tokens (using llama.cpp - closer to 25,000 if using LM Studio with its guardrails removed). With other system processes and apps running, I see Memory Used at up to 120GB of the 128GB - but, most importantly, no swap.
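If you want to keep an eye on swap yourself while a model is loaded, a few lines of Python show the same figures as Activity Monitor - a minimal sketch, and the psutil package is my assumption here, not something the LLM tools need:

```python
# Quick memory/swap check while a model is loaded (psutil is an extra
# package: pip install psutil). Activity Monitor shows the same numbers.
import psutil

vm = psutil.virtual_memory()
sw = psutil.swap_memory()

print(f"Memory used: {(vm.total - vm.available) / 1e9:.1f} GB of {vm.total / 1e9:.0f} GB")
print(f"Swap used:   {sw.used / 1e9:.2f} GB")  # ideally zero
```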
Ollama seems to have an issue downloading models for me at the moment, so I’ve not been looking at different models through that. I’ll probably compare models through LM Studio later or, if I polish up my own GUI, maybe just with llama.cpp.
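If I do end up scripting it myself rather than using a GUI, one route is the llama-cpp-python bindings on top of llama.cpp - a minimal sketch, assuming those bindings, with a placeholder model path (mine live on an external SSD):

```python
# Minimal llama-cpp-python sketch - the model path below is a placeholder;
# swap in whichever GGUF you're testing.
from llama_cpp import Llama

llm = Llama(
    model_path="/Volumes/External1/models/some-104b-q6_k.gguf",  # placeholder
    n_ctx=32000,      # the max context I can fit alongside the Q6 weights
    n_gpu_layers=-1,  # offload all layers to Metal/unified memory
)

out = llm("Explain quantisation in one paragraph.", max_tokens=256)
print(out["choices"][0]["text"])
```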
I will be looking at the Q4 version of the model I’m using to see if that allows me to open up a much larger context length - but even Q4 for this 104B model would be too much for a 64GB machine.
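The back-of-envelope sums behind that claim are simple: parameter count times bits per weight, divided by eight. The bits-per-weight figures below are rough averages I'm assuming for llama.cpp's K-quants (real GGUF files vary a little, and you still need headroom for the KV cache, other apps, and macOS itself):

```python
# Rough weights-only size estimate: params (billions) x bits-per-weight / 8.
# The bpw values are approximate averages for llama.cpp K-quants.
BPW = {"Q6_K": 6.56, "Q4_K_M": 4.85}

def weights_gb(params_b: float, quant: str) -> float:
    """Approximate size of the weights alone, in GB."""
    return params_b * BPW[quant] / 8

for params, quant in [(104, "Q6_K"), (104, "Q4_K_M"), (70, "Q4_K_M"),
                      (27, "Q4_K_M"), (123, "Q4_K_M")]:
    print(f"{params}B @ {quant}: ~{weights_gb(params, quant):.0f} GB")

# 104B @ Q6_K:   ~85 GB -> fits 128GB with room for a 32k context
# 104B @ Q4_K_M: ~63 GB -> weights alone nearly fill a 64GB machine
# 70B  @ Q4_K_M: ~42 GB -> comfortable on 64GB
# 27B  @ Q4_K_M: ~16 GB -> in line with the ~17GB gemma3:27B download
# 123B @ Q4_K_M: ~75 GB -> close to the 77GB Ollama shows for Mistral Large 2
```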
After this weekend, I won’t have a lot of free time for a couple of weeks so I doubt I’ll be doing any comparing in the near future.
It all comes down to budget and use case. 64GB will still allow you to run a lot of models - 70B models at Q4 would probably be the sweet spot. I think gemma3:27B on Ollama is Q4 and comes in at just 17GB. I’d imagine that would run fine on the base spec Studio (with 36GB RAM) if that’s of interest.
As for Mistral, the 123B Mistral Large 2 (not Mistral 3.1) on Ollama shows as 77GB at Q4. It might be fun to compare 123B parameters at Q4 against 104B parameters at Q6 on my 128GB machine. I’ve not seen Mistral 3.1 on there yet.
The additional benefit of 128GB RAM is that it lets you keep multiple apps running even while a large model (say, 70B) is loaded. If you won’t be doing much of that, then it’s a non-issue.
As you can tell from this rambling post, I haven’t compared models much so far. My only comparisons have been over the last few months on my 24GB MacBook Pro.
One aside - if you’re looking at using large models, consider your SSD size. With LLMs and image models, the 1TB on my MacBook Pro started to fill up quickly (so 512GB would have been a problem). I couldn’t go higher than 1TB on my Studio, but I’ve been storing most of my models on a couple of external 1TB SSDs. LM Studio, Msty, and llama.cpp are all happy to load models from an external SSD (though consider the initial load-up time if your external drive is slow). Once a model is loaded into memory, where it came from doesn’t matter. Even when a model is “unloaded”, it stays in Cached Files for as long as you’re not doing much else, so there’s no big delay when running subsequent queries.
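On those load times: the first load has to pull the whole file off the external SSD, but a reload straight afterwards comes out of that file cache. Something like this (same assumed llama-cpp-python setup and placeholder path as above) makes the difference obvious:

```python
# Time a cold load (read from the external SSD) vs a warm reload (served
# from macOS's file cache). Placeholder path, as before.
import time
from llama_cpp import Llama

MODEL = "/Volumes/External1/models/some-104b-q6_k.gguf"  # placeholder

for run in ("cold", "warm"):
    t0 = time.perf_counter()
    llm = Llama(model_path=MODEL, n_ctx=4096, n_gpu_layers=-1, verbose=False)
    print(f"{run} load: {time.perf_counter() - t0:.1f}s")
    del llm  # "unloads" the model; the file stays in Cached Files
```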