BTW, what's the parameter count of the largest version of Qwen-Coder? Is it 32B, or is it 70B? That's the model I'm most interested in at the moment for running on my new M4 Max Studio with 128GB. I haven't tried running LLMs yet because I want to build llama.cpp myself first (I'll be using llama.cpp only; I'm a C/C++ guy, and besides, if something works in LM Studio, it must also work in llama.cpp if the proper settings are used).
32B
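For rough sizing, parameter count times bits-per-weight gives the approximate GGUF/RAM footprint. A quick sketch for a 32B model (the bits-per-weight figures for llama.cpp quant types are rough averages, not exact):

```shell
# Approximate weights-only size in GB: params (billions) * bits-per-weight / 8.
# Bits-per-weight below are rough averages for common llama.cpp quants.
estimate_gb() {
  awk -v p="$1" -v b="$2" 'BEGIN { printf "%.1f\n", p * b / 8 }'
}
estimate_gb 32 4.85   # Q4_K_M: ~19.4 GB
estimate_gb 32 6.56   # Q6_K:   ~26.2 GB
estimate_gb 32 8.50   # Q8_0:   ~34.0 GB
```

So even at Q8 a 32B model leaves plenty of headroom on a 128GB machine; the context window is where the rest of the memory goes.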
 
I’d love to be in that field. I’ve seen how much Nvidia GPUs go for and can only imagine what it must cost to build a system with sufficient VRAM.

This is why I’m amazed at what we can actually do with the Mac Studio (I go back to the ZX81!). I’m always trying to push to the edge of what the machine I have is capable of. (I still recall pushing that old ZX81 to fill up its 16K RAM pack - for no better reason than “I have to try”!)

For context, my budget for my Studio was originally around £3,200 - and I was kind of hoping 128GB would come in around there. Pushing to £3,800 for 128GB (plus 1TB SSD) went right to the edge of my comfort zone. If the M3 Ultra had started at 128GB, I may *just* have pushed to the £4,200 price (but I’d be arguing with my bank balance every step of the way).

In the end, I’m happy with what I have. 128GB on an M4 Max is of more use to me than 96GB on an M3 Ultra, I feel. I think it’s the right Studio for where I’m at right now. Spending more would just put me in a struggling position financially and, for what amounts to little more than a “hobby” combined with “learning experience” right now, I can’t justify that.
You say "128GB on an M4 Max is of more use to me than 96GB on an M3 Ultra, I feel," which makes total sense. But why would anyone buy an M3 Ultra and hamstring it with only 96 GB RAM when 512 GB is available? IMO, in addition to the GPU cores of course, the point of an M3 Ultra purchase would be the additional available RAM and memory bandwidth. Especially for LLMs. Serious question, because I currently have an M2 Max with 96 GB RAM.
 
You say "128GB on an M4 Max is of more use to me than 96GB on an M3 Ultra, I feel," which makes total sense. But why would anyone buy an M3 Ultra and hamstring it with only 96 GB RAM when 512 GB is available? IMO, in addition to the GPU cores of course, the point of an M3 Ultra purchase would be the additional available RAM and memory bandwidth. Especially for LLMs. Serious question, because I currently have an M2 Max with 96 GB RAM.
I think you’ve nailed it there. For me, if the M3 Ultra base model had come in with 128GB at the same price as the current 96GB base model, I might have squeezed out the extra to pay for it (the extra cores would be worth a couple of hundred quid more). I couldn’t go any more expensive than that, because I’m already at the edge of my budget (for me this is currently more of a hobby to feed my “computer-geek” inner self). Any pricier and it gets too uncomfortable. Going to the next level up (256GB) is well out of my league, and the cost of 512GB is on some other planet.

If I was in the position that money wasn’t my primary concern, I’m with you - the M3U with 512GB would be a beast, and it would be all I’d be looking at. 96GB makes no sense. I also don’t understand the purpose of the base M4 Max Studio with just 36GB RAM either. I’m guessing they just want to hit a price point to tempt those who are looking at the Mini with upgrades, but I still don’t understand who’s buying it.
 
I’m guessing they just want to hit a price point to tempt those who are looking at the Mini with upgrades, but I still don’t understand who’s buying it.
The base M3 Ultra Studio is fine with 96GB of RAM for those who are just going to have it sit and do video all day. There are people who just need to do video production, and 96GB is fine for that job.
 
@JSRinUK -- so now that you've had your studio for a bit, which models are you going back to again and again?

With the recent Gemma 3 and Mistral 3.1, I'm wondering if I should just grab a 64GB Mac mini and call it a day, or keep my options open and get the Studio that you've got.
 
The base M3 Ultra Studio is fine with 96GB of RAM for those who are just going to have it sit and do video all day. There are people who just need to do video production, and 96GB is fine for that job.
Agreed. I've had the base M3 Ultra for about a week now and it's been fantastic at the video work I've thrown at it whether it's Blender, After Effects, Premiere or Topaz Video AI. Since I'm not focused on using LLMs that much, 96 GB RAM has been sufficient so far for my needs.
 
The base M3 Ultra Studio is fine with 96GB of RAM for those who are just going to have it sit and do video all day. There are people who just need to do video production, and 96GB is fine for that job.
You missed the previous sentence that provided context to the one you quoted, re:
I also don’t understand the purpose of the base M4 Max Studio with just 36GB RAM either. I’m guessing they just want to hit a price point to tempt those who are looking at the Mini with upgrades, but I still don’t understand who’s buying it.
I totally get that if you need all the extra cores and you’re being price-conscious, the 96GB M3U hits the spot. If the sacrifice of 32GB of RAM for the extra cores works for you then, of course, it’s the right machine for you. For my use case, LLMs/AI, I wouldn’t want to sacrifice the RAM for that. That’s all I’m really saying.
 
@JSRinUK -- so now that you've had your studio for a bit, which models are you going back to again and again?

With the recent Gemma 3 and Mistral 3.1, I'm wondering if I should just grab a 64GB Mac mini and call it a day, or keep my options open and get the Studio that you've got.
I haven’t done a lot of model-comparison just yet. My aim for this machine was to use the largest model I can with a decent context-length and without heavy quantisation (hence Q6, not Q4), and that’s kind of what I’ve been focussing on.

At the moment, my 128GB Studio hits its limit using the 104B model quantised to Q6 and a max context length of 32,000 (when using llama.cpp - closer to 25,000 if using LM Studio with guardrails removed). With other system processes and apps, I see Memory Used at up to 120GB of the 128GB - but, most importantly, no swap.
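Most of that context-length headroom goes to the KV cache, which you can estimate as 2 (K and V) × layers × context × KV heads × head dim × bytes per element. A sketch with placeholder dimensions (the example numbers are illustrative, not any specific model's real shape):

```shell
# KV-cache size = 2 * n_layers * n_ctx * n_kv_heads * head_dim * bytes_per_elem.
# The dimensions in the example calls are illustrative placeholders only.
kv_cache_gb() {
  awk -v l="$1" -v c="$2" -v h="$3" -v d="$4" -v b="$5" \
    'BEGIN { printf "%.1f\n", 2 * l * c * h * d * b / (1024 * 1024 * 1024) }'
}
kv_cache_gb 64 32000 8 128 2   # f16 cache at 32k context:      ~7.8 GB
kv_cache_gb 64 32000 8 128 1   # ~1-byte (q8_0-ish) cache:      ~3.9 GB
```

llama.cpp can quantise the cache itself (the -ctk/-ctv cache-type flags), which is one way to claw back room for a longer context on the same RAM.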

Ollama seems to have an issue downloading models for me at the moment, so I’ve not been looking at different models through that. I’ll probably compare models through LM Studio later or, if I polish up my own GUI, maybe just with llama.cpp.

I will be looking at the Q4 version of the model I’m using to see if that allows me to open up a much larger context length - but even Q4 for this 104B model would be too much for a 64GB machine.

After this weekend, I won’t have a lot of free time for a couple of weeks so I doubt I’ll be doing any comparing in the near future.

It all comes down to budget and use case. 64GB will still allow you to run a lot of models - 70B models at Q4 would probably be the sweet spot. I think gemma3:27B on Ollama is Q4 and comes in at just 17GB. I’d imagine that would run fine on the base spec Studio (with 36GB RAM) if that’s of interest.

As for Mistral, I see the 123B model of Mistral Large 2 (not Mistral 3.1) on Ollama at Q4 shows 77GB. It might be fun to try a comparison of 123B parameters at Q4 vs 104B parameters at Q6 on my 128GB. I’ve not seen Mistral 3 yet.

The additional benefit of the 128GB RAM is that it allows you to have multiple apps running even when the LLM is a large model (say, 70B). If you won’t be doing much of that, then it’s a non-issue.

As you can tell by this rambling post, I haven’t compared models much right now. My only comparisons have been over the last few months on my 24GB MacBook Pro.

One aside - if you’re looking at using large models, consider your SSD size. With LLMs and image models, the 1TB on my MacBook Pro started to fill up quickly (so 512GB would have been a problem). I couldn’t go higher than 1TB on my Studio, but I’ve been storing most of my models on a couple of external 1TB SSDs. LM Studio, Msty, and llama.cpp are happy to use them from the external SSD (but consider initial load-up times if your external drive is slow). Once the model is loaded into memory, where they came from is not an issue. Even when a model is “unloaded”, it's stored in Cached Files for as long as you’re not doing much else so there’s no big delay when running subsequent queries.
 
One aside - if you’re looking at using large models, consider your SSD size. With LLMs and image models, the 1TB on my MacBook Pro started to fill up quickly (so 512GB would have been a problem). I couldn’t go higher than 1TB on my Studio, but I’ve been storing most of my models on a couple of external 1TB SSDs. LM Studio, Msty, and llama.cpp are happy to use them from the external SSD (but consider initial load-up times if your external drive is slow). Once the model is loaded into memory, where they came from is not an issue. Even when a model is “unloaded”, it's stored in Cached Files for as long as you’re not doing much else so there’s no big delay when running subsequent queries.

I had to sacrifice SSD size to afford the ram upgrade in my Studio. Luckily an external NVMe works super quick, I can load large models in a couple of seconds.
 
I had to sacrifice SSD size to afford the ram upgrade in my Studio. Luckily an external NVMe works super quick, I can load large models in a couple of seconds.
I’m just using a couple of Crucial X9 Pro 1TB SSDs. As they’re mostly for loading LLM models, I only notice the speed when I’m waiting for 85GB to load into RAM.
 
Can Studio owners report the t/s speed and memory usage of Ollama's Gemma 3 27B, comparing the default Q4 model with Q8 and FP16? Maybe also comment on whether the output improves enough to warrant running the larger ones?
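On the memory side you can at least predict the weights-only footprints before anyone downloads anything. A back-of-envelope sketch for 27B parameters (bits-per-weight figures are rough averages, so treat these as ballpark):

```shell
# Approximate weights-only footprint in GB for a 27B model at each precision.
estimate_gb() {
  awk -v p="$1" -v b="$2" 'BEGIN { printf "%.1f\n", p * b / 8 }'
}
estimate_gb 27 4.85   # Q4_K_M: ~16.4 GB
estimate_gb 27 8.50   # Q8_0:   ~28.7 GB
estimate_gb 27 16.0   # FP16:   ~54.0 GB
```

For the t/s numbers themselves, llama.cpp ships a llama-bench tool that reports comparable prompt-processing and generation speeds per model/quant, which would make the comparison easy to standardise.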
 