
AirpodsNow

macrumors regular
Original poster
Aug 15, 2017
224
145
I went down a rabbit hole about local LLMs. It seems that the new Apple Silicon is especially well suited to running these, due to the unified memory (you can assign a large amount of 'RAM' to the GPU) and higher-than-typical-PC memory bandwidth.

These days it's rather easy to use these LLMs without touching the terminal, with apps like LM Studio (text) and DiffusionBee (images). Within the apps you can download general or specialized models to create content or respond to your requests. It has been rather fun to try out. (It's like ChatGPT, but you can choose a model that is especially suited to a particular coding language, or to creating pictures that are as lifelike as possible, or focused on real animals.)
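LM Studio can also expose whatever model you have loaded through a local OpenAI-compatible server, so you can script against it without any cloud API. A minimal sketch, assuming the server is enabled on its default port (1234) and a model is already loaded in the app (the model name below is just a placeholder):

```python
import requests

# Query LM Studio's local OpenAI-compatible server (default: localhost:1234).
# Assumes the server is enabled in the app and a model is already loaded;
# "local-model" is a placeholder name.
resp = requests.post(
    "http://localhost:1234/v1/chat/completions",
    json={
        "model": "local-model",
        "messages": [{"role": "user", "content": "Explain unified memory in one sentence."}],
        "max_tokens": 128,
    },
    timeout=120,
)
print(resp.json()["choices"][0]["message"]["content"])
```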

I am trying to swap my 16" for a 14" MBP. Initially I was trying to figure out whether to go for a Pro or a Max with 30GB+ of RAM. Now, knowing that these models can be quite useful running locally, I am suddenly more focused on the highest bandwidth and as much RAM as I can/want to afford.
 

senttoschool

macrumors 68030
Nov 2, 2017
2,626
5,482
Yes, it's fun to play with but not a very good experience.

Yes, unified memory is an advantage for LLMs, but if you want to run a bigger model such as a 70B, it's quite slow at 5-8 tokens/s, with a substantial time to first token.
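That 5-8 tokens/s figure matches a rough bandwidth estimate: every generated token has to stream the (quantized) weights through memory, so bandwidth divided by model size is about the ceiling. A quick sketch of the arithmetic, assuming a ~40 GB 4-bit 70B model on a 400 GB/s Max-class chip:

```python
# Back-of-envelope decode speed: each token streams the quantized weights once,
# so memory bandwidth / model size is roughly the upper bound on tokens/s.
bandwidth_gb_s = 400   # assumed Max-class memory bandwidth
model_size_gb = 40     # ~70B parameters at 4-bit quantization
print(f"~{bandwidth_gb_s / model_size_gb:.0f} tokens/s ceiling")  # ~10; 5-8 observed is in line
```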

If you want to run a smaller model, well, it's not as good.

Hopefully Apple is planning much more support for local LLMs in the future, with better software (the MLX backend is a start) and more chip-level optimization for LLMs.
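For what it's worth, the MLX side is already usable from Python today. A minimal sketch, assuming the mlx-lm package is installed (`pip install mlx-lm`) and using an example 4-bit community conversion as the model name (swap in whichever one you actually want):

```python
from mlx_lm import load, generate

# Minimal sketch with mlx-lm; the repo name is just an example 4-bit conversion.
model, tokenizer = load("mlx-community/Meta-Llama-3-8B-Instruct-4bit")
text = generate(model, tokenizer, prompt="Why does unified memory help local LLMs?", max_tokens=128)
print(text)
```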
 
  • Like
Reactions: AirpodsNow

TechnoMonk

macrumors 68030
Oct 15, 2022
2,603
4,110
Llama 70B runs great on my 64 GB M1 Max. My Linux workstation with a 4090 runs out of memory unless you use a quantized model. DiffusionBee is a gimmick, pretty useless.
 
  • Like
Reactions: AirpodsNow

AirpodsNow

macrumors regular
Original poster
Aug 15, 2017
224
145
Llama 70B runs great on my 64 GB M1 Max. My Linux workstation with a 4090 runs out of memory unless you use a quantized model. DiffusionBee is a gimmick, pretty useless.
I will be looking for a MacBook with at least 64GB, knowing that this is rather a hard limit on what models you can run.

Is DiffusionBee a gimmick because it's limited in parameters and use? Can you explain? I also downloaded Draw Things, which was mentioned quite frequently.
 

AirpodsNow

macrumors regular
Original poster
Aug 15, 2017
224
145
Yes, it's fun to play with but not a very good experience.

Yes, unified memory is an advantage for LLMs, but if you want to run a bigger model such as a 70B, it's quite slow at 5-8 tokens/s, with a substantial time to first token.

If you want to run a smaller model, well, it's not as good.

Hopefully Apple is planning much more support for local LLMs in the future, with better software (the MLX backend is a start) and more chip-level optimization for LLMs.
People are expecting some kind of feature in the M4 that would help. I was wondering what they can do besides introducing higher SoC bandwidth? Isn't that really the main bottleneck (the amount of RAM and GPU cores is down to the customer's wallet size)?
 

senttoschool

macrumors 68030
Nov 2, 2017
2,626
5,482
People are expecting some kind of feature in the M4 that would help. I was wondering what they can do besides introducing higher SoC bandwidth? Isn't that really the main bottleneck (the amount of RAM and GPU cores is down to the customer's wallet size)?
I doubt there's any new feature in M4 that would help. I think Apple just boosted the NPU, which is not normally used for larger local LLMs.

I would like to see if there's a way to combine the AMX, NPU, and GPU units in the SoC to do LLM inference.
 
  • Like
Reactions: jdb8167

senttoschool

macrumors 68030
Nov 2, 2017
2,626
5,482
Llama 70B runs great on my 64 GB M1 Max. My Linux workstation with a 4090 runs out of memory unless you use a quantized model. DiffusionBee is a gimmick, pretty useless.
What's time to first token like? And tokens/second? What quant?
 

TechnoMonk

macrumors 68030
Oct 15, 2022
2,603
4,110
What's time to first token like? And tokens/second? What quant?
The 4090 has a 24 GB memory limitation. It's my secondary machine now. Just hoping Apple comes out with at least 256-512 GB of RAM for the M4 Max/M4 Ultra. For most of my use cases, tokens/sec on the 4090 is a big zero.
 

TechnoMonk

macrumors 68030
Oct 15, 2022
2,603
4,110
I will be looking for a MacBook with at least 64GB, knowing that this is rather a hard limit on what models you can run.

Is DiffusionBee a gimmick because it's limited in parameters and use? Can you explain? I also downloaded Draw Things, which was mentioned quite frequently.
DiffusionBee is not good, qualitatively speaking. 64 GB should be good enough for experimenting with open source LLMs and vision models. I am just waiting for the M4 Max or even the M5 Max to get to 256 GB. That will save me a ton of money for my custom model inference testing. Apple still needs to add Neural Engine support to MLX, but it's not a showstopper, as I don't use the ANE.
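A rough rule of thumb for what fits: the weights alone take about (parameters × bits per weight) / 8 in bytes, plus headroom for the KV cache and the OS. A quick sketch of that arithmetic (the model sizes are just examples):

```python
# Rough weight-memory estimate in GB: params (billions) * bits per weight / 8.
def weight_gb(params_b: float, bits: int) -> float:
    return params_b * bits / 8

for params_b in (8, 34, 70):
    for bits in (4, 8):
        print(f"{params_b}B @ {bits}-bit ≈ {weight_gb(params_b, bits):.0f} GB of weights")
# 70B @ 4-bit ≈ 35 GB, which is why it fits in 64 GB of unified memory
# but not on a 24 GB 4090 without heavy quantization or offloading.
```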
 
  • Like
Reactions: AirpodsNow

AirpodsNow

macrumors regular
Original poster
Aug 15, 2017
224
145
64 GB should be good enough for experimenting with open source LLMs and vision models.
Do you think that will stay true given where open source LLMs are heading? Since I am new to the subject, I did read some posts saying that certain model sizes that were great for 32GB RAM configs are slowly disappearing. Is that also true for the 'next' level up?

Also, when you say 64GB, do you mean 64GB purely for the LLM, or the ~70% of it that macOS seems to assign as a maximum to the GPU?
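(From what I've read, that cap can at least be inspected, and reportedly raised, with a sysctl. A quick sketch of what I mean, assuming the iogpu.wired_limit_mb key that newer macOS versions use, where 0 means "use the built-in default":)

```python
import subprocess

# Check the GPU wired-memory cap on Apple Silicon macOS.
# iogpu.wired_limit_mb is the key on recent macOS versions (an assumption for
# older releases); 0 means the built-in default, roughly the ~70% figure.
out = subprocess.run(["sysctl", "iogpu.wired_limit_mb"], capture_output=True, text=True)
print(out.stdout.strip())
# Reportedly it can be raised (at your own risk) with e.g.:
#   sudo sysctl iogpu.wired_limit_mb=57344   # ~56 GB on a 64 GB machine
```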

I would be happy to get 64GB if that is enough, because it seems that on the current M3 Max the next option up is 128GB, whereas the 96GB config sits on the 'slower' 300GB/s bandwidth. I assume something similar will apply to the M4 Max.
 

Algr

macrumors 6502a
Jul 27, 2022
526
786
Earth (mostly)
I will be looking for a MacBook with at least 64GB, knowing that this is rather a hard limit on what models you can run.

Is DiffusionBee a gimmick because it's limited in parameters and use? Can you explain? I also downloaded Draw Things, which was mentioned quite frequently.

DiffusionBee and Draw Things both come with a default model that is pretty useless. That frustrated me for a while. But you can get far better results by downloading better models into those programs. Flux (Schnell) is the best right now. Some of the SD 1.5 models are good for specific tasks.
 
  • Like
Reactions: gank41

AirpodsNow

macrumors regular
Original poster
Aug 15, 2017
224
145
DiffusionBee and Draw Things both come with a default model that is pretty useless. That frustrated me for a while. But you can get far better results by downloading better models into those programs. Flux (Schnell) is the best right now. Some of the SD 1.5 models are good for specific tasks.
I'm using the Flux model now. I am just using the app as an easy GUI, similar to LM Studio, although it seems more complicated with Transformer Lab. I wonder if there is also a GUI app for audio that can load local models.
 