I stepped away from local LLMs for a little while. Once you’ve tested the big models, you need a real use for them to run them in earnest. For me it’s mostly about brainstorming my stories and, once you have your preferred model, you’re kind of already there.
I did do some “coding tests” a while back in which I pitted frontier AI against large local LLMs on my Mac. The results were nothing to write home about. The local LLMs pretty much failed. They would generate code, but the code would then generally fail: they’d create the functions your code needs, but never call them, because they’ve forgotten about them by the time they’re needed. They may have long context, but they have short memory.
Even frontier AI didn’t fare much better. Those models are generally so quick to go down the “popular path” of coding that they trip right up when you tell them “I have a Mac, not CUDA-core stuff”.
You could work with the frontier AI, but you’d find yourself constantly returning to it with errors - often silly things you can fix yourself if you already know a bit about coding. Sometimes it felt like I was teaching the AI, at which point I’d call the test a failure.
The only one that showed any real sign of helping (though not completely flawless) was Claude Sonnet. I didn’t have Opus at the time (I do now).
Instead, I recently shifted my focus to image generation. As with local LLMs, my primary goal is that everything should remain local - no “calling out to the web” once the models are downloaded. I had a poor experience with the “node hell” that was ComfyUI several months ago, so I’m avoiding that for now.
In its place, I’ve been leveraging mflux (https://pypi.org/project/mflux/), an MLX port of several generative image models.
Mflux supports the following models:
- Z-Image Turbo
- Flux.1 (including variants)
- FIBO
- SeedVR2
- Qwen-Image (including Qwen-Image-Edit)
I see that they’ve recently added Flux.2 as well, so that’s going on my to-do list.
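For a flavour of what mflux looks like from Python, here’s a minimal generation sketch based on the Flux1 API shown in the mflux README - the exact imports and argument names have shifted between versions, so treat it as indicative rather than gospel:

```python
from mflux import Config, Flux1

# Load a quantised FLUX.1 schnell model (8-bit keeps memory use in check)
flux = Flux1.from_name(model_name="schnell", quantize=8)

# schnell is a distilled model, so a handful of steps is enough
image = flux.generate_image(
    seed=42,
    prompt="a helicopter hovering over a hot dog stand",
    config=Config(num_inference_steps=4, height=1024, width=1024),
)
image.save(path="helicopter.png")
```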
With Claude doing (most of) the coding, I’ve assembled an “image generation toolkit”, which consists of the following Python “apps”:
- Prompt Workshop
- This takes your simple prompt (e.g. “a helicopter hovering over a hot dog stand”) and leverages a local LLM to enhance it into something more elaborate, to give the image generation model more to work with. I’m using a VLM so that I can also include an image, and the “enhanced prompt” will use that as a reference (colours, style, cinematography, etc.). There’s a sketch of the idea after this list.
- A second function of this app is simply to have a local LLM describe what it sees in an image I provide. So, if there's an image I like and I want to generate something similar elsewhere, with some modifications, this will assist me with that.
- Multi-Model Image Generator
- The intention here is to take my image generation prompt and generate the image. I can select the size I want, alter other parameters, and generate whatever quantity of images I want (with the seed varying between them). The core loop is sketched after this list.
- I can also select from up to 6 different image generation models (all the ones mflux supports, minus Flux.2 for now), so that I can pick my favourite.
- Multi-Model Image Editor
- Here, I provide one or two images and describe what I want to change in the image.
- I can select from up to 2 different image editing models. If the chosen model can accept both images, it'll use them; otherwise it'll work on just the first image.
- Image Upscaler
- This does what it says on the tin. I provide an image, pick a size or scale factor, and I get back an upscaled version of the image. A stand-in sketch follows the list.
- Background Remover
- A new addition to the toolbox. I provide an image and, within a few seconds, it creates the same image with the background removed (or replaced with plain black or plain white). Optionally, the mask is also generated (useful if I want to do further editing in a graphics app). Again, there’s a sketch after the list.
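First, the Prompt Workshop sketch promised above: it asks a local VLM to expand a short prompt, optionally with a reference image. This assumes an Ollama server on its default port and a vision-capable model like llava - the model name and instruction text are placeholders, not what my app literally does:

```python
import base64
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default endpoint

def enhance_prompt(simple_prompt: str, reference_image: str | None = None) -> str:
    """Ask a local VLM to expand a short prompt into a detailed one."""
    payload = {
        "model": "llava",  # any locally pulled vision model
        "prompt": (
            "Rewrite this short image-generation prompt as a detailed one, "
            "covering subject, lighting, colour palette and composition: "
            f"{simple_prompt}"
        ),
        "stream": False,
    }
    if reference_image:
        # Ollama accepts base64-encoded images alongside the text prompt
        with open(reference_image, "rb") as f:
            payload["images"] = [base64.b64encode(f.read()).decode()]
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

print(enhance_prompt("a helicopter hovering over a hot dog stand"))
```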
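The Multi-Model Image Generator’s core loop is just repeated generation with a varying seed. Sketched here with mflux’s mflux-generate command (model, step count and file names are illustrative):

```python
import subprocess

prompt = "a helicopter hovering over a hot dog stand"

# Four variations of the same prompt, varying only the seed
for seed in range(4):
    subprocess.run(
        [
            "mflux-generate",
            "--model", "schnell",
            "--prompt", prompt,
            "--steps", "4",
            "--seed", str(seed),
            "--width", "1024",
            "--height", "1024",
            "--output", f"variation_{seed}.png",
        ],
        check=True,
    )
```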
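The Image Upscaler is model-based, but the interface boils down to “image in, scale factor, image out”. As a deliberately dumb stand-in (not the model the app actually uses), Pillow’s Lanczos resampling shows the shape:

```python
from PIL import Image

def upscale(path: str, scale: float = 2.0) -> Image.Image:
    """Resize by a scale factor - a Lanczos stand-in for a real upscaling model."""
    img = Image.open(path)
    return img.resize(
        (round(img.width * scale), round(img.height * scale)),
        Image.Resampling.LANCZOS,
    )

upscale("variation_0.png", scale=2.0).save("variation_0@2x.png")
```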
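And the Background Remover - I’m not claiming this is exactly what’s under my app’s hood, but the off-the-shelf rembg package gives the same behaviour in a few lines, mask included:

```python
from PIL import Image
from rembg import remove

src = Image.open("variation_0.png")

# Cut out the subject; the result is RGBA with a transparent background
cut_out = remove(src)
cut_out.save("no_background.png")

# Or flatten onto a plain white background instead of transparency
white = Image.new("RGBA", cut_out.size, (255, 255, 255, 255))
white.alpha_composite(cut_out)
white.convert("RGB").save("white_background.png")

# The mask alone, for editing in a graphics app
remove(src, only_mask=True).save("mask.png")
```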
Just today, I got Claude to combine these scripts into a single “one click and forget” script. You put in your simple prompt, and it automatically gets the enhanced prompt, sends it to the image generator, upscales the image, and removes the background, with the images saved at every stage.
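Stitched together from the sketches above (enhance_prompt, mflux-generate, upscale and rembg - so the names here refer back to those, not to my actual script), the chained version looks roughly like this:

```python
import subprocess
from pathlib import Path

from PIL import Image
from rembg import remove

def run_pipeline(simple_prompt: str, out_dir: str = "run") -> None:
    out = Path(out_dir)
    out.mkdir(exist_ok=True)

    # 1. Enhance the simple prompt with the local VLM
    #    (enhance_prompt is the Ollama sketch from earlier in this post)
    detailed = enhance_prompt(simple_prompt)
    (out / "prompt.txt").write_text(detailed)

    # 2. Generate the image with mflux
    subprocess.run(
        ["mflux-generate", "--model", "schnell", "--prompt", detailed,
         "--steps", "4", "--seed", "42", "--output", str(out / "generated.png")],
        check=True,
    )

    # 3. Upscale (the Lanczos stand-in sketched above)
    upscale(str(out / "generated.png"), 2.0).save(out / "upscaled.png")

    # 4. Remove the background - every intermediate file is kept in out_dir
    remove(Image.open(out / "upscaled.png")).save(out / "final.png")

run_pipeline("a helicopter hovering over a hot dog stand")
```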
It’s amazing what we can do on a desktop computer these days.