
pksv

Original poster
Mar 12, 2024
There are a ton of tests out there of LLM performance on Macs, but I haven't found a single properly done test of image generation.

I would greatly appreciate it if anyone with an M4 Max and/or M3 Ultra could test it.
The easiest, simplest way would be to install Draw Things and check the generation speed of these models:
- SDXL Base (v1.0)
- Flux.1 [dev]
- Flux.1 [schnell]
- Wan 2.1 I2V 14B 720p

Thank you!
 
I wouldn't use Draw Things; it's frankly pretty ******. Stability Matrix will set up any of the major platforms automatically with MPS support.

As for performance compared to the CUDA backend for PyTorch: it's bad. If this is what you mainly intend to do, then putting the money toward a 3090/4090 is a better use. In my experience it's anywhere from 2x to 4x slower, depending on the LoRAs you're using. (I have an M4 Max Studio with 128GB of RAM and a system with an RTX 4090 FE, FWIW.)
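For anyone setting up a PyTorch-based UI by hand instead, a quick way to confirm the MPS backend is actually available and being used (a minimal sketch; the fallback order is my own convention, not something any particular UI mandates):

```python
import torch

def pick_device() -> torch.device:
    """Prefer CUDA, then Apple's MPS backend, then plain CPU."""
    if torch.cuda.is_available():
        return torch.device("cuda")
    # Older PyTorch builds don't have torch.backends.mps at all, so guard the lookup.
    if getattr(torch.backends, "mps", None) and torch.backends.mps.is_available():
        return torch.device("mps")
    return torch.device("cpu")

device = pick_device()
# Half precision is the usual choice for diffusion inference on both backends.
x = torch.randn(4, 4, device=device, dtype=torch.float16)
print(device, tuple(x.shape))
```

If this prints `cpu` on an Apple Silicon Mac, the install is falling back to the CPU and generation will be far slower than the numbers discussed here.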
 
Draw Things, it's frankly pretty ******.
On my x86 Mac it tends to crash.

And for any software, frankly, if anything depends upon Python then that thing is not optimized for speed.

Anyone who wants to get the fastest performance out of Apple Silicon needs to use Swift (or something like Julia, or even Fortran (!!)) and the Apple-supplied software.

At the current street prices for a 5090, a top-performing AMD base system w/ one 5090 costs more than an M3 Ultra.
 
As for performance compared to the CUDA backend for PyTorch: it's bad. If this is what you mainly intend to do, then putting the money toward a 3090/4090 is a better use. In my experience it's anywhere from 2x to 4x slower, depending on the LoRAs you're using. (I have an M4 Max Studio with 128GB of RAM and a system with an RTX 4090 FE, FWIW.)
You're right, of course; Nvidia is unmatched when it comes to diffusion models. However, this machine will also be used for other things, so it has to be a Mac, and I want to understand how far it is from, for example, a 4090.
If we can get at least half the speed of, say, a 3090, that would be enough for me.

I wouldn't use Draw Things; it's frankly pretty ******. Stability Matrix will set up any of the major platforms automatically with MPS support.
Thank you for the recommendation; I will try that.
 
On my x86 Mac it tends to crash.

And for any software, frankly, if anything depends upon Python then that thing is not optimized for speed.

Anyone who wants to get the fastest performance out of Apple Silicon needs to use Swift (or something like Julia, or even Fortran (!!)) and the Apple-supplied software.

At the current street prices for a 5090, a top-performing AMD base system w/ one 5090 costs more than an M3 Ultra.
It's true; ideally you'd use Apple's MLX. But in contrast to the already quite large number of LLMs converted to MLX, very few diffusion models have been converted to it. I think some solution from exolabs is an exception.

Speaking of Draw Things: I wanted it to be simple for someone who is willing to check the performance for me but doesn't have the time or desire for more complicated workflows/installations. DT is literally a few clicks, and anyone can test it.

At the current street prices for a 5090, a top-performing AMD base system w/ one 5090 costs more than an M3 Ultra.
Yes, I've checked that; it is absolutely wild. I don't see the point of buying a 5090 at this point in time, when prices will almost certainly drop by half within a few months to half a year.
Hence my thinking that I'd rather buy a whole computer like a Mac Studio M4 Max/M3 Ultra instead, despite the possibly worse performance, and buy a 5090 later. If I need the higher speed and performance for a day or two, there are always cloud solutions like RunPod, but my base will be covered.
 
Here is the exact same prompt and settings on my Studio M4 Max (40-core GPU, 128GB RAM) vs. my system with a 4090. This is IllustriousXL with five LoRAs and fairly large, so it's a pretty good representation:

[Screenshot: prompt and generation settings]


4090:
[Screenshot: 4090 result]

M4 Max:
[Screenshot: M4 Max result]
 
Here is the exact same prompt and settings on my Studio M4 Max (40-core GPU, 128GB RAM) vs. my system with a 4090. This is IllustriousXL with five LoRAs and fairly large, so it's a pretty good representation:

View attachment 2498690

4090:
View attachment 2498691
M4 Max:
View attachment 2498692
Thank you sooo much 🙌🙏🙏
I see batch size 8, so around 11 seconds per image on the 4090 and 42 seconds per image on the M4 Max.
Wow, this is indeed very bad. Even half speed would be acceptable for me, but not a quarter; the waiting time will add up very quickly.
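For reference, the per-image arithmetic spelled out (the batch totals here are back-calculated from the per-image times read off the screenshots, i.e. per-image time times batch size, so treat them as rough figures):

```python
# Back-calculated batch totals for a batch of 8 (~11 s/image and ~42 s/image).
batch_size = 8
total_s = {"RTX 4090": 88.0, "M4 Max": 336.0}

# Per-image time and the resulting slowdown factor.
per_image = {name: t / batch_size for name, t in total_s.items()}
slowdown = per_image["M4 Max"] / per_image["RTX 4090"]
print(per_image, f"~{slowdown:.1f}x slower")  # ~3.8x slower
```

So the M4 Max lands at roughly a quarter of the 4090's speed on this workload, not half.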

Edit: can you share your prompt and all the LoRAs? Then I might try it on my M1 Max.

I can't find it now, and I'm not sure if it was exolabs or something else, but I've seen generation with Flux converted to MLX, and the speeds were much, much better.
 
it is practically guaranteed that prices will probably drop by half in a few months/half a year.
Hence my thinking that I'd rather buy a whole computer like MS M4M/M3U instead, despite the maybe worse performance, and later buy a 5090.
Well, there's a thread over in the Politics forum about tariffs... and that's all I'll say about expecting a price drop (of any consequence, at least in the US).

The M3 Ultra is still faster in compute if what you're doing can be parallelized; test numbers I've seen on array multiplication have shown that.
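A rough sketch of the kind of array-multiplication test being referred to (NumPy here for portability; the matrix size and repeat count are arbitrary choices of mine, and on a Mac you'd run the same shape through an MPS or MLX equivalent to compare):

```python
import time
import numpy as np

def time_matmul(n: int = 512, repeats: int = 5) -> float:
    """Return the best-of-N wall time (seconds) for an n x n float32 matmul."""
    a = np.random.rand(n, n).astype(np.float32)
    b = np.random.rand(n, n).astype(np.float32)
    best = float("inf")
    for _ in range(repeats):
        t0 = time.perf_counter()
        c = a @ b
        best = min(best, time.perf_counter() - t0)
    assert c.shape == (n, n)  # sanity check on the result
    return best

print(f"best of 5: {time_matmul() * 1e3:.2f} ms")
```

Best-of-N is used rather than an average so that one-off warm-up or scheduling hiccups don't skew the number.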

Unfortunately, the diffusion software isn't optimized for Apple Silicon, and certainly not for an Ultra.
 
Speaking about Draw Things I wanted it to be simple for someone who is willing to check the performance for me but doesn't have the time or desire for more complicated workflows/installations.
Are you involved with Draw Things?
 
Here is the same exact prompt and settings on my Studio M4 Max (40 core GPU, 128GB ram) vs my system with a 4090. This is IllustriousXL with 5 LORAs and fairly large so it's a pretty good representation:

View attachment 2498690

4090:
View attachment 2498691
M4 Max:
View attachment 2498692
Except that you are comparing against an M4 Max limited to 128 GB of RAM. Apple's Unified Memory Architecture can likely take advantage of much more than 128 GB. If true, then comparing against an M3 Ultra with lots of RAM would be a more relevant comparison. (I do know that these devices are all pricey, so all the info we can get on different systems is greatly appreciated!)
 
For image generation, my 4090 Linux workstation smokes the M4 Max. But Nvidia falls short for anything that needs more than 24 GB of VRAM.
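A back-of-the-envelope sketch of why 24 GB becomes the ceiling. The parameter counts below are the commonly cited approximate figures for the models named earlier in the thread, and real usage adds activations, text encoders, and the VAE on top of the bare weights:

```python
def weight_gb(params_billions: float, bytes_per_param: int = 2) -> float:
    """Approximate weight memory in GB at a given precision (2 bytes = fp16/bf16)."""
    return params_billions * 1e9 * bytes_per_param / 1e9

# Approximate, commonly cited parameter counts (not exact figures).
models = {"SDXL Base (~3.5B)": 3.5, "Flux.1 dev (~12B)": 12.0, "Wan 2.1 14B": 14.0}
for name, billions in models.items():
    print(f"{name}: ~{weight_gb(billions):.0f} GB at fp16")
```

At fp16, the 12B and 14B models already brush up against a 24 GB card before any activations are allocated, which is exactly where a unified-memory Mac (or aggressive quantization/offloading on the Nvidia side) comes in.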
 
I wouldn't use Draw Things, it's frankly pretty ******. Stability Matrix will set up any of the major platforms automatically with MPS support:

As for performance compared to the CUDA backend for PyTorch: it's bad. If this is what you mainly intend to do, then putting the money toward a 3090/4090 is a better use. In my experience it's anywhere from 2x to 4x slower, depending on the LoRAs you're using. (I have an M4 Max Studio with 128GB of RAM and a system with an RTX 4090 FE, FWIW.)
May I ask which model your 4090 computer is? I ordered an M4 Max Studio, but I'm not sure if it's the best option, as I need a lot of power for my statistical simulations.
 
Except that you are comparing against an M4 Max limited to 128 GB RAM.
It literally has nothing to do with having more memory; the RTX 4090 only has 24GB of VRAM.
may i ask which model your 4090 computer is? I ordered M4 max studio but im not sure if its the best option as i need alot of power to do my statistical simulation
It's just a custom PC with a 4090, but you can still get one from Dell/Alienware or pretty much any other gaming prebuilt with a 5090 or a 4090.
 