
pksv

Original poster
Mar 12, 2024
There are a ton of tests out there of LLM performance on Macs, but I haven't found a single properly done test of image generation.

I would greatly appreciate it if anyone with an M4 Max and/or M3 Ultra could test it.
The easiest, simplest way would be to install Draw Things and check the generation speed of these models:
- SDXL Base (v1.0)
- Flux.1 [dev]
- Flux.1 [schnell]
- Wan 2.1 I2V 14B 720p

Thank you!
 
I wouldn't use Draw Things; it's frankly pretty ******. Stability Matrix will set up any of the major platforms automatically with MPS support.

As for performance compared to the CUDA backend for PyTorch: it's bad. If this is what you mainly intend to do, then putting the money toward a 3090/4090 is a better use. In my experience it's anywhere from 2x to 4x slower, depending on the LoRAs you're using. (I have an M4 Max Studio with 128GB of RAM and a system with an RTX 4090 FE, FWIW.)
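For anyone setting up a PyTorch-based UI by hand instead, a quick way to confirm the MPS backend is actually available and being used (a minimal sketch; the fallback order is my own convention, not something any particular UI mandates):

```python
import torch

def pick_device() -> torch.device:
    """Prefer CUDA, then Apple's MPS backend, then plain CPU."""
    if torch.cuda.is_available():
        return torch.device("cuda")
    # Older PyTorch builds don't have torch.backends.mps at all, so guard the lookup.
    if getattr(torch.backends, "mps", None) and torch.backends.mps.is_available():
        return torch.device("mps")
    return torch.device("cpu")

device = pick_device()
# Half precision is the usual choice for diffusion inference on both backends.
x = torch.randn(4, 4, device=device, dtype=torch.float16)
print(device, tuple(x.shape))
```

If this prints `cpu` on an Apple Silicon Mac, the install is falling back to the CPU and generation will be far slower than the numbers discussed here.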
 
Draw Things, it's frankly pretty ******.
On my x86 Mac it tends to crash.

And for any software, frankly, if anything depends upon Python then that thing is not optimized for speed.

Anyone who wants to get the fastest performance out of Apple Silicon needs to use Swift (or something like Julia, or even Fortran (!!)) and the Apple-supplied software.

At the current street prices for a 5090, a top-performing AMD base system w/ one 5090 costs more than an M3 Ultra.
 
As for performance compared to the CUDA backend for PyTorch: it's bad. If this is what you mainly intend to do, then putting the money toward a 3090/4090 is a better use. In my experience it's anywhere from 2x to 4x slower, depending on the LoRAs you're using. (I have an M4 Max Studio with 128GB of RAM and a system with an RTX 4090 FE, FWIW.)
You're right, of course; Nvidia is unmatched when it comes to diffusion models. However, this machine will also be used for other things, so it has to be a Mac, and I want to understand how far it is from, for example, a 4090.
If we can get at least half the speed of, say, a 3090, that would be enough for me.

I wouldn't use Draw Things; it's frankly pretty ******. Stability Matrix will set up any of the major platforms automatically with MPS support.
Thank you for the recommendation; I will try that.
 
On my x86 Mac it tends to crash.

And for any software, frankly, if anything depends upon Python then that thing is not optimized for speed.

Anyone who wants to get the fastest performance out of Apple Silicon needs to use Swift (or something like Julia, or even Fortran (!!)) and the Apple-supplied software.

At the current street prices for a 5090, a top-performing AMD base system w/ one 5090 costs more than an M3 Ultra.
It's true; ideally you'd use Apple's MLX. But in contrast to the already quite large number of LLMs converted to MLX, very few diffusion models have been converted to it. I think some solution from exolabs is an exception.

Speaking of Draw Things: I wanted it to be simple for someone who is willing to check the performance for me but doesn't have the time or desire for more complicated workflows/installations. DT is literally a few clicks, and anyone can test it.

At the current street prices for a 5090, a top-performing AMD base system w/ one 5090 costs more than an M3 Ultra.
Yes, I've checked that; it is absolutely wild. I don't see the point of buying a 5090 at this point in time, when prices will almost certainly drop by half within a few months to half a year.
Hence my thinking that I'd rather buy a whole computer like a Mac Studio M4 Max/M3 Ultra instead, despite the possibly worse performance, and buy a 5090 later. If I need the higher speed and performance for a day or two, there are always cloud solutions like RunPod, but my base will be covered.
 
Here is the exact same prompt and settings on my Studio M4 Max (40-core GPU, 128GB RAM) vs. my system with a 4090. This is IllustriousXL with five LoRAs and fairly large, so it's a pretty good representation:

[Screenshot: prompt and generation settings]


4090:
[Screenshot: 4090 result]

M4 Max:
[Screenshot: M4 Max result]
 
Here is the exact same prompt and settings on my Studio M4 Max (40-core GPU, 128GB RAM) vs. my system with a 4090. This is IllustriousXL with five LoRAs and fairly large, so it's a pretty good representation:

View attachment 2498690

4090:
View attachment 2498691
M4 Max:
View attachment 2498692
Thank you sooo much 🙌🙏🙏
I see batch size 8, so around 11 seconds per image on the 4090 and 42 seconds per image on the M4 Max.
Wow, this is indeed very bad. Even half speed would be acceptable for me, but not a quarter; the waiting time will add up very quickly.
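For reference, the per-image arithmetic spelled out (the batch totals here are back-calculated from the per-image times read off the screenshots, i.e. per-image time times batch size, so treat them as rough figures):

```python
# Back-calculated batch totals for a batch of 8 (~11 s/image and ~42 s/image).
batch_size = 8
total_s = {"RTX 4090": 88.0, "M4 Max": 336.0}

# Per-image time and the resulting slowdown factor.
per_image = {name: t / batch_size for name, t in total_s.items()}
slowdown = per_image["M4 Max"] / per_image["RTX 4090"]
print(per_image, f"~{slowdown:.1f}x slower")  # ~3.8x slower
```

So the M4 Max lands at roughly a quarter of the 4090's speed on this workload, not half.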

Edit: can you share your prompt and all the LoRAs? Then I might try it on my M1 Max.

I can't find it now, and I'm not sure if it was exolabs or something else, but I've seen generation with Flux converted to MLX, and the speeds were much, much better.
 
it is practically guaranteed that prices will probably drop by half in a few months/half a year.
Hence my thinking that I'd rather buy a whole computer like MS M4M/M3U instead, despite the maybe worse performance, and later buy a 5090.
Well, there's a thread over in the Politics forum about tariffs... and that's all I'll say about expecting a price drop (of any consequence, at least in the US).

The M3 Ultra is still faster in compute if what you're doing can be parallelized; test numbers I've seen on array multiplication have shown that.
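A rough sketch of the kind of array-multiplication test being referred to (NumPy here for portability; the matrix size and repeat count are arbitrary choices of mine, and on a Mac you'd run the same shape through an MPS or MLX equivalent to compare):

```python
import time
import numpy as np

def time_matmul(n: int = 512, repeats: int = 5) -> float:
    """Return the best-of-N wall time (seconds) for an n x n float32 matmul."""
    a = np.random.rand(n, n).astype(np.float32)
    b = np.random.rand(n, n).astype(np.float32)
    best = float("inf")
    for _ in range(repeats):
        t0 = time.perf_counter()
        c = a @ b
        best = min(best, time.perf_counter() - t0)
    assert c.shape == (n, n)  # sanity check on the result
    return best

print(f"best of 5: {time_matmul() * 1e3:.2f} ms")
```

Best-of-N is used rather than an average so that one-off warm-up or scheduling hiccups don't skew the number.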

Unfortunately, the diffusion software isn't optimized for Apple Silicon, and certainly not for an Ultra.
 
Speaking about Draw Things I wanted it to be simple for someone who is willing to check the performance for me but doesn't have the time or desire for more complicated workflows/installations.
Are you involved with Draw Things?
 
Here is the same exact prompt and settings on my Studio M4 Max (40 core GPU, 128GB ram) vs my system with a 4090. This is IllustriousXL with 5 LORAs and fairly large so it's a pretty good representation:

View attachment 2498690

4090:
View attachment 2498691
M4 Max:
View attachment 2498692
Except that you are comparing against an M4 Max limited to 128 GB of RAM. Apple's Unified Memory Architecture can likely take advantage of much more than 128 GB. If true, then comparing against an M3 Ultra with lots of RAM would be a more relevant comparison. (I do know that these devices are all pricey, so all the info we can get on different systems is greatly appreciated!)
 
For image generation, my 4090 Linux workstation smokes the M4 Max. But Nvidia falls short for anything that needs more than 24 GB of VRAM.
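A back-of-the-envelope sketch of why 24 GB becomes the ceiling. The parameter counts below are the commonly cited approximate figures for the models named earlier in the thread, and real usage adds activations, text encoders, and the VAE on top of the bare weights:

```python
def weight_gb(params_billions: float, bytes_per_param: int = 2) -> float:
    """Approximate weight memory in GB at a given precision (2 bytes = fp16/bf16)."""
    return params_billions * 1e9 * bytes_per_param / 1e9

# Approximate, commonly cited parameter counts (not exact figures).
models = {"SDXL Base (~3.5B)": 3.5, "Flux.1 dev (~12B)": 12.0, "Wan 2.1 14B": 14.0}
for name, billions in models.items():
    print(f"{name}: ~{weight_gb(billions):.0f} GB at fp16")
```

At fp16, the 12B and 14B models already brush up against a 24 GB card before any activations are allocated, which is exactly where a unified-memory Mac (or aggressive quantization/offloading on the Nvidia side) comes in.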
 
I wouldn't use Draw Things, it's frankly pretty ******. Stability Matrix will set up any of the major platforms automatically with MPS support:

As for performance compared to the CUDA backend for PyTorch: it's bad. If this is what you mainly intend to do, then putting the money toward a 3090/4090 is a better use. In my experience it's anywhere from 2x to 4x slower, depending on the LoRAs you're using. (I have an M4 Max Studio with 128GB of RAM and a system with an RTX 4090 FE, FWIW.)
May I ask which model your 4090 computer is? I ordered an M4 Max Studio, but I'm not sure if it's the best option, as I need a lot of power for my statistical simulations.
 
Except that you are comparing against an M4 Max limited to 128 GB RAM.
It literally has nothing to do with having more memory; the RTX 4090 only has 24GB of VRAM.
may i ask which model your 4090 computer is? I ordered M4 max studio but im not sure if its the best option as i need alot of power to do my statistical simulation
It's just a custom PC with a 4090, but you can still get one from Dell/Alienware or pretty much any other gaming prebuilt with a 5090 or a 4090.
 