
TzunamiOSX

macrumors 65816
Original poster
Oct 4, 2009
1,067
445
Germany
I am playing a bit with Automatic1111 Stable Diffusion. At the moment, A1111 is running on an M1 Mac mini under Big Sur.

The performance is not very good.
(Screenshot attachment: Bildschirmfoto 2023-08-31 um 04.20.27.png)

A picture with these settings needs around 5 min 47 sec (10.86 s/it). My Mac Pro running Windows with an old Titan X gives me a picture every 40 seconds.

It looks like the GPU and Neural Engine are not being used. Would I get better speed with a newer system?

Launch settings are: ./webui.sh --skip-torch-cuda-test --precision full --no-half --medvram --opt-sub-quad-attention
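Worth noting: --precision full --no-half forces everything into fp32, which is typically slow on the MPS backend. A hedged sketch of an alternative launch, based on the flags A1111's own webui-macos-env.sh ships with; treat the exact combination as an experiment, not a recommendation:

```shell
# Assumption: a recent A1111 checkout on Apple Silicon.
# --upcast-sampling lets most of the model run in half precision;
# the env var falls back to CPU for ops the MPS backend lacks.
export PYTORCH_ENABLE_MPS_FALLBACK=1
./webui.sh --skip-torch-cuda-test --upcast-sampling --no-half-vae \
           --medvram --opt-sub-quad-attention
```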
 
Some quick feedback...

Same Automatic1111 Stable Diffusion install with the same settings.
I have updated the system to Ventura and now get better results:

Big Sur, Standard A1111: 5 min. 47 sec. (10.86s/it)
Ventura, Standard A1111: 1 min. 40 sec. (3.14s/it)

Ventura, A1111 from Slartibart's link: 1 min. 19 sec. (2.64s/it)
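As a sanity check, the wall-clock times above line up with the reported s/it rate multiplied by the step count (32 sampling steps, as listed later in the thread). A minimal sketch, using only the numbers quoted above:

```python
def total_time(s_per_it: float, steps: int = 32) -> float:
    """Rough generation time in seconds: seconds-per-iteration x sampling steps."""
    return s_per_it * steps

# Big Sur: 10.86 s/it -> 347.52 s, i.e. about 5 min 47 s
big_sur = total_time(10.86)
# Ventura: 3.14 s/it -> 100.48 s, i.e. about 1 min 40 s
ventura = total_time(3.14)
# Speedup from the OS update alone:
print(round(big_sur / ventura, 1))  # -> 3.5
```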

Looks like the Neural Engine is not in use, only the GPU.
 
That's pretty dang slow. Have you tried Draw Things from the App Store? Although I suppose Draw Things also uses a similar back end.
 
That's pretty dang slow. Have you tried Draw Things from the App Store? Although I suppose Draw Things also uses a similar back end.
What is your system, and how long does it take?

EDIT: Draw Things takes 1:42 under Monterey
 
M1 mini (base, not Pro, not Max), 16 GB RAM, Ventura. 512x768 at 32 steps takes 1:16.
This is the MPS optimization from Ventura. Monterey is a little slower. My personal problem is that my Mac Pro from 2008 with an old Titan X (Maxwell) is 2 to 3 times faster, and my 2010 Mac Pro with a Vega Frontier is at nearly the same speed as the M1 under Windows. I'm sure the 5,1 is faster under macOS (based on tests with my Mac Pro 2013), but I don't have OpenCore on it.

Exact values coming when I'm back in my room.
 
This is the MPS optimization from Ventura. My personal problem is that my Mac Pro from 2008 with an old Titan X (Maxwell) is 2 to 3 times faster, and my 2010 Mac Pro with a Vega Frontier is at nearly the same speed under Windows. I'm sure the 5,1 is faster under macOS (based on tests with my Mac Pro 2013), but I don't have OpenCore on it.
Yeah, SD on Apple Silicon is way behind PCs with discrete GPUs, even older Macs with older GPUs, let alone anything like a 4070.
 
32 Sampling steps, 512x768

Mac Pro 2008, Windows 10, Titan X (Maxwell) 12 GB
(Screenshot attachment: Bildschirmfoto 2023-09-05 um 04.16.25.png)


Mac Pro 2010, macOS Monterey, Vega Frontier 16 GB
(Screenshot attachment: 2010.png)


Mac Mini M1, macOS Monterey
(Screenshot attachment: Bildschirmfoto 2023-09-05 um 05.16.06.png)


Mac Pro 2013, macOS Monterey, 1x D500, 3 GB
(Screenshot attachment: Ohne Titel.png)


Mac Pro 2013, Windows 11, 1x D500, 3 GB
Over 20 minutes; I'll run a proper test when I have the time.
 
That doesn't look too bad. The Titan X is a dedicated GPU with a 250 W TDP running CUDA code that has been optimized over the arc of a decade. The M1 GPU is what, 15 watts?
 
That doesn't look too bad. The Titan X is a dedicated GPU with a 250 W TDP running CUDA code that has been optimized over the arc of a decade. The M1 GPU is what, 15 watts?

This is a GPU from 2015 on a 28 nm process, so the comparison doesn't look so good.
 
I have added my Mac Pro 2010, which is also faster than my M1. On my 6,1 I can't finish the picture under Windows because I get a low-VRAM error.
 
Is there anyone here with an RX 5700 or newer on an Intel Mac who can run Automatic1111? I hope to see results from newer GPUs.
 
Hey folks! I am a passionate DrawThings and Auto1111 WebUI user, alongside Blender 4.2 and DaVinci Resolve. I would like to keep this topic alive and collect more comparisons, especially between the M1-M4 Max chips. I have the 16" with the 24-core M1 Max / 32 GB and am considering an upgrade to the 30-core/36 GB M3 Max (cheaper) or the 32-core/36 GB M4 Max (much better NPU, and the faster H.264/H.265 encoders could benefit me in Resolve).

I have two setups: DrawThings with an SD1.5 model and Auto1111 WebUI with an SDXL-based one, because I couldn't get that SDXL one running in DrawThings, eh. Also, I am aware that Flux is the newest ****, but I prefer to stay with the tools I know and love, especially for anime and furry stuff. Here are some results:

Auto1111 WebUI, SDXL (ChromaMixXL), Euler A (Beta), 25 Steps:

≈ 2:00min per image for 1024x1536 (regular power)
≈ 2:40min per image for 1024x1536 (low power mode)


DrawThings, SD1.5 (IndigoFurryMix), DPM++ 2M Karras, 20 Steps:

≈ 0:33min per image for 768x1152 (regular power)
≈ 0:40min per image for 768x1152 (low power)

≈ 1:00min per image for 1024x1536 (regular power)
≈ 1:20min per image for 1024x1536 (low power)


Package power draw for both apps and models is 30-35W in regular and 18-20W in low power mode. When doing multiple renders / batches, the fans start to spin in regular mode and stay around 40-50%, while remaining almost idle in low power mode. That makes "rendering overnight" pretty convenient.

I would be interested in what the newer Max chips, especially the binned ones, use in package power during Stable Diffusion rendering on the GPU. You can check it with TG Pro or the terminal command sudo powermetrics, for anyone who would like to join this thread!
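For anyone joining in, here is the powermetrics invocation sketched out (flag names from the macOS man page; it needs sudo, and the available samplers differ between Intel and Apple Silicon Macs):

```shell
# Sample GPU power once per second, five samples (macOS only, requires sudo)
sudo powermetrics --samplers gpu_power -i 1000 -n 5
```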
How do you use low power mode with A1111?


Take a look at webui-forge.
The interface is nearly the same as A1111, but it is more optimized, has better GPU settings, and also supports Flux.

Here is the interface:
(Screenshot attachment: Bildschirmfoto 2025-01-14 um 18.27.52.png)
 