Apple Silicon in AI (2023)

TechnoMonk · Mar 26, 2023

GrumpyCoder said:
Ah ok. Hey, maybe you should get a PhD in the field and a professorship at a leading university teaching this stuff. But of course it's always the others that have no idea.

I didn't say you're using FCP, I said the M series is optimised for workflow similar to FCP. You're using video workflow and maybe, just maybe you should check what exactly happens in these models and the model output. How many of these models have you created yourself? How many have you published at peer-reviewed conferences? None. You can't even get your 4090 going. 'Nuff said.

There is no A8000, only a A6000. The 8000 is a RTX8000, no A there. Good thing you know your stuff and don't have to rely on people who don't know what they're talking about. Oh wait...

For what is worth, I do have a PhD. Sure a typo of RTx 8000 typing on phone. Lol. I have checked my models, I have used my own custom models trained on A100.
It’s not hard to understand there are issues with Nvidia just like apple or any other vendor. Unlike most folks all I care is fixing my work flow. I use Apple silicon for certain tasks, Nvidia for others.

TechnoMonk · Mar 26, 2023

sunny5 said:
lol, who even use ColeML to run WebUI? Using GPU is the most fastest way to generate AI images and I have no idea what you are talking about?

You just posted screen shots of CPU cores being used by Automatic1111. Let’s see your GPU.

sunny5 · Mar 26, 2023

TechnoMonk said:
You just posted screen shots of CPU cores being used by Automatic1111. Let’s see your GPU.

Seriously, do you even care to check GPU history at all with blue bars? You did not. Also, CPU ALWYAS use with or without WebUI so you just prove yourself ignorant. 2 efficiency cores are really meaningless and do you really think2 cores is enough to generate images?

Since you don't know what you are talking about from the beginning, I'll just ignore you as you are wasting my precious time on AI.

TechnoMonk · Mar 26, 2023

sunny5 said:
Seriously, do you even care to check GPU history at all with blue bars? You did not. Also, CPU ALWYAS use with or without WebUI so you just prove yourself ignorant. Since you don't know what you are talking about from the beginning, I'll just ignore you as you are wasting my precious time on AI.

Ok. I was on phone, and it wasn't clear in the pic till I flipped it to landscape mode. how long did it take you to generate the image? Looks like automatic1111 is poorly optimized, if at all for apple silicon. How much memory do you have? WHy do I use Coreml, coz I have my own inference, and use dynamic batching.

Poor Performance:

Currently GPU acceleration on macOS uses a lot of memory. If performance is poor (if it takes more than a minute to generate a 512x512 image with 20 steps with any sampler) first try starting with the --opt-split-attention-v1 command line option (i.e. ./webui.sh --opt-split-attention-v1) and see if that helps. If that doesn't make much difference, then open the Activity Monitor application located in /Applications/Utilities and check the memory pressure graph under the Memory tab. If memory pressure is being displayed in red when an image is generated, close the web UI process and then add the --medvram command line option (i.e. ./webui.sh --opt-split-attention-v1 --medvram). If performance is still poor and memory pressure still red with that option, then instead try --lowvram (i.e. ./webui.sh --opt-split-attention-v1 --lowvram). If it still takes more than a few minutes to generate a 512x512 image with 20 steps with with any sampler, then you may need to turn off GPU acceleration. Open webui-user.sh in Xcode and change #export COMMANDLINE_ARGS="" to export COMMANDLINE_ARGS="--skip-torch-cuda-test --no-half --use-cpu all".

This fix apparently reduces but still doesn't look like Automatic1111 has any AS optimizations in the code.

How to improve performance on M1 / M2 Macs · AUTOMATIC1111 stable-diffusion-webui · Discussion #7453

There have been several additions and changes made recently that can improve performance on macOS: Half precision support (using web UI without --no-half) with --upcast-sampling. This significantly...

github.com

Xiao_Xi · Apr 19, 2023

Would it make sense for Apple to manufacture its own AI chip?

Microsoft Building Its Own AI Chip on TSMC's 5nm Process

Code-named Athena, Microsoft has reportedly been working on the chip since 2019.

www.tomshardware.com

quarkysg · Apr 19, 2023

Xiao_Xi said:
Would it make sense for Apple to manufacture its own AI chip?

Microsoft Building Its Own AI Chip on TSMC's 5nm Process

Code-named Athena, Microsoft has reportedly been working on the chip since 2019.

www.tomshardware.com

Isn't the A-Series SoC already geared towards AI? They named their SoC Bionic for a reason.

Xiao_Xi · Apr 19, 2023

Would it make sense for Apple to buy Cerebras?

GPT Model Training Competition Heats Up - Nvidia Has A Legitimate Challenger

Cerebras Is Now Cost Competitive For Training GPT-like Large Language Models

www.semianalysis.com

leman · Apr 19, 2023

quarkysg said:
Isn't the A-Series SoC already geared towards AI? They named their SoC Bionic for a reason.

Apple’s AI accelerators are geared towards energy-efficient, low-power ML inference to support app needs. These are relatively small devices, running small models. The chips @Xiao_Xi is talking about are dedicated cloud computing ML, for demanding applications. If Apple builds something like that, it would be for internal consumption (like Siri). Does it make sense? No idea.

quarkysg · Apr 19, 2023

leman said:
Apple’s AI accelerators are geared towards energy-efficient, low-power ML inference to support app needs. These are relatively small devices, running small models. The chips @Xiao_Xi is talking about are dedicated cloud computing ML, for demanding applications. If Apple builds something like that, it would be for internal consumption (like Siri). Does it make sense? No idea.

IMHO edge computing is where it's at. Apple is likely skating to where the puck is going.

leman · Apr 19, 2023

quarkysg said:
IMHO edge computing is where it's at. Apple is likely skating to where the puck is going.

Sure, but does Apple want to become an edge computing provider? It might be cheaper (and simpler) for them to just buy it from somewhere else... after all, building good ML hardware for smartphones or even desktops is not the same as building good ML hardware for cloud computing.

quarkysg · Apr 19, 2023

leman said:
Sure, but does Apple want to become an edge computing provider? It might be cheaper (and simpler) for them to just buy it from somewhere else... after all, building good ML hardware for smartphones or even desktops is not the same as building good ML hardware for cloud computing.

Well, 10 years ago, Apple doesn't have anything that can power macOS. Maybe 10 years from now, whatever the iPhone will morph into will be good enough to for ML.

senttoschool · Apr 19, 2023

quarkysg said:
IMHO edge computing is where it's at. Apple is likely skating to where the puck is going.

Depends. It's possible that the best models have to run in the cloud because of how big current and future LLMs can be.

Also, AIs aren't very latency sensitive.

And quite honestly, who knows what the future computing device actually is? It could be just a big screen that connects to a giant AI in the cloud and nothing else.

Xiao_Xi · Apr 19, 2023

quarkysg said:
IMHO edge computing is where it's at. Apple is likely skating to where the puck is going.

Even if Apple didn't do cloud inference, it needs to train models in the cloud. Training models is very expensive and Apple could save a lot of money by using its own chips.

ocimpean · Apr 27, 2023

Zest28 said:
Interesting. I was going to build a new PC with a RTX 4090 for AI, but based on what I am reading it sounds like a bad idea?

Is it not a driver issue that can simply be fixed with a software / firmware update?

I’m in a similar position. My current laptop has a 1060 with 6GB vram and an old icore7 and 16gb ram.
Running locally Automatic 1111 with Stable Diffusion 1.5, and other models that can generate text to image batches of maximum 512x512 pixels. More than that I’m running out of memory. I was ready to build a desktop with an Rtx 4090, when I decided to run Stable Diffusion on an iPad Pro with an M1 chip. Imagine my surprise when I was able to generate 1024x1024 images on the iPad. That got me thinking and I postponed getting the 4090.
Second thing that bothers me: I’m running locally Alpaca 7b 4bit, with NPX, and a variant via a web interface, that can be adjusted to run on GPU, or CPU, first case using Vram, second using regular Ram. The models are loaded ok, but after about 20 lines of dialogue I’m getting the dreaded out of memory message on the web ui, regardless of CPU or GPU choice. The NPX behaves better, but memory issues arrive sooner or later.
Llama 13b cannot be used, not to talk about 65b.

After the surprise I got with the iPad running Stable Diffusion, I started to think that the shared memory could bring more to the table than the brutal power that Nvidia offers and maybe allows the Ai to use a 64-128 ram as resources for GPU on Mac, and as such, permitting the local installation of large models like Llama 65b.

I see very knowledgeable people on this forum, so I would like to ask if any of you managed to load one of the large 65b models on your machine, be it a PC or a Mac, what specifications your computer has, also how was the performance?

Thank you.

arinamichel911 · May 4, 2023

As of 2023, Apple Silicon has made significant strides in terms of its support for AI frameworks such as PyTorch and TensorFlow. Apple has invested heavily in optimizing these frameworks for their hardware, and both frameworks are now fully supported on Apple Silicon. Many developers have reported significant performance improvements when running AI workloads on Apple Silicon-based Macs, especially for tasks involving image and video processing.

senttoschool · May 4, 2023

arinamichel911 said:
As of 2023, Apple Silicon has made significant strides in terms of its support for AI frameworks such as PyTorch and TensorFlow. Apple has invested heavily in optimizing these frameworks for their hardware, and both frameworks are now fully supported on Apple Silicon. Many developers have reported significant performance improvements when running AI workloads on Apple Silicon-based Macs, especially for tasks involving image and video processing.

This reads like a response from an LLM. ChatGPT?

Anyways, the age of LLM internet spam is here.

Let's cherish the remaining days we have of talking to real people on the internet.

Xiao_Xi · May 4, 2023

arinamichel911 said:
both frameworks are now fully supported on Apple Silicon

What does "fully supported" mean? Are all ops now GPU-based?

Xiao_Xi · May 4, 2023

Alex Ziskind made a very interesting video with several ML benchmarks on different Macs a while ago.

arinamichel911 · May 8, 2023

Xiao_Xi said:
What does "fully supported" mean? Are all ops now GPU-based?

yeah, you can say that TensorFlow has these things including GPU, CPU and neural engine

arinamichel911 · May 8, 2023

senttoschool said:
This reads like a response from an LLM. ChatGPT?

Anyways, the age of LLM internet spam is here.

Let's cherish the remaining days we have of talking to real people on the internet.

If it looks you like a chatbot or whatever another ai, over the internet lot of detectors, are avail

Xiao_Xi · May 8, 2023

arinamichel911 said:
TensorFlow has these things including GPU, CPU and neural engine

Do you have a link that explains this? Last time I checked there were some operations only available on CPU and only CoreML could access the neural engine.

arinamichel911 · May 9, 2023

check these: https://www.nvidia.com/en-us/glossary/data-science/tensorflow/

Hope this works for you

japansish · May 10, 2023

Does the unified memory offer some advantages since most consumer gpu's top out at 24gb?

TechnoMonk · May 10, 2023

japansish said:
Does the unified memory offer some advantages since most consumer gpu's top out at 24gb?

Absolutely. My 64 GB M1 Max uses 40-48 GB running some of the inferences 4090 runs out of memory. M1 Max would be slower with lack of RT core and lower T flops.

Xiao_Xi · Jun 5, 2023

On Thursday:

Optimize machine learning for Metal apps
Discover the latest enhancements to accelerated ML training in Metal. Find out about updates to PyTorch and TensorFlow, and learn about Metal acceleration for JAX. We'll show you how MPS Graph can support faster ML inference when you use both the GPU and Apple Neural Engine, and share how the same API can rapidly integrate your Core ML and ONNX models. For more information on using Metal for machine learning, check out “Accelerate machine learning with Metal” from WWDC22.

WWDC24

Join the worldwide developer community online for a week of technology and creativity.

developer.apple.com

It seems that Apple has created a Metal backend for JAX.

Accelerated JAX on Mac - Metal - Apple Developer

JAX uses the Metal plug-in to provide Metal acceleration on Mac platforms.

developer.apple.com

Apple Silicon in AI (2023)

macrumors 68040

macrumors 68040

Suspended

macrumors 68040

Poor Performance:​

macrumors 68000

macrumors 65816

macrumors 68000

macrumors Core

macrumors 65816

macrumors Core

macrumors 65816

macrumors 68030

macrumors 68000

macrumors newbie

macrumors member

macrumors 68030

macrumors 68000

macrumors 68000

macrumors member

macrumors member

macrumors 68000

macrumors member

macrumors newbie

macrumors 68040

macrumors 68000

Optimize machine learning for Metal apps​

Our Staff

Poor Performance:

Optimize machine learning for Metal apps