
TechnoMonk

macrumors 68030
Oct 15, 2022
2,604
4,112
Interesting. I was going to build a new PC with an RTX 4090 for AI, but based on what I am reading it sounds like a bad idea?

Is it not a driver issue that can simply be fixed with a software / firmware update?
It's a pretty good GPU for gaming and for something like Stable Diffusion. I built a maxed-out workstation with an AMD Threadripper and a 3090, now upgraded to a 4090.
The reality is somewhere in the middle; this place likes to pretend Apple has all the problems and others don't, but these issues are to be expected with new technology.
 

TechnoMonk

macrumors 68030
Oct 15, 2022
2,604
4,112
And then what? 512GB is great (we already have 640GB with Nvidia in a single box though and super fast connections for multiple boxes). But then we still have (likely) slow performance. Keep in mind the 1080Ti is running circles around the M1 Ultra... so that potential Mac Pro would have to be much, much, MUCH faster than the Ultra. They need something to compete with (multiple) A100 level cards to be interesting for the cloud.

There have been a bunch of CUDA to <insert-favorite-hype-alternative-here-that-dies-after-2-months> projects, none of which worked as expected and all were abandoned after a while. Like it or not, 99% of the market is Nvidia and it won't change anytime soon. A super fast desktop Mac for fiddling around locally would be nice though.
I don't see Apple competing in the segment of training or creating AI models. Nvidia has that market in the bag. Inference and the tools that use these AI models are a different conversation. Apple, with unified memory, can take a 4090 to the cleaners in some workflows.
 
  • Like
Reactions: jerryk

TechnoMonk

macrumors 68030
Oct 15, 2022
2,604
4,112
Apple could create a tool to convert CUDA code to Metal code as Intel and AMD have done, but I don't think they can legally convert compiled CUDA code at runtime. I have read that neither AMD nor Intel have tried due to the Oracle/Google lawsuit.
Converting CUDA at runtime would be inefficient and slow. I'd rather optimize my models for Apple Silicon and then run inference, not to mention the data-compatibility issues between models at runtime.
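Roughly what I mean, as a minimal sketch (assuming coremltools and a traceable PyTorch model; the ResNet-50 and input shape are just placeholders, not anything from my actual pipeline):

```python
import torch
import torchvision
import coremltools as ct

# Placeholder network; substitute your own trained PyTorch model.
model = torchvision.models.resnet50(weights=None).eval()
example = torch.rand(1, 3, 224, 224)

# Trace once, then convert ahead of time to Core ML so inference can run
# on the GPU / Neural Engine instead of translating CUDA at runtime.
traced = torch.jit.trace(model, example)
mlmodel = ct.convert(
    traced,
    inputs=[ct.TensorType(name="image", shape=example.shape)],
    convert_to="mlprogram",  # ML Program backend for newer macOS releases
)
mlmodel.save("resnet50.mlpackage")
```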
 
  • Like
Reactions: Xiao_Xi

Xiao_Xi

macrumors 68000
Oct 27, 2021
1,627
1,101
What's stopping one from implementing a CUDA-compatible toolkit? Are there provisions that prohibit copying an API? That would be extremely odd.
So it seems. Google got off the hook for copying the Java API because the courts ruled its use was fair use. I'm not sure Apple would be so lucky if it tried to copy the CUDA API.

I think/hope Apple will put the "Extreme" version of their chips in the cloud. Imagine an Extreme chip packaged together with 512GB of VRAM available for training or inference in the cloud.
Not even Nvidia offers that. In fact, Nvidia just unveiled its most powerful GPU, a dual-GPU product that offers 188 GB of HBM3 memory (94 GB per GPU). Would HBM3 be suitable for an Mx Extreme, or would Apple need more advanced technology?
 

GrumpyCoder

macrumors 68020
Nov 15, 2016
2,126
2,706
Inference and the tools that use these AI models are a different conversation. Apple, with unified memory, can take a 4090 to the cleaners in some workflows.
But what workflow? Video work with Final Cut? Sure, that's what the Mx is optimized for, among a few other things.

I'm not so sure about tools. I mean, what tools would I need once I've arrived at inference and left development of the model behind? It's just running at that point. Integration? Sure. As for inference, I somehow can't see Apple providing cloud solutions for pure inference tasks, or who would even use that. On some computer on-site, maybe... but if I need a large A100 cluster for inference, something else is wrong.

But even if that's the case, the M1 Ultra with 64GB is still beaten by a V100 (not an A100!) with 16GB when it comes to inference (https://github.com/lucadiliello/pytorch-apple-silicon-benchmarks). That old V100 is itself already beaten by the 3080, except when memory is an issue or multiple GPUs are used (https://bizon-tech.com/gpu-benchmarks/NVIDIA-Tesla-V100-vs-NVIDIA-RTX-3080/535vs578). And even the 1080Ti is faster than the Ultra for inference (https://sebastianraschka.com/blog/2022/pytorch-m1-gpu.html). For every single task? Maybe not, but for general models I wouldn't trade a 4090 for anything Apple has right now. Time will tell if that might change one day.
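Anyone can reproduce the gist of those numbers themselves; a quick sketch assuming a recent PyTorch build with MPS support (the matrix sizes are arbitrary, swap in whatever matches your workload, and run the same script with the device set to "cuda" on the Nvidia box):

```python
import time
import torch

device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")

# Arbitrary transformer-ish workload: batched half-precision matmuls.
a = torch.randn(64, 1024, 1024, dtype=torch.float16, device=device)
b = torch.randn(64, 1024, 1024, dtype=torch.float16, device=device)

for _ in range(3):                 # warm-up
    torch.bmm(a, b)
if device.type == "mps":
    torch.mps.synchronize()        # MPS dispatch is async (PyTorch >= 2.0)

start = time.perf_counter()
for _ in range(50):
    torch.bmm(a, b)
if device.type == "mps":
    torch.mps.synchronize()
print(f"{device}: {time.perf_counter() - start:.3f}s for 50 batched matmuls")
```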

If we look at production applications (robotics, digital twins, etc.), wouldn't an Nvidia Jetson, with its dedicated hardware/pipelines that aren't even available in GPGPUs, be a better option than some general-purpose computer?

In the end, I don't think that's what Apple wants anyway. I still hope we'll see a more powerful Mac Pro in the future.

Not even Nvidia offers that. In fact, Nvidia just unveiled its most powerful GPU, a dual-GPU product that offers 188 GB of HBM3 memory (94 GB per GPU). Would HBM3 be suitable for an Mx Extreme, or would Apple need more advanced technology?
I think that's just their response to the ChatGPT hype. We already have cards with 80 GB each and can put up to eight of them into a workstation or server, with SXM being fast enough to utilize the 640 GB very well. It's "good enough". And bluntly speaking, when someone scrapes the whole internet and throws attention at it to create a model that understands input and produces the best-fitting average answer, which is often not fully correct or simply wrong in other cases (because the internet as a source only contains truth and facts...), then yes, more memory is always better. In other words, they upped the memory a little from 80 to 94 GB and doubled the specs to create a tighter "package".

One thing is for certain: the upcoming Mac Pro will give us a good idea of where Apple is heading, even if it falls short of what Nvidia is offering.
 

senttoschool

macrumors 68030
Original poster
Nov 2, 2017
2,626
5,482
Not even Nvidia offers that. In fact, Nvidia just unveiled its most powerful GPU, a dual-GPU product that offers 188 GB of HBM3 memory (94 GB per GPU). Would HBM3 be suitable for an Mx Extreme, or would Apple need more advanced technology?
I mean, the big advantage of Apple Silicon is unified memory. So I hope Apple sees what they have and puts out something that others simply can't compete with in the professional space.

Anyway, Apple refusing to let its Apple Silicon leads create server versions of their chips might be one of the bigger mistakes it has made. Those leads eventually left to form Nuvia, which Qualcomm bought and will bring to market for PCs, smartphones, and servers.
 

senttoschool

macrumors 68030
Original poster
Nov 2, 2017
2,626
5,482

xcode_performance_report_comparison.png
 

Pressure

macrumors 603
May 30, 2006
5,179
1,544
Denmark
Looks like it will be much easier running something like ChatGPT locally on a Mac soon.

Imagine Siri but it actually works 😂
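Something in that direction already works today. A rough sketch with Hugging Face transformers on the MPS backend; the tiny GPT-2 checkpoint is only a stand-in for whichever open chat model you'd actually run:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "mps" if torch.backends.mps.is_available() else "cpu"

# "gpt2" is just a placeholder; swap in any open model that fits in unified memory.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").to(device)

prompt = "The easiest way to run a chatbot locally on a Mac is"
inputs = tokenizer(prompt, return_tensors="pt").to(device)
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=40, do_sample=True, top_p=0.9)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```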
 
  • Like
Reactions: ocimpean

TechnoMonk

macrumors 68030
Oct 15, 2022
2,604
4,112
But what workflow? Video work with Final Cut? Sure, that's what the Mx is optimized for, among a few other things.

I'm not so sure about tools. I mean, what tools would I need once I've arrived at inference and left development of the model behind? It's just running at that point. Integration? Sure. As for inference, I somehow can't see Apple providing cloud solutions for pure inference tasks, or who would even use that. On some computer on-site, maybe... but if I need a large A100 cluster for inference, something else is wrong.

But even if that's the case, the M1 Ultra with 64GB is still beaten by a V100 (not an A100!) with 16GB when it comes to inference (https://github.com/lucadiliello/pytorch-apple-silicon-benchmarks). That old V100 is itself already beaten by the 3080, except when memory is an issue or multiple GPUs are used (https://bizon-tech.com/gpu-benchmarks/NVIDIA-Tesla-V100-vs-NVIDIA-RTX-3080/535vs578). And even the 1080Ti is faster than the Ultra for inference (https://sebastianraschka.com/blog/2022/pytorch-m1-gpu.html). For every single task? Maybe not, but for general models I wouldn't trade a 4090 for anything Apple has right now. Time will tell if that might change one day.

If we look at production applications (robotics, digital twins, etc.), wouldn't an Nvidia Jetson, with its dedicated hardware/pipelines that aren't even available in GPGPUs, be a better option than some general-purpose computer?

In the end, I don't think that's what Apple wants anyway. I still hope we'll see a more powerful Mac Pro in the future.


I think that's just their response to the ChatGPT hype. We already have cards with 80 GB each and can put up to eight of them into a workstation or server, with SXM being fast enough to utilize the 640 GB very well. It's "good enough". And bluntly speaking, when someone scrapes the whole internet and throws attention at it to create a model that understands input and produces the best-fitting average answer, which is often not fully correct or simply wrong in other cases (because the internet as a source only contains truth and facts...), then yes, more memory is always better. In other words, they upped the memory a little from 80 to 94 GB and doubled the specs to create a tighter "package".

One thing is for certain: the upcoming Mac Pro will give us a good idea of where Apple is heading, even if it falls short of what Nvidia is offering.
FCP? What year is this? 2010? A bunch of meaningless benchmarks doesn't mean much to my pipelines. I was talking about my real-life experience using AI pipelines and workflows. More and more tools have started using AI models. I mostly use Topaz Video AI for upscaling to 4K or 8K. Adobe Premiere Pro has content-aware erase, which uses AI models. Not to mention generative video models: a 4090 chokes when generating anything above 720p. And this is a small subset of my workflow.
I am not paying ridiculous rates for running inference in the cloud; my compute needs for inference are predictable. I use the cloud for training, on those Nvidia GPU clusters.
 
Last edited:

TechnoMonk

macrumors 68030
Oct 15, 2022
2,604
4,112

xcode_performance_report_comparison.png
This was the turning point that got me looking at Apple for inference. Apple still needs a lot of work, but they are headed in the right direction.
 

GrumpyCoder

macrumors 68020
Nov 15, 2016
2,126
2,706
FCP? What year is this? 2010? A bunch of meaningless benchmarks doesn't mean much to my pipelines. I was talking about my real-life experience using AI pipelines and workflows. More and more tools have started using AI models. I mostly use Topaz Video AI for upscaling to 4K or 8K. Adobe Premiere Pro has content-aware erase, which uses AI models. Not to mention generative video models: a 4090 chokes when generating anything above 720p. And this is a small subset of my workflow.
I am not paying ridiculous rates for running inference in the cloud; my compute needs for inference are predictable. I use the cloud for training, on those Nvidia GPU clusters.
Your bottleneck is not the inference part, it's the actual video/image processing, which Topaz and Premiere use just like FCP does. That's what AS is optimized for. Run those models in inference stand-alone and compare them to an Nvidia GPU.
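By "stand-alone" I mean timing just the model's forward pass, outside of any video app. A minimal sketch; torchvision's ResNet-50 stands in for whatever network the tool actually ships, and on the Nvidia box you'd point the device at "cuda" for the comparison:

```python
import time
import torch
import torchvision

device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")

# Stand-in model and batch; replace with the network your tool really runs.
model = torchvision.models.resnet50(weights=None).eval().to(device)
x = torch.randn(8, 3, 224, 224, device=device)

with torch.no_grad():
    for _ in range(5):             # warm-up iterations
        model(x)
    if device.type == "mps":
        torch.mps.synchronize()
    t0 = time.perf_counter()
    for _ in range(50):
        model(x)
    if device.type == "mps":
        torch.mps.synchronize()
print(f"{device}: {(time.perf_counter() - t0) / 50 * 1000:.1f} ms per batch of 8")
```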

Oh, and by the way, there's no general defect in 4090 cards. There is, however, an issue with specific chipsets, or rather with some manufacturers' implementation of those chipsets on their mainboards. So if anything, assuming the 4090 itself isn't defective (which happens), change the mainboard rather than the GPU. I've seen it myself when we tried a 4090 with a Ryzen 7-series CPU and an Asus mainboard: bad performance, crashes after a while, poor temperature control, and in Linux we couldn't even read sensor information. Other boards were fine but required a little tweaking. I'd stay away from AMD CPUs for these new builds; AMD platforms only seem to become really stable after a year or so, particularly in Linux. Intel is the much better choice.

Other than that, many people seem to have problems with power supplies not delivering enough power for transient spikes, and with the 12VHPWR connector in general. The sense pins/cable seem to break easily, which can result in all sorts of problems.

But if you're doing that much video work, an RTX 8000 or A6000 would be a better choice than a 4090 anyway.
 

sunny5

macrumors 68000
Jun 11, 2021
1,835
1,706
I don't see Apple competing in the segment of training or creating AI models. Nvidia has that market in the bag. Inference and the tools that use these AI models are a different conversation. Apple, with unified memory, can take a 4090 to the cleaners in some workflows.
Only in theory, but in reality? It's not. I'm using WebUI and it uses ALL the unified memory, and it's way slower than a mobile RTX 3080, or maybe even a mobile RTX 3060. In 3D and AI, having unified memory seems meaningless, since hardly anyone uses or optimizes for unified memory, and it doesn't really perform great.
 
  • Like
Reactions: GrumpyCoder

senttoschool

macrumors 68030
Original poster
Nov 2, 2017
2,626
5,482
Only in theory, but in reality? It's not. I'm using WebUI and it uses ALL the unified memory, and it's way slower than a mobile RTX 3080, or maybe even a mobile RTX 3060. In 3D and AI, having unified memory seems meaningless, since hardly anyone uses or optimizes for unified memory, and it doesn't really perform great.
RAM is RAM. There is no way to create more physical RAM from software. There clearly exists some bottleneck elsewhere for Apple Silicon in both training and inference. If and when these bottlenecks get fixed, then unified memory can shine.
 

leman

macrumors Core
Oct 14, 2008
19,520
19,670
RAM is RAM. There is no way to create more physical RAM from software. There clearly exists some bottleneck elsewhere for Apple Silicon in both training and inference. If and when these bottlenecks get fixed, then unified memory can shine.

Spot on. Nvidia has more matmul flops, and that’s the beginning and the end of the story. That said, the CoreML version of stable diffusion published by Apple runs fairly decently and indicates what Apple Silicon should be capable of in this domain if Apple commits to shipping larger accelerators.
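For anyone who wants to try it without building Apple's repo, the plain PyTorch route via diffusers also runs on the Apple GPU. A rough sketch; the checkpoint and step count are simply what I'd try first, not a recommendation:

```python
import torch
from diffusers import StableDiffusionPipeline

device = "mps" if torch.backends.mps.is_available() else "cpu"

# One commonly used open checkpoint; any diffusers-compatible model works.
pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
pipe = pipe.to(device)

# A throwaway single-step pass helps warm up the MPS kernels on first run.
_ = pipe("warm-up", num_inference_steps=1)

image = pipe("a photo of a mac pro on a desk, studio lighting",
             num_inference_steps=30).images[0]
image.save("out.png")
```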
 
  • Like
Reactions: TechnoMonk

senttoschool

macrumors 68030
Original poster
Nov 2, 2017
2,626
5,482
Spot on. Nvidia has more matmul flops, and that’s the beginning and the end of the story. That said, the CoreML version of stable diffusion published by Apple runs fairly decently and indicates what Apple Silicon should be capable of in this domain if Apple commits to shipping larger accelerators.
If they weren't already, they will definitely focus on AI acceleration in Apple Silicon now.

I'm going to guess that by 2030 we'll be buying huge NPUs with a CPU attached; currently it's the other way around. AI inference performance will be much more important than CPU speed.
 

TechnoMonk

macrumors 68030
Oct 15, 2022
2,604
4,112
Only in theory, but in reality? It's not. I'm using WebUI and it uses ALL the unified memory, and it's way slower than a mobile RTX 3080, or maybe even a mobile RTX 3060. In 3D and AI, having unified memory seems meaningless, since hardly anyone uses or optimizes for unified memory, and it doesn't really perform great.
WebUI uses the CPU. Have you converted the models to Core ML?
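And once a checkpoint is converted, you also have to let Core ML schedule it on the GPU and Neural Engine. A minimal sketch; the file name and the "image" input key are placeholders from my own conversion, not anything WebUI ships:

```python
import numpy as np
import coremltools as ct

# Load a previously converted model and allow CPU, GPU, and Neural Engine.
mlmodel = ct.models.MLModel(
    "resnet50.mlpackage",
    compute_units=ct.ComputeUnit.ALL,
)

# Input/output names depend on how the model was converted;
# inspect mlmodel.get_spec() if you're not sure what they are.
x = np.random.rand(1, 3, 224, 224).astype(np.float32)
out = mlmodel.predict({"image": x})
print({name: np.asarray(value).shape for name, value in out.items()})
```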
 

TechnoMonk

macrumors 68030
Oct 15, 2022
2,604
4,112
RAM is RAM. There is no way to create more physical RAM from software. There clearly exists some bottleneck elsewhere for Apple Silicon in both training and inference. If and when these bottlenecks get fixed, then unified memory can shine.
He is using WebUI, which doesn't use the GPU. It's a very poor comparison.
 
  • Haha
Reactions: sunny5

TechnoMonk

macrumors 68030
Oct 15, 2022
2,604
4,112
Your bottleneck is not the inference part, it's the actual video/image processing, which Topaz and Premiere use just like FCP does. That's what AS is optimized for. Run those models in inference stand-alone and compare them to an Nvidia GPU.

Oh, and by the way, there's no general defect in 4090 cards. There is, however, an issue with specific chipsets, or rather with some manufacturers' implementation of those chipsets on their mainboards. So if anything, assuming the 4090 itself isn't defective (which happens), change the mainboard rather than the GPU. I've seen it myself when we tried a 4090 with a Ryzen 7-series CPU and an Asus mainboard: bad performance, crashes after a while, poor temperature control, and in Linux we couldn't even read sensor information. Other boards were fine but required a little tweaking. I'd stay away from AMD CPUs for these new builds; AMD platforms only seem to become really stable after a year or so, particularly in Linux. Intel is the much better choice.

Other than that, many people seem to have problems with power supplies not delivering enough power for transient spikes, and with the 12VHPWR connector in general. The sense pins/cable seem to break easily, which can result in all sorts of problems.

But if you're doing that much video work, an RTX 8000 or A6000 would be a better choice than a 4090 anyway.
You have no idea what you're talking about regarding my workflows. I know exactly where the bottlenecks in my workflow are. I don't use FCP or any video tools until the generative AI models have created the frames for the videos. The only thing running before that is AI inference. It's not just my workflow, either; I have seen people easily reproduce the issue with other open-source generative video AI like Deforum. Install Deforum in WebUI and try using it at higher resolutions.
How much does an A6000 or A8000 cost? Pretty much the cost of a maxed-out-RAM-and-processor MacBook Pro.
 
Last edited:

TechnoMonk

macrumors 68030
Oct 15, 2022
2,604
4,112
Spot on. Nvidia has more matmul flops, and that’s the beginning and the end of the story. That said, the CoreML version of stable diffusion published by Apple runs fairly decently and indicates what Apple Silicon should be capable of in this domain if Apple commits to shipping larger accelerators.
Exactly this. Apple needs to commit and provide accelerators and library support for the most commonly used third-party libraries. The Core ML Stable Diffusion port isn't even optimized for best performance; writing your own inference loop with dynamic batching speeds things up even more on Apple Silicon.
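What I mean by dynamic batching, as a rough sketch; the queue, batch size, wait window, and model call are all placeholders, the point is just grouping whatever requests arrive within a short window into one forward pass:

```python
import queue
import threading
import time
import torch

MAX_BATCH = 8
MAX_WAIT_S = 0.02   # how long to wait for more requests before running

requests = queue.Queue()    # items are (input_tensor, reply_queue) pairs

def batching_worker(model, device):
    """Collect requests for up to MAX_WAIT_S, then run them as one batch."""
    while True:
        x, reply = requests.get()
        batch, replies = [x], [reply]
        deadline = time.monotonic() + MAX_WAIT_S
        while len(batch) < MAX_BATCH and time.monotonic() < deadline:
            try:
                x, reply = requests.get(timeout=max(0.0, deadline - time.monotonic()))
                batch.append(x)
                replies.append(reply)
            except queue.Empty:
                break
        with torch.no_grad():
            results = model(torch.stack(batch).to(device)).cpu()
        for reply, result in zip(replies, results):
            reply.put(result)   # hand each caller its own row of the batch

def infer(x: torch.Tensor) -> torch.Tensor:
    """Called by request handlers; blocks until the batched result is ready."""
    reply = queue.Queue(maxsize=1)
    requests.put((x, reply))
    return reply.get()

# Usage: threading.Thread(target=batching_worker, args=(model, "mps"), daemon=True).start()
```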
 

TechnoMonk

macrumors 68030
Oct 15, 2022
2,604
4,112
Do you even have Automatic1111 WebUI? It heavily uses both GPU cores and VRAM. CPU? Are you kidding me?
I tried it a couple of months ago and it was using the CPU. I would love to see some screenshots and a link pointing me to the version of Automatic1111 that uses Core ML.
The latest one I found is from Feb 27; the screenshots in that link show the CPU argument, and it's still using the default .ckpt models.

 
  • Haha
Reactions: sunny5

sunny5

macrumors 68000
Jun 11, 2021
1,835
1,706
I tried it a couple of months ago and it was using the CPU. I would love to see some screenshots and a link pointing me to the version of Automatic1111 that uses Core ML.
Hahaha, Automatic1111's WebUI does not even use Core ML. You clearly know nothing, and you've just proven it yourself. Go and check the Stable Diffusion subreddit to see how it works first.

I saw your link, and it leads to exactly what I'm using, so you clearly don't know what you are saying. I've been using WebUI for almost 5 months.
 
Last edited:

TechnoMonk

macrumors 68030
Oct 15, 2022
2,604
4,112
Hahaha, Automatic1111's WebUI does not even use Core ML. You clearly know nothing, and you've just proven it yourself. Go and check the Stable Diffusion subreddit to see how it works first.
Lol. I know how it works; I run it on my 4090. Show me Automatic1111 working on the Apple GPU with Core ML models. What you have shown is ignorance of how Apple Silicon runs these models.

Show me a Core ML version of Automatic1111, like Apple's Stable Diffusion port.

I would love to see screenshots of Automatic1111 running in GPU mode on your Apple Silicon.
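An easy way to settle it: run this inside the WebUI virtual environment. Plain PyTorch, no WebUI-specific flags assumed:

```python
import torch

print("PyTorch:", torch.__version__)
print("MPS built:    ", torch.backends.mps.is_built())
print("MPS available:", torch.backends.mps.is_available())

# If generation really runs on the Apple GPU, the pipeline's weights live
# on the "mps" device, e.g. print(next(pipe.unet.parameters()).device)
# for whatever model object your UI exposes (the name here is hypothetical).
```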
 
  • Haha
Reactions: sunny5

GrumpyCoder

macrumors 68020
Nov 15, 2016
2,126
2,706
You have no idea what you're talking about regarding my workflows. I know exactly where the bottlenecks in my workflow are.
Ah ok. Hey, maybe you should get a PhD in the field and a professorship at a leading university teaching this stuff. But of course it's always the others that have no idea.
I don't use FCP or any video tools until the generative AI models have created the frames for the videos.
I didn't say you're using FCP; I said the M series is optimized for workflows similar to FCP. You're running a video workflow, and maybe, just maybe, you should check what exactly happens inside these models and in the model output. How many of these models have you created yourself? How many have you published at peer-reviewed conferences? None. You can't even get your 4090 going. 'Nuff said.
How much does an A6000 or A8000 cost? Pretty much the cost of a maxed-out-RAM-and-processor MacBook Pro.
There is no A8000, only an A6000. The 8000 is an RTX 8000, no A there. Good thing you know your stuff and don't have to rely on people who don't know what they're talking about. Oh wait...
 

sunny5

macrumors 68000
Jun 11, 2021
1,835
1,706
Lol. I know how it works; I run it on my 4090. Show me Automatic1111 working on the Apple GPU with Core ML models. What you have shown is ignorance of how Apple Silicon runs these models.

Show me a Core ML version of Automatic1111, like Apple's Stable Diffusion port.

I would love to see screenshots of Automatic1111 running in GPU mode on your Apple Silicon.
Screenshot 2023-03-26 at 5.06.48 PM.jpg

lol, who even uses Core ML to run WebUI? Using the GPU is the fastest way to generate AI images, and I have no idea what you are talking about. I guess you don't even use custom models at all. WebUI uses all the GPU cores while generating images.
 
  • Haha
Reactions: Ursadorable