
TechnoMonk

macrumors 68030
Oct 15, 2022
2,604
4,112
Inference on Nvidia hardware is very good... they have tons of optimizations for it, and even more depending on which framework you use, like PyTorch or TensorFlow. I have had zero issues with inference. It's very solid.

Conversion to CoreML is still a nightmare. I dread it the most. It is the most painful, god-forsaken part of dealing with AI in Apple's ecosystem. The Python package coremltools, developed by Apple, is riddled with bugs, missing features, and terrible documentation. I have submitted countless bugs and they go unanswered. Take, for instance, the INCREDIBLY basic TensorFlow function unravel_index. This bug was submitted almost 2 years ago and was never addressed!!!! https://github.com/apple/coremltools/issues/1195
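For anyone curious what that failure looks like in practice, below is a minimal sketch of a toy TensorFlow model built around tf.unravel_index and the coremltools conversion call that, per the linked issue, has no mapping for it. The class name, shapes, and workaround note are made up for illustration; only tf.unravel_index and ct.convert come from the issue.

```python
# Minimal sketch (illustrative, not from the issue): a toy model whose only
# unusual op is tf.unravel_index, which coremltools historically cannot map.
import tensorflow as tf
import coremltools as ct


class ArgmaxCoords(tf.Module):
    """Return the (row, col) of the largest value in an 8x8 tensor."""

    @tf.function(input_signature=[tf.TensorSpec(shape=(8, 8), dtype=tf.float32)])
    def __call__(self, x):
        flat_idx = tf.argmax(tf.reshape(x, [-1]))
        # The op the converter trips over:
        return tf.unravel_index(flat_idx, dims=tf.shape(x, out_type=tf.int64))


model = ArgmaxCoords()
try:
    mlmodel = ct.convert(
        [model.__call__.get_concrete_function()],
        source="tensorflow",
    )
except Exception as err:  # typically an "op not implemented" style error
    print("conversion failed:", err)
    # Workaround: rewrite the model with supported ops, e.g. for a fixed
    # width of 8, row = flat_idx // 8 and col = flat_idx % 8.
```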

The irony is Apple internally uses Nvidia and PyTorch for their own AI workflows. They'd never use their own ecosystem.
The 4090 is the most unstable GPU I have worked with in the past few years. It's good if you are running some basic diffusion models like Stable Diffusion, or some text-based models. Anything with sustained load and it crashes, reboots the workstation, or just runs out of memory. I almost exclusively use AI models for 4K/8K upscaling, and it's like saying a prayer on the 4090. You may run something meant to take 30 minutes, and it crashes after 25. The M1 Max may not do the same job in 30 minutes, it may take 35-37 minutes, but it is stable and reliable. Nvidia needs to make their GPUs efficient; the 4090 easily draws 600 W, then starts throttling or just crashes under sustained load.
Good luck using a 4090 for removing content from, or adding content to, a video with AI models at resolutions above 720p.
I take a different approach when using CoreML or the Apple GPU for running inference: I am not trying to map CoreML one-to-one with CUDA, or use PyTorch or TF as-is. Of course I would have to use some custom libraries or custom code to make it work on Apple Silicon. If you are expecting everything to work the same way as CUDA, that is a different battle.
 
  • Like
Reactions: levanid and Xiao_Xi

Xiao_Xi

macrumors 68000
Oct 27, 2021
1,627
1,101
I take a different approach when using CoreML or the Apple GPU for running inference: I am not trying to map CoreML one-to-one with CUDA, or use PyTorch or TF as-is.
Why is the conversion to CoreML so error-prone? Doesn't ONNX solve the problem of inter-library conversion?
 

diamond.g

macrumors G4
Mar 20, 2007
11,438
2,663
OBX
The 4090 is the most unstable GPU I have worked with in the past few years. It's good if you are running some basic diffusion models like Stable Diffusion, or some text-based models. Anything with sustained load and it crashes, reboots the workstation, or just runs out of memory. I almost exclusively use AI models for 4K/8K upscaling, and it's like saying a prayer on the 4090. You may run something meant to take 30 minutes, and it crashes after 25. The M1 Max may not do the same job in 30 minutes, it may take 35-37 minutes, but it is stable and reliable. Nvidia needs to make their GPUs efficient; the 4090 easily draws 600 W, then starts throttling or just crashes under sustained load.
Good luck using a 4090 for removing content from, or adding content to, a video with AI models at resolutions above 720p.
I take a different approach when using CoreML or the Apple GPU for running inference: I am not trying to map CoreML one-to-one with CUDA, or use PyTorch or TF as-is. Of course I would have to use some custom libraries or custom code to make it work on Apple Silicon. If you are expecting everything to work the same way as CUDA, that is a different battle.
Are you running an FE card or an AIB model?
 

teagls

macrumors regular
May 16, 2013
202
101
The 4090 is the most unstable GPU I have worked with in the past few years. It's good if you are running some basic diffusion models like Stable Diffusion, or some text-based models. Anything with sustained load and it crashes, reboots the workstation, or just runs out of memory. I almost exclusively use AI models for 4K/8K upscaling, and it's like saying a prayer on the 4090. You may run something meant to take 30 minutes, and it crashes after 25. The M1 Max may not do the same job in 30 minutes, it may take 35-37 minutes, but it is stable and reliable. Nvidia needs to make their GPUs efficient; the 4090 easily draws 600 W, then starts throttling or just crashes under sustained load.
Good luck using a 4090 for removing content from, or adding content to, a video with AI models at resolutions above 720p.
I take a different approach when using CoreML or the Apple GPU for running inference: I am not trying to map CoreML one-to-one with CUDA, or use PyTorch or TF as-is. Of course I would have to use some custom libraries or custom code to make it work on Apple Silicon. If you are expecting everything to work the same way as CUDA, that is a different battle.
That honestly sounds like you are letting the 4090 overheat. Are you cooling it adequately?
 

TechnoMonk

macrumors 68030
Oct 15, 2022
2,604
4,112
That honestly sounds like you are letting the 4090 overheat. Are you cooling it adequately?
I monitor the temperature, and it's not a heating issue; I have plenty of cooling. It's more of a power issue. I cap the power to 70%, just like I did on my retired 3090. This is not uncommon for folks who run sustained workloads. I can rent a 4090 or an A5000 in the cloud and replicate the issue pretty easily.
It's a love-hate relationship with my 4090: it's amazing at certain tasks but unreliable at others.
I just hope Apple invests in this segment of the GPU market, both in hardware and software.
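For reference, capping the card at 70% as described above can be scripted. A minimal sketch using the pynvml bindings from the nvidia-ml-py package (GPU index 0 and the 70% factor are assumptions; setting the limit needs root/admin rights, and nvidia-smi -pl <watts> does the same from a shell):

```python
# Minimal sketch: cap GPU 0 to 70% of its default power limit via NVML.
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)

default_mw = pynvml.nvmlDeviceGetPowerManagementDefaultLimit(handle)  # milliwatts
target_mw = int(default_mw * 0.70)

# Clamp to the range the board actually allows.
min_mw, max_mw = pynvml.nvmlDeviceGetPowerManagementLimitConstraints(handle)
target_mw = max(min_mw, min(target_mw, max_mw))

pynvml.nvmlDeviceSetPowerManagementLimit(handle, target_mw)  # requires root
print(f"power limit set to {target_mw / 1000:.0f} W "
      f"(default {default_mw / 1000:.0f} W)")

pynvml.nvmlShutdown()
```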
 

Kimmo

macrumors 6502
Jul 30, 2011
266
318

GrumpyCoder

macrumors 68020
Nov 15, 2016
2,126
2,706
I have some small single-4090 and dual-4090 workstations; no problem with the GPUs in these with the usual TF/PyTorch/Omniverse applications. Rock stable. Of course there's the issue of memory on these, as it's very limited; that's what the RTX 6000/8000 cards are for. The next step is V100/H100, but I personally think using these in the cloud is pointless. They're too expensive for continuous workloads, and when used all the time, buying is much cheaper than a cloud service.
 

teagls

macrumors regular
May 16, 2013
202
101
I monitor the temperature, and it's not a heating issue; I have plenty of cooling. It's more of a power issue. I cap the power to 70%, just like I did on my retired 3090. This is not uncommon for folks who run sustained workloads. I can rent a 4090 or an A5000 in the cloud and replicate the issue pretty easily.
It's a love-hate relationship with my 4090: it's amazing at certain tasks but unreliable at others.
I just hope Apple invests in this segment of the GPU market, both in hardware and software.
Something doesn't seem right. Are you also saying you have power issues with the A5000? We regularly run many A6000s in a single workstation for weeks under full load and never have issues. You might want to check your PSU or connections. Maybe even check your electrical outlets.
 

leman

macrumors Core
Oct 14, 2008
19,520
19,671
Some evidence that a new neural engine design might be incoming:


A quick uninformed summary from someone who has no clue about these things: it seems to describe an architecture that consists of two types of hardware units, optimised for different types of operations (convolutions vs. vector processing) and a dependency management engine that asynchronously invokes these units. From what I understand, the current NPU is primarily a convolution engine.
 
  • Like
Reactions: Xiao_Xi

diamond.g

macrumors G4
Mar 20, 2007
11,438
2,663
OBX
I monitor the temperature, and it's not a heating issue; I have plenty of cooling. It's more of a power issue. I cap the power to 70%, just like I did on my retired 3090. This is not uncommon for folks who run sustained workloads. I can rent a 4090 or an A5000 in the cloud and replicate the issue pretty easily.
It's a love-hate relationship with my 4090: it's amazing at certain tasks but unreliable at others.
I just hope Apple invests in this segment of the GPU market, both in hardware and software.
That is interesting, because 100% power is 450 W for most 4090s, with only 5 or 6 cards able to actually draw more than that (as in, a lot of cards are power limited to 450 W). I am going to assume the place you are renting from is using FE cards, which can pull up to 600 W but have to be overclocked (run at more than 100% power limit) to get there.


It will be interesting to see how Apple competes in this space as renting compute appears to be the overwhelming choice for folks these days.
 

TechnoMonk

macrumors 68030
Oct 15, 2022
2,604
4,112
Why is the conversion to CoreML so error-prone? Doesn't ONNX solve the problem of inter-library conversion?
That was the bigger problem. You don't need to convert to ONNX any more, at least not for what I do now. I would say a lot of it is not plug and play; I use custom Python code leveraging Apple's coremltools to do the conversion.
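For concreteness, here is a minimal sketch of that ONNX-free path, assuming the standard route of tracing a PyTorch model with torch.jit.trace and handing it to coremltools directly. The torchvision ResNet-18 and the input shape are placeholders, not the poster's actual models:

```python
# Minimal sketch of a direct PyTorch -> Core ML conversion (no ONNX step).
import torch
import torchvision
import coremltools as ct

# Placeholder model; any traceable torch.nn.Module works the same way.
model = torchvision.models.resnet18(weights=None).eval()
example = torch.rand(1, 3, 224, 224)
traced = torch.jit.trace(model, example)

mlmodel = ct.convert(
    traced,
    inputs=[ct.TensorType(name="image", shape=example.shape)],
    convert_to="mlprogram",            # modern ML Program format
    compute_units=ct.ComputeUnit.ALL,  # let Core ML pick ANE/GPU/CPU
)
mlmodel.save("resnet18.mlpackage")
```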
That is interesting, because 100% power is 450 W for most 4090s, with only 5 or 6 cards able to actually draw more than that (as in, a lot of cards are power limited to 450 W). I am going to assume the place you are renting from is using FE cards, which can pull up to 600 W but have to be overclocked (run at more than 100% power limit) to get there.


It will be interesting to see how Apple competes in this space as renting compute appears to be the overwhelming choice for folks these days.
I am not renting a 4090; it's in my workstation. There is no power limiting in most cards: Nvidia claims a 450 W cap, but the whole-board draw is much higher depending on the manufacturer. I underclock and power limit mine to 70%. I am not a gamer, so my use case could be very different from that of most folks who use it for gaming.
 

TechnoMonk

macrumors 68030
Oct 15, 2022
2,604
4,112
Something doesn't seem right. Are you also saying you have power issues with the A5000? We regularly run many A6000s in a single workstation for weeks under full load and never have issues. You might want to check your PSU or connections. Maybe even check your electrical outlets.
It's the same in the cloud (A5000) and on the 4090 (local). I don't run into the same issues on the A100. I am not wasting money on an A100 for inference, though.
 

GrumpyCoder

macrumors 68020
Nov 15, 2016
2,126
2,706
Something is wrong there. The 4090's average power draw is around 400-450 W; overclocked cards draw a little more, somewhere between 450-500 W. Only for short bursts should it reach 600 W. The board limit is around 650 W. So if a 4090 capped at 70% is reaching that, something isn't working properly in the system. Not one of our 4090 cards behaves like that. All of our systems are configured at the supplier and tested for 48 hours under full load before they're shipped. Our larger Dell systems with the 6000/8000 are tested as well, and so are the Dell servers with A100/H100 for our GPU clusters.

Another issue: using PyTorch with VGG16, even an old 1080 Ti runs circles around an M1 Ultra (let alone a Max) for both training and inference. So if a 4090 needs 30 minutes and crashes before it gets there, while an M1 Max can do the same job in under 40 minutes, something is terribly wrong here. As it happens, I'm at a conference and then at Nvidia for the rest of next week, so I'll ask them about a 600 W power draw with a power cap at 70%.
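As a rough illustration of how such a VGG16 comparison might be run (the batch size, iteration count, and inference-only focus are arbitrary choices, not the benchmark referred to above; torch.mps.synchronize needs a recent PyTorch):

```python
# Rough sketch: time VGG16 inference on whichever accelerator is present.
# Run once on the Nvidia box (cuda) and once on Apple Silicon (mps).
import time
import torch
import torchvision


def time_vgg16(device: str, batch: int = 32, iters: int = 50) -> float:
    model = torchvision.models.vgg16(weights=None).eval().to(device)
    x = torch.rand(batch, 3, 224, 224, device=device)

    def sync() -> None:
        if device == "cuda":
            torch.cuda.synchronize()
        elif device == "mps":
            torch.mps.synchronize()

    with torch.no_grad():
        for _ in range(5):  # warm-up iterations
            model(x)
        sync()
        start = time.perf_counter()
        for _ in range(iters):
            model(x)
        sync()
    return (time.perf_counter() - start) / iters


device = "cuda" if torch.cuda.is_available() else "mps"
print(f"{device}: {time_vgg16(device) * 1000:.1f} ms per batch")
```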
 

TechnoMonk

macrumors 68030
Oct 15, 2022
2,604
4,112
Something is wrong there. The 4090's average power draw is around 400-450 W; overclocked cards draw a little more, somewhere between 450-500 W. Only for short bursts should it reach 600 W. The board limit is around 650 W. So if a 4090 capped at 70% is reaching that, something isn't working properly in the system. Not one of our 4090 cards behaves like that. All of our systems are configured at the supplier and tested for 48 hours under full load before they're shipped. Our larger Dell systems with the 6000/8000 are tested as well, and so are the Dell servers with A100/H100 for our GPU clusters.

Another issue: using PyTorch with VGG16, even an old 1080 Ti runs circles around an M1 Ultra (let alone a Max) for both training and inference. So if a 4090 needs 30 minutes and crashes before it gets there, while an M1 Max can do the same job in under 40 minutes, something is terribly wrong here. As it happens, I'm at a conference and then at Nvidia for the rest of next week, so I'll ask them about a 600 W power draw with a power cap at 70%.
Great! I would love to hear what the Nvidia guys say about it. Just for clarification, the 600 W figure is without limiting power to 70%. The GPU board draw is around 400-425 W with power limits and with the GPU underclocked by 20%. I have tried multiple vendors.
Ask them if there are stability issues running at sustained loads. I see the same on an A5000.
 

Zest28

macrumors 68030
Jul 11, 2022
2,581
3,933
Interesting. I was going to build a new PC with an RTX 4090 for AI, but based on what I am reading, it sounds like a bad idea?

Is it not a driver issue that can simply be fixed with a software / firmware update?
 
  • Like
Reactions: ocimpean

Xiao_Xi

macrumors 68000
Oct 27, 2021
1,627
1,101
It will be interesting to see how Apple competes in this space as renting compute appears to be the overwhelming choice for folks these days.
I think deep learning and scientific library developers would care more about Apple hardware if it were available in the cloud at a competitive price.
 

diamond.g

macrumors G4
Mar 20, 2007
11,438
2,663
OBX
That was the bigger problem. You don't need to convert to ONNX any more, at least not for what I do now. I would say a lot of it is not plug and play; I use custom Python code leveraging Apple's coremltools to do the conversion.

I am not renting a 4090; it's in my workstation. There is no power limiting in most cards: Nvidia claims a 450 W cap, but the whole-board draw is much higher depending on the manufacturer. I underclock and power limit mine to 70%. I am not a gamer, so my use case could be very different from that of most folks who use it for gaming.
Making another assumption: unplug one of the 8-pin-to-16-pin adapter leads; that will force the card to cap at 450 W. 70% of that should bring the power down to ~315 W.

The screenshot below is from a YT video where the person compared 35 4090s to find "the best" one(s). It was just to reiterate that most cards have a stock power limit of 450 W (and there are quite a few that can't go over that limit for whatever reason). You were right that most can go over the limit, but that usually means pushing the slider past 100%, which you said you were not doing. Note that all this ignores transient power spikes, which can be much higher but last only a fraction of a second.

[Attached screenshot: power-limit comparison of 35 RTX 4090 cards]
 

senttoschool

macrumors 68030
Original poster
Nov 2, 2017
2,626
5,482
I think deep learning and scientific library developers would care more about Apple hardware if it were available in the cloud at a competitive price.
Did somebody say Apple Silicon Cloud?


I think/hope Apple will put the "Extreme" version of their chips in the cloud. Imagine an Extreme chip packaged together with 512GB of VRAM available for training or inference in the cloud.
 

diamond.g

macrumors G4
Mar 20, 2007
11,438
2,663
OBX
I think deep learning and scientific library developers would care more about Apple hardware if it were available in the cloud at a competitive price.

Did somebody say Apple Silicon Cloud?


I think/hope Apple will put the "Extreme" version of their chips in the cloud. Imagine an Extreme chip packaged together with 512GB of VRAM available for training or inference in the cloud.
I think if Apple had a way to convert CUDA to Metal in the backend it would make the cloud service more approachable. Maybe along with a notice that you could get way better performance writing Metal code directly versus going through the conversion/translation layer.
 

Xiao_Xi

macrumors 68000
Oct 27, 2021
1,627
1,101
I think if Apple had a way to convert CUDA to Metal in the backend it would make the cloud service more approachable
Can anyone confirm if that would be legal? I've read that other GPU manufacturers can't use CUDA code because nVidia restricts the use of CUDA to their GPUs. To alleviate the problem, AMD released Orochi.
 

leman

macrumors Core
Oct 14, 2008
19,520
19,671
Can anyone confirm if that would be legal? I've read that other GPU manufacturers can't use CUDA code because nVidia restricts the use of CUDA to their GPUs. To alleviate the problem, AMD released Orochi.

What's stopping one from implementing a CUDA-compatible toolkit? Are there provisions that prohibit copying an API? That would be extremely odd.
 

diamond.g

macrumors G4
Mar 20, 2007
11,438
2,663
OBX
Can anyone confirm if that would be legal? I've read that other GPU manufacturers can't use CUDA code because nVidia restricts the use of CUDA to their GPUs. To alleviate the problem, AMD released Orochi.
IIRC ROCm has tools to port CUDA code to HIP, so I see no reason why Apple couldn't do the same (and even do it at runtime, à la Rosetta 2).
 

Xiao_Xi

macrumors 68000
Oct 27, 2021
1,627
1,101
Apple could create a tool to convert CUDA code to Metal code as Intel and AMD have done, but I don't think they can legally convert compiled CUDA code at runtime. I have read that neither AMD nor Intel have tried due to the Oracle/Google lawsuit.
 

GrumpyCoder

macrumors 68020
Nov 15, 2016
2,126
2,706
I think/hope Apple will put the "Extreme" version of their chips in the cloud. Imagine an Extreme chip packaged together with 512GB of VRAM available for training or inference in the cloud.
And then what? 512 GB is great (we already have 640 GB with Nvidia in a single box, plus super-fast connections between multiple boxes). But then we still have (likely) slow performance. Keep in mind the 1080 Ti is running circles around the M1 Ultra... so that potential Mac Pro would have to be much, much, MUCH faster than the Ultra. They need something that competes with (multiple) A100-level cards to be interesting for the cloud.
Can anyone confirm if that would be legal? I've read that other GPU manufacturers can't use CUDA code because nVidia restricts the use of CUDA to their GPUs. To alleviate the problem, AMD released Orochi.
There have been a bunch of CUDA to <insert-favorite-hype-alternative-here-that-dies-after-2-months> projects, none of which worked as expected and all were abandoned after a while. Like it or not, 99% of the market is Nvidia and it won't change anytime soon. A super fast desktop Mac for fiddling around locally would be nice though.
 
  • Like
Reactions: leman

jdb8167

macrumors 601
Nov 17, 2008
4,859
4,599
Apple could create a tool to convert CUDA code to Metal code as Intel and AMD have done, but I don't think they can legally convert compiled CUDA code at runtime. I have read that neither AMD nor Intel have tried due to the Oracle/Google lawsuit.
Apple translates compiled x86-64 code to AArch64 with Rosetta 2 without any legal problems despite Intel threatening both Microsoft and Apple previously. If Intel isn't suing Apple then I doubt that there is any case for Nvidia to sue over binary translation either.
 