The 4090 is the most unstable GPU I have worked with in the past few years. It's fine if you are running basic diffusion models like Stable Diffusion, or some text-based models. Give it anything with sustained load and it crashes, reboots the workstation, or just runs out of memory. I almost exclusively use AI models for 4K/8K upscaling, and on a 4090 that's like saying a prayer: you start a 30-minute job and it crashes at 25 minutes. The M1 Max may not finish the same job in 30 minutes, it may take 35-37, but it is stable and reliable. Nvidia needs to make their GPUs efficient; the 4090 easily draws 600 W, then starts throttling or just crashes under sustained load.

Inference on Nvidia hardware is very good... they have tons of optimizations for it, and even more depending on which framework you use, like PyTorch or TensorFlow. I have had zero issues with inference. It's very solid.
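For what it's worth, this is roughly the shape of workload being discussed: a minimal, hedged sketch of an fp16 upscaling-style inference pass on CUDA. The model here is a stand-in toy upscaler, not anyone's actual pipeline, and the frame size is just illustrative.

```python
import torch
import torch.nn as nn

# Stand-in 2x upscaler: two convs plus PixelShuffle. Purely illustrative.
model = nn.Sequential(
    nn.Conv2d(3, 64, 3, padding=1),
    nn.ReLU(),
    nn.Conv2d(64, 3 * 4, 3, padding=1),
    nn.PixelShuffle(2),  # rearranges channels into a 2x larger image
).eval().half().to("cuda")

# One 1080p frame in fp16; a real pipeline would loop over video frames.
frame = torch.rand(1, 3, 1080, 1920, dtype=torch.float16, device="cuda")

with torch.inference_mode():
    upscaled = model(frame)

print(upscaled.shape)  # torch.Size([1, 3, 2160, 3840]) -> 4K output
```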
Conversion to CoreML is still a nightmare. I dread it the most; it is the most painful, god-forsaken part of dealing with AI in Apple's ecosystem. The Python package coremltools, developed by Apple, is riddled with bugs, missing features, and terrible documentation. I have submitted countless bugs and they go unanswered. Take, for instance, this INCREDIBLY basic TensorFlow function, unravel_index. This bug was submitted almost 2 years ago and was never addressed!!!! https://github.com/apple/coremltools/issues/1195
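To give a sense of how small the repro is, here's a hedged sketch in the spirit of that issue: a tiny tf.function whose graph contains tf.unravel_index, passed to the converter. Shapes and names are illustrative; if the op still lacks a conversion path, the ct.convert call is where the error surfaces.

```python
import tensorflow as tf
import coremltools as ct

# Minimal graph using tf.unravel_index: map flat indices to row/col
# coordinates of a 100x100 grid. The signature is just an example.
@tf.function(input_signature=[tf.TensorSpec(shape=(4,), dtype=tf.int32)])
def unravel(flat_idx):
    return tf.unravel_index(flat_idx, dims=[100, 100])

# Conversion is where unsupported ops show up as errors.
mlmodel = ct.convert(
    [unravel.get_concrete_function()],
    source="tensorflow",
)
```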
The irony is Apple internally uses Nvidia and PyTorch for their own AI workflows. They'd never use their own ecosystem.
Good luck using a 4090 for removing or adding content in a video with AI models at any resolution above 720p.
I have a different approach when using CoreML or the Apple GPU for running inference: I am not trying to map CoreML one-to-one with CUDA, or use PyTorch or TF as-is. Of course I have to use some custom libraries or custom code to make it work on Apple Silicon. If you are expecting everything to work the same way as CUDA, then that is a different battle.
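A rough sketch of that "don't assume CUDA" mindset, at least on the PyTorch side: pick whichever backend the machine actually has and keep the rest of the pipeline device-agnostic. The tiny model below is only a placeholder.

```python
import torch

# Choose the best available backend rather than hard-coding "cuda".
if torch.cuda.is_available():
    device = torch.device("cuda")
elif torch.backends.mps.is_available():
    device = torch.device("mps")  # Apple Silicon GPU via Metal
else:
    device = torch.device("cpu")

# Placeholder model; a real workflow would swap in the actual network
# or a CoreML-converted model for the Apple Neural Engine.
model = torch.nn.Linear(16, 4).eval().to(device)
x = torch.rand(1, 16, device=device)

with torch.inference_mode():
    y = model(x)

print(device, y.shape)
```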