And no. My numbers are not off.
Funny, some benchmarks, especially in gaming, disagree with you and put the gaming performance at around 50% of a 1080Ti. But that also depends on the game, of course, and where the bottleneck is.
About the last part: moving the goalpost, eh?
Not at all. It's actually what a lot of people are looking for in a local machine. For the serious number crunching you need a big cluster anyway. Neither AMD nor NVIDIA has a single card that does it all.
Funnier even: AMD Vega does not need 32 GB of RAM because it has HBCC which helps with ginormous data sets.
Are we back in the RAM doubler days we had with the G3 and G4? I can't even believe we're discussing this. When the dataset you're currently working on is over 30 GB, the one thing you need is memory. Sure, you can use more, smaller batches. There are advantages and disadvantages to doing this, and it also affects when and how you update your weights. It's a moot point discussing it here, as it's currently a hot research topic with plenty of published papers and also a lot of unsolved problems, especially when it comes to uncertainties in Bayesian nets.
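Just to make the batching point concrete, here's a toy sketch of streaming mini-batches from a memory-mapped file instead of loading a 30+ GB set at once. The file names, shapes and the plain least-squares update are made up for illustration, not from any real project:

# Toy sketch: stream mini-batches from disk instead of holding the whole set in memory.
# File names, shapes and the least-squares update rule are made up for illustration.
import numpy as np

X = np.load("features.npy", mmap_mode="r")   # memory-mapped: slices are read on demand
y = np.load("labels.npy", mmap_mode="r")

batch_size = 512                              # smaller batches: less memory, noisier gradients
w = np.zeros(X.shape[1], dtype=np.float32)
lr = 1e-3

for epoch in range(3):
    for start in range(0, X.shape[0], batch_size):
        xb = np.array(X[start:start + batch_size])   # copy just this batch into RAM
        yb = np.array(y[start:start + batch_size])
        pred = xb @ w
        grad = xb.T @ (pred - yb) / len(xb)           # gradient of a least-squares loss
        w -= lr * grad                                 # weights updated per batch, not per epoch

That's the memory-vs-gradient-quality trade-off in a nutshell; how and when to update the weights under these constraints is exactly the open research question I mean.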
Have you actually ported anything from OpenCL to Metal, or is what you have written your opinion, based on your assumption that that has to be the case?
I have, my research group has, my students have during their regular courses and theses, and so have researchers around the world I'm in contact with. But thanks for asking.
No, you don't have to port your application from OpenCL to Metal per se. Metal is very close to OpenCL in its philosophy and the OpenCL code can be executed in Metal easily.
So what you're saying is, I can download any arbitrary code from a GitHub repository, let's say written in C++/OpenCL, push a button on a Mac and it just runs using Metal? No touch-up, no code changes required? That would be the holy grail for reproducing results from other research groups (if they're using OpenCL, which is unlikely). Sadly, most of the time it doesn't even work with the same libraries. We've had our share of trouble running stuff from Google using Keras+Tensorflow, and when we tried to run code using C++/OpenCV/Tensorflow that worked flawlessly on Intel/NVIDIA, it became a massive problem on a Jetson board. Solving these problems wastes time no researcher or student has, especially if you have to publish x papers per year.
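To illustrate the kind of touch-up I mean with a toy TensorFlow 2.x sketch (the model and data are placeholders, not from any actual repo): even the trivial case of code that assumes a CUDA device needs an explicit check and fallback before it runs on a machine without an NVIDIA GPU.

# Toy sketch: code written against a CUDA box often hard-codes the device, so on a
# machine without an NVIDIA GPU you at least have to check what's available and fall
# back. TensorFlow 2.x style; the model and data are placeholders.
import tensorflow as tf

gpus = tf.config.list_physical_devices("GPU")      # empty list on a CUDA-less machine
device = "/GPU:0" if gpus else "/CPU:0"

with tf.device(device):
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(1),
    ])
    model.compile(optimizer="adam", loss="mse")
    x = tf.random.normal((256, 32))
    y = tf.random.normal((256, 1))
    model.fit(x, y, batch_size=32, epochs=1, verbose=0)

And that's the easy part; the real pain is different library versions, different backends and different hardware, none of which a "push a button" port solves.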
There is no gold standard here.
Have you set foot in a university or research center in the past couple of years? How many clusters running AMD cards have you seen? Where's the service from AMD that NVIDIA offers? I get regular invites from NVIDIA to bring my students to their research/compute centers to use their resources and they'll even help doing it. For free. When we buy compute clusters, they're there to help (a lot). When we need small boards for autonomous drone projects, they slice 50% off their Jetson boards for education. I'd say there is a gold standard, one that AMD does not offer. I wish they would, but going with AMD instead of NVIDIA in education and research is pretty much suicide. You can do both if you want, but you NEED NVIDIA.
Nvidia by not opening the platform did great for themselves, but f****** up the whole industry, in essence.
Oh I agree, they should not have done that. In a perfect world CUDA would be available for AMD cards.
Mindshare is too strong, that is why people oppose any changes, people cannot even COMPREHEND that there can be a better way than CUDA.
Leaving performance aside, it doesn't matter what's better or not. What matters is what people use, and in my field it's just not 100% possible to get around CUDA unless you want to reinvent the wheel over and over again and waste a lot of time. If I were in the business of developing an application from scratch and selling it, that would be another story.
A single GPU costs at least $3000, and that is 6 times more than a single Vega 64.
Oh I agree, it is too expensive. It's cheaper to rent a VM in the cloud than to buy. The problem is, once the prototyping and test runs are done, you need to move to a cluster because a single card isn't enough. That's why it's called Big Data: it runs on clusters, see above. And again, most of the work out there is done with CUDA. You'd be surprised how many researchers prototype in Matlab and bring it to CUDA with the help of the Parallel Computing Toolbox. Similar attempts have been made for OpenCL, and they're pretty much dead. Doing it for MPI from Matlab works better and is more widespread than OpenCL. I'd happily switch to OpenCL (in fact I tried years ago) or Metal 2. The problem is, the rest of the world would have to do the same, and that's just not going to happen anytime soon.