This is the first application where I've had this issue; there may be a lot more to come!
It's very sad that my Windows PC with an old Intel Core i7 2600K has AVX, but my monster Mac Pro with a 12-core Xeon doesn't support it!
Another performance concern besides unaligned data issues is that mixing legacy XMM-only instructions and newer Intel AVX instructions causes delays, so minimize transitions between VEX-encoded instructions and legacy Intel SSE code. Said another way, do not mix VEX-prefixed instructions and non-VEX-prefixed instructions for optimal throughput. If you must do so, minimize transitions between the two by grouping instructions of the same VEX/non-VEX class. Alternatively, there is no transition penalty if the upper YMM bits are set to zero via VZEROUPPER or VZEROALL, which compilers should automatically insert. This insertion requires an extra instruction, so profiling is recommended.
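To make that concrete, here's a minimal C sketch (my own illustration, not from any particular codebase) of where the VZEROUPPER would go, assuming the AVX part is compiled with -mavx and legacy_sse_kernel() is a hypothetical routine living in a translation unit built without VEX encoding. The _mm256_zeroupper() intrinsic emits the instruction explicitly, in case you don't trust the compiler to insert it for you.
Code:
#include <immintrin.h>

/* Hypothetical routine compiled as legacy (non-VEX) SSE code in another TU. */
void legacy_sse_kernel(float *dst, const float *src, int n);

void add_avx_then_call_sse(float *dst, const float *a, const float *b, int n)
{
    /* VEX-encoded 256-bit loop (assumes n is a multiple of 8). */
    for (int i = 0; i < n; i += 8) {
        __m256 va = _mm256_loadu_ps(a + i);
        __m256 vb = _mm256_loadu_ps(b + i);
        _mm256_storeu_ps(dst + i, _mm256_add_ps(va, vb));
    }

    /* Zero the upper halves of the YMM registers before handing control
       to legacy SSE code, so no VEX/non-VEX transition penalty is paid. */
    _mm256_zeroupper();

    legacy_sse_kernel(dst, a, n);
}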
I installed the prebuilt binary of TensorFlow 1.13.0 on Ubuntu Server 18.04.2 LTS running on a MacPro5,1, and got the following error.
I found that the prebuilt binary of TensorFlow 1.6 or later uses AVX instructions.
Code:
root@ubuntuserver:~# python3
Python 3.6.8 (default, Jan 14 2019, 11:02:34)
[GCC 8.0.1 20180414 (experimental) [trunk revision 259383]] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
2019-07-04 05:03:17.612831: F tensorflow/core/platform/cpu_feature_guard.cc:37] The TensorFlow library was compiled to use AVX instructions, but these aren't available on your machine.
Aborted (core dumped)
root@ubuntuserver:~#
Since TensorFlow is open-source software, I can compile it myself without AVX instructions, though...
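For what it's worth, the check that aborts here (cpu_feature_guard) is conceptually just a runtime CPU feature test. A rough C sketch of that kind of guard, using GCC/Clang's __builtin_cpu_supports (this is not TensorFlow's actual code, just the idea):
Code:
#include <stdio.h>
#include <stdlib.h>

/* Rough sketch of an AVX feature guard (not TensorFlow's actual code):
   abort early if the binary was built for AVX but the CPU lacks it. */
int main(void)
{
    if (!__builtin_cpu_supports("avx")) {
        fprintf(stderr,
                "This binary was compiled to use AVX instructions, "
                "but this CPU does not support AVX.\n");
        abort();
    }
    printf("AVX is available on this CPU.\n");
    return 0;
}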
Just out of curiosity: why are you running a server with a 1080Ti in it? Are you by any chance Google Stadia?
Tensorflow can also be built against the mkl dnn backend. If they did this by default, it wouldn't be hard to support SSE, as mkl-dnn has a runtime dispatch. Granted, it's a really weird one, as it's almost built on top of Xbyak, but user code doesn't really interact with that layer.
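As a rough illustration of what runtime dispatch means here (this is not how mkl-dnn actually does it, since it JITs its kernels through Xbyak; the axpy names are just made up for the sketch), you can pick an AVX or plain SSE/scalar implementation once at startup with GCC/Clang's __builtin_cpu_supports:
Code:
#include <immintrin.h>

/* Hypothetical baseline variant of the kernel; the compiler may use SSE here. */
static void axpy_base(float *y, const float *x, float a, int n)
{
    for (int i = 0; i < n; ++i)
        y[i] += a * x[i];
}

/* Hypothetical AVX variant; target("avx") lets us use AVX intrinsics
   even if the rest of the binary is built for a pre-AVX baseline. */
__attribute__((target("avx")))
static void axpy_avx(float *y, const float *x, float a, int n)
{
    __m256 va = _mm256_set1_ps(a);
    int i = 0;
    for (; i + 8 <= n; i += 8) {
        __m256 vy = _mm256_loadu_ps(y + i);
        __m256 vx = _mm256_loadu_ps(x + i);
        _mm256_storeu_ps(y + i, _mm256_add_ps(vy, _mm256_mul_ps(va, vx)));
    }
    for (; i < n; ++i)
        y[i] += a * x[i];
}

/* Pick an implementation once, based on what the CPU actually supports. */
void axpy(float *y, const float *x, float a, int n)
{
    static void (*impl)(float *, const float *, float, int) = 0;
    if (!impl)
        impl = __builtin_cpu_supports("avx") ? axpy_avx : axpy_base;
    impl(y, x, a, n);
}
The lazy init isn't thread-safe, but it shows why a single binary can still run on pre-AVX CPUs as long as the hot kernels are dispatched this way rather than compiled with -mavx everywhere.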
Note: MKL was added as of TensorFlow 1.2 and currently only works on Linux. It also does not work when also using --config=cuda.
In addition to providing significant performance improvements for training CNN based models, compiling with the MKL creates a binary that is optimized for AVX and AVX2. The result is a single binary that is optimized and compatible with most modern (post-2011) processors.
I have a couple of MacPro5,1s with NVIDIA GPUs, and I used to use them for Machine Learning, especially with a GTX 1080 Ti as the CUDA device.
However, macOS Mojave (and later) does not support those NVIDIA GPUs and CUDA.
Also, I bought 32GB memory modules, but I found that macOS does not support them.
So, I keep macOS Mojave on one of the MacPro5,1s as my desktop machine, replacing the NVIDIA GPU with a Vega Frontier Edition, and installed Linux (Ubuntu Server) on the other MacPro5,1 for Machine Learning, with 192GB (6x 32GB) of memory.
Thanks for the info about Intel's MKL DNN.
I tried to build TensorFlow from source code, but I was able to install TensorFlow 1.13.1 with conda install instead.
Anyway, I use TensorFlow with CUDA on a GTX 1080 Ti, so AVX and MKL do not matter for my configuration.
Also, I installed the prebuilt binary of PyTorch 1.1.0, and it was fine (it does not use AVX instructions).
Thanks for the interesting information...
I haven't tried these, but you can look up the recipes used to build various packages. The link just goes to a repo that aggregates various conda recipes. Conda is just a package manager, so it's not too opaque in that regard. I did this with llvmlite a while ago.
As far as PyTorch is concerned, it supports a few different back ends. They have also been working on their own back end for inference purposes. It does something akin to lowering a network graph directly, which allows for a fairly high level of optimization. They wrote a paper on it as well, which is quite good overall, but I spotted a couple bugs in their reasoning.
TensorFlow could probably have been better integrated with various vector math libraries at an earlier stage than it is today, since it's built on top of Eigen, which has always used various intrinsics in its lower-level code sections. That should be enough to make vectorization semi-opaque to TensorFlow itself, but Eigen can be a bit moody at times, so I'm not completely sure. It tends to redefine a lot of things.
SIMD code generation in itself is also not always straightforward. GCC and Clang have various flags that help with auto-vectorization, as well as support for OpenMP's pragma simd, but most of these don't typically result in optimal assembly. There's the issue of potential aliasing, which has to be considered in any direct translation of C-like languages, in addition to the lack of universally supported unroll pragmas. Efficient SIMD code basically requires some amount of unrolling and proper loop ordering to achieve high throughput, since there are a lot of places where you're lightly penalized for unoptimized SIMD: loads crossing two cache lines incur a penalty, and reordering data costs additional shuffle instructions, which don't contribute to the arithmetic in any way. I tend to wish the majority of this could be abstracted a bit better at the compiler level; right now GCC, MSVC, and various forks of Clang often disagree.
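As a small illustration of the aliasing point (a generic sketch, not taken from any library): without restrict the compiler has to assume x and y might overlap, so it will either emit runtime overlap checks or give up on vectorizing the loop, and the pragma is only honored when you build with -fopenmp or -fopenmp-simd.
Code:
/* Plain C loop the compiler is asked to auto-vectorize.
   'restrict' promises the arrays do not alias, which removes
   the overlap checks that otherwise inhibit vectorization. */
void saxpy(int n, float a, const float *restrict x, float *restrict y)
{
#pragma omp simd
    for (int i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}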