
webbp

macrumors member
Original poster
May 26, 2021
Single GPU options
  • 1x GTX 980 Ti, € 400 each, 250W TDP, requires Pixla's PSU mod
  • 1x GTX 1080 Ti, € 500, 250W TDP, ML performance ~150% of 980 Ti, requires Pixla's PSU mod
  • Thunderbolt 3 eGPU ~25% decrease in performance vs internal
  • External PCIe chassis with a PCIe expander, TBD
Multi-GPU options
4 NVIDIA CUDA GPUs. Because there are only two PCIe power sockets, each GPU should draw slot power only, i.e., at most 75W TDP
  • 4x GTX 1050 Ti, € 57 each, € 228 total, pooled ML performance ~75% of one 250W GTX 980 Ti
  • 4x GTX 1650, € 170 each, € 680 total, pooled ML performance ~85% of one 250W GTX 980 Ti
  • 4x Quadro P2000, € 400 each, € 1600 total, pooled ML performance ~200% of one 250W GTX 980 Ti
  • 4x Quadro P2200, € 520 each, € 2080 total, pooled ML performance ~240% of one 250W GTX 980 Ti
  • 4x Tesla P4, € 670 each, € 2680 total, pooled ML performance ~280% of one 250W GTX 980 Ti
  • 4x Tesla T4, € 2500 each, € 10,000 total, pooled ML performance ~360% of one 250W GTX 980 Ti
Performance estimates based on
  1. https://ai-benchmark.com/ranking_deeplearning.html
  2. https://alex-vaith.medium.com/this-...ning-on-your-mobile-work-station-a26dc9ed8673
  3. https://gpu.userbenchmark.com/Compare/Nvidia-GTX-1650-vs-Nvidia-GTX-1050-Ti/4039vs3649
[edited in response to comments]
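To make the options above easier to compare, here is a minimal sketch that ranks them by price per performance point and by performance per watt. The prices and percentages are copied from the list; the 300 W figure for the four-GPU rows is an assumption (4 x 75 W slot power, treated as an upper bound), and the function name `rank_by_value` is made up for illustration.

```python
# Figures copied from the option list; performance is relative to one
# 250 W GTX 980 Ti (= 100). Board power for the 4-GPU rows is ASSUMED to be
# 4 x 75 W = 300 W, i.e. the slot-power ceiling, not a measured value.
OPTIONS = [
    # (name, total price in EUR, relative performance, total board power in W)
    ("1x GTX 980 Ti", 400, 100, 250),
    ("1x GTX 1080 Ti", 500, 150, 250),
    ("4x GTX 1050 Ti", 228, 75, 300),
    ("4x GTX 1650", 680, 85, 300),
    ("4x Quadro P2000", 1600, 200, 300),
    ("4x Quadro P2200", 2080, 240, 300),
    ("4x Tesla P4", 2680, 280, 300),
    ("4x Tesla T4", 10000, 360, 300),
]


def rank_by_value(options):
    """Return one dict per option, cheapest EUR per performance point first."""
    rows = []
    for name, price_eur, perf, watts in options:
        rows.append({
            "name": name,
            "eur_per_point": price_eur / perf,
            "points_per_watt": perf / watts,
        })
    return sorted(rows, key=lambda row: row["eur_per_point"])


for row in rank_by_value(OPTIONS):
    print(f'{row["name"]:<16} {row["eur_per_point"]:6.2f} EUR/point '
          f'{row["points_per_watt"]:5.2f} points/W')
```

This only restates the estimates above in a different form; it says nothing about VRAM per GPU or multi-GPU scaling overhead, both of which matter for real training runs.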
 
Single-slot GPUs with decent performance always cost more and have reliability problems in the long run.

Why don't you look at a used external PCIe chassis with a PCIe expander and adapt it to use external PSUs? There are threads about this; people did this successfully in the past for rendering rigs.

Btw, for your intended usage you probably would spend less with a dedicated PC with a mining motherboard. Seems you don't want to run macOS, no?
 
maybe of interest for your multi-GPU setup @webbp -- a double pixla example (aka mikas mod) methodically documented in the following thread:

 
What exactly are you attempting to do machine-learning-wise? Having the largest amount of VRAM possible for optimal batching is the best way to go. Doing tiny batches split across several smaller GPUs will be worse in the long run.

If you are gonna throw down € 10,000, get a single Quadro RTX 8000, Titan RTX or A6000.
 
What exactly are you attempting to do machine-learning-wise? Having the largest amount of VRAM possible for optimal batching is the best way to go. Doing tiny batches split across several smaller GPUs will be worse in the long run.
I work in speech recognition and natural language (text) processing, and am trying to get into doing my own research and open-source development. For work, I use Google Cloud VMs with one to four T4 GPUs per VM. These have 16GB of memory each, and 2xT4 perform almost 4x as fast as a single T4. This is likely because I use a fairly small batch size, 10 or 20, which causes slower convergence but a better final result in my case. I find the T4 ~50% as performant as the V100 at 1/5 the cost (on GCP at least). Also, according to ai-benchmark, the T4 is 90% as performant as a 980 Ti, but at 75W vs 250W TDP.
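The two comparisons in that paragraph reduce to simple ratios. A small sketch of the arithmetic, using only the numbers quoted above (the function name `relative_value` is hypothetical):

```python
def relative_value(perf_ratio, resource_ratio):
    """Performance per unit of a resource (cost, watts) of A relative to B.

    perf_ratio:     performance of A divided by performance of B
    resource_ratio: resource used by A divided by resource used by B
    """
    return perf_ratio / resource_ratio


# T4 vs V100: ~50% of the performance at 1/5 of the cost (as quoted for GCP).
t4_vs_v100_per_cost = relative_value(perf_ratio=0.5, resource_ratio=1 / 5)

# T4 vs 980 Ti: ~90% of the performance at 75 W vs 250 W TDP.
t4_vs_980ti_per_watt = relative_value(perf_ratio=0.9, resource_ratio=75 / 250)

print(f"T4 vs V100, performance per unit cost: {t4_vs_v100_per_cost:.1f}x")
print(f"T4 vs 980 Ti, performance per watt:    {t4_vs_980ti_per_watt:.1f}x")
```

Under those quoted figures the T4 comes out at about 2.5x the V100 per unit cost and about 3x the 980 Ti per watt of TDP.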

If you are gonna throw down € 10,000, get a single Quadro RTX 8000, Titan RTX or A6000.
Hehe I'm not really going to spend more than € 1,500 or so, but thought I would include that option in case someone else might be wondering.
 
Why don't you look at a used external PCIe chassis with a PCIe expander and adapt it to use external PSUs? There are threads about this; people did this successfully in the past for rendering rigs.

Btw, for your intended usage you probably would spend less with a dedicated PC with a mining motherboard. Seems you don't want to run macOS, no?
A dedicated PC with a mining motherboard is probably smarter, and maybe I will save up for one. I really like Mac Pros, though, and I do better work when I have an aesthetic or emotional connection with my tools. Also, 12 Xeon cores and 96 GB of RAM are extremely helpful for data analysis on large datasets that is multicore but not GPU-accelerated.

Is an external PCIe chassis with a PCIe expander different from an eGPU because it connects directly to a PCIe slot rather than going through a Thunderbolt card? I haven't been able to find an affordable one except for this dubious number for ~ € 400 on ebay sold as-is, untested:


I do have a flashed Thunderbolt 3 card on the way and am looking at eGPU though, despite the performance hit.

Why external GPUs instead of pixla's mod? I don't like how hot the GPUs make the entire interior of the Mac Pro during machine learning, and when I'm not doing a training run, there's no reason for them to be on at all, but I don't want to have to connect and disconnect them internally every time I switch between macOS and linux. I may try editing the NVIDIA GPU BIOS to underclock, limit max power, and use a more aggressive fan policy, and that might turn out to have less performance cost than an eGPU.
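As a possible alternative to editing the GPU BIOS, the NVIDIA driver on Linux exposes a software power cap through `nvidia-smi -pl`. This is a different technique from the BIOS edit described in this thread: it needs root, it does not persist across reboots, and the driver only accepts values inside the range the card's firmware allows, so whether a given card accepts a given limit is an assumption to verify. The sketch below only builds the commands; the helper name `power_limit_commands` is made up.

```python
import shlex


def power_limit_commands(gpu_index, watts):
    """Build (but do not run) nvidia-smi commands that cap one GPU's power.

    -pm 1 enables persistence mode so the setting is not dropped when the
    driver unloads between jobs; -pl sets the power limit in watts.
    """
    if watts <= 0:
        raise ValueError("power limit must be a positive number of watts")
    gpu = str(gpu_index)
    return [
        ["nvidia-smi", "-i", gpu, "-pm", "1"],
        ["nvidia-smi", "-i", gpu, "-pl", str(watts)],
    ]


# Print the commands for review instead of executing them.
for command in power_limit_commands(gpu_index=0, watts=200):
    print("sudo " + shlex.join(command))
```

A software cap limits total board power only. It does not rebalance how much is drawn from the slot versus each power connector, which is what a per-rail BIOS edit would address.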

I am planning to do pixla's mod for my internal RX 580, though, which has better macOS support and is the GPU I use for actual graphics and to drive my monitor. My relevant experience consists of soldering the power socket back onto the logic board of my PowerBook G3 in 1999, replacing broken iPhone components, and crimping ethernet connectors. I hope that's sufficient. Until I get around to doing that, I'm being careful not to push the RX 580, and it has stayed quiet and cool, hopefully with no high power draws.

Just a bit of brainstorming now: what about connecting one or two CUDA GPUs internally to the PCIe slots, but connecting an external power supply to their power sockets, one with its own power switch, and then only switching it on before and off after booting into and shutting down linux? I (probably) have an extra Mac Pro 4,1 power supply I could use. I could in principle stick that in the optical drive area and run the power cord through an unused PCI bracket hole.

That addresses the power problem, but not the internal heat problem. Could I address the heat problem by just setting a much more aggressive fan policy on the Mac Pro? E.g., normal/quiet during normal usage, but fans at max >35 C. Alternatively, maybe I shouldn't worry about the ambient interior temperature of the Mac Pro because the affected components have their own fan control and overheating shutdown protection.
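The fan idea above (quiet normally, maximum above 35 C) can be sketched as a two-state policy. The RPM values and the 3 C hysteresis band are hypothetical placeholders, not Mac Pro specifications; the hysteresis is an added assumption so the fans do not flap on and off when the temperature hovers around the threshold. Actually applying the result would need a fan-control tool, which this sketch does not cover.

```python
class FanPolicy:
    """Two-state fan policy: quiet below the threshold, maximum above it."""

    def __init__(self, threshold_c=35.0, hysteresis_c=3.0,
                 quiet_rpm=800, max_rpm=2800):
        # quiet_rpm, max_rpm and hysteresis_c are illustrative values only.
        self.threshold_c = threshold_c
        self.hysteresis_c = hysteresis_c
        self.quiet_rpm = quiet_rpm
        self.max_rpm = max_rpm
        self.at_max = False

    def update(self, temp_c):
        """Return the target RPM for the latest ambient temperature reading."""
        if temp_c > self.threshold_c:
            self.at_max = True
        elif temp_c < self.threshold_c - self.hysteresis_c:
            self.at_max = False
        # Inside the hysteresis band the previous state is kept.
        return self.max_rpm if self.at_max else self.quiet_rpm


policy = FanPolicy()
for reading in (30.0, 36.0, 34.0, 31.0):
    print(f"{reading:.0f} C -> {policy.update(reading)} rpm")
```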

I unfortunately don't know enough about electrical engineering to know if these ideas are completely stupid and/or dangerous to me or to the hardware, so feedback is appreciated :)
 
@webbp I feel you on the need to have an aesthetically pleasing (or at least inoffensive) rig to partner with. I'm right there with you. If you feel lucky, that Cubix Xpander is a bargain. Snag it.

I had similar thoughts on my rig re: heat and maximum expansion. Guess I'm finished with the Experimental Phase now and need to make a thread, but here's what I have as my personal box:

- Hackintosh Experts $99 method (Open Core based, support included)
- HP Z840
- 2x Xeon E5-26xx v3 10 Core 3.1 GHz
- 64GB
- Z Turbo Drive Quad PCIe (single 1TB NVME, 3 slots open)
- PCIe bluetooth & WiFi
- Thunderbolt 2 PCIe card
- PCIe host adapter card to:
- OSS CUBE3 PCIe expansion chassis
- 2x Radeon Vega Frontier Edition 16GB

Windows will boot the expansion chassis with no BIOS fiddling (PCIe bifurcation) with 4 GPUs all day long.

MacOS refuses to boot with any more than 2 so far. I can live with it for now. Have 2 more GPUs ready to go and would love to get them working.

Major downsides? only 3 so far:

0. only 2 GPUs recognized. WHY????????
1. no sleep (OK it's in a dedicated office)
2. you can definitely hear the PCIe chassis - may need to check out quiet fans.

Will attach pic.
 
A few test results to report: I edited the 980 Ti BIOS in Windows 10 to lower power consumption using
The default 8-pin EVGA 980 Ti Hybrid firmware allows the 8-pin PCIe socket to draw up to 175W, the 6-pin socket to draw up to 108W, and total draw up to 275W, which probably explains how I killed my logic board (or PSU). Max slot power was already 75W. I changed the 8-pin and 6-pin power settings to match the slot power settings: max 75W each, and changed the total max to 225W.
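The before and after power limits can be checked with a few lines of arithmetic. This is a sketch using only the wattages quoted above; the 75 W per-input ceiling is the value chosen in this post, and the helper name `summarize` is made up.

```python
PER_INPUT_CEILING_W = 75  # ceiling chosen in the post for every power input


def summarize(rail_limits_w, total_cap_w):
    """Summarize a set of per-rail limits against the total board cap."""
    rail_sum = sum(rail_limits_w.values())
    return {
        "rail_sum_w": rail_sum,
        # The card can never draw more than the smaller of the two limits.
        "effective_max_w": min(rail_sum, total_cap_w),
        "rails_over_ceiling": [
            name for name, watts in rail_limits_w.items()
            if watts > PER_INPUT_CEILING_W
        ],
    }


stock = summarize({"slot": 75, "8-pin": 175, "6-pin": 108}, total_cap_w=275)
modded = summarize({"slot": 75, "8-pin": 75, "6-pin": 75}, total_cap_w=225)

print("stock :", stock)
print("modded:", modded)
```

With the stock limits, both auxiliary inputs may exceed 75 W and the 275 W total cap is the binding limit. With the modified limits, the three 75 W rails add up to exactly the new 225 W total.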

I am now running furmark at 1080p in Windows while observing power consumption with GPU-Z, started 30 minutes ago. TDP is holding steady between 98-99%, PCIe slot power is holding steady ~61W, and 6-pin power and 8-pin power are now balanced and holding steady ~66W each. Temperature increased from ~23 C to ~41 C in ~5 minutes and has held steady after that. (The EVGA Hybrid has a liquid cooling system with external fan and heatsink.) Additionally, the furmark framerate is very steady ~132 fps, so changing the power settings this way has not caused any jerky throttling issues. CPU fan speed and temperature also increased, with the CPU reaching 53 C, and are holding steady.

I repeated the test at 4k. Framerate dropped to 27 fps, i.e., roughly 1/5 the framerate for 4x the pixels, while power and temperature held steady, except that CPU temperature rose from 53 C to 63 C and then stabilized.
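The two furmark results can be compared per pixel rather than per frame. This sketch assumes "1080p" means 1920x1080 and "4k" means 3840x2160; the post does not state the exact resolutions.

```python
# ASSUMED resolutions; only the frame rates come from the post.
RES_1080P = (1920, 1080)
RES_4K = (3840, 2160)
FPS_1080P = 132
FPS_4K = 27


def pixels_per_second(resolution, fps):
    """Pixels rendered per second at a given resolution and frame rate."""
    width, height = resolution
    return width * height * fps


pixel_ratio = (RES_4K[0] * RES_4K[1]) / (RES_1080P[0] * RES_1080P[1])
fps_ratio = FPS_4K / FPS_1080P
throughput_ratio = (pixels_per_second(RES_4K, FPS_4K)
                    / pixels_per_second(RES_1080P, FPS_1080P))

print(f"pixels per frame, 4k vs 1080p:  {pixel_ratio:.1f}x")
print(f"frame rate, 4k vs 1080p:        {fps_ratio:.2f}x")
print(f"pixels per second, 4k vs 1080p: {throughput_ratio:.2f}x")
```

Under those assumed resolutions, 27 fps is about 20% of 132 fps, so pixel throughput at 4k is roughly 82% of the 1080p figure: close to linear scaling with pixel count, but slightly below it.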

Note: it was not possible to use the EVGA PowerLink because there was not enough clearance to the Mac Pro's PCIe fan.

Note 2: I plan to perform the dual-GPU variant of pixla's mod and use a 980 Ti and a 1080 Ti, but I am glad to see that the BIOS modification works, because it allows me to use the GPU for ML immediately, and a similar BIOS modification will be required after doing the dual-GPU pixla's mod to make sure power and temperature use stay within acceptable limits for continuous operation.

Note 3: I can provide more details about flashing the GPU BIOS on request.

Note 4: cross-posted at https://forums.macrumors.com/thread...ad-after-gpu-overheated.2297886/post-29953525 . Please reply about Mac Pro GPU machine learning here vs Mac Pro hardware failures caused by GPUs there.
 
Updates
  1. Windows 10 was freezing almost immediately after login until I uninstalled all boot camp drivers. That fixed it with no side-effects so far.
  2. CUDA-accelerated machine learning in PyTorch on macOS is broken, although there is, as far as I can tell, exactly one person who got it to work: https://github.com/TomHeaven/pytorch-osx-build/releases
    I haven't tried TomHeaven's macOS PyTorch build yet, but I do wonder what happens when an underlying torch function tries to use a cuDNN function that's only in 8+, not 7.6.5.
  3. CUDA-accelerated machine learning in Julia is broken on macOS because it requires cuDNN 8+, whereas 7.6.5 is the last and probably final macOS version.
  4. I tried Pop!_OS, an Ubuntu-derived Linux distro which is supposed to have better NVIDIA support, but it had too many problems with my NVIDIA 980 Ti, so I switched to Ubuntu 21.04, which works well, though, unsurprisingly, it has a less pleasant desktop environment than macOS. (I use Ubuntu professionally, but on servers, not generally with the desktop environment.)
  5. A little setup was required to make Ubuntu display with the 980 Ti.
  6. I was not able to get Ubuntu to display through the little GT 120 and use the 980 Ti only for machine learning, probably because they require mutually exclusive NVIDIA driver versions. It's likely possible, but I didn't pursue it further. It's likely also possible to display on an AMD GPU and use an NVIDIA GPU only for machine learning.
  7. I ran the Julia CUDA-accelerated conv net MNIST training sample for 100 epochs. It worked well, used the 980 Ti to nearly full capacity without any trouble, and temp stayed low and max power consumption stayed below 200W according to nvtop and nvidia-smi.
  8. I hope to run ai_benchmark https://ai-benchmark.com/ranking_deeplearning.html to compare machine learning training performance for a normal vs my power-limited BIOS 980 Ti GPU. (Have to downgrade some software first to run it though.)
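For anyone who wants to log power and temperature during a training run like the one in item 7 rather than watching nvtop, here is a minimal sketch that polls `nvidia-smi` from Python. The query fields `power.draw`, `temperature.gpu` and `utilization.gpu` are standard `nvidia-smi --query-gpu` fields; the function names and the sample line are made up for illustration, and cards that report `[N/A]` for power would need extra handling.

```python
import subprocess

QUERY_FIELDS = ["power.draw", "temperature.gpu", "utilization.gpu"]


def parse_sample(line):
    """Parse one CSV line produced with --format=csv,noheader,nounits."""
    power, temp, util = (field.strip() for field in line.split(","))
    return {
        "power_w": float(power),
        "temp_c": float(temp),
        "util_pct": float(util),
    }


def read_samples():
    """Query every visible GPU once; requires the NVIDIA driver and nvidia-smi."""
    result = subprocess.run(
        [
            "nvidia-smi",
            "--query-gpu=" + ",".join(QUERY_FIELDS),
            "--format=csv,noheader,nounits",
        ],
        capture_output=True,
        text=True,
        check=True,
    )
    return [parse_sample(line)
            for line in result.stdout.splitlines() if line.strip()]


# Illustrative line only, not a measurement from this thread.
print(parse_sample("187.32, 54, 98"))
```

Calling `read_samples()` in a loop with a sleep between calls would give a simple log for comparing a stock BIOS against a power-limited one.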
 