Flint Ironstag (original poster):
One on order; but likely no macOS driver support (yet?) from what I know. The chip is GV100-400-A1. List price isn't reality at quantity or for samples to larger customers, as usual. We paid way less than $3k.

Will put it on water as a hybrid (if needed for the VRM/chokes, since the HBM is on-die anyway); the air cooler is junk, as always.

A Tesla V100 (PCIe, same TDP, passive cooler) costs around $10k list ($8k realistic) and has no video outputs; you do, however, gain:

- ECC enabled
- 16GB HBM2 @ 4096-bit instead of 12GB @ 3072-bit (900GB/s memory bandwidth vs. 650GB/s)
- 45MHz more base clock but 75MHz less boost clock (irrelevant)
- 29MHz more memory clock (irrelevant)
- Power connectors on the back for server case usage
- NVLink support, currently disabled (either in SW or HW) on the Titan (current status unclear)

At this time, unless it's for major AI work (which is what we want to test), I would absolutely not buy it and would go, if absolutely needed, with multiple Pascal cards instead.
 
Really?

Apple is doing everything that they can to eliminate Nvidia from the Apple realm.


You know what I think is funny: Apple released a machine learning framework to create models for Core ML on iOS, tvOS, and macOS, but to use GPU compute for training, which is basically a necessity, you can only use CUDA and Nvidia GPUs. What a joke...
 

Does a multi-card setup work for AI training in your case, or is it largely locked to one GPU at a time? Just curious.
 
Most AI/deep learning software can use multiple GPUs fine, either separately (think async compute/DX12 in gaming) or via NVLink (or even SLI/CrossFire, but we have not used this).

NVLink has the advantage of massive speed, but it is not (yet) implemented in most frameworks you see outside of universities or companies directly sponsored/supported by Nvidia. Open-source software mostly just scales across local GPUs, with some options to use InfiniBand as a "slower" interconnect between nodes.
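
To make that concrete, here's a minimal sketch of the "scaling across local GPUs" approach using PyTorch's nn.DataParallel; the toy model, fake batch, and hyperparameters are mine, purely for illustration:

```python
# Minimal sketch of single-node multi-GPU scaling with PyTorch's
# built-in nn.DataParallel; the toy model, batch, and hyperparameters
# are placeholders for illustration only.
import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# A toy model standing in for a real network.
model = nn.Sequential(nn.Linear(1024, 256), nn.ReLU(), nn.Linear(256, 10))

if torch.cuda.device_count() > 1:
    # Splits each batch across all visible GPUs and gathers the results;
    # the inter-GPU traffic goes over PCIe or NVLink, whatever the box has.
    model = nn.DataParallel(model)

model.to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
criterion = nn.CrossEntropyLoss()

# A fake batch standing in for a real data loader.
inputs = torch.randn(64, 1024, device=device)
targets = torch.randint(0, 10, (64,), device=device)

optimizer.zero_grad()
loss = criterion(model(inputs), targets)
loss.backward()
optimizer.step()
print(loss.item())
```

TensorFlow and the other frameworks have equivalent single-node strategies; the point is just that this kind of scaling needs no NVLink at all.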
 
I see. Not sure Nvidia wants to put NVLink on 'GeForce' cards, because leaving it off lets them keep the price premium on their 'pro' cards.

But without NVLink, do AI training programs use multi-GPU setups? Also, how much does the size of VRAM matter for such purposes?
 
NVLink (or other fast inter-GPU links) is only useful if the job requires very low-latency communication between different GPUs, or heavy access to the combined RAM of all GPUs as a single memory pool. For minor communication needs, DMA requests through the PCIe bus (either peer-to-peer or through host RAM) are enough.

If you look at the HPC supercomputers, the big ones have thousands of servers, so obviously NVLink isn't being used at that scale; network links are used between servers.

Summit will deliver more than five times the computational performance of Titan’s 18,688 nodes, using only approximately 4,600 nodes when it arrives in 2018.

Like Titan, Summit will have a hybrid architecture, and each node will contain multiple IBM POWER9 CPUs and NVIDIA Volta GPUs all connected together with NVIDIA’s high-speed NVLink. Each node will have over half a terabyte of coherent memory (high bandwidth memory + DDR4) addressable by all CPUs and GPUs plus 800GB of non-volatile RAM that can be used as a burst buffer or as extended memory.

To provide a high rate of I/O throughput, the nodes will be connected in a non-blocking fat-tree using a dual-rail Mellanox EDR InfiniBand interconnect.

https://www.olcf.ornl.gov/summit/

A lot of programming effort goes into making sure that the most-used data is in the GPU's RAM, the next tier of data is in the RAM of other GPUs in the same server, the third tier in the local server's RAM, and the next tier in remote servers' RAM.
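
As a rough illustration of those tiers from a framework's point of view (assuming a box with at least two CUDA GPUs, and using PyTorch with arbitrary tensor sizes purely as an example):

```python
# Rough sketch of the memory tiers described above, assuming a box with
# at least two CUDA GPUs and PyTorch installed; sizes are arbitrary.
import torch

assert torch.cuda.device_count() >= 2, "demo needs at least two GPUs"

gpu0, gpu1 = torch.device("cuda:0"), torch.device("cuda:1")

x = torch.randn(4096, 4096, device=gpu0)  # tier 1: the local GPU's RAM

# Direct GPU-to-GPU copy; the driver routes this over NVLink or PCIe
# peer-to-peer when available, otherwise it bounces through host RAM.
y = x.to(gpu1, non_blocking=True)         # tier 2: another GPU in the same server

z = x.cpu()                               # tier 3: the local server's RAM
# Tier 4 (remote servers' RAM) goes over the network, e.g. InfiniBand,
# via NCCL or the framework's distributed backend.

torch.cuda.synchronize()
```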
 
Thanks. Except for low-latency communication between two or more GPUs, where it has speed advantages, NVLink isn't critical for a multi-GPU setup for AI training. In other words, four cards (say, 1080 Tis) without NVLink can perform better than two Voltas (with NVLink), yes?

Except for medical imaging and CAD/CAM/DCC apps, where you need a single powerful GPU to drive the viewports and the display hardware, most other compute use cases would benefit from multi-GPU setups.

I wonder why Apple suggested its customer base was looking for a single large GPU (vs. a multi-GPU setup).
 
Not all, but a great deal of machine learning, deep learning, and scientific computing is server based. That is not a field Apple is in; most big servers run some kind of Linux. Supercomputers have big nodes of GPUs, lots and lots and lots of them.

Apple doesn't make those; it's a specialised niche market. As for multi-GPU, once again it is software specific and task specific.

Nvidia is slowly offering less and less support for SLI; they basically said a while back that they won't officially support 3-way or 4-way configurations any more.

NVLink is specialised as well.

The Titan V that we are talking about here doesn't support SLI or NVLink; that's not coming from me, that is coming from Nvidia, who said so when they released the card.

On an interesting note, I saw pictures of a PC build the other day with four Titan Vs in it.
 
because that's the only product (iMac Pro) they had in the pipeline to sell.

Yeah. That April talk suggested something was very off in the Pro dept. Perhaps the iMac was supposed to be the replacement for the tcMP, but their sample customers gave it a thumbs down, loud enough that Apple had to call that conference and suggest that customers wanted single large GPUs, when the trend for any GPU-based usage suggests that multi-GPU is the preferred option, given a choice.

I think AMD also had a hand in that dual-GPU feature of the tcMP (I read somewhere they too were betting on a low-powered multi-GPU future vs. monolithic ones).

Turns out the trend is for multiple monolithic GPUs, as many as the customer can afford.
 
perhaps the iMac was supposed to be the replacement for the tcMP but their sample customers gave it a thumbs down

It's not just their customers; take a larger view of the Mac content-creation software world and everyone is running for the lifeboats and going cross-platform. Reminds me of back in the day when desktop publishing was Apple's bread & butter: Adobe went all-in on Windows versions, and Quark went NT-first. Macs weren't price- or option-competitive then, either.

I think AMD also had a hand in that dual GPU feature of the tcMP ( read somewhere they too were betting on low powered multi GPU future vs Monolithical ones )

Yup. Just as Apple is selling its users a future of computing arbitrarily defined by the products it wants to sell, AMD sold Apple a future vision that was defined by what AMD was capable of making, and had no connection with reality aside from that.

Turns out the trend is for multi monolithic GPUs, as much as the customer can afford.

Bingo. And that's the thing with the generic slotbox paradigm - it's inherently anti-fragile in terms of future developments.
 
We are talking about trends here; I think it is important for people to specify their use cases when talking about trends.

The big monolithic GPU has become king, but how many of them you need is very, very user specific. GPU-based machine learning will use lots, whereas an editor or whatever may not, depending on what software they use.
 
I think Apple and AMD were right, but failed to execute on the nMP (lack of updates): one GPU for the UI / one for compute. If you can afford or need more compute GPUs, great! Plenty of iMac Pros are getting eGPUs to go along with them. That's where the trend is going. It's how I like to work: heavy lifting going on in the background, with sufficient resources to maintain a fluid, lag-free user interface. Of course, big jobs go to a cluster.
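
For what it's worth, that "one GPU for UI, one for compute" split is easy to arrange in practice. A rough sketch, assuming GPU 0 drives the display and GPU 1 is the spare compute card (the indices are my assumption); the environment-variable trick applies to CUDA frameworks in general:

```python
# Sketch of keeping the "UI" GPU free and doing the heavy lifting on the
# other card, assuming GPU 0 drives the display and GPU 1 is the compute
# card; CUDA_VISIBLE_DEVICES is honored by the CUDA runtime itself.
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "1"  # hide the display GPU from CUDA

import torch  # imported after setting the variable so it takes effect

device = torch.device("cuda")  # now refers to the compute card only
work = torch.randn(8192, 8192, device=device)
result = work @ work          # background number crunching, UI GPU untouched
```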

@singhs.apps nailed it. Anecdote, small sample size, etc. incoming: I know plenty of small business owners who game on their Macs. Some of them work from home, and a job could well be rendering, training, or what have you, running overnight on a single monolithic GPU and tying up their primary machine. [edit] In fact, the Titans are targeted at precisely this audience.

If you have spare cores, why not add another powerful GPU to game on at the same time? When you’ve finished playing, that GPU goes back into the compute pool. Win / win.

Adding two Titan Xps to an nMP gets you ~30 TFLOPS of power for a whole lot less than an iMac Pro, too. Go with 1080 Tis to save some serious $ and game at 4K all day long. Plus, you have the flexibility of multiple cards to, say, run an image through three versions of a neural network simultaneously, each on its own GPU.
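
In case it's useful, that last bit is about as simple as it sounds; a rough sketch in PyTorch/torchvision, assuming three CUDA GPUs and using off-the-shelf ResNet variants as arbitrary stand-ins for the "three versions" of a network:

```python
# Sketch of "one image through three model variants, one per GPU".
# Assumes three CUDA GPUs; the ResNet variants are arbitrary stand-ins.
import torch
import torchvision.models as models

variants = [models.resnet18(), models.resnet34(), models.resnet50()]
image = torch.randn(1, 3, 224, 224)  # placeholder for a real preprocessed image

outputs = []
for i, net in enumerate(variants):
    dev = torch.device(f"cuda:{i}")
    net = net.to(dev).eval()
    with torch.no_grad():
        # Kernel launches are asynchronous, so the three GPUs work on
        # their copies of the image at the same time.
        outputs.append(net(image.to(dev)))

# .item() forces each GPU to finish before reading back its result.
predictions = [out.argmax(dim=1).item() for out in outputs]
print(predictions)
```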

While I'm daydreaming here, I wouldn't be surprised to see a GPU of their own design integrated across the lineup strictly for display. Then choose your flavor of compute GPU.
 