
aytan

macrumors regular
Dec 20, 2022
161
110
The nicest solution would be TB with a first-party interface to manage load balancing for you.

The least nice solution would be to not even bother connecting them and just load balance your scene manually - for example, if you have 2 Studios and want to render a 1000x500 image, just render the left 500x500 pixels on Studio 1 and the right 500x500 pixels on Studio 2.
Endless possibilities ahead :) I wish Apple enabled TB daisy chaining again like 10 years ago, or offered an external GPU/extended GPU/compute module solution. Maybe in another universe they already did. Who knows.
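For illustration, a minimal sketch of that manual tile split (hypothetical Python; render_region and the host names are illustrative stand-ins, not a real renderer API):

```python
# Hypothetical sketch: split a 1000x500 render across two Mac Studios
# by giving each machine half of the image, then stitch the halves.
TILES = {
    "studio1.local": (0, 0, 500, 500),    # left half:  x, y, width, height
    "studio2.local": (500, 0, 500, 500),  # right half
}

def render_region(host, region):
    """Stand-in: ask `host` to render `region` of the shared scene."""
    x, y, w, h = region
    print(f"{host}: rendering {w}x{h} tile at ({x}, {y})")

for host, region in TILES.items():
    render_region(host, region)
# Afterwards, paste the two tiles side by side into the final image.
```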
 
  • Like
Reactions: iPadified

innerproduct

macrumors regular
Jun 21, 2021
222
353
To me the issue is mostly price. The Mac Studio is the iMac replacement and is competitive with what we would have had if Apple had stayed with Intel/AMD. A fully loaded Ultra is very similar in perf to a 13900 with a W7900. But now the price for the whole machine including a 5K screen is twice the price of the loaded iMac (since you can't get around the insane prices for RAM anymore, etc.).
An M2 Ultra 60-core should really be priced at sub-$3000 and the screen at $1200 or so.
But of course, the bigger problem is the lack of a real pro tier machine.
Anyway, this is what we got for now. At least the Studio is finally working (it scales) as it should have already with the M1 Ultra.
 
  • Like
Reactions: iPadified and aytan

mi7chy

macrumors G4
Oct 24, 2014
10,622
11,294
Let's say you want to render a 10 frame animation on 10 Mac Studios. You have your scene on a shared drive, and each Mac Studio reads the scene into memory (sure, over 10Gb Ethernet). Then Mac Studio 1 renders frame 1, Mac Studio 2 renders frame 2, and so on. Congratulations, you've now rendered 10 frames in the time it takes 1 Mac Studio to render 1 frame. Perfect 10x scaling. 100 Studios would give you 100x scaling, etc.

(Obviously this is a very simple load balancing solution and you'd probably want a better one in practice.)

Compare that to the PC where, even if you've managed to load your scene into the VRAM of every single card, your single CPU is going to be sending commands back and forth over PCIe to each GPU constantly. Meanwhile, with the Mac you've got 10 CPUs - 1 CPU per GPU - so you're never going to get CPU bottlenecked.

An animation is at least 24 frames per second * 60 seconds * the number of minutes, so the average 90-minute animated film at 24 fps is 24 * 60 * 90 = 129,600 frames - far from 10 frames. You need fast I/O to distribute the workload, move the scene assets to the render workers, and move the finished frames back to a central node to combine into an animation. Isn't Thunderbolt 4 a ring topology, and thus an ever-increasing bottleneck beyond a few nodes if data has to be recopied from node to node along the ring to the destination node? Furthermore, supposedly only 22Gbit/s of Thunderbolt 4's 40Gbit/s is usable for data transfer. For comparison, PCIe 4.0 x16 is 31.5GByte/s (not Gbit/s), and if you need more than 24GB of VRAM you can upgrade without throwing out the whole system to a 300W 48GB RTX A6000 with 115.2GByte/s NVLink between a pair; the 64-core Threadripper Pro 5995WX used in the YouTube video has 128 PCIe 4.0 lanes, so plenty of I/O bandwidth. Nevertheless, it'll be interesting to see an M2 Ultra render farm on a TB4 ring, for science.
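As a back-of-the-envelope illustration of those link speeds (the 20 GB scene size is a made-up example, and the ~22Gbit/s usable TB4 figure is as cited above, not a measured value):

```python
# Rough transfer times for a hypothetical 20 GB scene over the links
# quoted above. The "usable TB4" number is the poster's cited figure.
scene_gb = 20  # hypothetical scene size

links_gbyte_per_s = {
    "10Gb Ethernet":      10 / 8,   # 1.25 GB/s
    "TB4 usable (cited)": 22 / 8,   # ~2.75 GB/s
    "PCIe 4.0 x16":       31.5,     # GB/s
}

for name, rate in links_gbyte_per_s.items():
    print(f"{name:>20}: {scene_gb / rate:5.1f} s to move {scene_gb} GB")
```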
 

jmho

macrumors 6502a
Jun 11, 2021
502
996
An animation is at least 24 frames per second * 60 seconds * the number of minutes, so the average 90-minute animated film at 24 fps is 24 * 60 * 90 = 129,600 frames - far from 10 frames. You need fast I/O to distribute the workload, move the scene assets to the render workers, and move the finished frames back to a central node to combine into an animation. Isn't Thunderbolt 4 a ring topology, and thus an ever-increasing bottleneck beyond a few nodes if data has to be recopied from node to node along the ring to the destination node? Furthermore, supposedly only 22Gbit/s of Thunderbolt 4's 40Gbit/s is usable for data transfer. For comparison, PCIe 4.0 x16 is 31.5GByte/s (not Gbit/s), and if you need more than 24GB of VRAM you can upgrade without throwing out the whole system to a 300W 48GB RTX A6000 with 115.2GByte/s NVLink between a pair; the 64-core Threadripper Pro 5995WX used in the YouTube video has 128 PCIe 4.0 lanes, so plenty of I/O bandwidth. Nevertheless, it'll be interesting to see an M2 Ultra render farm on a TB4 ring, for science.
It doesn't matter if there are 130k frames, because there is no temporal dependence between frames. Rendering frame 2 doesn't need any information from frame 1, so those machines don't need to talk to each other.

There is no real reason to put them in a ring either. You'd be better off connecting them all to a network.

If you have 1 CPU talking to 8 GPUs then yes, the CPU needs to be able to talk to 8 GPUs at once. If you have 8 computers, each computer doesn't need to know that the other 7 even exist.
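A minimal sketch of that embarrassingly-parallel setup (generic Python, with render_frame as a stand-in for the real renderer; no worker ever talks to another):

```python
# Each worker pulls an independent frame number; nothing is shared
# between frames, so 10 workers give ~10x scaling on 10 frames.
from multiprocessing import Pool

FRAMES = range(1, 11)  # the 10-frame example above

def render_frame(frame):
    # Stand-in for the real renderer: reads the shared scene once,
    # renders one frame, writes it out under its frame number.
    return f"frame_{frame:04d}.png"

if __name__ == "__main__":
    with Pool(processes=10) as pool:   # 10 workers ~ 10 Mac Studios
        print(pool.map(render_frame, FRAMES))
```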
 

diamond.g

macrumors G4
Mar 20, 2007
11,438
2,664
It doesn't matter if there are 130k frames, because there is no temporal dependence between frames. Rendering frame 2 doesn't need any information from frame 1, so those machines don't need to talk to each other.

There is no real reason to put them in a ring either. You'd be better off connecting them all to a network.

If you have 1 CPU talking to 8 GPUs then yes, the CPU needs to be able to talk to 8 GPUs at once. If you have 8 computers, each computer doesn't need to know that the other 7 even exist.
I do have a question about that: if there are frames that are easier to process/render, how does the cluster know not to complete them out of order (I guess we are assuming this is a video animation we are rendering)? Or does the order not matter either?
 

leman

macrumors Core
Oct 14, 2008
19,521
19,675
I do have a question about that: if there are frames that are easier to process/render, how does the cluster know not to complete them out of order (I guess we are assuming this is a video animation we are rendering)? Or does the order not matter either?

Why would the order matter? The rendered frames can be assembled into a video afterwards.
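A small sketch of why completion order is irrelevant: each frame's index lives in its filename, so assembly just sorts by name afterwards (the ffmpeg step is one common way to do this, shown as an assumption, not the thread's tooling):

```python
# Frames may finish in any order; the index in each filename fixes
# the sequence when the video is assembled afterwards.
import glob, subprocess

frames = sorted(glob.glob("frame_*.png"))  # name order == frame order
print(f"assembling {len(frames)} frames")

# One common assembly step (assumes ffmpeg is installed):
subprocess.run([
    "ffmpeg", "-framerate", "24",
    "-i", "frame_%04d.png",      # read frames by index, not finish time
    "-c:v", "libx264", "out.mp4",
])
```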
 
  • Like
Reactions: sirio76

leman

macrumors Core
Oct 14, 2008
19,521
19,675
What would you need to be sending over 10Gb Ethernet after the initial load?

Let's say you want to render a 10 frame animation on 10 Mac Studios. You have your scene on a shared drive, and each Mac Studio reads the scene into memory (sure, over 10Gb Ethernet). Then Mac Studio 1 renders frame 1, Mac Studio 2 renders frame 2, and so on. Congratulations, you've now rendered 10 frames in the time it takes 1 Mac Studio to render 1 frame. Perfect 10x scaling. 100 Studios would give you 100x scaling, etc.

(Obviously this is a very simple load balancing solution and you'd probably want a better one in practice.)

Compare that to the PC where, even if you've managed to load your scene into the VRAM of every single card, your single CPU is going to be sending commands back and forth over PCIe to each GPU constantly. Meanwhile, with the Mac you've got 10 CPUs - 1 CPU per GPU - so you're never going to get CPU bottlenecked.

What stops you from implementing the same distributed scheme on the PC?
 
  • Like
Reactions: singhs.apps

leman

macrumors Core
Oct 14, 2008
19,521
19,675
Absolutely nothing. It's just that in my mind Mac Studios are begging to be stacked neatly on top of each other :D

And now imagine a Mac Pro with multiple slotted SoC boards (each with a healthy amount of private RAM), connected by a common 128-lane PCIe backplane that also hosts access to a large pool of shared memory. It would be ideal for the kind of application you describe.
 

senttoschool

macrumors 68030
Nov 2, 2017
2,626
5,482
Apple demonstrated ML training with four M1 Ultras.
[Attached image: Apple's chart showing ML training scaling across M1 Ultra nodes]
Scales pretty well. Seems to be using Horovod: https://en.wikipedia.org/wiki/Horovod_(machine_learning)
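For reference, the basic Horovod pattern looks roughly like this (a generic sketch of the library's data-parallel setup, not Apple's actual demo code; the node names in the launch command are placeholders):

```python
# Generic Horovod data-parallel sketch (not Apple's demo code).
# Launch across 4 nodes with something like:
#   horovodrun -np 4 -H node1:1,node2:1,node3:1,node4:1 python train.py
import horovod.tensorflow.keras as hvd
import tensorflow as tf

hvd.init()  # one process per node; hvd.size() == number of workers

model = tf.keras.applications.ResNet50(weights=None)
# Scale the learning rate by worker count and wrap the optimizer so
# gradients are averaged across all nodes every step.
opt = hvd.DistributedOptimizer(
    tf.keras.optimizers.SGD(learning_rate=0.01 * hvd.size()))
model.compile(loss="sparse_categorical_crossentropy", optimizer=opt)
# Each worker then calls model.fit(...) on its own shard of the data.
```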
 
  • Like
Reactions: aytan

jmho

macrumors 6502a
Jun 11, 2021
502
996
And now imagine a Mac Pro with multiple slotted SoC boards (each with a healthy amount of private RAM), connected by a common 128-lane PCIe backplane that also hosts access to a large pool of shared memory. It would be ideal for the kind of application you describe.
Potentially, but at the same time there is something to be said for a render farm of smaller independent nodes.

If you have 10 nodes and one of them breaks, you still have 9 frames completed, and you can just swap the broken Mac Studio for a new one while the others continue working.

If you have a Mac Pro with 10 SoCs and something breaks, your entire render farm is out of action until you fix things (even if that hopefully is just pulling out 1 dead SoC).
 

senttoschool

macrumors 68030
Nov 2, 2017
2,626
5,482
And now imagine a Mac Pro with multiple slotted SoC boards (each with a healthy amount of private RAM), connected by a common 128-lane PCIe backplane that also hosts access to a large pool of shared memory. It would be ideal for the kind of application you describe.
I'm a cloud guy. You know that. If Apple is going to go through the trouble of doing that, don't you think it makes more sense for Apple to create a cloud version of this sort of setup where people can rent it?

The market for local workstations like that gets smaller every day. The market for cloud workstations is growing every day. Does Apple really want to be on the side of a declining trend?
 

leman

macrumors Core
Oct 14, 2008
19,521
19,675
I'm a cloud guy. You know that. If Apple is going to go through the trouble of doing that, don't you think it makes more sense for Apple to create a cloud version of this sort of setup where people can rent it?

No, I don't, because that means competing in a very different market with very different margins. It's not Apple's business and I doubt they could make it profitable with their technology. They are simply not positioned for this kind of push.

The market for local workstations like that gets smaller every day. The market for cloud workstations is growing every day. Does Apple really want to be on the side of a declining trend?

This is a fair point. Very possible that you are right. But it's also possible that by offering a compelling product for a reasonable price a certain niche can be carved out. Apple's technology would work well in this scenario. Whether they are interested is a whole different question.
 

Xiao_Xi

macrumors 68000
Oct 27, 2021
1,627
1,101
it's also possible that by offering a compelling product for a reasonable price a certain niche can be carved out. Apple's technology would work well in this scenario.
Apple doesn't compete on price, it offers features that no one else has. What could Apple offer that no one else does? What would make an Apple render farm a success?
 

jmho

macrumors 6502a
Jun 11, 2021
502
996
Yeah, I wasn't sure about that - so the rendering can just spit out a bunch of PNGs and you can make a video of them afterwards?
Yeah. That's the preferred way to do things, again so that if one render gets messed up you can just fix one frame instead of needing to render everything again.
 
  • Like
Reactions: diamond.g

leman

macrumors Core
Oct 14, 2008
19,521
19,675
Apple doesn't compete on price, it offers features that no one else has. What could Apple offer that no one else does? What would make an Apple render farm a success?

Exactly. I don't think there is anything.
 

senttoschool

macrumors 68030
Nov 2, 2017
2,626
5,482
Apple doesn't compete on price, it offers features that no one else has. What could Apple offer that no one else does? What would make an Apple render farm a success?
Same reason Macs could sell even when the value was poor - macOS. In this case, both macOS and possible integration with local macOS machines.
 

leman

macrumors Core
Oct 14, 2008
19,521
19,675
Same reason Macs could sell even when the value was poor - macOS. In this case, both macOS and possible integration with local macOS machines.

macOS is great on the local desktop. When running things in the cloud you often don't even know (or care) what OS you are running on. How would that be a selling point?

If you are talking about service integration for personal Mac computers, that's even worse; it's a very small market. If I am an academic AI researcher with a tight grant budget, I care about minimizing costs, not maximizing convenience. There is of course space for services like Xcode Cloud, which cannot be easily replicated, but Apple doesn't even need in-house hardware for that.
 
  • Like
Reactions: Xiao_Xi

sirio76

macrumors 6502a
Mar 28, 2013
578
416
If there is a little hope for daisy chaining, I'm sure I will go for it with a couple of Maxes and Ultras. Wish this could happen in my lifetime, not in a galaxy far, far away...
Getting more computers to render the same scene at the same time has been possible for ages; for example, you can run V-Ray DR over a number of slaves (Mac or PC, it doesn't matter) and you will get all the speed-up you need. Scaling is very good too, especially using bucket rendering.
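For readers unfamiliar with bucket rendering: the frame is diced into many small tiles ("buckets") that idle machines pull dynamically, so fast and slow buckets balance out. A generic sketch of the idea (plain Python, not V-Ray's actual API):

```python
# Generic bucket-rendering sketch (not V-Ray's API): dice one frame
# into small buckets and let each render slave pull the next one.
from itertools import product
from queue import Queue

WIDTH, HEIGHT, BUCKET = 1000, 500, 100

buckets = Queue()
for x, y in product(range(0, WIDTH, BUCKET), range(0, HEIGHT, BUCKET)):
    buckets.put((x, y, BUCKET, BUCKET))

def worker(name):
    while not buckets.empty():
        x, y, w, h = buckets.get()
        print(f"{name}: rendering bucket at ({x}, {y})")  # render stub

worker("slave-1")  # in practice every render slave runs this loop
```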
 
  • Like
Reactions: aytan and jmho

aytan

macrumors regular
Dec 20, 2022
161
110
Getting more computers to render the same scene at the same time has been possible for ages; for example, you can run V-Ray DR over a number of slaves (Mac or PC, it doesn't matter) and you will get all the speed-up you need. Scaling is very good too, especially using bucket rendering.
Sure, I have used C4D Team Render with a couple of Mac Pros and iMacs several times, but never V-Ray for this. Also, there is another problem: the shiny, magnificent, superior, magical ''Subscription Model'' :). For Maxon/Redshift you have to buy another license for each machine no matter what. I have no idea how it works with V-Ray DR; if the price is reasonable it could work on a sensible budget.
 