
exoticSpice

Suspended
Jan 9, 2022
1,242
1,952
Debut in the Mac Pro? Errr, a tether-less, battery-only-powered VR/AR headset likely has a much more pressing need for hyper-low-power RT hardware.

Several of these "work way smarter, not harder" patents smell far more driven by the limits of a small battery than by any primary focus on constructing some Nvidia x090/x080 or AMD x900/x800 'killer' large GPU.

It would trickle out to the Macs (and other SoCs), but it won't be surprising if it doesn't start there.
Yes, it will debut in a headset, but all rumours point to the M2 being in the headset, and the M2 has no hardware-accelerated RT.
 

mi7chy

macrumors G4
Oct 24, 2014
10,625
11,298
The 4090 is projected to be about 7.36x faster than the 64-GPU M1 Ultra, based on Nvidia claiming the 4090 is 1.7x the performance of the 3080 Ti. The 3080 Ti scores 5937.93 according to Blender Open Data, so 1.7x is 10094.48 vs 1371.46 for the 64-GPU M1 Ultra. Blender supports multi-GPU, so you can always create a $40K render farm of eight 64-GPU Mac Studios to match one 4090.

Turns out the 4090 is even faster than projected, at ~9x over the 64-GPU M1 Ultra.

https://opendata.blender.org/benchmarks/query/
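The arithmetic behind those ratios is just the Blender Open Data scores quoted above; a quick back-of-the-envelope sketch (Swift, purely illustrative):

```swift
// Blender Open Data median scores quoted above (points).
let m1Ultra64 = 1371.46       // M1 Ultra, 64-core GPU (Metal)
let rtx3080Ti = 5937.93       // RTX 3080 Ti (OptiX)

// Projected 4090 score from Nvidia's "1.7x the 3080 Ti" claim.
let rtx4090Projected = rtx3080Ti * 1.7             // ≈ 10094.48
print(rtx4090Projected / m1Ultra64)                // ≈ 7.36x the M1 Ultra

// Measured 4090 score once results actually landed.
let rtx4090Measured = 12287.86
print(rtx4090Measured / m1Ultra64)                 // ≈ 8.96x, i.e. ~9x
```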
 

jmho

macrumors 6502a
Jun 11, 2021
502
996
The 4090 is an incredible card, and I hope Apple are sweating and rolling up their sleeves for the work ahead of them.
 
  • Like
Reactions: singhs.apps

mi7chy

macrumors G4
Oct 24, 2014
10,625
11,298
No ****. It's got more cores and a massive 400-watt power draw.

Not only faster but more efficient, at 2.2x greater performance per watt. 11/2/2022 will be interesting to see if AMD RDNA3 can answer. Guessing things will remain the same, with Nvidia #1, AMD #2, and Apple last, behind the $329 Intel Arc A770.

4090
12287.86 points / 450W = 27.3 points/W

64GPU M1 Ultra
1371.46 points / ~110W = 12.5 points/W

 

diamond.g

macrumors G4
Mar 20, 2007
11,438
2,665
OBX
Not only faster but more efficient, at 2.2x greater performance per watt. 11/2/2022 will be interesting to see if AMD RDNA3 can answer. Guessing things will remain the same, with Nvidia #1, AMD #2, and Apple last.

4090
12287.86 points / 450W = 27.3 points/W

64GPU M1 Ultra
1371.46 points / ~110W = 12.5 points/W
I didn't think there were any workloads that would draw the full wattage from the M1 Ultra.
 

deconstruct60

macrumors G5
Mar 10, 2009
12,493
4,053
Not only faster but more efficient, at 2.2x greater performance per watt. 11/2/2022 will be interesting to see if AMD RDNA3 can answer. Guessing things will remain the same, with Nvidia #1, AMD #2, and Apple last, behind the $329 Intel Arc A770.

4090
12287.86 points / 450W = 27.3 points/W

64GPU M1 Ultra
1371.46 points / ~110W = 12.5 points/W




There is some context missing there though. Nvidia: 3rd-gen specialized RT hardware (and a 2nd-3rd gen software library). AMD/Intel: partial RT hardware. Apple: little fixed-function hardware at all.


Card | Blender | Watts | pts/W
Nvidia 4090 | 12265 | 450 | 27
Nvidia 3080 | 5055 | 320 | 16
Intel 770 | 1630 | 225 | 7
Intel 750 | 1607 | 225 | 7
Intel 750M | 1379 | 150 | 9
M1 Ultra 64 | 1371 | 110 | 12
M1 Ultra 48 | 1242 | 83 | 15
M1 Max 32 | 818 | 55 | 15
M1 Max 24 | 712 | 41 | 17
AMD 6800 | 1531 | 250 | 6
AMD 6700 | 1308 | 175 | 7
AMD 6650 | 1021 | 175 | 6
AMD 6600 | 1008 | 132 | 8
AMD 6700M | 1082 | 135 | 8


In pts/W, Apple is clearly out in front of the Intel/AMD mobile offerings. With no specialized RT hardware, they are approximately matching the 2nd-gen 3080's metrics. So they spend no extra die space on this and still do better than Intel and AMD, who did.
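The pts/W column is just score divided by nominal board power; a rough sketch reproducing a few of the rows above (wattages are the nominal figures used in the table, not measured draw):

```swift
import Foundation

// (card, Blender Open Data score, nominal watts) — values from the table above.
let cards: [(name: String, points: Double, watts: Double)] = [
    ("Nvidia 4090", 12265, 450),
    ("Nvidia 3080",  5055, 320),
    ("Intel 770",    1630, 225),
    ("M1 Ultra 64",  1371, 110),
    ("M1 Max 24",     712,  41),
    ("AMD 6800",     1531, 250),
]

// Points per watt, best first.
for card in cards.sorted(by: { $0.points / $0.watts > $1.points / $1.watts }) {
    print("\(card.name): \(String(format: "%.1f", card.points / card.watts)) pts/W")
}
```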

I highly doubt Apple is using the desktop 3090 or 4090 as a 'goal metric' for this next iteration. If Apple could cover the desktop AMD 6800 with a Max-sized die carrying a 32-40 core GPU and no TDP increase, then I suspect they would declare that a 'win'. [NOTE: the 3080 Ti laptop scores 3802.] If there is any 4000-series target then it would be the 4080M/4070M. Apple is stalking the largest mobile dGPUs run inside larger laptops. Whatever they scale up from there is what they get to cover the upper desktop range. Where the Intel 750M sits in mobile GPU deployment (roughly equal power to the M1 Ultra), the result isn't all that different. I don't think they are going to 100% catch the mobile 4070M, but covering a W6800X Duo with a "two-die SoC" would likely get trumpeted as a win.


If Apple adds some specific RT hardware in the next iteration or two, then the pts/W isn't going to be very far off at all.
That '27' is largely a fixed-function hardware difference, where Nvidia has taken the bulk of the work off the general computation cores. Nvidia actually has less room to move "more" to fixed function at this point. (Nvidia tossed NVLink from the die to toss in more specialized hardware. For the Studio/Mac Pro-destined dies, it is unlikely Apple is going to toss UltraFusion for more specialized cores.) So Apple is likely mostly waiting on higher transistor allocation budgets (e.g., TSMC N3 and after) for major moves.
 

leman

macrumors Core
Oct 14, 2008
19,522
19,679
No ****. It's got more cores and a massive 400-watt power draw.

Only twice as many cores though. The performance advantage is mostly because of hardware RT acceleration and of course the much more mature backend. Would be interesting to know the actual power usage
 
  • Like
Reactions: jmho

sirio76

macrumors 6502a
Mar 28, 2013
578
416
There are already power-consumption tests (for the FE card): when heavily utilized it will consume from 400 up to 450W, likely more for other 4090 manufacturers. People who like to compare that to an M1 should not forget that this is just for the GPU; realistically the system consumption will easily be 700W or more. Nvidia recommends at least an 850W PSU and a bare minimum of 1000W for a TR system. Not saying the performance isn't good, but it's just in a different league when it comes to power consumption, and noise, and heat.
 

quarkysg

macrumors 65816
Oct 12, 2019
1,247
841
One advantage (probably not a huge one, I suppose) Apple's UMA has over the traditional design is that all IP blocks (CPU, GPU, NPU, AMX, ISP) can work on the same dataset to solve a problem simultaneously.
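As a rough illustration of what that looks like in practice on Apple Silicon: a Metal buffer with shared storage is the same physical memory for the CPU and GPU, so one block can fill a dataset and another can consume it with no copy. A minimal sketch (pipeline state creation, dispatch, and error handling omitted):

```swift
import Metal

// On Apple Silicon, .storageModeShared buffers live in unified memory:
// CPU writes and GPU reads touch the same physical pages — no staging copy.
let device = MTLCreateSystemDefaultDevice()!
let count = 1_000_000
let buffer = device.makeBuffer(length: count * MemoryLayout<Float>.stride,
                               options: .storageModeShared)!

// CPU side: fill the dataset directly through the buffer's contents pointer.
let values = buffer.contents().bindMemory(to: Float.self, capacity: count)
for i in 0..<count { values[i] = Float(i) }

// GPU side: bind the very same buffer to a compute encoder — no upload step.
// (Compute pipeline state and dispatch omitted for brevity.)
let queue = device.makeCommandQueue()!
let commandBuffer = queue.makeCommandBuffer()!
let encoder = commandBuffer.makeComputeCommandEncoder()!
encoder.setBuffer(buffer, offset: 0, index: 0)
encoder.endEncoding()
commandBuffer.commit()
```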
 

leman

macrumors Core
Oct 14, 2008
19,522
19,679
One advantage (probably not a huge one, I suppose) Apple's UMA has over the traditional design is that all IP blocks (CPU, GPU, NPU, AMX, ISP) can work on the same dataset to solve a problem simultaneously.

It's a massive advantage; it's just that most software is designed with the dGPU model in mind, so it doesn't take advantage of it. Another big advantage is the ability to work with very large datasets.

And efficiency at high performance, because the more capable an Apple GPU is, the less efficient it is.

Can you elaborate on this?
 

leman

macrumors Core
Oct 14, 2008
19,522
19,679
The higher the number of cores, the lower the points/W of Apple's GPU.

GPU | Points | Consumption (W) | Points/W
M1 Ultra 64 | 1371 | 110 | 12
M1 Ultra 48 | 1242 | 83 | 15
M1 Max 32 | 818 | 55 | 15
M1 Max 24 | 712 | 41 | 17

This only illustrates that there is a software problem with how Blender currently utilizes the hardware.
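You can see the same thing in points per GPU core — if the Metal backend scaled cleanly these would be roughly constant (quick sketch, scores from the table above):

```swift
// (chip, Blender score, GPU core count) — figures from the table above.
let results: [(chip: String, points: Double, cores: Double)] = [
    ("M1 Max 24",    712, 24),
    ("M1 Max 32",    818, 32),
    ("M1 Ultra 48", 1242, 48),
    ("M1 Ultra 64", 1371, 64),
]

// Points per GPU core: drops from ~30 on the 24-core Max to ~21 on the
// 64-core Ultra, i.e. the bigger configurations are left under-utilised.
for r in results {
    print("\(r.chip): \(r.points / r.cores) points per core")
}
```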
 

diamond.g

macrumors G4
Mar 20, 2007
11,438
2,665
OBX
The higher the number of cores, the lower the points/W of Apple's GPU.

GPU | Points | Consumption (W) | Points/W
M1 Ultra 64 | 1371 | 110 | 12
M1 Ultra 48 | 1242 | 83 | 15
M1 Max 32 | 818 | 55 | 15
M1 Max 24 | 712 | 41 | 17
What is even more interesting is that the 3090 (base) gets the same 17 points per watt as the M1 Max 24. For the 4090 to double the 3090 score yet only use 29% more power feels like it should be impressive for sure.
 

Xiao_Xi

macrumors 68000
Oct 27, 2021
1,628
1,101
For the 4090 to double the 3090 score yet only use 29% more power feels like it should be impressive for sure.
That's the magic of TSMC.
 

diamond.g

macrumors G4
Mar 20, 2007
11,438
2,665
OBX
Only twice as many cores though. The performance advantage is mostly because of hardware RT acceleration and of course the much more mature backend. Would be interesting to know the actual power usage
I noticed there are no 3090 Ti or 4090 CUDA scores for the Blender benchmark. With what is there, the 3080 is twice as fast as the M1 Ultra 64. AFAICT CUDA doesn't use the RT hardware, so it is more of a "fair fight" with Metal.
 

Xiao_Xi

macrumors 68000
Oct 27, 2021
1,628
1,101
I noticed there are no 3090 Ti or 4090 CUDA scores for the Blender benchmark. With what is there, the 3080 is twice as fast as the M1 Ultra 64. AFAICT CUDA doesn't use the RT hardware, so it is more of a "fair fight" with Metal.
CUDA cores are slower than RT cores, so people don't use them.


 

leman

macrumors Core
Oct 14, 2008
19,522
19,679
I noticed there are no 3090ti or 4090 CUDA scores for the Blender Bench. With what is there, the 3080 is twice as fast as the M1 Ultra 64. AFAICT CUDA doesn't use the RT backend so it is more of a "fair fight" with Metal.

The 3080 has 30 TFLOPS, the M1 Ultra has 20 TFLOPS, so that makes a difference of 50%. Whether the other 50% is because of software or hardware maturity or some other factor, we don't know. What's quite interesting though is that if the 24-core M1 Max scaled linearly we would expect 712/24*64 = ~1900 points, and multiplying that by 1.5 (the 50% TFLOPS difference to the 3080) yields 2850, which is oddly similar to the CUDA score (2900) that the 3080 has. Sure, it's just some random napkin arithmetic, but the sheer coincidence of it all almost suggests that the Metal Blender backend has some massive thread-scaling issues.
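Spelling that napkin arithmetic out (a quick sketch, figures as quoted above):

```swift
// Napkin arithmetic from the paragraph above.
let m1Max24Score = 712.0                          // Blender score, 24-core M1 Max
let perfectUltra64 = m1Max24Score / 24 * 64       // ≈ 1899 — hypothetical linear scaling
let tflopsRatio = 30.0 / 20.0                     // 3080 FP32 TFLOPS vs M1 Ultra
print(perfectUltra64 * tflopsRatio)               // ≈ 2848, right next to the 3080's ~2900 CUDA score
```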

P.S. Your wattage estimates for M1 Blender seem quite off. I am not even breaking 30W running the Blender benchmark on my 32-core Max...
 

diamond.g

macrumors G4
Mar 20, 2007
11,438
2,665
OBX
The 3080 has 30 TFLOPS, the M1 Ultra has 20 TFLOPS, so that makes a difference of 50%. Whether the other 50% is because of software or hardware maturity or some other factor, we don't know. What's quite interesting though is that if the 24-core M1 Max scaled linearly we would expect 712/24*64 = ~1900 points, and multiplying that by 1.5 (the 50% TFLOPS difference to the 3080) yields 2850, which is oddly similar to the CUDA score (2900) that the 3080 has. Sure, it's just some random napkin arithmetic, but the sheer coincidence of it all almost suggests that the Metal Blender backend has some massive thread-scaling issues.

P.S. Your wattage estimates for M1 Blender seem quite off. I am not even breaking 30W running the Blender benchmark on my 32-core Max...
I dunno where the wattage estimates come from; I barely see 20W on my 24-core model.

Going by TFLOPS is a poor metric, as the Ultra is "slower" than a 3060 Laptop GPU, which is a 7 TFLOPS card (10 if you use the boost number). But I see your point otherwise.
 

leman

macrumors Core
Oct 14, 2008
19,522
19,679
Going by TFLOPS is a poor metric, as the Ultra is "slower" than a 3060 Laptop GPU, which is a 7 TFLOPS card (10 if you use the boost number). But I see your point otherwise.

Those results are weird anyway since the mobile 3060 is somehow faster than the desktop one?
 

sirio76

macrumors 6502a
Mar 28, 2013
578
416
CUDA cores are slower than RT cores, so people don't use them.
It doesn't work like that. The GPU will use all the available cores when rendering, both traditional and RT, and RT hardware is not "faster", it's just designed to speed up some aspects of the rendering. Depending on the scene, RT cores may accelerate the rendering a lot, in other cases not so much.
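To put rough numbers on that "depends on the scene" point: if ray traversal/intersection is only a fraction of the frame time, the RT cores can only shrink that fraction. A quick Amdahl-style sketch (the fractions and speedups are made-up illustrative values, not measurements):

```swift
// Overall speedup when dedicated RT hardware accelerates only the
// traversal/intersection portion of a render — classic Amdahl's-law shape.
func overallSpeedup(rtFraction p: Double, rtSpeedup s: Double) -> Double {
    1.0 / ((1.0 - p) + p / s)
}

// Illustrative only: an RT-bound scene vs a shading-bound one,
// both assuming RT cores make traversal 5x faster.
print(overallSpeedup(rtFraction: 0.8, rtSpeedup: 5))   // ≈ 2.8x
print(overallSpeedup(rtFraction: 0.3, rtSpeedup: 5))   // ≈ 1.3x
```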
 
  • Like
Reactions: Xiao_Xi

diamond.g

macrumors G4
Mar 20, 2007
11,438
2,665
OBX
It doesn't work like that. The GPU will use all the available cores when rendering, both traditional and RT, and RT hardware is not "faster", it's just designed to speed up some aspects of the rendering. Depending on the scene, RT cores may accelerate the rendering a lot, in other cases not so much.
What is the point of the OptiX renderer if the CUDA renderer also uses the RT hardware?
 