
diamond.g

macrumors G4
Mar 20, 2007
11,435
2,658
OBX
Metal has different managed API layers and is not directly comparable to the DX version model. Since we don't know how GFXBench implements their tests, we need to treat all of these as approximations anyway. You won't get accurate results with any of these, but you can get a general idea. So yes, you can run the DX11 benchmark; I doubt that the rendering setup is that much different from the Metal one.
Newer AMD hardware (GCN4 and newer) appears to have better DX12 and Vulkan performance than DX11. But as you said, it does depend on what they are doing.

It is hard to say for sure, but on cards that have both a Metal and a DirectX API, it looks like DirectX is faster than Metal.
 

leman

macrumors Core
Oct 14, 2008
19,516
19,664
It is hard to say for sure, but on cards that have both a Metal and a DirectX API, it looks like DirectX is faster than Metal.

Would not surprise me at all. Windows gaming drivers are usually of better quality, not to mention that they include application-specific optimizations.
 

JacobHarvey

macrumors regular
Apr 2, 2019
118
107
Somewhere
If this is the case, then UL have messed things up. What’s the point of a benchmark if it does different things on different platforms?

Yes, they run differently across platforms, but the benchmarks are still useful for directly comparing performance across mobile devices with different SoC GPUs.
 

diamond.g

macrumors G4
Mar 20, 2007
11,435
2,658
OBX
Does Metal support mesh/primitive shaders? It seems like modern IMR GPUs are going to get a speed boost from them.
 

leman

macrumors Core
Oct 14, 2008
19,516
19,664
Does Metal support mesh/primitive shaders? It seems like modern IMR GPUs are going to get a speed boost from them.

Not in the same form as DX12. You can generate geometry on the GPU in Metal and submit flexible drawing commands using compute shaders, but I don’t think you can use local cache to submit data to the rasterizer like with mesh shaders. I’m also not sure it would work with tiled shading anyway - you’ll have to hit the memory to construct the geometry lists in any case.
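(For the curious: the Metal feature for GPU-encoded draws, as far as I know, is the indirect command buffer, which a compute kernel fills with draw calls and a render encoder then replays. A rough host-side sketch - the API names are real Metal, but the limits and usage are just placeholder values:)

```swift
import Metal

// Rough sketch of setting up GPU-driven draws in Metal via an indirect
// command buffer (ICB). A compute kernel can later write draw commands into
// it, and a render encoder replays them with executeCommandsInBuffer(_:range:).
// The binding limits below are arbitrary illustrative values.
func makeDrawICB(device: MTLDevice, maxDraws: Int) -> MTLIndirectCommandBuffer? {
    let desc = MTLIndirectCommandBufferDescriptor()
    desc.commandTypes = [.draw, .drawIndexed]  // kinds of draws the GPU may encode
    desc.inheritBuffers = false                // each command binds its own buffers
    desc.maxVertexBufferBindCount = 4
    desc.maxFragmentBufferBindCount = 4
    return device.makeIndirectCommandBuffer(descriptor: desc,
                                            maxCommandCount: maxDraws,
                                            options: [])
}
```

Note that the generated geometry still lives in device memory, which is the point about tile-based shading above.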
 

diamond.g

macrumors G4
Mar 20, 2007
11,435
2,658
OBX
Not in the same form as DX12. You can generate geometry on the GPU in Metal and submit flexible drawing commands using compute shaders, but I don’t think you can use local cache to submit data to the rasterizer like with mesh shaders. I’m also not sure it would work with tiled shading anyway - you’ll have to hit the memory to construct the geometry lists in any case.
Ah, okay. It seems like for the IMR crowd it was supposed to be a big deal with Vega 10 and Turing. Though in my reading, some devs laugh because it is a return to what the PS2 did with its VUs.

With respect to the A12X being as powerful as the XB1S, it looks like they were referring to TFLOPs, based on this post. The following post seems to clarify that while knowing the ALU count is nice, FMAs are what get counted, and the Apple part doesn't appear to dual-issue FP16 and FP32, so you don't get "double the performance" for half-precision shaders.
 

leman

macrumors Core
Oct 14, 2008
19,516
19,664
With respect to the A12X being as powerful as the XB1S, it looks like they were referring to TFLOPs, based on this post. The following post seems to clarify that while knowing the ALU count is nice, FMAs are what get counted, and the Apple part doesn't appear to dual-issue FP16 and FP32, so you don't get "double the performance" for half-precision shaders.

I can't find the reference right now, but I am pretty sure that Apple GPUs can execute FP16 at a faster rate than FP32. I doubt that they can dual-issue them, that sounds like a waste of silicon — they probably reuse the ALUs to do one FP32 op or two FP16 ops per cycle. Anyway, the A12X should be somewhere around 1 TFLOP FP32 (counting MADD).

All of this should be testable though...
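(Back-of-the-envelope, assuming each ALU retires one FMA = 2 flops per cycle; the ALU count and clock below are my guesses for illustration, not published Apple numbers:)

```swift
import Foundation

// Peak throughput estimate: ALUs x ops per cycle x clock.
// A fused multiply-add (FMA/MADD) counts as 2 floating-point ops.
func peakTFLOPs(alus: Int, clockGHz: Double, opsPerCycle: Double = 2) -> Double {
    Double(alus) * opsPerCycle * clockGHz / 1000.0
}

// Assumed, illustrative numbers only: 512 FP32 ALUs at ~1.0 GHz.
let fp32 = peakTFLOPs(alus: 512, clockGHz: 1.0)  // ~1.0 TFLOPs FP32
let fp16 = fp32 * 2                              // ~2.0 TFLOPs if FP16 runs at double rate
print(fp32, fp16)
```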
 

diamond.g

macrumors G4
Mar 20, 2007
11,435
2,658
OBX
I can't find the reference right now, but I am pretty sure that Apple GPUs can execute FP16 at a faster rate than FP32. I doubt that they can dual-issue them, that sounds like a waste of silicon — they probably reuse the ALUs to do one FP32 op or two FP16 ops per cycle. Anyway, the A12X should be somewhere around 1 TFLOP FP32 (counting MADD).

All of this should be testable though...
Yeah, I see where I got confused. You are right, Apple should have double the FP16 perf of FP32. Dual issue would allow you to do both an FP32 and an FP16 op in the same cycle; I got it conflated. So I guess we are back to not knowing whether Apple was talking about FP32 or FP16 performance. We know the 8th-gen consoles don't support FP16.
 

PortoMavericks

macrumors 6502
Jun 23, 2016
288
353
Gotham City
This could be solved with high-bandwidth system RAM (e.g. HBM), a large enough cache and more memory controllers. Such an approach won't be cheap, but costs are less of an issue for Apple - their products are already priced at a premium and they don't have to compete with other chip makers, as they produce for themselves only.



Are you referring to TBDR? That's an interesting topic. From my layman's understanding, TBDR renderers didn't establish themselves in the desktop segment because they are much more complex and because, with a larger thermal budget, a forward renderer can just brute-force its way through. A criticism often brought up with TBDR is poor geometry throughput - less of an issue with mobile applications and their traditionally lower polygon counts, but critical for high-poly PC games. But that was the state of the art ten years ago. Apple seems to have solved it by utilizing the unified shader pipeline - since geometry, compute and fragment processing run asynchronously on the same hardware, it's easier to balance out the eventual bottlenecks. As to why Nvidia and co don't use it - well, probably because they were not interested in this tech. Their stuff works well enough, and they did borrow some ideas like tiling (but without deferred fragment shading) to make their GPUs more efficient. Revolutions sometimes come simply because someone has tried (and succeeded at) something that others thought would not work. Again, think about the MacBook Air or HiDPI screens. Those were laughed at in the beginning.

The question is, will Apple use HBM?
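(As an aside on the TBDR point quoted above: one place the tile-based design shows up directly in Metal is memoryless render targets, which live only in on-chip tile memory and never get a DRAM allocation. A rough sketch, with arbitrary format and sizes:)

```swift
import Metal

// Sketch: a G-buffer-style intermediate attachment kept entirely in tile
// memory on an Apple TBDR GPU. storageMode == .memoryless means no
// system-memory backing is ever allocated for it.
func makeTileOnlyAttachment(device: MTLDevice, width: Int, height: Int) -> MTLTexture? {
    let desc = MTLTextureDescriptor.texture2DDescriptor(pixelFormat: .rgba16Float,
                                                        width: width,
                                                        height: height,
                                                        mipmapped: false)
    desc.usage = .renderTarget
    desc.storageMode = .memoryless  // tile memory only, Apple-family GPUs
    return device.makeTexture(descriptor: desc)
}
```

In the render pass you'd then use loadAction = .clear and storeAction = .dontCare for that attachment, so nothing is ever written back to system memory.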
 

JMacHack

Suspended
Mar 16, 2017
1,965
2,424
I'd say die size would stop Apple from doing the same. The XB1X has a SoC of over 300mm2, the A12 is just 12mm2. That's over an order of magnitude larger, and an order of magnitude larger than current Intel desktop chips. I don't think the yield on chips of that size would be high enough to justify it.
Also, the size of the heatsink on the Xbox is gargantuan. Doubtful that form-factor-conscious Apple wants to use heatsinks that big.

On the topic of TFLOPs translating to performance, I'd say it's never accurate to real-world performance. The best example I can think of is AMD's Vega GPUs. The Radeon VII managed to output nearly 14 TFLOPs, yet got beat by the 1080 Ti (a card from two years earlier), which managed only 11.3 TFLOPs. In fact, the 2080 Super only does 11 or so TFLOPs and outperforms the 1080 Ti (and, needless to say, every AMD card).

Personally, I'm fascinated by Vega since it's a compute monster of an architecture, yet managed only middling graphics performance. And oddly enough, it's very happy and efficient as an integrated GPU.
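(Those paper numbers fall straight out of shader count x clock. Quick sketch, using approximate public spec-sheet core counts and boost clocks:)

```swift
import Foundation

// Paper FP32 TFLOPs = shader cores x 2 ops per FMA x boost clock (GHz) / 1000.
// Core counts and clocks below are approximate spec-sheet values.
func paperTFLOPs(cores: Int, boostGHz: Double) -> Double {
    Double(cores) * 2.0 * boostGHz / 1000.0
}

print(paperTFLOPs(cores: 3840, boostGHz: 1.80))  // Radeon VII     ~13.8
print(paperTFLOPs(cores: 3584, boostGHz: 1.58))  // GTX 1080 Ti    ~11.3
print(paperTFLOPs(cores: 3072, boostGHz: 1.82))  // RTX 2080 Super ~11.2
```

Which is exactly why the spec-sheet ranking doesn't match the in-game ranking.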
 

PortoMavericks

macrumors 6502
Jun 23, 2016
288
353
Gotham City
Yeah, I see where I got confused. You are right, Apple should have double the FP16 perf of FP32. Dual issue would allow you to do both an FP32 and an FP16 op in the same cycle; I got it conflated. So I guess we are back to not knowing whether Apple was talking about FP32 or FP16 performance. We know the 8th-gen consoles don't support FP16.

Only the PS4 Pro and the Nintendo Switch support it.
 

Erehy Dobon

Suspended
Feb 16, 2018
2,161
2,017
No service
Let's not forget that the console videogame industry has a different profitability model than the PC industry.

Game console hardware is sold at a slight loss or at break-even. The profit comes from content sales (mostly games). This is not a new concept; it has been going on for decades. Nintendo stated that the average Super NES or Nintendo 64 owner would buy something like 10-12 games during their ownership of the console.

These days, Xbox and PlayStation have jumped aboard the subscription model as well.

Apple and other PC manufacturers cannot expect users to spend $500, $1,000, or $2,000 on computer hardware and then see equivalent software revenue at each hardware price point.

In a similar manner, consumer-grade printers are pretty much sold as loss leaders. The profit comes from consumables (ink cartridges and special photo papers).
 

leman

macrumors Core
Oct 14, 2008
19,516
19,664
I'd say die size would stop Apple from doing the same. The XB1X has a SoC of over 300mm2, the A12 is just 12mm2.

According to my googling, the A12 is 83.27mm2, the A12Z (iPad chip) is 120mm2, and the A13 is 100mm2.
 

endlessike

macrumors member
Jun 8, 2010
80
72
The luxury Apple now has with their own silicon is that they can make one SoC with a giant GPU on the die for Mac Pro customers who need one, but they can also make an SoC that focuses more on regular CPU power, like a Xeon, for people who need that.

Aren’t increasingly large, increasingly complicated SoCs going to have yield issues?

Much better for business to take the 60% yield on a high-end GPU and pair it with the 60% yield on a high-end CPU than to live with a 40% yield on a combined, massive CPU/GPU SoC.
 

magbarn

macrumors 68040
Oct 25, 2008
3,016
2,380
The luxury Apple now has with their own silicon is that they can make one SoC with a giant GPU on the die for Mac Pro customers who need one, but they can also make an SoC that focuses more on regular CPU power, like a Xeon, for people who need that.

Apple can cater to the markets it perceives are there or the ones it believes it can improve or disrupt. They could build a super basic, super efficient iMac for admin use in offices. Something for the reception desk with a chip like the A12Z that runs MS Office and Safari and email and calendar apps and isn't built to do much else besides keep the electric bill down. They could also build a 30" Gaming iMac with something more like the console style ~250W TDP GPU heavy SoC that can play high performance games on a really beautiful screen. If they want to.
Hard to cool a 250 watt SoC with Apple's obsession with thinness. You might be able to get away with it in a Mac Pro chassis, but not an iMac. Rumors are flying that the new iMac is going to be thinner with the upcoming refresh.

If people here are salivating over potential AAA gaming on ARM Macs, don't hold your breath. If Apple is really going a different direction in GPU design vs Nvidia/AMD, good luck getting games, as even PC gamers have had to deal with stepchild status the last several years, getting ports much later and with severe bugs on release. You think the gaming industry will support a new platform with a comparatively minuscule number of gamers?
 

dmccloud

macrumors 68040
Sep 7, 2009
3,138
1,899
Anchorage, AK
You think the gaming industry will support a new platform with a comparatively minuscule number of gamers?

I think you overlooked one important thing here. This is not a "new" platform in the traditional sense, given that studios have been developing apps for iOS/iPadOS for years. Furthermore, given the move to unify the iOS/macOS development process in conjunction with the move to ARM, there is already a sizeable audience for gaming ready and waiting for the game studios to follow suit.
 
  • Like
Reactions: AlphaCentauri

the8thark

macrumors 601
Apr 18, 2011
4,628
1,735
If the XBox-X has a 12 TFLOP GPU on its SoC, what stops Apple doing the same?
Apple being Apple stops them from doing the same. Apple isn't after pushing a certain number of TFLOPs. Apple is after a better, higher-quality, more power-efficient end-user experience. Apple could do a better job with a lower-TFLOP solution. Also, that solution could be vastly different from the Xbox One X.

Moving forward, raw benchmark numbers are going to mean less and less as we approach the physical limits of the current way things are done. The real improvements will come from fundamentally changing how things are done or from making better use of what we have today. As in, not a higher-spec solution, but a more efficient solution that's overall the better option.
 
  • Like
Reactions: iPadified

Waragainstsleep

macrumors 6502a
Oct 15, 2003
612
221
UK
Hard to cool a 250 watt SoC with Apple's obsession with thinness. You might be able to get away with it in a Mac Pro chassis, but not an iMac. Rumors are flying that the new iMac is going to be thinner with the upcoming refresh.

The current iMac Pro has a 250W GPU, so the 27" chassis can do it. Use a 30" with a lower-power CPU and the extra space from the larger screen, and you should be able to shave a few mm off the thickness.
Aren’t increasingly large, increasingly complicated SoCs going to have yield issues?

Much better for business to take the 60% yield on a high-end GPU and pair it with the 60% yield on a high-end CPU than to live with a 40% yield on a combined, massive CPU/GPU SoC.

More complexity generally means lower yield, but it depends on the scalability of the architecture. If you can just cut and paste extra cores and you have a decent yield on the original template, then you can at least calculate what the compound yield is likely to be when you add more cores. Plus, we don't know how far along Apple is with their Mac CPUs. I can't imagine they would jump before knowing they were going to stick a decent landing, and they jumped the second they announced.
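(Rough sketch of that compound-yield math using a simple Poisson defect model; the defect density is a made-up illustrative number, not a real fab figure:)

```swift
import Foundation

// Simple Poisson yield model: yield = exp(-defectDensity x dieArea).
// Doubling the die area squares the yield, which is the "60% x 60% vs 40%"
// point above. The defect density is an assumed illustrative value.
func yield(dieAreaMM2: Double, defectsPerMM2: Double) -> Double {
    exp(-defectsPerMM2 * dieAreaMM2)
}

let d = 0.0017                                            // defects per mm2, assumed
let cpu = yield(dieAreaMM2: 300, defectsPerMM2: d)        // ~0.60
let gpu = yield(dieAreaMM2: 300, defectsPerMM2: d)        // ~0.60
let combined = yield(dieAreaMM2: 600, defectsPerMM2: d)   // ~0.36, same as cpu * gpu
print(cpu, gpu, combined)
```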
 

diamond.g

macrumors G4
Mar 20, 2007
11,435
2,658
OBX
Apple being Apple stops them from doing the same. Apple isn't after pushing a certain number of TFLOPs. Apple is after a better, higher-quality, more power-efficient end-user experience. Apple could do a better job with a lower-TFLOP solution. Also, that solution could be vastly different from the Xbox One X.

Moving forward, raw benchmark numbers are going to mean less and less as we approach the physical limits of the current way things are done. The real improvements will come from fundamentally changing how things are done or from making better use of what we have today. As in, not a higher-spec solution, but a more efficient solution that's overall the better option.
At the end of the day, it matters what the performance needs are. For GPGPU, compute performance (i.e. TFLOPs) matters. For gaming (i.e. rasterization), compute performance isn't what matters. Generally speaking, Apple has been more interested in compute than rasterization; will that change with Apple Silicon? Maybe. IMO the best thing they could do is keep console performance out of their mouths if they are not going to have any games that are actually on that level on their machines.
 

magbarn

macrumors 68040
Oct 25, 2008
3,016
2,380
I think you overlooked one important thing here. This is not a "new" platform in the traditional sense, given that studios have been developing apps for iOS/iPadOS for years. Furthermore, given the move to unify the iOS/macOS development process in conjunction with the move to ARM, there is already a sizeable audience for gaming ready and waiting for the game studios to follow suit.
No, if AAA developers were so interested in the iOS platform, we'd already see RDR2/Doom Eternal running on the Apple TV/iPad at this point, as supposedly the iPad Pro has the same power as an Xbox One X. Instead we just get more iterations of Candy Crush and farming sim games.

Another case in point: I have yet to find a game that runs better in macOS vs Boot Camp on any of my Macs - iGPU or dGPU, it doesn't matter. Either the developer doesn't care to optimize it for macOS, or Apple doesn't make good enough drivers to fully support the GPU in macOS for gaming. In either case, switching to another new CPU architecture and, even more importantly, a GPU significantly different in how it renders from AMD/Nvidia means even less support in the future.
 

JMacHack

Suspended
Mar 16, 2017
1,965
2,424
According to my googling, the A12 is 83.27mm2, the A12Z (iPad chip) is 120mm2, and the A13 is 100mm2.
Ah, so you're right. I thought 12mm2 was way too small at first look but rolled with it. Mea culpa.

That still puts the XB1X die at 3x the size of the iPad chip though. And the Intel chips are around 150mm2 for the upper Core i series. A little on the big side I'd say, but I'd bet the Mac silicon is around that size.
 

leman

macrumors Core
Oct 14, 2008
19,516
19,664
That still puts the XB1X die at 3x the size of the iPad chip though. And the Intel chips are around 150mm2 for the upper Core i series. A little on the big side I'd say, but I'd bet the Mac silicon is around that size.

We should also keep in mind that Mac silicon is going to be at 5nm. I have no idea what this will mean for the chip size though.
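(If TSMC's often-quoted ~1.8x logic-density gain for N5 over N7 roughly holds, a very rough estimate looks like this; note that SRAM and analog don't scale as well as logic, so treat it as an upper bound on the shrink:)

```swift
import Foundation

// Rough die-size scaling with a process shrink, assuming the whole die scales
// with logic density. The 1.8x density factor and the 120mm2 starting area are
// the approximate figures discussed above.
func scaledArea(areaMM2: Double, densityGain: Double) -> Double {
    areaMM2 / densityGain
}

print(scaledArea(areaMM2: 120, densityGain: 1.8))  // A12Z-sized die at N5: ~67 mm2
```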
 