
leman

macrumors Core
Original poster
Oct 14, 2008
19,522
19,679
Isn't that specific to MSAA? Is FXAA or SMAA less costly on Apple GPUs?
MSAA isn't as frequent as it used to be.

Yes, it's specific to MSAA. But since MSAA on Apple GPUs is programmable, I wouldn't be surprised if you could use it to implement more advanced modern temporal AA techniques. Can't really comment too much here since it's been years since I last looked at how modern AA shaders work :)

It also makes me wonder why SSAO and shadows aren't free (or very low cost) on TBDR as well.

Apple gives you low-level control over how the GPU does rendering, so you can do a lot of interesting things to make rendering more efficient and/or faster. One problem, though, is that many of these techniques are limited to processing a single tile at a time. If you need to examine neighborhoods of pixels, as in SSAO techniques, that might introduce artifacts at tile boundaries. Not sure.
 

diamond.g

macrumors G4
Mar 20, 2007
11,437
2,665
OBX
I feel like this has been asked, but does Apple Metal have a DLSS equivalent? The lack of one on the AMD side has DF scratching their heads.
 

leman

macrumors Core
Original poster
Oct 14, 2008
19,522
19,679
I feel like this has been asked, but does Apple Metal have a DLSS equivalent? The lack of one on the AMD side has DF scratching their heads.

I don’t think that Apple offers a ready-made solution like DLSS. I’ve read that AMD and co. are planning to release an open-source model for this, so maybe it can later be ported to Apple platforms...
 

diamond.g

macrumors G4
Mar 20, 2007
11,437
2,665
OBX
I don’t think that Apple offers a ready-made solution like DLSS. I’ve read that AMD and co. are planning to release an open-source model for this, so maybe it can later be ported to Apple platforms...
Not sure why Apple isn't leading in this since they started the whole resolution scaling thing. Why render a game (or anything, really) at native 2160p when you can do your fancy resolution scaling thing and pump the 1080p image up to 2160p with no loss in quality.
 

leman

macrumors Core
Original poster
Oct 14, 2008
19,522
19,679
Not sure why Apple isn't leading in this since they started the whole resolution scaling thing. Why render a game (or anything, really) at native 2160p when you can do your fancy resolution scaling thing and pump the 1080p image up to 2160p with no loss in quality.

Probably because you’d need a ton of training data from actual games, and it’s not like Apple is a gaming hardware company. It makes a lot of sense for Nvidia, not so much for someone like Apple. Besides, it’s not like ML upscaling is free; you need non-trivial ML compute performance. Is the M1 fast enough to do real-time upscaling from HD to its native resolution? I’m not so sure...
 

gnomeisland

macrumors 65816
Jul 30, 2008
1,097
833
New York, NY
Probably because you’d need a ton of training data from actual games, and it’s not like Apple is a gaming hardware company. It makes a lot of sense for Nvidia, not so much for someone like Apple. Besides, it’s not like ML upscaling is free; you need non-trivial ML compute performance. Is the M1 fast enough to do real-time upscaling from HD to its native resolution? I’m not so sure...
But a significant part of the SoC (perhaps more than the GPU) is devoted to the NPU and ML accelerators. I'm not saying it *can*, but it seems like if anything could, it would be the M1.
 

leman

macrumors Core
Original poster
Oct 14, 2008
19,522
19,679
But a significant part of the SoC (perhaps more than the GPU) is devoted to the NPU and ML accelerators. I'm not saying it *can*, but it seems like if anything could, it would be the M1.

I don't disagree with you. It's just that Apple quotes the M1 Neural Engine performance at 11 TOPS, where even the Nvidia 2060 (the "lowest" GPU that supports DLSS) has over 50 TFLOPS of throughput on its tensor cores. It is entirely possible that the Neural Engine is fast enough to do this kind of upscaling with acceptable performance, but this needs to be tested.
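For a rough sense of scale, here's a back-of-envelope ops budget per output pixel. Both throughput figures are public marketing numbers, and INT8 ops vs FP16 FLOPs aren't directly comparable, so treat this as order-of-magnitude only:

```python
# Back-of-envelope: how many NPU/tensor ops each output pixel can "spend"
# during real-time ML upscaling to 4K at 60 fps. Throughput figures are
# rough public numbers (Apple's 11 TOPS for the M1 Neural Engine,
# ~52 TFLOPS FP16 for the RTX 2060's tensor cores), not measured values.

def ops_per_pixel(throughput_ops_per_s, width, height, fps):
    pixels_per_second = width * height * fps
    return throughput_ops_per_s / pixels_per_second

m1_budget = ops_per_pixel(11e12, 3840, 2160, 60)   # M1 Neural Engine, 4K60
rtx_budget = ops_per_pixel(52e12, 3840, 2160, 60)  # 2060 tensor cores, 4K60

print(f"M1 Neural Engine: ~{m1_budget:,.0f} ops per 4K60 output pixel")
print(f"RTX 2060 tensor:  ~{rtx_budget:,.0f} ops per 4K60 output pixel")
```

Roughly a 5x gap per pixel, which is why "fast enough, maybe, needs testing" seems like the right conclusion.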
 

diamond.g

macrumors G4
Mar 20, 2007
11,437
2,665
OBX
If you mean signed distance fields, not that I know of.
Yeah. I found two PS4 games that use it: Dreams and Claybook. Seems like an interesting technique that would need all the pipeline tools to catch up for support (since everyone else seems to work in polygons).
 

leman

macrumors Core
Original poster
Oct 14, 2008
19,522
19,679
Yeah. I found two PS4 games that use it: Dreams and Claybook. Seems like an interesting technique that would need all the pipeline tools to catch up for support (since everyone else seems to work in polygons).

With tile shaders etc., I think Metal on Apple GPUs is well suited for these techniques. I'm using SDFs to draw 2D caves for my hobby game project, but it's a bit different since I'm still using polygons; I'm just computing distances from the nearest cave edge to draw visually complex boundaries without needing extra geometry. So it's more of a hybrid technique.
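For illustration, the distance-from-edge idea can be sketched in plain Python (a stand-in for what would really be fragment-shader code; the edge list and border width here are made up for the example):

```python
# Hybrid SDF sketch: render ordinary polygons, but shade each fragment by
# its distance to the nearest cave edge, so the boundary can look detailed
# without extra geometry. Illustrative only; a real version runs per-fragment
# on the GPU.
import math

def dist_to_segment(px, py, ax, ay, bx, by):
    """Unsigned distance from point P to segment AB (standard projection formula)."""
    abx, aby = bx - ax, by - ay
    t = ((px - ax) * abx + (py - ay) * aby) / (abx * abx + aby * aby)
    t = max(0.0, min(1.0, t))          # clamp the projection onto the segment
    cx, cy = ax + t * abx, ay + t * aby
    return math.hypot(px - cx, py - cy)

def cave_wall_intensity(px, py, edges, border=0.25):
    """0 at a cave edge, ramping up to 1 once we're 'border' units inside."""
    d = min(dist_to_segment(px, py, *e) for e in edges)
    return min(d / border, 1.0)

edges = [(0, 0, 10, 0), (10, 0, 10, 10)]   # two example wall segments
print(cave_wall_intensity(5, 0.1, edges))   # very close to a wall -> near 0
print(cave_wall_intensity(5, 5, edges))     # interior -> 1.0
```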
 

diamond.g

macrumors G4
Mar 20, 2007
11,437
2,665
OBX
With tile shaders etc., I think Metal on Apple GPUs is well suited for these techniques. I'm using SDFs to draw 2D caves for my hobby game project, but it's a bit different since I'm still using polygons; I'm just computing distances from the nearest cave edge to draw visually complex boundaries without needing extra geometry. So it's more of a hybrid technique.
Yeah that was talked about on Beyond3d. You should check out Dreams if you have a PS4 (or PS5).
 

theorist9

macrumors 68040
May 28, 2015
3,882
3,061
They are hiding a lot of hardware details: clocks, RAM type, TDP... RAM will most likely be LPDDR4/5 with some sort of wide multi-channel configuration. Probably around 80GBps bandwidth, something that will be plenty for a chip of this spec.
...I'm pretty sure this is dual channel LPDDR4X. Compare the image you saw today to this image of the A12X..
Which uses LPDDR4X. I think, frankly, LPDDR5 supplies were more limited than they expected.
Grain of salt...?!?

Memory type: LPDDR4X-4266 / LPDDR5-5500
Max. memory: 16 GB
Memory channels: 2
ECC: No
Where did they get info like the frequency from? Do they have a preview model?

The following is from ifixit's Nov. 19 teardown of the 8 GB M1 Air and M1 MBP (https://www.ifixit.com/News/46884/m1-macbook-teardowns-something-old-something-new). Not sure if LPDDR4X-4266 is also used in the 16 GB machines:

[attached teardown photos]
 

awesomedeluxe

macrumors 6502
Jun 29, 2009
262
105
The following is from ifixit's Nov. 19 teardown of the 8 GB M1 Air and M1 MBP (https://www.ifixit.com/News/46884/m1-macbook-teardowns-something-old-something-new). Not sure if LPDDR4X-4266 is also used in the 16 GB machines:

Yeah, anandtech calls the 16GB of RAM in the Mac Mini LPDDR4X-4266-class, so I'm pretty sure it's LPDDR4X everywhere.

In light of the recent Bloomberg report about "32 core" graphics in laptops, I'm again wondering what Apple will do with their GPU memory.

I guess it's plausible that a 16-core CPU and 32-core GPU could be married into a giant APU. The article does imply that Apple is prepared for abysmal yield on these things, and yield could definitely be a blood bath at that size. But if it's a big APU, 4x LPDDR5-6400 modules would probably cut it. That would be a threefold increase in bandwidth relative to a fourfold increase in core count. Bandwidth starved, sure, but still performant.

But it really sounds like Apple is making separate GPUs. Chips that, at the very least, are separate enough to need their own memory controller. A 16 firestorm core part, minus the GPU cores, plus some I/O is probably under 200mm2. 32 GPU cores plus a memory controller is around 150mm2, and it just makes more sense for Apple to target parts this size. But it's weird to imagine a discrete GPU plugging back into the same memory pool the CPU is using, not to mention challenging. I can't think of a good way to go about it.
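The bandwidth arithmetic behind that "threefold increase" can be checked quickly (the per-package bus widths are assumptions based on typical LPDDR configurations):

```python
# M1's two 64-bit LPDDR4X-4266 packages (128-bit bus) vs. the hypothetical
# 4x LPDDR5-6400 (assumed 64-bit each, so a 256-bit bus).

def bandwidth_gbps(mega_transfers_per_s, bus_width_bits):
    return mega_transfers_per_s * 1e6 * bus_width_bits / 8 / 1e9

m1 = bandwidth_gbps(4266, 128)     # ~68.3 GB/s
big = bandwidth_gbps(6400, 256)    # ~204.8 GB/s

print(f"M1 (2x 64-bit LPDDR4X-4266):   {m1:.1f} GB/s")
print(f"4x LPDDR5-6400 (64-bit each):  {big:.1f} GB/s ({big/m1:.2f}x)")
```

So the "threefold" figure checks out almost exactly, against roughly 4x the GPU cores.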
 

Pressure

macrumors 603
May 30, 2006
5,182
1,545
Denmark
Yeah, anandtech calls the 16GB of RAM in the Mac Mini LPDDR4X-4266-class, so I'm pretty sure it's LPDDR4X everywhere.

In light of the recent Bloomberg report about "32 core" graphics in laptops, I'm again wondering what Apple will do with their GPU memory.

I guess it's plausible that a 16-core CPU and 32-core GPU could be married into a giant APU. The article does imply that Apple is prepared for abysmal yield on these things, and yield could definitely be a blood bath at that size. But if it's a big APU, 4x LPDDR5-6400 modules would probably cut it. That would be a threefold increase in bandwidth relative to a fourfold increase in core count. Bandwidth starved, sure, but still performant.

But it really sounds like Apple is making separate GPUs. Chips that, at the very least, are separate enough to need their own memory controller. A 16 firestorm core part, minus the GPU cores, plus some I/O is probably under 200mm2. 32 GPU cores plus a memory controller is around 150mm2, and it just makes more sense for Apple to target parts this size. But it's weird to imagine a discrete GPU plugging back into the same memory pool the CPU is using, not to mention challenging. I can't think of a good way to go about it.
Here is a die shot of the current M1 chip.

Scaling it straight up to 12 Firestorm cores and a 32-core GPU would take up less than 260mm2. That’s not a giant chip by any means.
 

diamond.g

macrumors G4
Mar 20, 2007
11,437
2,665
OBX
Here is a die shot of the current M1 chip.

Scaling it straight up to 12 Firestorm cores and a 32-core GPU would take up less than 260mm2. That’s not a giant chip by any means.
Do we know how they disable the 8th GPU core?
 

awesomedeluxe

macrumors 6502
Jun 29, 2009
262
105
Here is a die shot of the current M1 chip.

Scaling it straight up to 12 Firestorm cores and a 32-core GPU would take up less than 260mm2. That’s not a giant chip by any means.
Is 12 a typo? The article says 16 with some cores potentially disabled.

I got 263mm2, eyeballing the Firestorm block at about 18mm2 and GPU cores at about 30mm2. We still have to dedicate some space for extra I/O. I'd round up to 300mm2 to account for that and potential increases to SLC and other areas, but we're in the same ballpark.

I agree with you that that's not a giant chip. But it's... really big. It pretty much puts the final design in the hands of TSMC's N5 process. I guess that's consistent with the article, which is suggesting they could bin these pretty aggressively. They'd probably have to before putting it into the MBP14 anyway, so it's not like these binned chips wouldn't have a home.

A 16+32 core APU still gives me pause. Like, even if we assume the CPU scaled down to iPhone speeds, it's still consuming 40W under load... and the GPU also consumes 40W under load... all within about a square inch of silicon. Dell struggled to cool Kaby G, which was 65W over a much larger area, and I think that's the biggest accomplishment to date.
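A quick heat-density comparison makes the point; the die areas are rough assumptions (~300mm2 for a 16+32 part as estimated above, ~360mm2 of combined CPU + GPU + HBM silicon for Kaby G), not confirmed figures:

```python
# Comparing power density (W per cm^2 of silicon) for a hypothetical
# 80W 16+32-core SoC vs. the 65W Kaby G package. Areas are assumptions.

def watts_per_cm2(watts, area_mm2):
    return watts / (area_mm2 / 100.0)   # 100 mm^2 = 1 cm^2

apple_soc = watts_per_cm2(80, 300)   # assumed 40W CPU + 40W GPU
kaby_g = watts_per_cm2(65, 360)      # assumed combined die area

print(f"Hypothetical 16+32 SoC: {apple_soc:.1f} W/cm^2")
print(f"Kaby G (assumed):       {kaby_g:.1f} W/cm^2")
```

Under these assumptions the Apple part would be noticeably more heat-dense, which is the worry.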
 

diamond.g

macrumors G4
Mar 20, 2007
11,437
2,665
OBX
Is 12 a typo? The article says 16 with some cores potentially disabled.

I got 263mm2, eyeballing the Firestorm block at about 18mm2 and GPU cores at about 30mm2. We still have to dedicate some space for extra I/O. I'd round up to 300mm2 to account for that and potential increases to SLC and other areas, but we're in the same ballpark.

I agree with you that that's not a giant chip. But it's... really big. It pretty much puts the final design in the hands of TSMC's N5 process. I guess that's consistent with the article, which is suggesting they could bin these pretty aggressively. They'd probably have to before putting it into the MBP14 anyway, so it's not like these binned chips wouldn't have a home.

A 16+32 core APU still gives me pause. Like, even if we assume the CPU scaled down to iPhone speeds, it's still consuming 40W under load... and the GPU also consumes 40W under load... all within about a square inch of silicon. Dell struggled to cool Kaby G, which was 65W over a much larger area, and I think that's the biggest accomplishment to date.
Would it be easier to cool if they used more of a chiplet design and spread the chip out more?
 

EntropyQ3

macrumors 6502a
Mar 20, 2009
718
824
Is 12 a typo? The article says 16 with some cores potentially disabled.

I got 263mm2, eyeballing the Firestorm block at about 18mm2 and GPU cores at about 30mm2. We still have to dedicate some space for extra I/O. I'd round up to 300mm2 to account for that and potential increases to SLC and other areas, but we're in the same ballpark.

I agree with you that that's not a giant chip. But it's... really big. It pretty much puts the final design in the hands of TSMC's N5 process. I guess that's consistent with the article, which is suggesting they could bin these pretty aggressively. They'd probably have to before putting it into the MBP14 anyway, so it's not like these binned chips wouldn't have a home.

A 16+32 core APU still gives me pause. Like, even if we assume the CPU scaled down to iPhone speeds, it's still consuming 40W under load... and the GPU also consumes 40W under load... all within about a square inch of silicon. Dell struggled to cool Kaby G, which was 65W over a much larger area, and I think that's the biggest accomplishment to date.
Apple is already cooling WAY beyond that in the iMac, in total thermal load and even in power density, both on the Intel CPU and the 5700 XT GPU. In the same enclosure, Apple should be able to cool the chip outlined above very quietly, since the amount of heat needing to be dissipated determines the required airflow.
It’s a big chip compared to their phone SoCs, but Sony's and Microsoft's new console chips are 306 and 360mm2 respectively on the closely related 7nm node (and they are cheap). TSMC already reports comparable defect rates on their 5nm node, and will have further dialed in the process after manufacturing a hundred million or so A14 SoCs by now.
And, as opposed to their phone SoCs, Apple will have opportunities to increase yield further both by disabling defective/underperforming functional blocks and by binning, standard procedure in the industry.
I really wouldn’t worry.
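The yield argument can be sketched with the standard Poisson model, yield = exp(-D0 × area). The defect density used here is an assumption in the range publicly reported for mature TSMC nodes, not a confirmed figure:

```python
# Poisson yield model: fraction of dies with zero defects as a function of
# die area, at an assumed defect density of 0.09 defects/cm^2. Dies with a
# defect can often still be sold as binned parts, so this is a lower bound
# on usable chips.
import math

def poisson_yield(defects_per_cm2, die_area_mm2):
    return math.exp(-defects_per_cm2 * die_area_mm2 / 100.0)

for name, area in [("M1-sized (~120 mm^2)", 120),
                   ("PS5 SoC (306 mm^2)", 306),
                   ("Series X SoC (360 mm^2)", 360)]:
    print(f"{name}: ~{poisson_yield(0.09, area):.0%} of dies defect-free")
```

Even at console-SoC sizes, the majority of dies come out clean under this assumption, which is why the console chips can be cheap.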
 

awesomedeluxe

macrumors 6502
Jun 29, 2009
262
105
Would it be easier to cool if they used more of a chiplet design and spread the chip out more?
Yeah. Like, there's no problem with 80W in the abstract; you can buy MacBooks right now that use 90W. I've never seen that much power in that small a space, though.

Apple is already cooling WAY beyond that in the iMac, in total thermal load and even in power density, both on the Intel CPU and the 5700 XT GPU. In the same enclosure, Apple should be able to cool the chip outlined above very quietly, since the amount of heat needing to be dissipated determines the required airflow.
It’s a big chip compared to their phone SoCs, but Sony's and Microsoft's new console chips are 306 and 360mm2 respectively on the closely related 7nm node (and they are cheap). TSMC already reports comparable defect rates on their 5nm node, and will have further dialed in the process after manufacturing a hundred million or so A14 SoCs by now.
And, as opposed to their phone SoCs, Apple will have opportunities to increase yield further by disabling defective/underperforming functional blocks, standard procedure in the industry.
I really wouldn’t worry.
Oh, for sure! It's no problem in the iMac. But I think these chips are supposedly going into the MBP14 and MBP16. I'm just taking for granted right now that the parts going into the MBP14 are binned and have a lot of cores disabled. But the article implies that they're testing a full 16+32 loadout for the MBP16.

That's not impossible but it's certainly unprecedented. Take a gander at this article about cooling Kaby G in the XPS. The XPS is the same thickness as the current MBP16, and Kaby G is a 65W part which comes with its own high bandwidth memory solution. A 16+32 core part would probably have a similar TDP, but still be in need of high bandwidth memory (creates more heat) and is trying to accomplish that in a smaller area (more heat-dense).
 

EntropyQ3

macrumors 6502a
Mar 20, 2009
718
824
Yeah. Like there's no problem with 80W in the abstract; you can buy Macbooks right now that use 90W. I've never seen that much power in that small a space, though.


Oh, for sure! It's no problem in the iMac. But I think these chips are supposedly going into the MBP14 and MBP16. I'm just taking for granted right now that the parts going into the MBP14 are binned and have a lot of cores disabled. But the article implies that they're testing a full 16+32 loadout for the MBP16.

That's not impossible but it's certainly unprecedented. Take a gander at this article about cooling Kaby G in the XPS. The XPS is the same thickness as the current MBP16, and Kaby G is a 65W part which comes with its own high bandwidth memory solution. A 16+32 core part would probably have a similar TDP, but still be in need of high bandwidth memory (creates more heat) and is trying to accomplish that in a smaller area (more heat-dense).
Fully agree when it comes to the portables. It wouldn’t really be worse than what they are already handling in the enclosure; on the other hand, I think putting that kind of thermal load inside a MBP is... suboptimal.
 

theorist9

macrumors 68040
May 28, 2015
3,882
3,061
Do we know how they disable the 8th GPU core?
According to several reports, they don't. They knew a certain percentage of the chips were coming out with a bad GPU core. Thus, rather than throwing those out, they created a SKU for the Air with 7 GPU cores, and use those chips in that. The process is called "binning".

To the extent they offer different CPU/GPU core counts within each model in the upcoming generation of AS Macs, some (but not all) of that will likely result from binning as well.
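A toy model of why this works out (the per-core defect probability is purely illustrative):

```python
# If GPU-core defects are independent, the binomial distribution tells you
# what fraction of 8-core dies the 8-core and 7-core (Air) SKUs can absorb
# between them. The 2% per-core defect probability is an assumption for
# illustration, not a real figure.
from math import comb

def binomial_pmf(n, k, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

p_bad = 0.02
all_good = binomial_pmf(8, 0, p_bad)   # sellable as the full 8-core GPU
one_bad = binomial_pmf(8, 1, p_bad)    # sellable as the 7-core Air SKU

print(f"8 good cores:   {all_good:.1%}")
print(f"exactly 1 bad:  {one_bad:.1%}")
print(f"usable overall: {all_good + one_bad:.1%}")
```

With the extra SKU, almost every die with at most one bad core finds a home, which is the whole point of binning.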
 

theorist9

macrumors 68040
May 28, 2015
3,882
3,061
Yeah, anandtech calls the 16GB of RAM in the Mac Mini LPDDR4X-4266-class, so I'm pretty sure it's LPDDR4X everywhere.

In light of the recent Bloomberg report about "32 core" graphics in laptops, I'm again wondering what Apple will do with their GPU memory.

I guess it's plausible that a 16-core CPU and 32-core GPU could be married into a giant APU. The article does imply that Apple is prepared for abysmal yield on these things, and yield could definitely be a blood bath at that size. But if it's a big APU, 4x LPDDR5-6400 modules would probably cut it. That would be a threefold increase in bandwidth relative to a fourfold increase in core count. Bandwidth starved, sure, but still performant.

But it really sounds like Apple is making separate GPUs. Chips that, at the very least, are separate enough to need their own memory controller. A 16 firestorm core part, minus the GPU cores, plus some I/O is probably under 200mm2. 32 GPU cores plus a memory controller is around 150mm2, and it just makes more sense for Apple to target parts this size. But it's weird to imagine a discrete GPU plugging back into the same memory pool the CPU is using, not to mention challenging. I can't think of a good way to go about it.
This post, by cmaier, speaks to some of the issues you've raised:

 

leman

macrumors Core
Original poster
Oct 14, 2008
19,522
19,679
But if it's a big APU, 4x LPDDR5-6400 modules would probably cut it. That would be a threefold increase in bandwidth relative to a fourfold increase in core count. Bandwidth starved, sure, but still performant.

I don't think that is going to be enough bandwidth. For that kind of chip, you really want 200+ GB/s... so something "HBM-like". Frankly, I'm starting to think that Apple will do a DIY HBM with stacked LPDDR chips and a very wide memory bus, with 8 memory controllers or more. They already stack RAM on top of the iPhone chip, so I don't see why it wouldn't be possible.
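A quick sanity check on the "8 memory controllers or more" idea (channel width and data rate are assumptions based on typical LPDDR5 configurations):

```python
# How many 16-bit LPDDR5 channels would a 200+ GB/s target need?
import math

def channels_needed(target_gbps, mega_transfers_per_s, channel_bits=16):
    per_channel = mega_transfers_per_s * 1e6 * channel_bits / 8 / 1e9  # GB/s
    return math.ceil(target_gbps / per_channel), per_channel

n, per = channels_needed(200, 6400)
print(f"each 16-bit LPDDR5-6400 channel: {per:.1f} GB/s")
print(f"channels needed for 200 GB/s:    {n}")
```

Sixteen 16-bit channels is eight dual-channel controllers, which lines up with the "8 memory controllers or more" guess.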



That's not impossible but it's certainly unprecedented.

I don't think it's unprecedented. There are laptops shipping with large and hot GPUs. A mobile RTX 2080 is 545mm2 with a TDP of 80W in the Max-Q configuration. An Apple SoC with a 12+4 CPU and a 32-core GPU will probably have a combined TDP of around 60-70watts. Shouldn't be that much of a challenge to cool in the current 16" chassis.

But it really sounds like Apple is making separate GPUs. Chips that, at the very least, are separate enough to need their own memory controller. A 16 firestorm core part, minus the GPU cores, plus some I/O is probably under 200mm2. 32 GPU cores plus a memory controller is around 150mm2, and it just makes more sense for Apple to target parts this size. But it's weird to imagine a discrete GPU plugging back into the same memory pool the CPU is using, not to mention challenging. I can't think of a good way to go about it.

For high-end configs (like the Mac Pro), I don't really see them doing monolithic chips — yields will probably be abysmal. But a multi-chip package, with CPU+GPU dies connected to a shared I/O+cache die (possibly stacked with RAM) — that should be doable. AMD does it with Zen3 and it seems to work just fine.

But then again, it's Apple we are talking about. I can totally see them using a 1000mm2 monolithic die that costs $1000 to make just to prove a point. Still cheaper than the Xeons and the GPUs they have to buy from a third party :)
 

Pressure

macrumors 603
May 30, 2006
5,182
1,545
Denmark
Is 12 a typo? The article says 16 with some cores potentially disabled.

I got 263mm2, eyeballing the Firestorm block at about 18mm2 and GPU cores at about 30mm2. We still have to dedicate some space for extra I/O. I'd round up to 300mm2 to account for that and potential increases to SLC and other areas, but we're in the same ballpark.

I agree with you that that's not a giant chip. But it's... really big. It pretty much puts the final design in the hands of TSMC's N5 process. I guess that's consistent with the article, which is suggesting they could bin these pretty aggressively. They'd probably have to before putting it into the MBP14 anyway, so it's not like these binned chips wouldn't have a home.

A 16+32 core APU still gives me pause. Like, even if we assume the CPU scaled down to iPhone speeds, it's still consuming 40W under load... and the GPU also consumes 40W under load... all within about a square inch of silicon. Dell struggled to cool Kaby G, which was 65W over a much larger area, and I think that's the biggest accomplishment to date.
No, that’s 12 High Performance cores and 4 High Efficiency cores.

But let us be honest with ourselves.

I think the 16” MacBook Pro will get at maximum 8 High Performance cores, 4 High Efficiency cores and up to 16-core GPU (12- to 16-cores depending on configuration).
 