No dedicated video RAM on ARM Macs

pshufd · Jun 28, 2020

leman said:
At the same time, not having to copy data between devices is also an efficiency win. I would say it depends on how well the memory subsystem can deal with heterogenous concurrent requests. And of course, TBDR GPUs tend to exhibit much better cache locality in typical scenarios.

I look at it as more of an api where you draw objects instead of sending pixels.

leman · Jun 28, 2020

pshufd said:
I look at it as more of an api where you draw objects instead of sending pixels.

I think one should look at as an API for programming a parallel processor. Drawing objects is not enough. One needs to have a basic understanding of the computational model in order to do so efficiently. You can have the fastest SIMD architecture on the planet, but if the programmer does not understand how data is structured and processed, the performance is going to suck.

The big advantage of Apple GPUs is that they expose the cache structure to the developer, while making certain guarantees about data locality. And of course, they offer a unified set of assumptions, which simplifies development.

UltimateSyn · Jun 28, 2020

Regarding the discussion on page one, unified memory is confirmed to be the way they are going. No VRAM!

AceFernalld said:
Check out the post above. The path forward for Apple Silicon is with CPUs and GPUs on one system on a chip.

"Intel-based Macs contain a multi-core CPU and many have a discrete GPU ... Machines with a discrete GPU have separate memory for the CPU and GPU. Now, the new Apple Silicon Macs combine all these components into a single system on a chip, or SoC. Building everything into one chip gives the system a unified memory architecture. This means that the CPU and GPU are working over the same memory."

Explore the new system architecture of Apple silicon Macs - WWDC20 - Videos - Apple Developer

Discover how Macs with Apple silicon will deliver modern advantages using Apple's System-on-Chip (SoC) architecture. Leveraging a unified...

developer.apple.com

pshufd · Jun 28, 2020

AceFernalld said:
Regarding the discussion on page one, unified memory is confirmed to be the way they are going. No VRAM!

Performance depends on bandwidth then.

reallynotnick · Jun 28, 2020

I mean if Apple is going to make me pay insane prices for non-upgradable memory then they better make it some pretty damn good memory!

Not sure what the status of HBM3 is, but that would make for a hell of a setup.

Zackmd1 · Jun 28, 2020

Would be interesting to see if Apple manages to put DDR5 in these macs.... It is technically available and since they are no longer tied to waiting for support from the CPU manufacturer they could try and implement it. It would go a long way in helping GPU performance as well as system performance.

Joelist · Jun 28, 2020

Remember that Apple's "ARM" is a LOT different than other implementations. They have an ARM Architecture license so while they use ARM instruction sets the cores are Apple's design. Add in that Apple poached a LOT of design talent from Intel, Qualcomm and even AMD and Nvidia over the past few years, especially from the team that designed "Conroe" (better known as Core2Duo et al) and it isn't that surprising that their A series SOCs have become beasts. Note that at WWDC they were very careful to say that we haven't yet seen the SOC they will be using on Macs - and the breakout sessions gave some hints but not a lot of details.

With that said, I would not be a bit surprised if they are preparing a design that has direct channels to the RAM from both the CPU and GPU blocks and a lot of high speed RAM with dynamic rules of the road on which gets priority in any given scenario. They even showed in a WWDC session some hints on how such a setup would work (without saying it was regarding the GPU and CPU blocks in the SOC).

pshufd · Jun 28, 2020

Joelist said:
With that said, I would not be a bit surprised if they are preparing a design that has direct channels to the RAM from both the CPU and GPU blocks and a lot of high speed RAM with dynamic rules of the road on which gets priority in any given scenario. They even showed in a WWDC session some hints on how such a setup would work (without saying it was regarding the GPU and CPU blocks in the SOC).

I was thinking about this possibility earlier today. I think that there are still some substantial challenges to doing this though.

aeronatis · Jun 29, 2020

I think some variation of A14 will be used for the notebooks that currently comes with iGPU (MacBook Air, 13" MacBook Pro, Mac Mini).

As for those with dedicated GPU, they would come up with a combination of Apple Silicon + AMD Radeon Pro Graphics with HBM2 VRAM. Otherwise, it would be weird to release a product that performs worse than its predecessor.

The fact that developer kit comes with A12Z with 16 GB DDR4 RAM actually supports that the chip could be used along with dedicated memory. If they could do it for system memory, they could also do for graphics memory.

leman · Jun 29, 2020

aeronatis said:
As for those with dedicated GPU, they would come up with a combination of Apple Silicon + AMD Radeon Pro Graphics with HBM2 VRAM. Otherwise, it would be weird to release a product that performs worse than its predecessor.

Dedicated VRAM ≠ Performance. We are so used to the current definitions that we forget to look at the big picture. A GPU does not need VRAM to have good performance. The devil is in the detail.

aeronatis said:
The fact that developer kit comes with A12Z with 16 GB DDR4 RAM actually supports that the chip could be used along with dedicated memory. If they could do it for system memory, they could also do for graphics memory.

Apple's developer documentation is stressing the notion of Apple GPUs using the system memory. This is part of their overall design. Note that their GPUs operate very differently from Nvidia or AMD ones. Apple GPUs generally need to perform less work to achieve the same result, and they are less reliant on memory bandwidth, because they are better at optimizing memory accesses. Additionally, Apple GPU give the programmer direct access to the on-GPU cache, which allows one to implement rendering techniques much more efficient and with much less RAM usage. For example, many modern games use so called deferred shading multi-step rendering technique, where information about object materials is first collected in memory buffers and then used to compute complex lighting effects. This technique generally needs a lot of fast VRAM (since these buffers are large and need to be read and written multiple times). On Apple GPU however, this technique can be performed in one step, using on-chip cache only. No need to have a lot fo memory to hold the material buffers, no need to move this memory around. This is one of the reasons why Apple GPUs can be very fast while using less memory — they are simply "smarter".

aeronatis · Jun 29, 2020

leman said:
Dedicated VRAM ≠ Performance. We are so used to the current definitions that we forget to look at the big picture. A GPU does not need VRAM to have good performance. The devil is in the detail.

Apple's developer documentation is stressing the notion of Apple GPUs using the system memory. This is part of their overall design. Note that their GPUs operate very differently from Nvidia or AMD ones. Apple GPUs generally need to perform less work to achieve the same result, and they are less reliant on memory bandwidth, because they are better at optimizing memory accesses. Additionally, Apple GPU give the programmer direct access to the on-GPU cache, which allows one to implement rendering techniques much more efficient and with much less RAM usage. For example, many modern games use so called deferred shading multi-step rendering technique, where information about object materials is first collected in memory buffers and then used to compute complex lighting effects. This technique generally needs a lot of fast VRAM (since these buffers are large and need to be read and written multiple times). On Apple GPU however, this technique can be performed in one step, using on-chip cache only. No need to have a lot fo memory to hold the material buffers, no need to move this memory around. This is one of the reasons why Apple GPUs can be very fast while using less memory — they are simply "smarter".

These are very good points. I didn't say dedicated VRAM should be equal to more performance though. Then again, there are situations a dedicated VRAM simply has to exist (Yes, I am talking about current terms). Since on chip memory is actually quite fast (much faster than system memory) but small in size, the current design needs dedicated memory.

I am excited to see what kind of A14 variant they introduce with the first Apple Silicon Macs. I am even optimistic for iPadOS versiyon of Final Cut Pro at this point (which would make me switch to iMac and use iPad Pro as my main portable device). Currently using 16" MacBook Pro with 5500M and 8 GB GDDR6 VRAM is my regular bottleneck during my video projects. If, like you say, that issue will cease to exist, I would be the happiest buyer.

leman · Jun 29, 2020

aeronatis said:
I am excited to see what kind of A14 variant they introduce with the first Apple Silicon Macs. I am even optimistic for iPadOS versiyon of Final Cut Pro at this point (which would make me switch to iMac and use iPad Pro as my main portable device). Currently using 16" MacBook Pro with 5500M and 8 GB GDDR6 VRAM is my regular bottleneck during my video projects. If, like you say, that issue will cease to exist, I would be the happiest buyer.

I would guess that traditional dGPU design is in fact suboptimal for some pro applications. You have a fast GPU, you have some really fast VRAM, but the bus connecting the CPU and GPU is much slower. So if your video workflow requires you to constantly send new data to the GPU, you will be limited by the PCI-e performance. Unified memory might perform better in this case.

pshufd · Jun 29, 2020

leman said:
I would guess that traditional dGPU design is in fact suboptimal for some pro applications. You have a fast GPU, you have some really fast VRAM, but the bus connecting the CPU and GPU is much slower. So if your video workflow requires you to constantly send new data to the GPU, you will be limited by the PCI-e performance. Unified memory might perform better in this case.

It depends on the data you're sending. If you're sending compact instructions, it could be far less than pixels.

leman · Jun 29, 2020

pshufd said:
It depends on the data you're sending. If you're sending compact instructions, it could be far less than pixels.

Instructions are not a problem. Streaming HDR 4K video data over PCI-e bus might be one.

aeronatis · Jun 29, 2020

leman said:
I would guess that traditional dGPU design is in fact suboptimal for some pro applications. You have a fast GPU, you have some really fast VRAM, but the bus connecting the CPU and GPU is much slower. So if your video workflow requires you to constantly send new data to the GPU, you will be limited by the PCI-e performance. Unified memory might perform better in this case.

Exactly, just like TB3 creating a bottleneck for external graphics cards, any bus needed for communication between CPU & GPU is less than optimal for more pro apps. Can't wait to see what's around the corner.

weckart · Jun 30, 2020

aeronatis said:
As for those with dedicated GPU, they would come up with a combination of Apple Silicon + AMD Radeon Pro Graphics with HBM2 VRAM. Otherwise, it would be weird to release a product that performs worse than its predecessor.

Mac Mini 2014? I seem to remember the price of used 2011/2012 MMs went right up after that was launched. Not saying that will necessarily happen again but it wouldn't be weird as we've been there before.

aeronatis · Jun 30, 2020

weckart said:
Mac Mini 2014? I seem to remember the price of used 2011/2012 MMs went right up after that was launched. Not saying that will necessarily happen again but it wouldn't be weird as we've been there before.

That was when they switched from 45 watt quad core H series to 15/28 watt dual core U series, but I thnk that is a different situation. I don't believe they woul do the same to 16" MacBook Pro or iMac Pro etc, or at least I don't want to believe

JacobHarvey · Jun 30, 2020

leman said:
I would guess that traditional dGPU design is in fact suboptimal for some pro applications. You have a fast GPU, you have some really fast VRAM, but the bus connecting the CPU and GPU is much slower. So if your video workflow requires you to constantly send new data to the GPU, you will be limited by the PCI-e performance. Unified memory might perform better in this case.

Am I misunderstanding or are people here suggesting that PCIe bandwidth is a serious bottleneck for video cards?

The standard x16 PCIe 3.0 lanes used to connect GPUs to CPUs on DOES NOT create a real life bottleneck for even the top of the line nVidia RTX 2080 Ti (let alone any GPU of similar power to an Apple integrated GPU) and that standard is ANCIENT in tech terms. Even people halving the bandwidth to x8 PCIe 3.0 lanes has been shown to only cause a small performance drop of ~3% for the RTX 2080 Ti

Plus PCIe 3.0 has since been superseded by PCIe 4.0 on AMD platforms offering double the bandwidth per lane so this is simply not a problem.

There are a whole heap of other things holding back GPUs before you get to PCIe bandwidth (which again to be clear IS NOT bottlenecking GPUs and is unlikely to do so for quite a few years, by which point it will likely be addressed by a new PCIe revision).

PCIe bus speed being a major bottleneck to GPUs is a misconception that must be done away with.

I think Apple will be fine without dedicated video memory, its iGPUs will destroy intels offerings on most macs (and most mac users have been fine using those macbooks). In the future it may shift to producing its own dedicated GPU chip to get around die size and thermal constraints to target higher performance levels. We just have to wait and see how this all pans out.

leman · Jun 30, 2020

JacobHarvey said:
Am I misunderstanding or are people here insinuating that PCIe bandwidth is a serious bottleneck for video cards?

PCIe bandwidth is more than adequate for games (because its copy once use frequently), but it might be a limiting factor in a professional workflow where you have to move a lot of data back and forth. Again, it is just a conjecture on my side, I haven't looked at any benchmarks. I imagine latency being a bigger issue than bandwidth in case you need to quickly synchronise some data between CPU and the GPU.

JacobHarvey · Jun 30, 2020

leman said:
PCIe bandwidth is more than adequate for games (because its copy once use frequently), but it might be a limiting factor in a professional workflow where you have to move a lot of data back and forth. Again, it is just a conjecture on my side, I haven't looked at any benchmarks. I imagine latency being a bigger issue than bandwidth in case you need to quickly synchronise some data between CPU and the GPU.

Ah ok professional apps could present some specific workloads that could benefit from more bandwidth, but I still think PCIe 4.0 can provide more than enough bandwidth in such cases.

I also still think that PCIe wouldn't be a serious bottleneck (even from a latency point of view) for GPUs of the performance tier that are put in any Apple Macs (or are likely to be developed for macs in the next few years). People who need such performance for scientific applications and other high performance computing are likely using custom hardware or workstation class GPUs, running specialized programs that aren't relevant for macos at all anyway.

Die size constraints, thermals, the GPU microarchitecture itself etc. are likely to be much more important challenges for Apple

Joelist · Jun 30, 2020

Apple's on die GPUs are still very much performance parts. Let's remember that the A12X GPU with its 7 cores already is at the same power level as the Xox One S. Then remember that not only is the Mac Apple Silicon going to be a new family but it can be run at much higher power levels as a Mac does not have the same thermal restrictions as, say, an iPad.

Woochoo · Jun 30, 2020

leman said:
...
Even if Apple uses LPDDR5 in their new Macs (RAM bandwidth approaching 50 GB/s), they won’t be able to complete with modern GDDR6 etc. solutions that deliver bandwidth of 200 GB/s and higher.
...

What do you think?

I think that's pretty fine for what they actually will be replacing first years. No Macbook has a dGPU with dedicated memory until you go for a 3k MBP 16", and the Apple GPUs are way more cappable than Intel iGPUs.

Now, for a high end machine? Actual Apple GPUs are about the 1050Ti performance, but I bet they can go for a more beefy design with more cores and even dedicated memory for desktop class ones like the iMacs and Mac Pro whenever they consider to switch it to ARM.

ian87w · Jun 30, 2020

I'd say, it makes sense from Apple's perspective.
However, that would question if the RAM will be customer accessible anymore, or will everything be soldered from factory (no DIY upgrades).

A side of me doesn't like it. Unless Apple put plenty of RAM for the base Mac to allow at least 5-year headroom into the life of the Mac, it makes the lifecycle of the device more controlled (which is good for Apple). OTOH, if native apps are more efficient, maybe it won't be an issue for majority of consumers (plus I assume macOS will allow the system to have a swap file, unlike iOS).

Waragainstsleep · Jul 1, 2020

leman said:
There are reasons why I think Apple could potentially pull this off. First of all, this kind of system is going to be very expensive (interposers are complex and cost a lot of money). This is probably why we don’t see it much in everyday computing as companies prefer more conservative solutions that scale to different markets. But Apple doesn’t care about this. They don’t have to cater to different markets, they have their target pretty much locked in. The 16” MBP already costs a lot of money - and they might as well funnel the savings from using their own chips into a more expensive memory subsystem. This would also be advantageous to Apple, since nobody else would be even close to offering anything even remotely comparable.

What do you think?

Love the sound of it. If I understand correctly, the cost of these interposers is in the R&D stage where Apple is happy to invest its vast amounts of cash. It also sounds like once they have this running, they can scale it up or down to any SoC they want. So while it might start in the Mac Pro, it will eventually filter all the way down to even consumer MacBooks. People are already underestimating the value of Apple being able to differentiate itself from competitors. When they had PPC, they had a certain advantage particularly when they were first to go properly 64-bit with the G5 and people were falling over themselves to build Apple based supercomputer clusters from PowerMacs and then Xserves. With Intel inside they lost that advantage and its why they essentially abandoned the research markets because you could usually get something newer and cheaper from Dell or HP since they Xserves didn't update their CPU options more than once a year or so as Apple don't like tweaking models every three months.

Now they can design their own chips with all sorts of extra dedicated modules included for things like video processing, encryption, machine learning, etc they can do all sorts. Having some spectacular memory system that also gives them the ability to catch up and maybe even overtake AMD and Nvidia very quickly sounds like an absolute no brainer. Even less reliance on partners, lower production costs and superior products to attract new customers to the platform.
Currently there is nothing known about running Windows for ARM on these new Macs. Without that they are going to need to expand their market share to tempt software developers who have always held out into porting their product to Mac. Having hundreds of millions of iOS users will help, but I do think they are going to want to sell a lot more Macs than they have before.

Industry leading GPU performance with low power, great core count scaleability and insanely fast data throughput due to the memory system would mean they'd be building supercomputers again in no time flat.

leman · Jul 1, 2020

ian87w said:
However, that would question if the RAM will be customer accessible anymore, or will everything be soldered from factory (no DIY upgrades).

User-upgradeable RAM is gone and not coming back, especially on portable machines. Pursuit of higher performance, lower memory consumption and reliability is not really compatible with modular RAM...

For Mac Pro, modular RAM might still be a thing. We don't really know what Apple is going to do with the MP during the ARM transition. I can understand how their SoC chips can be utilized in laptops (up to MacBook Pro), but I don't really see them being a good fit for a Mac Pro without major changes. We'll have to wait and see.

No dedicated video RAM on ARM Macs

macrumors G4

macrumors Core

macrumors 601

macrumors G4

macrumors 65816

macrumors 6502a

macrumors 6502

macrumors G4

macrumors regular

macrumors Core

macrumors regular

macrumors Core

macrumors G4

macrumors Core

macrumors regular

macrumors 603

macrumors regular

macrumors regular

macrumors Core

macrumors regular

macrumors 6502

macrumors 6502a

macrumors G3

macrumors 6502a

macrumors Core

Our Staff