This isn't just about primitives crossing tiles (overlap) ... each tile also needs to accumulate, sort, and maintain its own primitive list. That's the entire reason a tile exists ... to be an independent work unit. Decreasing the tile size increases that overhead.
The big difference in cost between TBDR and IMR is that a TBDR needs to store the output of the vertex processing stage until all primitives are rasterized, whereas an IMR can rasterize a primitive immediately and discard its data. This is the primary cause of the scaling concerns. In a sense, TBDR's bet is that processing the geometry is less costly than processing the pixels. If your geometry list is long and the geometry itself is complex, with many very small triangles (we are talking a few pixels in size) and little to no overdraw, that assumption can be violated.

In comparison, the overhead of managing bins is negligible. Clipping and binning are done on very fast fixed-function hardware, and the bins only need to store a list of the primitive ids intersecting the tile (in practice these are 16-bit indices, so the bandwidth overhead is very small compared to the size of the transformed vertex data). From what I've read, GPUs manage the geometry recording phase by aggressively compressing data and by putting a limit on the buffer size. Once you get too many primitives per tile, that tile is flushed, which keeps things from getting stuck.

Probably one of the biggest challenges of TBDR (and only TBDR) is transparency. If you encounter a transparent pixel, you have to flush the tile as well, since that pixel needs access to the underlying image. Keeping track of all this while maintaining good performance is an engineering nightmare, which is probably why only one company to date has managed to figure out TBDR in practice.
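To make the binning and flushing idea concrete, here's a toy sketch (my own illustration, not any vendor's actual data structures; the 4096-primitive budget is an arbitrary assumption): a per-tile bin that records 16-bit primitive indices and signals when the tile has to be flushed, either because the bin overflowed or because a transparent primitive needs the pixels underneath it resolved first.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy illustration of a TBDR-style tile bin. Real hardware compresses this
// data aggressively; this only shows the bookkeeping, not the layout.
struct TileBin {
    static constexpr std::size_t kMaxPrims = 4096;  // assumed on-chip budget
    std::vector<std::uint16_t> primIds;             // indices into the post-transform vertex data

    // Record a primitive that intersects this tile; returns true if the tile
    // should be rasterized and shaded now (an early "partial" flush).
    bool record(std::uint16_t primId, bool isTransparent) {
        primIds.push_back(primId);
        // Overflow keeps the parameter buffer bounded; transparency forces a
        // resolve because blending needs the underlying color.
        return primIds.size() >= kMaxPrims || isTransparent;
    }
};
```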
So far, it seems like an IMR should always do better with complex geometry and small triangles. But it's not that simple. First of all, modern IMRs also use binning, so they have to deal with some of the above issues as well (the common approach here seems to be large tiles and early flushes). Second, modern IMRs have an even bigger problem with small triangles, since they break SIMD coherency. A modern desktop GPU such as Navi or Ampere needs 32 data items to be processed simultaneously in order to get good shading unit utilization. This means that pixel shaders are executed on blocks of pixels (Navi, for example, uses 8x8 blocks). For optimal performance, all pixels in the block have to belong to the same triangle. Edges are always a problem, since they mean that parts of the SIMD units are not doing anything useful. Small triangles are basically all edges, so if most of your triangles are like this, your performance will tank big time. A TBDR, in contrast, has no problem here because it always shades an entire tile at once: it can shade pixels belonging to multiple different triangles in one go, which an IMR simply can't do. So you know, you win some, you lose some.
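To put rough numbers on it (a purely illustrative calculation, using the same simplifying assumption as above that every pixel in a block has to come from one triangle):

```cpp
#include <cstdio>

// Estimates SIMD lane utilization when an IMR shades one triangle at a time
// over a fixed-size pixel block (e.g. 8x8 = 64 lanes). Illustrative only.
double laneUtilization(int coveredPixels, int blockLanes = 64) {
    return static_cast<double>(coveredPixels) / blockLanes;
}

int main() {
    // A large triangle covering most of the block keeps the SIMD units busy...
    std::printf("56/64 pixels covered -> %.0f%% utilization\n", 100 * laneUtilization(56));
    // ...while a tiny 4-pixel triangle wastes almost the whole wave.
    std::printf(" 4/64 pixels covered -> %.0f%% utilization\n", 100 * laneUtilization(4));
}
```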
But the question is whether moving the work to the middle of the pipeline is worth it (which is where the scaling issue is). And saving memory bandwidth matters on mobile devices, but it's generally not the bottleneck on desktop GPUs.
The scaling issue is entirely in the frontend. Bandwidth is always a problem. Why would desktop GPUs use faster and faster RAM, with wider and wider memory buses, otherwise? Latency is an even bigger issue.
TBDR is simply about more efficient utilization of resources. There is no reason why a TBDR GPU could not be paired with high-bandwidth RAM, for example; it's just that it doesn't necessarily need that RAM at a given performance level. But if Apple is serious about higher performance tiers, they will have to use fast RAM. LPDDR5 will only get you so far, even with a TBDR.
Well that's a whole other can of worms to deal with. The differences between IMR and TBDR appear to be difficult to abstract over via any sort of API, which means ports are going to be suboptimal without a lot of extra work by developers.
Depends on how much of the underlying architecture detail you want to expose. On a fundamental level, you don't need to abstract much. Any standard API uses the same abstraction: triangles in, rasterized in order, pixels out. Both IMR and TBDR are very happy to work with this model. It only starts getting tricky if you want to get more out of the hardware.
Not to mention that modern APIs already have some optimizations for TBDR GPUs. Apple, for example, introduced render passes: sort of contracts that describe how framebuffer resources are being used. These can be useful for tilers (any tiler, not just TBDR) in a low-bandwidth environment, since they can use the information to optimize memory transfers. For IMR GPUs they don't do much (though this could change in the near future as tiling becomes more important). Render pass APIs were subsequently adopted in Vulkan, DX12 and WebGPU, so it's now an industry-standard abstraction. So if your Vulkan game uses render passes, it will automatically run "better" on Apple Silicon.
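As a concrete example of what that "contract" looks like on the Vulkan side (the formats and values here are just illustrative choices), the declared load/store ops are what let a tiler skip memory traffic entirely for data it only needs on-chip:

```cpp
#include <vulkan/vulkan.h>
#include <array>

// A tiler reads the declared load/store ops to decide what actually has to
// touch main memory. Here, depth lives entirely in on-chip tile memory.
std::array<VkAttachmentDescription, 2> describeAttachments() {
    VkAttachmentDescription color{};
    color.format        = VK_FORMAT_B8G8R8A8_UNORM;
    color.samples       = VK_SAMPLE_COUNT_1_BIT;
    color.loadOp        = VK_ATTACHMENT_LOAD_OP_CLEAR;      // no need to fetch old contents from RAM
    color.storeOp       = VK_ATTACHMENT_STORE_OP_STORE;     // result is needed, write it out once per tile
    color.initialLayout = VK_IMAGE_LAYOUT_UNDEFINED;
    color.finalLayout   = VK_IMAGE_LAYOUT_PRESENT_SRC_KHR;

    VkAttachmentDescription depth{};
    depth.format         = VK_FORMAT_D32_SFLOAT;
    depth.samples        = VK_SAMPLE_COUNT_1_BIT;
    depth.loadOp         = VK_ATTACHMENT_LOAD_OP_CLEAR;      // cleared in tile memory
    depth.storeOp        = VK_ATTACHMENT_STORE_OP_DONT_CARE; // never written back: no bandwidth cost on a tiler
    depth.stencilLoadOp  = VK_ATTACHMENT_LOAD_OP_DONT_CARE;
    depth.stencilStoreOp = VK_ATTACHMENT_STORE_OP_DONT_CARE;
    depth.initialLayout  = VK_IMAGE_LAYOUT_UNDEFINED;
    depth.finalLayout    = VK_IMAGE_LAYOUT_DEPTH_STENCIL_ATTACHMENT_OPTIMAL;

    return {color, depth};
}
```

On an IMR the same declarations are mostly a no-op; on a tiler they are the difference between writing the depth buffer to RAM every frame and never writing it at all.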
Beyond that, Metal on Apple GPUs of course has a lot of additional features that wouldn't make sense for an IMR at all. Which is no different from any other GPU out there: you can use DX12 to make a game that runs well on both AMD and Nvidia GPUs, or you can use Nvidia's mesh shader extensions to make your game run better on Nvidia GPUs... etc. I remember the good old OpenGL days, where you ended up using non-standard vendor extensions because core OpenGL was outdated crap.
It isn't clear: can macOS users participate in Early Access?
Yes, they have a Mac version. Which came as a bit of a surprise, since Sven Winkle wasn't sure whether the Mac version would make it into early access just a short while ago.
Where is the information saying they will do it? I didn't see anything regarding this.
There was some mention of ARM strings in data-mined patches on the WoW forums; unfortunately, I can't find a link to the source...