
mr_roboto

macrumors 6502a
Sep 30, 2020
856
1,866
Thanks for pointing that out; it is very informative. The last slide mentioned that Xcode has several new performance counters related to the MMU (Memory Management Unit), which is the hardware block that translates virtual (logical) addresses to physical addresses. Those were MMU Limiter, MMU Utilization Counter, and MMU TLB Miss Rate. TLB = Translation Lookaside Buffer, which is the MMU's cache for recent virtual-to-physical translations.

The speaker threw that in at the last moment of his talk, with no other explanation. However, it implies that besides the normal GPU-related bottlenecks, there are possible MMU-related bottlenecks the programmer must be aware of. In some of his videos Max Yuryev has speculated that MMU TLB bottlenecks may be limiting GPU scalability on the M1 Max and Ultra.
Correction: in some of his videos Max Yuryev has baselessly speculated that MMU TLB bottlenecks etc. He's not a good source for technical information. He's a YouTube talking head who doesn't even seem to really understand what a TLB is, much less how it could affect graphics performance.

You shouldn't take the existence of performance counters for various pieces of the system as corroboration for Yuryev's ideas.
 

leman

macrumors Core
Oct 14, 2008
19,521
19,677
The speaker threw that in at the last moment of his talk, with no other explanation. However, it implies that besides the normal GPU-related bottlenecks, there are possible MMU-related bottlenecks the programmer must be aware of.

TLB misses pose a huge problem for GPU performance, since a GPU can launch thousands of memory requests simultaneously. This is not new and not exclusive to Apple GPUs; it has been an active field of academic (and practical) research since GPGPU became a reality. A TLB miss counter can be very valuable when designing your memory access patterns.
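To make the point about access patterns concrete, here is a toy sketch in Python. It is not a model of any real GPU MMU: the 16 KiB page size, the 64-entry capacity, the fully associative LRU policy, and the names `tlb_miss_rate`, `sequential`, and `strided` are all assumptions picked for illustration. It only shows why touching many distinct pages in a short window drives the miss rate up, while tightly packed accesses keep it near zero.

```python
from collections import OrderedDict

PAGE_SIZE = 16 * 1024   # assumed page size in bytes (illustrative only)
TLB_ENTRIES = 64        # assumed TLB capacity (illustrative only)


def tlb_miss_rate(addresses, entries=TLB_ENTRIES, page_size=PAGE_SIZE):
    """Fraction of accesses that miss in a toy fully associative LRU TLB."""
    tlb = OrderedDict()  # virtual page number -> placeholder; order tracks recency
    misses = 0
    total = 0
    for addr in addresses:
        total += 1
        vpn = addr // page_size          # virtual page number of this access
        if vpn in tlb:
            tlb.move_to_end(vpn)         # hit: mark as most recently used
        else:
            misses += 1                  # miss: a page-table walk would happen here
            tlb[vpn] = True
            if len(tlb) > entries:
                tlb.popitem(last=False)  # evict the least recently used entry
    return misses / total if total else 0.0


N = 4096   # number of accesses in each pattern
ELEM = 4   # bytes per element

# Pattern A: consecutive accesses read neighbouring 4-byte elements.
sequential = [i * ELEM for i in range(N)]
# Pattern B: every access lands on a different page (stride of one page).
strided = [i * PAGE_SIZE for i in range(N)]

seq_rate = tlb_miss_rate(sequential)
strided_rate = tlb_miss_rate(strided)
print(f"sequential miss rate:   {seq_rate:.4f}")
print(f"page-strided miss rate: {strided_rate:.4f}")
```

In this toy setup the sequential pattern stays inside a single page, so only the first access misses, while the page-strided pattern misses on every access. Real hardware has multiple TLB levels and many requests in flight at once, so treat the numbers as a qualitative illustration only.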
 

joema2

macrumors 68000
Sep 3, 2013
1,646
866
TLB misses pose a huge problem for GPU performance, since a GPU can launch thousands of memory requests simultaneously. This is not new and not exclusive to Apple GPUs; it has been an active field of academic (and practical) research since GPGPU became a reality. A TLB miss counter can be very valuable when designing your memory access patterns.
Interesting; I see that "Reducing GPU Address Translation Overhead with Virtual Caching", Yoon et al. (2016), discusses how traditional TLB caching may not work well for integrated CPU/GPU unified memory designs due to poor locality of reference: https://minds.wisconsin.edu/bitstream/handle/1793/75577/TR1842.pdf?sequence=3&isAllowed=y

The paper proposes a method of alleviating the bottleneck of MMU TLB misses caused by shared CPU/GPU address translation hardware.

With that in mind, it's understandable why the WWDC2022-10159 session you mentioned advised that Xcode developers improve GPU scalability by monitoring the new MMU-related counters, including MMU TLB miss rate.
 