Tile size on Apple GPUs can be one of 32 x 32, 32 x 16 or 16 x 16 pixels. I'd assume each GPU core (or maybe a slice of a core) works on its own tile, possibly on multiple tiles in parallel to hide memory latency.
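If you want to see what your hardware actually picks, Metal reports the tile dimensions on the render command encoder (and lets you request a size via the render pass descriptor's tileWidth/tileHeight). A minimal Swift sketch, assuming an Apple-silicon device and a throwaway render target; the printed values are implementation-defined:

```swift
import Metal

let device = MTLCreateSystemDefaultDevice()!
let queue = device.makeCommandQueue()!

// Throwaway render target so the render pass is valid.
let texDesc = MTLTextureDescriptor.texture2DDescriptor(
    pixelFormat: .bgra8Unorm, width: 256, height: 256, mipmapped: false)
texDesc.usage = .renderTarget
texDesc.storageMode = .private
let target = device.makeTexture(descriptor: texDesc)!

let pass = MTLRenderPassDescriptor()
pass.colorAttachments[0].texture = target
pass.colorAttachments[0].loadAction = .clear
pass.colorAttachments[0].storeAction = .store
// Optionally request smaller tiles (more bytes per pixel):
// pass.tileWidth = 16
// pass.tileHeight = 16

let cmd = queue.makeCommandBuffer()!
let enc = cmd.makeRenderCommandEncoder(descriptor: pass)!
print("tile size:", enc.tileWidth, "x", enc.tileHeight)  // e.g. 32 x 32
enc.endEncoding()
```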
So the amount of tile buffer memory is not that important per se; it only limits how much data you can keep associated with a tile while rendering to it. Apple GPUs currently cap it at 32KB, which gives you 32 bytes of data per pixel at the 32 x 32 tile size, or 128 bytes per pixel if you choose 16 x 16 tiles. This is probably enough for most practical applications.
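A quick back-of-the-envelope check of that per-pixel budget, using the 32KB figure quoted above (it may differ across GPU generations):

```swift
// Per-pixel tile-buffer budget for each supported tile size,
// assuming a 32 KB tile buffer as quoted above.
let tileBufferBytes = 32 * 1024
for (w, h) in [(32, 32), (32, 16), (16, 16)] {
    print("\(w) x \(h): \(tileBufferBytes / (w * h)) bytes/pixel")
}
// Prints: 32 x 32: 32 bytes/pixel
//         32 x 16: 64 bytes/pixel
//         16 x 16: 128 bytes/pixel
```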
To speed things up, they need to a) increase the number of GPU cores, b) improve the memory bandwidth per GPU core, and c) improve each core's ability to work on multiple tiles in parallel. Option c) is the only one that would benefit from larger per-core tile memory.