Yes, instanced rendering can draw a huge number of objects. The problem is that the CPU cannot prepare the data for the GPU at a reasonable frame rate, especially when there are more than 100k objects and the scene is fully dynamic. The GPU can handle both scene processing and draw command generation with much better performance.
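To give a rough idea of what "draw command generation" means here, this is a CPU-side C++ simulation of the per-object work a compute pass could do. It is only a sketch and not our actual implementation: the SceneObject struct, the distance-based visibility test, and the countVisibleInstances name are all made up for illustration. On the GPU each loop iteration would be one thread and the increment would be atomic.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical per-object record; a real scene would carry transforms, bounds, etc.
struct SceneObject {
    uint32_t geometry;  // index of the geometry variation this object uses
    float distance;     // distance to the camera (placeholder for real culling data)
};

// Counts visible instances per geometry. The result is what would be written
// into the instanceCount field of each geometry's indirect draw arguments.
inline std::vector<uint32_t> countVisibleInstances(const std::vector<SceneObject>& objects,
                                                   uint32_t numGeometries,
                                                   float maxDistance) {
    assert(maxDistance >= 0.0f);
    std::vector<uint32_t> instanceCounts(numGeometries, 0);
    for (const SceneObject& object : objects) {
        if (object.geometry >= numGeometries) continue;  // ignore invalid records
        if (object.distance > maxDistance) continue;     // assumed visibility test
        ++instanceCounts[object.geometry];               // atomic add on a real GPU
    }
    return instanceCounts;
}
```

The point of the sketch is that the CPU never touches individual objects: it only submits one indirect draw per geometry, and the counts come from the GPU pass.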
The Metal API allows only a single draw per indirect draw command. This means there is no way to draw different geometries with one command, so each geometry variation needs its own draw command:
Encodes a draw command that renders multiple instances of a geometric primitive with indexed vertices and indirect arguments.
developer.apple.com
Other APIs, except D3D11 and OpenGL ES, have multi-draw-indirect-count functionality, which combines multiple draw-indirect commands into a single API call. No CPU-GPU synchronization is required for that.
We emulate multi-draw-indirect-count as a loop of draw-indirect commands on D3D11, OpenGL ES, and Metal. On the previous generation of Nvidia GPUs this emulation is even faster than the native path because of a driver issue:
D3D11 71 FPS (GTX 1060):
GravityMark GPU Benchmark
gravitymark.tellusim.com
Vulkan 58 FPS (GTX 1060):
GravityMark GPU Benchmark
gravitymark.tellusim.com
Unfortunately, that is not the case with the Metal API, where the same hardware is three times slower.
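For anyone wondering what the emulation loop mentioned above looks like, here is a minimal C++ sketch. The argument struct mirrors the five 32-bit fields of Metal's MTLDrawIndexedPrimitivesIndirectArguments; everything else (the IndirectDraw callback, emulateMultiDrawIndirect, mockGpuExecute) is a hypothetical stand-in and not a real API. It also assumes that the GPU pass writes instanceCount = 0 into unused slots, so the CPU can always issue the maximum number of draws without reading anything back.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <functional>
#include <vector>

// Same field order as MTLDrawIndexedPrimitivesIndirectArguments.
struct DrawIndexedIndirectArgs {
    uint32_t indexCount;
    uint32_t instanceCount;
    uint32_t indexStart;
    int32_t  baseVertex;
    uint32_t baseInstance;
};
static_assert(sizeof(DrawIndexedIndirectArgs) == 20, "five packed 32-bit fields");

// Hypothetical stand-in for one API-level indirect draw call. It receives only
// the byte offset of its argument record inside the GPU-written buffer.
using IndirectDraw = std::function<void(size_t byteOffset)>;

// Emulated multi-draw: one indirect draw per slot. The CPU never reads the
// buffer, so no CPU-GPU synchronization is needed. Returns the number of calls.
inline size_t emulateMultiDrawIndirect(size_t maxDraws, size_t stride,
                                       const IndirectDraw& draw) {
    assert(stride >= sizeof(DrawIndexedIndirectArgs));
    for (size_t i = 0; i < maxDraws; ++i) {
        draw(i * stride);
    }
    return maxDraws;
}

// Mock of the GPU side, for testing only: returns how many instances a record
// would render. Records with zero indices or zero instances draw nothing.
inline uint32_t mockGpuExecute(const std::vector<uint8_t>& buffer, size_t byteOffset) {
    DrawIndexedIndirectArgs args;
    std::memcpy(&args, buffer.data() + byteOffset, sizeof(args));
    return args.indexCount == 0 ? 0 : args.instanceCount;
}
```

The cost of this approach is one API call per geometry variation even when a slot ends up empty, which is exactly the per-command overhead that differs so much between drivers.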
The Metal API also supports indirect command buffers, but our tests show that they are slower than a loop of simple indirect draw commands. That makes them practically useless for now; perhaps only future Apple hardware will benefit from them.