But even disregarding all that — modern desktop GPUs already use tiling! So it must be worth it for them. Here is a screenshot from a tool I wrote to study rasterization patterns on Macs:
[Attachment: screenshot showing the rasterization pattern on an AMD Navi GPU]
Here, I am drawing two large overlapping triangles, yellow over red, while limiting the number of pixels shaded, on a MacBook Pro with an AMD Navi GPU. As you can see, the GPU tiles the screen into large rectangular areas which are rasterized in tile order (left to right, top to bottom). Inside the large tiles you get smaller square block artifacts. These blocks correspond to the invoked pixel shader groups: since Navi SIMD units are 32 wide, they need to receive work in batches of 32 items to achieve good hardware utilization. The blocks themselves are 8x8 pixels, so once the rasterizer has output an 8x8 block (64 pixels in total), the pixel shader is executed. The tiling capability of Navi is fairly limited: once I go over a certain number of triangles (I think it's about 512, but I need to verify that), it resets. I am sure the details are much more complex, but this gives us a general idea of how this stuff works.

Nvidia is similar (I can't test it since I don't have one, but I have seen results from other people), just with different tile and block sizes. Intel (at least in the Skylake generation) does not use tiling (see the other screenshot I have attached), and it also uses different SIMD dispatch blocks (4x4 pixels, which makes sense given that Intel Gen uses 4-wide SIMD).
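To make the traversal order concrete, here is a toy model in plain Python. It has nothing to do with the actual hardware or driver: the tile size is a made-up placeholder, it ignores triangle coverage and just walks the whole screen, and only the 8x8 dispatch block matches what I measured. Capping the number of shaded pixels, like the tool does, is what exposes this order on screen:

```python
# Toy model of the rasterization order described above. Not real driver
# behaviour: the tile size is a made-up placeholder, only the 8x8
# dispatch block matches what I measured on Navi.

TILE_W, TILE_H = 256, 256     # large rectangular tiles (placeholder size)
BLOCK = 8                     # 8x8 pixel shader dispatch blocks
SCREEN_W, SCREEN_H = 1024, 512

def shading_order(budget):
    """Yield (x, y) pixels in the order a tiled immediate-mode
    rasterizer would shade them, stopping once `budget` is reached."""
    shaded = 0
    # Tiles are walked left to right, top to bottom...
    for ty in range(0, SCREEN_H, TILE_H):
        for tx in range(0, SCREEN_W, TILE_W):
            # ...and inside each tile, pixels are dispatched to the
            # pixel shader in 8x8 blocks.
            for by in range(ty, min(ty + TILE_H, SCREEN_H), BLOCK):
                for bx in range(tx, min(tx + TILE_W, SCREEN_W), BLOCK):
                    for y in range(by, by + BLOCK):
                        for x in range(bx, bx + BLOCK):
                            if shaded >= budget:
                                return
                            shaded += 1
                            yield (x, y)

# Limiting the pixel budget exposes the traversal order: the first
# pixels all come from the top-left tile's first 8x8 block.
print(list(shading_order(budget=4)))   # [(0, 0), (1, 0), (2, 0), (3, 0)]
```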
As you can see, tiling must be cheap and beneficial enough that big players like Nvidia and AMD use it as well. But while they use tiled rendering, it is not deferred rendering: you can see the bottom red triangle being drawn before the yellow triangle paints over it. Every pixel in the overlap gets shaded twice. On a TBDR GPU (iPad, iPhone, etc.) you won't ever see any red pixels, because the pixel shader is never called for the occluded red fragments in the first place.
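The difference in shader work is easy to see with another toy sketch (again just illustrative Python, with made-up squares standing in for the triangles): an immediate-mode renderer, tiled or not, invokes the pixel shader for every covered pixel of every triangle, while a TBDR GPU resolves visibility per tile first, so occluded fragments never reach the pixel shader at all.

```python
def immediate_mode(pixels_red, pixels_yellow):
    # Every covered pixel of every triangle invokes the pixel shader,
    # in submission order: red first, then yellow on top of it.
    invocations = len(pixels_red) + len(pixels_yellow)
    framebuffer = {p: "red" for p in pixels_red}
    framebuffer.update({p: "yellow" for p in pixels_yellow})
    return invocations, framebuffer

def tbdr(pixels_red, pixels_yellow):
    # Per-pixel visibility is resolved before shading; only the
    # surviving (frontmost) fragment of each pixel is ever shaded.
    framebuffer = {p: "red" for p in pixels_red}
    framebuffer.update({p: "yellow" for p in pixels_yellow})
    return len(framebuffer), framebuffer

# Two overlapping "triangles" (squares, to keep the toy simple).
red = {(x, y) for x in range(4) for y in range(4)}
yellow = {(x, y) for x in range(2, 6) for y in range(2, 6)}

imr_invocations, imr_fb = immediate_mode(red, yellow)
tbdr_invocations, tbdr_fb = tbdr(red, yellow)
print(imr_invocations)      # 32: pixels in the overlap are shaded twice
print(tbdr_invocations)     # 28: every visible pixel is shaded exactly once
print(imr_fb == tbdr_fb)    # True: the final image is identical either way
```

The final image is the same either way; the saving on a TBDR GPU is entirely in the shading (and bandwidth) spent on fragments that end up hidden.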