That is because graphics cards usually have FP64 rates of around 1/32 of their FP32 rate and are bandwidth limited. A practical example of a device family whose theoretical TFLOPS numbers are well ahead of its practical graphics performance is AMD’s Vega. The big Vega cards were very good compute cards with excellent raw numbers, but for graphics work they are often beaten by RDNA-based GPUs that are much weaker FLOPS-wise.
The Vega 10 die (used in the Radeon RX Vega 56 and 64) had a 1/16 FP64 rate, but it was paired with HBM2 that was fast for the time and had low latency.
The Vega 20 die supports a 1/2 FP64 rate (as on the Instinct MI50/MI60), although the Radeon VII itself was capped at 1/4. It also had even more bandwidth, so that was a very fast compute card.
Something like the modern NVIDIA GeForce RTX 4090 has a 1/64 FP64 rate.
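
To make those ratios concrete, here is a rough sketch of the arithmetic in Python: theoretical FP64 throughput is just peak FP32 throughput multiplied by the FP64 rate. The FP32 figures below are approximate vendor peak numbers that I am assuming for illustration (they are not from the post above), and `fp64_tflops` is a made-up helper name.

```python
from fractions import Fraction


def fp64_tflops(fp32_tflops: float, fp64_rate: Fraction) -> float:
    """Theoretical FP64 throughput from peak FP32 throughput and the FP64:FP32 rate."""
    return fp32_tflops * float(fp64_rate)


# Approximate peak FP32 TFLOPS (assumed, rounded) and the FP64 rates quoted above.
cards = {
    "RX Vega 64 (Vega 10)": (12.7, Fraction(1, 16)),
    "GeForce RTX 4090": (82.6, Fraction(1, 64)),
}

for name, (fp32, rate) in cards.items():
    fp64 = fp64_tflops(fp32, rate)
    print(f"{name}: {fp32} TFLOPS FP32 x {rate} = {fp64:.2f} TFLOPS FP64")
```

With these assumed inputs the Vega 64 lands at roughly 0.79 TFLOPS FP64 and the RTX 4090 at roughly 1.29 TFLOPS, so the ratio only tells you how much of a card's FP32 peak carries over to FP64, not which card is faster in absolute terms.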