There are several things to consider. First off, benchmark results comparing current cards with older-generation ones won’t necessarily reflect actual performance in these machines. The reason is that newer cards are built to work best under the PCIe 4.0 spec (with PCIe 3.0 fallback in mind), while older ones were built for the PCIe 3.0 spec (with PCIe 2.0 fallback in mind, which is what these Mac Pros have). Thanks to PCIe’s inherent backwards compatibility, current cards will work in older computers, but performance might take a heavy hit, and a PCIe 4.0 card could perform worse than a theoretically inferior older-generation card. This shouldn’t happen, but it can’t be ruled out without either finding such tests or conducting them yourself.
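To put rough numbers on the generation gap, here is a small sketch of the theoretical x16 link bandwidth per PCIe generation. It only covers the raw link ceiling (signalling rate times encoding efficiency) and says nothing about the compatibility effects speculated about above; the function and table names are made up for illustration.

```python
# Per-lane signalling rate (GT/s) and line-encoding efficiency per PCIe generation.
PCIE_GENERATIONS = {
    2: (5.0, 8 / 10),      # 8b/10b encoding
    3: (8.0, 128 / 130),   # 128b/130b encoding
    4: (16.0, 128 / 130),  # 128b/130b encoding
}


def link_bandwidth_gb_s(generation: int, lanes: int = 16) -> float:
    """Theoretical one-direction bandwidth in GB/s, before protocol overhead."""
    rate_gt_s, efficiency = PCIE_GENERATIONS[generation]
    return rate_gt_s * efficiency * lanes / 8  # 8 bits per byte


for gen in sorted(PCIE_GENERATIONS):
    print(f"PCIe {gen}.0 x16: {link_bandwidth_gb_s(gen):.1f} GB/s")
```

So a PCIe 4.0 card in a PCIe 2.0 slot has roughly a quarter of the link bandwidth it was designed around. How much that costs in practice depends on the workload and would still need real tests.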
Additionally, I vaguely recall that in BIOS mode these Macs don’t even advertise (full) PCIe 2.0/2.1 compatibility the way they do with later (and the current) EFI firmware. For best ACPI compatibility, running Windows in EFI mode will therefore probably also improve performance, especially with GPUs newer than the Mac itself. As for the related firmware corruption issue, OpenCore (when used as the boot manager for chainloading Windows) has protection mechanisms against it when configured correctly.
All of that is rather theoretical and hypothetical so far. That said, I still think the best possible GPU should be a current one (because of the performance-per-watt ratio) that fits into the Mac’s power budget. Per spec, that means a dual 6 pin or single 8 pin powered GPU. However, as Macs are beautifully over-engineered, in my experience a 6+8 pin powered GPU works great, though I’d highly recommend distributing the load between the two mini Molex connectors, meaning: dual 6 pin mini Molex to dual standard 6 pin Molex -> dual female 6 pin Molex to single 8 pin Molex Y-cable -> female 8 pin Molex to 8+6 (or 8+8, mind the TDP!) pin Molex. I ran that configuration for years with a 295W RX Vega 64 without any issues whatsoever (coil whine included).
So to sum up, based on my own experience, any GPU with a TDP of up to 300W should work if it is built with a 6 pin (75W) and an 8 pin (150W) connector (with the additional 75W drawn through the PCIe slot itself, as per spec).
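The arithmetic behind those numbers can be sketched as follows. The connector ratings are the ones quoted above; the 75W per mini connector is an assumption implied by the "dual 6 pin or single 8 pin" per-spec limit, and the even split across both connectors assumes the Y-cable chain really balances the load.

```python
SLOT_W = 75          # PCIe slot itself
SIX_PIN_W = 75       # 6 pin power connector
EIGHT_PIN_W = 150    # 8 pin power connector
MINI_CONNECTORS = 2  # mini 6 pin connectors on the Mac's board (assumed 75W each per spec)

# What the Mac offers per spec: two 6 pin feeds plus the slot.
spec_cable_w = MINI_CONNECTORS * SIX_PIN_W   # 150
spec_total_w = SLOT_W + spec_cable_w         # 225

# What a 6+8 pin card may draw per its connectors.
card_cable_w = SIX_PIN_W + EIGHT_PIN_W       # 225
card_total_w = SLOT_W + card_cable_w         # 300

# Load per mini connector if the cable chain splits it evenly (assumption).
per_mini_w = card_cable_w / MINI_CONNECTORS  # 112.5

print(f"Per spec:       {spec_cable_w} W via cables, {spec_total_w} W total")
print(f"6+8 pin card:   {card_cable_w} W via cables, {card_total_w} W total")
print(f"Balanced load:  {per_mini_w} W per mini connector")
```

With a balanced 6+8 pin setup each mini connector carries up to about 112W instead of the 75W it is specified for, which is why splitting the load evenly matters so much.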
As newer GPUs only come with a 12 pin or dual 8 pin connectors, those had better not exceed 225W, as it’s not clear (unless it’s documented somewhere?) whether they actually draw the remaining 75W solely through the PCIe slot. Performance-wise, some RTX 4070 Supers are within that safe limit without additional modifications to the Mac. The absolute safest bet, staying within the Mac’s spec, would be the most powerful single 8 pin GPU. There are some RTX 4070 variants like that.
That said, I’d recommend going with the admittedly slower, but usually not by too much, AMD RX 6800. With a TDP starting at 250W it is just 25W over the working (outside-spec) limit, and it’s a rather safe assumption that the remaining 25W will be drawn from the slot. The great advantage of that GPU is that it will work with macOS as well (GPU firmware patch required) while not being that much slower in many scenarios. If you’re feeling adventurous, an RX 6800 XT roughly equals an RTX 4070 and, TDP-wise, still stays within 300W, which can work.
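The reasoning for dual 8 pin and 12 pin cards boils down to one question: how much must the card pull from the slot so the cables stay at the 225W that worked for me? A minimal sketch, using only the TDP figures mentioned in this post (the helper name is hypothetical, and real cards split their draw in ways a TDP number doesn't reveal):

```python
CABLE_WORKING_W = 225  # cable load that worked in practice (6+8 pin, outside spec)
SLOT_MAX_W = 75        # most a card may draw from the PCIe slot per spec


def slot_draw_needed(tdp_w: int) -> int:
    """Watts the card must take from the slot to keep the cables at or below 225W."""
    return max(0, tdp_w - CABLE_WORKING_W)


# TDP values as quoted in this post.
cards = {"225W card": 225, "RX 6800": 250, "RX Vega 64": 295}

for name, tdp in cards.items():
    needed = slot_draw_needed(tdp)
    verdict = "ok" if needed <= SLOT_MAX_W else "too much"
    print(f"{name}: needs {needed} W from the slot ({verdict})")
```

A 225W card is fine no matter where it draws its power from, while anything above that relies on the card actually using the slot for the remainder.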
In any case, I really recommend balancing the load between the two mini Molex connectors. I hope this helps.