Yes, it is bottlenecked in every cMP (both 3,1 and 5,1), but it's so minor it really doesn't matter.
I have had no complaints using the RX580 in mine.
The massive win is the ability to play back and encode H.264 and H.265 (HEVC) video streams on the GPU using AMD hardware acceleration, which doesn't exist for Nvidia cards in macOS. This allows the 3,1 to play back the highest-bitrate 4K video from the jellyfish site with 0 dropped frames; without it, the 3,1 just shows a still image.
A collection of .mkv video clips encoded at various bitrates; useful for testing the network streaming and playback performance of media streamers & HTPCs.
jell.yfish.us
You won't be playing this video back in macOS on that 3,1 with any Nvidia card, but this RX580 can do it flawlessly without even bothering the CPU in the 3,1.
The bottleneck isn't very big, but how much it matters probably depends on what you use the card for.
I compared a Dell T5810 hackintosh (6-core, 12-thread Xeon @ 3.5 GHz) with an RX 580 8 GB Red Devil on PCIe 3.0 (the iMacPro1,1 in the results) to the cMP 3,1 (the OpenCore-booted "VMware7,1" in the results), and the scores aren't much different, considering the cMP 3,1 is also using an RX 580 (the 4 GB Pulse).
Basically a 5k-point loss on the Metal benchmark:
Benchmark results for a VMware7,1 with an Intel Xeon X5482 processor.
browser.geekbench.com
Benchmark results for an iMacPro1,1 with an Intel Xeon E5-1650 v3 processor.
browser.geekbench.com
And about a 4k-point loss on the OpenCL benchmark:
Benchmark results for an iMacPro1,1 with an Intel Xeon E5-1650 v3 processor.
browser.geekbench.com
Benchmark results for a VMware7,1 with an Intel Xeon X5482 processor.
browser.geekbench.com
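For a rough sense of the size of the slot bottleneck discussed above, here is a back-of-the-envelope sketch of theoretical x16 link bandwidth. It assumes the cMP slot runs at PCIe 2.0 x16 (an assumption, not something measured in this post) and uses the standard per-lane transfer rates and line encodings. The function and table names are only illustrative.

```python
# Theoretical one-direction PCIe bandwidth for a graphics slot.
# Assumption (not stated in the post): the cMP slot is PCIe 2.0 x16,
# while the T5810 slot is PCIe 3.0 x16 as the post says.

# generation -> (transfer rate in GT/s per lane, line-encoding efficiency)
GENERATIONS = {
    "2.0": (5.0, 8 / 10),      # 8b/10b encoding
    "3.0": (8.0, 128 / 130),   # 128b/130b encoding
}

def slot_bandwidth_gbs(gen, lanes=16):
    """Return theoretical bandwidth in GB/s for one direction of the link."""
    rate_gt_s, efficiency = GENERATIONS[gen]
    # GT/s * efficiency = usable Gbit/s per lane; divide by 8 for GB/s.
    return rate_gt_s * efficiency / 8 * lanes

for gen in GENERATIONS:
    print(f"PCIe {gen} x16: {slot_bandwidth_gbs(gen):.2f} GB/s")
```

Under those assumptions the older slot has roughly half the link bandwidth, yet the benchmark gap above is far smaller than that, which fits the point that the bottleneck only matters for workloads that actually saturate the bus.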