I compared two identical EVGA GTX 570 2.5 GB cards using CUDA-Z. One of the cards had the EFI boot ROM modification performed by MacVidCards. The computer was the same; I just swapped the card and ran the CUDA-Z test, Xbench, and a subjective test playing back a Red 5K timeline in Adobe Premiere.
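The cards scored essentially identically on everything except the Memory Copy tests between host and device, in both directions. With pinned memory the flashed card was about twice as fast (roughly 5680 MiB/s versus 2950 MiB/s), and with pageable memory it was about 26% faster (roughly 3530 MiB/s versus 2800 MiB/s). Device to Device copies and the GPU core results were unchanged. I am assuming this is because the flashed card runs at double the PCIe link speed, but I am not an expert at these things.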
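For anyone who wants to check the ratios, here is a small Python sketch that divides the flashed card's Memory Copy figures by the stock card's. The numbers are copied from the two CUDA-Z reports in this post; the `speedups` helper and the dictionary names are just my own illustration, not anything CUDA-Z provides.

```python
# Memory Copy rates in MiB/s, copied from the two CUDA-Z reports in this post.
stock = {
    "Host Pinned to Device": 2946.53,
    "Host Pageable to Device": 2791.7,
    "Device to Host Pinned": 2958.62,
    "Device to Host Pageable": 2799.05,
}
flashed = {
    "Host Pinned to Device": 5681.18,
    "Host Pageable to Device": 3533.28,
    "Device to Host Pinned": 5679.95,
    "Device to Host Pageable": 3529.13,
}


def speedups(before, after):
    """Return the after/before ratio for every test present in both reports."""
    return {name: after[name] / before[name] for name in before if name in after}


# Hypothetical helper usage: print one ratio per test.
ratios = speedups(stock, flashed)
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}x")
```

The pinned transfers come out a little under 2x and the pageable ones around 1.26x, which would fit a faster bus link with pageable copies being held back by something other than the link, though that interpretation is a guess on my part.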
We tried playing back a Premiere timeline of 5K Red footage with effects at various resolutions, but both cards "felt" identical (neither played at full resolution). The flashed card didn't subjectively seem any faster. If anyone has an idea how to stress the cards to show the difference, let me know and I will try to redo the test.
The OpenCL fix (libclh.dylib) made everything much slower, even things like Finder window redraws, so we uninstalled it. To be honest, I don't fully understand the OpenCL fix.
CUDA-Z Report Stock Card
=============
Version: 0.6.133 SVN Built Jun 25 2010 23:28:46
http://cuda-z.sourceforge.net/
OS Version: Mac OS X 10.8 12A269
Driver Version: 8.0.51 295.30.00f01
Driver Dll Version: 5.0
Runtime Dll Version: 3.0
Core Information
----------------
Name: GeForce GTX 570
Compute Capability: 2.0
Clock Rate: 1464 MHz
Multiprocessors: 15
Warp Size: 32
Regs Per Block: 32768
Threads Per Block: 1024
Threads Dimensions: 1024 x 1024 x 64
Grid Dimensions: 65535 x 65535 x 65535
Watchdog Enabled: Yes
Integrated GPU: No
Concurrent Kernels: Yes
Compute Mode: Default
Memory Information
------------------
Total Global: 2559.69 MiB
Shared Per Block: 48 KiB
Pitch: 2048 MiB
Total Constant: 64 KiB
Texture Alignment: 512 B
Texture 1D Size: 65536
Texture 2D Size: 65536 x 65535
Texture 3D Size: 2048 x 2048 x 2048
GPU Overlap: Yes
Map Host Memory: Yes
Error Correction: No
Performance Information
-----------------------
Memory Copy
Host Pinned to Device: 2946.53 MiB/s
Host Pageable to Device: 2791.7 MiB/s
Device to Host Pinned: 2958.62 MiB/s
Device to Host Pageable: 2799.05 MiB/s
Device to Device: 59.5438 GiB/s
GPU Core Performance
Single-precision Float: 1392.71 Gflop/s
Double-precision Float: 175.572 Gflop/s
32-bit Integer: 700.618 Giop/s
24-bit Integer: 699.786 Giop/s
Generated: Thu Aug 16 20:09:10 2012
CUDA-Z Report EFI Flashed MacVidCards card
=============
Version: 0.6.133 SVN Built Jun 25 2010 23:28:46
http://cuda-z.sourceforge.net/
OS Version: Mac OS X 10.8 12A269
Driver Version: 8.0.51 295.30.00f01
Driver Dll Version: 5.0
Runtime Dll Version: 3.0
Core Information
----------------
Name: GeForce GTX 570
Compute Capability: 2.0
Clock Rate: 1464 MHz
Multiprocessors: 15
Warp Size: 32
Regs Per Block: 32768
Threads Per Block: 1024
Threads Dimensions: 1024 x 1024 x 64
Grid Dimensions: 65535 x 65535 x 65535
Watchdog Enabled: Yes
Integrated GPU: No
Concurrent Kernels: Yes
Compute Mode: Default
Memory Information
------------------
Total Global: 2559.56 MiB
Shared Per Block: 48 KiB
Pitch: 2048 MiB
Total Constant: 64 KiB
Texture Alignment: 512 B
Texture 1D Size: 65536
Texture 2D Size: 65536 x 65535
Texture 3D Size: 2048 x 2048 x 2048
GPU Overlap: Yes
Map Host Memory: Yes
Error Correction: No
Performance Information
-----------------------
Memory Copy
Host Pinned to Device: 5681.18 MiB/s
Host Pageable to Device: 3533.28 MiB/s
Device to Host Pinned: 5679.95 MiB/s
Device to Host Pageable: 3529.13 MiB/s
Device to Device: 58.8025 GiB/s
GPU Core Performance
Single-precision Float: 1387.29 Gflop/s
Double-precision Float: 175.602 Gflop/s
32-bit Integer: 700.318 Giop/s
24-bit Integer: 699.442 Giop/s
Generated: Thu Aug 16 20:48:01 2012