I'm looking at this CalDigit card to help spread the workload... run 2 HDDs off the card and the other 2 HDDs off the cMP's internal ports, and keep them all in the same RAID 0 configuration I have now. That should raise the ceiling to more like 800-900MB/s, since the load would be split between the cMP's native SATA ports and the card's own separate PCIe bandwidth.
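Rough math behind that 800-900MB/s figure (the per-drive number below is an assumption, not something I've measured on my actual disks):

```python
# Back-of-the-envelope for the proposed 2+2 split.
# per_drive_mb_s is an assumed sustained sequential speed for one HDD, not a measurement.
per_drive_mb_s = 220
drives_on_card = 2
drives_on_internal = 2

# Today all 4 drives share the cMP's internal SATA bus, which is what caps me around 550MB/s.
current_ceiling_mb_s = 550

# After the split, each pair rides its own bus, so in theory the ceiling becomes
# whatever the drives themselves can deliver in aggregate.
split_ceiling_mb_s = (drives_on_card + drives_on_internal) * per_drive_mb_s
print(f"Theoretical ceiling after split: ~{split_ceiling_mb_s} MB/s")  # ~880 MB/s
```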
Seems to make sense in theory. But do you see something that I may not be seeing in this kind of setup?
People were benching 4-digit speeds, I think as high as 1500MB/s for a single PCIe storage card and 5900MB/s for the Amfeltec adapter that held 4 cards, but those were synthetic benchmarks. In real life, using real applications for real work, it seems that even with these very fast cards other limitations in the system cap real-world write bandwidth at about 500-600MB/s. There are exceptions that can go faster, like multi-gigabyte single files in certain specialized applications, but those exceptions are rare.
So if you are already pushing 550MB/s, you are doing about as well as most real-world uses can, and you might not hit your 800-900MB/s goal outside of a synthetic benchmark.
It's possible I'm wrong. You are proposing an unusual setup that I haven't seen performance tests for (a RAID spread across SATA and PCIe), and maybe that split will make a difference. Also, my information is from older versions of OS X, so perhaps the system bottleneck has been reduced or removed by a newer feature such as APFS.
If you do this, I hope you post the results for everyone. I wish you the best, but just in case, prepare yourself for the possibility of little real-world difference.
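If you do test it, here is a crude sketch of the kind of check I'd trust more than a synthetic benchmark: stream a multi-gigabyte file onto the striped volume through the normal filesystem and time it. The mount point and sizes are placeholders for your setup.

```python
import os
import time

# Crude sequential-write test: write a big file through the normal filesystem,
# which is closer to what a real application does than a synthetic benchmark.
TARGET_DIR = "/Volumes/RAID0"   # hypothetical mount point -- change to your striped volume
FILE_SIZE_GB = 8                # big enough that RAM caching doesn't dominate
CHUNK_MB = 4

def sequential_write_test(target_dir, file_size_gb=FILE_SIZE_GB, chunk_mb=CHUNK_MB):
    path = os.path.join(target_dir, "write_test.tmp")
    chunk = os.urandom(chunk_mb * 1024 * 1024)   # incompressible data
    total_bytes = file_size_gb * 1024 ** 3
    written = 0
    start = time.time()
    with open(path, "wb") as f:
        while written < total_bytes:
            f.write(chunk)
            written += len(chunk)
        f.flush()
        os.fsync(f.fileno())                     # make sure it actually hit the disks
    elapsed = time.time() - start
    os.remove(path)
    mb_per_s = (written / (1024 * 1024)) / elapsed
    print(f"Wrote {written / 1024**3:.1f} GiB in {elapsed:.1f} s ~ {mb_per_s:.0f} MB/s")
    return mb_per_s

if __name__ == "__main__":
    sequential_write_test(TARGET_DIR)
```

Run it a few times and compare the number against whatever your synthetic benchmark reports; the gap between the two is the part of the 800-900MB/s you probably won't see in real work.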