I haven't seen that figure, maybe you are mixing it with early demos for example. Care to show where you got that 800MB/s from?
Of course it's from demos, as Intel won't publish a full specification on TB (they've given technical details on the hardware itself, but nothing on the protocol conversions to date).
But if you want a source, take a look here (specifically, under the Third Party Support heading).
Thunderbolt supports two channels of 10Gbps (equivalent to about 1280MBps) transfers in both directions, simultaneously. Intel demonstrated actual throughputs of up to 6.25Gbps (800MBps) using prototype consumer products.
BTW, the math checks out, so the author wasn't smoking anything.
(10Gb/s / 8) * 1024 = 1280MB/s (max sustainable bandwidth), which is in excess of what the PCIe lanes can actually handle. Sounds great.
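Just to make the conversion explicit (the x1024 step is the binary GB-to-MB conversion, which is the same convention behind the article's 800MBps figure), here's a quick back-of-the-envelope in Python:

# Raw Thunderbolt channel rate -> MB/s, counting 1GB as 1024MB
link_rate_gbps = 10.0                       # one TB channel, one direction
max_mb_per_sec = link_rate_gbps / 8 * 1024  # 1280 MB/s
demo_gbps = 6.25                            # the throughput Intel demoed
demo_mb_per_sec = demo_gbps / 8 * 1024      # 800 MB/s -- matches the quote above
print(max_mb_per_sec, demo_mb_per_sec)      # 1280.0 800.0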
But there's still the latency to take into account (8ns, which I verified today), and more importantly, there's absolute silence on the protocol aspects of it. If you recall, TB is a dual-protocol interconnect (conversions on both ends).
Let me give you an example using SATA's 8b/10b encoding alone with the above calculations...
(8/10) * 1280MB/s = 1024MB/s. Still decent vs. the 800 - 850 or so (though 800 - 850 isn't a slouch by any means).
Now take the latency into account (and it's per packet, BTW), and you'll get less in terms of real-world throughput.
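If it helps, here's the same math as a quick sketch, with a purely illustrative latency term tacked on. The packet size is a guess on my part (Intel hasn't published the framing), and a real link pipelines transfers, so treat it as the shape of the argument rather than a prediction:

# 8b/10b: only 8 of every 10 bits on the wire are payload.
raw_mb_per_sec = 1280
after_8b10b = raw_mb_per_sec * 8 / 10           # 1024 MB/s

# Illustrative only: charge the 8ns latency once per packet, serialized.
packet_bytes = 4096                             # assumption -- not from any spec
latency_s = 8e-9                                # the 8ns figure from above
packets_per_sec = after_8b10b * 1024 * 1024 / packet_bytes
fraction_lost = packets_per_sec * latency_s     # time per second spent on latency
effective = after_8b10b * (1 - fraction_lost)
print(after_8b10b, round(effective, 1))         # 1024.0 and a bit under that
# ...and that's before whatever the TB<->PCIe protocol conversion costs,
# which nobody outside Intel knows.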
Where it gets murky, however, is the TB-to-PCIe protocol conversion, as Intel hasn't produced any public data on this.
So all we can go on is the real-world testing that's currently available, which I suspect you'll agree with.
My previous point was about the example you used that got over 1GB/s, as it took cache into account.
A more accurate test IMO would be multiple SSDs over TB, but not attached to any sort of RAID hardware (which takes cache out of the equation, save for the cache on the SSDs themselves), operated independently in a daisy chain on the same TB port, and run simultaneously (i.e. read/write at least one very large file per disk at the same time).
Or better yet, any other test that can get more than 1GB/s over TB without using cache at all, in order to try and determine what TB's actual limits are for real-world performance (which is what matters most to users anyway).
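For what it's worth, here's roughly how I'd script the test I just described -- one big sequential write per SSD, all disks daisy-chained off the same TB port, run at the same time, no RAID anywhere. The mount points are obviously placeholders, and a read pass after a reboot (or cache drop) would be the companion test:

# Simultaneous large sequential writes to several independent SSDs on one TB chain.
import os, time, threading

MOUNTS = ["/Volumes/TB_SSD_1", "/Volumes/TB_SSD_2", "/Volumes/TB_SSD_3"]  # placeholders
CHUNK = os.urandom(1024 * 1024)      # 1MiB of incompressible data
FILE_SIZE_MB = 16 * 1024             # 16GiB per disk -- far bigger than any cache

results = {}

def write_test(mount):
    path = os.path.join(mount, "tb_test.bin")
    start = time.time()
    with open(path, "wb") as f:
        for _ in range(FILE_SIZE_MB):
            f.write(CHUNK)
        f.flush()
        os.fsync(f.fileno())         # force it out to the disk before stopping the clock
    results[mount] = FILE_SIZE_MB / (time.time() - start)   # MB/s per disk

threads = [threading.Thread(target=write_test, args=(m,)) for m in MOUNTS]
for t in threads: t.start()
for t in threads: t.join()

for m, mbps in results.items():
    print("%s: %.0f MB/s" % (m, mbps))
print("Aggregate: %.0f MB/s" % sum(results.values()))

The OS will still buffer some of each write along the way, but with files that size and the fsync before the clock stops, that mostly washes out of the aggregate number.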
Unfortunately, I don't have access to the hardware, but perhaps you do, given your connections with Anandtech.

Hint hint.
So you're saying the 200MB/s higher result with SSDs on the R6 is all due to cache??
Yes. It could easily be that high, and there are systems that can exceed that substantially.
For example, I have a volume built of 8x disks in a RAID 5 that can generate 1.39GB/s during a write due to cache (2GB of it, BTW). But when I turn the cache off, that figure is cut to nearly half (~780MB/s). So we're looking at a boost of 620MB/s or so just due to cache, which is substantial.
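Same back-of-the-envelope treatment, if anyone wants the subtraction spelled out (the exact figure shifts a little depending on whether you count a GB as 1000MB or 1024MB):

# Cache on vs. cache off on the 8-disk RAID 5 volume above.
with_cache_mb    = 1.39 * 1000   # ~1390 MB/s decimal (~1423 if you use 1024)
without_cache_mb = 780
print(with_cache_mb - without_cache_mb)   # call it 600+ MB/s -- right in the ballpark of the figure above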
Granted, my RAID card is more robust than the controller that's in the Pegasus, but cache can have that great of an influence on performance data.