This is the closest thing to a spec I could find, but the page I actually need is gone.
http://developer.apple.com/technotes/tn/tn1150.html#CanonicalDecomposition
This is the page I would like to find:
http://developer.apple.com/technotes/tn/tn1150table.html
The Unicode Decomposition table contains a list of characters that are illegal as part of an HFS Plus string, and the equivalent character(s) that must be used instead. Any character appearing in a column titled "Illegal", must be replaced by the character(s) in the column immediately to the right (titled "Replace With").
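Until the table turns up, here is a rough stand-in. The anchor in the first link is "CanonicalDecomposition", so I'm assuming standard Unicode NFD (Python's unicodedata.normalize) is close to what the table does. That is only an assumption: the real HFS Plus table may differ from NFD for some characters, and the function name is just mine.

```python
import unicodedata

def approx_hfs_decompose(name):
    # Assumption: standard NFD approximates the technote's decomposition table.
    # The actual HFS Plus table may treat some characters differently.
    return unicodedata.normalize("NFD", name)

composed = "caf\u00e9"                    # "e with acute" as one precomposed character
decomposed = approx_hfs_decompose(composed)
print([hex(ord(c)) for c in decomposed])  # code points after decomposition
```

The precomposed character plays the part of the "Illegal" column, and the base letter plus combining mark plays the part of "Replace With".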
If you can come up with the list of illegal characters, I can help you with the regex. The easy illegal character is ":", but the missing page promises some others.
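For the regex side, something like this sketch would do. The list is a placeholder holding only ":", since that is the only character I'm sure of; the names and the "_" replacement are my own choices, and replacing everything with one character is a simplification (the table gives a specific replacement per character).

```python
import re

# Placeholder list: only ":" is known so far; extend it once the table is found.
ILLEGAL_CHARS = [":"]

# Build a character class, escaping each entry so regex metacharacters stay literal.
ILLEGAL_RE = re.compile("[" + "".join(re.escape(c) for c in ILLEGAL_CHARS) + "]")

def replace_illegal(name, replacement="_"):
    # Swap every listed character for the replacement string.
    return ILLEGAL_RE.sub(replacement, name)

print(replace_illegal("notes: draft 2.txt"))
```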
-numero