
dj.mooky

macrumors newbie
Original poster
Feb 26, 2008
25
0
This references the post from https://forums.macrumors.com/threads/734883/

So I have successfully used regexes to accomplish my tasks, and I give a sincere thank-you shout-out to all those who helped me, but I have run into a new issue.

I'm not seeing much on Google about this that would serve my purposes anyway, so on to the punchline.

I have a text file that I am parsing for information. I am adding support for a new format this time; however, the sly foxes have written Unicode characters into their txt files.

I have been breaking down the text files on runs of four newlines ("\n"). At every "\n\n\n\n" I break the string into an array; from there I break each piece at every single newline to get the line-by-line information, and generally parse from there. However, this is not working because of a Unicode character... specifically "\Ufeff", which seems to be the byte order mark (BOM), a zero-width character that some editors write into Unicode text files.

This code comes at the beginning of every line, so I am getting errors and am unable to break the string into multiple strings on "\n", because it seems to treat "\n\Ufeff" as a single character and will not break at the \n without taking the \Ufeff along with it. Furthermore, when I attempt
Code:
anArray = [aString componentsSeparatedByString:@"\n\n\n\n\Ufeff"]
it throws a compile error: "incomplete universal character name \Ufeff"

Has anyone dealt with anything like this, and come across a fancy way to remove this specific unicode character? It is really turning into a thorn in my side right now.

Thanks in advance
 
Update:

So I've discovered that my issue is that the code \Ufeff only shows up when the file is UTF16, and if I go into TextEdit and save it as a UTF8 file, it works perfectly. So while I do want to figure out how to parse a UTF16 file without doing anything, does anyone know of a good way to downgrade the string once you import it from a file?

Thanks
 
So while I do want to figure out how to parse a UTF16 file without doing anything, does anyone know of a good way to downgrade the string once you import it from a file?

You can convert it using a number of different NSString methods, such as dataUsingEncoding:allowLossyConversion: followed by initWithData:encoding:
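For readers who want to see the two calls named above chained together, here is a minimal sketch. The helper name RoundTripThroughUTF8 is hypothetical, and the code assumes ARC and a string that has already been loaded into memory; it is an illustration of the suggestion, not necessarily what the replier had in mind.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not from the thread): re-encodes a string as UTF-8
// bytes and decodes those bytes back into a new NSString.
static NSString *RoundTripThroughUTF8(NSString *input) {
    // Step 1: dataUsingEncoding:allowLossyConversion: produces the UTF-8 bytes.
    // Well-formed text is always representable in UTF-8, so lossy
    // conversion is turned off here.
    NSData *utf8Data = [input dataUsingEncoding:NSUTF8StringEncoding
                           allowLossyConversion:NO];
    if (utf8Data == nil) {
        return nil; // the conversion failed
    }
    // Step 2: initWithData:encoding: decodes those bytes into a new string.
    return [[NSString alloc] initWithData:utf8Data
                                 encoding:NSUTF8StringEncoding];
}
```

Note that a U+FEFF sitting in the middle of the text is an ordinary character in UTF-8 as well, so changing the encoding does not by itself remove it.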
 

Excellent thanks... off to the races

I don't suppose I could get you to give me an example of that code in action? I am getting NSStrings that throw selector errors for initWithBytes:length:encoding: type of things.... I know this has to be simpler than I am making it... if push comes to shove I may just write an AppleScript to save them as UTF8 files in TextEdit....

But I'm convinced there has to be an easier way


FOR HARK! there was a better way

Code:
anArray = [aString componentsSeparatedByString:@"\uFEFF"];
// Glue the pieces back together with nothing in between
aNewString = [anArray componentsJoinedByString:@""];
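My problem was that my "u" was capitalized: a capital \U expects eight hex digits (\U0000FEFF), while a lowercase \u takes four (the case of the "FEFF" itself doesn't matter).... this successfully removed all BOMs from the file and allowed regular parsing in my app...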
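The split-and-rejoin fix above can also be sketched as a single replace followed by the four-newline split described at the start of the thread. The helper name RecordsByStrippingBOM is hypothetical, and the code assumes ARC and that the whole file is already in one NSString; treat it as a rough sketch rather than the poster's actual code.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not from the thread): removes every U+FEFF and then
// splits the text into records wherever four newlines appear in a row.
static NSArray *RecordsByStrippingBOM(NSString *fileContents) {
    // \uFEFF takes exactly four hex digits; the eight-digit form is \U0000FEFF.
    // NSLiteralSearch asks for an exact character-by-character match.
    NSString *cleaned =
        [fileContents stringByReplacingOccurrencesOfString:@"\uFEFF"
                                                withString:@""
                                                   options:NSLiteralSearch
                                                     range:NSMakeRange(0, fileContents.length)];
    // With the BOMs gone, the original four-newline separator matches again.
    return [cleaned componentsSeparatedByString:@"\n\n\n\n"];
}
```

Each element of the returned array can then be split on a single "\n" to get the line-by-line information, as in the original parsing approach.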

Much thanks for all the help though, I always appreciate it, and enjoy learning better or different ways to do things.

Until next time, hasta luego mis amigos
 