
czeluff

macrumors 6502
Original poster
Oct 23, 2006
272
2
Hello to all,

I'm still a pretty new programmer, but I'd like to try taking on a more difficult program. Someone at work asked me to write a program that'd do the following:

1. Search for all email addresses within a document (.doc, .pages, .rtf, etc.; the format doesn't matter, whichever is easiest to code).
2. Export the list to an RTF file.

Where should I start looking for the classes and code necessary to do this? I'm a (new) C++ developer at work, so if it's easiest to do in C++ on Windows, I'll do that. But I'd prefer to do this using Objective-C on a Mac.

Any guidance would be greatly appreciated. :)

Chad
 

lee1210

macrumors 68040
Jan 10, 2005
3,182
3
Dallas, TX
Plain text would be the easiest to parse, but there are likely libraries that allow access to the actual text of other file types.

As far as pattern matching goes, querying by way of NSPredicate would likely work, but as has been mentioned here before, what counts as a valid email address is very complex. If you simplified to anything between two spaces that contains an @, you might get close, but you would assuredly get false positives.
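A rough sketch of that simplified approach in plain C, assuming the document text is already in memory as a NUL-terminated ASCII string. The names find_email_candidates and MAX_TOKEN are made up for illustration. It treats any run of non-whitespace characters with an @ somewhere in the middle as a candidate, so it does no real validation and will keep trailing punctuation such as a comma after an address:

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define MAX_TOKEN 255  /* hypothetical limit on candidate length */

/* Copies up to max_out whitespace-delimited tokens that contain an '@'
 * (not as first or last character) into out; returns how many were found. */
size_t find_email_candidates(const char *text,
                             char out[][MAX_TOKEN + 1], size_t max_out)
{
    size_t count = 0;
    const char *p = text;

    while (*p != '\0' && count < max_out) {
        /* Skip the whitespace in front of the next token. */
        while (*p != '\0' && isspace((unsigned char)*p)) p++;
        const char *start = p;

        /* Advance to the end of the token. */
        while (*p != '\0' && !isspace((unsigned char)*p)) p++;
        size_t len = (size_t)(p - start);

        /* Keep the token only if '@' has something on both sides. */
        const char *at = memchr(start, '@', len);
        if (at != NULL && at != start && at != start + len - 1
            && len <= MAX_TOKEN) {
            memcpy(out[count], start, len);
            out[count][len] = '\0';
            count++;
        }
    }
    return count;
}
```

Calling it with a buffer such as char found[8][MAX_TOKEN + 1] fills found with the candidates in the order they appear. A lone "@" is skipped, but something like "foo@bar" still passes, which is exactly the false-positive problem.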

-Lee
 

kainjow

Moderator emeritus
Jun 15, 2000
7,958
7
As for reading in those file formats, you could use Cocoa and the NSAttributedString class, which can read Word, RTF, and HTML and give you back a plain text representation, which then could be used to search for emails. Here's a sample (untested, assumes ASCII):

Code:
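#import <Cocoa/Cocoa.h>  // NSAttributedString's initWithPath: is an AppKit addition
#include <stdlib.h>      // malloc, free

// Returns a malloc'd ASCII C string holding the document's plain text,
// or NULL on failure. The caller must free() the result.
char* plainTextFromFile(const char *file) {
    char *string = NULL;
    NSAutoreleasePool *pool = [[NSAutoreleasePool alloc] init];
    NSAttributedString *astr = [[[NSAttributedString alloc] initWithPath:[NSString stringWithUTF8String:file] documentAttributes:nil] autorelease];
    if (astr) {
        NSString *plain = [astr string];
        size_t size = [plain length] + 1;  // room for the terminating NUL
        string = malloc(size);
        // getCString: returns NO if the text is not pure ASCII or does not fit
        if (string && ![plain getCString:string maxLength:size encoding:NSASCIIStringEncoding]) {
            free(string);
            string = NULL;
        }
    }
    [pool release];
    return string;
}

You could call that from a C/C++ program; just make sure you compile the file as Objective-C (the easiest way is to give it a .m extension with gcc) and link with the Cocoa framework, since the NSAttributedString file-reading methods come from AppKit rather than Foundation :) From C++, declare the function as extern "C", and remember to free() the returned buffer when you're done with it.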
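For the second half of the original request, writing the list out as RTF, here is a minimal sketch in plain C, assuming the addresses have already been collected into an array of ASCII strings. The function name write_rtf_list and the Helvetica font choice are made up for illustration. It emits only the bare RTF header plus one paragraph per address, and escapes the three characters RTF treats specially:

```c
#include <stddef.h>
#include <stdio.h>

/* Writes each item as its own paragraph in a minimal RTF document.
 * Returns 0 on success, -1 if writing to out failed. */
int write_rtf_list(FILE *out, const char *const *items, size_t count)
{
    /* RTF header: version, character set, and a one-entry font table. */
    fputs("{\\rtf1\\ansi\\deff0{\\fonttbl{\\f0 Helvetica;}}\\f0\n", out);

    for (size_t i = 0; i < count; i++) {
        for (const char *c = items[i]; *c != '\0'; c++) {
            /* Backslash and braces are RTF control characters. */
            if (*c == '\\' || *c == '{' || *c == '}')
                fputc('\\', out);
            fputc(*c, out);
        }
        fputs("\\par\n", out);  /* end of paragraph */
    }

    fputs("}\n", out);  /* close the document group */
    return ferror(out) ? -1 : 0;
}
```

Open the destination with fopen("emails.rtf", "w") (the file name is just an example), pass the FILE pointer in, and fclose it afterwards. Non-ASCII text would need RTF's own escape sequences, which this sketch does not handle.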
 