
ChrisA

macrumors G5
Original poster
Jan 5, 2006
Redondo Beach, California
Anyone know of a good library for reading PDF files? I don't need to display them; I need a general-purpose way of extracting embedded metadata.

I've not yet decided on a programming language and may base that choice on the availability of a good PDF library. I do need this to be portable; I like to keep only the GUI platform-specific. Open source is preferred because it will be used by an open source application.

It turns out the USGS has made all of its topographic maps available as free downloads in "geoPDF" form. These maps cover the entire US in extreme detail and can be displayed with any PDF reader, but they also contain metadata (hence the name "geoPDF") that describes the exact map projection. I'm planning to pull this data out of 10,000+ PDF files and store it in a DBMS.
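
For the storage half of the job, a minimal sketch in Python using the standard-library sqlite3 module might look like this. It assumes the projection data has already been pulled out of each PDF. The table name `map_sheet`, its columns, and the helper `store_sheets` are hypothetical names chosen for illustration, not anything defined by the geoPDF format or the USGS.

```python
import sqlite3

# Hypothetical schema: one row per map sheet. The column names are
# illustrative assumptions, not fields defined by the geoPDF format.
SCHEMA = """
CREATE TABLE IF NOT EXISTS map_sheet (
    filename   TEXT PRIMARY KEY,  -- source PDF file name
    projection TEXT,              -- projection description, e.g. WKT text
    bounds     TEXT               -- map corner coordinates, stored as text
)
"""


def store_sheets(conn, rows):
    """Insert or replace (filename, projection, bounds) tuples in one transaction."""
    conn.execute(SCHEMA)
    with conn:  # commits on success, rolls back if an insert fails
        conn.executemany(
            "INSERT OR REPLACE INTO map_sheet (filename, projection, bounds) "
            "VALUES (?, ?, ?)",
            rows,
        )


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    # Placeholder values only; real ones would come from the PDF parser.
    store_sheets(conn, [
        ("sheet_a.pdf", "placeholder projection", "placeholder bounds"),
        ("sheet_b.pdf", "placeholder projection", "placeholder bounds"),
    ])
    print(conn.execute("SELECT COUNT(*) FROM map_sheet").fetchone()[0])  # prints 2
```

Keying on the file name with INSERT OR REPLACE means re-running the job over the whole collection updates existing rows instead of duplicating them, and batching the inserts inside one transaction avoids a commit per row.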

I'll write my own PDF parser if I have to but I thought I'd ask around first.
 
If you are only targeting the Mac, the PDFKit framework will handle everything, and Core Graphics has plenty of PDF support built in as well. (PDFKit is a Cocoa wrapper around the Core Graphics PDF features.)

If you're working cross-platform, maybe look at xpdf or one of the Linux PDF readers and see what they use.
 
If you are only targeting the Mac,...

Thanks. But it needs to be cross-platform.

Just in case anyone comes across this thread via a search, I thought I'd answer my own question.

Google the name "pandalex".

This is a lex/yacc parser for PDF. The user of the library adds callbacks that are called when syntactic features are found. The project is about 90% complete: it works well enough to parse many, many example PDF files, but it lacks "polish" such as a working GNU Autotools build system and automatic regression testing. I downloaded the code and I'm cleaning it up. It now builds warning-free with just ./configure; make. I will try to contribute fixes back, but if I can't contact the author I'll post a "fork". PM me if you need this. It is covered by the GPL.
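
To illustrate the callback style described above, here is a small Python sketch. It is not pandalex's API: it swaps lex/yacc for a single regular expression and covers only a subset of PDF token syntax. The token kinds used as callback keys (`name`, `dict_open` and so on) and the function `tokenize` are hypothetical names chosen for this example, and the sample dictionary is illustrative input, not copied from a real USGS file.

```python
import re

# Regex scanner for a small subset of PDF token syntax.
# Simplifications (assumptions for this sketch): literal strings may not
# contain nested or escaped parentheses; hex strings and comments raise
# ValueError, and stream data is not handled at all.
TOKEN_RE = re.compile(rb"""
    (?P<dict_open><<) | (?P<dict_close>>>) |
    (?P<array_open>\[) | (?P<array_close>\]) |
    (?P<name>/[^\s/\[\]()<>{}%]*) |
    (?P<number>[+-]?(?:\d+\.?\d*|\.\d+)) |
    (?P<string>\([^()\\]*\)) |
    (?P<keyword>[A-Za-z]+) |
    (?P<space>\s+)
""", re.VERBOSE)


def tokenize(data, callbacks):
    """Scan bytes and call callbacks[kind](token_bytes) for each token found.

    Token kinds without a registered callback are skipped silently.
    """
    pos = 0
    while pos < len(data):
        m = TOKEN_RE.match(data, pos)
        if m is None:
            raise ValueError(f"unrecognised byte at offset {pos}")
        kind = m.lastgroup  # name of the alternative that matched
        if kind != "space" and kind in callbacks:
            callbacks[kind](m.group())
        pos = m.end()


if __name__ == "__main__":
    names = []
    # Illustrative dictionary fragment; the key names are only for the demo.
    tokenize(b"<< /Type /Measure /Subtype /GEO /GPTS [1.5 2 3 4] >>",
             {"name": names.append})
    print(names)  # [b'/Type', b'/Measure', b'/Subtype', b'/GEO', b'/GPTS']
```

Collecting name tokens this way is a crude but quick check for whether a file contains a given key. A real tool would also have to decompress streams, since many PDFs keep their objects inside compressed streams that a plain byte scanner never sees.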
 