
Big Ron

macrumors 6502
Original poster
Dec 7, 2012
419
106
United Kingdom
Can anyone suggest a good app to index the contents of PDF files? I have lots of PDFs that I would like to search for a certain keyword. I can search within individual files, but I would like the ability to search a folder for a word or phrase across many PDFs. I’m currently running the latest version of Big Sur. Thanks in advance.
 

grahamwright1

Cancelled
Feb 10, 2008
210
202
Spotlight seems to do a good job with PDFs already, but not filtering on only a specific folder - what problems are you having with it?

I don't use it but have heard good things about Alfred for searching within files - https://www.alfredapp.com
 

BrianBaughn

macrumors G3
Feb 13, 2011
9,822
2,494
Baltimore, Maryland
Are you saying you'd like to search all the PDFs in a location for a keyword and see all the results in context…and not just a list of files that contain the keyword?
 

Big Ron

macrumors 6502
Original poster
Dec 7, 2012
419
106
United Kingdom
Are you saying you'd like to search all the PDFs in a location for a keyword and see all the results in context…and not just a list of files that contain the keyword?
Yes, I would like to search for a word, for example “USB C”, and have the results presented in a format that displays all of the documents that contain that word, together with a short paragraph placing it in context. I have tried Spotlight as grahamwright1 suggests, but it isn’t returning results at all. If I invoke Spotlight to search for the title of the PDF, it always returns a correct result.
 

Big Ron

macrumors 6502
Original poster
Dec 7, 2012
419
106
United Kingdom
Spotlight seems to do a good job with PDFs already, but not filtering on only a specific folder - what problems are you having with it?

I don't use it but have heard good things about Alfred for searching within files - https://www.alfredapp.com
I have tried Spotlight, thanks, but it isn’t finding words within a PDF document, or a Word document for that matter. I’ll look into Alfred, thanks.
 

ZBB

macrumors newbie
Mar 5, 2008
26
0
Take a look at Yep... https://ironicsoftware.com/yep/

I've used it for years (since ~2008) for managing all my scanned personal records. It's sort of a focused Finder tool for PDFs and other document files, but it includes an organization feature: you can put things in "filed documents", which automatically organizes them into year and month folders, just by dragging them to a box that slides out from the side when you drag a file from Finder. It also has a search that works well, and you can tag files (with the tags stored in the Spotlight index). I used to just tag all PDFs before I had an OCR tool. But since I now have an OCR tool, I rarely tag anything unless I'm scanning something from years ago and I want to tag a year to it...
 

HDFan

Contributor
Jun 30, 2007
7,267
3,320
I have lots of PDFs that I would like to search for a certain keyword. I can search within individual files but I would like the ability to search a folder for a word or phrase across many PDFs.
In Finder you can do a search with multiple search criteria (using the + search icon). I just did an "any" search for "searchword" with type PDF and got a list of all the PDFs on my entire system which contain "searchword".

No way to see the word context though. You have to spacebar through the list and scroll through the documents.
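
For anyone who prefers the command line, a similar content search can be limited to one folder with mdfind, the command-line front end to the Spotlight index. The following is only a rough Python sketch: the function names are made up for illustration, it assumes the PDFs are already indexed with searchable text, and exactly how the phrase is matched is up to Spotlight.

```python
import subprocess

PDF_UTI = "com.adobe.pdf"  # uniform type identifier Spotlight records for PDF files


def build_mdfind_args(folder, phrase):
    """Return the mdfind argument list for a content search limited to one folder.

    Hypothetical helper, not part of any library.
    """
    # Escape backslashes and double quotes so the phrase stays inside the query string.
    safe = phrase.replace("\\", "\\\\").replace('"', '\\"')
    # The trailing "cd" asks for case-insensitive, diacritic-insensitive matching.
    query = f'kMDItemContentType == "{PDF_UTI}" && kMDItemTextContent == "{safe}"cd'
    return ["mdfind", "-onlyin", folder, query]


def search_pdfs(folder, phrase):
    """Run the query (macOS only) and return the matching file paths."""
    result = subprocess.run(
        build_mdfind_args(folder, phrase),
        capture_output=True, text=True, check=True,
    )
    return [line for line in result.stdout.splitlines() if line]
```

Calling search_pdfs with a folder path and "USB C" should return the same kind of file list as the Finder search, still without any context around the hits.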
 

mj_

macrumors 68000
May 18, 2017
1,618
1,281
Austin, TX
Spotlight does indeed do a fantastic job at searching PDFs as long as they contain searchable text of course. I use it all the time to find specific documents that I know contain a certain phrase or number (e.g. bank statements for a specific bank account).
 

BrianBaughn

macrumors G3
Feb 13, 2011
9,822
2,494
Baltimore, Maryland
Further to my post about HoudahSpot…I tried it out and the demo is fully functional. Searching my Documents folder for the phrase "Flex pitch" (a Logic Pro feature), it finds two documents and the context of each result is displayed. Looks pretty good:

Screen Shot 2021-05-14 at 1.02.13 PM.png
 

Big Ron

macrumors 6502
Original poster
Dec 7, 2012
419
106
United Kingdom
Thanks for all your suggestions - this forum never fails😀

I seem to get mixed results with Spotlight - perhaps I will rebuild its index.

I have tried PDF Studio Viewer. It seems to do the trick, although the interface is a bit ugly. Search words are successfully found and identified by document source; then, when one is selected, the searched-for word is displayed in the source PDF document. The link is here: https://www.qoppa.com/pdfstudioviewer/

I like the look of HoudahSpot, thanks BrianBaughn. I will investigate that next. Thanks to all who took time to reply.
 

pdxrevolution

macrumors member
Sep 2, 2015
41
69
Yes I would like to search for a word for example: “USB C”, and have the results presented in a format that displays all of the documents that contain that word together with a short paragraph placing it in context. I have tried spotlight as grahamwright1 suggests but it isn’t returning results at all. If I invoke spotlight to search for the title of the PDF it always returns a correct result.

You have two different problems here. The first is that not all PDFs show up on your Mac as searchable. The creator of the PDF has to make the PDF searchable, and if that doesn't happen, then you have to use OCR software to scan each page of the PDF to "find" the text and add it to the PDF. When you say that Spotlight doesn't find any PDFs when you search for "USB C" that means there is no searchable text inside the PDF. When PDFs are searchable, it means that, hidden in the PDF is the equivalent of a text file that is a layer underneath the image on each page. That's what Spotlight can find and index. If that text layer isn't there, there's nothing Spotlight can do for you.

Personally, I use PDFPenPro to OCR my textless PDFs. The Pro version has a batch command that lets you OCR hundreds of PDFs in a row. I also like that PDFPenPro simply modifies the existing PDF and adds the text to it. Some apps (especially in the Windows world) require you to save a copy of the exported PDF, which is a real pain. But there are plenty of other apps that do this, like ABBYY FineReader, DevonThink (I think the Pro version only). Evernote also does this, but it doesn't save the text to the file. Rather, it saves the text to its own database. So, if you ever take the PDF out of Evernote, it's still not searchable.

Once you get your documents OCRd, you're right to say Spotlight will only show you a list of documents with those words. It won't show you where the word is used in the document (your contextual paragraph request). That's where these other app suggestions come in, like Houdah, Yep, and again DevonThink. But you need to OCR your PDFs first.
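
To illustrate what the "contextual paragraph" part amounts to once a document has a text layer, here is a minimal Python sketch. It assumes the text has already been pulled out of the PDF by some other tool (not shown), and find_in_context is a made-up name, not how Houdah, Yep, or DevonThink actually do it.

```python
import re


def find_in_context(text, phrase, width=60):
    """Return each occurrence of phrase with up to `width` characters either side."""
    snippets = []
    # re.escape makes the phrase literal; IGNORECASE mirrors a typical search box.
    for match in re.finditer(re.escape(phrase), text, flags=re.IGNORECASE):
        start = max(0, match.start() - width)
        end = min(len(text), match.end() + width)
        # Collapse newlines and runs of spaces so the snippet reads as one line.
        snippets.append(" ".join(text[start:end].split()))
    return snippets
```

Run over the text of each document that a file-level search returns, this would give the kind of short in-context excerpts the original question asks for. A scanned PDF with no text layer gives it nothing to work on, which is why the OCR step has to come first.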
 

Big Ron

macrumors 6502
Original poster
Dec 7, 2012
419
106
United Kingdom
Spotlight does indeed do a fantastic job at searching PDFs as long as they contain searchable text of course. I use it all the time to find specific documents that I know contain a certain phrase or number (e.g. bank statements for a specific bank account).
You are right, of course. I have many PDFs that do not contain searchable text. For example, I have some scanned old MacFormat PDFs from 1984 that are essentially just 'pictures'; however, PDF Studio Viewer still manages to find my search word.
 

Big Ron

macrumors 6502
Original poster
Dec 7, 2012
419
106
United Kingdom
You have two different problems here. The first is that not all PDFs show up on your Mac as searchable. The creator of the PDF has to make the PDF searchable, and if that doesn't happen, then you have to use OCR software to scan each page of the PDF to "find" the text and add it to the PDF. When you say that Spotlight doesn't find any PDFs when you search for "USB C" that means there is no searchable text inside the PDF. When PDFs are searchable, it means that, hidden in the PDF is the equivalent of a text file that is a layer underneath the image on each page. That's what Spotlight can find and index. If that text layer isn't there, there's nothing Spotlight can do for you.

Personally, I use PDFPenPro to OCR my textless PDFs. The Pro version has a batch command that lets you OCR hundreds of PDFs in a row. I also like that PDFPenPro simply modifies the existing PDF and adds the text to it. Some apps (especially in the Windows world) require you to save a copy of the exported PDF, which is a real pain. But there are plenty of other apps that do this, like ABBYY FineReader, DevonThink (I think the Pro version only). Evernote also does this, but it doesn't save the text to the file. Rather, it saves the text to its own database. So, if you ever take the PDF out of Evernote, it's still not searchable.

Once you get your documents OCRd, you're right to say Spotlight will only show you a list of documents with those words. It won't show you where the word is used in the document (your contextual paragraph request). That's where these other app suggestions come in, like Houdah, Yep, and again DevonThink. But you need to OCR your PDFs first.
Excellent explanation, thanks. I never realised that there was an 'underlying' layer that was searchable. I am learning so much. I will look at Houdah, Yep, and DevonThink to see which I like best. Thanks.
 

NoBoMac

Moderator
Staff member
Jul 1, 2014
6,245
4,936
I seem to get mixed results with Spotlight - perhaps I will rebuild its index.
That could help, as the PDFs that are problematic possibly were not run against the MD Importer (/System/Library/Spotlight/PDF.mdimporter).

And what @mj_ said re: actual text in the file (I've got some PDFs whose text has been vectorized [ooooold PDFs]).

Generally, PDFs I've created/edited on Mac seem to work fine re: text search and Spotlight.

EDIT: mdimport -L to get a list of importers that should be adding to Spotlight.
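
If you want to script that check, something like the Python sketch below might do. The helper names are invented for illustration, the exact layout of the mdimport -L listing is an assumption (it can vary between macOS versions), and the rebuild command needs admin rights, so treat this as a starting point only.

```python
import subprocess

PDF_IMPORTER = "PDF.mdimporter"  # bundle name taken from the path mentioned above


def list_importers():
    """Return the raw text printed by `mdimport -L` (macOS only)."""
    result = subprocess.run(["mdimport", "-L"], capture_output=True, text=True)
    # The listing may be printed on stderr rather than stdout, so keep both streams.
    return result.stdout + result.stderr


def importer_paths(listing, name=PDF_IMPORTER):
    """Pick out the lines of an importer listing that mention `name`."""
    # Assumes one importer path per line, possibly wrapped in quotes with a trailing comma.
    return [line.strip().strip('",') for line in listing.splitlines() if name in line]


def rebuild_index_args(volume="/"):
    """Argument list for erasing and rebuilding the Spotlight index of a volume."""
    return ["sudo", "mdutil", "-E", volume]
```

An empty result from importer_paths(list_importers()) would suggest the PDF importer is not being picked up. Passing rebuild_index_args() to subprocess.run starts a full reindex, which can take a while on a large disk.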
 

Fowl

macrumors regular
Sep 28, 2018
135
140
HoudahSpot underlyingly uses the Spotlight index, same as using Spotlight from the Finder. Are you sure that you are searching the whole computer in Spotlight, and not just one folder?
 

BrianBaughn

macrumors G3
Feb 13, 2011
9,822
2,494
Baltimore, Maryland
HoudahSpot underlyingly uses the Spotlight index, same as using Spotlight from the Finder. Are you sure that you are searching the whole computer in Spotlight, and not just one folder?
The point of HoudahSpot is it combines the file searching capability of Spotlight with the internal, context-revealing searching of Preview.
 

Fowl

macrumors regular
Sep 28, 2018
135
140
The point of HoudahSpot is it combines the file searching capability of Spotlight with the internal, context-revealing searching of Preview.
Right, but if it can find a file on the computer containing "USB C", I'd expect Spotlight to do so too.
 