I'm looking for an OCR program that will take a PDF (without a text layer) as input and either add the text layer or output plain text. The issue is that I want text that is not only searchable but can also be selected and copied out without being mangled. I already have an OCR program (ScanSnap) that came with my scanner, but when I try to select blocks of text to copy into another program, the lines get highlighted (and therefore selected and ultimately copied) in the wrong order.
I've seen this happen in PDF files before, not just ones created with the ScanSnap OCR: you highlight text for copying, drag down the page, and the highlighted area misses whole sections of lines (usually the ends of lines) until you get lower down, when they fill in along with other parts of the text. It's almost as if the text layer is partially mixed up, like a badly shuffled pack of cards. And when you paste the text into an outside application, the pasted text is shuffled too.
The text I'll be using this on is printed source code, so it won't have recognisable word boundaries like English text, but as long as the OCR can work character by character, from top left to bottom right with no mixing on the way, it should be okay. In fact, I'm surprised my current OCR doesn't do this.
Can anyone recommend a program that will do this reliably?
(I'm on macOS 10.13.6. I saw OwlOCR recommended in other forum posts, but the App Store won't deliver it below 10.15.)
Thanks!