
likegadgets

macrumors 6502a
Original poster
Jul 22, 2008
785
355
US
[Screenshot attached: Screen Shot 2021-12-10 at 3.46.20 PM.png]
Thanks. I have Acrobat Pro 2020, and the option to enable native OS is not there.

Jayratch

macrumors newbie
Jan 7, 2009
20
6
Buffalo, NY
We ran some tests on M1s and saw no perceptible benefit, so we resorted to using multiple Windows/Intel machines. Maybe when there is a recompiled version of Acrobat, we will try it out on an M1 Max.
That's because you are using 1990s software that has repeatedly been revised only enough to *run* on current hardware and OS software. As far as I can tell, there have really only been a few updates to the core engine software in Acrobat. Once to transition from PowerPC to Intel, once to retain compatibility when 32 bit Intel support was dropped, and finally once to run on M1. At no point, ever, has Adobe made any gesture toward supporting multiple cores in Acrobat, which means that for the past 15 years it's been using ½ or less of the capacity of the machine.

If you want fast OCR, Acrobat is not the tool. Acrobat is a good tool for creating complex editable PDF files, yes, but it is not fast and I doubt it ever will be. For large files, Tesseract and the tools built around it seem to be the only way to go. I'm getting about 3-4 pages per second with OCRmyPDF.
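A minimal sketch of driving OCRmyPDF from Python, assuming the `ocrmypdf` command-line tool (and the Tesseract engine it wraps) is installed and on `PATH`. Unlike the single-core behavior described above for Acrobat, OCRmyPDF works on pages in parallel; its `--jobs` option caps the number of worker processes. The helper names `build_ocrmypdf_command` and `ocr_pdf` are hypothetical.

```python
import shutil
import subprocess
from typing import List, Optional


def build_ocrmypdf_command(src: str, dst: str, jobs: Optional[int] = None) -> List[str]:
    """Return an ocrmypdf command line that OCRs src into dst.

    If jobs is given it is passed as --jobs (number of parallel worker
    processes); otherwise ocrmypdf picks its own default.
    """
    cmd = ["ocrmypdf"]
    if jobs is not None:
        if jobs < 1:
            raise ValueError("jobs must be at least 1")
        cmd += ["--jobs", str(jobs)]
    cmd += [src, dst]
    return cmd


def ocr_pdf(src: str, dst: str, jobs: Optional[int] = None) -> None:
    """Run ocrmypdf; raises CalledProcessError if it exits with an error."""
    if shutil.which("ocrmypdf") is None:
        raise FileNotFoundError("ocrmypdf is not installed or not on PATH")
    subprocess.run(build_ocrmypdf_command(src, dst, jobs), check=True)
```

For example, `ocr_pdf("scanned.pdf", "scanned_ocr.pdf", jobs=4)` would OCR a hypothetical `scanned.pdf` with four workers; leaving `jobs` unset lets OCRmyPDF choose how many cores to use.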

michalm

macrumors member
Apr 17, 2014
72
66
Microsoft Azure Cognitive Services are ideal for this, and everything is API-driven. It is a commercial service, so you pay for the compute resources you use, but it is effectively infinitely scalable.

We did use this and were hitting about 9500 pages/second in the US West 2 region.

If confidentiality is a priority and you can't use the public cloud, the same APIs can be used with a Docker container.
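A minimal sketch of calling the Azure Computer Vision Read API with only Python's standard library, assuming the v3.2 REST shape: a `POST` to `{endpoint}/vision/v3.2/read/analyze` returns `202 Accepted` with an `Operation-Location` header, which you poll until `status` is `succeeded`. The endpoint, key, and function names are placeholders, and the API version your resource (or the Read container) exposes should be checked against the current Azure documentation. Pointing `endpoint` at a locally running container instead of the cloud corresponds to the confidentiality option mentioned above.

```python
import json
import time
import urllib.request
from typing import List


def extract_lines(result: dict) -> List[str]:
    """Collect recognized text lines, page by page, from a Read API result."""
    lines = []
    for page in result.get("analyzeResult", {}).get("readResults", []):
        for line in page.get("lines", []):
            lines.append(line["text"])
    return lines


def read_pdf(endpoint: str, key: str, pdf_bytes: bytes, poll_seconds: float = 1.0) -> List[str]:
    """Submit a PDF to the Read API, poll until done, and return its text lines."""
    submit = urllib.request.Request(
        endpoint.rstrip("/") + "/vision/v3.2/read/analyze",
        data=pdf_bytes,
        method="POST",
        headers={
            "Ocp-Apim-Subscription-Key": key,
            "Content-Type": "application/octet-stream",
        },
    )
    with urllib.request.urlopen(submit) as resp:
        # The service answers 202 Accepted; results are fetched from this URL.
        operation_url = resp.headers["Operation-Location"]

    while True:
        poll = urllib.request.Request(
            operation_url, headers={"Ocp-Apim-Subscription-Key": key}
        )
        with urllib.request.urlopen(poll) as resp:
            result = json.load(resp)
        status = result.get("status")
        if status == "succeeded":
            return extract_lines(result)
        if status == "failed":
            raise RuntimeError("Read operation failed")
        time.sleep(poll_seconds)  # still "notStarted" or "running"
```

Usage would look like `read_pdf(endpoint, key, open("scan.pdf", "rb").read())`, with your own resource endpoint and key substituted for the placeholders. Throughput figures like the one reported above come from submitting many such requests in parallel, which this single-document sketch does not do.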

Botts85

macrumors regular
Feb 9, 2007
229
175
"Yet." macOS 12 adds a new feature to the Vision framework that is designed exactly for recognizing text in document images. It runs on the Neural Engine and is super-fast (it can even do document recognition in real time on a camera/video feed).
"Yet" is exactly it. Once someone builds an app to use the Vision framework to OCR, we'll be in great shape. I'm honestly flabbergasted Apple didn't build it into Preview. I almost wonder if there's a patent issue.

Live Text is very fast and has been WAY more accurate than commercial OCR software in my testing. It's also on-device for privacy reasons.

If you're okay with losing formatting, this works: I made an Apple Shortcut that extracts a PDF into individual page images, gets the text from each image file, and then adds it to a text file. It did 114 single-spaced pages in a minute and a half.
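The Shortcut's loop (one image per page, recognize text, append to a text file) can be sketched in Python as follows. This is not the Shortcut itself: `recognize` is a hypothetical callable standing in for whatever OCR engine you plug in (for example, a wrapper that runs `tesseract page.png stdout`, or a Vision-based recognizer on macOS), and the page images are assumed to have been exported already.

```python
from pathlib import Path
from typing import Callable, Iterable


def pages_to_text(
    page_images: Iterable[Path],
    recognize: Callable[[Path], str],
    out_file: Path,
) -> int:
    """OCR each page image in the order given and append the text to out_file.

    Formatting is not preserved: each page becomes a plain block of text
    followed by a blank line. Returns the number of pages processed.
    """
    count = 0
    with out_file.open("a", encoding="utf-8") as out:
        for image in page_images:
            text = recognize(image)  # hypothetical OCR hook, e.g. Tesseract or Vision
            out.write(text.rstrip("\n") + "\n\n")
            count += 1
    return count
```

Pass the images in page order explicitly; sorting file names alphabetically puts `page10.png` before `page2.png` unless the numbers are zero-padded.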

The only reason we might not see third-party apps use Vision to OCR PDFs is that Apple has App Store terms saying you cannot monetize an Apple API feature.

mainemini

macrumors member
Dec 26, 2014
83
146
Maine
I'm buying an M1 Mac mini for our office and wondering whether there is any specific OCR benefit to 16 GB versus the cheaper 8 GB base M1 mini.

We've been using FineReader which allowed use of multiple cores but we'd dump that if a better M1 or Vision OCR program comes along.

When we OCR, we do hundreds of PDF pages at a time, but otherwise these machines aren't taxed much, since our other software is browser-based.
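To put "hundreds of pages" in perspective, the throughput figures reported in this thread can be turned into rough batch times. This is back-of-the-envelope arithmetic only: the rates come from different posters' hardware and documents, the batch size you pass in is up to you, and startup overhead is ignored. The names `REPORTED_RATES`, `batch_seconds`, and `print_estimates` are illustrative.

```python
# Throughput figures quoted earlier in this thread, in pages per second.
REPORTED_RATES = {
    "OCRmyPDF, low end (3 pages/s)": 3.0,
    "OCRmyPDF, high end (4 pages/s)": 4.0,
    "Shortcut + Live Text (114 pages in 90 s)": 114 / 90,
}


def batch_seconds(pages: int, pages_per_second: float) -> float:
    """Time to process a batch at a constant rate, ignoring startup overhead."""
    if pages_per_second <= 0:
        raise ValueError("rate must be positive")
    return pages / pages_per_second


def print_estimates(pages: int) -> None:
    """Print an estimated batch time for each reported rate."""
    for label, rate in REPORTED_RATES.items():
        secs = batch_seconds(pages, rate)
        print(f"{label}: {secs:.0f} s (~{secs / 60:.1f} min) for {pages} pages")
```

For a hypothetical 500-page batch, `print_estimates(500)` works out to roughly 2 to 3 minutes at the OCRmyPDF rates and about 6.6 minutes at the Shortcut rate.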

Tagbert

macrumors 603
Jun 22, 2011
6,256
7,281
Seattle
If you have the software and another Mac now, you should do a test run and watch its RAM usage in Activity Monitor. That will give you an idea of whether the software uses a lot of RAM or is more I/O-bound. In general, 16 GB will be better if you are using multiple apps at once, using apps that hold large data sets in memory, or doing things like running VMs.
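If you would rather script the measurement than watch Activity Monitor, here is a minimal sketch using Python's standard `resource` module: it runs a command to completion and reports the peak resident memory of the largest child process. Caveats: `ru_maxrss` is reported in bytes on macOS but in kilobytes on Linux; the value is a running maximum across every child this Python process has waited for, so measure one command per run; and it reflects the single largest process, not the combined total of parallel workers, so Activity Monitor remains the better view of overall RAM pressure for multi-process tools. The function name is hypothetical.

```python
import resource
import subprocess
import sys
from typing import List


def peak_child_rss_mib(cmd: List[str]) -> float:
    """Run cmd to completion; return the peak RSS (MiB) of the largest
    child process this interpreter has waited for so far."""
    subprocess.run(cmd, check=True)
    peak = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    # ru_maxrss units differ by platform: bytes on macOS, KiB on Linux.
    if sys.platform == "darwin":
        return peak / (1024 * 1024)
    return peak / 1024
```

For example, `peak_child_rss_mib(["ocrmypdf", "big_scan.pdf", "big_scan_ocr.pdf"])` (hypothetical file names) would report the largest single process's peak while OCRing a representative document.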
 

mainemini

macrumors member
Dec 26, 2014
83
146
Maine
Thanks, but our current minis are Intel, which is why I'm hoping to upgrade to an M1. I'm also unsure whether 16 GB is needed for OCR on the M1s.
 

Tagbert

macrumors 603
Jun 22, 2011
6,256
7,281
Seattle
My point was that memory utilization is unlikely to be significantly different, so you could use your Intel mini to estimate how much RAM is used and base your decision on that. If it currently uses only 4 GB of RAM, then RAM is not your bottleneck. If the process is using 12 GB, that would be a different answer. There isn't really any difference between how M1 and Intel Macs use RAM, other than that M1 Macs also use that RAM for the display. If this mini is doing batch processing and is connected to a basic monitor, that is unlikely to be a deciding factor.