Start by experimenting with the 'grep' command on the Terminal.app command line.
First, make a folder and put some example PDF files in it. For example, make the folder TEST on your Desktop. Then do this in Terminal:
Code:
cd ~/Desktop/TEST
grep -E -h -o 'VAX[0-9]{8}' *.pdf
The first command simply sets the working directory to TEST.
The 'grep' command searches for the pattern in every file with the ".pdf" extension. Make sure the extension of the test files is lower-case ".pdf", because the shell's filename matching is case-sensitive by default.
You can copy and paste the grep command into a Terminal window, rather than typing it in.
The option -E (capital E) tells grep to use extended patterns, which enables the {n} repeat-count syntax.
The -h option tells grep to NOT output the filename. If you omit this, then each found item is preceded by the filename it was found in.
The option -o tells grep to only output the exact text that matches the pattern. Otherwise grep would output the entire "line" containing the pattern, and since PDF files aren't line-oriented, you'd get a bunch of crap you'd have to remove.
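To see what -h and -o each change, here is a small sketch you can paste into Terminal. It uses plain text files as stand-ins for real PDFs, in a scratch folder made by mktemp; the file names and contents are made up purely for illustration.

```shell
# Build two small text files in a scratch folder (stand-ins for real PDFs).
demo=$(mktemp -d)
printf 'invoice VAX12345678 paid\n' > "$demo/a.pdf"
printf 'order VAX87654321 open\n' > "$demo/b.pdf"
cd "$demo" || exit 1

# No -h, no -o: file name, then the whole matching line.
grep -E 'VAX[0-9]{8}' *.pdf

# With -o only: file name, then just the matched text.
grep -E -o 'VAX[0-9]{8}' *.pdf

# With -h and -o: just the matched text, one item per line.
grep -E -h -o 'VAX[0-9]{8}' *.pdf
```

The last form is the one used above, since it gives a clean list with nothing to strip off afterwards.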
The pattern is quoted so the shell won't try expanding it. The pattern VAX[0-9]{8} means:
- VAX means the three letters "VAX" literally.
- [0-9] means any single digit, 0 thru 9.
- {8} means that digit item is repeated exactly 8 times.
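Before making any PDFs, you can check the pattern itself against plain text piped in from printf. This is only a sketch: the helper name find_vax and the sample strings are invented for the example.

```shell
# Hypothetical helper: run the pattern against text on standard input.
find_vax() {
    grep -E -o 'VAX[0-9]{8}'
}

# One good value and two near-misses with too few digits.
# Only the line with 8 digits produces output.
printf '%s\n' 'id VAX12345678 ok' 'VAX123456' 'VAX1234567' | find_vax
```

This should print VAX12345678 and nothing else, since the other two lines have only 6 and 7 digits.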
Try this command on several different PDFs. Be sure to make some test PDFs that have near-matching patterns such as VAX123456 and VAX1234567 to make sure that those patterns are NOT found (too few digits).
Also make a PDF with "VAX" followed by 9 or 10 digits, and observe what happens. If you can't accept this, then clearly say so in a reply post, so another pattern can be given.
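If you'd rather see the over-long case on plain text first, here is a sketch. The second part shows one possible stricter approach, not necessarily the pattern you'd be given in a reply: it first pulls out "VAX" with all the digits that follow, then keeps only results that are exactly "VAX" plus 8 digits (-x means the whole line must match). The name strict_vax is made up for the example.

```shell
# With 9 digits, the original pattern still matches the first 8 of them.
printf 'VAX123456789\n' | grep -E -o 'VAX[0-9]{8}'

# Hypothetical stricter variant: grab VAX plus every digit after it,
# then keep only the results that are exactly VAX plus 8 digits.
strict_vax() {
    grep -E -o 'VAX[0-9]+' | grep -E -x 'VAX[0-9]{8}'
}

# Only the 8-digit value survives; the 9-digit and 7-digit ones are dropped.
printf '%s\n' 'VAX12345678' 'VAX123456789' 'VAX1234567' | strict_vax
```

Note that this variant only looks at the digits after "VAX". It does nothing about extra letters in front of it.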
Here's the man page for grep:
https://developer.apple.com/library/mac/documentation/Darwin/Reference/ManPages/man1/grep.1.html
Once you have the output working correctly, post again and provide some example PDFs and a list of exactly what the correct output should be.
Don't try putting the grep command into a workflow, unless you understand exactly how to add shell scripts to workflows. The grep command itself is just one step in a multi-step process, but it's the all-important first step. Take one step at a time.
Also post the Automator workflow you have now, so we don't have to guess about how it finds PDFs.
Finally, it's possible that this grep command won't find anything in your PDFs, because of how PDFs can be produced. The data may not be stored in the file as the literal text "VAX" followed by 8 digits, in which case grep has nothing to match. If that happens, post an example of the failing PDF, so we have something that demonstrates the problem.
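To find out which PDFs are the failing ones, a loop like this sketch may help. It uses grep's -q option, which prints nothing and only reports through its exit status whether a match was found. The function name list_unmatched is invented here, and the folder is assumed to be the TEST folder from above.

```shell
# Hypothetical helper: list the .pdf files in a folder where the pattern
# finds nothing at all.
list_unmatched() {
    for f in "$1"/*.pdf; do
        # Skip the unexpanded pattern if the folder has no .pdf files.
        [ -e "$f" ] || continue
        if ! grep -E -q 'VAX[0-9]{8}' "$f"; then
            printf 'no match: %s\n' "$f"
        fi
    done
}

list_unmatched ~/Desktop/TEST
```

Any file it lists is a good candidate to post as an example of the problem.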