
kiercardo

macrumors newbie
Original poster
Jan 3, 2013
4
0
Rome, IT
Hi!
I downloaded a lot of academic papers. Their first page is not very useful, so I would like to move it to the end of each file. I also noticed that the first line is the title of the article, which I would like to use to rename the PDF (the files have random filenames, which is pretty useless). I think I should use Automator for this, but I'm a complete newbie to automation. Can you help me?
Thanks



 
There might be cleaner and easier solutions, but here are my thoughts!
The following code depends on PDFtk Server, a command-line tool licensed under the GNU General Public License that you can use for free as long as you don't sell commercial software with it. You need to install it before you can use the code below. You might also get the task done with the AppleScript support in Preview.app or with another PDF tool like Xpdf, but I chose PDFtk Server. We use it for two things:
1. Dump the file's metadata to get the title of the PDF (the title is sometimes missing in the test files I downloaded from JSTOR; more on this later)
2. Move the first page to the end of the document and write the file
A second command-line tool is involved, which should already be installed on your Mac: awk. I used awk for the string manipulation that extracts the title from the file dump. You could use sed or grep for this instead.
It would probably be better to write this as a shell script, but here is a one-liner that you can copy and paste into a bash session in Terminal after changing the input and output paths to suit your needs:
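To make the awk step concrete, here is a minimal sketch of the title extraction on its own. The sample dump is made up for illustration and only mimics the InfoKey/InfoValue layout of a pdftk metadata dump; extract_title is a hypothetical helper name, not part of pdftk.

```shell
#!/bin/bash
# extract_title: hypothetical helper; reads a pdftk metadata dump on stdin
# and prints the InfoValue that follows the "InfoKey: Title" line.
extract_title() {
  awk '
    found { sub(/^InfoValue: /, ""); print; exit }  # line right after the Title key
    /^InfoKey: Title$/ { found = 1 }
  '
}

# Made-up sample in the layout of a pdftk metadata dump.
sample_dump='InfoBegin
InfoKey: Author
InfoValue: A. Author
InfoBegin
InfoKey: Title
InfoValue: An Example Paper Title
NumberOfPages: 12'

printf '%s\n' "$sample_dump" | extract_title

# With a real file (requires pdftk on the PATH):
#   pdftk "/path/to/file.pdf" dump_data_utf8 | extract_title
```

If the dump has no Title key, the helper prints nothing, which is the case where the original filename has to be kept.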
Code:
inputfiles=$"/path/to/inputfiles/*pdf" ; outputfiles=$"/path/to/outputfiles/" fileext=".pdf" ; for i in $inputfiles ; do newfilename=$(pdftk "$i" dump_data_utf8 | awk 'c&&!--c; /InfoKey: Title/{c=1}' | awk '{ sub("InfoValue: ", ""); print}') ; if [ -z "$newfilename" ] ; then newfilename="$(basename "$i" $fileext)" ; fi ; basefilename="$newfilename" ; fileiterator=0 ; while [ -f "$outputfiles$newfilename$fileext" ] ; do fileiterator=$((fileiterator+1)) ; newfilename="$basefilename"_"$fileiterator" ; done ; pdftk A="$i" cat A2-end A1 output "$outputfiles$newfilename$fileext" ; done
Files without a title value keep their original filename (a number). If you want, you could use Xpdf to convert the PDF content to plain text and extract the first line. Unfortunately, the test files I have are not homogeneous enough for the first line to always be the expected title, so this could be a little tricky.
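As a rough sketch of that fallback, assuming the pdftotext tool from Xpdf is installed: convert only the first page to text and take the first non-blank line. The helper name first_text_line and the sample text are made up for illustration, and as noted, the first line is not always the real title.

```shell
#!/bin/bash
# first_text_line: hypothetical helper; prints the first non-blank line of
# the text on stdin, with surrounding spaces and tabs trimmed.
first_text_line() {
  awk 'NF { gsub(/^[ \t]+|[ \t]+$/, ""); print; exit }'
}

# Made-up page text standing in for pdftotext output.
printf '\n   An Example Paper Title  \nA. Author\n' | first_text_line

# With a real file (requires pdftotext on the PATH); -l 1 stops after the
# first page and "-" sends the text to stdout:
#   pdftotext -l 1 "/path/to/file.pdf" - | first_text_line
```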
If several documents have the same title, I just append a number to the filename. You could extract other values to get a better name.
Finally, you could call the code from an Automator workflow or an AppleScript, but that's up to you. I hope this helps you accomplish the task.
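For anyone who prefers a script over the one-liner, here is a hedged sketch of the same steps in a more readable form. It is not the code from this post: the script name retitle_pdfs.sh, the OUTPUT_DIR variable and the helper functions are all assumptions made for illustration. It takes the PDFs as arguments, replaces "/" and ":" in titles, and assumes every PDF has at least two pages.

```shell
#!/bin/bash
# Sketch: move page 1 of each PDF to the end and name the copy after the
# Title metadata. Hypothetical usage:
#   OUTPUT_DIR="/path/to/outputfiles" ./retitle_pdfs.sh /path/to/inputfiles/*.pdf

output_dir=${OUTPUT_DIR:-/path/to/outputfiles}

# Print the Title value from the PDF metadata (empty if none is stored).
pdf_title() {
  pdftk "$1" dump_data_utf8 |
    awk 'found { sub(/^InfoValue: /, ""); print; exit }
         /^InfoKey: Title$/ { found = 1 }'
}

# Replace "/" and ":" so the title can serve as a file name.
safe_name() {
  printf '%s' "$1" | tr '/:' '__'
}

# Print a path in directory $1 for base name $2 that does not exist yet,
# appending _1, _2, ... to the base name if needed.
unique_path() {
  local candidate="$1/$2.pdf" n=0
  while [ -e "$candidate" ]; do
    n=$((n + 1))
    candidate="$1/${2}_$n.pdf"
  done
  printf '%s\n' "$candidate"
}

for input in "$@"; do
  name=$(safe_name "$(pdf_title "$input")")
  # Fall back to the original file name when no title is stored.
  if [ -z "$name" ]; then
    name=$(basename "$input" .pdf)
  fi
  # Pages 2..end first, then page 1 (assumes at least two pages).
  pdftk A="$input" cat A2-end A1 output "$(unique_path "$output_dir" "$name")"
done
```

Because the files arrive as quoted arguments, paths with spaces are handled. In Automator, a Run Shell Script action with "Pass input" set to "as arguments" should hand the selected files to the script in the same way.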
 
Thank you for the tips! However, all I get after running the PDFtk script you posted is an empty PDF file (not renamed). How is that possible?
 
That should not happen. As I explained, it is possible that a PDF cannot be renamed, but it shouldn't be empty. I have no idea why this happens, so it could take a while to figure out exactly where the problem is. As I don't know how much experience you have with the Terminal, let's start from the beginning.
I tested the command again, copying it from my last post. Here is what I did:
1. Copied the code from the post above, from inputfiles= ... to ... "$outputfiles$newfilename$fileext" ; done, into an empty TextEdit document.
2. Opened a Finder window showing the input and output folders.
3. Dragged the input folder from Finder onto a new line in the TextEdit document that contains the bash command.
4. Dragged the output folder onto another new line in TextEdit.
5. In TextEdit, replaced the placeholder input path in inputfiles=$"/path/to/inputfiles/*pdf" with the dragged path (replace only /path/to/inputfiles and leave /*pdf intact).
6. In TextEdit, replaced the placeholder output path in outputfiles=$"/path/to/outputfiles/" with the dragged path (leave the trailing slash (/) intact).
7. Opened Terminal.app (the word bash appears at the top of the window).
8. Copied the whole command (see step 1) from TextEdit into the Terminal window and pressed Enter.
That's it, and it works. (To be honest, pdftk is not on my PATH, so I additionally replaced the two occurrences of pdftk with the full path to the pdftk binary.)
If you followed the same steps 1 to 8 and still get an empty PDF, we will test the pdftk and awk commands next. If they work as expected, we need to look at the PDF (is it protected?), at the system (are there special characters or whitespace in your input/output paths and filenames? That should work, though), and at the command itself, to find out what is special about your use case.
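A possible first round of checks, sketched under assumptions: have_tool and count_matches are hypothetical helper names, and the sample paths in the comments are placeholders. The first part reports whether pdftk and awk are on the PATH. The second part demonstrates one thing worth ruling out: the one-liner expands $inputfiles unquoted, so bash splits the pattern at whitespace before globbing, and an input folder with a space in its path would not match as intended.

```shell
#!/bin/bash
# have_tool: hypothetical helper; succeeds if the command is on the PATH.
have_tool() {
  command -v "$1" >/dev/null 2>&1
}

for tool in pdftk awk; do
  if have_tool "$tool"; then
    echo "$tool: found at $(command -v "$tool")"
  else
    echo "$tool: not found on PATH"
  fi
done

# count_matches: hypothetical helper; counts how many of its arguments
# are existing regular files.
count_matches() {
  local n=0 f
  for f in "$@"; do
    if [ -f "$f" ]; then
      n=$((n + 1))
    fi
  done
  echo "$n"
}

# Throwaway folder whose name contains a space, holding one PDF-named file.
demo_dir=$(mktemp -d)
mkdir "$demo_dir/my papers"
touch "$demo_dir/my papers/a.pdf"

pattern="$demo_dir/my papers/*pdf"
unquoted=$(count_matches $pattern)                   # pattern split at the space
quoted=$(count_matches "$demo_dir/my papers/"*pdf)   # directory part quoted
echo "unquoted pattern: $unquoted file(s), quoted directory: $quoted file(s)"
rm -r "$demo_dir"

# Follow-up checks on one real file (placeholder paths), run by hand:
#   pdftk "/path/to/inputfiles/sample.pdf" dump_data_utf8 | head -n 20
#   pdftk A="/path/to/inputfiles/sample.pdf" cat A2-end A1 output /tmp/reordered.pdf
#   ls -l /tmp/reordered.pdf    # check that the size is not 0
```

The last line of output should show the unquoted pattern finding no file while the quoted form finds one. If the real input path contains no whitespace, this is not the cause and the hand-run pdftk checks are the next step.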
 