Bulk Convert HTML Files to PDF

MBP123 · May 2, 2016

I have a bunch of .html files saved to my computer that I would like to convert to PDFs. Is there an easy way to do this? I've done a bunch of research and have yet to come across an elegant solution.

kiwipeso1 · May 2, 2016

If you want exact PDFs based on how they appear in your favourite browser, then print though the system dialogue to PDF.

This is simple, free and effective in any version of OS X.

MBP123 · May 2, 2016

kiwipeso1 said:
If you want exact PDFs based on how they appear in your favourite browser, then print though the system dialogue to PDF.

This is simple, free and effective in any version of OS X.

Yeah but it's manual, tedious, and can't be done in bulk.

superscape · May 3, 2016

MBP123 said:
Yeah but it's manual, tedious, and can't be done in bulk.

It's pretty straightforward to make an Automator service to do it for you. How good the PDFs look very much depends on the HTML - for example, if there's heaps of javascript going on then you may struggle. If its fairly simple HTML then you'll probably be okay.

If I have time later, I'll write up how to do it in a bit more detail. But in short, take a look at the "Run Shell Script" action, and "cupsfilter", if you're comfortable in the Terminal.

richard2 · May 3, 2016

You could use htmldoc, which can be installed via MacPorts or Homebrew.

MBP123 · May 3, 2016

superscape said:
It's pretty straightforward to make an Automator service to do it for you. How good the PDFs look very much depends on the HTML - for example, if there's heaps of javascript going on then you may struggle. If its fairly simple HTML then you'll probably be okay.

If I have time later, I'll write up how to do it in a bit more detail. But in short, take a look at the "Run Shell Script" action, and "cupsfilter", if you're comfortable in the Terminal.

I'm no engineer, but I know enough in Terminal and Automator to be dangerous. I've been tinkering with wkhtmltopdf and then followed these steps:

Create your HTML document that you want to turn into a PDF (or image)
Run your HTML document through the tool.
For example, if I really like the treatment Google has done to their logo today and want to capture it forever as a PDF: wkhtmltopdf http://google.com google.pdf

I pasted "wkhtmltopdf http://google.com google.pdf" into Terminal and it works properly and saves a PDF of Google.com in this example. However, I fail to see how I would do this in bulk? If I have say a list of ~3000 URLs that I want saved to PDF, how would I go about that within Terminal along with the naming convention I want?
[doublepost=1462289592][/doublepost]

richard2 said:
You could use htmldoc, which can be installed via MacPorts or Homebrew.

Thanks for the info but I don't think I can use MacPorts. I'm on a work machine and ran into some issues when trying MacPorts a few months back.

chown33 · May 3, 2016

MBP123 said:
... However, I fail to see how I would do this in bulk? If I have say a list of ~3000 URLs that I want saved to PDF, how would I go about that within Terminal along with the naming convention I want?

1. What format is your list of 3000 URLs in? Can you get it into a plain text file format, with exactly one URL per line? If so, then it's fairly straightforward to write a shell script to run the commands repeatedly.

2. What naming convention do you want? Please be as specific as possible, giving a few examples. Again, if you can clearly describe this, it may be straightforward to write a script for it. Honestly, though, it depends on the complexity of the naming convention.

I'm happy to contribute pieces of scripting, but without details to work from I can't do it. If you post an example of say 5-10 URLs, in the one-per-line text format, and the naming convention you want for each, someone will be able to write and test a script. Superscape also volunteered above, and is a regular in the Mac Programming forum.

This has now become a Mac Programming question, which is fine, but you might consider asking the moderators to move the thread to the Mac Programming forum. You do that by reporting your thread (the !-in-a-circle icon) and asking them to move it.

MBP123 · May 3, 2016

chown33 said:
1. What format is your list of 3000 URLs in? Can you get it into a plain text file format, with exactly one URL per line? If so, then it's fairly straightforward to write a shell script to run the commands repeatedly.

2. What naming convention do you want? Please be as specific as possible, giving a few examples. Again, if you can clearly describe this, it may be straightforward to write a script for it. Honestly, though, it depends on the complexity of the naming convention.

I'm happy to contribute pieces of scripting, but without details to work from I can't do it. If you post an example of say 5-10 URLs, in the one-per-line text format, and the naming convention you want for each, someone will be able to write and test a script. Superscape also volunteered above, and is a regular in the Mac Programming forum.

This has now become a Mac Programming question, which is fine, but you might consider asking the moderators to move the thread to the Mac Programming forum. You do that by reporting your thread (the !-in-a-circle icon) and asking them to move it.

Thanks for the help!

So I can get them into a list of URLs pretty easily (spreadsheet format, one per line). To give some background, I am looking to pull 10k's (annual financial reports) of each publicly traded company from the SEC's website. For example, this is AAPL's 2015 report: https://www.sec.gov/Archives/edgar/data/320193/000119312515356351/0001193125-15-356351-index.htm and here is IBM's: https://www.sec.gov/Archives/edgar/data/51143/0001047469-14-001302-index.htm

The naming convention is pretty straightforward, and would be AAPL 2015.pdf and IBM 2015.pdf. For 2014, it would be AAPL 2014.pdf and IBM 2014.pdf, respectively. I'm working with a developer to scrape these URLs into spreadsheet format so that part won't be an issue. But I'm still stuck on 2 things:

1) Creating a script that leverages wkhtmltopdf, whereby I can just take an entire list of the URLs and it converts them to PDFs with the correct naming convention in bulk. I can get it to do one a time (though without the space I want between the company's ticker symbol and the year, but I could figure that out using quotes with the script I'm sure), but need some assistance creating the loop piece in the script so it goes onto the next one after completing the previous one.

2) The way IBM posts their annual report differs slightly from the way in which AAPL does. For example, AAPL's 10k contains pretty much all of the info I'd be looking for, with proper links to each of the corresponding reference documents. But IBM on the other hand, submits it as a bunch of separate html files, without links from the main one to the others. This seems to result in separate files for certain data that I would need, such as balance sheet, income statement, etc.

The developer suggested outputting all the urls and then generating PDFs of each of them... but that's also manual so does the wkhtmltopdf program let you pass multiple urls and offer the option of concatenating them into a single pdf?

chown33 · May 3, 2016

MBP123 said:
Thanks for the help!

So I can get them into a list of URLs pretty easily (spreadsheet format, one per line). To give some background, I am looking to pull 10k's (annual financial reports) of each publicly traded company from the SEC's website. For example, this is AAPL's 2015 report: https://www.sec.gov/Archives/edgar/data/320193/000119312515356351/0001193125-15-356351-index.htm and here is IBM's: https://www.sec.gov/Archives/edgar/data/51143/0001047469-14-001302-index.htm

The naming convention is pretty straightforward, and would be AAPL 2015.pdf and IBM 2015.pdf. For 2014, it would be AAPL 2014.pdf and IBM 2014.pdf, respectively. I'm working with a developer to scrape these URLs into spreadsheet format so that part won't be an issue. But I'm still stuck on 2 things:

1) Creating a script that leverages wkhtmltopdf, whereby I can just take an entire list of the URLs and it converts them to PDFs with the correct naming convention in bulk. I can get it to do one a time (though without the space I want between the company's ticker symbol and the year, but I could figure that out using quotes with the script I'm sure), but need some assistance creating the loop piece in the script so it goes onto the next one after completing the previous one.

2) The way IBM posts their annual report differs slightly from the way in which AAPL does. For example, AAPL's 10k contains pretty much all of the info I'd be looking for, with proper links to each of the corresponding reference documents. But IBM on the other hand, submits it as a bunch of separate html files, without links from the main one to the others. This seems to result in separate files for certain data that I would need, such as balance sheet, income statement, etc.

The developer suggested outputting all the urls and then generating PDFs of each of them... but that's also manual so does the wkhtmltopdf program let you pass multiple urls and offer the option of concatenating them into a single pdf?

First, I see no relationship between the URL string and the company ticker symbol or year. So let's assume the "naming convention" will have to be explicit. That is, a single line contains the URL and the desired output name. Since the URL is by far the longest, it makes it easier to review the input data if the much shorter output name comes first on the line, for example:

Code:

AAPL 2015 https://www.sec.gov/Archives/edgar/data/320193/000119312515356351/0001193125-15-356351-index.htm
IBM 2015 https://www.sec.gov/Archives/edgar/data/51143/0001047469-14-001302-index.htm

Second, exporting a spreadsheet to comma-separated values (CSV) is a workable format for further processing, but you'll need to be very specific about the separator: Is it commas or tabs? You'll also need to consider quoting: Is the name quoted or not? Is the URL quoted or not? None of those are insurmountable obstacles, you just need to be specific about exactly what the file produced by the spreadsheet app has in it.

Third, quoting a space in a filename is simple. The loop is something else entirely, and I'd probably do this in two stages:
1. Run a preliminary script that outputs commands.
2. Run the commands to fetch URL data and product PDFs.

The 1st script would read the CSV file and output one command line for each input line. Those commands can easily be written to a text file. You then run the commands by telling the shell to read that text file as input.

Two stages are preferable because it lets you review the command-lines before running them. This means you can check how well the parsing and command-line production works without actually running any commands.

Fourth, this process would be completely automated so exceptional cases will need to be exceptions. If IBM or some other company does something other than one report, you'll have to handle those companies with a different automation process. You might be able to automate that (you'd have to describe each exception in detail), or it might be simplest to do a handful of them manually.

The key point here is that the repetitive part must repeat the same actions for each URL. If the actions differ, then you have to take that URL out of that particular repeated process.

If you want a first cut of how I'd approach this, please post a half-dozen or so lines of text as described above. This would be the Stage 1 script only. I'd post back with the command-line output for you to try. With a half-dozen lines, you could simply copy and paste it into a Terminal window.

MBP123 · May 3, 2016

chown33 said:
First, I see no relationship between the URL string and the company ticker symbol or year. So let's assume the "naming convention" will have to be explicit. That is, a single line contains the URL and the desired output name. Since the URL is by far the longest, it makes it easier to review the input data if the much shorter output name comes first on the line, for example:

Code:

AAPL 2015 https://www.sec.gov/Archives/edgar/data/320193/000119312515356351/0001193125-15-356351-index.htm IBM 2015 https://www.sec.gov/Archives/edgar/data/51143/0001047469-14-001302-index.htm

Second, exporting a spreadsheet to comma-separated values (CSV) is a workable format for further processing, but you'll need to be very specific about the separator: Is it commas or tabs? You'll also need to consider quoting: Is the name quoted or not? Is the URL quoted or not? None of those are insurmountable obstacles, you just need to be specific about exactly what the file produced by the spreadsheet app has in it.

Third, quoting a space in a filename is simple. The loop is something else entirely, and I'd probably do this in two stages:
1. Run a preliminary script that outputs commands.
2. Run the commands to fetch URL data and product PDFs.

The 1st script would read the CSV file and output one command line for each input line. Those commands can easily be written to a text file. You then run the commands by telling the shell to read that text file as input.

Two stages are preferable because it lets you review the command-lines before running them. This means you can check how well the parsing and command-line production works without actually running any commands.

Fourth, this process would be completely automated so exceptional cases will need to be exceptions. If IBM or some other company does something other than one report, you'll have to handle those companies with a different automation process. You might be able to automate that (you'd have to describe each exception in detail), or it might be simplest to do a handful of them manually.

The key point here is that the repetitive part must repeat the same actions for each URL. If the actions differ, then you have to take that URL out of that particular repeated process.

If you want a first cut of how I'd approach this, please post a half-dozen or so lines of text as described above. This would be the Stage 1 script only. I'd post back with the command-line output for you to try. With a half-dozen lines, you could simply copy and paste it into a Terminal window.

Sorry, I should have been more clear. Don't get too hung up on the naming convention/URL right now because the spreadsheet I'll be building will contain both (e.g., URL in one column, naming convention in another). Once this is complete, I would be looking to take these and then convert the URLs to PDF via wkhtmltopdf.

Attached is an example of the "source" spreadsheet, so let me know if this is manageable?

chown33 · May 3, 2016

MBP123 said:
Sorry, I should have been more clear. Don't get too hung up on the naming convention/URL right now because the spreadsheet I'll be building will contain both (e.g., URL in one column, naming convention in another). Once this is complete, I would be looking to take these and then convert the URLs to PDF via wkhtmltopdf.

Attached is an example of the "source" spreadsheet, so let me know if this is manageable?

You've misunderstood. I don't need the spreadsheet file. I need the text file exported from the spreadsheet file. It should be in exactly the format I outlined above.

If you can't produce the text file in the outlined format, then you need to provide a text file in a format you can produce, so the script can be written to handle that exact format.

One of the points I was trying to make is that you need to "get hung up" on the naming convention and URL now, because the script to process them needs to be written to handle them, in the exact format they will exist. This isn't just a minor detail, it's a major point on which the automation hinges.

I just finished a first cut at a script for the first stage.

It reads text files in the format I outlined above. It then outputs a series of wkhtmltopdf commands. Those commands are suitable for saving in another text file, or pasting into a Terminal window, or feeding directly to a shell command that executes them.

Here's the example input:

Code:

## It skips lines that start with '#'.

AAPL 2014  https://www.sec.gov/Archives/edgar/data/320193/000119312515356351/0001193125-15-356351-index.htm
IBM  2014  https://www.sec.gov/Archives/edgar/data/51143/0001047469-14-001302-index.htm

FOO  2015  http:/example.com/foo-data
FUN  2015  http:/example.com/foo-data

## Lines too short.  Ignored with message.
Ignored.
Skip this.

Here's the output from the script:

Code:

wkhtmltopdf 'https://www.sec.gov/Archives/edgar/data/320193/000119312515356351/0001193125-15-356351-index.htm' 'AAPL 2014.pdf'
wkhtmltopdf 'https://www.sec.gov/Archives/edgar/data/51143/0001047469-14-001302-index.htm' 'IBM 2014.pdf'
wkhtmltopdf 'http:/example.com/foo-data' 'FOO 2015.pdf'
wkhtmltopdf 'http:/example.com/foo-data' 'FUN 2015.pdf'
#-- SKIPPED: Lacks 3 items: Ignored.
#-- SKIPPED: Lacks 3 items: Skip this.

This output can be pasted directly into a Terminal window (you can try this), or fed directly to a shell using a pipe.

MBP123 · May 3, 2016

So I think I'm overcomplicating things. The scraper is going to scrape the URLs and will put them into an excel sheet with the requisite format for wkhtmltopdf. For example:

wkhtmltopdf 'https://www.sec.gov/Archives/edgar/data/320193/000119312515356351/d17062d10k.htm' 'AAPL 2015.pdf'
wkhtmltopdf 'https://www.sec.gov/Archives/edgar/data/320193/000119312514383437/d783162d10k.htm' 'AAPL 2014.pdf'
wkhtmltopdf 'https://www.sec.gov/Archives/edgar/data/320193/000119312513416534/d590790d10k.htm' 'AAPL 2013.pdf'
wkhtmltopdf 'https://www.sec.gov/Archives/edgar/data/1287213/000155837016003908/plow-20151231x10k.htm' 'PLOW 2015.pdf'
wkhtmltopdf 'https://www.sec.gov/Archives/edgar/data/1287213/000155837015000337/plow-20141231x10k.htm' 'PLOW 2014.pdf'
wkhtmltopdf 'https://www.sec.gov/Archives/edgar/data/1287213/000104746914002150/a2218556z10-k.htm' 'PLOW 2013.pdf'

Seems like I'd just take this, paste them into Terminal and be good to go, no?

chown33 · May 3, 2016

MBP123 said:
So I think I'm overcomplicating things. The scraper is going to scrape the URLs and will put them into an excel sheet with the requisite format for wkhtmltopdf. For example:

wkhtmltopdf 'https://www.sec.gov/Archives/edgar/data/320193/000119312515356351/d17062d10k.htm' 'AAPL 2015.pdf'

wkhtmltopdf 'https://www.sec.gov/Archives/edgar/data/320193/000119312514383437/d783162d10k.htm' 'AAPL 2014.pdf'

wkhtmltopdf 'https://www.sec.gov/Archives/edgar/data/320193/000119312513416534/d590790d10k.htm' 'AAPL 2013.pdf'

wkhtmltopdf 'https://www.sec.gov/Archives/edgar/data/1287213/000155837016003908/plow-20151231x10k.htm' 'PLOW 2015.pdf'

wkhtmltopdf 'https://www.sec.gov/Archives/edgar/data/1287213/000155837015000337/plow-20141231x10k.htm' 'PLOW 2014.pdf'

wkhtmltopdf 'https://www.sec.gov/Archives/edgar/data/1287213/000104746914002150/a2218556z10-k.htm' 'PLOW 2013.pdf'

Seems like I'd just take this, paste them into Terminal and be good to go, no?

Do a test: paste it into a plain text file in TextEdit.app. If it looks like a valid command line, then paste a few lines into a Terminal window. See what happens.

MBP123 · May 3, 2016

Yeah it seems to work fine pasting directly from the spreadsheet into Terminal. I'll be copy/pasting around 20k rows though, so are there copy/paste limitations and/or processing limitations within Terminal that you can foresee?

chown33 · May 3, 2016

MBP123 said:
Yeah it seems to work fine pasting directly from the spreadsheet into Terminal. I'll be copy/pasting around 20k rows though, so are there copy/paste limitations and/or processing limitations within Terminal that you can foresee?

There might be. I've never tried pasting 20k lines before.

You can paste it into a text file, then tell Terminal to read the commands from the text file. I know there aren't any limitations in that case. It'll just read commands and run them until it hits EOF, then it'll stop. If you want to do that, I can tell you how; just ask.

If there are errors running any of the commands, such as the URL is unreachable, the pasted commands won't stop. Neither would the simple case of reading commands from a text file. If you want it to stop running commands on an error, I can tell you how; just ask.

In any case, 20k commands are going to take a while to finish. If it takes 3 secs to finish a single command, that's 60k secs, which is about 16 hours and 40 minutes.

MBP123 · May 3, 2016

That should be okay, as I'm patient and time isn't really a big deal with this type of thing since companies only file the reports once a year.

superscape · May 4, 2016

superscape said:
If I have time later, I'll write up how to do it in a bit more detail. But in short, take a look at the "Run Shell Script" action, and "cupsfilter", if you're comfortable in the Terminal.

Just for the sake of completeness, here's the shell script / Automator solution I proposed:

https://www.ghostotter.com/automating-html-pdf-file-conversion/

MBP123 · May 4, 2016

Thanks a bunch for the help! I'll give it a go and let you know how it plays out.

MBP123 · May 5, 2016

So I'm running into some issues when converting. For example, pasting the below into Terminal only winds up converting ~ 15 of the first entries. Any ideas as to what could be going on?

Code:

wkhtmltopdf 'http://www.sec.gov/Archives/edgar/data/1090872/000109087215000051/a-10312015x10k.htm' 'A 2015.pdf'
wkhtmltopdf 'http://www.sec.gov/Archives/edgar/data/1090872/000109087214000045/a-10312014x10k.htm' 'A 2014.pdf'
wkhtmltopdf 'http://www.sec.gov/Archives/edgar/data/1090872/000109087213000029/a-10312013x10k.htm' 'A 2013.pdf'
wkhtmltopdf 'http://www.sec.gov/Archives/edgar/data/1090872/000109087212000018/a-10312012x10k.htm' 'A 2012.pdf'
wkhtmltopdf 'http://www.sec.gov/Archives/edgar/data/1090872/000104746911010124/a2206674z10-k.htm' 'A 2011.pdf'
wkhtmltopdf 'http://www.sec.gov/Archives/edgar/data/1090872/000104746910010499/a2201423z10-k.htm' 'A 2010.pdf'
wkhtmltopdf 'http://www.sec.gov/Archives/edgar/data/1090872/000104746909010861/a2195875z10-k.htm' 'A 2009.pdf'
wkhtmltopdf 'http://www.sec.gov/Archives/edgar/data/1090872/000104746908013312/a2189713z10-k.htm' 'A 2008.pdf'
wkhtmltopdf 'http://www.sec.gov/Archives/edgar/data/1090872/000104746907010272/a2181802z10-k.htm' 'A 2007.pdf'
wkhtmltopdf 'http://www.sec.gov/Archives/edgar/data/1090872/000104746906015256/a2175273z10-k.htm' 'A 2006.pdf'
wkhtmltopdf 'http://www.sec.gov/Archives/edgar/data/1090872/000104746906000516/a2166540z10-k.htm' 'A 2005.pdf'
wkhtmltopdf 'http://www.sec.gov/Archives/edgar/data/1090872/000119312504216917/d10k.htm' 'A 2004.pdf'
wkhtmltopdf 'http://www.sec.gov/Archives/edgar/data/1090872/000119312503098260/d10k.htm' 'A 2003.pdf'
wkhtmltopdf 'http://www.sec.gov/Archives/edgar/data/1090872/000089161802005622/f86150e10vk.htm' 'A 2002.pdf'
wkhtmltopdf 'http://www.sec.gov/Archives/edgar/data/1090872/000089161802000185/f78349e10-k.htm' 'A 2001.pdf'
wkhtmltopdf 'http://www.sec.gov/Archives/edgar/data/1090872/000109581101000226/f67706e10-k.txt' 'A 2000.pdf'
wkhtmltopdf 'http://www.sec.gov/Archives/edgar/data/4281/000119312516470162/d216801d10k.htm' 'AA 2015.pdf'
wkhtmltopdf 'http://www.sec.gov/Archives/edgar/data/4281/000119312515054376/d836461d10k.htm' 'AA 2014.pdf'
wkhtmltopdf 'http://www.sec.gov/Archives/edgar/data/4281/000119312514051516/d634164d10k.htm' 'AA 2013.pdf'
wkhtmltopdf 'http://www.sec.gov/Archives/edgar/data/4281/000119312513062916/d448525d10k.htm' 'AA 2012.pdf'
wkhtmltopdf 'http://www.sec.gov/Archives/edgar/data/4281/000119312512065493/d257313d10k.htm' 'AA 2011.pdf'
wkhtmltopdf 'http://www.sec.gov/Archives/edgar/data/4281/000119312511039230/d10k.htm' 'AA 2010.pdf'
wkhtmltopdf 'http://www.sec.gov/Archives/edgar/data/4281/000119312510034308/d10k.htm' 'AA 2009.pdf'
wkhtmltopdf 'http://www.sec.gov/Archives/edgar/data/4281/000119312509029469/d10k.htm' 'AA 2008.pdf'
wkhtmltopdf 'http://www.sec.gov/Archives/edgar/data/4281/000119312508032695/d10k.htm' 'AA 2007.pdf'
wkhtmltopdf 'http://www.sec.gov/Archives/edgar/data/4281/000119312507033124/d10k.htm' 'AA 2006.pdf'
wkhtmltopdf 'http://www.sec.gov/Archives/edgar/data/4281/000119312506034739/d10k.htm' 'AA 2005.pdf'
wkhtmltopdf 'http://www.sec.gov/Archives/edgar/data/4281/000119312505033143/d10k.htm' 'AA 2004.pdf'
wkhtmltopdf 'http://www.sec.gov/Archives/edgar/data/4281/000119312504031464/d10k.htm' 'AA 2003.pdf'
wkhtmltopdf 'http://www.sec.gov/Archives/edgar/data/4281/000092701603000933/d10k.htm' 'AA 2002.pdf'
wkhtmltopdf 'http://www.sec.gov/Archives/edgar/data/4281/000102140802003033/d10k.txt' 'AA 2001.pdf'
wkhtmltopdf 'http://www.sec.gov/Archives/edgar/data/1606180/000156459016014247/aac-10k_20151231.htm' 'AAC 2015.pdf'
wkhtmltopdf 'http://www.sec.gov/Archives/edgar/data/1606180/000156459015001496/aac-10k_20141231.htm' 'AAC 2014.pdf'
wkhtmltopdf 'http://www.sec.gov/Archives/edgar/data/4515/000119312516474605/d78287d10k.htm' 'AAL 2015.pdf'
wkhtmltopdf 'http://www.sec.gov/Archives/edgar/data/4515/000119312515061145/d829913d10k.htm' 'AAL 2014.pdf'
wkhtmltopdf 'http://www.sec.gov/Archives/edgar/data/4515/000000620114000004/aagaa10k-20131231.htm' 'AAL 2013.pdf'
wkhtmltopdf 'http://www.sec.gov/Archives/edgar/data/6201/000000620113000023/amr-10kx20121231.htm' 'AAL 2012.pdf'
wkhtmltopdf 'http://www.sec.gov/Archives/edgar/data/6201/000119312512063516/d259681d10k.htm' 'AAL 2011.pdf'
wkhtmltopdf 'http://www.sec.gov/Archives/edgar/data/6201/000095012311014726/d78201e10vk.htm' 'AAL 2010.pdf'
wkhtmltopdf 'http://www.sec.gov/Archives/edgar/data/6201/000000620110000006/ar123109.htm' 'AAL 2009.pdf'
wkhtmltopdf 'http://www.sec.gov/Archives/edgar/data/6201/000000620109000009/ar120810k.htm' 'AAL 2008.pdf'
wkhtmltopdf 'http://www.sec.gov/Archives/edgar/data/6201/000095013407003888/d43815e10vk.htm' 'AAL 2006.pdf'
wkhtmltopdf 'http://www.sec.gov/Archives/edgar/data/6201/000095013406003715/d33303e10vk.htm' 'AAL 2005.pdf'
wkhtmltopdf 'http://www.sec.gov/Archives/edgar/data/6201/000095013405003726/d22731e10vk.htm' 'AAL 2004.pdf'
wkhtmltopdf 'http://www.sec.gov/Archives/edgar/data/6201/000095013404002668/d12953e10vk.htm' 'AAL 2003.pdf'
wkhtmltopdf 'http://www.sec.gov/Archives/edgar/data/6201/000104746903013301/a2108197z10-k.htm' 'AAL 2002.pdf'
wkhtmltopdf 'http://www.sec.gov/Archives/edgar/data/1555074/000155507416000051/aamc10k_12312015.htm' 'AAMC 2015.pdf'
wkhtmltopdf 'http://www.sec.gov/Archives/edgar/data/1555074/000155507415000007/aamc10k_12312014.htm' 'AAMC 2014.pdf'
wkhtmltopdf 'http://www.sec.gov/Archives/edgar/data/1555074/000119312514060955/d675550d10k.htm' 'AAMC 2013.pdf'
wkhtmltopdf 'http://www.sec.gov/Archives/edgar/data/1555074/000155507413000005/aamc-20121231x10k.htm' 'AAMC 2012.pdf'
wkhtmltopdf 'http://www.sec.gov/Archives/edgar/data/8177/000156761916002103/h10035300x1_10k.htm' 'AAME 2015.pdf'
wkhtmltopdf 'http://www.sec.gov/Archives/edgar/data/8177/000156761915000337/s000800x1_10k.htm' 'AAME 2014.pdf'
wkhtmltopdf 'http://www.sec.gov/Archives/edgar/data/8177/000114036114014200/form10k.htm' 'AAME 2013.pdf'
wkhtmltopdf 'http://www.sec.gov/Archives/edgar/data/8177/000114036113014090/form10k.htm' 'AAME 2012.pdf'
wkhtmltopdf 'http://www.sec.gov/Archives/edgar/data/8177/000114036112017382/form10k.htm' 'AAME 2011.pdf'
wkhtmltopdf 'http://www.sec.gov/Archives/edgar/data/8177/000114036111018563/form10k.htm' 'AAME 2010.pdf'
wkhtmltopdf 'http://www.sec.gov/Archives/edgar/data/8177/000095012310028692/g22608e10vk.htm' 'AAME 2009.pdf'
wkhtmltopdf 'http://www.sec.gov/Archives/edgar/data/8177/000095014409002745/g18290e10vk.htm' 'AAME 2008.pdf'
wkhtmltopdf 'http://www.sec.gov/Archives/edgar/data/8177/000095014408002428/g12502e10vk.htm' 'AAME 2007.pdf'
wkhtmltopdf 'http://www.sec.gov/Archives/edgar/data/8177/000095014407002889/g06245e10vk.htm' 'AAME 2006.pdf'
wkhtmltopdf 'http://www.sec.gov/Archives/edgar/data/8177/000095014406003023/g00520e10vk.htm' 'AAME 2005.pdf'
wkhtmltopdf 'http://www.sec.gov/Archives/edgar/data/8177/000095014405003330/g93010e10vk.htm' 'AAME 2004.pdf'
wkhtmltopdf 'http://www.sec.gov/Archives/edgar/data/8177/000095014404003175/g87714e10vk.htm' 'AAME 2003.pdf'
wkhtmltopdf 'http://www.sec.gov/Archives/edgar/data/8177/000095014403004207/g81143e10vk.htm' 'AAME 2002.pdf'
wkhtmltopdf 'http://www.sec.gov/Archives/edgar/data/706688/000070668816000237/a10k4q2015.htm' 'AAN 2015.pdf'
wkhtmltopdf 'http://www.sec.gov/Archives/edgar/data/706688/000070668815000089/a10k4q2014.htm' 'AAN 2014.pdf'
wkhtmltopdf 'http://www.sec.gov/Archives/edgar/data/706688/000070668814000015/a10k4q2013.htm' 'AAN 2013.pdf'
wkhtmltopdf 'http://www.sec.gov/Archives/edgar/data/706688/000119312513071592/d456615d10k.htm' 'AAN 2012.pdf'
wkhtmltopdf 'http://www.sec.gov/Archives/edgar/data/706688/000119312512088733/d277599d10k.htm' 'AAN 2011.pdf'
wkhtmltopdf 'http://www.sec.gov/Archives/edgar/data/706688/000095012311018548/c11796e10vk.htm' 'AAN 2010.pdf'
wkhtmltopdf 'http://www.sec.gov/Archives/edgar/data/706688/000095012310018116/c96874e10vk.htm' 'AAN 2009.pdf'
wkhtmltopdf 'http://www.sec.gov/Archives/edgar/data/706688/000136231009002984/c81755e10vk.htm' 'AAN 2008.pdf'
wkhtmltopdf 'http://www.sec.gov/Archives/edgar/data/706688/000095014408001538/g11960e10vk.htm' 'AAN 2007.pdf'
wkhtmltopdf 'http://www.sec.gov/Archives/edgar/data/706688/000095014407001717/g05705e10vk.htm' 'AAN 2006.pdf'
wkhtmltopdf 'http://www.sec.gov/Archives/edgar/data/706688/000095014406002291/g00200e10vk.htm' 'AAN 2005.pdf'
wkhtmltopdf 'http://www.sec.gov/Archives/edgar/data/706688/000110465905009960/a05-2712_110k.htm' 'AAN 2004.pdf'
wkhtmltopdf 'http://www.sec.gov/Archives/edgar/data/706688/000104746904007891/a2130820z10-k.htm' 'AAN 2003.pdf'
wkhtmltopdf 'http://www.sec.gov/Archives/edgar/data/706688/000104746903011371/a2107228z10-k.htm' 'AAN 2002.pdf'
wkhtmltopdf 'http://www.sec.gov/Archives/edgar/data/706688/000095014402003290/g75091e10-k.txt' 'AAN 2001.pdf'
wkhtmltopdf 'http://www.sec.gov/Archives/edgar/data/706688/000095014401004438/g67848e10-k.txt' 'AAN 2000.pdf'
wkhtmltopdf 'http://www.sec.gov/Archives/edgar/data/1158114/000101968716005442/applied_10k-123115.htm' 'AAOI 2015.pdf'
wkhtmltopdf 'http://www.sec.gov/Archives/edgar/data/1158114/000101968715000833/aaoi_10k-123114.htm' 'AAOI 2014.pdf'
wkhtmltopdf 'http://www.sec.gov/Archives/edgar/data/1158114/000101968714000736/aaoi_10k-123113.htm' 'AAOI 2013.pdf'
wkhtmltopdf 'http://www.sec.gov/Archives/edgar/data/824142/000082414216000118/aaon10k123115.htm' 'AAON 2015.pdf'
wkhtmltopdf 'http://www.sec.gov/Archives/edgar/data/824142/000082414215000014/aaon10-k123114.htm' 'AAON 2014.pdf'
wkhtmltopdf 'http://www.sec.gov/Archives/edgar/data/824142/000082414214000015/aaon10-k.htm' 'AAON 2013.pdf'
wkhtmltopdf 'http://www.sec.gov/Archives/edgar/data/824142/000102660813000024/aaon_10k123112.htm' 'AAON 2012.pdf'
wkhtmltopdf 'http://www.sec.gov/Archives/edgar/data/824142/000102660812000022/aaon_10k2011.htm' 'AAON 2011.pdf'
wkhtmltopdf 'http://www.sec.gov/Archives/edgar/data/824142/000102660811000023/aaon_10k123110.htm' 'AAON 2010.pdf'
wkhtmltopdf 'http://www.sec.gov/Archives/edgar/data/824142/000102660810000026/aaon_10k123109.htm' 'AAON 2009.pdf'
wkhtmltopdf 'http://www.sec.gov/Archives/edgar/data/824142/000102660809000028/aaon_10k08.txt' 'AAON 2008.pdf'
wkhtmltopdf 'http://www.sec.gov/Archives/edgar/data/824142/000102660808000053/aaon_10k07.txt' 'AAON 2007.pdf'
wkhtmltopdf 'http://www.sec.gov/Archives/edgar/data/824142/000102660807000038/aaon_10k06.txt' 'AAON 2006.pdf'
wkhtmltopdf 'http://www.sec.gov/Archives/edgar/data/824142/000102660806000041/aaon_10k05.txt' 'AAON 2005.pdf'
wkhtmltopdf 'http://www.sec.gov/Archives/edgar/data/824142/000102660805000013/aaon_10k04.txt' 'AAON 2004.pdf'
wkhtmltopdf 'http://www.sec.gov/Archives/edgar/data/824142/000102660804000016/aaon_10k03.txt' 'AAON 2003.pdf'
wkhtmltopdf 'http://www.sec.gov/Archives/edgar/data/824142/000102660803000049/aaon_10k02.txt' 'AAON 2002.pdf'
wkhtmltopdf 'http://www.sec.gov/Archives/edgar/data/824142/000102660802000004/aaon_10k01.txt' 'AAON 2001.pdf'
wkhtmltopdf 'http://www.sec.gov/Archives/edgar/data/824142/000102660801500003/aaon_10k00.txt' 'AAON 2000.pdf'
wkhtmltopdf 'http://www.sec.gov/Archives/edgar/data/1158449/000115844916000299/aap_10kx122016.htm' 'AAP 2016.pdf'
wkhtmltopdf 'http://www.sec.gov/Archives/edgar/data/1158449/000115844915000063/aap_10kx132015.htm' 'AAP 2015.pdf'
wkhtmltopdf 'http://www.sec.gov/Archives/edgar/data/1158449/000115844914000058/aap_10kx12282013.htm' 'AAP 2013.pdf'
xwkhtmltopdf 'http://www.sec.gov/Archives/edgar/data/1158449/000115844913000069/aap_10kx12292012.htm' 'AAP

chown33 · May 5, 2016

MBP123 said:
So I'm running into some issues when converting. For example, pasting the below into Terminal only winds up converting ~ 15 of the first entries. Any ideas as to what could be going on?

Are there any error messages in the Terminal window? If so, please post them.

Exactly which lines work? Which ones fail?

Does wkhtmltopdf produce error messages or not? To test this, tell it to retrieve a known bad URL, and write output to a test PDF. What error messages (if any) are produced? The purpose of this test is to learn what an wkhtmltopdf error looks like.

If wkhtmltopdf doesn't prodce error message by default, does it have command-line options that tell it to produce error messages? Consult the reference for the command to learn of these, if any.

Are any of the later output files created? That is, does a listing of the directory producted by 'ls' show the expected PDF files, or are they completely missing?

If it only converts 15, try pasting exactly 20. Describe what happens. Post any error messages.

EDIT
Ultimately, you may end up pasting the commands into a text file, then running a shell that reads the text file. I mentioned above that you can ask for info on how to do this.

Step 1: Paste the commands into a text file. If you use TextEdit for this, make sure it saves it as plain text, not RTF or any other format. For the example commands that follow, I'll assume the text file is stored in your home folder, named "example.txt".

Step 2: Confirm the text file looks OK.

Code:

head example.txt

This command will show the first 10 lines of the file. If they look like plain text, great. If not, you probably didn't save it as plain text.

Code:

tail example.txt

Another check of the file, but this time at the end (tail) instead of the start (head). Again, it should be plain text.

Step 3: Run a shell taking commands from the file.

Code:

bash example.txt &>output.txt &

The shell will now run the commands in the background. It reads commands from example.txt. All its output goes into "output.txt". Closing this Terminal window will stop this background shell.

You can quickly check on the progress using this command in a Terminal window:

Code:

tail output.txt

If you don't want the shell to run in background, omit the single trailing &. The shell will then run until it stops. You can open another Terminal window and issue other commands, such as the tail to look at the output.txt file.

ocabj · May 5, 2016

http://www.tldp.org/LDP/Bash-Beginners-Guide/html/

Bulk Convert HTML Files to PDF

macrumors regular

Suspended

macrumors regular

macrumors 6502a

macrumors regular

macrumors regular

Moderator

macrumors regular

Moderator

macrumors regular

Attachments

Moderator

macrumors regular

Moderator

macrumors regular

Moderator

macrumors regular

macrumors 6502a

macrumors regular

macrumors regular

Moderator

macrumors 6502a

Our Staff