
angelwatt

Moderator emeritus
Original poster
Aug 16, 2005
7,852
9
USA
I've modified my .htaccess to redirect hotlinking of images from my site, and I wanted to do something similar with PDF files. In essence when a link comes from outside my site trying to access a PDF file, I want them redirected to a certain page on my site. Currently I have this:

Code:
### Incoming links to PDF files get redirected to publications page
RewriteEngine On
RewriteCond %{HTTP_REFERER} !^$  # blank referrer OK
RewriteCond %{HTTP_REFERER} !mysite\.com [NC] # from mysite OK
RewriteRule ^.*/articles/.*\.pdf$ /publications.php [R]

This doesn't seem to be working, though. I looked up a PDF from my site through Google and it still opens the PDF rather than redirecting. I have read that the redirect may need to go to something of the same MIME type, so I may have to create a hotlink.pdf to redirect to, but I would prefer sending them to an actual page.

If you have an idea of how to resolve the above, or know of a better technique for what I want, let me know.
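
For reference, here is a minimal sketch of how the rule above might be restated. It rests on assumptions not confirmed in this thread: that the .htaccess file sits in the document root, and that /articles/ is a top-level folder. Two details may matter. Apache does not allow a comment on the same line as a directive, so the comments move to their own lines. And in a per-directory (.htaccess) context the path tested by RewriteRule has no leading slash, so a pattern that demands a slash before "articles" would not match a top-level /articles/ folder. Neither point is confirmed as the cause of the behaviour described here.

```apache
### Incoming links to PDF files get redirected to the publications page
RewriteEngine On
# A blank referrer is OK (direct visits, bookmarks, privacy tools)
RewriteCond %{HTTP_REFERER} !^$
# Requests referred from my own site are OK
RewriteCond %{HTTP_REFERER} !mysite\.com [NC]
# Assumption: .htaccess is in the document root, so the tested path
# looks like "articles/file.pdf" (no leading slash)
RewriteRule ^(.*/)?articles/.*\.pdf$ /publications.php [R=302,L]
```

A PDF fetched earlier may be served from the browser cache, so testing from a fresh session, or with a command-line client that sends a custom Referer header, should give a cleaner result.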
 
It was a slow weekend so just want to make sure people saw this post. I haven't resolved it as of yet.
 
i did some looking around, and A List Apart looks like it has a good solution for your problem. you could just alter it to redirect to a page.

check it out here.
 
Thanks for the link, it's one I've read before as I'm a big fan of ALA, but unfortunately their solution doesn't seem to apply to PDF files. I've got hotlinking blocked for images, css, js, and music files, but it won't work for PDFs for whatever reason.
 
just curious if you ever found a solution?

after having looked, all i find are image hotlink preventions, nothing for other file types.
 
No, not really. I read a bit about streaming the PDF content, but that didn't sound like a great solution. I think the best bet is to have URLs like page.php?article=1024, then have PHP map that ID to the actual PDF file and serve it without ever showing a direct link to the PDF. That would let me place the PDFs in a non-web-accessible folder, so even someone who knew the direct path to a PDF couldn't fetch it. I haven't tried this out yet, but it seems to be the most straightforward option at the moment. I'm not really worried about people hotlinking my PDFs; I just wanted to make sure they saw my actual site as well, and I like learning more about server configurations and how they work.

Why the hotlinking block wasn't working is still bugging me. I'm not sure if it's a binary file issue, or if it's because the PDF has to be loaded through a plug-in and that somehow interrupts the information flow between the server redirect and the browser. I thought about testing it by putting a Windows executable out there and doing a hotlink block on it, but haven't gotten around to it. I could also try with a movie file, since those work through plug-ins as well.
 
yeah, i think that the fact that a pdf needs the plugin in order to open in a browser could be helping. but then again, a list apart says this:

You might want to change this to include .swf, .mp3, or other similar files.

so we would think that pdf's are not different, right? especially if you can mod_rewrite a flash file.
 
One sure-fire, foolproof method is to replace all the files you don't want hotlinked with encrypted versions of those files. Now there is no need to prevent hotlinking.

Then on your publications web page you replace your links with a link to a cgi-bin script that is passed the filename as input and writes the unencrypted file to standard out. You don't even need strong encryption. Any hack will be good enough.
 
Yeah, that's similar to what I was thinking, though encryption isn't really needed (depending on what you meant by encryption here). I don't need to protect the content of the PDF; I just want people to view it from my site rather than someone else's. Thanks for the ideas.
 
Alright, I came up with a solution using PHP. Say you have a file page.php; you then set up links to the PDFs like so: page.php?file=filename.pdf. At the top (or at least before you output anything to the screen) you have this PHP code:
PHP:
function GetThatPDF($file, $path)
{
  // Full path of the requested file, relative to the document root
  $full = $_SERVER['DOCUMENT_ROOT'] . "$path/$file";
  // Check that it's a PDF file and that it actually exists
  if (preg_match('/\.pdf$/', $file) && is_file($full)) {
    header("Pragma: public");
    header("Expires: 0");
    header("Cache-Control: must-revalidate, post-check=0, pre-check=0");
    header("Cache-Control: public");
    header("Content-Description: File Transfer");
    header("Content-Type: application/pdf");
    header("Content-Length: " . filesize($full));
    // Assigns a filename for the file; we'll keep the same name
    header("Content-Disposition: inline; filename=\"$file\"");
    header("Content-Transfer-Encoding: binary");
    // Outputs the file content, then stops so no page HTML follows the PDF
    readfile($full);
    exit;
  }
}
// If referred from my site and argument sent for viewing specific file
if (isset($_SERVER['HTTP_REFERER']) && isset($_GET['file']) &&
    preg_match('/page\.php/', $_SERVER['HTTP_REFERER'])) {
  // basename() drops directory parts so "../" can't escape the files folder
  GetThatPDF(basename(stripslashes($_GET['file'])), "/../files");
}
It grabs the PDF files from a folder called "files," which sits outside the htdocs (or www or public_html) folder and so can't be viewed directly. It also checks whether the referrer was the page containing this PHP code; otherwise it just displays whatever you have for the page. For me, page.php is an HTML page with links to the PDF files using the method above. So anyone who follows a link to a PDF (page.php?file=file.pdf) from elsewhere gets sent to this page on my site. They also never see the direct directory path to the PDF files. Unfortunately you have to set this up for every page with PDF files on it (though the function could live in its own file that PHP includes when necessary). It's certainly not as nice as the Apache rewrite technique for blocking image hotlinking, but it gets the job done, I guess.

If this isn't clear (and that's certainly possible) just ask and I'll try to clarify.

Edit: My first bit of code only worked on Mac, very odd. Added some more header information, which makes the function longer, but this now works on Mac/Windows with Firefox, Safari, and IE.
 
I've been reading over this entire thread, especially the mod_rewrite rules via .htaccess for Apache. I checked a few other web sites and basically a lot of people have the same problem - hotlinked PDF files sometimes are not associated with a referrer, so the rewrite rules will fail in such circumstances.

The better approach is to integrate PDF and other downloadable office documents into a download system, not via direct linking. Just protect such directories from indexing, set proper permissions to prevent direct access, and process all downloads via server side code (as discussed in recent replies)
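
A minimal sketch of the directory protection described above, assuming Apache 2.4 and a hypothetical .htaccess file placed inside the folder that holds the PDFs. It only takes effect if the server's AllowOverride setting permits these directives, and Apache 2.2 would need the older "Order deny,allow" / "Deny from all" pair instead of Require.

```apache
# Hypothetical .htaccess inside the folder that holds the PDFs
# Turn off automatic directory listings
Options -Indexes
# Refuse every direct HTTP request for files in this folder (Apache 2.4)
Require all denied
```

Server-side code can still read the files from disk and send them out, since these directives only apply to requests arriving over HTTP.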

BTW, coding it lets you integrate sensible features: track downloads via a hit counter stored in your DB, customize the download links and associated icons and order them as you see fit, and tie it all into a site that requires login before any download (for example).

-jim
 
well yeah he could do that. but he only wanted to make sure people who viewed his PDF viewed/downloaded it by going to his site, not from some random site. i don't think the actual download process was a priority here.
 
Yep. When I said "direct linking" I meant a URL pointing to a PDF on his server which can be hotlinked (referenced in a web page other than his). I just pointed out (i.e. I did say "BTW") that it is worth the effort to create a download system instead of direct linking: if login is required to download anything, not only will it be presented nicely and be trackable, it can only be accessed by users on the actual site. Everyone else gets redirected to a login page. I'm not going to get into the semantics of cookie and session spoofing to bypass this (no system is perfect) -- that's another subject, but get my point in general? That's all I'm sayin'. :)

Sometimes the simple approach isn't the best, i.e. the rewrite method would be fine if only it worked 100% of the time for PDFs. I'm still researching a better ruleset, so far no luck. That would be great for the OP, no doubt.

-jim
 
I don't need to protect the content of the PDF, just wanting them to view it from my site rather than someone elses.

@SrWebDeveloper: a login system is too much trouble for this problem.

@angelwatt: nice job on figuring out a decent solution.
 
@notnek: Sloooooow down, comprehend. I praise angelwatt's solution, which involves server side coding vs. mod_rewrite (the preferred solution if it worked). My comments about tracking hits and login were only a "while you're at it" kind of thing - that's all, to let the OP know that even though mod_rewrite is easiest (if it worked), a server side approach has other benefits as well. "BTW" means by the way, btw! ;)

Oops... in angelwatt's code replace $file with $f in line 1 and use either GetTheFile or GetThePDF - the function name and the reference to it are different so it won't work. But the solution is excellent.

I am still looking for a mod_rewrite rule that works with PDFs just like .gifs, to see if MIME type can be used in the rule set instead of file extension. That would mean no PHP coding at all.

-jim
 
Good call. I changed how the function worked, but didn't re-copy and paste all of the code, which caused that mismatch. I follow you on the database idea. I use something to that effect with the image gallery and recipe sections of my site, though my database is an XML file (because I'm wild like that :D). I have fun with XML/XSLT, but decided it wasn't needed for my simple situation at the moment. I may go that route if I end up adding many more PDF files.

Thanks to both of you for your interest in the topic and your discussion. If either of you, or anyone who comes across this thread, finds a better solution that uses .htaccess, feel free to post it here. It's certainly an oddball problem. Wish I had the motivation to contact the Apache group about it.
 
(scratching head) ... angelwatt's method IS a "download system" and the same type I was suggesting - I just expanded on the idea by suggesting "btw" that "integrating sensible features" such as hit counter, login, etc. would be worthwhile. That's all. Can you possibly live with this and carry on, nontech? I mean notnek, sorry, my bad there. Damn keyboard.

@angelwatt - thank you for emphasizing a key issue: Apache's solution isn't working right (or we need to find another), because when all is said and done, even if a coded "download system" is super cool and secure, life would be easier if Apache got it right.

-jim
 
yes. it does download a .pdf automatically for mac users. however, for most windows users, it loads the acrobat plug-in inside the browser. it can also do the same thing for office documents. from seeing several of angelwatt's various posts on here, and looking at his site, he is a developer who is all about compatibility and accessibility. therefore, he needs/will want to adhere to different cross-browser compatibility, and the way a lot of people would view his .pdf.

and yes, i know what btw means, i'm not 70 years old. hilarious name calling too. grow up.
 
Just as a note for those who were following the thread: the code I had before apparently only worked on Mac, though don't ask me why. I found a fix, though. I just had to add more header information, like Content-Length, which seemed to affect Safari as well. What a pain. Anyway, the code I posted back in Post 10 has been updated accordingly to work across browsers and OSes, though technically I haven't checked things on Linux.
 