How can I prevent hotlinking to PDF files (Full Version)

Kitka -> How can I prevent hotlinking to PDF files (1/27/2006 0:01:36)

One site of ours contains hundreds of brochures and hefty user manuals (all pdfs) intended mainly for clients' use.

I have the directories banned via robots.txt which deters the casual surfer, but it seems that someone somewhere (I can't find out who) is hotlinking to many of them, and costing us bandwidth.

All of them arrive with no referrer - which seems to be standard for pdf downloads, even if the user downloads from a page in our site. So I can't ban the "no-referrer" requests via .htaccess. Therefore they get the pdfs and never see any page in our site.

Does anyone have any suggestions as to how to protect the files from hotlinking, but still enable any people legitimately visiting our site to download them with little difficulty?




womble -> RE: How can I prevent hotlinking to PDF files (1/27/2006 5:00:05)

Have them in a secured log-in area? Or is that making things too complicated?




Kitka -> RE: How can I prevent hotlinking to PDF files (1/27/2006 5:27:24)

quote:

Or is that making things too complicated?


Mmnn, not complicated, but maybe clarification is in order.

Client hires out a broad range of very specialised and expensive equipment. Potential hirers need access to brochures (to decide if equipment meets their needs) and subsequently user manuals (to know how to use it).

Our client wants all brochures and manuals for both current equipment and ex-hire equipment available as he feels it enhances his business's reputation. He doesn't understand that people can access them at his cost but without ever being aware of his business / services.

Brochures and manuals need to be easily available to genuine visitors to the site - but not hotlinked to sites unknown. Clients are not static - they vary from day to day.

Visitor log in would be fine, but it needs to be generic - e.g. User: Guest, Password: Anon. But how would I implement this, such that it admits visitors who access files only from our site, as opposed to from someone else's site?? In other words, how do I force them to be aware of the company providing the manuals/ brochures, and prevent downloads that do not originate from a product page in our site.

<edit> Hosted on apache/linux, not windoze </edit>




Nicole -> RE: How can I prevent hotlinking to PDF files (1/27/2006 6:02:54)

Kitka,

I don't know the answer to your questions, but have you considered searching for specific unique key phrases or document titles to try to find out who it might be? Also I wondered if Copyscape could be used to see who might be plagiarising any document title or key phrases?

Hope that helps.

Nicole




caz -> RE: How can I prevent hotlinking to PDF files (1/27/2006 6:38:05)

You could put password security on the individual pdfs and display that password on the html pages on your site so that the only people able to open the pdfs are those who have followed the correct route to do so.

I am assuming that you have Acrobat or similar to make pdfs. [;)]




Kitka -> RE: How can I prevent hotlinking to PDF files (1/27/2006 7:09:33)

Caz: I have Acrobat (v5), and could probably add a password, although all these pdfs have been downloaded from various manufacturers sites, so do not originate with us.

But supposing I place a password on the files and mention it on each product page, what is to stop the hotlinkers from displaying that password on their site?

The hotlinkers have found the files by coming to our site in the first place, so would have seen the password - the files themselves are not listed in reputable search engines - and most disreputable ones have been banned via .htaccess.

Nicole: I have searched Google, Yahoo and MSN for all links to our site and also even more specific terms relating to the pdfs in question - all of which turned up nothing suspicious. Yet there is a constant stream of isolated accesses, mainly from Europe, all with no referrer. Copyscape only helps if web page text has been duplicated, this problem involves only links. Many thanks for the suggestion though [:)]




caz -> RE: How can I prevent hotlinking to PDF files (1/27/2006 7:14:44)

quote:

what is to stop the hotlinkers from displaying that password on their site?


You could keep changing the passwords, but I know that is a lot of hassle for you. Have you tried looking in the Adobe User to User forums for answers to this?




golfer -> RE: How can I prevent hotlinking to PDF files (1/27/2006 8:28:34)

Is it feasible to have a section on his web page which directs the visitor to a secure area, and shows in that section a password that needs to be typed in to access the information?

That way the hotlinks may be broken and the documents will only be available to his site visitors.

Hope I'm not spouting rubbish here[:)]




caz -> RE: How can I prevent hotlinking to PDF files (1/27/2006 8:40:58)

Alternatively you could do the hotlinking- to the manufacturers' sites for the pdfs [;)] But your client probably would not go for that if he wants to look like a one stop shop, as it were.

You could wrap each pdf in another pdf which says something like "Brought to you by <company name> and here is the password that you need to open the document/manual..." You would link to the passworded manual from within the containing pdf. A bit like a Russian doll [:D]

I had a shufti around the Adobe forums, but didn't find anything apart from password security. But I did come across this site about preventing deep linking, if it's of any use to you. http://wordworx.com/




jeepless -> RE: How can I prevent hotlinking to PDF files (1/27/2006 10:22:23)

One solution I've often seen used to discourage hotlinking of any files is to rename the file in question, then substitute a bogus file that uses the old file name. So your original file called "my.pdf" might become "onlymy.pdf", then in place of the old file add another PDF file called "my.pdf". Perhaps this new file could be a single page PDF file containing in big bold letters, "This document was stolen from XYZ Company", or it might include the actual link to the correct file. Then when the other site hotlinks to this "my.pdf" file, their visitors will get the bogus file with the stolen message or a link to the correct file. And chances are good it will take some time before the other website realizes what you did.

You could also just rename your current file so their visitors will get a broken link, and that would save the bandwidth, but it's likely the other website will catch on rather soon. Not breaking their link may very well "hide" what you did for quite a while.

It's a game of "cat-and-mouse", but it works.

Hope that helps...




caz -> RE: How can I prevent hotlinking to PDF files (1/27/2006 10:29:15)

I think that Kitka has rather a lot of files to work with, all the same that's an idea.




Kitka -> RE: How can I prevent hotlinking to PDF files (1/31/2006 8:26:07)

Many thanks for the ideas (especially jeepless's), but most are impracticable because of the large number of files in question. Until I asked my question here, I had been dealing with it by changing the name of the folder containing the pdfs - but whoever is doing it now appears to be checking our site regularly from a bookmark (so no referrer) and adjusting their links. They seem to be located in Sweden, not that it matters much.

Many thanks for the link to that article Caz. Was very helpful in precisely describing my problem but didn't give any easy answers. Although it did explain why I haven't been able to trace the links (and hence the culprit) in the SEs - it is being done with Javascript <doh>. Links from Javascript don't send a referrer and SEs can't read them.

I'm searching Google now to see if there are any free anti-leech scripts for apache around. I found one called Hotlink Reverser, but it costs US$99. [:o] There are a number of references to using PHP to dynamically deliver links but I haven't found anything readymade yet.




sal.scozzari -> RE: How can I prevent hotlinking to PDF files (2/14/2006 17:10:12)

A little chunk of server code might do the trick.

Replace your hotlinks with link buttons, and store the PDF files somewhere inaccessible from outside. When the button is clicked, the server explicitly loads the PDF file and streams it into the response.

The files can reside on some folder outside the virtual folder hierarchy, or better yet in a database. Effectively, there is no URL that explicitly resolves to any of your docs.

Hope this helps.




Kitka -> RE: How can I prevent hotlinking to PDF files (2/14/2006 17:25:47)

Hi Sal,

Welcome to OutFront!

Your idea sounds wonderful, except I don't understand how to implement it.

quote:

store the PDF files somewhere inaccessible from outside


How do I do that?

quote:

Effectively, there is no URL that explicitly resolves to any of your docs.


If there is no URL, how do I link the button to the PDF? Could you give an example of the coding please?




sal.scozzari -> RE: How can I prevent hotlinking to PDF files (2/15/2006 11:15:01)

I'm assuming you're running ASP.NET; if you're not, then hopefully these suggestions will translate to whatever development environment you're using.

Let's say your server is hosting your web site "http://superwebsite" in the virtual folder "C:\Inetpub\wwwroot\superwebsite". You could store your docs in "C:\superwebsite\Docs". There is no way to navigate to that folder, and there is no direct URL that references that folder.

So, now no-one can get at these documents directly, but you want authentic visitors to your site to get at them. Instead of using hyperlinks (or whatever control that generates the <a href...> thingy ), place a button on your web page.

In the click event for the button, do something like this:

// Read the file into a buffer (needs "using System.IO;" at the top of the code-behind file)
string sf = @"C:\superwebsite\Docs\supermanual.pdf";
FileStream fs = new FileStream(sf, FileMode.Open, FileAccess.Read);
BinaryReader br = new BinaryReader(fs);

// Use the reader declared above (br); the whole file is read in one go
byte[] df = br.ReadBytes((int)fs.Length);

br.Close();
fs.Close();

// Transmit the buffer in the response
Response.Expires = 0;
Response.Buffer = true;
Response.ClearContent();
Response.ClearHeaders();
Response.ContentType = "application/pdf";
Response.BinaryWrite( df );
Response.Flush();
Response.End();

That's it. When your visitors click the button, the server responds with the requested document. Hope this helps.




Kitka -> RE: How can I prevent hotlinking to PDF files (2/15/2006 21:27:11)

Whoa, sounds like great stuff, but your suggestion is assuming knowledge that is way above my head!

quote:

I'm assuming you're running ASP.NET;


No, our sites are hosted on Linux/Apache.

All web-accessible files are stored in this folder on the remote server:

/home/username/public_html/

I don't really understand how to create a virtual folder. I have been able to create a folder called pdfs in the root directory (for our account) which is /home/username/ - is that what you meant?

However, I am now all at sea wondering how to call the pdfs contained in that directory from a button on a page. As I don't have access to ASP - what language is most appropriate?

Many thanks for your assistance with this [:)]
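
For a Linux/Apache host, the same idea as sal's ASP.NET snippet can be sketched in any server-side language. Below is a rough outline in Python, not a ready-made script: the folder /home/username/pdfs is the one mentioned above, while the function names resolve_pdf and pdf_response are made up for illustration, and how the headers and body get handed back to the browser depends on whatever the host supports (CGI, PHP, etc.). The logic should carry over to PHP more or less line for line.

```python
from pathlib import Path
from typing import Dict, Tuple

# Folder outside public_html, so no URL maps to it directly (path from the post above).
DOCS_DIR = Path("/home/username/pdfs")


def resolve_pdf(name: str, docs_dir: Path = DOCS_DIR) -> Path:
    """Map a requested file name to a PDF inside docs_dir, or raise an error."""
    # Keep only the last path component so "../" tricks cannot escape docs_dir.
    safe_name = Path(name).name
    if not safe_name.lower().endswith(".pdf"):
        raise ValueError("only PDF files are served")
    path = docs_dir / safe_name
    if not path.is_file():
        raise FileNotFoundError(safe_name)
    return path


def pdf_response(name: str, docs_dir: Path = DOCS_DIR) -> Tuple[Dict[str, str], bytes]:
    """Return (headers, body) for the web layer to send to the browser."""
    path = resolve_pdf(name, docs_dir)
    body = path.read_bytes()
    headers = {
        "Content-Type": "application/pdf",
        "Content-Length": str(len(body)),
        "Content-Disposition": f'attachment; filename="{path.name}"',
    }
    return headers, body
```

Note that this only hides where the files live. The download script itself still has a URL, so on its own it would presumably need some extra check (for example a session cookie set by the product page) before it stops a determined hotlinker.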




Ranger Bob -> RE: How can I prevent hotlinking to PDF files (6/6/2006 0:09:07)

Kitka,

I thought this stuff would be over my head but it's not. I had the same problem with people hyperlinking directly to images, PDFs, and such on my website. I also run Apache, and this solved the problem easily enough:

http://ranger-bob.net/?p=423 { More Info }.

Basically,

In your '/{homedir}/pdf' folder, create a file called '.htaccess' containing the rules below. Place an appropriate 'leech.gif' image in the root of your website to say that off-site links are not allowed, and include a URL showing where to find the real download section.

Here is how I set up my .htaccess file. Basically, enter the URLs of the sites you allow to access the content directly.


RewriteEngine On
# Let requests with an empty referrer through
RewriteCond %{HTTP_REFERER} !^$
# Let requests referred by these sites through
RewriteCond %{HTTP_REFERER} !^http://ranger-bob\.net/ [NC]
RewriteCond %{HTTP_REFERER} !^http://lance-taylor\.net/ [NC]
RewriteCond %{HTTP_REFERER} !^http://edmontonobservers\.net/ [NC]
# Never redirect the replacement image itself
RewriteCond %{REQUEST_URI} !^/leech\.gif [NC]
# Any other request for these file types is redirected to leech.gif
RewriteRule \.(mp4|swf|avi|bmp|mp3|pdf|zip|wmv|mov|gif|jpg)$ http://ranger-bob.net/leech.gif [NC,R,L]



As a Test -- Try to click on this hyperlinked PDF document to see what happens.

http://ranger-bob.net/eog/Comet.PDF

You will get the 'leech.gif' image displayed instead. Effectively prevents directly hyper-linking to content on your website. Easy.. peasy. If you are still stumped.. jest email me.

Cheers!

Ranger Bob





Ranger Bob -> RE: How can I prevent hotlinking to PDF files (6/6/2006 0:18:24)

Oh, now if you want to find out the 'WHO' is hyperlinking aspect.. that was easy with this piece of software. I'm using their 30 day demo right now. Great stuff.. likely register it soon.

http://www.weblogexpert.com

Easy install, then just point it to the '/Apache/log/access.log' file on your server. Reveals some amazing stuff it does.

In fact, it's how I found this website and your question.. just tracing back some of the URLS that have paid me a visit over time.




Kitka -> RE: How can I prevent hotlinking to PDF files (6/6/2006 0:38:21)

Many thanks Bob, but regrettably that method simply doesn't work in this instance. It is great for images, but not PDFs - because frequently, even with links from within the site hosting the file, no referrer is sent when a PDF is requested.

If you check your raw logs, you'll see that I managed to download your four page PDF with ease and no tricks employed. It is titled: "The Discovery of Comet Machholz".

And if you check your stats package, while it should show you my IP address, you will have no idea where I found the link, because there will be no referrer sent.

This was my major problem, and hence the call for assistance.

However, the good news (for me) is that I solved my problem by using Maxmind GeoIP Country lite (which is free) and PHP includes. On the manuals download page I have a PHP conditional script which calls a different include according to the country the visitor's IP belongs to. So I serve a page that links to the PDFs locally for Australian and New Zealand visitors and a different one that links to the manufacturer's PDFs on servers elsewhere - if I could find them, and I did find 99%. (Thanks Caz for the suggestion! [:)] )
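
The country-based switch described above boils down to very little logic. Here is a minimal sketch in Python rather than PHP, under loud assumptions: lookup_country is a hypothetical stand-in for whatever GeoIP lookup is used (it is not MaxMind's API), the IP addresses are documentation-only examples, and the URLs are invented. It also falls back to the local copy when no manufacturer link was found.

```python
from typing import Callable, Optional

# Visitors from these countries get the locally hosted PDFs.
LOCAL_COUNTRIES = {"AU", "NZ"}


def pick_pdf_link(
    visitor_ip: str,
    lookup_country: Callable[[str], Optional[str]],
    local_url: str,
    manufacturer_url: Optional[str],
) -> str:
    """Choose which PDF link to show for a visitor's IP address."""
    country = lookup_country(visitor_ip)  # e.g. "AU", "SE", or None if unknown
    # Local visitors, or manuals with no manufacturer copy, use our own file.
    if country in LOCAL_COUNTRIES or manufacturer_url is None:
        return local_url
    return manufacturer_url


# Stand-in lookup table for illustration only; a real setup would query a GeoIP database.
FAKE_GEO = {"203.0.113.7": "AU", "198.51.100.9": "SE"}
lookup = FAKE_GEO.get
```

With that stand-in table, pick_pdf_link("198.51.100.9", lookup, "/pdfs/x.pdf", "http://maker.example/x.pdf") returns the manufacturer's URL, while the 203.0.113.7 visitor gets the local one.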




Ranger Bob -> RE: How can I prevent hotlinking to PDF files (6/6/2006 0:47:25)

Hyup, you got me there. I am still trying to figure out how to block all anonymous access (i.e. proxy) to my website also.

216.232.5.247 - - [05/Jun/2006:15:41:36 -0600] "GET /eog/Comet.PDF HTTP/1.1" 200 122880 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.50727)"
216.232.5.247 - - [05/Jun/2006:15:41:43 -0600] "GET /eog/Comet.PDF HTTP/1.1" 200 327680 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.50727)"
216.232.5.247 - - [05/Jun/2006:15:41:48 -0600] "GET /eog/Comet.PDF HTTP/1.1" 206 212992 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.50727)"
216.232.5.247 - - [05/Jun/2006:15:41:48 -0600] "GET /eog/Comet.PDF HTTP/1.1" 206 24576 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.50727)"
216.232.5.247 - - [05/Jun/2006:15:41:49 -0600] "GET /eog/Comet.PDF HTTP/1.1" 206 24576 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.50727)"
216.232.5.247 - - [05/Jun/2006:15:41:49 -0600] "GET /eog/Comet.PDF HTTP/1.1" 206 24576 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.50727)"
216.232.5.247 - - [05/Jun/2006:15:41:50 -0600] "GET /eog/Comet.PDF HTTP/1.1" 206 32768 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.50727)"
216.232.5.247 - - [05/Jun/2006:15:42:01 -0600] "GET /eog/Comet.PDF HTTP/1.1" 206 524288 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322; .NET CLR 2.0.50727)"


If the inbound link does not have a referrer.. they get in. Which is usually the hacker spambots etc. That is my own quest these days.
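
Log lines like the ones above can be tallied with a few lines of script to see which IP addresses are pulling PDFs with no referrer. A rough sketch in Python, assuming the log is in Apache's combined format as shown; the function name no_referrer_pdf_hits is made up for this example.

```python
import re
from collections import Counter

# One line of Apache "combined" log format, as in the excerpt above.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def no_referrer_pdf_hits(lines):
    """Count PDF requests that arrived with an empty ("-") referrer, per client IP."""
    hits = Counter()
    for line in lines:
        m = LOG_LINE.match(line)
        if not m:
            continue  # skip lines that are not in combined format
        if m["path"].lower().endswith(".pdf") and m["referer"] in ("-", ""):
            hits[m["ip"]] += 1
    return hits
```

Feeding it the lines of access.log (for example via open(...) on the log file) gives a per-IP count. It shows who is downloading, but as noted earlier in the thread, it cannot show where they found the link.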

When you test the link above in this (Anti-Leech) test URL you'll see what you should get.

http://www.xentrik.net/htaccess/linktest.php





Kitka -> RE: How can I prevent hotlinking to PDF files (6/6/2006 1:07:49)

quote:

Interesting indeed.. cause I assumed this line of code was handling the 'no referral'.

RewriteCond %{HTTP_REFERER} !^$


That line actually specifically "allows" access if there is no referrer sent. Which is 100% necessary because of the fact I mentioned above, that even people clicking on a link to a PDF from within a site, will not send a referrer. Why? I have no idea - referrers are normally sent for html, image, php files etc, but PDFs, favicons and SWF files rarely if ever have a referrer.

If you didn't have that line to allow blank referrers, you'd find you were blocking genuine visitors to your own site from downloading the protected files.
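
To make the effect of that line concrete, here is the decision Ranger Bob's rules make, restated as a small Python function. This is only an illustration of the logic, not something Apache runs; the single allowed site and the shortened file-type list are simplifications of the rules quoted earlier in the thread, and the function name is_redirected is made up.

```python
import re

# Simplified allow-list and file types, modelled on the rules quoted earlier in the thread.
ALLOWED_REFERER_PREFIXES = ("http://ranger-bob.net/",)
PROTECTED = re.compile(r"\.(pdf|zip|gif|jpg)$", re.IGNORECASE)


def is_redirected(path: str, referer: str) -> bool:
    """Return True if the request would be redirected to leech.gif."""
    if referer == "":
        # Equivalent of: RewriteCond %{HTTP_REFERER} !^$
        # An empty referrer fails this condition, so the rule never fires.
        return False
    if referer.lower().startswith(ALLOWED_REFERER_PREFIXES):
        return False  # referred by an allowed site ([NC] = case-insensitive)
    if path.lower().startswith("/leech.gif"):
        return False  # never redirect the replacement image itself
    return bool(PROTECTED.search(path))
```

So is_redirected("/eog/Comet.PDF", "") is False, which matches the 200 responses in the log excerpt above, while the same request with a referrer from an unlisted site is redirected.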

quote:

Will see what else I can dig up on the net.. sounds like we both have the same problem to solve.


The best method is as detailed by sal in message 15 above. But I was unable to find a free PHP version of it and I know next to nothing about writing my own scripts.




Ranger Bob -> RE: How can I prevent hotlinking to PDF files (6/6/2006 1:10:33)

Cool. Will check into it.... thanks! (OBTW, yer pretty quick on the reply too.) [;)]

And here I was just doing an RTFM.

http://httpd.apache.org/docs/1.3/misc/rewriteguide.html




Kitka -> RE: How can I prevent hotlinking to PDF files (6/6/2006 1:30:09)

quote:

OBTW, yer pretty quick on the reply too.


[sm=lol.gif]

Good luck in your hunt, and do let us know if you find something worthwhile that deals with the problem. [8|]



