Reflect -> RE: Keywords On Pages with Little Content (11/16/2005 12:46:50)
quote:
How do you use a redirect that will only affect a spider but not humans that you want to be able to view a page?

Sorry, I should have worded that differently. On links to internal pages that I do not want spidered, I point the link at a CGI script that then goes to the true page. This way the spider cannot follow the link.

quote:
I didn't realize that you could try to prevent SEs from seeing a page but can you be more specific (I'm a total beginner) about how to do this?

Sure.... Start with a robots.txt file in the root of your web site. I make mine using Notepad. You can use any editor that saves plain text; added formatting is a bad thing for this file. The file's only purpose is to instruct bots/spiders. Most bots/spiders, except rogue ones, will look for this file first. A well-behaved bot will then see that it is not allowed to look at a certain directory or page and honor that.

Example of a robots.txt file:

User-agent: *
Disallow: /_vti_cnf/
Disallow: /_vti_log/
Disallow: /_vti_pvt/
Disallow: /_vti_txt/
Disallow: /_borders/
Disallow: /_fpclcass/
Disallow: /_overlay/
Disallow: /_private/
Disallow: /cgi-bin/
Disallow: /checker/
Disallow: /css/
Disallow: /js/
Disallow: /masters/
Disallow: /misc/
Disallow: /readings/vsadmin/
Disallow: /readings/affiliate.asp
Disallow: /readings/cart.asp
Disallow: /readings/search.asp
Disallow: /readings/sorry.asp
Disallow: /readings/style.css
Disallow: /readings/thanks.asp
Disallow: /send/
Disallow: /tarot-card-spread/

This is my actual robots.txt file. The first line, "User-agent: *", says that the rules apply to ALL spiders/bots. You can also target spiders/bots by their known names. Say you have a page optimized for Yahoo but don't want the Google bot to see it; you could start the record with this instead:

User-agent: googlebot

The "Disallow" portion tells the bot it is not allowed to view that path. On the flip side, if you want a file or directory seen you can NOT use "Allow" (the original standard referenced below has no such line); that happens naturally when a path is not Disallowed. Once the file is created, upload it to the root of your web site and that covers implementing it.

Reference: http://www.robotstxt.org/wc/norobots.html

Next thing I do, as a backup plan, is use METAs.
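Before the METAs, a quick aside on the robots.txt rules above. If you want to check how a well-behaved bot would read them, here is a minimal sketch, assuming Python 3 and its standard urllib.robotparser module. The rules are a shortened copy of my file; the googlebot record, the /yahoo-page.html path, the bot name "SomeBot" and the helper is_allowed are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Shortened copy of the rules from the post, plus a hypothetical
# googlebot-only record (the /yahoo-page.html path is invented).
ROBOTS_TXT = """\
User-agent: googlebot
Disallow: /yahoo-page.html

User-agent: *
Disallow: /cgi-bin/
Disallow: /readings/cart.asp
Disallow: /send/
"""

# Parse the rules from memory instead of fetching them from a live site.
_parser = RobotFileParser()
_parser.parse(ROBOTS_TXT.splitlines())


def is_allowed(user_agent: str, path: str) -> bool:
    """Return True if the parsed rules let user_agent fetch path."""
    return _parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    checks = [
        ("SomeBot", "/cgi-bin/go.cgi"),
        ("SomeBot", "/readings/index.asp"),
        ("googlebot", "/yahoo-page.html"),
        ("googlebot", "/cgi-bin/go.cgi"),
    ]
    for agent, path in checks:
        print(agent, path, is_allowed(agent, path))
```

One thing this shows: a bot that matches a named record follows only that record, so in this sketch googlebot is kept out of /yahoo-page.html but is no longer covered by the "User-agent: *" rules. If you want the named bot kept out of the other directories too, repeat those Disallow lines in its record.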
You can put one of these META tags in the <head> of the page:

<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">
<meta name="robots" content="noindex,follow">
<meta name="robots" content="index,nofollow">

Which one you pick depends on whether you want the page not indexed and its links not followed, not indexed but with its links followed, or indexed without its links being followed.

Reference: http://www.robotstxt.org/wc/meta-user.html

You can also tell a Google spider/bot not to follow a particular link. Personally, I do not use that. I see it as two-fold and am waiting for it to season a little more to see what Google will really use it for.

<a href="http://www.example.com/" rel="nofollow">I can't vouch for this link</a>

Now there is also cloaking, but I have not studied it, as it is a sharp and painful sword if used wrongly.

Take care,
Brian
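
P.S. To see which robots METAs and rel="nofollow" links a page actually carries, here is a minimal sketch, assuming Python 3 and its standard html.parser module. The names RobotsHintParser and inspect_page and the sample page are made up for illustration; this only reads the tags and says nothing about how any particular search engine treats them.

```python
from html.parser import HTMLParser


class RobotsHintParser(HTMLParser):
    """Collects robots META directives and rel="nofollow" link targets."""

    def __init__(self):
        super().__init__()
        self.directives = set()    # values from <meta name="robots">
        self.nofollow_links = []   # hrefs of <a rel="nofollow">

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; attribute
        # values keep their case, so they are lowercased here.
        attr = {name: (value or "") for name, value in attrs}
        if tag == "meta" and attr.get("name", "").lower() == "robots":
            for part in attr.get("content", "").split(","):
                if part.strip():
                    self.directives.add(part.strip().lower())
        elif tag == "a" and "nofollow" in attr.get("rel", "").lower().split():
            self.nofollow_links.append(attr.get("href", ""))


def inspect_page(html):
    """Return (set of robots directives, list of nofollow hrefs)."""
    parser = RobotsHintParser()
    parser.feed(html)
    parser.close()
    return parser.directives, parser.nofollow_links


if __name__ == "__main__":
    page = """<html><head>
<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">
</head><body>
<a href="http://www.example.com/" rel="nofollow">I can't vouch for this link</a>
<a href="/readings/">Readings</a>
</body></html>"""
    directives, nofollow_links = inspect_page(page)
    print(sorted(directives))
    print(nofollow_links)
```

The upper-case and lower-case forms of the META tag come out the same here, because the tag and attribute names are normalized by the parser and the values are lowercased by hand.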