womble
Posts: 6007 Joined: 3/14/2005 From: Living on the edge Status: offline
Google not following robots.txt? - 7/3/2008 10:02:55
Maybe I'm confused, and maybe I'm doing it wrong, but I've just run a Google search on a term that's in one of my domain names. Unsurprisingly, some pages from that site came up. But when I switched to an image search, a few images from the site came up too, and they live in a folder (unsurprisingly called "images") that should be blocked from search engine indexing. The robots.txt file goes:
User-agent: *
Allow: /
Disallow: /test/
Disallow: /scripts/
Disallow: /images/
Disallow: /photos/
Disallow: /styles/
Disallow: /includes/
...and a few other folders I don't want indexed. Looking at the robots.txt files on some of my other sites, they don't have the "Allow: /" line. Could that be what's causing the problem, or do I just need to go round and give Google a good slapping?
_____________________________
~~ "A cruel god ain't no god at all" ~~ ~~ Erase hate. Practice love. ~~
rdouglass
Posts: 9280 From: Biddeford, ME USA Status: offline
RE: Google not following robots.txt? - 7/3/2008 10:57:15
Remove your Allow line, or move it to the end. I believe the parser goes down the list until it hits a rule that matches and then stops. Hence, with "Allow: /" as the first line, everything is allowed and the Disallow lines are never reached. The Allow is virtually redundant (at least IMO) because if a bot doesn't find a matching rule it assumes the URL is allowed, and I don't think 'Allow' is actually valid for all bots. Hope it helps.
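To make the rule-order argument concrete, here is a minimal Python sketch of two ways a crawler might evaluate the rules from the robots.txt posted above. It is an illustration only, not any search engine's actual code: `first_match` models the top-down reading described in this reply, while `longest_match` models the "most specific rule wins" precedence that Google's documentation and RFC 9309 describe. The function names and the shortened rule list are made up for the example.

```python
# Rules in file order as (directive, path prefix) pairs.
# A shortened copy of the robots.txt from the first post.
RULES = [
    ("allow", "/"),
    ("disallow", "/test/"),
    ("disallow", "/scripts/"),
    ("disallow", "/images/"),
]


def first_match(rules, path):
    """Top-down evaluation: the first rule whose prefix matches decides."""
    for directive, prefix in rules:
        if path.startswith(prefix):
            return directive == "allow"
    return True  # no rule matched, so the URL is allowed by default


def longest_match(rules, path):
    """Most-specific evaluation: the longest matching prefix decides.

    On a tie in length, the allow rule wins because True sorts above False.
    """
    matches = [
        (len(prefix), directive == "allow")
        for directive, prefix in rules
        if path.startswith(prefix)
    ]
    if not matches:
        return True
    return max(matches)[1]


if __name__ == "__main__":
    image = "/images/logo.png"
    # With "Allow: /" first, a first-match parser never reaches the Disallow lines.
    print("first match, Allow first:   ", first_match(RULES, image))
    # Dropping the Allow line lets the Disallow rule apply.
    print("first match, Allow removed: ", first_match(RULES[1:], image))
    # A most-specific-match parser blocks the image regardless of rule order.
    print("longest match, Allow first: ", longest_match(RULES, image))
```

Under the first-match reading, the order of the lines decides the outcome; under the longest-match reading it does not, so whether the Allow line matters depends on which kind of parser a given bot uses.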
< Message edited by rdouglass -- 7/3/2008 11:10:14 >
_____________________________
Don't take you're eye off your final destination. ASP Checkbox Function Tutorial.
jurgen
Posts: 424 Joined: 1/9/2007 From: Castle Rock, Colorado Status: offline
RE: Google not following robots.txt? - 7/3/2008 16:46:03
He is right, Womble: the Allow doesn't do you any good. Below is what I use, and it works. You can also name specific bots and say what each of them may do.
quote:
User-agent: CazoodleBot
Disallow: /

User-agent: *
Disallow: /style/
Disallow: /images/
Disallow: /gfx/
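For anyone who wants to check how per-bot groups like these are read, here is a small sketch using Python's standard-library `urllib.robotparser`. It only shows how that one parser interprets the file quoted above; real crawlers may differ. The bot name "SomeOtherBot", the sample paths, and the helper `can_fetch` are made up for the example.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt quoted above, one directive per line.
ROBOTS_TXT = """\
User-agent: CazoodleBot
Disallow: /

User-agent: *
Disallow: /style/
Disallow: /images/
Disallow: /gfx/
"""


def can_fetch(agent, path, robots_txt=ROBOTS_TXT):
    """Return True if `agent` may fetch `path` according to `robots_txt`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    # CazoodleBot has its own group that disallows the whole site.
    print("CazoodleBot  /index.html       ", can_fetch("CazoodleBot", "/index.html"))
    # Any other bot falls through to the "*" group.
    print("SomeOtherBot /index.html       ", can_fetch("SomeOtherBot", "/index.html"))
    print("SomeOtherBot /images/logo.png  ", can_fetch("SomeOtherBot", "/images/logo.png"))
```

A bot that has a group of its own follows that group; every other bot follows the `*` group, which here blocks only the three listed folders.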
surajseo
Posts: 7 Joined: 6/23/2008 From: www.directoryurlsubmission.com Status: offline
RE: Google not following robots.txt? - 7/24/2008 2:01:11
quote:
ORIGINAL: womble
The robots.txt file goes:
User-agent: *
Allow: /
Disallow: /test/
Disallow: /scripts/
Disallow: /images/
Disallow: /photos/
Disallow: /styles/
Disallow: /includes/
You don't need the "Allow: /" line. Only list the Disallow rules; every remaining folder will still be read by Google's bot and other search engines' bots. I define only Disallow rules in my robots.txt and it's working fine, and I think that's the basic rule. With your file, the first rule permits the bot to read everything ...... :) Thanks :)