Google not following robots.txt?

All Forums >> [Web Development] >> Search Engine Optimization and Web Business





womble -> Google not following robots.txt? (7/3/2008 10:02:55)

Maybe I'm confused, bewildered even, and maybe I'm doing it wrong, but I've just searched Google for a term that's in one of my domain names. Unsurprisingly, some of the pages from that site came up. Then I switched to an image search, and although there aren't huge numbers of them, some images from the site are showing up, even though they sit in a folder (unsurprisingly called "images") that should be blocked from search engine indexing.

The robots.txt file goes:

User-agent: *

Allow: /
Disallow: /test/
Disallow: /scripts/
Disallow: /images/
Disallow: /photos/
Disallow: /styles/
Disallow: /includes/


...and a few other folders I don't want indexed.

The robots.txt files on some of my other sites don't have the "Allow: /" line. Could that be what's causing the problem... or do I just need to go round and give Google a good slapping? [:D]




rdouglass -> RE: Google not following robots.txt? (7/3/2008 10:57:15)

Remove your Allow line or move it to the end. I believe the parser goes down the list until it hits a matching rule and then stops, so with Allow as the first line it will allow everything and never reach the Disallow lines.

The Allow line is virtually redundant (at least IMO), because if a bot doesn't find a matching rule it assumes Allow anyway, and I don't think 'Allow' is recognized by all 'bots.
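
A quick way to see the ordering question in action, as a sketch assuming Python 3: the standard-library `urllib.robotparser` applies the first rule whose path matches, which is the behaviour described above. RFC 9309 (the 2022 Robots Exclusion Protocol standard) instead says the most specific match, i.e. the longest matching path, wins, and Google documents the same longest-match rule for Googlebot. `longest_match_allowed` below is a hypothetical, simplified matcher written only for illustration: it assumes a single `User-agent: *` group and ignores wildcards. The paths `/images/logo.png` and `/about.html` are made-up samples.

```python
from urllib.robotparser import RobotFileParser

# womble's rules in their original order ("Allow: /" first). The blank line
# after "User-agent: *" is dropped: urllib.robotparser would treat it as the
# end of the record and ignore every rule that follows.
ROBOTS_TXT = """\
User-agent: *
Allow: /
Disallow: /test/
Disallow: /scripts/
Disallow: /images/
Disallow: /photos/
Disallow: /styles/
Disallow: /includes/
"""


def first_match_allowed(robots_txt: str, path: str, agent: str = "Googlebot") -> bool:
    """Python's stdlib parser: the first rule whose path prefix matches wins."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


def longest_match_allowed(robots_txt: str, path: str) -> bool:
    """Hypothetical simplified RFC 9309-style matcher for one "User-agent: *"
    group: the longest matching rule path wins, Allow wins ties, and no match
    means allowed. Wildcards ("*", "$") are not handled."""
    best_len, allowed = -1, True
    for raw in robots_txt.splitlines():
        field, _, value = raw.partition(":")
        field, value = field.strip().lower(), value.strip()
        if field not in ("allow", "disallow") or not value:
            continue  # skip User-agent lines, blank lines and empty rule paths
        if path.startswith(value):
            is_allow = field == "allow"
            if len(value) > best_len or (len(value) == best_len and is_allow):
                best_len, allowed = len(value), is_allow
    return allowed


if __name__ == "__main__":
    for path in ("/images/logo.png", "/about.html"):
        print(path, first_match_allowed(ROBOTS_TXT, path), longest_match_allowed(ROBOTS_TXT, path))
    # /images/logo.png True False
    # /about.html True True
```

The first-match parser lets `/images/logo.png` through because `Allow: /` is hit first, while the longest-match reading still blocks it because `/images/` is a longer match than `/`. So rule order matters for first-match parsers but not for parsers that follow the longest-match rule.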

Hope it helps.




jurgen -> RE: Google not following robots.txt? (7/3/2008 16:46:03)

He's right, Womble: the Allow line doesn't do you any good. Below is what I use, and it works. You can also name specific bots and set what each of them may do.

quote:

User-agent: CazoodleBot
Disallow: /

User-agent: *
Disallow: /style/
Disallow: /images/
Disallow: /gfx/




surajseo -> RE: Google not following robots.txt? (7/24/2008 2:01:11)


quote:

ORIGINAL: womble

The robots.txt file goes:

User-agent: *

Allow: /
Disallow: /test/
Disallow: /scripts/
Disallow: /images/
Disallow: /photos/
Disallow: /styles/
Disallow: /includes/




You don't need the "Allow: /" line. Just list the folders you want to disallow; all the remaining folders will be read by Google's bot or other search engines' bots.

I define only Disallow rules in my robots.txt and it's working fine. I think that's the basic rule.
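
A minimal sketch of the Disallow-only approach, assuming Python 3 and its standard-library `urllib.robotparser`. It takes womble's rules with the `Allow: /` line removed and reports which sample paths are blocked. The agent name `Googlebot`, the helper `blocked_paths`, and paths such as `/contact.html` are hypothetical, chosen only for illustration.

```python
from urllib.robotparser import RobotFileParser

# womble's rules with the "Allow: /" line removed, as suggested above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /test/
Disallow: /scripts/
Disallow: /images/
Disallow: /photos/
Disallow: /styles/
Disallow: /includes/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def blocked_paths(paths):
    """Return the subset of paths that Googlebot may not fetch."""
    return [p for p in paths if not parser.can_fetch("Googlebot", p)]


if __name__ == "__main__":
    sample = ["/index.html", "/images/logo.png", "/photos/2008/beach.jpg", "/contact.html"]
    print(blocked_paths(sample))
    # ['/images/logo.png', '/photos/2008/beach.jpg']
```

Anything that matches no rule is allowed by default. With no Allow rules in the file there is nothing to conflict, so first-match and longest-match parsers give the same answers here.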

With your file as written, the very first rule permits the bot to read everything. :)

Thanks :)




womble -> RE: Google not following robots.txt? (7/24/2008 8:49:37)

OMG! There's an echo in here! [:D] Thanks, guys!

Now all I need to do is remember which site this particular post was about so I can remove that Allow line! [:D]



