Google Bot (Full Version)


TODDMAN -> Google Bot (2/22/2007 23:26:27)

I received this email from my hosting provider dreamhost.com.

“We have noticed that your site (gallery.toddman.org) is getting hit hard
by the Googlebot, which Google uses to crawl the content of the site and
list it on google.com. This is maxing out the connections on the server
and causing load to go up. In an effort to bring the load down, we had to
block the Googlebot IP address to stop it from accessing your site. If
you don't care about Google indexing your site, there is no further action
that needs to be taken. If you do want Google to index your site, remove
the IP address from the .htaccess file that we placed in your domain
directory, and go through the Google help site here
http://www.google.com/support/webmasters/bin/topic.py?topic=8843 to find
out how to lower the amount of crawling that the Googlebot does;
this will help keep the load on the server down.”

I followed the link to:
“What can I do if Google is creating too high a load on my server?
If Google is causing excessive strain on your servers and you'd like us to slow the rate at which Googlebot crawls your site, please let us know. In your message, please include a text snippet from your most recent weblog that lists Googlebot. Also, please confirm your request by creating a forgoogl.html page on your site and sending us the URL. We will then pass your request on to our engineers.”

The only problem: I can’t get into my logs yet. I'm waiting on my host to give me permission.
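
Once the logs are accessible, the snippet Google asks for could be pulled out with a short script. This is only a rough Python sketch, assuming the log uses the usual Apache common/combined layout (client IP first, user agent somewhere later in the line). The function name googlebot_hits and the sample lines are made up for illustration and are not taken from the real server log.

```python
from collections import Counter

def googlebot_hits(log_lines):
    """Return (matching_lines, hits_per_ip) for lines that mention Googlebot.

    Assumes each line starts with the client IP. The user agent is taken
    at face value here, so spoofed bots will be counted too.
    """
    matches = [line.rstrip("\n") for line in log_lines if "googlebot" in line.lower()]
    per_ip = Counter(line.split(" ", 1)[0] for line in matches)
    return matches, per_ip

# Hypothetical sample lines, for illustration only.
sample = [
    '66.249.66.143 - - [22/Feb/2007:10:00:01 -0800] "GET /page1.html HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '66.249.66.143 - - [22/Feb/2007:10:00:02 -0800] "GET /page2.html HTTP/1.1" 200 4096 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '10.0.0.5 - - [22/Feb/2007:10:00:03 -0800] "GET /index.html HTTP/1.1" 200 1024 "-" "Mozilla/4.0 (compatible; MSIE 6.0)"',
]

lines, per_ip = googlebot_hits(sample)
for ip, count in per_ip.most_common():
    print(ip, count)
```

With a real log, the list passed in would be the open file object, and a few of the matching lines would be pasted into the message to Google.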

Below is a copy of the file inserted by my host.
“.htaccess
order allow,deny
deny from 66.249.66.143
allow from all”

I moved my gallery from Infinology.net to dreamhost.com because Infinology has a 20 MB limit on MySQL and charges $0.15 per MB over the limit, which I was exceeding weekly.

Has anybody else had this problem with their hosting provider? Banning a bot from their site?

OBTW, I’m still building the site!!!

Thanks, Toddman
www.toddman.org




Kitka -> RE: Google Bot (2/22/2007 23:53:35)

If the high load is caused by Google indexing your images, you can easily control that with a robots.txt file in your root directory and still allow the normal Googlebot to index the pages themselves.

User-agent: Googlebot-Image
Disallow: /

User-agent: * 
Disallow: /images/
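
To see how rules like these are read, they can be fed to the robots.txt parser in Python's standard library. This is a sketch only: Python's parser is not Google's, so the matching may differ in edge cases, and the paths /images/photo.jpg and /index.html are placeholders rather than real paths on the gallery.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: Googlebot-Image
Disallow: /

User-agent: *
Disallow: /images/
"""

def allowed(agent, path, robots_txt=ROBOTS_TXT):
    """Return True if `agent` may fetch `path` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)

# Placeholder paths; substitute the gallery's real image directory.
for agent, path in [
    ("Googlebot-Image", "/images/photo.jpg"),
    ("Googlebot", "/index.html"),
    ("Googlebot", "/images/photo.jpg"),
]:
    print(agent, path, allowed(agent, path))
```

The image crawler is shut out everywhere, while the regular crawler can still reach pages but not anything under /images/.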


To learn more about robots.txt go here:

http://www.outfront.net/tutorials_02/adv_tech/robots.htm

and here:

http://www.robotstxt.org/wc/robots.html

And for more specific Google info, look here:

http://www.google.com/support/webmasters/bin/answer.py?answer=40360&topic=8846




jaybee -> RE: Google Bot (2/23/2007 5:51:34)

I've not had that problem as far as I know, but I do use robots.txt to stop bots from getting at all the directories. As Kitka has noted above, I block the images directory, as there are hundreds of images on one of my sites.




TODDMAN -> RE: Google Bot (2/23/2007 10:17:42)

Thanks Kitka & jaybee, I'll give that a try :)




Thomas Brunt -> RE: Google Bot (3/5/2007 8:58:13)

Not sure if I'm talking about the same thing.

I have seen a similar problem with a couple of clients, but it was not the Google spidering that was the problem. It was the Google searches that returned their images. The image downloads were eating up all the allocated bandwidth, and they were not providing the site owners with anything useful.

The robots.txt file was helpful, but the first thing we did was to change the file names of all the images and all of the pages displaying the images. This seemed to do the trick.

t




BobbyDouglas -> RE: Google Bot (3/5/2007 14:53:44)

Your website loads incredibly slowly. Places like dreamhost, 1&1 and ipowerweb all have fancy-looking hosting packages at great prices. The problem these hosts run into is that there are just too many clients on a single server. Everything runs slower, and they have many more clients to deal with. What happens when a severe issue arises? They have 5,000 people down, many calling in and wasting the host's time. Imagine if there were only 2,500 people on the server. Because their prices are low, they have to make up for it by loading up the servers with lots of clients.

Your host should have told you what was causing the loads, since they didn't, you are now sitting with your website while googlebot is being completely banned from the entire site. Luckily, they only banned a single IP address. Any idea how much bandwidth is actually being used from googlebot? Do you have any type of stats software on your site?




onekgguy -> RE: Google Bot (3/5/2007 16:06:32)

Yes, your site loads very slowly... I gave up on it. Web space is quite cheap, and there are good companies out there who can handle your traffic without limiting access from the Googlebots. I've never heard of that happening... consider finding a host better able to meet your demands. I would think you would want the bots indexing your site so people can find your content through a Google search.

Kevin g




TODDMAN -> RE: Google Bot (3/5/2007 16:14:08)

quote:

Your host should have told you what was causing the loads,

They did, my gallery.

quote:

now sitting with your website while googlebot is being completely banned from the entire site.

So much for being SEO friendly!

quote:

Luckily, they only banned a single IP address.

How do you know this? I removed the .htaccess file & placed a robots.txt file in its place, but that didn't help a different google bot came by & ignored the robots.txt file. Dreamhost had another cow & without telling me added another .htaccess file banning Google. In the last 7 days Dreamhost had a major meltdown & that didn't help much. Currently looking to move again to puppy power hosting. I know the owners!

quote:

Any idea how much bandwidth is actually being used from googlebot? Do you have any type of stats software on your site?

Next to none, with everything I have going 4 domains & 4 subs, my bandwidth was .02GB out of 2TB limit. The stats software is Analog; I can post the report if you like.




Kitka -> RE: Google Bot (3/5/2007 16:21:50)

quote:

but that didn't help a different google bot came by & ignored the robots.txt file.


It is extremely rare (i.e. almost unknown) for a real Googlebot to ignore robots.txt. So it was either a fake Googlebot (many dodgy bots spoof the Googlebot UA) or it might be because your current robots.txt gives all bots carte blanche to take anything and everything:

User-agent: *
Disallow: 


If you want to ban all bots you need:
User-agent: *
Disallow: /
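
The difference between the two files is easy to miss, so here is a quick check with the robots.txt parser from Python's standard library. This is a sketch under assumptions: the /gallery/ path is a placeholder, and Python's parser only approximates how a real crawler reads the file.

```python
from urllib.robotparser import RobotFileParser

def can_crawl(robots_txt, agent="Googlebot", path="/gallery/"):
    """Return True if `agent` may fetch `path` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)

ALLOW_ALL = "User-agent: *\nDisallow:\n"    # empty value: nothing is off limits
BLOCK_ALL = "User-agent: *\nDisallow: /\n"  # "/" is a prefix of every path

print("empty Disallow lets the bot in:", can_crawl(ALLOW_ALL))
print("Disallow: / lets the bot in:", can_crawl(BLOCK_ALL))
```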




BobbyDouglas -> RE: Google Bot (3/5/2007 17:19:59)

quote:

They did, my gallery.

- There are many parts to the gallery. Most likely it was the images, but you need to know for sure, and your stats software should tell you.

quote:

How do you know this?

- It shows it here:
quote:

Below is a copy of the file inserted by my host.
“.htaccess
order allow,deny
deny from 66.249.66.143
allow from all”


The deny from 66.249.66.143 line is the Googlebot IP address.
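
Whether an address really belongs to Googlebot can be checked rather than assumed. Google's published advice is a reverse DNS lookup on the IP, followed by a forward lookup on the returned host name to confirm it maps back to the same IP. Below is a minimal Python sketch of that idea. The domain suffixes reflect that advice as I understand it and should be checked against Google's current documentation. The function name and the stubbed host names in the demonstration are hypothetical, so the example runs without touching the network.

```python
import socket

GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")  # assumption: verify against Google's docs

def is_real_googlebot(ip, reverse_lookup=None, forward_lookup=None):
    """Reverse-resolve `ip`, check the host name, then confirm it resolves back to `ip`."""
    reverse_lookup = reverse_lookup or (lambda addr: socket.gethostbyaddr(addr)[0])
    forward_lookup = forward_lookup or (lambda host: socket.gethostbyname_ex(host)[2])
    try:
        host = reverse_lookup(ip)
        if not host.lower().rstrip(".").endswith(GOOGLE_SUFFIXES):
            return False
        return ip in forward_lookup(host)
    except OSError:
        # Lookup failed (no PTR record, no network, ...): treat as unverified.
        return False

# Offline demonstration with stubbed lookups; these host names are illustrative only.
fake_ptr = {
    "66.249.66.143": "crawl-66-249-66-143.googlebot.com",
    "203.0.113.9": "bot.example.net",
}
fake_a = {"crawl-66-249-66-143.googlebot.com": ["66.249.66.143"]}

def stub_reverse(addr):
    return fake_ptr[addr]

def stub_forward(host):
    return fake_a.get(host, [])

print(is_real_googlebot("66.249.66.143", stub_reverse, stub_forward))
print(is_real_googlebot("203.0.113.9", stub_reverse, stub_forward))
```

Called with just the IP, the function falls back to real DNS lookups through the socket module.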

quote:

Currently looking to move again to puppy power hosting. I know the owners!

- You've moved quite a bit already; make sure they provide a good service before you switch again. Unless you are hosting a personal website, you want to avoid these cheapo hosts. Expect to pay around $10/month for a small amount of space from a place that offers good support.

quote:

Next to none, with everything I have going 4 domains & 4 subs, my bandwidth was .02GB out of 2TB limit.

- So Googlebot is "maxing out the connections on the server and causing load to go up" and you have only used .02GB of bandwidth? Are you sure that wasn't your bandwidth for March? That sounds far too low to cause issues with even a cheap host's server.




TODDMAN -> RE: Google Bot (3/5/2007 17:55:12)

quote:

The deny from 66.249.66.143 line is the Googlebot IP address.

My fault, I forgot I posted the 1st .htaccess file. I thought you had some way of viewing someone's .htaccess files.

The bandwidth is for Feb.; in fact, I started this post on 2/22. I was surprised to see it make the newsletter.




BobbyDouglas -> RE: Google Bot (3/5/2007 18:11:33)

quote:

The bandwidth is for Feb.

- You should ask them how one IP address was able to make such an impact on the server, yet use less than .02GB of bandwidth. It would be interesting to see their response.

Websites that have issues with the Googlebot are usually ones that are pushing out multiple GBs of bandwidth.




tinaalice -> RE: Google Bot (3/5/2007 22:21:35)

I'd look at your stats also for referring sites .. and see who is hotlinking to your images .. I don't think it's just the google bot .. I have this problem with my Abstract Art site and oinks hotlinking without permission which rockets my bandwidth .. I have a high one because after three domains I decided to get a reseller account for just myself .. as I have about 20 now that was a good idea and I pay $35 a month which is about £20 ... my host has different stats but one of the free ones ewstats is pretty good and gives me good references ... keeping up with contacting these people to get them to take my graphics off their servers is time consuming however...

Tina




Kitka -> RE: Google Bot (3/5/2007 22:35:42)

quote:

keeping up with contacting these people to get them to take my graphics off their servers is time consuming however


That is one good reason for hosting web sites on an Apache server. You can stymie hot-linking by adding a few lines of code to your .htaccess file and then forget about it. I don't think Windoze servers offer anything even remotely similar. ;)
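
On Apache this is usually done with rewrite rules that look at the Referer header on image requests. The Python sketch below only illustrates that decision logic; it is not Apache configuration. The function name, the extension list and the example.com host names are assumptions made for the example.

```python
from urllib.parse import urlparse

IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".gif", ".png")

def is_hotlink(path, referer, own_hosts):
    """Return True when an image request comes from a page on another site.

    Requests with an empty Referer are let through, since some browsers
    and proxies omit the header.
    """
    if not path.lower().endswith(IMAGE_EXTENSIONS):
        return False
    if not referer:
        return False
    host = (urlparse(referer).hostname or "").lower()
    return host not in own_hosts

OWN = {"example.com", "www.example.com"}  # hypothetical site

print(is_hotlink("/art/blue.jpg", "http://www.example.com/gallery.html", OWN))
print(is_hotlink("/art/blue.jpg", "http://forum.example.net/thread/7", OWN))
print(is_hotlink("/art/blue.jpg", "", OWN))
```

A server rule built on the same test would refuse, or swap out, the image whenever the check comes back true.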




tinaalice -> RE: Google Bot (3/6/2007 10:49:57)

I can't remember which plan on my account I put the domain on, win or unix .. I can flip any domain over to any other plan any time I want (that's the beauty of having a reseller account for one's own domains .. once you get over three domains it makes financial sense to get one... and one where you can pick and choose which features you're using and which type of server you have the plan on... ) but it does not really matter does it?

I said art site ...

think about it ....

Tina





TODDMAN -> RE: Google Bot (3/6/2007 12:07:11)

Here are my stats for Feb.

Web Server Statistics for toddman.org
Analyzed requests from Thu, Feb 01 2007 at 3:36 PM to Wed, Feb 28 2007 at 8:06 PM (27.19 days).

Successful requests: 1,704 (491)
Average successful requests per day: 60 (70)
Successful requests for pages: 1,462 (364)
Average successful requests for pages per day: 52 (51)
Failed requests: 44 (5)
Redirected requests: 4 (0)
Distinct files requested: 108 (8)
Distinct hosts served: 44 (4)
Data transferred: 3.17 megabytes (1.72 megabytes)
Average data transferred per day: 116.06 kilobytes (251.33 kilobytes)

Web Server Statistics for gallery.toddman.org
Successful requests: 38,698 (9,932)
Average successful requests per day: 1,391 (1,418)
Successful requests for pages: 30,424 (8,047)
Average successful requests for pages per day: 1,094 (1,149)
Failed requests: 459 (5)
Redirected requests: 8 (1)
Distinct files requested: 15,199 (324)
Distinct hosts served: 50 (7)
Data transferred: 106.24 megabytes (53.69 megabytes)
Average data transferred per day: 3.82 megabytes (7.67 megabytes)

Web Server Statistics for sportz.toddman.org
Successful requests: 3,495 (358)
Average successful requests per day: 128 (51)
Successful requests for pages: 922 (96)
Average successful requests for pages per day: 33 (13)
Failed requests: 4 (0)
Redirected requests: 3 (0)
Distinct files requested: 446 (25)
Distinct hosts served: 48 (3)
Data transferred: 9.00 megabytes (1.13 megabytes)
Average data transferred per day: 339.01 kilobytes (165.57 kilobytes)
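
One way to read the gallery figures above is to work out how much data each request moved. The short Python calculation below uses only the numbers from the gallery.toddman.org report and assumes 1 megabyte = 1024 kilobytes. The interpretation in the note after it is a guess, not something the report itself proves.

```python
# Figures copied from the February report for gallery.toddman.org.
requests = 38_698        # successful requests
page_requests = 30_424   # successful requests for pages
megabytes = 106.24       # data transferred

kb_per_request = megabytes * 1024 / requests  # assumes 1 MB = 1024 KB
page_share = page_requests / requests

print(f"average transfer per request: {kb_per_request:.2f} KB")
print(f"share of requests that were pages: {page_share:.1%}")
```

Many small page requests and little data per request would fit a load problem driven by the number of connections rather than by bandwidth, but the raw access log would be needed to confirm it.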



