Hi,
The requests for robots.txt come from search engine spiders. They should look for this file every time they come to your site. It tells them which files or directories they are allowed to crawl. Some spiders pay no attention to the file and what it allows them to hit, grrrr, so I guess you have to take the good with the bad. To be extra sure, on pages that I do not want indexed I also put in a robots META tag with "noindex, nofollow": "noindex" tells the spider to keep the page out of its index, and "nofollow" tells it not to follow the links on that page. Keep in mind that a spider only sees the META tag if it actually fetches the page.
You can create this file in Windows Notepad or any text editor, and it goes in the root of your site. Here is what mine looks like:
User-agent: *
Disallow: /logs
Disallow: /cgi-bin
Disallow: /css
Disallow: /download
Disallow: /fpdb
Disallow: /images
Disallow: /js
Disallow: /reference
Disallow: /searchback
Disallow: /sflib
Disallow: /ssl
Disallow: /stats
Now this is a VERY basic robots.txt. I plan on going much further into it. You can name specific spiders with their own User-agent lines and then spell out what each particular spider is allowed to visit.
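If you want to check how a well-behaved spider would read per-spider rules before you upload the file, here is a rough sketch using Python's standard urllib.robotparser module. The spider name "ExampleBot", the shortened rule list, and the helper names build_parser and is_allowed are made up for illustration; they are not from my real file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: one made-up spider is shut out completely,
# everyone else only loses a few directories.
RULES = """\
User-agent: ExampleBot
Disallow: /

User-agent: *
Disallow: /logs
Disallow: /cgi-bin
Disallow: /stats
"""

def build_parser(rules_text):
    """Parse robots.txt text from a string instead of fetching it."""
    parser = RobotFileParser()
    parser.parse(rules_text.splitlines())
    return parser

def is_allowed(parser, agent, path):
    """True if the named spider may fetch the path under these rules."""
    return parser.can_fetch(agent, path)

parser = build_parser(RULES)
checks = [
    ("SomeSpider", "/logs/access.log"),
    ("SomeSpider", "/products/widget.html"),
    ("ExampleBot", "/products/widget.html"),
]
for agent, path in checks:
    verdict = "allowed" if is_allowed(parser, agent, path) else "blocked"
    print(f"{agent:12} {path:28} {verdict}")
```

This only shows what a spider that obeys the file would do. The ones that ignore robots.txt will still hit whatever they like.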
On the 404 errors: what I did was make a sitemap. Once the latest revamp of my e-comm site is done, I will call my web host and ask them to make my sitemap the 404/not found page. That way people who land on a missing page have an open book to my site. It should also, hopefully, give spiders that get a 404 error a way to crawl my site from all the links on the sitemap page.
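To show the idea, here is a small sketch of a server that answers unknown URLs with a 404 status and a sitemap-style list of links as the page body. It uses Python's standard http.server module. The page list and the names sitemap_html, respond, and serve are invented for the example. On a real host this would normally be a server setting rather than code, so treat it as an illustration only.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical site content; a real site would serve files from disk.
PAGES = {
    "/": "<h1>Home</h1>",
    "/products.html": "<h1>Products</h1>",
    "/contact.html": "<h1>Contact</h1>",
}

def sitemap_html():
    """Build a plain list of links to every known page."""
    links = "".join(f'<li><a href="{p}">{p}</a></li>' for p in sorted(PAGES))
    return f"<h1>Page not found</h1><p>Try one of these:</p><ul>{links}</ul>"

def respond(path):
    """Return (status, body): the page if known, else 404 plus the sitemap."""
    if path in PAGES:
        return 200, PAGES[path]
    return 404, sitemap_html()

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        status, body = respond(self.path)
        data = body.encode("utf-8")
        self.send_response(status)  # keep the 404 status on missing pages
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

def serve(port=8000):
    """Run the sketch locally; call serve() by hand to try it."""
    HTTPServer(("127.0.0.1", port), Handler).serve_forever()
```

The point of keeping the 404 status while sending the sitemap as the body is that visitors and spiders get the links, but the missing URL itself is still reported as not found.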
Sorry for the long-winded post,
Brian
------------------
Work hard, play fair, stay sane