Website Error Files (Full Version)






Ed Mazur -> Website Error Files (7/24/2001 21:12:00)

The statistics reports I receive for my website include a section listing files that cause errors on my site, mainly because they are requested but do not exist.

I consistently get errors for a file named "/robots.txt". I have no clue what that is but I am guessing it has something to do with search engine spiders. Does anyone know the correct answer?

A new one showed up yesterday: the file was "/test404response-739366982.html". I really have no clue what that was! Any ideas?

Almost every day, there are attempts to load pages that were once a part of my site but are no longer in it. I have checked my entire site for errors or pages that could lead to these files but there are NONE! How is it possible that people try to load them when they are not linked at all, anywhere? And I am pretty sure that they aren't bookmarked, because about 15 of them are requested every day and my "favicon.ico" file only gets loaded five times a month. Could someone tell me what is going on here?

Lastly, I work on and publish my site using my laptop. When I view the live site on the laptop after publishing, I see what it should be, with my changes. However, when I go to my desktop and look at it, it shows me the old version, WITHOUT the updates. Seems pretty strange to me. Does anyone know why this happens?

Thanks for any help.

------------------
Ed Mazur
racquetballx.com





GeorgiaR -> RE: Website Error Files (7/25/2001 20:51:00)

Your last problem first. When you upload a changed file/page to the Internet, you see whatever you've changed it to be, and that's as it should be. On your other computer, the browser is most likely still showing the old copy of the page it kept from before the changes. Look at the URL in the "file location" box, and if it is the address of the page you expect, click the "Reload" or "Refresh" button, depending on which browser you are in. If the address or file extension is different from what it ought to be, then you must do a "File/Open" and load the desired file again.

As for files that are reported but no longer exist, this could be caused by several things. 1- Somewhere a navigation link still refers to the deleted file. 2- Did you delete the file while in FrontPage, or outside of it? If it was the latter, then FrontPage still has references to the deleted file(s). You must delete files or images within FrontPage itself in order for FrontPage to completely eliminate all links AND pieces of that file or image scattered around the FrontPage system. Remember the _derived folder and the numerous other folders and sub-folders holding parts of files which only FrontPage seems to have a purpose for? When you delete a page outside of FP, those scraps remain in the system, wanting acknowledgement like a lost child.

If a file is still referenced anywhere, it helps to keep a placeholder page at that address to preserve the links, with a short notice on it such as "This Products page is under construction." Then when you're ready to put the real page into circulation, delete the construction page and put the new page into the site folder. Recalculate the links while in the Navigation or Folders view and everything should be linked and okay. Hope that helps.
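
For the first cause (a leftover link to a deleted file), a quick check outside FrontPage is possible. What follows is only a minimal Python sketch, not anything FrontPage itself does: it walks a local copy of the site, collects every href and src value, and reports those that point at a local file that is missing. The function names (find_dead_links, is_local) and the choice to scan only .htm/.html files are assumptions made for the illustration.

```python
import os
from html.parser import HTMLParser
from urllib.parse import unquote


class LinkCollector(HTMLParser):
    """Collects href and src attribute values from one HTML document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


def is_local(link):
    """True for links that should resolve to a file inside the site folder."""
    lowered = link.lower()
    external = ("http:", "https:", "mailto:", "ftp:", "javascript:", "#", "//")
    return not lowered.startswith(external)


def find_dead_links(site_root):
    """Return (page, link) pairs where the link targets a missing local file."""
    dead = []
    for folder, _dirs, files in os.walk(site_root):
        for name in files:
            if not name.lower().endswith((".htm", ".html")):
                continue
            page = os.path.join(folder, name)
            with open(page, encoding="utf-8", errors="replace") as handle:
                collector = LinkCollector()
                collector.feed(handle.read())
            for link in collector.links:
                if not is_local(link):
                    continue
                # Drop any #fragment or ?query and decode %20-style escapes.
                target = unquote(link.split("#")[0].split("?")[0])
                if not target:
                    continue
                if target.startswith("/"):
                    # Root-relative link: resolve against the site folder.
                    resolved = os.path.join(site_root, target.lstrip("/"))
                else:
                    # Relative link: resolve against the page's own folder.
                    resolved = os.path.join(folder, target)
                if not os.path.exists(os.path.normpath(resolved)):
                    dead.append((os.path.relpath(page, site_root), link))
    return dead
```

Calling find_dead_links on the folder that holds the local copy of the site returns a list of (page, link) pairs to fix. A check like this only sees links inside your own pages. Requests for removed pages can also come from places it cannot see, such as links on other sites or a search engine's older listing of the site.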

------------------
Georgia





Reflect -> RE: Website Error Files (7/25/2001 20:46:00)

Hi,

The call for robots.txt is from search engine spiders. They should look for this file every time they come to your site. It tells them which files or directories are "allowed" for spidering. Now some spiders do not pay any attention to the file and what it allows them to hit, grrrr, but I guess you have to take the good with the bad. To be extra sure on pages that I do not want crawled, I add a robots META tag: "noindex" to keep the page out of the index, plus "nofollow" so the spider does not follow the links on it.

You can create this file in Windows Notepad or any text editor and save it as robots.txt in the top-level folder of your site. Here is what mine looks like:

User-agent: *
Disallow: /logs
Disallow: /cgi-bin
Disallow: /css
Disallow: /download
Disallow: /fpdb
Disallow: /images
Disallow: /js
Disallow: /reference
Disallow: /searchback
Disallow: /sflib
Disallow: /ssl
Disallow: /stats

Now this is a VERY basic robots.txt. I plan on going much further into it. You can name specific spiders and then specify what each particular spider is allowed to visit.
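
To see how a well-behaved spider would read rules like these, here is a small Python sketch using the standard library's urllib.robotparser. It is only an illustration: the rules are a shortened copy of the list above, and the per-spider section for "ExampleBot" is a made-up spider name added to show how spider-specific rules work. "SomeSpider" is likewise hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Shortened copy of the rules above, plus a hypothetical per-spider section.
# "ExampleBot" is an invented spider name used only for this illustration.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /

User-agent: *
Disallow: /logs
Disallow: /cgi-bin
Disallow: /images
"""


def build_parser(text):
    """Parse robots.txt text that is already in memory (no network access)."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


def allowed(parser, agent, path):
    """Ask whether the spider named `agent` may fetch `path`."""
    return parser.can_fetch(agent, path)


parser = build_parser(ROBOTS_TXT)
print(allowed(parser, "SomeSpider", "/index.html"))     # True
print(allowed(parser, "SomeSpider", "/logs/july.txt"))  # False
print(allowed(parser, "ExampleBot", "/index.html"))     # False
```

In this parser a Disallow line matches by prefix, so "Disallow: /logs" also covers a path such as /logs.html. Writing "Disallow: /logs/" with a trailing slash limits the rule to the folder. The file is a request, not a lock: a spider that ignores robots.txt can still fetch anything the server will serve.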

On the 404 errors: what I did was make a sitemap. Once the latest revamp of my e-comm site is done, I will call my web host and ask them to make my sitemap the 404/not found page. That way people who hit a missing page have an open book to my site. Hopefully it will also let spiders that get a 404 error crawl my site from all the links on the sitemap page.
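
How a host sets this up depends on its web server, so the following is only a rough Python sketch of the idea, written as a tiny WSGI application with a made-up in-memory site (the PAGES dictionary and its contents are assumptions for the example). The point it shows is that the sitemap is sent as the body of the response while the status stays 404, so visitors get somewhere to go and spiders still learn that the requested page is gone.

```python
# Hypothetical in-memory site; a real host would serve files from disk.
PAGES = {
    "/index.html": b"<html><body>Home</body></html>",
    "/sitemap.html": b"<html><body><a href='/index.html'>Home</a></body></html>",
}


def app(environ, start_response):
    """Minimal WSGI app: known pages get 200, anything else gets the sitemap."""
    path = environ.get("PATH_INFO", "/")
    if path == "/":
        path = "/index.html"
    body = PAGES.get(path)
    if body is not None:
        status = "200 OK"
    else:
        # Unknown path: send the sitemap as the body, but keep the 404 status
        # so browsers and spiders can tell the requested page does not exist.
        status = "404 Not Found"
        body = PAGES["/sitemap.html"]
    headers = [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    start_response(status, headers)
    return [body]


if __name__ == "__main__":
    # Local try-out only, using the standard library's reference server.
    from wsgiref.simple_server import make_server

    with make_server("127.0.0.1", 8000, app) as server:
        server.serve_forever()
```

If the host instead redirects every missing page to the sitemap with a normal "200 OK" status, the statistics report would stop listing those requests as errors, but spiders would have no way to tell that the old pages were removed.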

Sorry for the long-winded post,

Brian

------------------
Work hard, play fair, stay sane




