ary to popular belief, the spiders sent out by the major search engines do not have to crawl everything on a site. You can keep a search engine spider away from a page by instructing it to stay away, either through a robots meta tag or through a robots.txt file.

Webmasters can instruct spiders not to crawl certain files or directories through the standard robots.txt file in the root directory of the domain. Additionally, a page can be explicitly excluded from a search engine's database by using a robots meta tag. If for some reason you do not want a search engine spider to crawl a page, you have the means to prevent it.

When a search engine visits a site, the robots.txt file located in the root folder is the first file crawled. The robots.txt file is then parsed, and only pages that are not disallowed will be crawled. However, this is not always foolproof. Search engine spiders have a habit of leaving a page and then coming back to look at it a second time later. Because a search engine crawler may keep a cached copy of the robots.txt file, it may on occasion crawl pages that a webmaster does not want crawled.

Pages that most webmasters prefer not to have crawled include login-specific pages such as shopping carts and user-specific content such as search results from internal searches. Other pages that you might not want crawled, depending on the content, include a guest book that you expect to fill with spam or a feedback system that is not very flattering to you. It is also a good idea to instruct the spiders not to crawl a page with a lot of animation or Flash on it, as a spider can mistakenly read such a page as a malfunctioning site.
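
As a rough illustration of the two mechanisms described above, the following Python sketch uses only the standard library to check which paths a robots.txt file disallows and to read the directives from a robots meta tag. The robots.txt contents, the paths (/cart/, /search, /guestbook/), the "ExampleBot" user agent, and the helper names are all hypothetical and chosen only to mirror the kinds of pages mentioned in the text. It shows how a well-behaved crawler might interpret the rules, not how any particular search engine implements them.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an example site; the paths are illustrative only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cart/
Disallow: /search
Disallow: /guestbook/
"""


def build_rules(robots_txt):
    """Parse robots.txt text into a RobotFileParser without any network access."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    return rules


class RobotsMetaFinder(HTMLParser):
    """Collect the directives of every <meta name="robots"> tag in a page."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            # Directives are comma-separated, e.g. "noindex, nofollow".
            self.directives += [
                d.strip().lower() for d in content.split(",") if d.strip()
            ]


def meta_directives(html):
    """Return the robots meta directives found in an HTML string."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    return finder.directives


rules = build_rules(ROBOTS_TXT)

# "ExampleBot" is a made-up user agent; it falls under the "*" rules above.
for path in ("/cart/checkout", "/search?q=shoes", "/products/widget"):
    verdict = "may crawl" if rules.can_fetch("ExampleBot", path) else "must skip"
    print(path, "->", verdict)

PAGE = '<html><head><meta name="robots" content="noindex, nofollow"></head></html>'
print(meta_directives(PAGE))
```

Both checks are advisory: they describe what a cooperative spider should do, and a spider working from a stale cached copy of robots.txt may still fetch a page that has since been disallowed.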