Sometimes we rank well on one engine for a particular keyphrase and assume that all search engines will like our pages, and hence that we will rank well for that keyphrase on a number of engines. Unfortunately, this is rarely the case. The major search engines all differ somewhat, so what gets you ranked high on one engine may actually lower your ranking on another.

It is for this reason that some people like to optimize pages for each particular search engine. Usually these pages are only slightly different, but that slight difference can make all the difference when it comes to ranking high.

However, search engine spiders crawl through sites indexing every page they can find, so a spider might come across your engine-specific optimized pages. Because those pages are very similar, the spider may think you are spamming it. It will then do one of two things: ban your site altogether, or punish you severely with lower rankings.

The solution in this case is to stop specific search engine spiders from indexing some of your web pages. This is done using a robots.txt file, which resides on your webspace.

If you design different pages for different search engines, a robots.txt file is a vital part of your battle against getting banned or penalized by the search engines.

The robots.txt file is just a simple text file, as the file extension suggests. Create it with a simple text editor such as Notepad, or WordPad saving as plain text. Complicated word processors such as Microsoft Word add their own formatting, which will corrupt the file.

Each entry in the file follows this pattern:

User-Agent: (Spider Name)
Disallow: (File Name)

The User-Agent line gives the name of the search engine's spider. Each Disallow line gives a file that you don't want that spider to index, written as a path that begins with "/" and is counted from the root of your site.

You have to start a new block of entries for each engine. If you want to disallow several files for the same engine, list the Disallow lines one under another with no blank lines in between. For example:

User-Agent: Slurp    # Inktomi's spider
Disallow: /xyz-gg.html
Disallow: /xyz-al.html
Disallow: /xxyyzz-gg.html
Disallow: /xxyyzz-al.html

The above code stops Inktomi from spidering two pages optimized for Google (gg) and two pages optimized for AltaVista (al). Inktomi might be allowed to spider these pages as well as the pages made specifically for Inktomi. In that case you could be banned or penalized. Hence, it's always a good idea to use a robots.txt file.

The robots.txt file resides on your webspace, but where on your webspace? In the root directory! If you upload the file to a subdirectory, it will not work.

To stop all engines from indexing a file, simply put the "*" character where the engine's name would usually be. However, beware that the original robots.txt standard does not treat "*" as a wildcard on the Disallow line, so don't rely on it there.

Here are the spider names used by a few of the big engines:

Excite - ArchitextSpider
AltaVista - Scooter
Lycos - Lycos_Spider_(T-Rex)
Google - Googlebot
Alltheweb - FAST-WebCrawler

Be sure to check the file over before uploading it. A simple mistake could mean your pages are indexed by engines you don't want indexing them. Even worse, none of your pages might be indexed at all.
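Writing one block per engine by hand gets tedious once you have many engine-specific pages. The Python sketch below generates the blocks automatically. It assumes pages are named with an engine suffix, as in xyz-gg.html, and that the mapping from suffix to spider name is the hypothetical SPIDERS dictionary shown. The "gg" and "al" suffixes follow the naming above, but "ink" for Inktomi pages is invented for this example. Each spider is blocked from every page tuned for a different engine, while pages with no engine suffix stay open to everyone.

```python
from typing import Dict, List

# Hypothetical mapping from page-name suffix to spider name. "gg" (Google)
# and "al" (AltaVista) follow the article's naming; "ink" for pages tuned
# for Inktomi is an assumption made for this sketch.
SPIDERS: Dict[str, str] = {"gg": "Googlebot", "al": "Scooter", "ink": "Slurp"}


def engine_suffix(filename: str) -> str:
    """Return the engine tag from a name like 'xyz-gg.html', or '' if none."""
    stem = filename.rsplit(".", 1)[0]
    if "-" not in stem:
        return ""
    return stem.rsplit("-", 1)[1]


def build_robots_txt(pages: List[str], spiders: Dict[str, str]) -> str:
    """Build one record per spider that blocks pages tuned for other engines."""
    records = []
    for suffix, agent in spiders.items():
        lines = [f"User-Agent: {agent}"]
        blocked = [
            page for page in pages
            if engine_suffix(page) in spiders and engine_suffix(page) != suffix
        ]
        for page in blocked:
            # Paths are matched from the site root, so each starts with "/".
            lines.append(f"Disallow: /{page.lstrip('/')}")
        if not blocked:
            # An empty Disallow value means this spider may index everything.
            lines.append("Disallow:")
        # Disallow lines follow their User-Agent line with no blank lines.
        records.append("\n".join(lines))
    # A single blank line separates one spider's record from the next.
    return "\n\n".join(records) + "\n"


if __name__ == "__main__":
    site_pages = [
        "index.html",
        "xyz-gg.html", "xyz-al.html",
        "xxyyzz-gg.html", "xxyyzz-al.html",
    ]
    print(build_robots_txt(site_pages, SPIDERS), end="")
```

With these pages, the Slurp block comes out as the four Disallow lines from the Inktomi example. Googlebot and Scooter are each blocked only from the other engine's pages.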
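Spiders look for robots.txt only in the root directory of a site. Whatever page they start from, they request the same address: the site's scheme and host followed by /robots.txt. This short Python sketch uses the standard urllib.parse module to derive that address from any page URL. The domain www.example.com is only a placeholder.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the address spiders request robots.txt from for this page's site.

    Spiders only look in the root directory, so the page's own path,
    query string and fragment are all discarded.
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


if __name__ == "__main__":
    # www.example.com is a placeholder domain.
    print(robots_url("http://www.example.com/articles/xyz-gg.html"))
```

A robots.txt uploaded to a subdirectory, for example /articles/robots.txt, never matches this address. That is why a file placed there has no effect.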
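One way to check the file over before uploading it is to run it through a robots.txt parser. Then ask whether each spider may fetch each page. The sketch below uses the RobotFileParser class from Python's standard urllib.robotparser module. The check_robots function and its (spider, path, expected) tuples are hypothetical helpers for this example. Python's parser is not any particular search engine's parser, so treat a clean result as a sanity check rather than a guarantee.

```python
import urllib.robotparser
from typing import List, Tuple


def check_robots(robots_txt: str, checks: List[Tuple[str, str, bool]]) -> List[str]:
    """Return a description of every check whose result is unexpected.

    Each check is (spider name, page path, True if the spider should be
    allowed to fetch the page).
    """
    parser = urllib.robotparser.RobotFileParser()
    # parse() takes the file's lines directly, so nothing is downloaded.
    parser.parse(robots_txt.splitlines())
    problems = []
    for agent, path, expected in checks:
        allowed = parser.can_fetch(agent, path)
        if allowed != expected:
            want = "allowed" if expected else "blocked"
            got = "allowed" if allowed else "blocked"
            problems.append(f"{agent} {path}: expected {want}, got {got}")
    return problems


if __name__ == "__main__":
    robots = (
        "User-Agent: Slurp\n"
        "Disallow: /xyz-gg.html\n"
        "\n"  # mistake: this blank line ends Slurp's block early
        "Disallow: /xyz-al.html\n"
    )
    checks = [("Slurp", "/xyz-gg.html", False), ("Slurp", "/xyz-al.html", False)]
    for problem in check_robots(robots, checks):
        print(problem)
```

The blank line in the sample ends Slurp's block, so this parser ignores the second Disallow line. The check should therefore flag /xyz-al.html as still allowed. A Disallow path written without its leading "/" is caught the same way.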