Sometimes we rank well on one engine for a particular keyphrase and assume that all search engines will like our pages, and hence that we will rank well for that keyphrase on a number of engines. Unfortunately this is rarely the case. All the major search engines differ somewhat, so what gets you ranked high on one engine may actually help to lower your ranking on another engine.

It is for this reason that some people like to optimize pages for each particular search engine. Usually these pages are only slightly different, but this slight difference can make all the difference when it comes to ranking high.

However, a search engine spider crawls through a site indexing every page it can find, so it might come across your engine-specific optimized pages. Because they are very similar, the spider may think you are spamming it and will do one of two things: ban your site altogether, or severely punish you in the form of lower rankings.

The solution in this case is to stop specific search engine spiders from indexing some of your web pages. This is done using a robots.txt file, which resides on your webspace.

A robots.txt file is a vital part of any webmaster's battle against getting banned or punished by the search engines if he or she designs different pages for different search engines.

The robots.txt file is just a simple text file, as the file extension suggests. It's created using a simple text editor like Notepad or WordPad. Complicated word processors such as Microsoft Word will only corrupt the file.

You can insert certain code in this text file to make it work. This is how it can be done:

    User-Agent: (Spider Name)
    Disallow: (File Name)

The User-Agent is the name of the search engine's spider, and Disallow is the path of the file that you don't want that spider to index, written from the root of your site with a leading slash.

You have to start a new batch of code for each engine, separated from the previous batch by a blank line, but if you want to list multiple disallowed files you can list them one under another. For example:

    # Slurp is Inktomi's spider
    User-Agent: Slurp
    Disallow: /xyz-gg.html
    Disallow: /xyz-al.html
    Disallow: /xxyyzz-gg.html
    Disallow: /xxyyzz-al.html

The above code stops Inktomi's spider from crawling two pages optimized for Google (gg) and two pages optimized for AltaVista (al). If Inktomi were allowed to spider these pages as well as the pages made specifically for Inktomi, you would run the risk of being banned or penalized. Hence, it's always a good idea to use a robots.txt file.

The robots.txt file resides on your webspace, but where on your webspace? The root directory! If you upload the file to a sub-directory, it will not work.

If you want to disallow all engines from indexing a file, you simply use the "*" character where the engine's name would usually be. However, beware that under the original robots.txt standard the "*" character does not work as a wildcard on the Disallow line, so don't rely on it there.

Here are the spider names of a few of the big engines:

Excite - ArchitextSpider
AltaVista - Scooter
Lycos - Lycos_Spider_(T-Rex)
Google - Googlebot
Alltheweb - FAST-WebCrawler

Be sure to check over the file before uploading it. A simple mistake could mean your pages are indexed by engines you don't want indexing them or, even worse, that none of your pages are indexed at all.
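One way to check over the file before uploading it is to build it and test it with a short script. The following is only a minimal sketch in Python, using the standard library's urllib.robotparser module. The helper names build_robots_txt and can_crawl, and the choice of which pages each spider is kept away from, are assumptions made for illustration and are not part of any engine's tools. Paths are written with a leading slash, as robots.txt parsers expect.

```python
from urllib.robotparser import RobotFileParser


def build_robots_txt(rules):
    """Build robots.txt text from a {spider name: [paths]} mapping.

    Hypothetical helper: one batch of lines per spider, with the
    batches separated by a blank line.
    """
    records = []
    for agent, paths in rules.items():
        lines = [f"User-agent: {agent}"]
        lines += [f"Disallow: {path}" for path in paths]
        records.append("\n".join(lines))
    return "\n\n".join(records) + "\n"


def can_crawl(robots_txt, agent, path):
    """Return True if the named spider may fetch the given path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


# Illustrative rules: keep Inktomi's spider away from the Google (gg)
# and AltaVista (al) pages, and keep Googlebot away from the al page.
rules = {
    "Slurp": ["/xyz-gg.html", "/xyz-al.html"],
    "Googlebot": ["/xyz-al.html"],
}

robots_txt = build_robots_txt(rules)
print(robots_txt)

# Confirm each spider is blocked only where intended.
for agent in rules:
    for path in ("/xyz-gg.html", "/xyz-al.html", "/index.html"):
        print(agent, path, can_crawl(robots_txt, agent, path))
```

If a spider turns out to be allowed on a page you meant to hide, or blocked from a page you wanted indexed, fix the rules before the file goes anywhere near your root directory.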