Do you need a Robots.txt file?

When you have a small site, you are probably under the false assumption that you really don't need a robots.txt file. In fact, you may be saying to yourself, "I don't need a robots.txt file because my site is small, it's simple for the search engines to find, and since I want all pages indexed anyway, why bother?" Those were my thoughts in the beginning too, when I was not yet aware of what a robots.txt file is or what it could do for my site. So I'll try to give you a little insight into what a robots.txt file is, how to use one, why you need one, and some basic instructions on creating one.

Define Robots.txt File

To begin, we need to know what a web robot is, and is not. Web robots are sometimes called spiders or web crawlers. They should not be confused with your normal web browser: a web browser is not a web robot, because a human being manually operates it.

The main use of a robots.txt file is to give robots instructions about what they can crawl and what they should not crawl. This gives you a little more control over the robots, which means you can issue indexing instructions to specific search engines.

Do you really need a Robots.txt file?

Do you really need a robots.txt file even if you're not excluding any robots? It's a good idea. Why? First and foremost, it's an invitation to the search engines. In addition, some of the good bots may step away from your website if you do not have a robots.txt file in the top level of your website.

Sometimes you may want to exclude some pages from the search engines' view. What type of pages?

1. Pages that are still under construction
2. Directories that you would prefer not to have indexed
3. Or you may want to exclude those search engines whose sole purpose is to collect email addresses, or those in which you do not want your website to appear.

What does a Robots.txt file look like?

The robots.txt file is a simple text file, which can be created in Notepad. It needs to be saved to the root directory of your site, that is, the directory where your home page or index page is located.

To create a simple robots.txt file that allows all robots to spider your site, put the following in it:

    User-agent: *
    Disallow:

That's it. This will allow all robots to index all your pages.

If you don't want a specific robot to have access to any of your pages, you can do the following:

    User-agent: specificbadbot
    Disallow: /

Here you would have to name the robot or a specific substring of its name. And you will need the "/" because that means "all directories".

For example, let's say you do not want Googlebot to index a page called "donotenter.html" and your directory is "nogoprivate". In the disallow section you would put:

    User-agent: Googlebot
    Disallow: /nogoprivate/donotenter.html

Now, if it's a complete directory you do not want indexed, you would put:

    User-agent: Googlebot
    Disallow: /nogoprivate/

By putting a forward slash at the beginning and at the end, you tell the search engine not to include anything in that directory.

Getting Your Code Right

If your robots.txt file is a more complex piece of code, then it's always wise to do a quick check on the syntax. There are some nice free online robots.txt checkers that you can use for this. One such free checker is called Robots Text Tester, which is available through Search Engine Promotion. Or go to ClockWatchers, where they can help you create a robots.txt file as well as give you information on how to create a file to eliminate bad bots.

To conclude, a robots.txt file can help you to increase the number of search engines that spider your site, which means increased traffic and better indexing. In fact, this small file also helps you to control what is and is not indexed by search engines, and which search engines can spider your site. So, let me ask you now: is a robots.txt file an important asset to have for your website? I'm sure you have to admit that yes, it is important, even for a small website.
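
As a local complement to an online checker, robots.txt rules can also be sanity-checked with Python's standard-library urllib.robotparser module. The following is only a minimal sketch, not a definitive validator: the host www.example.com, the variable and function names, and the idea of combining the rules into one file are assumptions made for illustration, while the bot names and paths come from the examples in this article. Real crawlers may interpret some rules differently from this parser.

```python
from urllib.robotparser import RobotFileParser


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text held in memory (no network access needed)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


# Illustrative rule set combining the article's examples in one file.
RULES = """\
User-agent: specificbadbot
Disallow: /

User-agent: Googlebot
Disallow: /nogoprivate/

User-agent: *
Disallow:
"""

parser = build_parser(RULES)
base = "http://www.example.com"  # hypothetical host, for illustration only

# The named bad bot is shut out of every page.
print(parser.can_fetch("specificbadbot", base + "/index.html"))
# Googlebot is kept out of the private directory...
print(parser.can_fetch("Googlebot", base + "/nogoprivate/donotenter.html"))
# ...but may still crawl the rest of the site.
print(parser.can_fetch("Googlebot", base + "/index.html"))
# Any other robot falls through to the "*" record, which allows everything.
print(parser.can_fetch("SomeOtherBot", base + "/nogoprivate/donotenter.html"))
```

Run as a script, this should print False, False, True, True, which matches the intent of the rules: one robot blocked entirely, one blocked from a single directory, and everyone else allowed everywhere.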