When optimizing a web site, most webmasters don't consider using the robots.txt file. This is a very important file for your site. It lets spiders and crawlers know what they can and cannot index. This is helpful for keeping them out of folders that you do not want indexed, such as the admin or stats folder.

Here is a list of fields that you can include in a robots.txt file and their meaning:

1. User-agent: In this field you can name a specific robot to describe an access policy for, or use "*" for all robots. This is explained further in the examples.
2. Disallow: In this field you specify the files and folders not to include in the crawl.
3. #: Marks a comment.

Here are some examples of a robots.txt file:

    User-agent: *
    Disallow:

The above would let all spiders index all content. Here is another:

    User-agent: *
    Disallow: /cgi-bin/

The above would block all spiders from indexing the cgi-bin directory.

    User-agent: googlebot
    Disallow:

    User-agent: *
    Disallow: /admin.php
    Disallow: /cgi-bin/
    Disallow: /admin/
    Disallow: /stats/

In the above example googlebot can index everything, while all other spiders cannot index admin.php or the cgi-bin, admin, and stats directories. Notice that you can block single files like admin.php.
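One way to check that a set of rules behaves as intended is to run it through a parser before uploading it. The following is a minimal sketch using Python's standard urllib.robotparser module; the rules are the googlebot example from this article, while the "otherbot" agent name, the /index.html path, and the build_parser helper are assumptions made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Rules from the last example: googlebot may fetch everything,
# every other robot is kept out of one file and three folders.
RULES = """\
User-agent: googlebot
Disallow:

User-agent: *
Disallow: /admin.php
Disallow: /cgi-bin/
Disallow: /admin/
Disallow: /stats/
"""


def build_parser(rules: str) -> RobotFileParser:
    """Parse robots.txt text held in a string (hypothetical helper)."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser


parser = build_parser(RULES)

# "otherbot" is a made-up agent name that only matches the "*" group.
for agent, path in [
    ("googlebot", "/admin/"),
    ("otherbot", "/admin.php"),
    ("otherbot", "/stats/report.html"),
    ("otherbot", "/index.html"),
]:
    verdict = "allowed" if parser.can_fetch(agent, path) else "blocked"
    print(f"{agent:10} {path:22} {verdict}")
```

With these rules, googlebot should be reported as allowed everywhere, while the made-up agent should be blocked from admin.php and the stats folder but allowed to fetch /index.html. A real crawler may interpret edge cases differently from this parser, so treat the result as a sanity check only.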