When optimizing a web site, most webmasters don't consider using the robots.txt file. It is a very important file for your site: it lets spiders and crawlers know what they can and cannot index. This is helpful for keeping them out of folders that you do not want indexed, like the admin or stats folder.

Here is a list of fields that you can include in a robots.txt file and their meaning:

1. User-agent: In this field you specify the robot that the access policy applies to, or "*" for all robots. The examples show both forms.
2. Disallow: In this field you specify the files and folders to exclude from the crawl.
3. #: Marks a comment.

Here are some examples of a robots.txt file:

    User-agent: *
    Disallow:

The above would let all spiders index all content.

Here is another:

    User-agent: *
    Disallow: /cgi-bin/

The above would block all spiders from indexing the cgi-bin directory.

    User-agent: googlebot
    Disallow:

    User-agent: *
    Disallow: /admin.php
    Disallow: /cgi-bin/
    Disallow: /admin/
    Disallow: /stats/

In the above example googlebot can index everything, while all other spiders cannot index admin.php or the cgi-bin, admin, and stats directories. Notice that you can block single files like admin.php.
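
One way to check rules like these before uploading them is a small script. The following is a minimal sketch, assuming Python's standard urllib.robotparser module; the helper name is_allowed, the agent name "otherbot", the sample paths, and the leading comment line are made up for illustration and are not part of the examples above.

```python
from urllib.robotparser import RobotFileParser

# Rules from the last example: googlebot may index everything,
# every other spider is kept out of one file and three folders.
# The first line is an added comment to show the "#" syntax.
RULES = """\
# keep spiders out of private areas
User-agent: googlebot
Disallow:

User-agent: *
Disallow: /admin.php
Disallow: /cgi-bin/
Disallow: /admin/
Disallow: /stats/
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())


def is_allowed(agent, path):
    """Return True if the given robot may fetch the given path (hypothetical helper)."""
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    # "otherbot" stands in for any spider that is not googlebot.
    for agent in ("googlebot", "otherbot"):
        for path in ("/index.html", "/admin.php", "/stats/report.html"):
            print(agent, path, is_allowed(agent, path))
```

Running the script prints one line per robot and path, so you can see at a glance whether a rule blocks more or less than you intended. Keep in mind that this only shows how one parser reads the file; individual spiders may interpret edge cases differently.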