When optimizing a web site, most webmasters don't consider using the robots.txt file. This is a very important file for your site. It lets the spiders and crawlers know what they can and cannot index. This is helpful for keeping them out of folders that you do not want indexed, like the admin or stats folder.

Here is a list of the fields that you can include in a robots.txt file and their meaning:

1. User-agent: In this field you name the specific robot that the access policy applies to, or use "*" for all robots. The examples show both.

2. Disallow: In this field you specify the files and folders not to include in the crawl.

3. The # starts a comment.

Here are some examples of a robots.txt file:

User-agent: *
Disallow:

The above would let all spiders index all content.

Here is another:

User-agent: *
Disallow: /cgi-bin/

The above would block all spiders from indexing the cgi-bin directory.

User-agent: googlebot
Disallow:

User-agent: *
Disallow: /admin.php
Disallow: /cgi-bin/
Disallow: /admin/
Disallow: /stats/

In the above example googlebot can index everything, while all other spiders cannot index admin.php or the cgi-bin, admin, and stats directories. Notice that you can block single files like admin.php.
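One way to check rules like these before uploading them is to run them through a parser. The sketch below is not part of the original examples. It assumes Python's standard urllib.robotparser module, and the build_parser helper and the robot name SomeOtherBot are made up for illustration. It loads the googlebot example, with # comment lines added to show the comment syntax, and asks which paths each robot may fetch.

```python
from urllib.robotparser import RobotFileParser

# The last example from the article, with comment lines added for illustration.
ROBOTS_TXT = """\
# googlebot may index everything
User-agent: googlebot
Disallow:

# all other robots stay out of these paths
User-agent: *
Disallow: /admin.php
Disallow: /cgi-bin/
Disallow: /admin/
Disallow: /stats/
"""


def build_parser(text):
    """Hypothetical helper: parse robots.txt text held in a string."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# "SomeOtherBot" is a made-up robot name that falls under the "*" record.
for agent, path in [
    ("googlebot", "/admin/"),
    ("SomeOtherBot", "/admin.php"),
    ("SomeOtherBot", "/stats/report.html"),
    ("SomeOtherBot", "/index.html"),
]:
    verdict = "allowed" if parser.can_fetch(agent, path) else "blocked"
    print(f"{agent:13} {path:20} {verdict}")
```

Running this against your own file contents is a quick way to catch a mistyped path. Keep in mind that individual crawlers may interpret edge cases differently from this parser.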