When optimizing a web site, most webmasters don't consider using the robots.txt file. This is a very important file for your site. It lets spiders and crawlers know what they can and cannot index. This is helpful for keeping them out of folders that you do not want indexed, such as the admin or stats folder, or away from content that they cannot index.

Here is a list of the fields that you can include in a robots.txt file and their meanings:

1) User-agent: In this field you can name a specific robot to describe an access policy for, or use "*" for all robots. This is explained further in the examples.

2) Disallow: In this field you specify the files and folders not to include in the crawl.

3) #: The number sign marks a comment.

Here are some examples of a robots.txt file:

User-agent: *
Disallow:

The above would let all spiders index all content.

Here is another:

User-agent: *
Disallow: /cgi-bin/

The above would block all spiders from indexing the cgi-bin directory.

User-agent: googlebot
Disallow:

User-agent: *
Disallow: /admin.php
Disallow: /cgi-bin/
Disallow: /admin/
Disallow: /stats/

In the above example googlebot can index everything, while all other spiders cannot index admin.php or the cgi-bin, admin, and stats directories. Notice that you can block single files like admin.php.
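
If you want to check how a set of rules will be read before uploading it, one option is to run it through a parser. The sketch below is only an illustration, not part of any crawler: it feeds the last example to Python's standard urllib.robotparser module and asks which paths each robot may fetch. The name "SomeOtherBot" and the test paths are made up for the example, and real crawlers may interpret edge cases differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# The last example from the article: googlebot may fetch everything,
# every other robot is kept out of one file and three directories.
RULES = """\
User-agent: googlebot
Disallow:

User-agent: *
Disallow: /admin.php
Disallow: /cgi-bin/
Disallow: /admin/
Disallow: /stats/
"""


def build_parser(text):
    """Parse robots.txt text held in a string (no network access needed)."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(RULES)

# "SomeOtherBot" is a hypothetical robot name that falls under the "*" record.
for agent in ("googlebot", "SomeOtherBot"):
    for path in ("/admin.php", "/cgi-bin/script.pl", "/index.html"):
        allowed = parser.can_fetch(agent, path)
        print(agent, path, "allowed" if allowed else "blocked")
```

The rules are kept in a string here so the check can run offline; to test a live site you could instead call set_url() with the address of its robots.txt and then read().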