How robots work

Article #26: The Proper Way To Use The robots.txt File

When optimizing their web sites, most webmasters don't think about using a robots.txt file, yet it is a very important file for your site. It tells spiders and crawlers which parts of your site they may and may not crawl. This helps keep them out of folders you do not want crawled, such as an admin or stats folder.

Here are the directives you can include in a robots.txt file and their meanings:

1. User-agent: names the robot that the following access policy applies to, or "*" for all robots. The examples below show how this works.
2. Disallow: lists the files and folders the robot should not crawl. An empty Disallow line blocks nothing.
3. # starts a comment; everything after it on that line is ignored.

Here are some examples of a robots.txt file.

User-agent: *
Disallow:

The above lets all spiders crawl all content.

Here is another:

User-agent: *
Disallow: /cgi-bin/

The above blocks all spiders from crawling the cgi-bin directory.

User-agent: googlebot
Disallow:

User-agent: *
Disallow: /admin.php
Disallow: /cgi-bin/
Disallow: /admin/
Disallow: /stats/

In the above example, googlebot can crawl everything, while all other spiders cannot crawl admin.php or the cgi-bin, admin, and stats directories. Notice that you can block single files like admin.php.
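One way to check that rules like these do what you expect is to feed them to a robots.txt parser. The sketch below assumes Python 3 and its standard-library urllib.robotparser module, which uses the first user-agent group that matches and then the first Disallow prefix that matches. Individual search engines may interpret edge cases differently, so treat this as a rough check rather than a guarantee of how googlebot behaves. Real crawlers fetch the file from the root of the site (for example https://www.example.com/robots.txt); here the text is handed to parse() directly instead. The site www.example.com and the crawler names "AnyBot" and "OtherBot" are hypothetical stand-ins.

```python
from urllib.robotparser import RobotFileParser

# The article's three examples. The blank line in the last one separates
# the googlebot group from the catch-all "*" group.
ALLOW_ALL = """\
User-agent: *
Disallow:
"""

BLOCK_CGI_BIN = """\
User-agent: *
Disallow: /cgi-bin/
"""

GOOGLEBOT_EXCEPTION = """\
User-agent: googlebot
Disallow:

User-agent: *
Disallow: /admin.php
Disallow: /cgi-bin/
Disallow: /admin/
Disallow: /stats/
"""

SITE = "https://www.example.com"  # hypothetical site


def allowed(robots_txt: str, user_agent: str, path: str) -> bool:
    """Return True if `robots_txt` lets `user_agent` crawl SITE + `path`."""
    parser = RobotFileParser()
    # parse() takes the file's lines directly, so nothing is fetched over HTTP.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, SITE + path)


if __name__ == "__main__":
    # "AnyBot" and "OtherBot" are hypothetical crawler names.
    checks = [
        (ALLOW_ALL, "AnyBot", "/admin/"),                        # expected: True
        (BLOCK_CGI_BIN, "AnyBot", "/cgi-bin/search.cgi"),        # expected: False
        (BLOCK_CGI_BIN, "AnyBot", "/index.html"),                # expected: True
        (GOOGLEBOT_EXCEPTION, "Googlebot", "/admin.php"),        # expected: True
        (GOOGLEBOT_EXCEPTION, "OtherBot", "/admin.php"),         # expected: False
        (GOOGLEBOT_EXCEPTION, "OtherBot", "/stats/today.html"),  # expected: False
        (GOOGLEBOT_EXCEPTION, "OtherBot", "/about.html"),        # expected: True
    ]
    for rules, agent, path in checks:
        print(f"{agent:<10} {path:<20} {allowed(rules, agent, path)}")
```

Running it prints one line per check, and the seven results line up with the descriptions of the three examples: googlebot may fetch admin.php, while any other crawler is kept out of admin.php and the stats directory but can still reach ordinary pages.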