How robots work
 

Welcome to our robotics Archive. Have fun browsing!

 

Article #24: The Proper Way To Use The robots.txt File Update


 
In my last article about the robots.txt file I spelled the file name wrong. It should have been robots.txt. The article should read like this:

When optimizing your web site, most webmasters don't consider using the robots.txt file. This is a very important file for your site. It lets the spiders and crawlers know what they can and cannot index. This is helpful in keeping them out of folders that you do not want indexed, like the admin or stats folder.

Here is a list of the fields that you can include in a robots.txt file and their meaning:

1) User-agent: In this field you specify the robot that the access policy applies to, or a "*" for all robots. The examples explain this further.
2) Disallow: In this field you specify the files and folders not to include in the crawl.
3) The # is used to mark comments.

Here are some examples of a robots.txt file:

User-agent: *
Disallow:

The above would let all spiders index all content.

Here is another example:

User-agent: *
Disallow: /cgi-bin/

The above would block all spiders from indexing the cgi-bin directory.

User-agent: googlebot
Disallow:

User-agent: *
Disallow:
Disallow: /cgi-bin/
Disallow: /admin/
Disallow: /stats/

In the above example googlebot can index everything, while all other spiders cannot index one single file or the cgi-bin, admin, and stats directories. Notice that you can block single files as well as directories.
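
If you want to check how a set of rules will be read before you upload it, here is a minimal sketch using Python's standard urllib.robotparser module. It uses the rules from the last example, leaving out the Disallow line whose file name is missing. The helper allowed(), the robot name "otherbot", and the test paths are made up for illustration; other crawlers may read edge cases differently than this parser does.

```python
from urllib.robotparser import RobotFileParser

# Rules from the last example. The Disallow line with the missing
# file name is left out, because an empty Disallow means "allow all".
RULES = """\
# googlebot may crawl everything
User-agent: googlebot
Disallow:

# every other robot is kept out of these folders
User-agent: *
Disallow: /cgi-bin/
Disallow: /admin/
Disallow: /stats/
"""


def allowed(agent, path, rules=RULES):
    """Return True if the robot named `agent` may fetch `path`."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    # "otherbot" is a hypothetical robot name that only matches "*".
    for agent in ("googlebot", "otherbot"):
        for path in ("/index.html", "/admin/login.html"):
            print(agent, path, allowed(agent, path))
```

With these rules, googlebot is allowed on both paths, while otherbot is allowed on /index.html but not on /admin/login.html.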





