How To Keep Robots Out Of Your Web Site

THE ROBOTS.TXT FILE

You know that search engines were created to help people find information quickly on the Internet, and that they acquire much of their information through robots (also known as spiders or crawlers) that look for web pages for them.

These robots explore the web looking for and recording all kinds of information. They usually start with URLs submitted by users, with links they find on web sites, with sitemap files, or with the top level of a site.

Once a robot accesses the home page, it recursively accesses all pages linked from that page. But a robot can also check out all the pages that it can find on a particular server.

After the robot finds a web page, it indexes the title, the keywords, the text, etc. But sometimes you might want to prevent search engines from indexing some of your content, such as news postings and specially marked web pages (for example, affiliate pages). There are conventions for asking robots to stay away, but whether individual robots comply with them is purely voluntary.

ROBOTS EXCLUSION PROTOCOL

If you want robots to keep out of some of your web pages, you can ask them to ignore the pages that you don't want indexed. To do that, place a robots.txt file in the top-level (root) directory of your web site.

For example, if you have a directory called e-books and you want to ask robots to keep out of it, your robots.txt file should read:

User-agent: *
Disallow: /e-books/

When you don't have enough control over your server to set up a robots.txt file, you can try adding a META tag to the head section of any HTML document.

For example, a tag like the following tells robots not to index a particular page and not to follow the links on it:

<meta name="ROBOTS" content="NOINDEX, NOFOLLOW">

Support for the META tag among robots is not as widespread as support for the Robots Exclusion Protocol, but most of the major web indexes currently support it.

NEWS POSTINGS

If you want to keep the search engines out of your news postings, you can add an "X-no-archive" line to the headers of your postings:

X-no-archive: yes

Most common news clients allow you to add an X-no-archive line to the headers of your news postings, but some of them don't.

The problem is that most search engines assume that all information they find is public unless marked otherwise.

So be careful: although the robot and archive exclusion standards may help keep your material out of the major search engines, there are others that respect no such rules.

If you're highly concerned about the privacy of your e-mail and Usenet postings, you should use anonymous remailers and PGP. You can read about them here:

www dot well dot com/user/abacard/remail.html
www dot io dot com/~combs/htmls/crypto.html
world dot std dot com/~franl/pgp/

Even if you are not particularly concerned about privacy, remember that anything you write will be indexed and archived somewhere for eternity, so use the robots.txt file as much as you need to.

Written by Dr. Roberto A.
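
As a quick check of the robots.txt example in this article, here is a minimal sketch that uses Python's standard urllib.robotparser module to see how a well-behaved robot would read the rule. The host www.example.com, the robot name ExampleBot, and the helper is_allowed are assumptions made up for illustration; they do not come from the article, and a robot that ignores the protocol will not be stopped by any of this.

```python
from urllib.robotparser import RobotFileParser

# The rule from the article, written with the leading slash that
# robots.txt paths are expected to have.
ROBOTS_TXT = """\
User-agent: *
Disallow: /e-books/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if a compliant robot may fetch url under robots_txt.

    Hypothetical helper: it parses the rules from a string instead of
    downloading them, so the sketch runs without network access.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # www.example.com and ExampleBot are placeholders, not real targets.
    for url in (
        "http://www.example.com/index.html",
        "http://www.example.com/e-books/guide.html",
    ):
        verdict = "allowed" if is_allowed(ROBOTS_TXT, "ExampleBot", url) else "blocked"
        print(url, "->", verdict)
```

Run against your own rules, a check like this can catch a mistyped path before you upload the file to your server.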