What is a Robots.txt File?

Search engines look at millions of Web pages to come up with search results. They do this with what we call "search engine spiders." This makes sense - spiders crawling around on the Web. But another word for them is "robots" because they are simply unmanned programs gathering data automatically.

In the beginning, these robots spidered every page and every file attached to the Web. This caused problems for both the search engines and the people using them. Pages that really weren't worth looking at, such as header files meant to be included in all pages on a site, were being spidered and showed up in search results. Have you ever searched on Google and gotten a partial page as a result?

The solution was for Google and other search engines to begin looking for a robots.txt file in the root folder of each site (http://www.mydomain.com/robots.txt) to determine what should and shouldn't be searched. This is called the "Robots Exclusion Standard." This simple text file, created with Notepad or another simple text editor, gives you control by telling the robots not to spider certain folders in your site. The result is happier visitors who come to your site from search engines and get only the full pages you want them to see, not partial, test or script pages you don't want them to see.

Let's look at some examples to get started:

This allows all spiders to spider all pages on your site. The * is a wildcard that means "all spiders."

User-agent: *
Disallow:

This is the opposite of the above example. This one tells all spiders to NOT spider your whole site. You might want this if you have a test site, for example, that is not live yet.

User-agent: *
Disallow: /

This example tells all robots to stay out of the cgi-bin and images folders.

User-agent: *
Disallow: /cgi-bin/
Disallow: /images/

This example tells only the WebFerret robot to not spider the page ferret.htm. It's only an example. I have nothing against WebFerret. The user agent code for Google is googlebot.

User-agent: WebFerret
Disallow: /ferret.htm

It is important that the file is a simple text file - do not use Microsoft Word to create it. And be careful of how you type - it must look exactly like the above examples, with caps only for the first letter, just the right spacing, etc. Note that each Disallow path starts with a slash. A poorly done robots.txt file could harm your site more than help it.
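
If you want to check rules like these before uploading them, one option is to run them through a parser. The following is only a minimal sketch, assuming Python and its standard urllib.robotparser module. The helper name allowed and the constant names are made up for illustration, and www.mydomain.com is the placeholder domain used in this article. It shows how one parser reads the four example rule sets; real search engine robots may interpret a file somewhat differently. The WebFerret rule is written here with a leading slash, because rule paths are compared against URL paths that begin with "/".

```python
from urllib.robotparser import RobotFileParser

SITE = "http://www.mydomain.com"  # placeholder domain, not a real site


def allowed(rules: str, agent: str, path: str) -> bool:
    """Return True if `agent` may fetch `path` under the given robots.txt text.

    `allowed` is a hypothetical helper for this sketch, not part of any library.
    """
    parser = RobotFileParser()
    parser.parse(rules.splitlines())  # parse from text, no network access
    return parser.can_fetch(agent, SITE + path)


# The four example rule sets from the article.
ALLOW_ALL = "User-agent: *\nDisallow:"
BLOCK_ALL = "User-agent: *\nDisallow: /"
BLOCK_FOLDERS = "User-agent: *\nDisallow: /cgi-bin/\nDisallow: /images/"
BLOCK_ONE_ROBOT = "User-agent: WebFerret\nDisallow: /ferret.htm"

if __name__ == "__main__":
    print(allowed(ALLOW_ALL, "googlebot", "/index.htm"))            # True
    print(allowed(BLOCK_ALL, "googlebot", "/index.htm"))            # False
    print(allowed(BLOCK_FOLDERS, "googlebot", "/images/logo.gif"))  # False
    print(allowed(BLOCK_FOLDERS, "googlebot", "/index.htm"))        # True
    print(allowed(BLOCK_ONE_ROBOT, "WebFerret", "/ferret.htm"))     # False
    print(allowed(BLOCK_ONE_ROBOT, "googlebot", "/ferret.htm"))     # True
```

The last two lines show that a rule naming one robot leaves every other robot unaffected.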