Search Engine Spiders And Your Robots.txt File

In this article we will discuss search engine spiders and what they do. You will also learn how to create a robots.txt file and why you might need one.

Search engine spiders are automated software programs that crawl the Web looking for pages to feed to search engines. They are also called crawlers, robots and bots. Spiders are among the most useful programs on the Internet and are a key part of how the search engines operate. Spiders allow your site to be found by the millions of people who use search engines. Feed the spiders right and they will tell the search engines about your site.

How Spiders Work

A search engine is an index to the Internet: it points to relevant web sites depending on your search. To build that index, search engines need a tool that can visit websites, navigate them, decide what each website is about and add that data to the search engine.

Spiders are essentially programs that "crawl" sites and report their findings back to their boss. Their purpose in life is to make it easy for your site to get listed in search engines.

Spiders work by finding links to web sites, visiting those web sites, going through the content of each site and then reporting that content back to the database of the search engine they work for. From there, the information is added to the search engine, and the site then shows up in search results.

The robots.txt file

By defining a few rules, you can tell robots not to crawl certain directories or files within your site. Web sites do not absolutely have to have a robots.txt file; they can get along just fine without one. Most spiders look for a robots.txt file as soon as they arrive on your site. Take a look at your site statistics. If your statistics have a "files not found" section, you may see many entries where spiders failed to find a robots.txt file on your site.

The default behavior is to allow everything unless you have a Disallow for that resource. If you wish to exclude some of your pages from search engine indexing, this is the tool approved by the search engines. Creating a robots.txt file that guides spiders is simple.

If you want to allow the spiders to crawl your site but exclude directories of your choice, copy and paste the following into a blank txt file:

User-agent: *
Disallow: /directory1/
Disallow: /directory2/
Disallow: /directory3/

To exclude files of your choice, type in the path to each file you want to exclude:

User-agent: *
Disallow: /directory1/page1.html
Disallow: /directory2/page2.html
Disallow: /directory3/page3.html

To exclude all the search engine spiders from your entire web site, copy and paste the following into the txt file:

User-agent: *
Disallow: /

This will keep a specific search engine spider from indexing your site (replace Name_of_Robot with that spider's name):

User-agent: Name_of_Robot
Disallow: /

To allow a single robot and exclude all other robots, use two records separated by a blank line:

User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /

There can only be one robots.txt on a site, and you may not have blank lines within a record; a blank line marks the end of one record and the start of the next. Once you have it the way you want, save it as a plain text file named robots.txt. Upload the file to the root directory of your site, that is, the directory where your home page or index page is. Put the robots.txt file right alongside the index file.
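
Before uploading a robots.txt file, it can help to check that the rules behave the way you expect. The following is a minimal sketch in Python using the standard library's urllib.robotparser module. The helper name can_crawl, the bot name SomeOtherBot and the www.example.com URLs are assumptions made up for illustration; the rule sets are the ones from the examples in this article. Real spiders may interpret edge cases differently from this parser, so treat the result as a sanity check rather than a guarantee.

```python
from urllib.robotparser import RobotFileParser

# Rule set from the "exclude directories" example (only one directory shown).
DIRECTORY_RULES = """\
User-agent: *
Disallow: /directory1/
"""

# Rule set from the "allow a single robot" example.
# The blank line separates the two records.
SINGLE_ROBOT_RULES = """\
User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /
"""


def can_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given rules let user_agent fetch url.

    Hypothetical helper: it parses the rules from a string instead of
    downloading them, so it works before the file is uploaded.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "http://www.example.com/directory1/page1.html"
    home = "http://www.example.com/index.html"

    # Every spider is kept out of /directory1/ but may fetch the home page.
    print(can_crawl(DIRECTORY_RULES, "SomeOtherBot", page))
    print(can_crawl(DIRECTORY_RULES, "SomeOtherBot", home))

    # Only Googlebot is allowed in; every other spider is excluded.
    print(can_crawl(SINGLE_ROBOT_RULES, "Googlebot", home))
    print(can_crawl(SINGLE_ROBOT_RULES, "SomeOtherBot", home))
```

An empty Disallow line means nothing is disallowed for that robot, which is why the Googlebot record in the second rule set permits the whole site while the record for all other robots blocks it.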