As a practicing SEO consultant, I often hear one question from webmasters: why should they put a robots.txt file in the root directory of their site? Their point is that they don't need to restrict spiders from any part of their website, so why should they add a file that gives spiders full permission? From a developer's point of view they are right, since we cannot tell or force search engine spiders to crawl a website and index it fully. If you really want that, you need to go for sitemap submission in Google Webmaster Tools, but that is a different story. So why should they upload a text file that tells spiders "you are allowed to access my website" when crawling is the spiders' normal duty anyway?

The answer is very simple. When spiders request a page that is not available on your website, the normal result is a 404 error. The catch is that robots.txt is a well-known file name for search engine spiders: they look into this file to check whether any barrier has been set on the site for them. If no robots.txt file exists, that request ends in a 404 error page. Spiders will see the error and may report it as a broken link, and this broken-link report may reduce the importance of your website in the search engine's view. To avoid this situation, SEO consultants advise their clients to upload this simple text file to their server.

So what is this robots file? robots.txt is a text file, uploaded to the root directory of your website, that contains a set of rules for search engine spiders. It is mainly used to tell web spiders not to crawl the URLs it lists. One thing to keep in mind is that a robots.txt file cannot tell a spider to crawl and index a page, because crawling and indexing are a spider's normal job. I think you get the point: no one can force spiders to crawl their website, as that depends purely on the spiders. But one can block spiders from accessing part or even all of a website.

Writing a robots file is an easy task, and anyone can write one for their own website. To do so, we need to know a few basic formats. Here are four formats, with examples, for writing a robots file:

1. Block spiders from crawling your entire website

To disallow all spiders from crawling your website, the format is:

User-agent: *
Disallow: /

2. Give spiders access to your entire website

To do the reverse, change it to either

User-agent: *
Disallow:

or

User-agent: *
Allow: /

Please note that a robots.txt file that lets spiders into your whole website serves no purpose other than avoiding a 404 error when spiders look for robots.txt on your website.

3. Block spiders from accessing certain files on your site

To block spiders from accessing certain files or directories on your website, create a robots.txt file like the one below:

User-agent: *
Disallow: /cgi-bin/
Disallow: /wusage/
Disallow: /textures/

4. Block certain spiders from accessing your website

To block a particular spider from accessing your website, write the robots.txt as follows, replacing SpiderName with the spider's name (without quotes):

User-agent: SpiderName
Disallow: /

For example, to block Google's image crawler:

User-agent: Googlebot-Image
Disallow: /

I hope this article is enough to give readers a basic idea of the robots.txt file.
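If you want to see how a crawler would read each of the four formats before uploading a file, one option is Python's standard-library `urllib.robotparser` module. The sketch below is a minimal illustration under a few assumptions: it parses each file from a string instead of fetching it, `example.com` is just a placeholder domain, and `AnyBot` is a hypothetical spider name. Python's parser is a simple reference implementation, so individual search engines may interpret some edge cases differently.

```python
from urllib.robotparser import RobotFileParser

# The article's example robots.txt files, stored as plain strings.
EXAMPLES = {
    "block_all": "User-agent: *\nDisallow: /\n",          # format 1
    "allow_all": "User-agent: *\nDisallow:\n",            # format 2, first variant
    "allow_all_alt": "User-agent: *\nAllow: /\n",         # format 2, second variant
    "block_dirs": (                                        # format 3
        "User-agent: *\n"
        "Disallow: /cgi-bin/\n"
        "Disallow: /wusage/\n"
        "Disallow: /textures/\n"
    ),
    "block_one_bot": "User-agent: Googlebot-Image\nDisallow: /\n",  # format 4
}

# Placeholder site; only the path part of the URL matters for these rules.
SITE = "https://example.com"


def make_parser(robots_txt: str) -> RobotFileParser:
    """Build a parser from robots.txt text without any network access."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def allowed(robots_txt: str, agent: str, path: str) -> bool:
    """Return True if the spider named `agent` may fetch `path` under these rules."""
    return make_parser(robots_txt).can_fetch(agent, SITE + path)


if __name__ == "__main__":
    # "AnyBot" is a hypothetical spider name used for illustration only.
    checks = [
        ("block_all", "AnyBot", "/index.html"),
        ("allow_all", "AnyBot", "/index.html"),
        ("allow_all_alt", "AnyBot", "/index.html"),
        ("block_dirs", "AnyBot", "/cgi-bin/script.pl"),
        ("block_dirs", "AnyBot", "/index.html"),
        ("block_one_bot", "Googlebot-Image", "/photo.jpg"),
        ("block_one_bot", "Googlebot", "/photo.jpg"),
    ]
    for name, agent, path in checks:
        verdict = allowed(EXAMPLES[name], agent, path)
        print(f"{name:14} {agent:16} {path:20} allowed={verdict}")
```

Running the script prints one line per check. Under the third format, for instance, `/cgi-bin/script.pl` is reported as not allowed while `/index.html` is allowed, and under the fourth format only `Googlebot-Image` is blocked, since a `User-agent` line naming one spider does not apply to other spiders.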