Optimizing your robots.txt file is really important. If you don't have one, search engine bots might crawl confidential parts of your website and display them in their search results. If you have one but it is not written properly, it might block search engines from crawling the website at all.

We can block spiders from crawling restricted parts of our website. Restricted parts are those links which we don't want indexed in search engines, where they would attract unwanted visitors. For example, how many of you would be interested in having your administration page indexed in a search engine?

In the past I have seen website owners who were afraid to use robots.txt because they thought it would harm their SEO. This is not true. If we use robots.txt properly, the crawler skips only the restricted links and still crawls every link we have not restricted, so the SEO of our website is not harmed.

Before we discuss optimization and take full advantage of robots.txt, we should first cover its basic concepts.

robots.txt is a text file that has to be placed in the root folder of your web server (where you place the index page of your website). You can simply create this file in Notepad. It tells search engine bots which parts of the website should not be crawled. With it we can instruct bots not to crawl the website at all, or not to crawl certain areas of it. We can even use the same robots.txt to give different instructions to different bots.

Even if you don't want to keep any area of your website from being crawled or indexed, you should still use robots.txt, as it can act as an open invitation for search engines to crawl your complete website.

There can be several scenarios in which you might want to block search engine bots from crawling certain parts of your website. For example:
1) Protecting the administration panel of your website.
2) Keeping under-construction pages from getting indexed in search engines.
3) Excluding a directory that you don't want indexed, like cgi-bin.
4) Excluding pages that contain email addresses, as spammers can use them if they get indexed in search engines.

The reasons vary, but the solution is the same: robots.txt. Now let's start writing one.

The basic syntax of robots.txt is:
User-Agent: [Spider or Bot name]
Disallow: [Directory or File Name]

You can repeat these lines to block different directories or to give different instructions to different spiders. A few examples will make this clearer.

Example 1) Exclude a file named private.html in the private folder from being crawled by Googlebot.
Solution 1) In this scenario you can write the following code in robots.txt:
User-Agent: Googlebot
Disallow: /private/private.html

Example 2) Exclude a folder named private from being crawled by search engines.
Solution 2) In this scenario you can write the following code in robots.txt:
User-Agent: *
Disallow: /private/

Example 3) Instruct search engine bots to crawl and index everything on the website.
Solution 3) In this scenario you can write the following code in robots.txt:
User-Agent: *
Disallow:

Example 4) Instruct search engine bots that they should not crawl or index any part of the website.
Solution 4) In this scenario you can write the following code in robots.txt:
User-Agent: *
Disallow: /

Example 5) Exclude multiple folders (private1, private2, private3) from being crawled by search engines.
Solution 5) In this scenario you can write the following code in robots.txt:
User-Agent: *
Disallow: /private1/
Disallow: /private2/
Disallow: /private3/

Example 6) Instruct Googlebot to crawl everything on the website and instruct the Alexa bot that it should not crawl any part of the website.
Solution 6) In this scenario you can write the following code in robots.txt:
User-Agent: Googlebot
Disallow:

User-Agent: Alexa
Disallow: /

I am sure that after reading this article you have a fair idea of how robots.txt works and can now use it to aid the SEO of your site.
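The rules from the examples above can be checked before uploading the file. The following is a minimal sketch using Python's standard-library urllib.robotparser module; the helper name allowed, the bot name Bingbot, and the sample paths are assumptions made for illustration and do not come from the article. It shows how one standard parser reads the rules; individual search engines may interpret edge cases differently.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt: str, user_agent: str, path: str) -> bool:
    """Return True if user_agent may fetch path under the given rules.

    Hypothetical helper for illustration: it parses the robots.txt text
    in memory instead of downloading it from a server.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


# Example 1: block a single file for Googlebot only.
EXAMPLE_1 = """\
User-Agent: Googlebot
Disallow: /private/private.html
"""

# Example 5: block several folders for every bot.
EXAMPLE_5 = """\
User-Agent: *
Disallow: /private1/
Disallow: /private2/
Disallow: /private3/
"""

# Example 6: Googlebot may crawl everything, the Alexa bot nothing.
EXAMPLE_6 = """\
User-Agent: Googlebot
Disallow:

User-Agent: Alexa
Disallow: /
"""

if __name__ == "__main__":
    # The blocked file is off limits to Googlebot, but a bot with no
    # matching rule (here the assumed name "Bingbot") is unaffected.
    print(allowed(EXAMPLE_1, "Googlebot", "/private/private.html"))
    print(allowed(EXAMPLE_1, "Bingbot", "/private/private.html"))
    # An empty Disallow line permits everything for that bot.
    print(allowed(EXAMPLE_6, "Googlebot", "/index.html"))
    print(allowed(EXAMPLE_6, "Alexa", "/index.html"))
```

Parsing the text in memory keeps the check independent of any live website, so the same helper can be run against a draft robots.txt before it is placed in the root folder.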