About robots.txt files and search engines.

All websites are searched. Which parts of your site are crawled is controlled by a simple text file called robots.txt. The name is a web standard, and you can make the file in any text editor. A robots.txt file is not a security file. It may stop your specified pages from appearing in search engines, but it will not make them unavailable. There are thousands of robots and spiders searching and indexing the Internet. Most are programmed to respect your robots.txt file; others are designed specifically to visit the very pages you are trying to keep from search engines.

To allow all robots complete access to your server:

    User-agent: *
    Disallow:

To exclude all robots from the entire server (place the file in the same directory as your index.html file):

    User-agent: *
    Disallow: /

Exclude a file from an individual search engine

You have a client file, client1.html, in a directory called 'clients' that you do not want to be indexed by Google. The spider from Google is called 'Googlebot'. This is the exact code for a robots.txt file:

    User-agent: Googlebot
    Disallow: /clients/client1.html

The AltaVista bot is called Scooter. To exclude it from the entire server:

    User-agent: Scooter
    Disallow: /

The Lycos spider is called T-Rex. To exclude it from a single page:

    User-agent: T-Rex
    Disallow: /content1.html

You can also exclude individual HTML pages or documents from all robots:

    User-agent: *
    Disallow: /~client/abc/website1.html
    Disallow: /~client/abc/website2.html
    Disallow: /~client/abc/website3.html

Save your file as a plain .txt file named robots.txt and place it at the root level of your hosting folder.
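
Before uploading a robots.txt file, it can help to check that the rules block what you expect. The following is a minimal sketch using Python's standard urllib.robotparser module. The rules are patterned on the examples above; the host www.example.com, the helper name is_allowed, and the agent name SomeOtherBot are assumptions made only for illustration. Note that it only shows how a well-behaved robot would read the file; it cannot tell you what a robot that ignores robots.txt will do.

```python
from urllib.robotparser import RobotFileParser

# Sample rules patterned on the examples in this article.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /clients/client1.html

User-agent: Scooter
Disallow: /

User-agent: T-Rex
Disallow: /content1.html

User-agent: *
Disallow:
"""

# Hypothetical site address, used only to build full URLs.
BASE_URL = "http://www.example.com"

# Parse the rules from memory instead of fetching them from a server.
parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def is_allowed(agent, path):
    """Return True if a rule-abiding robot named `agent` may fetch `path`."""
    return parser.can_fetch(agent, BASE_URL + path)


if __name__ == "__main__":
    checks = [
        ("Googlebot", "/clients/client1.html"),
        ("Googlebot", "/clients/client2.html"),
        ("Scooter", "/index.html"),
        ("T-Rex", "/content1.html"),
        ("SomeOtherBot", "/clients/client1.html"),
    ]
    for agent, path in checks:
        verdict = "allowed" if is_allowed(agent, path) else "blocked"
        print(f"{agent:12} {path:26} {verdict}")
```

Each robot is matched against its own User-agent block first, and only falls back to the * block when no named block applies. That is why SomeOtherBot may still fetch client1.html here: the Googlebot rule does not apply to it.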