Understanding and Using Robots.txt Files!

We all get a bit excited when the search engines visit our web site frequently and index our content. However, there are certain things we don't want the search engines to spider, such as private information we don't want the world to see. Another scenario is that we may have more than one version of a page on our site. We can tell the search engines in our robots.txt file which pages to crawl and which ones to ignore. We definitely don't want both versions crawled, only to have the search engine penalize us for spam because of the duplicate content in two similar versions of one page.

Another reason you may want to tell the spiders not to spider a page is to save some bandwidth by excluding some of the images, style sheets or JavaScript. With the robots.txt file you can be very specific about what you want spidered and what you don't.

What does a robots.txt file really mean? The robots.txt file is a plain text file (not HTML) that you put on your web site to tell the search robots which pages of your site you would like crawled and which ones you don't want the spiders to crawl. The search engines do not require a robots.txt file; however, they will normally follow the instructions you put in it. The process is similar to putting a "Do Not Enter" sign on an unlocked door. The file is not a firewall, so a search engine may still spider your site.

Another way to tell the engines which files and folders not to spider is the robots meta tag. Some engines don't read meta tags, so certain engines would never see the information in a robots meta tag. The preferred way to be specific to all the engines is the robots.txt file, not robots meta tags.

Where you position your robots.txt file is vitally important. It must be in the main (root) directory or the search engines will not find the file. The engines do not search the whole site; they look in the main directory, and if they don't find the file there, they assume no such file exists and index everything they find on your site. Even though the engines do not require this file, if you don't put it in the right place they will likely index the entire site, including the private information you wanted to keep confidential.

The structure of the robots.txt file has little to no flexibility. The function and structure are pretty simple to learn with a bit of study. There are programs available online that will help you in this process. By filling in a few blanks and clicking the mouse you can construct a very effective text file that is very specific to the search engines. Don't attempt to get creative here; it will hurt you in the long run.

When you start manipulating these files and trying to allow different engines or directories, you can get into trouble rather quickly. Make sure you type your commands very carefully. Check and double check your spelling and the positioning of colons and slashes, and make sure the names of the engines are spelled correctly. Even though this file is rather simple in its intent, simple mistakes can be devastating. This is where it would be wise to use some form of validator to check your entries for accuracy. This author does not recommend or endorse any particular product here. Do your research, then decide if this type of validation system is for you. Good luck in your marketing endeavors.

The eBiz Solutions Team is standing by to assist you with any questions you may have. Call for your free 30-minute consultation today.
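
For readers who want to see the rigid structure and the value of validation in practice, here is a minimal sketch using Python's standard urllib.robotparser module. It is only an illustration, not a recommended product: the site address, the folder names, and the helper functions robots_txt_url, load_rules and is_allowed are all made up for this example. It shows where the engines look for the file, how a simple set of rules is read, and how a single misspelled command can quietly leave a private folder open to the spiders.

```python
from urllib.parse import urlsplit, urlunsplit
from urllib.robotparser import RobotFileParser

# Hypothetical site and folders, used for illustration only.
SITE = "https://www.example.com"

# One "User-agent" line followed by one "Disallow" line per folder.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Disallow: /print-version/
Disallow: /images/
"""

# Same idea, but "Disallow" is misspelled, so the rule is not recognized.
TYPO_ROBOTS_TXT = """\
User-agent: *
Dissallow: /private/
"""


def robots_txt_url(page_url):
    """Return the one place engines look: /robots.txt in the main directory."""
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


def load_rules(robots_text):
    """Parse robots.txt text the way a well-behaved spider would."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser


def is_allowed(parser, path, agent="*"):
    """True if the given spider may crawl the path under these rules."""
    return parser.can_fetch(agent, SITE + path)


if __name__ == "__main__":
    print("Engines will look here:", robots_txt_url(SITE + "/blog/post.html"))

    rules = load_rules(SAMPLE_ROBOTS_TXT)
    for path in ["/index.html", "/private/report.html", "/images/logo.png"]:
        print(path, "allowed" if is_allowed(rules, path) else "blocked")

    # The misspelled command is ignored, so the private page is crawlable.
    typo_rules = load_rules(TYPO_ROBOTS_TXT)
    print("With typo:", "/private/report.html",
          "allowed" if is_allowed(typo_rules, "/private/report.html") else "blocked")
```

A check like this only tells you how a spider that honors the file would behave. As noted above, the file is a sign on the door, not a lock.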