What is website indexability? In a nutshell, indexability is a measure of how efficiently Internet search engine robots (also called spiders, crawlers, worms, or ants) can read the pages of a website and determine their rank in the list of search results returned to a user. One way to improve it is to add a robots.txt file. Place the file at the root of the website to instruct an SEO (Search Engine Optimization) robot which pages to index. This helps ensure that only the most relevant pages are indexed.

Not all SEO robots will read a robots.txt file. Unfortunately, most malware and malicious crawlers will ignore it, because they do not care which pages they are supposed to access. The main purpose of the robots.txt file is to tell friendly SEO crawlers which pages they should ignore while indexing the site. This is helpful in the case of an effectively infinite URL space, for example a site where users upload files into an online document repository. These documents are considered media rather than content, so the webmaster should add a line to the robots.txt file that disallows access to the root URL of the document repository. A large number of results returned from one site under the same root URL will often degrade the ranking of a page.

Test pages are another example of parts of a website that should be passed over, as is any content not meant for visitors. This could include web pages that were added to the site to calibrate the appearance of the pages but are not ready to be shared. Pages that a user conducting a search cannot access directly undermine the credibility of the results and will thus begin to degrade the page ranking.

To increase website indexability, the robots.txt file can also be used to give instructions to specific SEO robots. Search engines use different algorithms to index web pages, so one set of pages may need to be restricted for one search engine while remaining accessible to another. Lines can be added to the robots.txt file that list specific instructions under the name of the search agent (its user-agent name), such as the crawlers of Google, Yahoo!, and Bing. This can be very important in determining the rank of your pages for specific keywords, depending on the rules and methods used by the search agent.

In summary, the purpose of a robots.txt file is to give robots and crawlers important information on how the site should be indexed. Its main purpose is to keep the lesser pages from being indexed so that the more pertinent pages are. Another important function is to communicate instructions to specific robots on how to proceed while indexing the website. This ensures that the most important pages are indexed, which will hopefully increase page rankings.
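The rules discussed in this article can be sketched with a small robots.txt and Python's standard `urllib.robotparser` module, which answers the question a friendly crawler asks before fetching a page. This is only an illustration under stated assumptions: the paths `/documents/`, `/test/`, and `/beta/` are hypothetical stand-ins for a document repository, test pages, and a section hidden from one crawler, and the helper names `build_parser` and `is_allowed` are invented for this example. `Googlebot` and `Bingbot` are the user-agent names commonly used by Google's and Bing's crawlers.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for illustration; the paths are assumptions,
# not taken from any real site.
ROBOTS_TXT = """\
# Rules for every crawler that is not named in its own group
User-agent: *
Disallow: /documents/
Disallow: /test/

# Stricter rules for one named crawler only
User-agent: Bingbot
Disallow: /documents/
Disallow: /test/
Disallow: /beta/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that is already in memory (no network access)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def is_allowed(parser: RobotFileParser, agent: str, path: str) -> bool:
    """Return True if a well-behaved crawler named `agent` may fetch `path`."""
    return parser.can_fetch(agent, path)


parser = build_parser(ROBOTS_TXT)

# Uploaded documents and test pages are off limits to every friendly crawler.
print(is_allowed(parser, "Googlebot", "/documents/report.pdf"))
print(is_allowed(parser, "Googlebot", "/test/layout.html"))

# The /beta/ section is restricted for one crawler but open to another.
print(is_allowed(parser, "Googlebot", "/beta/index.html"))
print(is_allowed(parser, "Bingbot", "/beta/index.html"))
```

Note that this check is purely advisory: it models what a cooperative crawler does, and a malicious crawler can simply skip it.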