A search engine operates in the following order: 1) crawling; 2) deep crawling (depth-first search, DFS); 3) fresh crawling (breadth-first search, BFS); 4) indexing; 5) searching.

Web search engines work by storing information about a large number of web pages, which they retrieve from the World Wide Web itself. These pages are retrieved by a web crawler (also known as a spider), an automated web browser that follows every link it sees. Site owners can exclude pages from crawling with a robots.txt file. The contents of each page are then analyzed to determine how the page should be indexed. Data about web pages is stored in an index database for use in later queries.

Some search engines, such as Google, store all or part of the source page (referred to as a cache) as well as information about the web pages. Others, such as AltaVista, store every word of every page they find. The cached page always holds the text that was actually indexed. This makes it very useful when the live page has been updated and the search terms no longer appear in it. This problem might be considered a mild form of linkrot. Google's handling of it increases usability and satisfies the principle of least astonishment, since users normally expect the search terms to appear on the pages returned. This added search relevance makes cached pages very useful, beyond the fact that they may preserve data that is no longer available elsewhere.
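The text pairs deep crawling with depth-first search and fresh crawling with breadth-first search, and notes that robots.txt can exclude pages. The minimal sketch below shows only the two traversal orders. It runs over a hypothetical in-memory link graph (LINKS) and a hypothetical robots.txt (ROBOTS_TXT); no real network fetching is done. The helper names load_robots and crawl are illustrative, not any engine's actual code. The robots.txt check uses Python's standard urllib.robotparser module.

```python
from collections import deque
from urllib.robotparser import RobotFileParser

# Hypothetical link graph standing in for the Web: page URL -> links found on it.
LINKS = {
    "https://example.com/": ["https://example.com/a", "https://example.com/b"],
    "https://example.com/a": ["https://example.com/a1", "https://example.com/private/x"],
    "https://example.com/b": ["https://example.com/b1"],
    "https://example.com/a1": [],
    "https://example.com/b1": [],
    "https://example.com/private/x": [],
}

# Hypothetical robots.txt that excludes everything under /private for all crawlers.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private
"""


def load_robots(text):
    """Parse robots.txt content that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


def crawl(seed, links, robots, order="bfs"):
    """Return pages reachable from seed, in BFS or DFS visiting order."""
    frontier = deque([seed])
    seen = set()
    visited = []
    while frontier:
        # BFS takes the oldest URL (FIFO queue); DFS takes the newest (LIFO stack).
        url = frontier.popleft() if order == "bfs" else frontier.pop()
        if url in seen:
            continue
        seen.add(url)
        if not robots.can_fetch("*", url):
            continue  # excluded by robots.txt, so never fetched or followed
        visited.append(url)
        children = links.get(url, [])
        # For DFS, push in reverse so the first link on the page is explored first.
        for link in (children if order == "bfs" else reversed(children)):
            if link not in seen:
                frontier.append(link)
    return visited


if __name__ == "__main__":
    robots = load_robots(ROBOTS_TXT)
    print(crawl("https://example.com/", LINKS, robots, "bfs"))
    print(crawl("https://example.com/", LINKS, robots, "dfs"))
```

BFS visits every page one link away from the seed before going deeper. DFS follows one branch to its end before backtracking. In both orders, /private/x is skipped because of the Disallow rule.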
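The indexing and searching steps above store data about each page so that later queries can be answered without re-reading the pages. One common structure for this is an inverted index, which maps each word to the pages containing it. The text does not say which structure any particular engine uses, so treat this as an illustrative assumption. The sketch below is minimal: the tokenizer is deliberately simple (lowercase alphanumeric runs), and the helpers tokenize, build_index and search are hypothetical names.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase alphanumeric words (a simplified tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Build an inverted index: word -> set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def search(index, query):
    """Return URLs containing every query word (AND semantics), sorted."""
    terms = tokenize(query)
    if not terms:
        return []
    results = set(index.get(terms[0], set()))
    for term in terms[1:]:
        results &= index.get(term, set())
    return sorted(results)


if __name__ == "__main__":
    pages = {
        "https://example.com/a": "Web crawlers follow every link",
        "https://example.com/b": "Crawlers respect robots.txt exclusions",
    }
    print(search(build_index(pages), "crawlers robots"))
```

Intersecting the posting sets means only pages containing all query words are returned. Real engines add ranking on top of this, which the sketch omits.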
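The cache behavior described above can be illustrated by keeping the exact text that was indexed next to the index itself. A query can then still match a page after the live copy has changed, and the cached copy explains why it matched. The CachingIndex class below is a hypothetical, minimal sketch. It does not model how Google or AltaVista actually store pages.

```python
import re
from collections import defaultdict


class CachingIndex:
    """Hypothetical index that also keeps the text it indexed (the cache)."""

    def __init__(self):
        self.index = defaultdict(set)  # word -> URLs whose indexed text contains it
        self.cache = {}                # URL -> exact text that was indexed

    def add(self, url, text):
        self.cache[url] = text
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            self.index[word].add(url)

    def search(self, term):
        return sorted(self.index.get(term.lower(), set()))

    def cached_copy(self, url):
        return self.cache.get(url)


if __name__ == "__main__":
    engine = CachingIndex()
    url = "https://example.com/news"
    engine.add(url, "Old story about linkrot")
    live_page = "Completely rewritten story"  # the page changed after indexing
    print(engine.search("linkrot"))           # still matches the old indexed text
    print(engine.cached_copy(url))            # the cached copy contains the term
```

Because the match comes from the indexed text, showing the cached copy means the user sees the search term on the returned page even though the live page no longer contains it.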