Thousands of servers... billions of web pages... the possibility of individually sifting through the WWW is nil. The search engine gods cull the information you need from the Internet, from tracking down an elusive expert to presenting the most unconventional views on the planet. Name it and click it. Beyond all the hype created about the web heavens they rule, let's attempt to keep the argument balanced. From Google to Voice of the Shuttle (for humanities research), these ubiquitous gods that enrich the net can be unfair... and they do have pitfalls. And considering the rate at which the Internet continues to grow, the problems of these gods will only get worse.

First, what you need to digest is the fact that search engines fall short of Mandrake's magic! They don't create URLs out of thin air; instead they send their spiders crawling across those sites that have offered prayers (and expensive offerings!) to them for consideration. Even when a site like Google claims to have a massive 3 billion web pages in its database, a large portion of the web nation is invisible to these spiders. To think they are simply ignorant of the Invisible Web! This invisible web holds the content that normal search engines can't index, because the information on many web sites sits in databases that are searchable only within that site. Sites that cover this area, like The Internet Movie Database, IncyWincy (the invisible web search engine) and The Complete Planet, are perhaps the only way you can access content from that portion of the Internet invisible to the search gods. Here, you don't perform a direct content search but search for the resources that may give access to the content. (Meaning: be sure to set aside considerable time for digging.)

None of the search engines indexes everything on the Web (I mean none). Tried looking for research literature on popular search engines? From AltaVista to Yahoo, they will list thousands of sources on education, human resource development and so on, but mostly from magazines, newspapers and various organizations' own Web pages, rather than from research journals and dissertations, the main sources of research literature. That's because most of the journals and dissertations are not yet publicly available on the Web. Thought they'd get you all that's hosted on the web? Think again.

The Web is huge and growing exponentially. Simple searches, using a single word or phrase, will often yield thousands of "hits", most of which will be irrelevant. A layman going to the internet for a piece of info has to deal with a more severe issue: too much information! And if you don't learn how to control the information overload returned by a search, roll out the red carpet for some frustration. A very common problem results from sites that have a lot of pages with similar content. For example, if a discussion thread (in a forum) goes on for a hundred posts, there will be a hundred pages, all with similar titles, each containing a wee bit of information. Now instead of just one link, all hundred of those darn pages will crop up in your search results, crowding out other relevant sites. Regardless of all the sophistication technology has brought in, many well thought-out search phrases produce list after list of irrelevant web pages. The typical search still requires sifting through dirt to find the gold. If you are not specific enough, you may get too many irrelevant hits.

As said, these search engines do not actually search the web directly but a database held on their own centralized servers. And unless this database is updated continually to account for modified, moved, deleted or renamed documents, you will land yourself amidst broken links and stale copies of web pages. So if they inadequately handle dynamic web pages whose content changes frequently, chances are the information they reference will quickly go out of date. After they wage their never-ending war with over-zealous promoters (spamdexers, rather), where do they have time to keep their databases current and their search algorithms tuned? No surprise if a perfectly worthwhile site goes unlisted!

Similarly, many of the Web search engines are undergoing rapid development and are not well documented. You will have only an approximate idea of how they work, and unknown shortcomings may cause them to miss desired information. Not to mention that, amongst the first-class information, the web also houses false, misleading, deceptive and dressed-up information produced by charlatans. The Web itself is unstable, and tomorrow they may not find you the site they found you today. Well, if you could predict them, they would not be gods!... would they?! The syntax (word order and punctuation) for various types of complex searches varies somewhat from search engine to search engine, and small errors in the syntax can seriously compromise the search. For instance, try the same phrase search on different search engines and you'll know what I mean. Novices... read this line: using search engines does involve a learning curve. Many beginning Internet users, because of these disadvantages, become discouraged and frustrated.

As a journalist put it, "Not showing favoritism to its business clients is certainly a rare virtue in these times." Search engines have increasingly turned to two significant revenue streams. Paid placement: in addition to the main editorial-driven search results, the search engines display a second (and sometimes a third) listing that's usually commercial in nature. The more you pay, the higher you'll appear in the search results. Paid inclusion: an advertiser or content partner pays the search engine to crawl its site and include the results in the main editorial listing. So?... the site is more likely to be in the hit list, but then again, no guarantees. Of course, those refusing to favor certain devotees are industry leaders like Google, which publishes paid listings but clearly marks them as 'Sponsored Links.'

The possibility of these 'for-profit' search gods (which haven't yet made much profit) taking fees to skew their searches can't be ruled out. But for you as a searcher, the hit list the engine provides should obviously be ranked in order of relevancy and interest. Search command languages can often be complex and confusing, and the ranking algorithm is unique to each god: it may be based on the number of occurrences of the search phrase in a page, on whether it appears in the page title, a heading, the URL itself or the meta tags, or on a weighted average of a number of these relevance scores. Google, for example, uses its patented PageRank(TM) and ranks the importance of search results by examining the links that lead to a specific site. The more links that lead to a site, the higher the site is ranked. Pop on popularity!

Alta Vista, HotBot, Lycos, Infoseek and MSN Search use keyword indexes, which give fast access to millions of documents. But the lack of any organizing structure in such an index, and the poor estimates of how big the WWW really is, will not make searching any easier. A large number of sites are indexed, yet keyword searching can be difficult to get right. In reality, the prevalence of a certain keyword is not always in proportion to the relevance of a page. Take this example. A search on sari (the national costume of India) in a popular search engine returned, among its top sites, the following links:

- a site of the Scottish Crop Research Institute
- a health resort in Indonesia
- The South Asia Regional Initiative for Energy Cooperation and Development

Pretty useful sites for someone very much interested in knowing how to drape a sari or in its tradition?! (Well, no prayer goes unanswered... whether you like the answer or not!) By looking at where a keyword appears, and not simply counting the number of instances of a word on a page, search engines are attempting to make the rankings better, assigning more weight to things like titles, subheadings and so on. Now, unless you have a clear idea of what you're looking for, it may be difficult or impossible to use a keyword search, especially if the vocabulary of the subject is unfamiliar. Similarly, the concept-based search of Excite (instead of matching individual words, it groups the words you enter and attempts to determine their meaning) is a difficult task and yields inconsistent results.

Besides, who reviews or evaluates these sites for quality or authority? They are simply compiled by a computer program. These active search engines rely on computerized retrieval mechanisms called "spiders", "crawlers" or "robots" to visit Web sites on a regular basis and retrieve relevant keywords to index and store in a searchable database. And this huge database often yields unmanageably large results... results whose relevance is determined by their computers. The irrelevant sites (a high percentage of noise, as it's called), questionable ranking mechanisms and poor quality control may be the result of too little human involvement to weed out junk. Thought human intervention would solve all the problems?... read on.

From Yahoo, one of the very first, to about.com, Snap.com, Magellan, NetGuide, Go Network, LookSmart, NBCi and Starting Point, all subject directories index and review documents under categories, making them more manageable. Unlike active search engines, these passive or human-selected search engines don't roam the web directly; they are human controlled and rely on individual submissions. They are perhaps the easiest to use in town, but their indexes cover only a small portion of the actual number of WWW sites, and thus they are certainly not your best bet if you are after specific, narrow or complex topics. Subject designations may be arbitrary, confusing or wrong. A search looks for matches only in the descriptions submitted. A directory never contains the full text of the pages it links to; you can only search what you see: titles, descriptions, subject categories and so on. The labor-intensive human process limits the database's currency, size, rate of growth and timeliness. You may have to branch through the categories repeatedly before arriving at the right page. Directories may be several months behind the times because of the need for human organization. Try looking for some obscure topic... chances are the people who maintain the directory have excluded those pages. Obviously, machines can blindly count keywords, but they can't make common-sense judgements as humans can. But then why do human-edited directories respond with all this junk?!

And now about those meta search engines. A comprehensive search of the entire WWW using The Big Hub, Dogpile, Highway61, Internet Sleuth or Savvysearch, covering as many documents as possible, may sound as good an idea as one-stop shopping. Meta search engines do not create their own databases. They rely on existing active and passive search engine indexes to retrieve search results. And the very fact that they access multiple keyword indexes slows their response. Searching several search engines at once sure does save you time, but at the expense of redundant, unwanted and overwhelming results... and, much worse, important misses. The default search mode differs from search site to search site, so the same query is not always appropriate for different search engine software. The quality and size of the databases vary widely.

Weighted search engines like Ask Jeeves and RagingSearch allow the user to type queries in plain English without advanced searching knowledge, again at the expense of inaccurate and undetailed searching. Then there are review or ranking sources like Argus Clearinghouse, eblast.com and Librarian's Index to the Internet (lii.org). They evaluate the quality of websites that they find themselves or accept as submissions, but they cover a minimal number of sites.

As a webmaster, registering your site with the biggest billboards in Times Square can get you closer to being the searcher's bingo! Those who didn't even know you existed before are in your living room in a New York minute!

Registering your URL is a no-brainer, considering the traffic it can send flocking to your site. It is certainly a quick and inexpensive method, yet it is only one component of the overall marketing strategy, and in itself it offers no guarantees and no instant results, and demands continued effort from the webmaster. Commerce rules the web. As a notable Internet caveman put it, "Web publishers also find dealing with search engines to be a frustrating pursuit. Everybody wants their pages to be easy for the world to find, but getting your site listed can be tough. Search sites may take a long time to list your site, may never list it at all, and may drop it after a few months for no reason. If you resubmit often, as it is very tempting to do, you may even be branded a spamdexer and barred from a search site. And as for trying to get a good ranking, forget it! You have to keep up with all the arcane and ever-changing rules of a dozen different search engines, and adjust the keywords on your pages just so...all the while fighting against the very plausible theory that in fact none of this stuff matters, and the search sites assign rankings at random or by whim."

To make the best use of Web search engines, that is, to find what you need and avoid an avalanche of irrelevant hits, pick search engines that are well suited to your needs. And lest you want to cry "Ye immortal gods! Where in the world are we?", spend a few hours becoming moderately proficient with each. Each works somewhat differently, most importantly with respect to how you broaden or narrow a search.

Finding the appropriate search engine for your particular information need can be frustrating. To use these search engines effectively, it is important to understand what they are, how they work and how they differ. For example, while using a meta search engine, remember that each underlying engine has its own methods of displaying and ranking results. Remember, too, that search strategies affect the results. If the user is unaware of basic search strategies, results may be spotty.

Quoting Charlie Morris (the former editor of The Web Developer's Journal): "Search engines and directories survive, and indeed flourish, because they're all we've got. If you want to use the wealth of information that is the Web, you've got to be able to find what you want, and search engines and directories are the only way to do that. Getting good search results is a matter of chance. Depending on what you're searching for, you may get a meaty list of good resources, or you may get page after page of irrelevant drivel. By laboriously refining your search, and using several different search engines and directories (and especially by using appropriate specialty directories), you can usually find what you need in the end."

Search engines are very useful, no doubt. Right from getting a quick view of a topic to finding an expert's contact info... verily, certain matters lie in their lap. Now, the very reason we bother about these search engines so much is that they're all we've got! Though there sure is a lot of room for improvement, the need of the hour is to not get caught in the middle of the road. By simply understanding what, how and where to seek, you'd spare yourself the fate of chanting that old Jewish proverb: "If God lived on earth, people would break his windows."

Happy searching!

Liji is a postgraduate in Software Science, with a flair for writing on anything under the sun. She puts her dexterity to work writing technical articles in her areas of interest, which include Internet programming, web design and development, ecommerce and other related issues.
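For the curious, the keyword ranking idea described in this article (count where the search word appears, and give a title or heading more weight than plain body text) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the field names, the weights and the two sample pages are made up for the example, and no real search engine publishes its formula in this form. The sketch uses a weighted sum; dividing by the total of the weights would turn it into a weighted average without changing the order of the results.

```python
import re

# Hypothetical weights (an assumption for illustration): a match in the
# title counts for more than a match in the body text.
FIELD_WEIGHTS = {"title": 5.0, "heading": 3.0, "url": 2.0, "meta": 2.0, "body": 1.0}


def count_occurrences(term, text):
    """Count whole-word, case-insensitive occurrences of term in text."""
    pattern = r"\b" + re.escape(term) + r"\b"
    return len(re.findall(pattern, text, flags=re.IGNORECASE))


def relevance(term, page):
    """Weighted sum of per-field keyword counts; page maps field name to text."""
    return sum(
        weight * count_occurrences(term, page.get(field, ""))
        for field, weight in FIELD_WEIGHTS.items()
    )


def rank(term, pages):
    """Return the pages sorted from highest to lowest relevance score."""
    return sorted(pages, key=lambda page: relevance(term, page), reverse=True)


# Two made-up pages, loosely modelled on the sari example in the article.
pages = [
    {
        "title": "How to drape a sari",
        "url": "example.org/drape",
        "body": "Step by step draping guide.",
    },
    {
        "title": "SARI energy initiative",
        "url": "example.org/sari",
        "body": "SARI promotes regional energy cooperation. SARI members meet yearly.",
    },
]

ranked = rank("sari", pages)
for page in ranked:
    print(relevance("sari", page), page["title"])
```

With these made-up weights the energy page scores 9.0 and the draping guide only 5.0, so the page about the acronym outranks the page about the garment. That is the same trap as in the sari example: counting keywords, even cleverly weighted ones, says little about what the searcher actually meant.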