Author :
Ismail, Amirah ; Sembok, Tengku Mohd T. ; Zaman, Halimah Badioze
Abstract :
The Internet has become a huge store of distributed documents. A user of the Internet, at times, seeks information, which he may not yet possess, to solve a problem. He therefore has to express his information needs as a request for information in one form or another using a search engine. The search engine then tries to infer his needs and retrieve relevant documents, presenting the results in a hit list. However, which documents in the hit list are actually relevant can only be determined by the user. The quality of hit lists varies depending on the effectiveness of the indexing process that generates the surrogates from the original documents. Usually, the quality of a hit list is measured by precision, i.e. the ratio of the number of retrieved and relevant documents to the number of retrieved documents. This measure has been used to evaluate ten major search engines using ten queries at cutoff points of 10, 20, 30, 40 and 50. We have also introduced an overlap measure to determine the commonality of documents between the hit lists of various search engines. With these two measures we can evaluate the performance of the search engines. The search engines chosen for the study are Altavista, Hotbot, Excite, Lycos, Webcrawler, Infoseek, Magellan, Northernlight, SavvySearch and Metacrawler.
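The two measures from the abstract can be sketched as follows. Precision at a cutoff k follows the definition given above (relevant retrieved over retrieved). The abstract does not spell out the exact formula for the overlap measure, so the normalization by the smaller hit list is an assumption for illustration only:

```python
def precision_at_k(retrieved, relevant, k):
    """Precision@k: fraction of the top-k retrieved documents that are relevant,
    i.e. |retrieved-and-relevant| / |retrieved| at cutoff k."""
    top_k = retrieved[:k]
    if not top_k:
        return 0.0
    return sum(1 for doc in top_k if doc in relevant) / len(top_k)


def overlap_at_k(hits_a, hits_b, k):
    """One plausible overlap measure (assumed, not from the paper):
    share of documents common to two engines' top-k hit lists,
    normalized by the size of the smaller list."""
    a, b = set(hits_a[:k]), set(hits_b[:k])
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


# Hypothetical hit lists for two engines and a user-judged relevant set.
hits_engine_1 = ["d1", "d2", "d3", "d4"]
hits_engine_2 = ["d2", "d3", "d5", "d6"]
relevant_docs = {"d1", "d3"}

print(precision_at_k(hits_engine_1, relevant_docs, 4))  # 0.5
print(overlap_at_k(hits_engine_1, hits_engine_2, 4))    # 0.5
```

In the study, precision would be computed per query at each cutoff point (10, 20, 30, 40, 50) and averaged over the ten queries, while overlap would be computed pairwise between engines.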
Keywords :
information retrieval system evaluation; search engines; software performance evaluation; Altavista; Excite; Hotbot; Infoseek; Internet; Lycos; Magellan; Metacrawler; Northernlight; SavvySearch; Webcrawler; cutoff points; distributed document; document-overlap; indexing process; overlap measure; precision; precision measure; retrieved documents; search engine; search engine evaluation; Indexing; Information retrieval; Information science; Search engines