• DocumentCode
    1407995
  • Title

    Anchor point indexing in Web document retrieval

  • Author

    Kao, Ben ; Lee, Joseph ; Ng, Chi-Yuen ; Cheung, David

  • Author_Institution
    Dept. of Comput. Sci. & Inf. Syst., Hong Kong Univ., China
  • Volume
    30
  • Issue
    3
  • fYear
    2000
  • fDate
    8/1/2000
  • Firstpage
    364
  • Lastpage
    373
  • Abstract
    Traditional World Wide Web search engines, such as AltaVista.com, index and recommend individual Web pages to assist users in locating relevant documents. As the Web grows, however, the number of matching pages increases at a tremendous rate. Users are often overwhelmed by the large answer set recommended by the search engines. Also, if a matching document is a hypertext, the document structure is destroyed and the individual pages that compose the document are returned instead. The logical starting point of the hyperdocument is thus hidden among the large basket of matching pages. Users must spend considerable effort browsing through the pages to locate the starting point, a very time-consuming process. This paper studies the anchor point indexing problem. The set of anchor points of a given user query is a small set of key pages from which the larger set of documents that are relevant to the query can be easily reached. The use of anchor points helps solve the problems of huge answer sets and low precision suffered by most search engines by considering the hyperlink structures of the relevant documents and by providing a summary view of the result set.
  • Keywords
    Internet; hypermedia; indexing; information resources; information retrieval; search engines; AltaVista; Web document retrieval; World Wide Web search engines; anchor point indexing; hyperdocument; hyperlink structures; hypertext; matching document; result set; summary view; user query; Computer networks; Helium; IP networks; Indexing; Information retrieval; Internet; Search engines; Web pages; Web sites; World Wide Web
  • fLanguage
    English
  • Journal_Title
    Systems, Man, and Cybernetics, Part C: Applications and Reviews, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1094-6977
  • Type

    jour

  • DOI
    10.1109/5326.885118
  • Filename
    885118