• DocumentCode
    2226409
  • Title
    Querying and clustering Web pages about persons and organizations
  • Author
    Ye, Shiren; Chua, Tat-Seng; Kei, Jeremy R.
  • Author_Institution
    School of Computing, National University of Singapore, Singapore
  • fYear
    2003
  • fDate
    13-17 Oct. 2003
  • Firstpage
    344
  • Lastpage
    350
  • Abstract
    One of the most frequent Web surfing tasks is searching for the names of persons and organizations. Such names are often common and nonunique, so a single name may map to several entities. We describe a methodology for clustering the Web pages returned by a search engine so that pages belonging to different entities fall into different groups. The algorithm combines named-entity, link-based, and structure-based information as features and uses a decision model to partition the document set into direct and indirect pages. It then uses the distinct direct pages as seeds to cluster the document set. The algorithm has been found to be effective for Web-based applications.
  • Keywords
    Internet; pattern clustering; query formulation; search engines; Web page clustering; Web surfing; decision model; search engine; statistical analysis; Biographies; Books; Clustering algorithms; Home computing; Partitioning algorithms; Resumes; Tellurium; Web pages
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Web Intelligence, 2003. WI 2003. Proceedings. IEEE/WIC International Conference on
  • Print_ISBN
    0-7695-1932-6
  • Type
    conf
  • DOI
    10.1109/WI.2003.1241214
  • Filename
    1241214