DocumentCode
2226409
Title
Querying and clustering Web pages about persons and organizations
Author
Ye, Shiren ; Chua, Tat-Seng ; Kei, Jeremy R.
Author_Institution
Sch. of Comput., Nat. Univ. of Singapore, Singapore
fYear
2003
fDate
13-17 Oct. 2003
Firstpage
344
Lastpage
350
Abstract
One of the most frequent Web surfing tasks is to search for names of persons and organizations. Such names are often common and nonunique, so a single name may map to several entities. We describe a methodology to cluster the Web pages returned by the search engine so that pages belonging to different entities are placed in different groups. The algorithm uses a combination of named entities, link-based information, and structure-based information as features to partition the document set into direct and indirect pages using a decision model. It then uses the distinct direct pages as seeds to group the document set into clusters. The algorithm has been found to be effective for Web-based applications.
Keywords
Internet; pattern clustering; query formulation; search engines; Web page clustering; Web surfing; decision model; search engine; statistical analysis; Biographies; Books; Clustering algorithms; Home computing; Partitioning algorithms; Resumes; Search engines; Tellurium; Web pages
fLanguage
English
Publisher
IEEE
Conference_Titel
Web Intelligence, 2003. WI 2003. Proceedings. IEEE/WIC International Conference on
Print_ISBN
0-7695-1932-6
Type
conf
DOI
10.1109/WI.2003.1241214
Filename
1241214