DocumentCode
2708358
Title
An Efficient Token-based Approach for Web-Snippet Clustering
Author
Li, Jianchao ; Yao, Tianfang
Author_Institution
Dept. of Comput. Sci. & Eng., Shanghai Jiao Tong Univ., Shanghai, China
fYear
2006
fDate
1-3 Nov. 2006
Firstpage
13
Lastpage
13
Abstract
Online clustering of the results returned by search engines has become prevalent in recent times. It addresses the problem that current search engines return too many records, which makes it difficult to manually find the information actually desired, especially when the query encompasses several subtopics. Clustering is a useful technique for grouping records into clusters, making it more convenient to retrieve information of interest. First, we propose an innovative approach that uses tokens as the basic units for clustering, which avoids word segmentation for oriental languages and can be applied to any language. Second, we introduce a Directed Probability Graph (DPG) model that identifies meaningful phrases as cluster labels using statistical methods, without any external knowledge. The clustering procedure is performed without calculating the pairwise similarity between documents. As shown by our experiments, our clustering algorithm is very efficient and suitable for online Web-snippet clustering.
Keywords
Web services; directed graphs; document handling; pattern clustering; probability; query processing; search engines; token networks; DPG; cluster label; clustering algorithm; directed probability graph; group record; information retrieval; innovative approach; manual search rendering; online Web snippet clustering; pairwise document; search engine; statistical method
fLanguage
English
Publisher
IEEE
Conference_Titel
Semantics, Knowledge and Grid, 2006. SKG '06. Second International Conference on
Conference_Location
Guilin
Print_ISBN
0-7695-2673-X
Type
conf
DOI
10.1109/SKG.2006.21
Filename
5727650