Title :
An Efficient Token-based Approach for Web-Snippet Clustering
Author :
Li, Jianchao ; Yao, Tianfang
Author_Institution :
Dept. of Comput. Sci. & Eng., Shanghai Jiao Tong Univ., Shanghai, China
Abstract :
Online clustering of search engine results has become increasingly common in recent years. It addresses the problem that current search engines return too many records, which makes it difficult to find the desired information manually, especially when the query encompasses several subtopics. Clustering groups records into clusters and thereby makes it more convenient to retrieve information of interest. We first propose an innovative approach that uses tokens as the basic units for clustering, which avoids word segmentation for oriental languages and can be applied to any language. Second, we introduce a Directed Probability Graph (DPG) model that identifies meaningful phrases as cluster labels using statistical methods without any external knowledge. The clustering procedure is performed without calculating pairwise similarity between documents. As shown by our experiments, our clustering algorithm is very efficient and suitable for online Web-snippet clustering.
Keywords :
Web services; directed graphs; document handling; pattern clustering; probability; query processing; search engines; token networks; DPG; cluster label; clustering algorithm; directed probability graph; group record; information retrieval; innovative approach; manual search rendering; online Web snippet clustering; pairwise document; search engine; statistical method
Conference_Title :
Second International Conference on Semantics, Knowledge and Grid (SKG '06), 2006
Conference_Location :
Guilin
Print_ISBN :
0-7695-2673-X
DOI :
10.1109/SKG.2006.21