Title :
The New Clustering Strategy and Algorithm Based on Latent Semantic Indexing
Author :
Yan, Bing ; Du, Yajun ; Li, ZhanShen
Author_Institution :
Sch. of Math. & Comput. Sci., Xihua Univ., Chengdu
Abstract :
Currently, search engine technology is a hot topic in IR research. Clustering search results by theme helps users find the information they need. In this paper, we propose a new clustering algorithm, named MyCluster, based on phrases and latent semantic indexing. The result of MyCluster consists of class labels and class contents. The class contents are the entry point through which users reach the information, and each class label corresponds to some class contents. The readability of the cluster labels affects how efficiently users find useful information. We adopt singular value decomposition to induce class labels and find class contents, so that the clusters have the property that objects belonging to the same cluster are "similar", while objects from different clusters are "dissimilar". Finally, we merge and sort the clusters. Experiments show that MyCluster has some advantages in the readability of class labels and the relevance of class contents.
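The abstract does not spell out MyCluster's individual steps, so the following is only a minimal Python sketch of the general SVD-based idea it describes (close in spirit to the Lingo algorithm), not the authors' implementation. Assumptions: candidate label phrases are supplied by the caller (phrase extraction is not shown); documents are weighted by raw term counts with unit-length columns rather than tf-idf; a document joins a cluster when its cosine similarity to the label phrase reaches a hypothetical threshold; and "merge and sort" is read as merging concepts that pick the same label and ordering clusters by size. The names lsi_cluster, term_doc_matrix, phrase_vector, k and threshold are hypothetical. Requires NumPy.

```python
# Hypothetical sketch of SVD-based label induction; not MyCluster's published code.
import numpy as np


def term_doc_matrix(docs, index):
    """Term-by-document matrix of raw counts (assumption: no tf-idf), columns scaled to unit length."""
    A = np.zeros((len(index), len(docs)))
    for j, doc in enumerate(docs):
        for word in doc.lower().split():
            if word in index:
                A[index[word], j] += 1.0
    norms = np.linalg.norm(A, axis=0)
    norms[norms == 0] = 1.0  # empty documents stay zero columns
    return A / norms


def phrase_vector(phrase, index):
    """Unit-length term vector for a candidate label phrase; unknown words are ignored."""
    v = np.zeros(len(index))
    for word in phrase.lower().split():
        if word in index:
            v[index[word]] = 1.0
    n = np.linalg.norm(v)
    return v / n if n > 0 else v


def lsi_cluster(docs, candidate_phrases, k=2, threshold=0.3):
    """Return [(label, [doc indices]), ...] sorted by cluster size, largest first."""
    vocab = sorted({w for d in docs for w in d.lower().split()})
    index = {t: i for i, t in enumerate(vocab)}
    A = term_doc_matrix(docs, index)

    # 1. SVD: the first k left singular vectors serve as abstract concepts in term space.
    U, _, _ = np.linalg.svd(A, full_matrices=False)
    concepts = U[:, :k]

    # 2. Label induction: each concept takes the candidate phrase with the highest
    #    |cosine| to it (abs() because singular vectors are only defined up to sign).
    P = np.column_stack([phrase_vector(p, index) for p in candidate_phrases])
    scores = np.abs(P.T @ concepts)  # shape (num_phrases, k); all vectors are unit length
    labels = [candidate_phrases[i] for i in np.argmax(scores, axis=0)]

    # 3. Class contents: a document joins a label if its cosine to the label phrase
    #    reaches the threshold. Concepts that chose the same label are merged here.
    clusters = {}
    for label in labels:
        if label not in clusters:
            sims = A.T @ phrase_vector(label, index)
            clusters[label] = [j for j, s in enumerate(sims) if s >= threshold]

    # 4. Drop empty clusters and sort the rest by size (stable for ties).
    return sorted(((lab, ds) for lab, ds in clusters.items() if ds), key=lambda c: -len(c[1]))


if __name__ == "__main__":
    docs = ["car engine repair", "car engine speed", "car engine dealer",
            "cat rainforest animal", "cat rainforest habitat"]
    phrases = ["car engine", "engine repair", "car dealer", "cat rainforest", "rainforest animal"]
    for label, members in lsi_cluster(docs, phrases, k=2):
        print(label, members)
    # car engine [0, 1, 2]
    # cat rainforest [3, 4]
```

In this toy corpus the car documents and the cat documents share no terms, and the car block has the larger leading singular value, so the first concept is labelled "car engine" and the second "cat rainforest". Using the absolute cosine during label induction is a deliberate choice, since SVD determines singular vectors only up to sign.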
Keywords :
indexing; information retrieval; search engines; MyCluster; class contents; class label; clustering algorithm; latent semantic indexing; search engine; singular value decomposition; Clustering algorithms; Computer science; Indexing; Information retrieval; Mathematics; Optical computing; Optical scattering; Search engines; Web pages; Web search; Clustering Strategy; Latent Semantic Indexing
Conference_Title :
Natural Computation, 2008. ICNC '08. Fourth International Conference on
Conference_Location :
Jinan
Print_ISBN :
978-0-7695-3304-9
DOI :
10.1109/ICNC.2008.699