Title :
Analysis of Web Clustering Based on Genetic Algorithm with Latent Semantic Indexing Technology
Author :
Song, Wei ; Park, Soon Cheol
Abstract :
This paper constructs a latent semantic text model that uses a genetic algorithm (GA) for web clustering. The main difficulty in applying GA to text clustering is the feature space, which has thousands or even tens of thousands of dimensions. Latent semantic indexing (LSI) is a successful technique that explores the latent semantic structure of textual data; it also reduces this large space to a smaller one and provides a robust space for clustering. GA belongs to the class of search techniques that efficiently evolve an optimal solution to a problem. Evolved in the reduced latent semantic space, GA improves both clustering accuracy and speed, which makes it suitable for real-time clustering. We use the SSTRESS criterion to analyze the dissimilarity between the original term-by-document corpus matrix and its approximate decompositions at different ranks, and relate this dissimilarity to the performance of our algorithm in the reduced space. Clustering results on Reuters text demonstrate the superiority of GA applied in the LSI model over both K-means and conventional GA in the vector space model (VSM).
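The rank reduction and the dissimilarity check described in the abstract can be illustrated with a minimal sketch. It assumes LSI is computed as a truncated SVD of the term-by-document matrix, and it uses a normalized squared-entry form of SSTRESS; the record does not give the paper's exact formula, so that form, the function names `lsi_rank_k` and `sstress`, and the toy matrix are all assumptions made for illustration.

```python
import numpy as np

def lsi_rank_k(A, k):
    """Rank-k LSI approximation of a term-by-document matrix A (terms x docs)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    # Keep the k largest singular triplets.
    A_k = (U[:, :k] * s[:k]) @ Vt[:k, :]
    # Each row is one document in the k-dimensional latent space.
    docs_k = (s[:k, None] * Vt[:k, :]).T
    return A_k, docs_k

def sstress(A, A_k):
    """Assumed SSTRESS form: sqrt(sum((a^2 - a_k^2)^2) / sum(a^4))."""
    num = np.sum((A ** 2 - A_k ** 2) ** 2)
    den = np.sum(A ** 4)
    return float(np.sqrt(num / den))

# Hypothetical toy corpus: 5 terms (rows) by 4 documents (columns).
A = np.array([
    [1.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 1.0, 0.0],
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 0.0, 1.0],
])

for k in (1, 2, 3, 4):
    A_k, docs_k = lsi_rank_k(A, k)
    print(k, docs_k.shape, sstress(A, A_k))
```

A clustering method such as GA or K-means would then operate on the rows of `docs_k` instead of the full-dimensional document vectors.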
Keywords :
Algorithm design and analysis; Clustering algorithms; Computational efficiency; Genetic algorithms; Indexing; Knowledge management; Large scale integration; Matrix decomposition; Performance analysis; Space technology; Web Clustering; Genetic Algorithm; Latent Semantic Indexing
Conference_Title :
Advanced Language Processing and Web Information Technology, 2007. ALPIT 2007. Sixth International Conference on
Conference_Location :
Luoyang, Henan, China
Print_ISBN :
978-0-7695-2930-1
DOI :
10.1109/ALPIT.2007.77