DocumentCode
1936521
Title
A New Algorithm for Text Clustering Based on Projection Pursuit
Author
Gao, Mao-Ting ; Wang, Zheng-Ou
Author_Institution
Shanghai Maritime Univ., Shanghai
Volume
6
fYear
2007
fDate
19-22 Aug. 2007
Firstpage
3401
Lastpage
3405
Abstract
The Vector Space Model (VSM) is commonly used in text mining to represent text features, but its very high dimensionality makes computation expensive and does not reveal the structure of the text set clearly. A new text clustering algorithm based on projection pursuit is proposed. By minimizing (or maximizing) a projection index, projection pursuit searches for an optimal projection direction and projects text feature vectors from the high-dimensional space into a low-dimensional (1 to 3 dimensions) space. The linear and non-linear structures and features of the original high-dimensional data can be expressed by its projection weights in the optimal projection direction. The optimal projection direction is found by a genetic algorithm, and the distribution of the texts can be visualized. Unlike k-means clustering, projection pursuit based text clustering does not require the number of clusters to be set in advance; and unlike latent semantic analysis, which discovers only linear structure, it also reveals non-linear structure. Experiments demonstrated that the algorithm is effective for clustering texts.
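The search described in the abstract can be illustrated with a minimal sketch. The abstract does not state the projection index or the genetic operators, so everything specific here is an assumption: the index (spread of the projected values times a local density term, a form often used in projection pursuit clustering), the real-coded genetic search with elitism, and all names (`project`, `projection_index`, `search_direction`, `radius_factor`, `mutation_scale`) are hypothetical and are not taken from the paper. Only a 1-dimensional projection is shown.

```python
import math
import random


def normalize(direction):
    """Scale a direction vector to unit length."""
    norm = math.sqrt(sum(a * a for a in direction))
    if norm == 0.0:
        # Degenerate case: fall back to the first coordinate axis.
        return [1.0] + [0.0] * (len(direction) - 1)
    return [a / norm for a in direction]


def project(vectors, direction):
    """Project each feature vector onto the direction (1-D projection values)."""
    return [sum(x * a for x, a in zip(v, direction)) for v in vectors]


def projection_index(vectors, direction, radius_factor=0.1):
    """Assumed index: standard deviation of the projections times local density.

    Large values favour directions where points form tight, well-separated groups.
    Requires at least two vectors.
    """
    z = project(vectors, direction)
    n = len(z)
    mean = sum(z) / n
    spread = math.sqrt(sum((zi - mean) ** 2 for zi in z) / (n - 1))
    radius = radius_factor * spread  # window for the local density term
    density = 0.0
    for i in range(n):
        for j in range(n):
            r = abs(z[i] - z[j])
            if r < radius:
                density += radius - r
    return spread * density


def search_direction(vectors, pop_size=30, generations=60,
                     mutation_scale=0.2, seed=0):
    """Maximize the index with a simple real-coded genetic search (pop_size >= 4)."""
    rng = random.Random(seed)
    dim = len(vectors[0])

    def fitness(a):
        return projection_index(vectors, a)

    population = [normalize([rng.gauss(0.0, 1.0) for _ in range(dim)])
                  for _ in range(pop_size)]
    for _ in range(generations):
        # Selection: keep the better half unchanged (elitism).
        ranked = sorted(population, key=fitness, reverse=True)
        elite = ranked[: pop_size // 2]
        children = []
        while len(elite) + len(children) < pop_size:
            p1, p2 = rng.sample(elite, 2)
            w = rng.random()
            # Arithmetic crossover followed by Gaussian mutation.
            child = [w * a + (1.0 - w) * b + rng.gauss(0.0, mutation_scale)
                     for a, b in zip(p1, p2)]
            children.append(normalize(child))
        population = elite + children
    return max(population, key=fitness)
```

Under these assumptions, `search_direction(vectors)` returns a unit-length direction, and `project(vectors, direction)` gives the 1-dimensional values whose distribution could then be plotted or split into groups by inspecting gaps, which is one way to avoid fixing the number of clusters in advance.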
Keywords
genetic algorithms; pattern clustering; text analysis; vectors; dimension reduction; genetic algorithm; optimal projection direction; projecting index; projection pursuit; text clustering; text feature vectors; Clustering algorithms; Cybernetics; Data mining; Data visualization; Feature extraction; Genetic algorithms; Machine learning; Machine learning algorithms; Pursuit algorithms; Text mining; Dimension reduction; Genetic algorithm; Projection pursuit; Text clustering;
fLanguage
English
Publisher
IEEE
Conference_Titel
Machine Learning and Cybernetics, 2007 International Conference on
Conference_Location
Hong Kong
Print_ISBN
978-1-4244-0973-0
Electronic_ISBN
978-1-4244-0973-0
Type
conf
DOI
10.1109/ICMLC.2007.4370736
Filename
4370736