• DocumentCode
    1936521
  • Title
    A New Algorithm for Text Clustering Based on Projection Pursuit
  • Author
    Gao, Mao-Ting ; Wang, Zheng-Ou
  • Author_Institution
    Shanghai Maritime Univ., Shanghai
  • Volume
    6
  • fYear
    2007
  • fDate
    19-22 Aug. 2007
  • Firstpage
    3401
  • Lastpage
    3405
  • Abstract
    The Vector Space Model (VSM) is commonly used to represent text features in text mining, but its very high dimensionality makes computation expensive and hides the structure of the text set. A new text clustering algorithm based on projection pursuit is proposed. By minimizing (or maximizing) a projection index, projection pursuit searches for an optimal projection direction and projects the text feature vectors from the high-dimensional space into a low-dimensional space (1 to 3 dimensions). The linear and non-linear structures and features of the original high-dimensional data can be expressed by its projection weights in the optimal projection direction. The optimal projection direction is found with a genetic algorithm, and the distribution of the texts can be visualized. Unlike k-means clustering, projection pursuit based text clustering does not require the number of clusters to be set in advance, and unlike latent semantic analysis, which discovers only linear structure, it also reveals non-linear structure. Experiments demonstrate that the algorithm clusters texts effectively.
  • Keywords
    genetic algorithms; pattern clustering; text analysis; vectors; dimension reduction; genetic algorithm; optimal projection direction; projecting index; projection pursuit; text clustering; text feature vectors; Clustering algorithms; Cybernetics; Data mining; Data visualization; Feature extraction; Genetic algorithms; Machine learning; Machine learning algorithms; Pursuit algorithms; Text mining; Dimension reduction; Genetic algorithm; Projection pursuit; Text clustering;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Machine Learning and Cybernetics, 2007 International Conference on
  • Conference_Location
    Hong Kong
  • Print_ISBN
    978-1-4244-0973-0
  • Electronic_ISBN
    978-1-4244-0973-0
  • Type
    conf
  • DOI
    10.1109/ICMLC.2007.4370736
  • Filename
    4370736
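
The pipeline outlined in the abstract (project the feature vectors onto a direction, score the projection with a projection index, search for the best direction with a genetic algorithm) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not give the projection index or the genetic operators, so the index used here (spread of the projected values times their local density, with a window radius of 0.1 times the spread), the blend crossover, the Gaussian mutation, and all function names and parameters are assumptions. The toy term-count vectors are invented for the example.

```python
import math
import random


def project(vectors, direction):
    """Project each vector onto the (normalized) direction, giving 1-D values."""
    norm = math.sqrt(sum(a * a for a in direction)) or 1.0
    unit = [a / norm for a in direction]
    return [sum(x * a for x, a in zip(vec, unit)) for vec in vectors]


def projection_index(z):
    """Assumed index: spread of the projections times their local density."""
    n = len(z)
    mean = sum(z) / n
    spread = math.sqrt(sum((v - mean) ** 2 for v in z) / (n - 1))
    radius = 0.1 * spread  # assumed window radius, not taken from the paper
    density = sum(
        radius - abs(z[i] - z[j])
        for i in range(n)
        for j in range(n)
        if abs(z[i] - z[j]) < radius
    )
    return spread * density


def search_direction(vectors, pop_size=30, generations=60, seed=0):
    """Simple elitist genetic search for a direction that maximizes the index."""
    rng = random.Random(seed)
    dim = len(vectors[0])

    def fitness(direction):
        return projection_index(project(vectors, direction))

    population = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[: pop_size // 2]  # the better half survives unchanged
        children = []
        while len(parents) + len(children) < pop_size:
            p, q = rng.sample(parents, 2)
            w = rng.random()
            # Blend crossover followed by a small Gaussian mutation (assumed operators).
            children.append(
                [w * a + (1 - w) * b + rng.gauss(0, 0.1) for a, b in zip(p, q)]
            )
        population = parents + children
    return max(population, key=fitness)


# Invented toy term-count vectors for two loose groups of documents.
docs = [
    [3, 2, 0, 0], [2, 3, 1, 0], [3, 3, 0, 1],
    [0, 1, 3, 2], [1, 0, 2, 3], [0, 0, 3, 3],
]
best = search_direction(docs)
print(sorted(round(v, 2) for v in project(docs, best)))
```

In this sketch, clusters would be read off the printed 1-D values as groups separated by gaps, which is one way to interpret the claim that the number of clusters need not be fixed in advance. Projecting to 2 or 3 dimensions, as the abstract also allows, would need one direction per output dimension.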