• DocumentCode
    2139074
  • Title

    An Efficient Method of Genetic Algorithm for Text Clustering Based on Singular Value Decomposition

  • Author

    Song, Wei ; Park, Soon Cheol

  • Author_Institution
    Chonbuk Nat. Univ., Jeonju
  • fYear
    2007
  • fDate
    16-19 Oct. 2007
  • Firstpage
    53
  • Lastpage
    58
  • Abstract
    In this paper, we propose a genetic algorithm (GA) for text clustering based on singular value decomposition (SVD). The main difficulty in applying a GA to text clustering is the long string representation required in a high-dimensional space: the most straightforward and popular approach represents texts with the vector space model (VSM), in which each unique term in the vocabulary is one dimension. SVD is a well-established technique from numerical linear algebra that is used in latent semantic indexing (LSI). With an SVD-based document representation, LSI overcomes this problem by using statistically derived conceptual indices instead of individual words, which yields a dimension-reduced space. GAs are search techniques that can automatically search for the optimum of the objective (fitness) function of an optimization problem. A GA can therefore be used in conjunction with the reduced latent semantic structure to improve clustering efficiency and accuracy. Our algorithm is evaluated on the Reuters document collection. The results show that the SVD-based GA performs significantly better than a conventional GA in the vector space model.
  • Keywords
    genetic algorithms; pattern clustering; singular value decomposition; text analysis; Reuter document collection; latent semantic indexing; numerical linear algebra; optimization problem; statistical analysis; string representation; text clustering; vector space model; Clustering algorithms; Computational efficiency; Genetic engineering; Indexing; Information technology; Large scale integration; Vectors; Vocabulary
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computer and Information Technology, 2007. CIT 2007. 7th IEEE International Conference on
  • Conference_Location
    Aizu-Wakamatsu, Fukushima
  • Print_ISBN
    978-0-7695-2983-7
  • Type

    conf

  • DOI
    10.1109/CIT.2007.197
  • Filename
    4385056
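
The pipeline summarized in the abstract (reduce the term space with SVD, then let a GA search for cluster centroids in the reduced space) can be sketched roughly as follows. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the function names (`lsi_reduce`, `ga_cluster`), the centroid-based chromosome encoding, the distance-based fitness, the crossover and mutation operators, and the toy term-document matrix are all hypothetical choices made for this example. The point it illustrates is that each chromosome holds `n_clusters * k` numbers instead of `n_clusters * vocabulary_size`.

```python
import numpy as np

def lsi_reduce(term_doc, k):
    """Project documents (columns of term_doc) onto the k strongest SVD dimensions."""
    u, s, vt = np.linalg.svd(term_doc, full_matrices=False)
    # Rows of the result are documents, columns are latent dimensions.
    return (np.diag(s[:k]) @ vt[:k]).T

def nearest_distances(docs, centroids):
    """Distance from every document to every centroid, shape (n_docs, n_clusters)."""
    return np.linalg.norm(docs[:, None, :] - centroids[None, :, :], axis=2)

def fitness(docs, centroids):
    """Assumed fitness: negative total distance to the nearest centroid (higher is better)."""
    return -nearest_distances(docs, centroids).min(axis=1).sum()

def ga_cluster(docs, n_clusters, pop_size=20, generations=40, seed=0):
    """Toy GA: a chromosome is a set of n_clusters centroids in the reduced space."""
    rng = np.random.default_rng(seed)
    n_docs = docs.shape[0]
    # Initialise each chromosome from randomly chosen documents.
    pop = np.stack([docs[rng.choice(n_docs, n_clusters, replace=False)]
                    for _ in range(pop_size)])
    for _ in range(generations):
        scores = np.array([fitness(docs, c) for c in pop])
        elite = pop[np.argsort(scores)[::-1]][: pop_size // 2]  # keep the best half
        children = []
        while len(children) < pop_size - len(elite):
            a, b = elite[rng.integers(len(elite), size=2)]
            mask = rng.random(n_clusters) < 0.5           # uniform crossover per centroid
            child = np.where(mask[:, None], a, b)
            child = child + rng.normal(0.0, 0.05, child.shape)  # small Gaussian mutation
            children.append(child)
        pop = np.concatenate([elite, np.stack(children)])
    best = max(pop, key=lambda c: fitness(docs, c))
    return nearest_distances(docs, best).argmin(axis=1), best

# Hypothetical 6-term x 6-document count matrix with two unrelated topics.
term_doc = np.array([
    [2, 1, 2, 0, 0, 0],
    [1, 2, 1, 0, 0, 0],
    [1, 1, 2, 0, 0, 0],
    [0, 0, 0, 2, 1, 1],
    [0, 0, 0, 1, 2, 1],
    [0, 0, 0, 1, 1, 2],
], dtype=float)

docs = lsi_reduce(term_doc, k=2)          # 6 documents, 2 latent dimensions
labels, centroids = ga_cluster(docs, n_clusters=2)
print(labels)
```

On this toy input the first three documents should end up in one cluster and the last three in the other; which cluster receives label 0 depends on the random initialisation.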