• DocumentCode
    476201
  • Title
    Chinese text categorization based on CCIPCA and SMO
  • Author
    Li, Xin-fu; He, Hai-bin; Zhao, Lei-lei
  • Author_Institution
    Coll. of Math. & Comput. Sci., Hebei Univ., Baoding
  • Volume
    5
  • fYear
    2008
  • fDate
    12-15 July 2008
  • Firstpage
    2514
  • Lastpage
    2518
  • Abstract
    The vector space model is commonly used to represent documents in text categorization, and reducing the dimensionality of the resulting feature space is a key problem in practical text classification. Classical decomposition algorithms cannot cope with high-dimensional, large-scale text categorization problems. This paper presents an approach that improves text categorization performance by combining candid covariance-free incremental principal component analysis (CCIPCA) with the sequential minimal optimization (SMO) algorithm. Experimental results show that the proposed method is practicable and effective for Chinese text categorization.
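The abstract names CCIPCA as the dimensionality-reduction step. The block below is a minimal pure-Python sketch of the incremental update rule from Weng, Zhang and Hwang's CCIPCA, written under stated assumptions: inputs are dense, already-centred feature vectors (for example, term weights from a vector space model); the class name `CCIPCA`, its method names, the default amnesic parameter of 0, and the `make_toy_data` helper are hypothetical choices made for illustration. It is not the authors' implementation, and the SMO classification stage is not shown.

```python
import math
import random


def _dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def _norm(a):
    """Euclidean length of a vector."""
    return math.sqrt(_dot(a, a))


class CCIPCA:
    """Sketch of candid covariance-free incremental PCA (after Weng et al.).

    Keeps k vectors v[i] whose direction estimates the i-th principal axis
    and whose length estimates the matching eigenvalue. Inputs are assumed
    to be zero-mean; centre them before calling partial_fit.
    """

    def __init__(self, k, amnesic=0.0):
        self.k = k
        self.amnesic = amnesic  # l in the update rule; 0.0 is an assumed default
        self.n = 0              # number of samples seen so far
        self.v = []             # unnormalised eigenvector estimates

    def partial_fit(self, x):
        """Update the estimates with one centred sample x (a list of floats)."""
        self.n += 1
        n, l = self.n, self.amnesic
        u = [float(xj) for xj in x]  # residual, deflated after each component
        for i in range(min(self.k, n)):
            if i == n - 1:
                # Component i is seen for the first time: start from the residual.
                self.v.append(list(u))
            else:
                v = self.v[i]
                nv = _norm(v)
                if nv > 0.0:
                    # v <- (n-1-l)/n * v + (1+l)/n * u * (u . v / |v|)
                    keep = (n - 1 - l) / n
                    gain = (1 + l) / n * _dot(u, v) / nv
                    self.v[i] = [keep * vj + gain * uj for vj, uj in zip(v, u)]
            v = self.v[i]
            nv = _norm(v)
            if nv > 0.0:
                # Remove the projection on component i before estimating i + 1.
                proj = _dot(u, v) / nv
                u = [uj - proj * vj / nv for uj, vj in zip(u, v)]
        return self

    def eigenvalues(self):
        """Estimated eigenvalues, in the order the components were created."""
        return [_norm(v) for v in self.v]

    def components(self):
        """Unit-length principal direction estimates."""
        return [[vj / _norm(v) for vj in v] if _norm(v) > 0.0 else list(v)
                for v in self.v]

    def transform(self, x):
        """Project a centred sample onto the estimated components."""
        return [_dot(x, c) for c in self.components()]


def make_toy_data(count=2000, seed=0):
    """Hypothetical centred 2-D toy data whose main axis is (1, 1)/sqrt(2)."""
    rng = random.Random(seed)
    pts = []
    for _ in range(count):
        t, s = rng.gauss(0.0, 3.0), rng.gauss(0.0, 0.3)
        pts.append([t + s, t - s])
    mean = [sum(col) / count for col in zip(*pts)]
    return [[xj - mj for xj, mj in zip(p, mean)] for p in pts]
```

Each update touches only k vectors of length d, so memory stays O(kd) instead of the O(d²) needed to store a covariance matrix; this is what makes an incremental, covariance-free method attractive for high-dimensional text vectors. In a pipeline like the one the abstract describes, the outputs of `transform` would then be passed to an SVM trained with SMO.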
  • Keywords
    minimisation; natural language processing; principal component analysis; text analysis; Chinese text categorization; candid covariance-free incremental principal component analysis; dimensionality reduction; sequential minimal optimization algorithm; text classification; Covariance matrix; Cybernetics; Feature extraction; Frequency; Indexing; Large-scale systems; Machine learning; Minimization methods; Principal component analysis; Text categorization; Candid covariance-free incremental principal component analysis (CCIPCA); Dimension reduction; Sequential minimal optimization algorithm (SMO);
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Machine Learning and Cybernetics, 2008 International Conference on
  • Conference_Location
    Kunming
  • Print_ISBN
    978-1-4244-2095-7
  • Electronic_ISBN
    978-1-4244-2096-4
  • Type
    conf
  • DOI
    10.1109/ICMLC.2008.4620831
  • Filename
    4620831