Title :
Chinese text categorization based on CCIPCA and SMO
Author :
Li, Xin-fu ; He, Hai-bin ; Zhao, Lei-lei
Author_Institution :
Coll. of Math. & Comput. Sci., Hebei Univ., Baoding
Abstract :
The vector space model is commonly used to represent text for text categorization. Reducing the dimensionality of the feature space is a key problem in practical text classification. Classical decomposition algorithms cannot handle high-dimensional, large-scale text categorization problems. This paper presents an approach to improving text categorization performance that combines candid covariance-free incremental principal component analysis (CCIPCA) with the sequential minimal optimization (SMO) algorithm. Experimental results show that the proposed method is practical and effective for Chinese text categorization.
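This record does not include the authors' implementation. As an illustration of the kind of incremental dimensionality reduction the abstract refers to, the following is a minimal sketch of the general CCIPCA update, which refines each principal direction one sample at a time without forming a covariance matrix. The names `ccipca`, `project`, and `_dot` are hypothetical, the input vectors are assumed to be mean-centered already, and the amnesic weighting of the original CCIPCA formulation is left out for brevity.

```python
import math


def _dot(a, b):
    """Inner product of two equal-length sequences (hypothetical helper)."""
    return sum(x * y for x, y in zip(a, b))


def ccipca(samples, k):
    """Incrementally estimate the top-k principal directions.

    `samples` is an iterable of equal-length vectors, assumed mean-centered.
    Returns (directions, eigenvalues): unit vectors and the estimated
    eigenvalue of each, in the order the components were extracted.
    """
    components = []  # unnormalized estimates; each norm approximates an eigenvalue
    for n, x in enumerate(samples, start=1):
        residual = [float(value) for value in x]
        for i in range(min(k, n)):
            if i == len(components):
                # First time this component is reached: seed it with the residual.
                components.append(list(residual))
            else:
                v = components[i]
                norm = math.sqrt(_dot(v, v))
                if norm == 0.0:
                    # Degenerate seed (zero residual): re-seed from this residual.
                    components[i] = list(residual)
                else:
                    # Running average of (residual residual^T) v / ||v||,
                    # computed without building a covariance matrix.
                    scale = _dot(residual, v) / norm
                    components[i] = [
                        ((n - 1) / n) * vj + (1.0 / n) * scale * rj
                        for vj, rj in zip(v, residual)
                    ]
            # Deflate: remove the part of the residual along this component.
            v = components[i]
            squared_norm = _dot(v, v)
            if squared_norm > 0.0:
                coef = _dot(residual, v) / squared_norm
                residual = [rj - coef * vj for rj, vj in zip(residual, v)]

    directions, eigenvalues = [], []
    for v in components:
        norm = math.sqrt(_dot(v, v))
        eigenvalues.append(norm)
        directions.append([vj / norm for vj in v] if norm > 0.0 else list(v))
    return directions, eigenvalues


def project(x, directions):
    """Map a centered vector onto the learned directions (reduced features)."""
    return [_dot(x, d) for d in directions]
```

Under these assumptions, `ccipca(vectors, k)` would be run once over centered term-weight vectors, and `project` would then give the k-dimensional features that a support vector machine trained with SMO could take as input.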
Keywords :
minimisation; natural language processing; principal component analysis; text analysis; Chinese text categorization; candid incremental principal component analysis; dimensionality reduction; sequential minimization optimization algorithm; text classification; Covariance matrix; Cybernetics; Feature extraction; Frequency; Indexing; Large-scale systems; Machine learning; Minimization methods; Text categorization; Candid incremental principal component analysis (CCIPCA); Dimension reduction; Sequential minimization optimization algorithm (SMO)
Conference_Title :
Machine Learning and Cybernetics, 2008 International Conference on
Conference_Location :
Kunming
Print_ISBN :
978-1-4244-2095-7
Electronic_ISBN :
978-1-4244-2096-4
DOI :
10.1109/ICMLC.2008.4620831