Title : 
A Study on Automatic Chinese Text Classification
         
        
            Author : 
Luo, Xi ; Ohyama, Wataru ; Wakabayashi, Tetsushi ; Kimura, Fumitaka
         
        
            Author_Institution : 
Grad. Sch. of Eng., Mie Univ., Tsu, Japan
         
        
        
        
        
        
            Abstract : 
In this paper, we perform Chinese text classification using N-gram (uni-gram, bi-gram, and mixed uni-gram/bi-gram) frequency features instead of word frequency features to represent documents, and we propose the use of mixed uni-gram/bi-gram features after feature transformation. We further propose a serial approach that combines feature transformation and dimension reduction techniques to improve performance. Experimental results show that the proposed approach is efficient and effective in improving Chinese text classification performance. Furthermore, we present several experiments evaluating feature selection based on part-of-speech analysis; the results show that a suitable combination of parts of speech can lead to better classification performance.
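As a rough illustration of the document representation the abstract describes, the following Python sketch counts character uni-grams and bi-grams and combines them into one mixed frequency vector. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names `char_ngrams` and `mixed_ngram_frequencies`, the whitespace stripping, and the normalization to relative frequencies are all hypothetical choices, and the feature transformation and dimension reduction steps mentioned in the abstract are not shown.

```python
from collections import Counter


def char_ngrams(text, n):
    """Return the overlapping character n-grams of text, in order."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def mixed_ngram_frequencies(text):
    """Relative frequencies of character uni-grams and bi-grams combined.

    Assumption: whitespace is dropped and counts are divided by the total
    number of n-grams; the paper may normalize differently.
    """
    cleaned = "".join(ch for ch in text if not ch.isspace())
    counts = Counter(char_ngrams(cleaned, 1)) + Counter(char_ngrams(cleaned, 2))
    total = sum(counts.values())
    if total == 0:
        return {}
    return {gram: count / total for gram, count in counts.items()}


# Example: a short Chinese string meaning "Chinese text classification".
features = mixed_ngram_frequencies("中文文本分类")
```

Working at the character level avoids the need for a word segmenter, which is the practical appeal of N-gram features for Chinese text; the resulting dictionary can be mapped onto a fixed vocabulary to form a vector for a classifier.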
         
        
            Keywords : 
classification; grammars; natural language processing; text analysis; N-gram frequency feature; part-of-speech analysis; automatic Chinese text classification; bi-gram; classification performance; dimension reduction techniques; document representation; feature transformation; uni-gram; word frequency; Kernel; Machine learning; Principal component analysis; Support vector machine classification; Text categorization; Vectors; Chinese text classification/categorization; N-gram; dimension reduction; feature selection; part-of-speech; principal component analysis; support vector machines;
         
        
        
        
            Conference_Title : 
Document Analysis and Recognition (ICDAR), 2011 International Conference on
         
        
            Conference_Location : 
Beijing
         
        
        
            Print_ISBN : 
978-1-4577-1350-7
         
        
            Electronic_ISBN : 
1520-5363
         
        
        
            DOI : 
10.1109/ICDAR.2011.187