• DocumentCode
    3486490
  • Title
    Automatic Chinese Text Classification Using Character-Based and Word-Based Approach
  • Author
    Luo, Xi ; Ohyama, Wataru ; Wakabayashi, Tetsushi ; Kimura, Fumitaka
  • Author_Institution
    Grad. Sch. of Eng., Mie Univ., Tsu, Japan
  • fYear
    2013
  • fDate
    25-28 Aug. 2013
  • Firstpage
    329
  • Lastpage
    333
  • Abstract
    In this paper, we study Chinese text classification using a character-based (N-gram) approach and a word-based approach, and propose using character uni-gram and bi-gram features together with word features of length three or more. We also introduce a weight coefficient that can assign higher weights to the word features. We further investigate a serial approach that applies feature transformation and dimension reduction techniques to improve classification performance. Experimental results show that the proposed approach is efficient and effective for improving the performance of Chinese text classification.
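The abstract describes the feature set only in outline. The following is a minimal sketch of how such a combined feature vector might be built, assuming the document has already been split into words by some external segmenter and that features are raw counts. The function name `extract_features`, the key prefixes `U:`, `B:` and `W:`, and the default weight of 2.0 are all hypothetical illustrations, not values from the paper; the later feature transformation, dimension reduction and classification steps are not shown.

```python
from collections import Counter
from typing import Dict, List


def extract_features(words: List[str], word_weight: float = 2.0) -> Dict[str, float]:
    """Build a sparse feature vector from a pre-segmented Chinese document.

    Assumptions (hypothetical, not taken from the paper): `words` comes from an
    external word segmenter, features are raw counts, and `word_weight` is an
    illustrative value for the weight coefficient applied to word features.
    """
    text = "".join(words)  # character stream, ignoring word boundaries
    features: Counter = Counter()

    # Character uni-grams: every single character.
    for ch in text:
        features["U:" + ch] += 1.0

    # Character bi-grams: every pair of adjacent characters.
    for i in range(len(text) - 1):
        features["B:" + text[i:i + 2]] += 1.0

    # Word features: only words of length >= 3, because a 1- or 2-character
    # word would duplicate an existing uni-gram or bi-gram feature.
    # Each occurrence is scaled by the weight coefficient.
    for w in words:
        if len(w) >= 3:
            features["W:" + w] += word_weight

    return dict(features)


if __name__ == "__main__":
    feats = extract_features(["中文", "文本", "分类", "计算机"], word_weight=2.0)
    print(feats["U:文"], feats["W:计算机"])
```

For the sample input, the character `文` appears twice in the joined text, so its uni-gram count is 2.0, and the three-character word `计算机` contributes a single word feature scaled to 2.0. Keeping the three feature types under distinct key prefixes lets a downstream step (for example, a dimension reduction stage) treat them as one vector while still allowing the word features to be reweighted independently.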
  • Keywords
    document image processing; natural language processing; text detection; automatic Chinese text classification; character-based approach; feature transformation; reduction techniques; serial approach; weight coefficient; word-based approach; Eigenvalues and eigenfunctions; Feature extraction; Principal component analysis; Support vector machine classification; Text categorization; Vectors; Chinese Text Classification/Categorization; Dimension Reduction; Feature Transformation; N-gram; Principal Component Analysis; Support Vector Machine;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Document Analysis and Recognition (ICDAR), 2013 12th International Conference on
  • Conference_Location
    Washington, DC
  • ISSN
    1520-5363
  • Type
    conf
  • DOI
    10.1109/ICDAR.2013.73
  • Filename
    6628638