• DocumentCode
    3243113
  • Title

    Research on the Methods of Chinese Text Classification using Bayes and Language Model

  • Author

    Yan, Tao ; Gao, Guang-Lai

  • Author_Institution
    Coll. of Comput. Sci., Inner Mongolia Univ., Hohhot
  • fYear
    2008
  • fDate
    22-24 Oct. 2008
  • Firstpage
    1
  • Lastpage
    6
  • Abstract
    As the volume of information on the Internet grows, obtaining useful information quickly and effectively has become an important task, and automatic information classification has emerged in response. The Bayes classifier is one of the classification methods used in many fields. This paper applies a classification model that combines the Bayes classifier with a language model to Chinese text classification. Experiments on the Fudan University Chinese corpus show that the improved classifiers, which use four smoothing methods, perform better than the naive Bayes classifier model. In particular, Jelinek-Mercer smoothing with a modified smoothing scale improves classifier performance considerably.
  • Keywords
    Bayes methods; Internet; classification; natural language processing; text analysis; Chinese text classification; information automatic classification; language model; naive Bayes classifier; Computer science; Educational institutions; Electronic mail; Natural languages; Niobium; Smoothing methods; Support vector machine classification; Support vector machines; Text categorization
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Pattern Recognition, 2008. CCPR '08. Chinese Conference on
  • Conference_Location
    Beijing
  • Print_ISBN
    978-1-4244-2316-3
  • Type

    conf

  • DOI
    10.1109/CCPR.2008.88
  • Filename
    4663041
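
The abstract describes combining a Bayes classifier with a language model and smoothing it with the Jelinek-Mercer method. The following is a minimal sketch of that general idea, not the authors' implementation: each class gets a unigram language model, and each word probability is interpolated with a collection-wide model. The function names `train` and `classify`, the toy documents, and the interpolation weight `lam=0.5` are all assumptions made for illustration; the record does not state the paper's "modified smoothing scale" or its other three smoothing methods. Input text is assumed to be already segmented into words.

```python
import math
from collections import Counter


def train(docs):
    """Collect word counts per class and over the whole collection.

    docs: list of (tokens, label) pairs, where tokens is a list of
    already-segmented words (segmentation itself is out of scope here).
    """
    class_counts = {}          # label -> Counter of word counts
    class_docs = Counter()     # label -> number of training documents
    collection = Counter()     # word counts over all classes
    for tokens, label in docs:
        class_counts.setdefault(label, Counter()).update(tokens)
        class_docs[label] += 1
        collection.update(tokens)
    return class_counts, class_docs, collection


def classify(tokens, model, lam=0.5):
    """Return the label with the highest log posterior.

    Jelinek-Mercer smoothing (lam is a hypothetical value, 0 < lam <= 1):
        P(w | c) = (1 - lam) * P_ml(w | c) + lam * P_ml(w | collection)
    """
    class_counts, class_docs, collection = model
    total_docs = sum(class_docs.values())
    coll_total = sum(collection.values())
    best_label, best_score = None, -math.inf
    for label, counts in class_counts.items():
        class_total = sum(counts.values())
        score = math.log(class_docs[label] / total_docs)  # class prior
        for w in tokens:
            p_coll = collection[w] / coll_total
            if p_coll == 0.0:
                continue  # word never seen in training: skip it
            p_class = counts[w] / class_total
            score += math.log((1 - lam) * p_class + lam * p_coll)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy, made-up training data (not from the Fudan corpus).
toy_docs = [
    (["股票", "市场", "上涨"], "finance"),
    (["球队", "比赛", "胜利"], "sports"),
    (["市场", "投资"], "finance"),
]
toy_model = train(toy_docs)
print(classify(["市场", "投资"], toy_model))
print(classify(["比赛", "球队"], toy_model))
```

Because the collection model gives every training-vocabulary word a non-zero probability, a word missing from one class no longer drives that class's score to zero, which is the weakness of an unsmoothed naive Bayes estimate that smoothing is meant to address.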