DocumentCode
3243113
Title
Research on the Methods of Chinese Text Classification using Bayes and Language Model
Author
Yan, Tao ; Gao, Guang-Lai
Author_Institution
Coll. of Comput. Sci., Inner Mongolia Univ., Hohhot
fYear
2008
fDate
22-24 Oct. 2008
Firstpage
1
Lastpage
6
Abstract
As the amount of information on the Internet grows, obtaining useful information quickly and effectively has become an important task, and automatic information classification has emerged in response. The Bayes classifier is one classification method that has been used in many fields. This paper applies a classification model that combines the Bayes classifier with a language model to Chinese text classification. Experiments on the Fudan University Chinese corpus show that the improved classifiers, which use four smoothing methods, outperform the naive Bayes classifier model. In particular, the Jelinek-Mercer method with a modified smoothing scale improves classifier performance considerably.
Keywords
Bayes methods; Internet; classification; natural language processing; text analysis; Chinese text classification; information automatic classification; language model; naive Bayes classifier; Computer science; Educational institutions; Electronic mail; Natural languages; Niobium; Smoothing methods; Support vector machine classification; Support vector machines; Text categorization
fLanguage
English
Publisher
IEEE
Conference_Titel
Pattern Recognition, 2008. CCPR '08. Chinese Conference on
Conference_Location
Beijing
Print_ISBN
978-1-4244-2316-3
Type
conf
DOI
10.1109/CCPR.2008.88
Filename
4663041