DocumentCode :
3243113
Title :
Research on the Methods of Chinese Text Classification using Bayes and Language Model
Author :
Yan, Tao ; Gao, Guang-Lai
Author_Institution :
Coll. of Comput. Sci., Inner Mongolia Univ., Hohhot
fYear :
2008
fDate :
22-24 Oct. 2008
Firstpage :
1
Lastpage :
6
Abstract :
With the growing volume of information on the Internet, obtaining useful information quickly and effectively has become an important task, and automatic information classification has emerged in response. The Bayes classifier has been used in many fields as a classification method. This paper applies a classification model that combines the Bayes classifier with a language model to Chinese text classification. Experiments on the Fudan University Chinese corpus show that the improved classifiers using four smoothing methods outperform the naive Bayes classifier. In particular, Jelinek-Mercer smoothing with a modified smoothing scale improves classifier performance considerably.
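The abstract does not spell out the model. The following is a minimal sketch that assumes the common formulation of a Bayes classifier with a language model. In that formulation, each class c scores a document by its prior P(c) times a per-class unigram language model P(w|c). Jelinek-Mercer smoothing interpolates the class's maximum-likelihood estimate with a corpus-wide background estimate: P(w|c) = (1 - lam) * P_ml(w|c) + lam * P_ml(w). The class name `LMBayesClassifier`, the value of `lam`, and the toy data are hypothetical. The paper's "modified smoothing scale" and its other three smoothing methods are not reproduced. Input is assumed to be already word-segmented, since Chinese text needs segmentation first.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


class LMBayesClassifier:
    """Hypothetical sketch: Bayes classifier whose class-conditional word
    model is a unigram language model with Jelinek-Mercer smoothing:
        P(w|c) = (1 - lam) * P_ml(w|c) + lam * P_ml(w | whole corpus)
    """

    def __init__(self, lam: float = 0.5) -> None:
        if not 0.0 < lam <= 1.0:
            raise ValueError("lam must be in (0, 1]")
        self.lam = lam

    def fit(self, docs: Sequence[List[str]], labels: Sequence[str]) -> "LMBayesClassifier":
        # Class priors P(c) estimated from document frequencies.
        doc_freq = Counter(labels)
        self.log_prior = {c: math.log(n / len(labels)) for c, n in doc_freq.items()}
        # Per-class and corpus-wide (background) token counts.
        self.class_counts: Dict[str, Counter] = {}
        self.background: Counter = Counter()
        for tokens, c in zip(docs, labels):
            self.class_counts.setdefault(c, Counter()).update(tokens)
            self.background.update(tokens)
        self.class_totals = {c: sum(cnt.values()) for c, cnt in self.class_counts.items()}
        self.bg_total = sum(self.background.values())
        return self

    def word_prob(self, w: str, c: str) -> float:
        # Interpolate the class ML estimate with the background ML estimate.
        total = self.class_totals[c]
        p_class = self.class_counts[c][w] / total if total else 0.0
        p_bg = self.background[w] / self.bg_total
        return (1.0 - self.lam) * p_class + self.lam * p_bg

    def log_score(self, tokens: List[str], c: str) -> float:
        score = self.log_prior[c]
        for w in tokens:
            if self.background[w] == 0:
                # Word never seen in training: it has no probability mass
                # under any class, so it is skipped for every class alike.
                continue
            score += math.log(self.word_prob(w, c))
        return score

    def predict(self, tokens: List[str]) -> str:
        # Pick the class with the highest log P(c) + sum log P(w|c).
        return max(self.log_prior, key=lambda c: self.log_score(tokens, c))


# Toy usage with hypothetical, pre-segmented documents.
train_docs = [
    ["stock", "market", "rise"],
    ["market", "price", "stock"],
    ["team", "match", "goal"],
    ["goal", "player", "team"],
]
train_labels = ["economy", "economy", "sports", "sports"]
clf = LMBayesClassifier(lam=0.3).fit(train_docs, train_labels)
print(clf.predict(["stock", "price"]))               # economy
print(clf.predict(["goal", "match", "unknownword"]))  # sports
```

Because the background term gives every training-vocabulary word nonzero probability in every class, no logarithm of zero arises. A naive Bayes baseline would typically use add-one (Laplace) smoothing inside `word_prob` instead.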
Keywords :
Bayes methods; Internet; classification; natural language processing; text analysis; Chinese text classification; information automatic classification; language model; naive Bayes classifier; Computer science; Educational institutions; Electronic mail; Natural languages; Niobium; Smoothing methods; Support vector machine classification; Support vector machines; Text categorization;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Pattern Recognition, 2008. CCPR '08. Chinese Conference on
Conference_Location :
Beijing
Print_ISBN :
978-1-4244-2316-3
Type :
conf
DOI :
10.1109/CCPR.2008.88
Filename :
4663041