Regional Information Center for Science and Technology - Research on the Methods of Chinese Text Classification using Bayes and Language Model

DocumentCode :

3243113

Title :

Research on the Methods of Chinese Text Classification using Bayes and Language Model

Author :

Yan, Tao ; Gao, Guang-Lai

Author_Institution :

Coll. of Comput. Sci., Inner Mongolia Univ., Hohhot

fYear :

2008

fDate :

22-24 Oct. 2008

Firstpage :

Lastpage :

Abstract :

As the volume of information on the Internet grows, finding useful information quickly and effectively has become an important task, and automatic information classification has emerged in response. Bayes classifiers are used as a classification method in many fields. This paper applies a classification model that combines a Bayes classifier with a language model to Chinese text classification. On the Chinese corpus of Fudan University, our experiments show that the improved classifiers, which use four smoothing methods, perform better than the naive Bayes classifier model. In particular, with the Jelinek-Mercer method using a modified smoothing scale, classifier performance improves substantially.
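
The abstract names Jelinek-Mercer smoothing but this record does not give the paper's formulas, so the following is only a minimal sketch of the textbook form: each class is treated as a unigram language model, and its maximum-likelihood word probability is linearly interpolated with the probability from the whole collection. The function names (`train`, `jm_prob`, `classify`), the default weight `lam=0.5`, and the assumption that Chinese text has already been segmented into tokens are all illustrative choices, not details taken from the paper, and the authors' "modified smoothing scale" is not reproduced here.

```python
import math
from collections import Counter


def train(docs):
    """Build per-class word counts, collection-wide counts, and class priors.

    docs: list of (label, tokens) pairs; tokens are assumed to be
    already-segmented words (hypothetical input format).
    """
    class_counts = {}
    collection = Counter()
    doc_totals = Counter()
    for label, tokens in docs:
        class_counts.setdefault(label, Counter()).update(tokens)
        collection.update(tokens)
        doc_totals[label] += 1
    n_docs = sum(doc_totals.values())
    priors = {label: doc_totals[label] / n_docs for label in doc_totals}
    return class_counts, collection, priors


def jm_prob(word, counts, collection, lam):
    """Jelinek-Mercer estimate: (1 - lam) * P_ml(w | class) + lam * P(w | collection)."""
    p_class = counts[word] / sum(counts.values())
    p_coll = collection[word] / sum(collection.values())
    return (1 - lam) * p_class + lam * p_coll


def classify(tokens, model, lam=0.5):
    """Return the class with the highest log prior plus smoothed log likelihood."""
    class_counts, collection, priors = model
    scores = {}
    for label, counts in class_counts.items():
        score = math.log(priors[label])
        for word in tokens:
            p = jm_prob(word, counts, collection, lam)
            if p > 0:  # words unseen in the entire collection are skipped
                score += math.log(p)
        scores[label] = score
    return max(scores, key=scores.get)
```

Because the collection term is nonzero for any word seen in training, a word missing from one class no longer forces that class's likelihood to zero, which is the weakness of an unsmoothed naive Bayes estimate that smoothing is meant to address.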

Keywords :

Bayes methods; Internet; classification; natural language processing; text analysis; Chinese text classification; information automatic classification; language model; naive Bayes classifier; Computer science; Educational institutions; Electronic mail; Natural languages; Niobium; Smoothing methods; Support vector machine classification; Support vector machines; Text categorization

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Pattern Recognition, 2008. CCPR '08. Chinese Conference on

Conference_Location :

Beijing

Print_ISBN :

978-1-4244-2316-3

Type :

conf

DOI :

10.1109/CCPR.2008.88

Filename :

4663041

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=3243113