Title :
An algorithm for selecting Chinese features based on TF-NIDF weight
Author :
Li Yongli ; Liu Yanheng ; Shi Mo ; Dong Liyan ; Li Zhen ; Liu Lixiang ; Yan Pengfei
Author_Institution :
Coll. of Comput. Sci. & Technol., Jilin Univ., Changchun, China
Abstract :
This article discusses the problem of selecting Chinese features based on TF-IDF weights in text categorization. The TF-IDF weight is commonly used in text categorization because of its simplicity. However, it cannot express the relationship between how frequently a feature appears in one class and how frequently it appears in other classes. To solve this problem, we designed the TF-NIDF weighting method to express this relationship and compute feature weights. We also incorporated the weight into a Naïve Bayesian classifier and tested it on Chinese text data. Experiments showed that the Naïve Bayesian classifier with feature selection based on TF-NIDF weights achieves higher categorization precision than the Naïve Bayesian classifier with feature selection based on traditional TF-IDF weights.
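The abstract does not give the TF-NIDF formula, so the sketch below illustrates only the baseline it criticizes. It computes one common TF-IDF variant (raw term count times log(N / df)) and shows the stated limitation: two terms with the same document frequency receive the same IDF even when one is concentrated in a single class and the other is spread across classes. The toy corpus, the class labels, and the function names `tf_idf` and `class_doc_freq` are hypothetical. The English tokens stand in for segmented Chinese words.

```python
import math
from collections import Counter

# Hypothetical toy corpus: each document is a list of already-segmented tokens.
corpus = [
    ["stock", "market", "rise"],   # class A
    ["stock", "fund"],             # class A
    ["match", "team"],             # class B
    ["match", "market"],           # class B
]
labels = ["A", "A", "B", "B"]


def tf_idf(term, doc, docs):
    """One common TF-IDF variant: raw count of term in doc times log(N / df)."""
    tf = Counter(doc)[term]
    df = sum(1 for d in docs if term in d)
    return tf * math.log(len(docs) / df) if df else 0.0


def class_doc_freq(term, docs, doc_labels):
    """Number of documents containing term, broken down by class label."""
    counts = Counter()
    for doc, label in zip(docs, doc_labels):
        if term in doc:
            counts[label] += 1
    return dict(counts)


if __name__ == "__main__":
    # Both terms occur in 2 of 4 documents, so both get IDF = log(4 / 2) ~ 0.693.
    print(tf_idf("stock", corpus[0], corpus))
    print(tf_idf("market", corpus[0], corpus))
    # The class distributions differ, but TF-IDF never looks at the labels.
    print(class_doc_freq("stock", corpus, labels))
    print(class_doc_freq("market", corpus, labels))
```

Here "stock" appears only in class A while "market" appears once in each class, yet their TF-IDF weights in the first document are identical. This class-blindness is the gap the TF-NIDF weight is designed to address.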
Keywords :
Bayes methods; feature extraction; natural language processing; pattern classification; text analysis; Chinese feature; Naïve Bayesian classifier; TF-NIDF weight; feature selection; text categorization; Automation; Bayesian methods; Computer science; Design methodology; Frequency; Laboratories; Performance evaluation; Testing; Text categorization; Training data; Feature Weight; TF-IDF; TF-NIDF; Text Categorization;
Conference_Title :
Information and Automation (ICIA), 2010 IEEE International Conference on
Conference_Location :
Harbin
Print_ISBN :
978-1-4244-5701-4
DOI :
10.1109/ICINFA.2010.5512348