DocumentCode
615260
Title
Improved mutual information method for text feature selection
Author
Ding Xiaoming ; Tang Yan
Author_Institution
Coll. of Comput. & Inf. Sci., Southwest Univ., Chongqing, China
fYear
2013
fDate
26-28 April 2013
Firstpage
163
Lastpage
166
Abstract
Reducing the dimensionality of a high-dimensional feature set is one of the difficulties of text categorization. Feature selection has been applied effectively in text classification because of its low computational complexity. Research shows that mutual information is a good feature selection method, but it does not consider the term frequency in each category of the corpus or the connections between terms. To remedy these defects of the traditional mutual information method, this article improves the mutual information measure by introducing the frequency of a feature within a class and the dispersion of a feature within a class, builds an experimental platform by constructing a Chinese text classification system, and runs multiple sets of experiments on this system. The results show that the new feature selection approach achieves better results in text categorization.
Keywords
computational complexity; feature extraction; natural language processing; pattern classification; text analysis; Chinese text classification system; computing complexity; corpus category; feature frequency; high-dimensional feature set dimension reduction; improved mutual information method; text categorization; text classification; text feature selection approach; Art; Complexity theory; Computers; Text categorization; feature selection; mutual information; text classification
fLanguage
English
Publisher
IEEE
Conference_Titel
Computer Science & Education (ICCSE), 2013 8th International Conference on
Conference_Location
Colombo
Print_ISBN
978-1-4673-4464-7
Type
conf
DOI
10.1109/ICCSE.2013.6553903
Filename
6553903