Title :
A new feature selection method based on distributional information for Text Classification
Author :
Shi, Nianyun ; Liu, Lingling
Author_Institution :
Coll. of Comput. & Commun. Eng., China Univ. of Pet. (East China), Dongying, China
Abstract :
Feature selection (FS) is one of the most important issues in text classification (TC). A good feature selection method can improve both the efficiency and the accuracy of a text classifier. Based on an analysis of features' distributional information, this paper presents a feature selection method named DIFS. DIFS introduces a new estimation mechanism that measures the relationship between a feature's distributional characteristics and its contribution to categorization. In addition, two kinds of algorithms are designed to implement DIFS. In comparative experiments on a Chinese corpus, the proposed approach shows better performance.
Keywords :
classification; estimation theory; natural language processing; text analysis; Chinese corpus; DIFS; distributional information; estimation mechanism; feature selection method; text classification; text classifier; Estimation; Text categorization; Distributional Information; Feature Selection (FS); Text Classification (TC)
Conference_Title :
Progress in Informatics and Computing (PIC), 2010 IEEE International Conference on
Conference_Location :
Shanghai
Print_ISBN :
978-1-4244-6788-4
DOI :
10.1109/PIC.2010.5687404