Title :
A Combined Feature Selection Method for Chinese Text Categorization
Author :
Zhang, Xiang ; Zhou, Mingquan ; Geng, Guohua ; Ye, Na
Author_Institution :
Coll. of Inf. Sci. & Technol., Northwest Univ., Xi'an, China
Abstract :
Feature selection is an important task in Chinese text categorization. However, traditional Chinese feature selection methods rely on the conditional independence assumption, so the feature subsets they produce contain many redundant features. This paper proposes a combined feature selection method for Chinese text built on regularized mutual information (RMI) and distribute information among classes (DI). Feature selection is carried out in two steps: first, the Distribute Information (DI) algorithm removes features that are irrelevant to the text category; second, regularized mutual information eliminates redundant features. Experimental results show that the combined method improves classification quality.
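The two-step pipeline described above (relevance filtering, then redundancy elimination) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' method: the abstract does not give the DI or RMI formulas, so the hypothetical `class_spread` function (max minus min per-class document frequency ratio) stands in for DI, and mutual information normalized by the smaller entropy, `normalized_mi`, stands in for RMI. The thresholds `min_spread` and `max_redundancy` and the toy documents are likewise assumptions.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (nats) of a discrete sequence."""
    n = len(values)
    return -sum((c / n) * math.log(c / n) for c in Counter(values).values())


def normalized_mi(x, y):
    """Assumed RMI stand-in: I(X;Y) / min(H(X), H(Y)); 0 if either is constant."""
    hx, hy = entropy(x), entropy(y)
    if min(hx, hy) == 0:
        return 0.0
    mi = hx + hy - entropy(list(zip(x, y)))
    return mi / min(hx, hy)


def class_spread(term, docs, labels, classes):
    """Hypothetical DI stand-in: max minus min per-class document frequency ratio."""
    rates = []
    for c in classes:
        in_class = [d for d, y in zip(docs, labels) if y == c]
        rates.append(sum(term in d for d in in_class) / len(in_class))
    return max(rates) - min(rates)


def select_features(docs, labels, min_spread=0.5, max_redundancy=0.5):
    classes = sorted(set(labels))
    vocab = sorted(set().union(*docs))
    # Step 1: drop terms spread evenly across classes (irrelevant to category).
    scores = {t: class_spread(t, docs, labels, classes) for t in vocab}
    relevant = [t for t in vocab if scores[t] >= min_spread]
    # Step 2: visit terms by descending score (ties alphabetical) and drop any
    # term whose normalized MI with an already-kept term reaches the threshold.
    presence = {t: [int(t in d) for d in docs] for t in relevant}
    selected = []
    for t in sorted(relevant, key=lambda t: (-scores[t], t)):
        if all(normalized_mi(presence[t], presence[s]) < max_redundancy
               for s in selected):
            selected.append(t)
    return selected


# Toy corpus: each document is a set of (already segmented) terms.
docs = [{"goal", "match", "the"}, {"goal", "match", "the"},
        {"stock", "the"}, {"stock", "the"},
        {"chip", "the"}, {"chip", "the"}]
labels = ["sports", "sports", "finance", "finance", "tech", "tech"]
print(select_features(docs, labels))  # ['chip', 'goal', 'stock']
```

In this toy run, "the" occurs in every class and is dropped in step 1, while "match" always co-occurs with "goal" and is dropped in step 2 as redundant. Ordering the redundancy check by relevance score is one reasonable design choice: it keeps the more class-indicative member of each redundant pair.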
Keywords :
information retrieval; learning (artificial intelligence); text analysis; Chinese feature selection methods; Chinese text categorization; combined feature selection method; distribute information algorithm; regularized mutual information; Control engineering; Design methodology; Educational institutions; Electronic mail; Frequency; Indexing; Information filtering; Information science; Mutual information; Text categorization;
Conference_Title :
Information Engineering and Computer Science, 2009. ICIECS 2009. International Conference on
Conference_Location :
Wuhan
Print_ISBN :
978-1-4244-4994-1
DOI :
10.1109/ICIECS.2009.5363464