Title :
Combining Multiple Feature Selection Methods for Text Categorization by Using Rank-Score Characteristics
Author :
Li, Yanjun ; Hsu, D. Frank ; Chung, Soon M.
Author_Institution :
Dept. of Comput. & Inf. Sci., Fordham Univ., Bronx, NY, USA
Abstract :
Feature selection is an important method for improving the efficiency and accuracy of text categorization algorithms by removing redundant and irrelevant terms from the corpus. Extensive research has been done to improve the performance of individual feature selection methods, but not much on their combinations. In this paper, we propose a method of combining multiple feature selection methods by using combinatorial fusion analysis (CFA). A rank-score function and its graph, called the rank-score graph, are adopted to measure the diversity of different feature selection methods. We show that a combination of multiple feature selection methods can outperform a single method only if each individual feature selection method has unique scoring behavior and relatively high performance. Moreover, we show that the rank-score function and rank-score graph are useful for selecting which feature selection methods to combine.
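The abstract does not give the formulas, so the following is only a minimal sketch of how a rank-score function, a diversity measure, and score and rank combinations are commonly defined in combinatorial fusion analysis. The names `method_a` and `method_b`, the term scores, the min-max normalization, and the sum-of-absolute-differences diversity are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch: all names, scores, and formula choices are assumptions.

def normalize(scores):
    """Min-max scale a {term: score} dict into [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    span = hi - lo
    return {t: (s - lo) / span if span else 0.0 for t, s in scores.items()}

def ranks(scores):
    """Map each term to its rank (1 = highest score); ties broken by term name."""
    ordered = sorted(scores, key=lambda t: (-scores[t], t))
    return {t: i for i, t in enumerate(ordered, start=1)}

def rank_score_function(scores):
    """Return f as a list where f[i - 1] is the normalized score at rank i."""
    return sorted(normalize(scores).values(), reverse=True)

def diversity(scores_a, scores_b):
    """Sum of absolute gaps between two rank-score functions (one possible choice)."""
    f_a, f_b = rank_score_function(scores_a), rank_score_function(scores_b)
    return sum(abs(x - y) for x, y in zip(f_a, f_b))

def score_combination(*methods):
    """Average the normalized scores of each term across methods."""
    normed = [normalize(m) for m in methods]
    return {t: sum(n[t] for n in normed) / len(normed) for t in normed[0]}

def rank_combination(*methods):
    """Average the ranks of each term across methods (lower is better)."""
    ranked = [ranks(m) for m in methods]
    return {t: sum(r[t] for r in ranked) / len(ranked) for t in ranked[0]}

def top_k(combined, k, higher_is_better):
    """Pick the k best terms from a combined {term: value} dict."""
    sign = -1 if higher_is_better else 1
    return sorted(combined, key=lambda t: (sign * combined[t], t))[:k]

# Made-up scores that two feature selection methods assign to the same terms.
method_a = {"goal": 8.0, "match": 4.0, "the": 0.0, "team": 6.0}
method_b = {"goal": 16.0, "match": 12.0, "the": 0.0, "team": 4.0}

by_score = score_combination(method_a, method_b)
by_rank = rank_combination(method_a, method_b)

print(rank_score_function(method_a))
print(diversity(method_a, method_b))
print(top_k(by_score, 2, higher_is_better=True))
print(top_k(by_rank, 2, higher_is_better=False))
```

Plotting each list returned by `rank_score_function` against rank 1, 2, ... would give the rank-score graph; curves that lie far apart indicate methods with different scoring behavior under this sketch's assumptions.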
Keywords :
data mining; graph theory; text analysis; combinatorial fusion analysis; multiple feature selection methods; rank-score graph; text categorization; Artificial intelligence; Computer science; Diversity reception; Frequency estimation; Functional analysis; Information science; Mutual information; Text mining; USA Councils; Feature selection; combinatorial fusion analysis (CFA); rank combination; rank-score function; score combination
Conference_Title :
Tools with Artificial Intelligence, 2009. ICTAI '09. 21st International Conference on
Conference_Location :
Newark, NJ
Print_ISBN :
978-1-4244-5619-2
ISSN :
1082-3409
DOI :
10.1109/ICTAI.2009.129