DocumentCode
2773992
Title
Feature Selection for Maximizing the Area Under the ROC Curve
Author
Wang, Rui ; Tang, Ke
Author_Institution
Nature Inspired Comput. & Applic. Lab. (NICAL), Univ. of Sci. & Technol. of China, Hefei, China
fYear
2009
fDate
6-6 Dec. 2009
Firstpage
400
Lastpage
405
Abstract
Feature selection is an important pre-processing step for solving classification problems. A good feature selection method may not only improve the performance of the final classifier, but also reduce its computational complexity. Traditionally, feature selection methods were developed to maximize the classification accuracy of a classifier. Recently, both theoretical and experimental studies revealed that a classifier with the highest accuracy might not be ideal in real-world problems. Instead, the Area Under the ROC Curve (AUC) has been suggested as the alternative metric, and many existing learning algorithms have been modified in order to seek the classifier with maximum AUC. However, little work has been done to develop new feature selection methods to suit the requirement of AUC maximization. To fill this gap in the literature, we propose in this paper a novel algorithm called the AUC and Rank Correlation coefficient Optimization (ARCO) algorithm. ARCO adopts the general framework of a well-known method, namely the minimal-redundancy-maximal-relevance (mRMR) criterion, but defines the terms "relevance" and "redundancy" in totally different ways. Such a modification looks trivial from the perspective of algorithmic design. Nevertheless, an experimental study on four gene expression data sets showed that feature subsets obtained by ARCO resulted in classifiers with significantly larger AUC than the feature subsets obtained by mRMR. Moreover, ARCO also outperformed the Feature Assessment by Sliding Thresholds algorithm, which was recently proposed for AUC maximization, and thus the efficacy of ARCO was validated.
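The abstract does not spell out how ARCO defines relevance and redundancy, so the following is only an illustrative sketch of an mRMR-style greedy selection, not the authors' implementation. It assumes, based on the algorithm's name alone, that relevance is the AUC of a single feature used as a ranking score and that redundancy is the mean absolute Spearman rank correlation with the features already selected. The "relevance minus redundancy" combination and all function names (average_ranks, single_feature_auc, spearman, select_features) are hypothetical choices made for this example.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def single_feature_auc(feature: Sequence[float], labels: Sequence[int]) -> float:
    """AUC of one feature used as a score (Mann-Whitney rank-sum form).

    Labels are 0/1 and both classes must be present. Taking
    max(auc, 1 - auc) is an assumption that makes the score
    independent of the feature's direction.
    """
    ranks = average_ranks(feature)
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    auc = (rank_sum - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)
    return max(auc, 1.0 - auc)


def spearman(a: Sequence[float], b: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the ranks."""
    ra, rb = average_ranks(a), average_ranks(b)
    n = len(ra)
    ma, mb = sum(ra) / n, sum(rb) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    va = sum((x - ma) ** 2 for x in ra)
    vb = sum((y - mb) ** 2 for y in rb)
    if va == 0 or vb == 0:
        return 0.0  # a constant feature carries no rank information
    return cov / (va * vb) ** 0.5


def select_features(columns: Sequence[Sequence[float]],
                    labels: Sequence[int], k: int) -> List[int]:
    """Greedily pick k column indices by relevance minus mean redundancy."""
    relevance = [single_feature_auc(c, labels) for c in columns]
    selected: List[int] = []
    remaining = set(range(len(columns)))

    def score(j: int) -> float:
        if not selected:
            return relevance[j]
        redundancy = sum(abs(spearman(columns[j], columns[s]))
                         for s in selected) / len(selected)
        return relevance[j] - redundancy  # assumed difference form

    while remaining and len(selected) < k:
        # sorted() makes tie-breaking deterministic (lowest index wins)
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
    return selected
```

With three toy features where the second is a monotone copy of the first, the sketch would pick the first feature and then skip its copy in favour of the less correlated third one, which is the behaviour an mRMR-style criterion is meant to produce.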
Keywords
data mining; learning (artificial intelligence); ARCO algorithm; area under the ROC curve algorithm; feature selection; learning algorithms; minimal redundancy-maximal-relevance method; rank correlation coefficient optimization; Computational complexity; Computational efficiency; Computer applications; Conferences; Costs; Data mining; Laboratories; Learning systems; Support vector machine classification; Support vector machines;
fLanguage
English
Publisher
IEEE
Conference_Titel
Data Mining Workshops, 2009. ICDMW '09. IEEE International Conference on
Conference_Location
Miami, FL
Print_ISBN
978-1-4244-5384-9
Electronic_ISBN
978-0-7695-3902-7
Type
conf
DOI
10.1109/ICDMW.2009.25
Filename
5360438