DocumentCode
589124
Title
Evaluation of Feature Ranking Ensembles for High-Dimensional Biomedical Data: A Case Study
Author
Kuncheva, Ludmila I. ; Smith, C.J. ; Syed, Y. ; Phillips, C.O. ; Lewis, K.E.
Author_Institution
Sch. of Comput. Sci., Bangor Univ., Bangor, UK
fYear
2012
fDate
10-10 Dec. 2012
Firstpage
49
Lastpage
56
Abstract
Developing accurate, reliable and easy-to-use diagnostic tests depends on identifying a small set of highly discriminative biomarkers. This task can be cast as feature selection within a pattern recognition context. Medical data are usually of the "wide" type, where the number of features is substantially larger than the number of instances. With the abundance of feature ranking methods, it is difficult to pick the most suitable one and choose a final consistent feature subset. Ensembles of ranking methods have been recommended for the task, but their stability and accuracy have not been evaluated across different ranking methods. Here we present a case study consisting of 429 samples of exhaled air from smokers, 83% of whom suffer from COPD (chronic obstructive pulmonary disease). The task is to identify a discriminative subset of the 1929 volatile organic compounds (VOCs) measured through mass spectrometry. Using Pareto analysis, 16 feature ranking ensembles were evaluated with respect to three criteria: classification accuracy, area under the ROC curve and the stability of the ensemble selection. The t-statistic was rated the best among the 16 feature rankers, outperforming the currently favoured SVM ranker.
Keywords
Pareto analysis; data handling; feature extraction; medical diagnostic computing; pattern classification; COPD; VOC; area under the ROC curve; chronic obstructive pulmonary disease; classification accuracy; diagnostic tests; discriminative biomarkers; ensemble selection stability; feature ranking ensemble evaluation; feature ranking methods; feature selection; high-dimensional biomedical data; mass spectrometry; pattern recognition context; t-statistic; volatile organic compounds; Accuracy; Educational institutions; Indexes; Stability criteria; Support vector machines; Vegetation; classifier ensembles; feature ranking; stability index
fLanguage
English
Publisher
IEEE
Conference_Titel
Data Mining Workshops (ICDMW), 2012 IEEE 12th International Conference on
Conference_Location
Brussels
Print_ISBN
978-1-4673-5164-5
Type
conf
DOI
10.1109/ICDMW.2012.12
Filename
6406422