Title :
Enhanced feature selection for biomarker discovery in LC-MS data using GP
Author :
Ahmed, Shehab ; Mengjie Zhang ; Lifeng Peng
Author_Institution :
Sch. of Eng. & Comput. Sci., Victoria Univ. of Wellington, Wellington, New Zealand
Abstract :
Biomarker detection in LC-MS data depends mainly on feature selection algorithms, because the number of features is extremely high while the number of samples is very small. This makes classification of these data sets extremely challenging. In this paper we propose the use of genetic programming (GP) for feature subset selection in LC-MS data, which works by maximizing the signal-to-noise ratio of the features that GP selects. The proposed method was applied to eight LC-MS data sets with different sample sizes and different concentration levels of the spiked biomarkers. We evaluated the method by how accurately it selected features from the list of known biomarkers and by the classification accuracy of the selected features using support vector machine (SVM) and Naive Bayes (NB) classifiers. Features selected by the proposed GP method achieved perfect classification accuracy for most of the data sets. The results show that the proposed method strikes a reasonable compromise between the detection rate of the biomarkers and the classification accuracy for all data sets. The method was also compared to linear Support Vector Machine-Recursive Feature Elimination (SVM-RFE) and the t-test for feature selection, and the results show that the biomarker detection rate of the proposed approach is higher.
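The abstract does not give the exact signal-to-noise formulation used as the GP fitness. As a rough illustration only, the sketch below scores each feature with a commonly used two-class signal-to-noise ratio, |mean_a - mean_b| / (sd_a + sd_b), and ranks features by that score. The formula, the function names `signal_to_noise` and `rank_features`, and the handling of zero spread are all assumptions made for this example; it is a simple filter ranking, not the authors' GP method.

```python
from statistics import mean, stdev


def signal_to_noise(values_a, values_b):
    """Two-class SNR for one feature: |mean_a - mean_b| / (sd_a + sd_b).

    Assumed formulation; each class needs at least two samples for stdev.
    """
    diff = abs(mean(values_a) - mean(values_b))
    spread = stdev(values_a) + stdev(values_b)
    if spread == 0:
        # Assumption: a constant-per-class feature is perfectly separating
        # if the class means differ, and uninformative otherwise.
        return float("inf") if diff > 0 else 0.0
    return diff / spread


def rank_features(samples, labels):
    """Return feature indices sorted by decreasing SNR.

    samples: list of rows (one per sample), each a list of intensities.
    labels:  list of 0/1 class labels, one per sample.
    """
    n_features = len(samples[0])
    scores = []
    for j in range(n_features):
        class_a = [row[j] for row, y in zip(samples, labels) if y == 0]
        class_b = [row[j] for row, y in zip(samples, labels) if y == 1]
        scores.append(signal_to_noise(class_a, class_b))
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)
```

With many more features than samples, as in the LC-MS setting described above, a ranking like this would typically be followed by keeping only the top-scoring features before training a classifier.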
Keywords :
Bayes methods; genetic algorithms; medical computing; pattern classification; support vector machines; GP method; LC-MS data; NB; SVM-RFE; biomarker discovery; data sets classification; enhanced feature selection; genetic programming; linear support vector machine-recursive features elimination; naive Bayes classifiers; spiked biomarkers; Accuracy; Compounds; Feature extraction; Niobium; Signal to noise ratio; Sociology; Statistics;
Conference_Title :
Evolutionary Computation (CEC), 2013 IEEE Congress on
Conference_Location :
Cancun
Print_ISBN :
978-1-4799-0453-2
Electronic_ISBN :
978-1-4799-0452-5
DOI :
10.1109/CEC.2013.6557621