DocumentCode :
3265186
Title :
Feature Selection for Classification with Proteomic Data of Mixed Quality
Author :
Marchiori, Elena ; Heegaard, Niels H H ; West-Nielsen, Mikkel ; Jimenez, Connie R.
Author_Institution :
Department of Computer Science, Vrije Universiteit Amsterdam, The Netherlands. Email: elena@cs.vu.nl
fYear :
2005
fDate :
14-15 Nov. 2005
Firstpage :
1
Lastpage :
7
Abstract :
In this paper we experimentally assess the performance of two state-of-the-art feature selection methods, RFE and RELIEF, when used for classifying pattern proteomic samples of mixed quality. The data are generated by spiking human sera to artificially create differentiable sample groups, and by handling samples at different storage temperatures. We consider two types of classifiers: support vector machines (SVM) and k-nearest neighbour (kNN). Results of leave-one-out cross validation (LOOCV) experiments indicate that RELIEF selects more stable feature subsets than RFE over the runs, and that the features it selects are mainly spiked ones. However, RFE outperforms RELIEF in terms of average LOOCV accuracy when combined with either SVM or kNN. Perfect LOOCV accuracy is obtained by RFE combined with 1NN. Almost all the samples that are wrongly classified by the algorithms were stored at high temperature. The results of experiments on these data indicate that, when samples of mixed quality are analyzed computationally, selecting only relevant (spiked) features does not necessarily yield the highest classification accuracy.
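The protocol summarized in the abstract (select features, classify, score by LOOCV, and check how stable the selected subset is across runs) can be illustrated with a minimal sketch. It is not the authors' implementation: it covers only a basic two-class RELIEF variant combined with 1NN, uses the Python standard library, and runs on synthetic data in which features 0 and 1 are artificially shifted to imitate "spiked" features. The function names, the sample and feature counts, the shift of 5.0, and the choice of Manhattan distance are all assumptions made for illustration. RFE and SVM are not shown.

```python
import random

def relief_weights(X, y):
    """Basic two-class RELIEF: reward features that differ on the nearest
    miss and agree on the nearest hit. Differences are scaled by range."""
    n, d = len(X), len(X[0])
    span = [(max(r[f] for r in X) - min(r[f] for r in X)) or 1.0
            for f in range(d)]

    def diff(a, b, f):
        return abs(a[f] - b[f]) / span[f]

    def dist(a, b):
        return sum(diff(a, b, f) for f in range(d))

    w = [0.0] * d
    for i in range(n):
        hits = [j for j in range(n) if j != i and y[j] == y[i]]
        misses = [j for j in range(n) if y[j] != y[i]]
        hit = min(hits, key=lambda j: dist(X[i], X[j]))
        miss = min(misses, key=lambda j: dist(X[i], X[j]))
        for f in range(d):
            w[f] += (diff(X[i], X[miss], f) - diff(X[i], X[hit], f)) / n
    return w

def select_top(X, y, k):
    """Indices of the k features with the largest RELIEF weight, sorted."""
    w = relief_weights(X, y)
    return sorted(sorted(range(len(w)), key=lambda f: -w[f])[:k])

def one_nn(train_X, train_y, x, feats):
    """Label of the nearest training sample, using only the chosen features."""
    nearest = min(range(len(train_X)),
                  key=lambda j: sum(abs(train_X[j][f] - x[f]) for f in feats))
    return train_y[nearest]

def loocv(X, y, k):
    """LOOCV with feature selection redone inside every fold.
    Returns the accuracy and how often each feature was selected."""
    correct, counts = 0, [0] * len(X[0])
    for i in range(len(X)):
        tr_X, tr_y = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        feats = select_top(tr_X, tr_y, k)
        for f in feats:
            counts[f] += 1
        correct += one_nn(tr_X, tr_y, X[i], feats) == y[i]
    return correct / len(X), counts

# Synthetic stand-in data (an assumption, not the paper's data):
# 20 samples, 8 features, features 0 and 1 shifted in class 1.
rng = random.Random(0)
SPIKED = [0, 1]
X, y = [], []
for label in (0, 1):
    for _ in range(10):
        row = [rng.random() for _ in range(8)]
        if label == 1:
            for f in SPIKED:
                row[f] += 5.0
        X.append(row)
        y.append(label)

accuracy, counts = loocv(X, y, k=2)
print("LOOCV accuracy:", accuracy)
print("selection counts per feature:", counts)
```

Re-running the selection inside each fold keeps the held-out sample from influencing which features are chosen, and the per-feature selection counts give a simple view of subset stability across folds. On clean, well-separated synthetic data like this the task is easy; the abstract's point is that real samples of mixed quality behave differently.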
Keywords :
Bioinformatics; Cancer; Computer science; Humans; Laboratories; Machine learning; Proteomics; Support vector machine classification; Support vector machines; Temperature
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Computational Intelligence in Bioinformatics and Computational Biology, 2005. CIBCB '05. Proceedings of the 2005 IEEE Symposium on
Print_ISBN :
0-7803-9387-2
Type :
conf
DOI :
10.1109/CIBCB.2005.1594944
Filename :
1594944