Title :
Multiple SVM-RFE for gene selection in cancer classification with expression data
Author :
Duan, Kai-Bo ; Rajapakse, Jagath C. ; Wang, Haiying ; Azuaje, Francisco
Author_Institution :
BioInformatics Res. Centre, Nanyang Technol. Univ., Singapore
Abstract :
This paper proposes a new feature selection method that uses a backward elimination procedure similar to that implemented in support vector machine recursive feature elimination (SVM-RFE). Unlike SVM-RFE, the proposed approach computes the feature ranking score at each step from a statistical analysis of the weight vectors of multiple linear SVMs trained on subsamples of the original training data. We tested the proposed method on four gene expression datasets for cancer classification. The results show that the proposed feature selection method selects better gene subsets than the original SVM-RFE and improves the classification accuracy. A Gene Ontology-based similarity assessment indicates that the selected subsets are functionally diverse, further validating our gene selection method. This investigation also suggests that, for gene expression-based cancer classification, the test error averaged over multiple partitions of the data into training and test sets can be recommended as a reference measure of classification performance.
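The backward elimination described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say which statistic is computed from the weight vectors, so the score used here (mean of the squared, norm-scaled weights divided by their standard deviation across SVMs) is an assumption, as are the Pegasos-style solver standing in for a real linear SVM, the removal of one feature per step, and all function and parameter names (train_linear_svm, ranking_scores, multiple_svm_rfe, n_svms, frac, seed). Labels are assumed to be -1 or +1.

```python
import random
import statistics


def train_linear_svm(X, y, lam=0.01, epochs=20, rng=None):
    """Bias-free linear SVM fitted with Pegasos-style subgradient steps (stand-in solver)."""
    rng = rng or random.Random(0)
    w = [0.0] * len(X[0])
    order = list(range(len(X)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]  # regularization shrink
            if margin < 1.0:  # hinge loss active: step toward this sample
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w


def ranking_scores(X, y, active, n_svms=10, frac=0.8, rng=None):
    """Score each active feature from the weights of SVMs trained on subsamples.

    Assumed statistic: mean / standard deviation of the squared, norm-scaled
    weights across the n_svms models (n_svms must be at least 2).
    """
    rng = rng or random.Random(0)
    n = len(X)
    k = max(2, int(frac * n))
    squared = []  # one row per SVM, one column per active feature
    for _ in range(n_svms):
        idx = rng.sample(range(n), k)  # subsample without replacement
        Xs = [[X[i][j] for j in active] for i in idx]
        ys = [y[i] for i in idx]
        w = train_linear_svm(Xs, ys, rng=rng)
        norm = sum(v * v for v in w) ** 0.5 or 1.0
        squared.append([(v / norm) ** 2 for v in w])
    # Small constant guards against division by zero for constant weights.
    return [statistics.mean(col) / (statistics.stdev(col) + 1e-12)
            for col in zip(*squared)]


def multiple_svm_rfe(X, y, n_keep, n_svms=10, frac=0.8, seed=0):
    """Backward elimination: repeatedly drop the lowest-scoring feature."""
    rng = random.Random(seed)
    active = list(range(len(X[0])))
    eliminated = []  # feature indices in the order they were removed
    while len(active) > n_keep:
        scores = ranking_scores(X, y, active, n_svms=n_svms, frac=frac, rng=rng)
        worst = min(range(len(active)), key=scores.__getitem__)
        eliminated.append(active.pop(worst))
    return active, eliminated
```

Calling multiple_svm_rfe(X, y, n_keep=50) on a samples-by-genes matrix X would return the indices of the 50 surviving genes and the elimination order. For real expression data with thousands of genes, a library SVM solver and removal of several features per step would be more practical than this pure-Python sketch.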
Keywords :
cancer; genetics; medical diagnostic computing; molecular biophysics; statistical analysis; support vector machines; cancer classification; feature ranking score; feature selection; gene expression; gene ontology-based similarity assessment; gene selection; multiple support vector machine recursive feature elimination; Biology computing; Educational technology; Ontologies; Support vector machine classification; Testing; Training data; gene ontology; semantic similarity; support vector machine recursive feature elimination (SVM-RFE); Algorithms; Artificial Intelligence; Databases, Protein; Diagnosis, Computer-Assisted; Gene Expression Profiling; Humans; Neoplasm Proteins; Neoplasms; Oligonucleotide Array Sequence Analysis; Pattern Recognition, Automated; Reproducibility of Results; Sensitivity and Specificity; Tumor Markers, Biological
Journal_Title :
NanoBioscience, IEEE Transactions on
DOI :
10.1109/TNB.2005.853657